Patterns for reshape in R
I have a dataframe that I want to reshape; my reshape code:
matchedlong <- reshape(matched, direction = 'long',
varying = c(29:33, 36:39),
v.names = c("Math34", "TFCIn"),
times = 2006:2009, idvar = "schoolnum")
In matched, columns 36 to 39 are logical (TRUE/FALSE), but in matchedlong they have somehow turned into numbers, with no clear pattern to the numbers.
What is causing this?
Sample data:
example.data <- structure(list(Grade_Range_2008 = structure(c(14L, 14L, 40L,
40L, 36L, 13L), .Label = c("3-5, UE", "4-5, UE", "4-8, UE, US",
"5-10, UE, US", "5-8, 10, UE, US", "5-8, UE, US", "5-9, UE, US",
"6-11, US", "6-12, UE, US", "6-7, UE, US", "6-8, 10, UE, US",
"6-8, UE", "6-8, UE, US", "6-9, UE, US", "6, UE", "7-10, US",
"7-8, US", "8-Jun", "8-May", "K-3", "K-3, UE", "K-4, UE", "K-5",
"K-5, UE", "K-6, UE", "K-8", "K-8, UE", "K-8, UE, US", "K, 2-5, UE",
"N/A", "PK-3, UE", "PK-4, UE", "PK-5, 10, UE", "PK-5, 7-9, UE, US",
"PK-5, 8, UE", "PK-5, UE", "PK-6, 10, UE", "PK-6, UE", "PK-8, UE",
"PK-8, UE, US"), class = "factor"), X__of_Yrs_in_school = c(0L,
0L, 0L, 0L, 0L, 0L), Total_Enrollment_2008 = c(348L, 444L, 636L,
495L, 319L, 410L), Free_Lunch_pct_2008 = c(75L, 89L, 94L, 89L,
89L, 91L), Reduced_Lunch_pct_2008 = c(6L, 6L, 3L, 4L, 5L, 4L),
Stability_pct_2008 = c(89L, 93L, 100L, 98L, 92L, 81L),
Limited_Eng__Prof__pct_2008 = c(8L,
20L, 8L, 10L, 19L, 19L), Am__Ind_pct_2008 = c(1L, 2L, 0L,
2L, 0L, 2L), Black_pct_2008 = c(41L, 39L, 28L, 33L, 32L,
38L), Hispanic_pct_2008 = c(55L, 59L, 70L, 61L, 65L, 57L),
Asian_pct_2008 = c(2L, 1L, 0L, 2L, 1L, 1L), White_pct_2008 = c(2L,
0L, 1L, 2L, 1L, 2L), Multi_pct_2008 = c(0L, 0L, 0L, 0L, 0L,
0L), w_o_Valid_Cert__N_2008 = c(4L, 0L, 1L, 0L, 1L, 1L),
w_o_Valid_Cert__pct_2008 = c(11L, 0L, 2L, 0L, 3L, 5L),
Teaching_Out_of_Certification_N_ = c(7L,
7L, 2L, 13L, 3L, 4L), Teaching_Out_of_Certification_pc = c(20L,
15L, 4L, 25L, 9L, 18L), X_3_yrs__Exp_N_2008 = c(12L, 13L,
5L, 12L, 5L, 5L), X_3_yrs__Exp_pct_2008 = c(34L, 28L, 11L,
24L, 15L, 23L), Masters_Plus_N_2008 = c(6L, 11L, 15L, 10L,
16L, 8L), Masters_Plus___2008 = c(17L, 23L, 32L, 20L, 47L,
36L), Core_Classes_N_2008 = c(78L, 142L, 49L, 91L, 22L, 49L
), Core_Not_Taught_by_HQ_Teachers_p = c(23L, 6L, 2L, 24L,
9L, 20L), Number_of_Classes_N_2008 = c(93L, 193L, 56L, 119L,
33L, 68L), Clases_Not_taught_by_App__Cert__ = c(18L, 18L,
2L, 37L, 3L, 13L), Clases_Not_taught_by_App__Cert_0 = c(19L,
9L, 4L, 31L, 9L, 19L), Turnover_Rate_of_Teachers_with__ = c(31L,
56L, 20L, 32L, 0L, 50L), Turnover_Rate_all_Teachers_pct_2 = c(42L,
29L, 17L, 30L, 14L, 49L), Math_Level_3_4_pct_2006 = c(5.1,
16.4, 58.2, 34.4, 48.9, 12.4), Math_Level_3_4_pct_2007 = c(15.2,
22.1, 65.7, 29.9, 70.5, 22.6), Math_Level_3_4_pct_2008 = c(29.9,
43.2, 69.8, 41.2, 78.9, 38.5), Math_Level_3_4_pct_2009 = c(50.7,
49.7, 80.7, 47.1, 83.9, 51.6), Att__pct_2005 = c(0.83, 0.86,
0.89, 0.9, 0.89, 0.87), Susp__pct_2005 = c(6L, 15L, 1L, 4L,
0L, 3L), schoolnum = c(4013, 4045, 4096, 4101, 4102, 4117
), In_2006 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
In_2007 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), In_2008 = c(FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE), In_2009 = c(FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE), weights = c(1, 1, 1, 1, 1, 1
)), .Names = c("Grade_Range_2008", "X__of_Yrs_in_school",
"Total_Enrollment_2008", "Free_Lunch_pct_2008", "Reduced_Lunch_pct_2008",
"Stability_pct_2008", "Limited_Eng__Prof__pct_2008", "Am__Ind_pct_2008",
"Black_pct_2008", "Hispanic_pct_2008", "Asian_pct_2008", "White_pct_2008",
"Multi_pct_2008", "w_o_Valid_Cert__N_2008", "w_o_Valid_Cert__pct_2008",
"Teaching_Out_of_Certification_N_", "Teaching_Out_of_Certification_pc",
"X_3_yrs__Exp_N_2008", "X_3_yrs__Exp_pct_2008", "Masters_Plus_N_2008",
"Masters_Plus___2008", "Core_Classes_N_2008",
"Core_Not_Taught_by_HQ_Teachers_p",
"Number_of_Classes_N_2008", "Clases_Not_taught_by_App__Cert__",
"Clases_Not_taught_by_App__Cert_0", "Turnover_Rate_of_Teachers_with__",
"Turnover_Rate_all_Teachers_pct_2", "Math_Level_3_4_pct_2006",
"Math_Level_3_4_pct_2007", "Math_Level_3_4_pct_2008",
"Math_Level_3_4_pct_2009",
"Att__pct_2005", "Susp__pct_2005", "schoolnum", "In_2006", "In_2007",
"In_2008", "In_2009", "weights"), row.names = c(1L, 4L, 7L, 8L,
11L, 12L), class = "data.frame")
A column must be all of one data type; you can't mix logical and numeric values in it.
I'm not sure how you would even do a "long" analysis on several different data types, because the stacked columns are usually the same variable measured under different groupings. If you need to, try converting your logical values to numeric first (with as.numeric).
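To illustrate the one-type-per-column rule, here is a minimal sketch with made-up values (the vectors flags and scores are hypothetical, not taken from the question's data). Stacking logical and numeric values in one vector forces R to coerce the logicals to numbers:

```r
# Hypothetical values for illustration only
flags  <- c(TRUE, FALSE, TRUE)
scores <- c(5.1, 16.4, 58.2)

# Combining the two forces a single type: TRUE becomes 1, FALSE becomes 0
stacked <- c(flags, scores)
class(stacked)

# Explicit conversion of the logical values, as suggested above
flags_num <- as.numeric(flags)
flags_num
```

The same coercion happens when data frame columns of different types are stacked into one long column.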
Although you're not using the reshape package, Hadley Wickham made this point in his discussion of the melt() function, which performs the same task (see this paper, for instance):
In the current implementation [of melt], there is only one assumption that melt makes: all measured values must be of the same type, e.g., numeric, factor, date. We need this assumption because the molten data is stored in an R data frame, and the value column can be only one type. Most of the time this is not a problem as there are few cases where it makes sense to combine different types of variables in the cast output.
Edit:
I think you may be trying to do two things at once. Is this what you want?
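# Reshape the Math columns (29:32) after dropping the logical In_* columns (36:39)
a <- reshape(example.data[, -(36:39)], direction = "long", varying = 29:32, v.names = "Math34", times = 2006:2009, idvar = "schoolnum")
# Reshape the In_* columns; they shift left by 4 once the Math columns are dropped
b <- reshape(example.data[, -(29:32)], direction = "long", varying = (36:39) - 4, v.names = "TFCIn", times = 2006:2009, idvar = "schoolnum")
# Join the two long tables on all the columns they share
c <- merge(a, b)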
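An alternative, not taken from the answer above: the documentation for reshape notes that a flat varying vector is read in the order x.1, y.1, x.2, y.2. Passing all the Math columns followed by all the In columns therefore interleaves the two variables, which may be what produced the odd numbers. A minimal sketch, assuming a small made-up data frame (wide and its column names are hypothetical), passes varying as a list with one vector per long-format variable:

```r
# Hypothetical toy data: one numeric and one logical variable at two times
wide <- data.frame(
  schoolnum = c(4013, 4045, 4096),
  math_2006 = c(5.1, 16.4, 58.2),
  math_2007 = c(15.2, 22.1, 65.7),
  in_2006   = c(TRUE, FALSE, TRUE),
  in_2007   = c(FALSE, FALSE, TRUE)
)

# Flat vector: columns are treated as interleaved (x.1, y.1, x.2, y.2),
# so numeric and logical columns get stacked into the same long column
mixed <- reshape(wide, direction = "long",
                 varying = c("math_2006", "math_2007", "in_2006", "in_2007"),
                 v.names = c("Math34", "TFCIn"),
                 times = 2006:2007, idvar = "schoolnum")

# List: each element names one variable across all times
clean <- reshape(wide, direction = "long",
                 varying = list(c("math_2006", "math_2007"),
                                c("in_2006", "in_2007")),
                 v.names = c("Math34", "TFCIn"),
                 times = 2006:2007, idvar = "schoolnum")

class(mixed$TFCIn)  # coerced to numeric
class(clean$TFCIn)  # stays logical
```

With the list form, both variables are reshaped in a single call and no merge is needed.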