Pandas tidy data, spread variables from one column, gather from another
My Problem
I need to turn the dataframe below into tidy format, where each row is a unique ['GEOG_CODE','COUNTRY'] - 'YEAR' pairing and there are two variables, one for each value of Group1.
Using Hadley Wickham's notation for tidy data:
The observations are defined by the Location-Time pairings.
The variables are defined by the column Group1
The values are currently stored for different years in the columns ['2016', '2017', '2018'].
In R I would want to:
gather the values from the columns ['2016', '2017', '2018'].
spread the values from Group1.
see Garrett Grolemund's explanation here
For my problem:
Location is defined by the ['GEOG_CODE','COUNTRY'].
Values at different times are defined in the columns ['2016', '2017', '2018'].
Variables are defined by Group1 == A or Group1 == B.
I want each row to be a Location-Time pair with two variables: one for Group1 == A and one for Group1 == B.
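As a rough pandas counterpart to the gather and spread steps described above, here is a minimal sketch using stack and unstack. The frame `wide` is a made-up stand-in (not my real data) in which every location has a row for both A and B, and the level name 'YEAR' is my own choice.

```python
import pandas as pd

# Hypothetical stand-in: each location has one row for A and one for B.
wide = pd.DataFrame({
    'GEOG_CODE': ['123', '123', '234', '234'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'B', 'A', 'B'],
    '2016': [1, 10, 2, 20],
    '2017': [3, 30, 4, 40],
})

tidy = (
    wide
    .set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])
    .rename_axis(columns='YEAR')   # name the year axis before stacking
    .stack()                       # gather: year columns become an index level
    .unstack('Group1')             # spread: Group1 values become columns
    .rename_axis(columns=None)     # drop the leftover 'Group1' axis label
    .reset_index()
)
print(tidy)
```

Each row of `tidy` is then one Location-Time pair with columns A and B. Recent pandas versions may emit a FutureWarning about the stack implementation; the result for this frame should be the same either way.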
I have this
import numpy as np
import pandas as pd

toy_data = {
'GEOG_CODE':['123','234','567','901'],
'COUNTRY':['England' for _ in range(4)],
'Group1':['A','A','B','B'],
'2016':np.arange(0,4),
'2017':np.arange(0,4),
'2018':np.arange(0,4),
}
in_df = pd.DataFrame(toy_data)
in_df
Out[]:
GEOG_CODE COUNTRY Group1 2016 2017 2018
0 123 England A 0 0 0
1 234 England A 1 1 1
2 567 England B 2 2 2
3 901 England B 3 3 3
I want this
I want the output to look like the dataframe below, with one column for each of the values in 'Group1'.
outcome_data = {
'GEOG_CODE': np.tile(['123','234','567','901'],3),
'COUNTRY':['England' for _ in range(4*3)],
'year':np.tile([2016,2017,2018],4),
'low_A':np.tile(np.arange(0,4),3),
'low_B':np.tile(np.arange(0,4),3),
}
out = pd.DataFrame(outcome_data)
out
Out[]:
GEOG_CODE COUNTRY year low_A low_B
0 123 England 2016 0 0
1 234 England 2017 1 1
2 567 England 2018 2 2
3 901 England 2016 3 3
4 123 England 2017 0 0
5 234 England 2018 1 1
6 567 England 2016 2 2
7 901 England 2017 3 3
8 123 England 2018 0 0
9 234 England 2016 1 1
10 567 England 2017 2 2
11 901 England 2018 3 3
I tried df.melt()
I managed to get halfway there with melt, but I don't know how to turn the Group1 values into columns.
id_vars = ['GEOG_CODE', 'COUNTRY', 'Group1']
value_vars = ['2016', '2017', '2018']
var_name = 'Year'
value_name = 'low_Value'
melt = in_df.melt(id_vars=id_vars,value_vars=value_vars,var_name=var_name, value_name=value_name)
melt
Out[]:
GEOG_CODE COUNTRY Group1 Year low_Value
0 123 England A 2016 0
1 234 England A 2016 1
2 567 England B 2016 2
3 901 England B 2016 3
4 123 England A 2017 0
5 234 England A 2017 1
6 567 England B 2017 2
7 901 England B 2017 3
8 123 England A 2018 0
9 234 England A 2018 1
10 567 England B 2018 2
11 901 England B 2018 3
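One possible way to finish the job from the melted frame is to pivot Group1 back out into columns. The sketch below is only that, a sketch: it rebuilds the toy data so it runs on its own, uses `pivot_table` with `aggfunc='first'` on the assumption that each location, year and group combination appears at most once, and the names `long_df` and `tidy` are just illustrative.

```python
import numpy as np
import pandas as pd

in_df = pd.DataFrame({
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(0, 4),
    '2017': np.arange(0, 4),
    '2018': np.arange(0, 4),
})

# Gather: one row per location, group and year.
long_df = in_df.melt(
    id_vars=['GEOG_CODE', 'COUNTRY', 'Group1'],
    value_vars=['2016', '2017', '2018'],
    var_name='Year',
    value_name='low_Value',
)

# Spread: one column per Group1 value, one row per location-year pair.
tidy = (
    long_df
    .pivot_table(index=['GEOG_CODE', 'COUNTRY', 'Year'],
                 columns='Group1',
                 values='low_Value',
                 aggfunc='first')   # assumes no duplicate combinations
    .add_prefix('low_')             # A, B -> low_A, low_B
    .rename_axis(columns=None)      # drop the leftover 'Group1' axis label
    .reset_index()
)
print(tidy)
```

With this toy data each GEOG_CODE belongs to only one group, so the other group's column comes out as NaN for that location. With data where every location has both an A and a B row, both columns would be filled.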