Python - Pandas: select first observation per group


I want to adapt some former SAS code to Python using the pandas DataFrame framework. In SAS I use this type of code (assume the rows are sorted by group_id, that group_id takes values from 1 to 10, and that there are multiple observations for each group_id):

data want;
    set have;
    by group_id;
    if first.group_id then c = 1;
    else c = 0;
run;

What goes on here is that I select the first observation for each id and create a new variable c that takes the value 1 for that observation and 0 for the others. The dataset looks like this:

group_id  c
1         1
1         0
1         0
2         1
2         0
2         0
3         1
3         0
3         0

How can I do this in Python using a DataFrame? Assume that I start with the group_id vector only.

If you're using pandas 0.13+ you can use the cumcount groupby method:

In [11]: df
Out[11]:
   group_id
0         1
1         1
2         1
3         2
4         2
5         2
6         3
7         3
8         3

In [12]: df.groupby('group_id').cumcount() == 0
Out[12]:
0     True
1    False
2    False
3     True
4    False
5    False
6     True
7    False
8    False
dtype: bool
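To see why comparing against 0 works, it may help to look at the raw counter first: cumcount numbers the rows within each group starting from 0, so the first row of every group is the one whose counter is 0. A minimal sketch, assuming a frame built by hand to match the example (the names `position` and `is_first` are only illustrative):

```python
import pandas as pd

# Hand-built frame matching the example: three rows for each of three groups.
df = pd.DataFrame({"group_id": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Position of each row within its group, counted from 0.
position = df.groupby("group_id").cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# The first observation of each group is the row at position 0.
is_first = position == 0
```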

You can force the dtype to be int rather than bool:

In [13]: df['c'] = (df.groupby('group_id').cumcount() == 0).astype(int)
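Putting the pieces together, the following is a small end-to-end sketch that starts from the group_id vector only, as the question asks. The `c_alt` column is not part of the answer above: it is an alternative using `Series.duplicated`, which flags every repeat of a value after its first occurrence, so negating it should give the same flag. The names `c_alt` and `first_rows` are illustrative.

```python
import pandas as pd

# Start from the group_id vector only.
df = pd.DataFrame({"group_id": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Approach from the answer: 1 for the first row of each group, 0 otherwise.
df["c"] = (df.groupby("group_id").cumcount() == 0).astype(int)

# Alternative (an assumption, not from the answer): a row is "first"
# if its group_id has not appeared earlier in the column.
df["c_alt"] = (~df["group_id"].duplicated()).astype(int)

# Keep only the first observation of each group, if that is what is needed.
first_rows = df[df["c"] == 1]
print(df)
```

Both variants follow the order in which the rows appear, so neither requires the frame to be sorted by group_id beforehand.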
