Python - Pandas: select first observation per group
I want to adapt some former SAS code to Python using the DataFrame
framework. In SAS I use this type of code (assume the columns are sorted by group_id, that group_id takes values 1 to 10, and that there are multiple observations for each group_id):
    data want;
      set have;
      by group_id;
      if first.group_id then c=1;
      else c=0;
    run;
So what goes on here is that I select the first observation for each id, and I create a new variable c
that takes the value 1 for those observations and 0 for the others. The dataset looks like this:
    group_id  c
           1  1
           1  0
           1  0
           2  1
           2  0
           2  0
           3  1
           3  0
           3  0
How can I do this in Python using a DataFrame? Assume that I start with the group_id
vector only.
If you're using pandas 0.13+, you can use the cumcount
groupby method:
    In [11]: df
    Out[11]:
       group_id
    0         1
    1         1
    2         1
    3         2
    4         2
    5         2
    6         3
    7         3
    8         3

    In [12]: df.groupby('group_id').cumcount() == 0
    Out[12]:
    0     True
    1    False
    2    False
    3     True
    4    False
    5    False
    6     True
    7    False
    8    False
    dtype: bool
You can force the dtype to be int rather than bool:
    In [13]: df['c'] = (df.groupby('group_id').cumcount() == 0).astype(int)
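For anyone who wants to run this outside an interactive session, here is a minimal end-to-end sketch. The sample values are an assumption chosen to match the small table in the question (three groups of three rows), not the asker's real data.

```python
import pandas as pd

# Hypothetical input: only the group_id column, with rows already
# grouped together as in the question's example table.
df = pd.DataFrame({"group_id": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# cumcount() numbers the rows within each group starting at 0,
# so a count of 0 identifies the first row of each group.
is_first = df.groupby("group_id").cumcount() == 0

# Store the flag as 1/0 instead of True/False, like the SAS variable c.
df["c"] = is_first.astype(int)

print(df)
```

Because cumcount numbers rows in the order they appear within each group, the flag marks the first occurrence of each group_id even if the rows are not sorted by group.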