Python - Pandas: select first observation per group

April 15, 2010

I want to adapt some former SAS code to Python using the DataFrame framework. In SAS I use this type of code (assume the rows are sorted by group_id, where group_id takes values from 1 to 10 and there are multiple observations for each group_id):

data want;
  set have;
  by group_id;
  if first.group_id then c = 1;
  else c = 0;
run;

What happens here is that I select the first observation for each id and create a new variable c that takes the value 1 for that observation and 0 for the others. The dataset looks like this:

group_id  c
1         1
1         0
1         0
2         1
2         0
2         0
3         1
3         0
3         0
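To make the SAS logic explicit before moving to pandas, here is a rough plain-Python sketch of what first.group_id does on sorted data: a row is "first" when its group_id differs from the previous row's. The three-groups-of-three vector is a hypothetical sample matching the table above, not the real data.

```python
# Hypothetical sorted group_id vector (same layout as the table above)
group_id = [1, 1, 1, 2, 2, 2, 3, 3, 3]

c = []
previous = None  # no previous row before the first one
for value in group_id:
    # Roughly like SAS first.group_id: true when the BY value changes
    c.append(1 if value != previous else 0)
    previous = value

print(c)  # [1, 0, 0, 1, 0, 0, 1, 0, 0]
```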

How can I do this in Python using a DataFrame? Assume I start with the group_id vector only.

If you're using pandas 0.13+, you can use the groupby method cumcount:

In [11]: df
Out[11]:
   group_id
0         1
1         1
2         1
3         2
4         2
5         2
6         3
7         3
8         3

In [12]: df.groupby('group_id').cumcount() == 0
Out[12]:
0     True
1    False
2    False
3     True
4    False
5    False
6     True
7    False
8    False
dtype: bool
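To see why comparing with 0 works, this self-contained sketch (assuming the same nine-row sample) prints the raw cumcount values first: cumcount numbers the rows within each group starting at 0, so position 0 is always the first row of its group.

```python
import pandas as pd

# Hypothetical sample: the group_id vector only
df = pd.DataFrame({'group_id': [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Position of each row within its group: 0, 1, 2, ...
position = df.groupby('group_id').cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# The first row of each group is the one at position 0
is_first = position == 0
print(is_first.tolist())  # [True, False, False, True, False, False, True, False, False]
```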

You can force the dtype to be int rather than bool:

In [13]: df['c'] = (df.groupby('group_id').cumcount() == 0).astype(int)
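A complete runnable version of the answer, again on a hypothetical nine-row sample. It also shows a different technique that is not part of the answer: comparing each row with the previous one via shift(), which mirrors SAS first. semantics more literally. On sorted data both give the same column; on unsorted data they differ (for group_id [1, 2, 1], cumcount flags [1, 1, 0] while the shift version flags [1, 1, 1]).

```python
import pandas as pd

# Hypothetical sample, sorted by group_id as the question assumes
df = pd.DataFrame({'group_id': [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Answer's approach: c = 1 for the first row of each group, 0 otherwise
df['c'] = (df.groupby('group_id').cumcount() == 0).astype(int)

# Alternative (not from the answer): flag rows whose group_id differs
# from the previous row; shift() puts NaN in row 0, and ne() treats it as unequal
df['c_shift'] = df['group_id'].ne(df['group_id'].shift()).astype(int)

print(df)
```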
