csv - How to select columns conditionally in a data frame in R
How can I find the mean or median (or any other such statistic) for the women only? I have tried a few pieces of code to access the women's data in particular, but was unsuccessful. Any help would be appreciated.
> jalal <- read.csv("jalal.csv", header=TRUE, sep=",")
> which(jalal$sex==F)
integer(0)
> jalal
   age sex weight eye.color hair.color
1   23   f   93.8      blue      black
2   21   m  180.8     amber       gray
3   22   f  196.5     hazel       gray
4   22   m  256.2     amber      black
5   21   m  219.6      blue       gray
6   16   f  152.1      blue       gray
7   21   f  183.3      gray   chestnut
8   18   m  179.1     brown      blond
9   15   m  206.1      blue      white
10  19   m  211.6     brown      blond
11  20   f  209.4      blue      white
12  21   m  194.0     brown     auburn
13  22   f  204.1     green      black
14  21   f  157.4     hazel        red
15  15   f  238.0     green       gray
16  20   f  154.8      gray       gray
17  16   f  245.8      gray       gray
18  23   m  198.2      gray        red
19  19   m  169.1     green      brown
20  24   m  198.0     green       gray
> subset(jalal, subset=(sex =F)) -> females
> females
[1] age        sex        weight     eye.color  hair.color
<0 rows> (or 0-length row.names)
> subset(jalal, subset=(sex ==F)) -> females
> females
[1] age        sex        weight     eye.color  hair.color
<0 rows> (or 0-length row.names)
Here's what's in jalal.csv:
"age","sex","weight","eye.color","hair.color"
23,"f",93.8,"blue","black"
21,"m",180.8,"amber","gray"
22,"f",196.5,"hazel","gray"
22,"m",256.2,"amber","black"
21,"m",219.6,"blue","gray"
16,"f",152.1,"blue","gray"
21,"f",183.3,"gray","chestnut"
18,"m",179.1,"brown","blond"
15,"m",206.1,"blue","white"
19,"m",211.6,"brown","blond"
20,"f",209.4,"blue","white"
21,"m",194,"brown","auburn"
22,"f",204.1,"green","black"
21,"f",157.4,"hazel","red"
15,"f",238,"green","gray"
20,"f",154.8,"gray","gray"
16,"f",245.8,"gray","gray"
23,"m",198.2,"gray","red"
19,"m",169.1,"green","brown"
24,"m",198,"green","gray"
You're looking for aggregate. Here is a formula that returns the median age and weight by sex:
aggregate(cbind(age, weight) ~ sex, data=jalal, FUN=median)
##   sex  age weight
## 1   f 20.5  189.9
## 2   m 21.0  198.1
To get a data frame containing only the women, here is the syntax using [:
jalal[jalal$sex == 'f',]
Note the quotes around 'f'. Without quotes, R does not see a string: a bare F means FALSE. That's why the second subset expression fails.
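To see why the unquoted form matches nothing, here is a minimal sketch. The vector `sex` and the names `unquoted` and `quoted` are made up for illustration and are not part of the original data:

```r
# Hypothetical toy vector standing in for the sex column
sex <- c("f", "m", "f")

# F is R's built-in shorthand for the logical constant FALSE
identical(F, FALSE)           # TRUE

# Comparing strings with FALSE coerces FALSE to "FALSE", so nothing matches
unquoted <- which(sex == F)   # integer(0)

# Comparing with the quoted string finds the women
quoted <- which(sex == "f")   # 1 3
```

The same coercion happens inside subset, which is why it returned zero rows rather than an error.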
With the quotes, the subset expression works:
subset(jalal, subset=(sex =='f'))
##   age sex weight eye.color hair.color
## 1  23   f   93.8      blue      black
## 3  22   f  196.5     hazel       gray
## 6  16   f  152.1      blue       gray
...
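Once the women are isolated this way, the statistics asked about in the question can also be computed directly on the filtered rows. The following is only a sketch: it rebuilds the data from the CSV text shown in the question so that it runs on its own, and the names `csv_text`, `women` and the result variables are illustrative choices:

```r
# Rebuild the data frame from the CSV contents listed in the question
csv_text <- '"age","sex","weight","eye.color","hair.color"
23,"f",93.8,"blue","black"
21,"m",180.8,"amber","gray"
22,"f",196.5,"hazel","gray"
22,"m",256.2,"amber","black"
21,"m",219.6,"blue","gray"
16,"f",152.1,"blue","gray"
21,"f",183.3,"gray","chestnut"
18,"m",179.1,"brown","blond"
15,"m",206.1,"blue","white"
19,"m",211.6,"brown","blond"
20,"f",209.4,"blue","white"
21,"m",194,"brown","auburn"
22,"f",204.1,"green","black"
21,"f",157.4,"hazel","red"
15,"f",238,"green","gray"
20,"f",154.8,"gray","gray"
16,"f",245.8,"gray","gray"
23,"m",198.2,"gray","red"
19,"m",169.1,"green","brown"
24,"m",198,"green","gray"'
jalal <- read.csv(text = csv_text, header = TRUE)

# Keep only the rows where sex is the string "f"
women <- jalal[jalal$sex == "f", ]

women_median_age    <- median(women$age)     # 20.5
women_median_weight <- median(women$weight)  # 189.9
women_mean_weight   <- mean(women$weight)    # 183.52
```

The two medians agree with the f row of the aggregate output above, which is a quick way to check that the filter selected the intended rows.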
In a comment, you requested a method to get the mean values for women with blue eyes. A first approach is to filter the data frame down to blue-eyed people:
aggregate(cbind(age, weight) ~ sex, data=jalal[jalal$eye.color == 'blue',], FUN=mean)
##   sex      age   weight
## 1   f 19.66667 151.7667
## 2   m 18.00000 212.8500
But that seems hackish; after all, we're not filtering the data frame on women. Here is a formula that gives the mean age and weight by sex and eye color. With this, you can find the means for blue-eyed women, green-eyed men, etc.:
aggregate(cbind(age, weight) ~ sex + eye.color, data=jalal, FUN=mean)
##   sex eye.color      age   weight
## 1   m     amber 21.50000 218.5000
## 2   f      blue 19.66667 151.7667
## 3   m      blue 18.00000 212.8500
## 4   m     brown 19.33333 194.9000
## 5   f      gray 19.00000 194.6333
## 6   m      gray 23.00000 198.2000
## 7   f     green 18.50000 221.0500
## 8   m     green 21.50000 183.5500
## 9   f     hazel 21.50000 176.9500
Note that rows 2 and 3 here match the results of the prior expression.
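If only the blue-eyed women are of interest, another option is to combine both conditions with & when indexing. This is a sketch rather than part of the original answer: it rebuilds the data from the CSV text in the question so that it runs on its own, and `csv_text`, `blue_women` and `blue_women_means` are illustrative names:

```r
# Rebuild the data frame from the CSV contents listed in the question
csv_text <- '"age","sex","weight","eye.color","hair.color"
23,"f",93.8,"blue","black"
21,"m",180.8,"amber","gray"
22,"f",196.5,"hazel","gray"
22,"m",256.2,"amber","black"
21,"m",219.6,"blue","gray"
16,"f",152.1,"blue","gray"
21,"f",183.3,"gray","chestnut"
18,"m",179.1,"brown","blond"
15,"m",206.1,"blue","white"
19,"m",211.6,"brown","blond"
20,"f",209.4,"blue","white"
21,"m",194,"brown","auburn"
22,"f",204.1,"green","black"
21,"f",157.4,"hazel","red"
15,"f",238,"green","gray"
20,"f",154.8,"gray","gray"
16,"f",245.8,"gray","gray"
23,"m",198.2,"gray","red"
19,"m",169.1,"green","brown"
24,"m",198,"green","gray"'
jalal <- read.csv(text = csv_text, header = TRUE)

# Both conditions must hold: sex is "f" AND eye color is "blue"
blue_women <- jalal[jalal$sex == "f" & jalal$eye.color == "blue", ]

# Column means of the numeric columns of interest
blue_women_means <- colMeans(blue_women[, c("age", "weight")])
blue_women_means
##       age    weight
##  19.66667 151.76667
```

These values agree with row 2 of the aggregate output by sex and eye color.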