csv - How to select columns conditionally in a data frame in R


How can I find the mean or median (or any other such statistic) for the women in this data? I have tried a few pieces of code to access the women's data in particular, but was unsuccessful. Any help would be appreciated.

    > jalal <- read.csv("jalal.csv", header=TRUE, sep=",")
    > which(jalal$sex==F)
    integer(0)
    > jalal
       age sex weight eye.color hair.color
    1   23   f   93.8      blue      black
    2   21   m  180.8     amber       gray
    3   22   f  196.5     hazel       gray
    4   22   m  256.2     amber      black
    5   21   m  219.6      blue       gray
    6   16   f  152.1      blue       gray
    7   21   f  183.3      gray   chestnut
    8   18   m  179.1     brown      blond
    9   15   m  206.1      blue      white
    10  19   m  211.6     brown      blond
    11  20   f  209.4      blue      white
    12  21   m  194.0     brown     auburn
    13  22   f  204.1     green      black
    14  21   f  157.4     hazel        red
    15  15   f  238.0     green       gray
    16  20   f  154.8      gray       gray
    17  16   f  245.8      gray       gray
    18  23   m  198.2      gray        red
    19  19   m  169.1     green      brown
    20  24   m  198.0     green       gray
    > subset(jalal, subset=(sex =F)) -> females
    > females
    [1] age        sex        weight     eye.color  hair.color
    <0 rows> (or 0-length row.names)
    > subset(jalal, subset=(sex ==F)) -> females
    > females
    [1] age        sex        weight     eye.color  hair.color
    <0 rows> (or 0-length row.names)

Here's what's in jalal.csv:

    "age","sex","weight","eye.color","hair.color"
    23,"f",93.8,"blue","black"
    21,"m",180.8,"amber","gray"
    22,"f",196.5,"hazel","gray"
    22,"m",256.2,"amber","black"
    21,"m",219.6,"blue","gray"
    16,"f",152.1,"blue","gray"
    21,"f",183.3,"gray","chestnut"
    18,"m",179.1,"brown","blond"
    15,"m",206.1,"blue","white"
    19,"m",211.6,"brown","blond"
    20,"f",209.4,"blue","white"
    21,"m",194,"brown","auburn"
    22,"f",204.1,"green","black"
    21,"f",157.4,"hazel","red"
    15,"f",238,"green","gray"
    20,"f",154.8,"gray","gray"
    16,"f",245.8,"gray","gray"
    23,"m",198.2,"gray","red"
    19,"m",169.1,"green","brown"
    24,"m",198,"green","gray"

You're looking for aggregate. Here is a formula that returns the median age and weight by sex:

    aggregate(cbind(age, weight) ~ sex, data=jalal, FUN=median)
    ##   sex  age weight
    ## 1   f 20.5  189.9
    ## 2   m 21.0  198.1
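If you only need one column, tapply is a lighter alternative to aggregate. The sketch below re-enters the sex and weight columns from the question as plain vectors so that it runs on its own; the vector names are just for illustration. Note that the argument name FUN is case-sensitive in both functions.

```r
# Sex and weight columns copied from jalal.csv, in row order.
sex <- c("f", "m", "f", "m", "m", "f", "f", "m", "m", "m",
         "f", "m", "f", "f", "f", "f", "f", "m", "m", "m")
weight <- c(93.8, 180.8, 196.5, 256.2, 219.6, 152.1, 183.3, 179.1, 206.1, 211.6,
            209.4, 194.0, 204.1, 157.4, 238.0, 154.8, 245.8, 198.2, 169.1, 198.0)

# Split weight by sex and apply median to each group.
med <- tapply(weight, sex, FUN = median)
print(med)
```

The two values should agree with the weight column of the aggregate output.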

To get a data frame containing only the women, here is the syntax using [:

    jalal[jalal$sex == 'f',]

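Note the quotes around 'f'. A bare F means FALSE, so the comparison matches no rows. That's why the second subset expression fails. (The first one also uses = where == is needed, which makes it an assignment rather than a comparison.)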

    subset(jalal, subset=(sex =='f'))
    ##    age sex weight eye.color hair.color
    ## 1   23   f   93.8      blue      black
    ## 3   22   f  196.5     hazel       gray
    ## 6   16   f  152.1      blue       gray

...
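To see the quoting issue in isolation, here is a minimal sketch. It assumes a toy data frame (named toy here, purely for illustration) holding the first six rows of jalal with only three of its columns, so it runs without the CSV file.

```r
# Toy copy of the first six rows of jalal (age, sex and weight only).
toy <- data.frame(
  age    = c(23, 21, 22, 22, 21, 16),
  sex    = c("f", "m", "f", "m", "m", "f"),
  weight = c(93.8, 180.8, 196.5, 256.2, 219.6, 152.1)
)

# F is predefined as FALSE, so this compares each value with FALSE
# and no row matches.
bare <- which(toy$sex == F)

# With quotes, the comparison is against the string stored in the column.
quoted <- which(toy$sex == "f")

# Once the women are selected, any summary function can be applied directly.
women <- toy[toy$sex == "f", ]
print(mean(women$weight))
print(median(women$age))
```

The same pattern should work on the full jalal data frame once it has been read in.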

In a comment, you requested a method to find the mean values for women with blue eyes. A first approach is to filter the data frame down to blue-eyed people:

    aggregate(cbind(age, weight) ~ sex, data=jalal[jalal$eye.color == 'blue',], FUN=mean)
    ##   sex      age   weight
    ## 1   f 19.66667 151.7667
    ## 2   m 18.00000 212.8500

But this seems hackish; after all, we're not filtering the data frame on women. Here is a formula that gives the mean age and weight by sex and eye color. With this, you can find the mean for blue-eyed women, green-eyed men, etc.:

    aggregate(cbind(age, weight) ~ sex + eye.color, data=jalal, FUN=mean)
    ##   sex eye.color      age   weight
    ## 1   m     amber 21.50000 218.5000
    ## 2   f      blue 19.66667 151.7667
    ## 3   m      blue 18.00000 212.8500
    ## 4   m     brown 19.33333 194.9000
    ## 5   f      gray 19.00000 194.6333
    ## 6   m      gray 23.00000 198.2000
    ## 7   f     green 18.50000 221.0500
    ## 8   m     green 21.50000 183.5500
    ## 9   f     hazel 21.50000 176.9500

Note that rows 2 and 3 here match the results of the prior expression.
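If you only want the blue-eyed women, you could also combine both conditions with & and summarise that subset directly. The sketch below is one possible way to do it: it rebuilds jalal from the CSV contents in the question so it is self-contained, and the names blue_women, direct, by_group and picked are illustrative, not part of the original answer.

```r
# Recreate jalal from the CSV shown in the question.
jalal <- read.csv(text = "age,sex,weight,eye.color,hair.color
23,f,93.8,blue,black
21,m,180.8,amber,gray
22,f,196.5,hazel,gray
22,m,256.2,amber,black
21,m,219.6,blue,gray
16,f,152.1,blue,gray
21,f,183.3,gray,chestnut
18,m,179.1,brown,blond
15,m,206.1,blue,white
19,m,211.6,brown,blond
20,f,209.4,blue,white
21,m,194,brown,auburn
22,f,204.1,green,black
21,f,157.4,hazel,red
15,f,238,green,gray
20,f,154.8,gray,gray
16,f,245.8,gray,gray
23,m,198.2,gray,red
19,m,169.1,green,brown
24,m,198,green,gray")

# Option 1: filter on both columns at once, then take column means.
blue_women <- jalal[jalal$sex == "f" & jalal$eye.color == "blue", ]
direct <- colMeans(blue_women[, c("age", "weight")])
print(direct)

# Option 2: aggregate over every sex/eye-color pair, then pick one row.
by_group <- aggregate(cbind(age, weight) ~ sex + eye.color, data = jalal, FUN = mean)
picked <- by_group[by_group$sex == "f" & by_group$eye.color == "blue", ]
print(picked)
```

Both options should report the same means; the first is handy for a one-off question, the second when you want every group at once.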

