row - R: extract maximum value in vector under certain conditions -
i'm trying large data set denotes person's career history in firm. want see maximum number of years person worked manager, under condition person in sales category prior becoming boss (regardless of how many years prior was). data looks following: job2 dummy variable indicating whether person manager, cumu_job2 denotes cumulative years person in manager position (only sequential cumulation considered).
id name year job job2 cumu_job2 1 jane 1980 worker 0 0 1 jane 1981 manager 1 1 1 jane 1982 sales 0 0 1 jane 1983 sales 0 0 1 jane 1984 manager 1 1 1 jane 1985 manager 1 2 1 jane 1986 boss 0 0 2 bob 1985 worker 0 0 2 bob 1986 sales 0 0 2 bob 1987 manager 1 1 2 bob 1988 manager 1 2 2 bob 1989 boss 0 0 by extracting maximum years person worked, under condition person had history of working in sales want data have column denotes information:
id name year job job2 cumu_job2 cumu_max 1 jane 1983 sales 0 0 1 jane 1986 boss 0 2 2 bob 1986 sales 0 0 2 bob 1989 boss 0 2 so believe requires 2 steps - first need extract case when person move sales boss, , store maximum value each person in new vector cumu_max based on cumu_job2.
this complex process, suggestions appreciated...!
i have considered why answer below using dplyr not work, , here think - example showed people became boss after becoming manager, have data points looks kevin:
id name year job job2 cumu_job2 1 jane 1980 worker 0 0 1 jane 1981 manager 1 1 1 jane 1982 sales 0 0 1 jane 1983 sales 0 0 1 jane 1984 manager 1 1 1 jane 1985 manager 1 2 1 jane 1986 boss 0 0 2 bob 1985 worker 0 0 2 bob 1986 sales 0 0 2 bob 1987 manager 1 1 2 bob 1988 manager 1 2 2 bob 1989 boss 0 0 3 kevin 1991 manager 1 1 3 kevin 1992 manager 1 2 3 kevin 1993 sales 0 0 4 kevin 1994 boss 0 0 so in end, want
id name year job cumu_job2 cumu_max 1 jane 1983 sales 0 0 1 jane 1986 boss 0 2 2 bob 1986 sales 0 0 2 bob 1989 boss 0 2 3 kevin 1993 sales 0 2 3 kevin 1994 boss 0 2 the dplyr solution spits out ones went sales - manager - boss without taking account possibility of manager - sales - boss (which more observed in data set).
this may not cover cases in actual data (mostly) looking for. note added jill should excluded according conditions.
require(dplyr) dat <- read.table(header = true, text = "id name year job job2 cumu_job2 1 jane 1980 worker 0 0 1 jane 1981 manager 1 1 1 jane 1982 sales 0 0 1 jane 1983 sales 0 0 1 jane 1984 manager 1 1 1 jane 1985 manager 1 2 1 jane 1986 boss 0 0 2 bob 1985 worker 0 0 2 bob 1986 sales 0 0 2 bob 1987 manager 1 1 2 bob 1988 manager 1 2 2 bob 1989 boss 0 0 3 jill 1989 worker 0 0 3 jill 1990 boss 0 0") dat %.% group_by(id) %.% mutate( all_jobs = sum(unique(job) %in% c("sales","manager","boss")), cumu_max = max(cumu_job2) ) %.% filter(all_jobs == 3, job %in% c("sales","boss")) source: local data frame [5 x 8] groups: id id name year job job2 cumu_job2 all_jobs cumu_max 1 1 jane 1982 sales 0 0 3 2 2 1 jane 1983 sales 0 0 3 2 3 1 jane 1986 boss 0 0 3 2 4 2 bob 1986 sales 0 0 3 2 5 2 bob 1989 boss 0 0 3 2
Comments
Post a Comment