row - R: extract maximum value in vector under certain conditions -
i'm trying large data set denotes person's career history in firm. want see maximum number of years person worked manager
, under condition person in sales
category prior becoming boss
(regardless of how many years prior was). data looks following: job2
dummy variable indicating whether person manager
, cumu_job2
denotes cumulative years person in manager
position (only sequential cumulation considered).
id name year job job2 cumu_job2 1 jane 1980 worker 0 0 1 jane 1981 manager 1 1 1 jane 1982 sales 0 0 1 jane 1983 sales 0 0 1 jane 1984 manager 1 1 1 jane 1985 manager 1 2 1 jane 1986 boss 0 0 2 bob 1985 worker 0 0 2 bob 1986 sales 0 0 2 bob 1987 manager 1 1 2 bob 1988 manager 1 2 2 bob 1989 boss 0 0
by extracting maximum years person worked, under condition person had history of working in sales
want data have column denotes information:
id name year job job2 cumu_job2 cumu_max 1 jane 1983 sales 0 0 1 jane 1986 boss 0 2 2 bob 1986 sales 0 0 2 bob 1989 boss 0 2
so believe requires 2 steps - first need extract case when person move sales
boss
, , store maximum value each person in new vector cumu_max
based on cumu_job2
.
this complex process, suggestions appreciated...!
i have considered why answer below using dplyr
not work, , here think - example showed people became boss after becoming manager, have data points looks kevin
:
id name year job job2 cumu_job2 1 jane 1980 worker 0 0 1 jane 1981 manager 1 1 1 jane 1982 sales 0 0 1 jane 1983 sales 0 0 1 jane 1984 manager 1 1 1 jane 1985 manager 1 2 1 jane 1986 boss 0 0 2 bob 1985 worker 0 0 2 bob 1986 sales 0 0 2 bob 1987 manager 1 1 2 bob 1988 manager 1 2 2 bob 1989 boss 0 0 3 kevin 1991 manager 1 1 3 kevin 1992 manager 1 2 3 kevin 1993 sales 0 0 4 kevin 1994 boss 0 0
so in end, want
id name year job cumu_job2 cumu_max 1 jane 1983 sales 0 0 1 jane 1986 boss 0 2 2 bob 1986 sales 0 0 2 bob 1989 boss 0 2 3 kevin 1993 sales 0 2 3 kevin 1994 boss 0 2
the dplyr solution spits out ones went sales - manager - boss without taking account possibility of manager - sales - boss (which more observed in data set).
this may not cover cases in actual data (mostly) looking for. note added jill should excluded according conditions.
require(dplyr) dat <- read.table(header = true, text = "id name year job job2 cumu_job2 1 jane 1980 worker 0 0 1 jane 1981 manager 1 1 1 jane 1982 sales 0 0 1 jane 1983 sales 0 0 1 jane 1984 manager 1 1 1 jane 1985 manager 1 2 1 jane 1986 boss 0 0 2 bob 1985 worker 0 0 2 bob 1986 sales 0 0 2 bob 1987 manager 1 1 2 bob 1988 manager 1 2 2 bob 1989 boss 0 0 3 jill 1989 worker 0 0 3 jill 1990 boss 0 0") dat %.% group_by(id) %.% mutate( all_jobs = sum(unique(job) %in% c("sales","manager","boss")), cumu_max = max(cumu_job2) ) %.% filter(all_jobs == 3, job %in% c("sales","boss")) source: local data frame [5 x 8] groups: id id name year job job2 cumu_job2 all_jobs cumu_max 1 1 jane 1982 sales 0 0 3 2 2 1 jane 1983 sales 0 0 3 2 3 1 jane 1986 boss 0 0 3 2 4 2 bob 1986 sales 0 0 3 2 5 2 bob 1989 boss 0 0 3 2
Comments
Post a Comment