I have the following table in an R data frame.
I want to write logic that generates the "keep" column. For each person, flag the accounts that have a transaction newer than 4 days, counted from when the account was first accessed. The first line is a new account for that person, so flag it. The second line's date is 2 days later, so keep it too. The third line is 11 days after the account was first seen, so do not flag it. The same logic applies to the next person: flag accounts that are less than 4 days old.
I have rebuilt the data frame; try this solution:
library(lubridate)
library(dplyr)

df <- data.frame(
  person  = c(rep("abc", 3), rep("eee", 5)),
  date    = c("4/1/2016", "4/3/2016", "4/12/2016", "5/3/2016",
              "5/4/2016", "5/4/2016", "5/6/2016", "5/10/2016"),
  account = c("123", "123", "123", "222", "222", "333", "222", "333"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into Date objects
df$date2 <- mdy(df$date)
The best solution, suggested by @thelatemail:
df %>%
  group_by(person) %>%
  mutate(keep = as.numeric(date2 - first(date2) <= 4)) %>%
  select(-date2)
Result:
  person      date account keep
1    abc  4/1/2016     123    1
2    abc  4/3/2016     123    1
3    abc 4/12/2016     123    0
4    eee  5/3/2016     222    1
5    eee  5/4/2016     222    1
6    eee  5/4/2016     333    1
7    eee  5/6/2016     222    1
8    eee 5/10/2016     333    0
My more convoluted original solution (useful if the account creation date is not in the first line for each person):
df %>%
  group_by(person) %>%
  slice(which.min(date2)) %>%
  select(person, date2) %>%
  rename(account_create = date2) %>%
  merge(df, ., by = "person") %>%
  mutate(keep = as.numeric(date2 - account_create <= 4)) %>%
  select(-c(date2, account_create))
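For the unsorted case, a shorter variant might be to compare each date against min(date2) within the group, which avoids the slice and merge steps. This is only a sketch, not part of the original answers: the data frame df_unsorted and the helper column days_since_first are made up for illustration, and the dates are parsed with base R's as.Date rather than lubridate.

```r
library(dplyr)

# Hypothetical copy of the data with rows deliberately out of date order
df_unsorted <- data.frame(
  person  = c("abc", "abc", "abc", "eee", "eee", "eee", "eee", "eee"),
  date    = c("4/12/2016", "4/1/2016", "4/3/2016",
              "5/10/2016", "5/3/2016", "5/4/2016", "5/4/2016", "5/6/2016"),
  account = c("123", "123", "123", "333", "222", "222", "333", "222"),
  stringsAsFactors = FALSE
)

result <- df_unsorted %>%
  mutate(date2 = as.Date(date, format = "%m/%d/%Y")) %>%
  group_by(person) %>%
  mutate(
    # Days elapsed since the earliest date seen for this person,
    # regardless of row order
    days_since_first = as.numeric(difftime(date2, min(date2), units = "days")),
    # `<= 4` mirrors the solutions above; use `< 4` for strictly under 4 days
    keep = as.numeric(days_since_first <= 4)
  ) %>%
  ungroup() %>%
  select(-date2, -days_since_first)

print(result)
```

If the 4-day window should start at the first sighting of each account rather than of each person, grouping by person and account instead would be the analogous change.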