I want to set email addresses that have more than 2 identical start years in an R data frame to NA in a new column.
    start_year   email
          2016 a@a.com
          2016 a@a.com
          2016 a@a.com
          2015 a@a.com
          2015 a@a.com
          2014 a@a.com
          2015 b@b.com
          2014 b@b.com
          2014 b@b.com
          2015 c@c.com
Desired result (a@a.com has 3 identical start years, 2016, so it is set to NA in the new column):
    start_year   email email_new
          2016 a@a.com        NA
          2016 a@a.com        NA
          2016 a@a.com        NA
          2015 a@a.com        NA
          2015 a@a.com        NA
          2014 a@a.com        NA
          2015 b@b.com   b@b.com
          2014 b@b.com   b@b.com
          2014 b@b.com   b@b.com
          2015 c@c.com   c@c.com
What I have so far gives the error "select() inputs must resolve to integer column positions.":

    result <- df %>%
      group_by(email) %>%
      select(length(unique(start_year)) > 2)

Any help is appreciated.
Using dplyr, and as far as I understood it, you have 2 conditions for converting email to NA:

1) At least 3 of the start_year values are the same.
2) There are more than 2 observations.
    library(dplyr)

    # NA_character_ gives a real missing value rather than the string "NA"
    df %>%
      group_by(email) %>%
      mutate(new = ifelse(length(which(table(start_year) > 2)) > 0 & n() > 2,
                          NA_character_, as.character(email)))
    #Source: local data frame [7 x 3]
    #Groups: email [3]
    #  start_year   email     new
    #       <int>   <chr>   <chr>
    #1       2016 a@a.com    <NA>
    #2       2016 a@a.com    <NA>
    #3       2016 a@a.com    <NA>
    #4       2015 b@b.com b@b.com
    #5       2014 b@b.com b@b.com
    #6       2014 b@b.com b@b.com
    #7       2015 c@c.com c@c.com
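To see why only a@a.com is turned into NA here, it can help to look at the two quantities per email. This is just a rough sketch, assuming dplyr is installed; the 7-row df is rebuilt inline (with email as character rather than factor) so the block runs on its own, and the column names n_obs and max_same_year are made up for illustration:

```r
library(dplyr)

# Rebuild the 7-row example data (email as character for simplicity)
df <- data.frame(
  start_year = c(2016L, 2016L, 2016L, 2015L, 2014L, 2014L, 2015L),
  email = c("a@a.com", "a@a.com", "a@a.com",
            "b@b.com", "b@b.com", "b@b.com", "c@c.com"),
  stringsAsFactors = FALSE
)

# One row per email: number of rows, and how often the most frequent year occurs
checks <- df %>%
  group_by(email) %>%
  summarise(
    n_obs = n(),
    max_same_year = max(table(start_year))
  )

print(checks)
```

Only a@a.com has a year that occurs more than twice, so it is the only group that meets condition 1; b@b.com has 3 rows but no year repeated 3 times.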
After adding another 2014 row for b@b.com, so that this email also has 3 identical years, the result becomes:
    # NA_character_ gives a real missing value rather than the string "NA"
    df1 %>%
      group_by(email) %>%
      mutate(new = ifelse(length(which(table(start_year) > 2)) > 0 & n() > 2,
                          NA_character_, as.character(email)))
    #Source: local data frame [8 x 3]
    #Groups: email [3]
    #  start_year   email     new
    #       <dbl>   <chr>   <chr>
    #1       2016 a@a.com      NA
    #2       2016 a@a.com      NA
    #3       2016 a@a.com      NA
    #4       2015 b@b.com      NA
    #5       2014 b@b.com      NA
    #6       2014 b@b.com      NA
    #7       2014 b@b.com      NA
    #8       2015 c@c.com c@c.com
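As a side note, a year that occurs 3 times already implies at least 3 rows for that email, so the second condition follows from the first. Under that reading, the test could be written a little more directly. A minimal sketch, assuming dplyr is installed; df1 is rebuilt inline (email as character) and the column name email_new is taken from the question:

```r
library(dplyr)

# Rebuild the 8-row example data (email as character for simplicity)
df1 <- data.frame(
  start_year = c(2016, 2016, 2016, 2015, 2014, 2014, 2014, 2015),
  email = c("a@a.com", "a@a.com", "a@a.com",
            "b@b.com", "b@b.com", "b@b.com", "b@b.com", "c@c.com"),
  stringsAsFactors = FALSE
)

# Within each email: if any single year occurs more than twice,
# blank out the email for the whole group, otherwise keep it
result <- df1 %>%
  group_by(email) %>%
  mutate(email_new = if (any(table(start_year) > 2)) NA_character_
                     else as.character(email)) %>%
  ungroup()

print(result)
```

Using if/else rather than ifelse() makes it explicit that the test is a single value per group; the NA is then recycled to every row of that group.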
Data

    dput(df)
    structure(list(start_year = c(2016L, 2016L, 2016L, 2015L, 2014L,
    2014L, 2015L), email = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L
    ), .Label = c("a@a.com", "b@b.com", "c@c.com"), class = "factor")),
    .Names = c("start_year", "email"), class = "data.frame",
    row.names = c(NA, -7L))

    dput(df1)
    structure(list(start_year = c(2016, 2016, 2016, 2015, 2014, 2014,
    2014, 2015), email = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
    .Label = c("a@a.com", "b@b.com", "c@c.com"), class = "factor")),
    row.names = c(NA, -8L), .Names = c("start_year", "email"),
    class = "data.frame")