R: select email addresses with more than 2 identical start years


In an R data frame, I want a new column that is NA for every email address with more than 2 identical start years, and that keeps the email address otherwise.

    start_year email
    2016       a@a.com
    2016       a@a.com
    2016       a@a.com
    2015       a@a.com
    2015       a@a.com
    2014       a@a.com
    2015       b@b.com
    2014       b@b.com
    2014       b@b.com
    2015       c@c.com

Desired result (a@a.com has the start year 2016 three times, so its rows get NA in the new column):

    start_year email    email_new
    2016       a@a.com  NA
    2016       a@a.com  NA
    2016       a@a.com  NA
    2015       a@a.com  NA
    2015       a@a.com  NA
    2014       a@a.com  NA
    2015       b@b.com  b@b.com
    2014       b@b.com  b@b.com
    2014       b@b.com  b@b.com
    2015       c@c.com  c@c.com
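A minimal base R sketch of this rule, assuming the email column is character. With the factor column from the dput() output further down, wrap it in as.character() first, because ifelse() would otherwise return the factor's integer codes. The data frame is typed in by hand from the table above, and the name max_same_year is illustrative: for each row it holds the highest count of any single start year within that row's email, computed per email with ave().

```r
# Example data typed in from the question's table
df <- data.frame(
  start_year = c(2016, 2016, 2016, 2015, 2015, 2014, 2015, 2014, 2014, 2015),
  email = c(rep("a@a.com", 6), rep("b@b.com", 3), "c@c.com"),
  stringsAsFactors = FALSE
)

# For every row: how often the most frequent start_year occurs
# within that row's email (ave() repeats the group value on each row)
max_same_year <- ave(df$start_year, df$email,
                     FUN = function(y) max(table(y)))

# NA if some year occurs more than twice for the email, else keep the email
df$email_new <- ifelse(max_same_year > 2, NA, df$email)
df
```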

What I have so far gives the error "select() inputs must resolve to integer column positions":

    result <- df %>%
      group_by(email) %>%
      select(length(unique(start_year)) > 2)

Any help is appreciated.
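Roughly speaking, select() chooses columns, so its arguments have to evaluate to column names or positions; a logical value such as length(unique(start_year)) > 2 is neither. Note also that length(unique(start_year)) counts distinct years, not how often one year repeats. Picking out emails is a per-email condition on rows. The sketch below does that selection in base R with a contingency table of email by start year; the names counts and flagged are illustrative, and the data is again typed in from the question's table. In dplyr the same idea would be a grouped filter(), for example group_by(email) followed by filter(any(table(start_year) > 2)).

```r
# Example data typed in from the question's table
df <- data.frame(
  start_year = c(2016, 2016, 2016, 2015, 2015, 2014, 2015, 2014, 2014, 2015),
  email = c(rep("a@a.com", 6), rep("b@b.com", 3), "c@c.com"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per email, one column per start_year,
# each cell counts the rows with that email/year combination
counts <- table(df$email, df$start_year)

# Select an email if any of its per-year counts is above 2
flagged <- rownames(counts)[apply(counts > 2, 1, any)]
flagged

# Keep only the rows that belong to the selected emails
selected_rows <- df[df$email %in% flagged, ]
selected_rows
```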

Using dplyr, and as far as I understood it, you have 2 conditions for converting the email to NA:

1) at least 3 of the start_year values are the same

2) there are more than 2 observations
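The two conditions can be checked per email with small predicate functions. A sketch with illustrative names cond1 and cond2, using each email's start years from the question: whenever one year occurs at least 3 times, that email necessarily has at least 3 rows, so condition 2 never changes the outcome once condition 1 holds.

```r
# Condition 1: some start_year value occurs at least 3 times
cond1 <- function(years) any(table(years) >= 3)
# Condition 2: the email has more than 2 observations
cond2 <- function(years) length(years) > 2

# Start years per email, taken from the question's table
groups <- list(
  "a@a.com" = c(2016, 2016, 2016, 2015, 2015, 2014),
  "b@b.com" = c(2015, 2014, 2014),
  "c@c.com" = 2015
)

c1 <- vapply(groups, cond1, logical(1))
c2 <- vapply(groups, cond2, logical(1))

# Combined rule: both conditions must hold
both <- c1 & c2
both

# cond1 implies cond2, so the combined rule equals cond1 alone
stopifnot(all(!c1 | c2), all(both == c1))
```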

    library(dplyr)

    df %>%
      group_by(email) %>%
      # NA when some start_year occurs more than twice for this email
      # and the email has more than 2 rows; otherwise keep the email
      mutate(new = ifelse(length(which(table(start_year) > 2)) > 0 & n() > 2,
                          NA_character_, as.character(email)))
    #Source: local data frame [7 x 3]
    #Groups: email [3]
    #
    #  start_year   email     new
    #       <int>   <chr>   <chr>
    #1       2016 a@a.com    <NA>
    #2       2016 a@a.com    <NA>
    #3       2016 a@a.com    <NA>
    #4       2015 b@b.com b@b.com
    #5       2014 b@b.com b@b.com
    #6       2014 b@b.com b@b.com
    #7       2015 c@c.com c@c.com

After adding another 2014 row for b@b.com, so that this email also has 3 identical years, we get:

    df1 %>%
      group_by(email) %>%
      mutate(new = ifelse(length(which(table(start_year) > 2)) > 0 & n() > 2,
                          NA_character_, as.character(email)))
    #Source: local data frame [8 x 3]
    #Groups: email [3]
    #
    #  start_year   email     new
    #       <dbl>   <chr>   <chr>
    #1       2016 a@a.com    <NA>
    #2       2016 a@a.com    <NA>
    #3       2016 a@a.com    <NA>
    #4       2015 b@b.com    <NA>
    #5       2014 b@b.com    <NA>
    #6       2014 b@b.com    <NA>
    #7       2014 b@b.com    <NA>
    #8       2015 c@c.com c@c.com
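The result for df1 can be cross-checked without dplyr. This base R sketch rebuilds df1 by hand from its dput() output, counts the rows sharing each (email, start_year) pair with ave(), and then marks every row of an email in which any pair count exceeds 2; pair_n and drop_email are illustrative names. Since email is a factor in this data, it is converted with as.character() before ifelse(), as in the answer.

```r
# df1 rebuilt from its dput() output: b@b.com now has 2014 three times
df1 <- data.frame(
  start_year = c(2016, 2016, 2016, 2015, 2014, 2014, 2014, 2015),
  email = factor(c(rep("a@a.com", 3), rep("b@b.com", 4), "c@c.com"))
)

# Number of rows sharing each row's (email, start_year) pair
pair_n <- ave(seq_along(df1$email), df1$email, df1$start_year, FUN = length)

# TRUE on every row of an email that has some pair count above 2
drop_email <- ave(pair_n > 2, df1$email, FUN = any)

# email is a factor, so convert it before combining with NA in ifelse()
df1$new <- ifelse(drop_email, NA, as.character(df1$email))
df1
```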

Data:

    dput(df)
    structure(list(start_year = c(2016L, 2016L, 2016L, 2015L, 2014L,
    2014L, 2015L), email = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L
    ), .Label = c("a@a.com", "b@b.com", "c@c.com"), class = "factor")), .Names = c("start_year",
    "email"), class = "data.frame", row.names = c(NA, -7L))

    dput(df1)
    structure(list(start_year = c(2016, 2016, 2016, 2015, 2014, 2014,
    2014, 2015), email = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L,
    3L), .Label = c("a@a.com", "b@b.com", "c@c.com"), class = "factor")), row.names = c(NA,
    -8L), .Names = c("start_year", "email"), class = "data.frame")
