r - Calculate group mean with the same grouping factors several times

I have genetic data. It is quite big: 17,000 genetic markers (SNPs) and 700 individuals. Each SNP can be assigned to a founder. I want to calculate the average probability per 'founder segment'. A segment is defined as a part of a chromosome that is assigned to one founder without interruption.

In the example below I have 3 segments.
In the end I want to know the average probability over the SNPs within each segment.

    chromosome snp founder probability
    1          1   7       0.6
    1          2   7       0.5
    1          3   7       0.7
    1          4   2       0.5
    1          5   2       0.8
    1          6   7       0.6
    1          7   7       0.5

I can group with dplyr, but I don't want the first segment of founder 7 to be pooled with the other segment of founder 7.
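To make the problem concrete, here is a minimal base R sketch of what plain grouping by founder produces. The data frame name `df1` is borrowed from the answer further down, and using `ave()` as a stand-in for a naive grouped mean is an assumption of this sketch, not something from the original post.

```r
# Example data from the question (the name df1 is taken from the answer).
df1 <- data.frame(
  chromosome  = rep(1L, 7),
  snp         = 1:7,
  founder     = c(7L, 7L, 7L, 2L, 2L, 7L, 7L),
  probability = c(0.6, 0.5, 0.7, 0.5, 0.8, 0.6, 0.5)
)

# Naive approach: one mean per (chromosome, founder) pair.
# ave() returns a vector as long as its input, with each group's mean repeated.
naive <- ave(df1$probability, df1$chromosome, df1$founder, FUN = mean)

round(naive, 2)
# 0.58 0.58 0.58 0.65 0.65 0.58 0.58
```

Both runs of founder 7 end up with the same pooled mean (0.58) instead of separate means per segment.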

So I want:

    chromosome snp founder probability average
    1          1   7       0.6         0.6
    1          2   7       0.5         0.6
    1          3   7       0.7         0.6
    1          4   2       0.5         0.65
    1          5   2       0.8         0.65
    1          6   7       0.6         0.55
    1          7   7       0.5         0.55

How can I calculate a group mean when the same grouping factor values occur several times?

With dplyr, we can compare adjacent elements of 'founder' to create a grouping variable, group by it along with 'chromosome', and get the mean of 'probability'.

    library(dplyr)

    df1 %>%
      # grp1 increases by 1 each time 'founder' differs from the previous row;
      # default = founder[1] compares the first row with itself, so ids start at 0
      group_by(chromosome,
               grp1 = cumsum(founder != lag(founder, default = founder[1]))) %>%
      mutate(average = mean(probability))
    #   chromosome   snp founder probability  grp1 average
    #        <int> <int>   <int>       <dbl> <int>   <dbl>
    # 1          1     1       7         0.6     0    0.60
    # 2          1     2       7         0.5     0    0.60
    # 3          1     3       7         0.7     0    0.60
    # 4          1     4       2         0.5     1    0.65
    # 5          1     5       2         0.8     1    0.65
    # 6          1     6       7         0.6     2    0.55
    # 7          1     7       7         0.5     2    0.55
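The grouping variable is the only non-obvious part, so it may help to see its intermediate steps without any packages. This is a rough base R sketch on the 'founder' column from the question. The names `previous`, `changed` and `segment_id` are made up for illustration, and comparing the first element with itself is an assumption of the sketch.

```r
founder <- c(7L, 7L, 7L, 2L, 2L, 7L, 7L)

# Value of the preceding element; the first element is compared with itself.
previous <- c(founder[1], head(founder, -1))

# TRUE wherever a new run of identical founders starts.
changed <- founder != previous

# The running count of run starts gives one id per uninterrupted segment.
segment_id <- cumsum(changed)
segment_id
# 0 0 0 1 1 2 2
```

The two runs of founder 7 get different ids (0 and 2), which is what keeps them apart when the id is used as a grouping variable.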

Or, using data.table, convert the 'data.frame' to a 'data.table' (setDT(df1)), then, grouped by 'chromosome' and the run-length-type id (rleid) of 'founder', assign (:=) the mean of 'probability' to the 'average' column.

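    library(data.table)

    # setDT() converts df1 to a data.table by reference;
    # rleid() gives a new id for every uninterrupted run of 'founder'
    setDT(df1)[, average := mean(probability), by = .(chromosome, grp1 = rleid(founder))]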
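If neither package is available, roughly the same result can be sketched in base R with `rle()` and `ave()`. This is not part of the original answer. The helper `run_id()` and the column name `segment` are hypothetical names chosen for this example, and pasting 'chromosome' and 'founder' together is just one assumed way to make a run end at a chromosome boundary as well.

```r
# Example data from the question.
df1 <- data.frame(
  chromosome  = rep(1L, 7),
  snp         = 1:7,
  founder     = c(7L, 7L, 7L, 2L, 2L, 7L, 7L),
  probability = c(0.6, 0.5, 0.7, 0.5, 0.8, 0.6, 0.5)
)

# Hypothetical helper: number the uninterrupted runs in a vector (1, 2, 3, ...).
run_id <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), times = runs$lengths)
}

# Combine chromosome and founder so a run also ends when the chromosome changes.
df1$segment <- run_id(paste(df1$chromosome, df1$founder))

# Mean probability per segment, repeated for every SNP in that segment.
df1$average <- ave(df1$probability, df1$segment, FUN = mean)

df1$segment
# 1 1 1 2 2 3 3
```

Unlike the grp1 column in the dplyr example, the ids here start at 1, but only the boundaries between runs matter for the grouping.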
