r - Find and remove rows that are identical in 3 columns and differ in 1


I have binned data in intervals of 100000 using 2 different frames: one starting at 0 (0 to 100000, and onwards) and one starting at 50000 (50000 to 150000, and onwards). I then joined both data frames, using one column as the identifier of the frames (the column "x100kb").

For my purpose, if 2 rows differ in "x100kb" by 0.5 (preferably comparing whole numbers with +0.5, e.g. 60 and 60.5, 65 and 65.5, etc.) and have the same values in "chr", "occurrences_norm" and "occurrences_tum", I consider them equal and want to remove one of them. (Edit: the rows don't need to be next to each other, since the data is not ordered by "chr" and "x100kb" right now.) The only thing coming to mind is loops, which is obviously not productive...

Data example:

            chr  x100kb occurrences_norm occurrences_tum     fold
    19064 chr17    61.5               17               0 14.05333
    38799  chr5   526.0               16               0 13.96587
    38800  chr5   526.5               16               0 13.96587
    39946  chr5  1113.5               16               0 13.96587
    2377   chr1  1426.0               15               0 13.87277
    21859 chr18   733.5               15               0 13.87277
    20538 chr18    24.0               14               0 13.77324
    21863 chr18   735.5               14               0 13.77324
    37699  chr4  1835.5               14               0 13.77324
    39924  chr5  1102.5               14               0 13.77324
    21506 chr18   550.5               13               0 13.66633
    21862 chr18   735.0               13               0 13.66633
    22258 chr19   151.5               13               0 13.66633
    38972  chr5   613.0               13               0 13.66633
    41707  chr6   194.5               13               0 13.66633
    2380   chr1  1427.5               12               0 13.55087
    20541 chr18    25.5               12               0 13.55087
    21252 chr18   421.0               12               0 13.55087
    27384  chr2  2243.0               12               0 13.55087
    39990  chr5  1135.5               12               0 13.55087

In this example, the 3rd row would be removed.

I read the question in a different way. I thought you need to compare 2 consecutive rows: for example, check rows 1 & 2, rows 2 & 3, and so on. I also thought the condition is that the difference in x100kb is exactly 0.5, not larger than 0.5. Running 4 logical checks, using shift(), is one way to achieve that goal.

    library(data.table)

    # Drop a row when it matches the previous row in chr, occurrences_norm and
    # occurrences_tum, and its x100kb differs from the previous row's by exactly 0.5.
    # fill = -Inf makes the first check FALSE for row 1, so the first row is always kept.
    setDT(df1)[!((abs(x100kb - shift(x100kb, type = "lag", fill = -Inf)) == 0.5) &
                 (chr == shift(chr, type = "lag")) &
                 (occurrences_norm == shift(occurrences_norm, type = "lag")) &
                 (occurrences_tum == shift(occurrences_tum, type = "lag")))
               ]

    #      chr x100kb occurrences_norm occurrences_tum     fold
    # 1: chr17   61.5               17               0 14.05333
    # 2:  chr5  526.0               16               0 13.96587
    # 3:  chr5 1113.5               16               0 13.96587
    # 4:  chr1 1426.0               15               0 13.87277
    # 5: chr18  733.5               15               0 13.87277
    # 6: chr18   24.0               14               0 13.77324
    # 7: chr18  735.5               14               0 13.77324
    # 8:  chr4 1835.5               14               0 13.77324
    # 9:  chr5 1102.5               14               0 13.77324
    #10: chr18  550.5               13               0 13.66633
    #11: chr18  735.0               13               0 13.66633
    #12: chr19  151.5               13               0 13.66633
    #13:  chr5  613.0               13               0 13.66633
    #14:  chr6  194.5               13               0 13.66633
    #15:  chr1 1427.5               12               0 13.55087
    #16: chr18   25.5               12               0 13.55087
    #17: chr18  421.0               12               0 13.55087
    #18:  chr2 2243.0               12               0 13.55087
    #19:  chr5 1135.5               12               0 13.55087
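The shift() approach only compares each row with the one directly before it. Since the question's edit says the matching rows need not be adjacent, here is a possible variant that does not depend on row order. It is only a sketch and rests on two assumptions that the question does not fully confirm: that "remove one of them" means dropping the `.5` row and keeping the whole-number row, and that a data.table anti-join is an acceptable substitute for the row-by-row comparison. The toy table is a five-row subset of the question's example, deliberately shuffled; `whole` and `result` are names made up for illustration.

```r
library(data.table)

# Toy data: a shuffled subset of the question's example (fold column omitted).
df1 <- data.table(
  chr              = c("chr5", "chr18", "chr5", "chr18", "chr5"),
  x100kb           = c(526.5, 735.5, 1113.5, 735.0, 526.0),
  occurrences_norm = c(16, 14, 16, 13, 16),
  occurrences_tum  = c(0, 0, 0, 0, 0)
)

# For every whole-number bin, build the row that would count as its duplicate:
# same chr and counts, x100kb shifted up by 0.5.
whole <- df1[x100kb %% 1 == 0,
             .(chr, x100kb = x100kb + 0.5, occurrences_norm, occurrences_tum)]

# Anti-join: keep only the rows of df1 that do not match any such duplicate.
result <- df1[!whole, on = .(chr, x100kb, occurrences_norm, occurrences_tum)]
result
```

On this toy data only the chr5 row at 526.5 is dropped, because it is the only one with a whole-number partner sharing all three other values. The chr18 rows at 735.0 and 735.5 both stay, since their occurrences_norm values differ.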
