regex - Extracting parts of a string with regular expression


I have a bunch of strings that contain a pattern I want to extract. They look like the following:

  str <- "regular expression language (abcdfe-bb)" 

So I want two new columns: one with the "abcdfe" part, and the other with the part after the -, in this case "bb".

I use the following function for extracting these pieces (it's a variation on the parse.one function from ?regexpr):

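getmatchingpatterns <- function(data, pattern) {
  # perl = TRUE is needed for named capture groups
  result <- gregexpr(pattern, data, perl = TRUE)
  do.call(rbind, lapply(seq_along(data), function(i) {
    # no match in this string: return an empty string
    if (any(result[[i]] == -1)) return("")
    # start positions and (length - 1) of each capture group
    st <- data.frame(attr(result[[i]], "capture.start"))
    le <- data.frame(attr(result[[i]], "capture.length") - 1)
    # cut each captured piece out of the i-th string
    mapply(function(start, leng) substring(data[i], start, start + leng), st, le)
  }))
}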

Then define a Perl-style pattern with a name for each variable. In this case (and this is a big assumption, based on one example):

pattern <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"

So the first part I'm naming abcpart, and the second one bpart.

Then call the above function with the pattern:

> getmatchingpatterns(str, pattern)
     abcpart  bpart
[1,] "abcdfe" "bb"

It returns the result in matrix form, which is convertible to a data.frame, data.table, etc.
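For instance, here is a minimal sketch of that conversion. The second input string is made up (it is not from the question) and is only there to show that each input string gives one row. The function and pattern are repeated so the snippet runs on its own.

```r
# Same helper as above, repeated so this snippet is self-contained.
getmatchingpatterns <- function(data, pattern) {
  result <- gregexpr(pattern, data, perl = TRUE)
  do.call(rbind, lapply(seq_along(data), function(i) {
    if (any(result[[i]] == -1)) return("")
    st <- data.frame(attr(result[[i]], "capture.start"))
    le <- data.frame(attr(result[[i]], "capture.length") - 1)
    mapply(function(start, leng) substring(data[i], start, start + leng), st, le)
  }))
}

# The second string is a hypothetical example, not from the question.
strs <- c("regular expression language (abcdfe-bb)",
          "some other text (xyz-q)")
pattern <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"

m <- getmatchingpatterns(strs, pattern)

# The group names become the column names of the data.frame.
df <- as.data.frame(m, stringsAsFactors = FALSE)
print(df)
```

The column names come straight from the named groups in the pattern, so renaming a group renames the column.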

The above function finds all matches of the given pattern, so beware of how general the pattern is.
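To illustrate that caveat, here is a sketch with a made-up string that has an extra pair of parentheses before the part of interest. The input string and the tighter pattern are my own assumptions, not from the question. The tighter pattern simply forbids parentheses inside either group, and a hyphen inside the first one.

```r
# Same helper as above, repeated so this snippet is self-contained.
getmatchingpatterns <- function(data, pattern) {
  result <- gregexpr(pattern, data, perl = TRUE)
  do.call(rbind, lapply(seq_along(data), function(i) {
    if (any(result[[i]] == -1)) return("")
    st <- data.frame(attr(result[[i]], "capture.start"))
    le <- data.frame(attr(result[[i]], "capture.length") - 1)
    mapply(function(start, leng) substring(data[i], start, start + leng), st, le)
  }))
}

# Hypothetical input with an unrelated "(note)" before the target.
s <- "see (note) regex (abcdfe-bb)"

# Original, very general pattern: the lazy .*? starts at the first "("
# and keeps going until it finds a "-".
loose_pattern <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"

# Assumed tighter variant: no parentheses inside either group,
# and no hyphen inside the first one.
tight_pattern <- "\\((?<abcpart>[^()-]+)-(?<bpart>[^()]+)\\)"

loose <- getmatchingpatterns(s, loose_pattern)
tight <- getmatchingpatterns(s, tight_pattern)

print(loose)  # abcpart swallows "note) regex (abcdfe"
print(tight)  # abcpart is just "abcdfe"
```

Whether the tighter pattern is appropriate depends on what the real strings look like, so check it against more than one example.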

