I've got a bunch of strings that contain a pattern I want to extract. They look like the following:
str <- "regular expression language (abcdfe-bb)"
So I want two new columns: one with the "abcdfe" part, and the other with the part after the "-", in this case "bb".
I use the following function to extract these pieces (it's a variation on the parse.one function from the examples in ?regexpr):
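getmatchingpatterns <- function(data, pattern) {
  # perl = TRUE is required for named groups and the capture.* attributes
  result <- gregexpr(pattern, data, perl = TRUE)
  do.call(rbind, lapply(seq_along(data), function(i) {
    # no match in this string: return an empty row
    if (any(result[[i]] == -1)) return("")
    # one column per named group, one row per match
    st <- data.frame(attr(result[[i]], "capture.start"))
    le <- data.frame(attr(result[[i]], "capture.length") - 1)
    # cut each group out of the string: characters start .. start + length - 1
    mapply(function(start, leng) substring(data[i], start, start + leng), st, le)
  }))
}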
Then define a Perl-style pattern that names each variable. In this case (and this is a big assumption, based on only one example):
pattern <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"
So the first part I'm naming abcpart, and the second one bpart.
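To see where those names end up, here is a minimal sketch (a single string, using regexpr() instead of gregexpr() for brevity) that inspects the attributes R attaches to a match when perl = TRUE. The capture.start matrix carries the group names as its column names, which is what data.frame() and mapply() pass through to the function's result. The variable names st, lens and pieces are only for illustration.

```r
str <- "regular expression language (abcdfe-bb)"
pattern <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"

m <- regexpr(pattern, str, perl = TRUE)

st   <- attr(m, "capture.start")   # matrix: one row per input string, one column per group
lens <- attr(m, "capture.length")  # same shape: how many characters each group captured
colnames(st)                       # the names given with (?<name>...) in the pattern

# the last character of each group sits at start + length - 1
pieces <- substring(str, st, st + lens - 1)
names(pieces) <- attr(m, "capture.names")
pieces
```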
Then call the above function with the pattern:
> getmatchingpatterns(str, pattern)
     abcpart  bpart
[1,] "abcdfe" "bb"
It returns the result as a matrix, which can be converted to a data.frame, data.table, etc.
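A quick sketch of that conversion on a few made-up strings (the extra inputs are illustrative, not from the original data). It repeats the function with perl = TRUE so the snippet runs on its own; note that a string with no match comes back as a row of empty strings.

```r
getmatchingpatterns <- function(data, pattern) {
  result <- gregexpr(pattern, data, perl = TRUE)
  do.call(rbind, lapply(seq_along(data), function(i) {
    if (any(result[[i]] == -1)) return("")  # no match: empty row
    st <- data.frame(attr(result[[i]], "capture.start"))
    le <- data.frame(attr(result[[i]], "capture.length") - 1)
    mapply(function(start, leng) substring(data[i], start, start + leng), st, le)
  }))
}

strs <- c("regular expression language (abcdfe-bb)",
          "another string (xyz-q)",
          "no parentheses here")
pattern <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"

# character matrix -> data.frame; keep the pieces as character, not factor
df <- as.data.frame(getmatchingpatterns(strs, pattern), stringsAsFactors = FALSE)
df
```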
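The function above returns whatever the pattern matches, so beware of how general your pattern is: with .*?, the first group can run past a closing parenthesis when a string contains more than one parenthesised part.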
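To illustrate that warning, here is a sketch on a made-up string with an extra parenthesised part. extract_groups() is a hypothetical helper (one match per string, via regexpr()), and the stricter [^()-]* character class is just one possible way to tighten the pattern, not the only one.

```r
# hypothetical helper for this demo: named groups of the first match, or NULL
extract_groups <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  if (m == -1) return(NULL)
  st  <- attr(m, "capture.start")
  len <- attr(m, "capture.length")
  out <- substring(x, st, st + len - 1)
  names(out) <- attr(m, "capture.names")
  out
}

x <- "draft note (v2) regular expression language (abcdfe-bb)"

loose  <- "\\((?<abcpart>.*?)-(?<bpart>.*?)\\)"          # the pattern from above
strict <- "\\((?<abcpart>[^()-]*)-(?<bpart>[^()-]*)\\)"  # groups may not contain ( ) or -

extract_groups(x, loose)   # abcpart starts at "(v2" and runs up to the "-"
extract_groups(x, strict)  # abcpart "abcdfe", bpart "bb"
```

The stricter class assumes the pieces themselves never contain parentheses or hyphens; if they can, the pattern needs a different restriction.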