regex - String together all punctuation and symbols into one string


I am trying to create a vector or data frame that has all of the punctuation and symbols from the [:punct:] class in R. Is there a way to print out the contents of the class rather than building a string of the characters by hand? It seems I have to escape each character and paste them manually into a string, which seems extremely tedious.

These are the symbols:

! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [  ] ^ _ ` { | } ~.

    # code so far
    symbols <- c(' ! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [  ] ^ _ ` { | } ~. ')

Any help is appreciated. Thanks.

You can convert raw bytes to characters and grep for the predefined classes:

    (rch <- as.raw(0:255))
    # [1] 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 ...

    (ch <- rawToChar(rch, TRUE))
    # [1] "" "\001" "\002" "\003" "\004" "\005" "\006" "\a"   "\b"   "\t"   "\n"   "\v"   "\f"  ...

    ## change the locale to avoid warnings
    Sys.setlocale('LC_ALL', 'C')

    dput(grep('[[:punct:]]', ch, value = TRUE))
    # c("!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",",
    #   "-", ".", "/", ":", ";", "<", "=", ">", "?", "@", "[", "\\",
    #   "]", "^", "_", "`", "{", "|", "}", "~")
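To get from that vector to the single string the question asks for, `paste()` with `collapse` should be enough. The following is a minimal sketch, not part of the original answer. It restricts itself to ASCII code points 1 to 127 so that no locale change is needed, and the names `ascii`, `punct`, `punct_string` and `punct_df` are only illustrative.

```r
# ASCII code points 1..127 as one-character strings
# (0 is skipped because an R string cannot hold an embedded NUL).
ascii <- rawToChar(as.raw(1:127), multiple = TRUE)

# Keep only the characters matched by the POSIX punctuation class.
punct <- grep("[[:punct:]]", ascii, value = TRUE)

# Collapse the vector into one string.
punct_string <- paste(punct, collapse = "")
print(punct_string)

# Or keep one symbol per row in a data frame.
punct_df <- data.frame(symbol = punct, stringsAsFactors = FALSE)
head(punct_df)
```

Because the characters come straight out of `grep()`, nothing has to be escaped or typed by hand.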

?regex describes these classes:

[:alnum:] Alphanumeric characters: [:alpha:] and [:digit:].

[:alpha:] Alphabetic characters: [:lower:] and [:upper:].

[:blank:] Blank characters: space and tab, and possibly other locale-dependent characters such as non-breaking space.

[:cntrl:] Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In another character set, these are the equivalent characters, if any.

[:digit:] Digits: 0 1 2 3 4 5 6 7 8 9.

[:graph:] Graphical characters: [:alnum:] and [:punct:].

[:lower:] Lower-case letters in the current locale.

[:print:] Printable characters: [:alnum:], [:punct:] and space.

[:punct:] Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

[:space:] Space characters: tab, newline, vertical tab, form feed, carriage return, space and possibly other locale-dependent characters.

[:upper:] Upper-case letters in the current locale.

[:xdigit:] Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.

So you can repeat the above for any of these:

    dput(grep('[[:space:]]', ch, value = TRUE))
    # c("\t", "\n", "\v", "\f", "\r", " ")

    dput(grep('[[:alnum:]]', ch, value = TRUE))
    # c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B",
    # "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
    # "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b",
    # "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
    # "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")
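The relationships stated in the `?regex` descriptions can be checked the same way. This is a rough sketch that only looks at ASCII code points 1 to 127; the helper `members()` is a made-up name for illustration, not a base R function.

```r
ascii <- rawToChar(as.raw(1:127), multiple = TRUE)

# Hypothetical helper: the ASCII characters matched by a POSIX class name.
members <- function(class) {
  grep(sprintf("[[:%s:]]", class), ascii, value = TRUE)
}

# [:graph:] is described as [:alnum:] plus [:punct:].
graph_ok <- setequal(members("graph"),
                     union(members("alnum"), members("punct")))

# [:print:] is described as [:alnum:], [:punct:] and space.
print_ok <- setequal(members("print"), c(members("graph"), " "))

# Number of ASCII characters in each class.
classes <- c("alnum", "alpha", "blank", "cntrl", "digit", "graph",
             "lower", "print", "punct", "space", "upper", "xdigit")
sizes <- vapply(classes, function(cl) length(members(cl)), integer(1))

print(c(graph_ok = graph_ok, print_ok = print_ok))
print(sizes)
```

Outside ASCII the class contents depend on the locale, so the counts here should be read as ASCII-only.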

You can also use PCRE:

    dput(grep('\\s', ch, value = TRUE))
    # c("\t", "\n", "\v", "\f", "\r", " ")

    dput(grep('\\v|\\h', ch, value = TRUE, perl = TRUE))
    # c("\t", "\n", "\v", "\f", "\r", " ", "\205", "\240")

    dput(grep('\\p{P}', ch, value = TRUE, perl = TRUE))
    # c("!", "\"", "#", "%", "&", "'", "(", ")", "*", ",", "-", ".",
    # "/", ":", ";", "?", "@", "[", "\\", "]", "_", "{", "}", "\241",
    # "\247", "\253", "\266", "\267", "\273", "\277")
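Note that the `\p{P}` result above is missing some characters that `[[:punct:]]` returns, such as `$` and `+`. Unicode files those under the symbol category `\p{S}` rather than punctuation. A small sketch of the difference, limited to ASCII and assuming R's PCRE was built with Unicode property support (which the output above suggests):

```r
ascii <- rawToChar(as.raw(1:127), multiple = TRUE)

posix_punct <- grep("[[:punct:]]", ascii, value = TRUE)
pcre_punct  <- grep("\\p{P}", ascii, value = TRUE, perl = TRUE)

# Characters POSIX treats as punctuation but \p{P} does not match.
only_posix <- setdiff(posix_punct, pcre_punct)
print(only_posix)

# Matching both Unicode categories covers the POSIX class within ASCII.
both <- grep("[\\p{P}\\p{S}]", ascii, value = TRUE, perl = TRUE)
print(setequal(posix_punct, both))
```

So if you want "punctuation and symbols" with `perl = TRUE`, `[\\p{P}\\p{S}]` is probably closer to what you mean than `\\p{P}` alone.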

Or define your own, etc.:

    dput(grep('[\x20-\x7e]', ch, value = TRUE))
    dput(grep('[a-c]', ch, value = TRUE))
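As an example of a hand-rolled class, the ASCII punctuation characters sit in four code-point ranges, so they can be written as hex ranges. This is only a sketch under that ASCII assumption. The double backslashes mean the regex engine (PCRE here) interprets the `\x` escapes, instead of R expanding them inside the string literal as in the single-backslash form above.

```r
ascii <- rawToChar(as.raw(1:127), multiple = TRUE)

# Four ASCII ranges: ! to /, : to @, [ to `, { to ~
own_punct <- "[\\x21-\\x2f\\x3a-\\x40\\x5b-\\x60\\x7b-\\x7e]"

custom <- grep(own_punct, ascii, value = TRUE, perl = TRUE)
posix  <- grep("[[:punct:]]", ascii, value = TRUE)

# Compare the custom class against the predefined one.
print(setequal(custom, posix))

# The same pattern can strip those characters from text.
cleaned <- gsub(own_punct, "", "a+b = c; (d)!", perl = TRUE)
print(cleaned)
```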
