I am trying to create a vector or data frame that has all of the punctuation and symbols from the [:punct:] class in R. Is there a way to print out the contents of the class rather than typing out a string of characters? It seems I have to escape each character and paste them manually into a string, which seems extremely tedious.
These are the symbols:
! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [ ] ^ _ ` { | } ~.

    # code so far
    symbols <- c(' ! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [ ] ^ _ ` { | } ~. ')

Any help is appreciated. Thanks.
You can convert raw bytes to characters and grep for the predefined classes:
    (rch <- as.raw(0:255))
    # [1] 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 ...

    (ch <- rawToChar(rch, TRUE))
    # [1] "" "\001" "\002" "\003" "\004" "\005" "\006" "\a" "\b" "\t" "\n" "\v" "\f" ...

    ## change the locale to avoid warnings
    Sys.setlocale('LC_ALL', 'C')

    dput(grep('[[:punct:]]', ch, value = TRUE))
    # c("!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",",
    # "-", ".", "/", ":", ";", "<", "=", ">", "?", "@", "[", "\\",
    # "]", "^", "_", "`", "{", "|", "}", "~")

?regex describes these classes:
[:alnum:] Alphanumeric characters: [:alpha:] and [:digit:].
[:alpha:] Alphabetic characters: [:lower:] and [:upper:].
[:blank:] Blank characters: space and tab, and possibly other locale-dependent characters such as non-breaking space.
[:cntrl:] Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In another character set, these are the equivalent characters, if any.
[:digit:] Digits: 0 1 2 3 4 5 6 7 8 9.
[:graph:] Graphical characters: [:alnum:] and [:punct:].
[:lower:] Lower-case letters in the current locale.
[:print:] Printable characters: [:alnum:], [:punct:] and space.
[:punct:] Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
[:space:] Space characters: tab, newline, vertical tab, form feed, carriage return, space and possibly other locale-dependent characters.
[:upper:] Upper-case letters in the current locale.
[:xdigit:] Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
So you can repeat the above for any of these:
    dput(grep('[[:space:]]', ch, value = TRUE))
    # c("\t", "\n", "\v", "\f", "\r", " ")

    dput(grep('[[:alnum:]]', ch, value = TRUE))
    # c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B",
    # "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
    # "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b",
    # "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
    # "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")

You can also use PCRE:
    dput(grep('\\s', ch, value = TRUE))
    # c("\t", "\n", "\v", "\f", "\r", " ")

    dput(grep('\\v|\\h', ch, value = TRUE, perl = TRUE))
    # c("\t", "\n", "\v", "\f", "\r", " ", "\205", "\240")

    dput(grep('\\p{P}', ch, value = TRUE, perl = TRUE))
    # c("!", "\"", "#", "%", "&", "'", "(", ")", "*", ",", "-", ".",
    # "/", ":", ";", "?", "@", "[", "\\", "]", "_", "{", "}", "\241",
    # "\247", "\253", "\266", "\267", "\273", "\277")

Or define your own, etc.:
    dput(grep('[\x20-\x7e]', ch, value = TRUE))
    dput(grep('[a-c]', ch, value = TRUE))
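Since the question asked for a vector or data frame, the steps above could be wrapped up roughly as follows. This is only a sketch: chars_in_class and punct_df are hypothetical names, not part of base R, and the helper assumes it is enough to test the ASCII codes 1 to 127 rather than all 256 byte values.

```r
# Hypothetical helper (not part of base R): return the characters among
# the given byte codes that match a regular expression such as '[[:punct:]]'.
# Assumption: only ASCII codes 1:127 are tested by default, which avoids
# the locale warnings that bytes above 127 can trigger.
chars_in_class <- function(pattern, codes = 1:127, ...) {
  ch <- rawToChar(as.raw(codes), multiple = TRUE)
  grep(pattern, ch, value = TRUE, ...)
}

# Character vector of the punctuation class
punct <- chars_in_class('[[:punct:]]')

# Data frame with one row per character and its decimal code
punct_df <- data.frame(
  char = punct,
  code = vapply(punct, utf8ToInt, integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(punct_df)
```

Extra arguments are passed through to grep, so chars_in_class('\\p{P}', perl = TRUE) would give the PCRE notion of punctuation instead, which leaves out symbols such as $ and +.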