I am trying to create a vector or data frame that has all of the punctuation and symbols from the [:punct:] class in R. Is there a way to print out the contents of the class rather than typing them into a string of characters? It seems I would have to escape each character and paste them into the string manually, which is extremely tedious.
These are the symbols:
! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [ ] ^ _ ` { | } ~.
# code so far
symbols <- c(' ! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [ ] ^ _ ` { | } ~. ')
Any help is appreciated. Thanks.
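As an aside on the attempt above: the characters can be typed once as a single string and then split into a vector, so nothing has to be escaped or pasted one by one. A minimal sketch, assuming R 4.0.0 or later for the raw-string literal r"(...)"; symbols_str is just an illustrative name, and the list follows ?regex, so it includes the backslash that the question's list leaves out:

```r
# Assumes R >= 4.0.0: inside r"(...)" quotes and backslashes are kept
# literally, so nothing needs escaping.
symbols_str <- r"(! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~)"

# Split the single string on spaces to get one element per symbol.
symbols <- strsplit(symbols_str, " ", fixed = TRUE)[[1]]

length(symbols)
```

This still means typing the list by hand, which is what the answer below avoids by asking the regex engine for the class contents directly.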
You can convert raw bytes to characters and grep them with the predefined classes:
(rch <- as.raw(0:255))
# [1] 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 ...
(ch <- rawToChar(rch, TRUE))
# [1] "" "\001" "\002" "\003" "\004" "\005" "\006" "\a" "\b" "\t" "\n" "\v" "\f" ...
## change to the C locale to avoid warnings
Sys.setlocale('LC_ALL', 'C')
dput(grep('[[:punct:]]', ch, value = TRUE))
# c("!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",",
# "-", ".", "/", ":", ";", "<", "=", ">", "?", "@", "[", "\\",
# "]", "^", "_", "`", "{", "|", "}", "~")
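Since the question asks for a vector or data frame, here is a minimal sketch of the same idea that stops at printable ASCII (bytes 0x20 to 0x7e). That restriction is an assumption about what you need: it drops the high bytes, and with them the locale warnings, so no Sys.setlocale() call is needed. The names ascii, punct, punct_df and the column name symbol are arbitrary:

```r
# Printable ASCII: bytes 0x20 (space) through 0x7e (~). Staying in this
# range avoids the invalid-string warnings that bytes 0x80-0xff can raise
# in a UTF-8 locale.
ascii <- rawToChar(as.raw(0x20:0x7e), multiple = TRUE)

# Keep only the characters matched by the POSIX [:punct:] class.
punct <- grep("[[:punct:]]", ascii, value = TRUE)

# One row per symbol; stringsAsFactors = FALSE only matters for R < 4.0.
punct_df <- data.frame(symbol = punct, stringsAsFactors = FALSE)
head(punct_df)
```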
?regex describes these classes:
[:alnum:]
Alphanumeric characters: [:alpha:] and [:digit:].
[:alpha:]
Alphabetic characters: [:lower:] and [:upper:].
[:blank:]
Blank characters: space and tab, and possibly other locale-dependent characters such as non-breaking space.
[:cntrl:]
Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In another character set, these are the equivalent characters, if any.
[:digit:]
Digits: 0 1 2 3 4 5 6 7 8 9.
[:graph:]
Graphical characters: [:alnum:] and [:punct:].
[:lower:]
Lower-case letters in the current locale.
[:print:]
Printable characters: [:alnum:], [:punct:] and space.
[:punct:]
Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
[:space:]
Space characters: tab, newline, vertical tab, form feed, carriage return, space and possibly other locale-dependent characters.
[:upper:]
Upper-case letters in the current locale.
[:xdigit:]
Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
So you can repeat the above for any of these:
dput(grep('[[:space:]]', ch, value = TRUE))
# c("\t", "\n", "\v", "\f", "\r", " ")
dput(grep('[[:alnum:]]', ch, value = TRUE))
# c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B",
# "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
# "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b",
# "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
# "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")
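To avoid repeating the call for each class, the same grep can be looped over every class name from the ?regex list. A sketch, assuming a sample of printable ASCII plus the control characters tab through carriage return (bytes 9 to 13), which keeps the results largely independent of the locale; the names chars, classes and by_class are illustrative:

```r
# Tab, newline, vertical tab, form feed, carriage return (bytes 9-13)
# plus printable ASCII (bytes 32-126).
chars <- rawToChar(as.raw(c(9:13, 32:126)), multiple = TRUE)

classes <- c("alnum", "alpha", "blank", "cntrl", "digit", "graph",
             "lower", "print", "punct", "space", "upper", "xdigit")

# Build "[[:alnum:]]", "[[:alpha:]]", ... and collect the matches in a
# named list, one character vector per class.
by_class <- lapply(setNames(classes, classes), function(cls) {
  grep(sprintf("[[:%s:]]", cls), chars, value = TRUE)
})

# Number of characters each class matched in this sample
lengths(by_class)

# Inspect a single class
by_class$xdigit
```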
You can also use PCRE:
dput(grep('\\s', ch, value = TRUE))
# c("\t", "\n", "\v", "\f", "\r", " ")
dput(grep('\\v|\\h', ch, value = TRUE, perl = TRUE))
# c("\t", "\n", "\v", "\f", "\r", " ", "\205", "\240")
dput(grep('\\p{P}', ch, value = TRUE, perl = TRUE))
# c("!", "\"", "#", "%", "&", "'", "(", ")", "*", ",", "-", ".",
# "/", ":", ";", "?", "@", "[", "\\", "]", "_", "{", "}", "\241",
# "\247", "\253", "\266", "\267", "\273", "\277")
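Note that \p{P} is not a drop-in replacement for [[:punct:]]: Unicode classes $ as a currency symbol, + < = > | ~ as math symbols and ^ ` as modifier symbols (the \p{S} categories), which is why they are missing from the \p{P} output above. A sketch comparing the two on printable ASCII; the variable names are illustrative:

```r
ascii <- rawToChar(as.raw(0x20:0x7e), multiple = TRUE)

posix_punct <- grep("[[:punct:]]", ascii, value = TRUE)
unicode_p   <- grep("\\p{P}", ascii, value = TRUE, perl = TRUE)

# Characters POSIX calls punctuation but Unicode classes as symbols
setdiff(posix_punct, unicode_p)

# Combining the Unicode punctuation (P) and symbol (S) categories
# recovers the POSIX set on this ASCII range
unicode_ps <- grep("[\\p{P}\\p{S}]", ascii, value = TRUE, perl = TRUE)
identical(unicode_ps, posix_punct)
```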
Or define your own, etc.:
dput(grep('[\x20-\x7e]', ch, value = TRUE))
dput(grep('[a-c]', ch, value = TRUE))
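Besides ranges, a bracket expression can mix a POSIX class with literal characters, or be negated. A sketch on printable ASCII; [[:alnum:]_] is the set ?regex gives as equivalent to \w, and the negated class coincides with [:punct:] here only because printable ASCII consists of letters, digits, punctuation and the space (an assumption that does not carry over to other character ranges). The names are illustrative:

```r
ascii <- rawToChar(as.raw(0x20:0x7e), multiple = TRUE)

# A POSIX class plus a literal underscore
word_chars <- grep("[[:alnum:]_]", ascii, value = TRUE)

# A negated bracket: printable ASCII that is neither alphanumeric nor space
not_alnum_space <- grep("[^[:alnum:][:space:]]", ascii, value = TRUE)

# Both comparisons should be TRUE on this range
identical(word_chars, grep("\\w", ascii, value = TRUE))
identical(not_alnum_space, grep("[[:punct:]]", ascii, value = TRUE))
```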