I'm trying to read a table on this site:
http://spacefem.com/pregnant/due.php?use=edd&m=09&d=10&y=16
I'm using rvest, but I get an error:
library(rvest)
read_html("http://spacefem.com/pregnant/due.php?use=edd&m=09&d=10&y=16")
Error: Name spoiler:3tbt4d3m is not XML Namespace compliant [202]
What does this error mean, and is there any way around it?
I've gotten as far as pinpointing the internal function that causes the error: xml2:::doc_parse_raw. However, xml2:::doc_parse_raw calls internal C code, which makes debugging this issue substantially more difficult.
Another option is to use htmltidy to "clean" the document first. You need v0.3.0 or higher, which, as of the date of this answer, means using the development version rather than the CRAN version until 0.3.0+ is on CRAN:
library(rvest)
library(htmltidy) # devtools::install_github("hrbrmstr/htmltidy")
library(httr)

url <- "http://spacefem.com/pregnant/due.php?use=edd&m=09&d=10&y=16"

# the site did not return content to me without a more browser-like user agent
res <- GET(url, user_agent("Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36"))

# run the raw page through HTML Tidy, then parse the cleaned markup
cleaned <- tidy_html(content(res, as = "text", encoding = "UTF-8"), list(TidyDocType = "html5"))
pg <- read_html(cleaned)
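If installing a development version of htmltidy is not an option, a different workaround (plain regex preprocessing, not part of the answer above) is to rewrite the offending tag names in the raw HTML text before handing it to read_html. The sketch below assumes the only problem is element names with a prefix, such as spoiler:3tbt4d3m; the helper name sanitize_tag_names is hypothetical. Regex-editing HTML is fragile (it could also alter matching text inside script blocks), so check the result against the real page.

```r
# Hypothetical helper (a sketch, not an rvest/xml2 function): replace the ":"
# in prefixed element names, e.g. <spoiler:3tbt4d3m> -> <spoiler-3tbt4d3m>,
# so the parser no longer sees them as namespace-qualified names.
# Only the name right after "<" or "</" is touched; attributes such as
# xlink:href are left alone because a space separates them from "<".
sanitize_tag_names <- function(html) {
  # "<" or "</", a tag name starting with a letter, then the first ":"
  gsub("(</?[A-Za-z][A-Za-z0-9_.-]*):", "\\1-", html)
}

# Example on a small literal string (no network access needed)
snippet <- '<div><spoiler:3tbt4d3m>hidden</spoiler:3tbt4d3m></div>'
sanitize_tag_names(snippet)
#> [1] "<div><spoiler-3tbt4d3m>hidden</spoiler-3tbt4d3m></div>"
```

With the page fetched as in the htmltidy answer, you would then parse it with read_html(sanitize_tag_names(content(res, as = "text", encoding = "UTF-8"))) and pull out the table with rvest's html_table().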