I am trying to scrape data, using R, from this site: http://www.soccer24.com/kosovo/superliga/results/#
I can do the following:
library(rvest)
# read_html() replaces html(), which older versions of rvest used
doc <- read_html("http://www.soccer24.com/kosovo/superliga/results/")
But I am stumped on how to actually get at the data, because the actual data on the website seems to be generated by JavaScript. What I can do is
html_text(doc)
But that gives a long blurb of weird text (which does include the data, interspersed with odd code), and it's not at all clear how I could parse that.
What I want to extract is the match data (date, time, teams, result) for all of the matches. No other data is needed from that site.
Can anyone provide some hints on how to extract that data from the site?
Using Selenium with PhantomJS:
library(RSelenium)
# Start a headless PhantomJS instance and connect a remote driver to it
pjs <- phantom()
remdr <- remoteDriver(browserName = "phantomjs")
appurl <- "http://www.soccer24.com/kosovo/superliga/results/#"
remdr$open()
remdr$navigate(appurl)
If you want to press the "more data" button until it is no longer visible (at which point all matches are presumed to be showing):
webelem <- remdr$findElement("css", "#tournament-page-results-more a")
while (webelem$isElementDisplayed()[[1]]) {
  webelem$clickElement()
  Sys.sleep(5)  # give the page time to load the next batch of results
  webelem <- remdr$findElement("css", "#tournament-page-results-more a")
}
library(XML)  # provides htmlParse(), removeNodes() and readHTMLTable()
doc <- htmlParse(remdr$getPageSource()[[1]])
Remove the unwanted round data, and use XML::readHTMLTable for simplicity:
# Remove the unwanted round rows from the HTML. There are also end-of-season games;
# these are presented in a separate table, which the "- 1" below leaves out.
invisible(doc["//table/*/tr[@class='event_round']", fun = removeNodes])
appdata <- readHTMLTable(doc, which = seq(length(doc["//table"]) - 1), stringsAsFactors = FALSE, trim = TRUE)
if (!is.data.frame(appdata)) { appdata <- do.call(rbind, appdata) }
row.names(appdata) <- NULL
names(appdata) <- c("blank", "date", "hteam", "ateam", "score")
pjs$stop()

> head(appdata)
  blank         date           hteam            ateam score
1       01.04. 18:00     ferronikeli          ferizaj 4 : 0
2       01.04. 18:00          istogu         hajvalia 2 : 1
3       01.04. 18:00 kosova vushtrri trepca mitrovice 1 : 0
4       01.04. 18:00       prishtina          drenica 3 : 0
5       31.03. 18:00       besa peje            drita 1 : 0
6       31.03. 18:00       trepca 89       vellaznimi 2 : 0
> tail(appdata)
    blank         date            hteam     ateam score
115       17.08. 22:00        besa peje trepca 89 3 : 3
116       17.08. 22:00      ferronikeli  hajvalia 2 : 5
117       17.08. 22:00 trepca mitrovice   ferizaj 1 : 0
118       17.08. 22:00       vellaznimi   drenica 2 : 1
119       16.08. 22:00  kosova vushtrri     drita 0 : 1
120       16.08. 22:00        prishtina    istogu 2 : 1
Carry out further formatting as needed.
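As one possible illustration of that further formatting, here is a minimal base R sketch that drops the empty column and splits the date and score strings into separate columns. The two sample rows are copied from the printed output; the new column names (day, time, hgoals, agoals) are my own choice and not part of the original answer. It assumes every score has the form "h : a", so postponed or awarded matches would need extra handling. The site shows no year, so the day is left as a string rather than converted to a Date.

```r
# Two sample rows copied from the printed appdata output
appdata <- data.frame(
  blank = "",
  date  = c("01.04. 18:00", "31.03. 18:00"),
  hteam = c("ferronikeli", "besa peje"),
  ateam = c("ferizaj", "drita"),
  score = c("4 : 0", "1 : 0"),
  stringsAsFactors = FALSE
)

# Drop the empty first column
appdata$blank <- NULL

# Split "dd.mm. HH:MM" into a day/month string and a kick-off time
parts <- strsplit(appdata$date, " ", fixed = TRUE)
appdata$day  <- vapply(parts, `[`, character(1), 1)
appdata$time <- vapply(parts, `[`, character(1), 2)

# Split "h : a" into integer goal columns (hypothetical column names)
goals <- strsplit(appdata$score, ":", fixed = TRUE)
appdata$hgoals <- as.integer(trimws(vapply(goals, `[`, character(1), 1)))
appdata$agoals <- as.integer(trimws(vapply(goals, `[`, character(1), 2)))
```

The same steps should apply unchanged to the full scraped data frame, provided the column names match those set above.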