Web scraping: stumped on how to scrape the data from this site (using R)


I am trying to scrape data, using R, from this site: http://www.soccer24.com/kosovo/superliga/results/#

I can do the following:

    library(rvest)
    # read_html() replaces html(), which older rvest versions used here
    doc <- read_html("http://www.soccer24.com/kosovo/superliga/results/")

But I am stumped on how to actually get at the data, because the data on the website seems to be generated by JavaScript. What I can do is

    html_text(doc)

but that gives a long blurb of weird text. It does include the data, but interspersed with odd code, and it is not at all clear how to parse it.

What I want to extract is the match data (date, time, teams and result) for all of the matches. No other data from the site is needed.

Can anyone provide hints on how to extract this data from the site?

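Since the data is generated by JavaScript, one option is to render the page in a headless browser, using Selenium with PhantomJS: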

    library(RSelenium)
    pjs <- phantom()  # start PhantomJS
    remDr <- remoteDriver(browserName = "phantomjs")
    appURL <- "http://www.soccer24.com/kosovo/superliga/results/#"
    remDr$open()
    remDr$navigate(appURL)

If you want to press the "more data" button until it is no longer visible (at which point all matches are presumed to be showing):

    webElem <- remDr$findElement("css", "#tournament-page-results-more a")
    while(webElem$isElementDisplayed()[[1]]){
      webElem$clickElement()
      Sys.sleep(5)  # wait for the next batch of matches to load
      webElem <- remDr$findElement("css", "#tournament-page-results-more a")
    }
    library(XML)  # for htmlParse(), removeNodes() and readHTMLTable()
    doc <- htmlParse(remDr$getPageSource()[[1]])
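The loop above has no upper bound, so if the link never becomes hidden it would keep clicking forever. A minimal sketch of the same idea with a safety cap follows; `click_until_hidden`, `max_clicks` and `wait` are hypothetical names, not part of RSelenium, and the visibility check and the click are passed in as functions so the helper can be exercised without a browser.

```r
# Hypothetical helper, not part of RSelenium: click a "more" control while it
# is visible, but give up after max_clicks so the loop cannot run forever.
click_until_hidden <- function(is_visible, click, max_clicks = 50, wait = 5) {
  clicks <- 0
  while (clicks < max_clicks && is_visible()) {
    click()
    clicks <- clicks + 1
    Sys.sleep(wait)  # give the page time to load the next batch of matches
  }
  clicks
}

# With RSelenium this could be called as (not run here):
# sel <- "#tournament-page-results-more a"
# click_until_hidden(
#   is_visible = function() remDr$findElement("css", sel)$isElementDisplayed()[[1]],
#   click      = function() remDr$findElement("css", sel)$clickElement()
# )

# Offline check with stub functions: the "button" disappears after three clicks.
remaining <- 3
n <- click_until_hidden(
  is_visible = function() remaining > 0,
  click      = function() { remaining <<- remaining - 1 },
  wait       = 0
)
n  # 3
```

If the site removes the link from the page instead of hiding it, `findElement` would likely raise an error, so wrapping the visibility check in `tryCatch` (returning `FALSE` on error) may be worthwhile.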

Then remove the unwanted round rows and use XML::readHTMLTable for simplicity:

    # Remove the unwanted round rows from the HTML. There are also end-of-season
    # games; these are presented in a separate table.
    invisible(doc["//table/*/tr[@class='event_round']", fun = removeNodes])
    # read every table except the last one
    appData <- readHTMLTable(doc, which = seq(length(doc["//table"]) - 1),
                             stringsAsFactors = FALSE, trim = TRUE)
    # with more than one table, readHTMLTable returns a list; stack it into one data frame
    if(!is.data.frame(appData)){appData <- do.call(rbind, appData)}
    row.names(appData) <- NULL
    names(appData) <- c("blank", "date", "hteam", "ateam", "score")
    pjs$stop()

    > head(appData)
      blank         date           hteam            ateam score
    1       01.04. 18:00     Ferronikeli          Ferizaj 4 : 0
    2       01.04. 18:00          Istogu         Hajvalia 2 : 1
    3       01.04. 18:00 Kosova Vushtrri Trepca Mitrovice 1 : 0
    4       01.04. 18:00       Prishtina          Drenica 3 : 0
    5       31.03. 18:00       Besa Peje            Drita 1 : 0
    6       31.03. 18:00       Trepca 89       Vellaznimi 2 : 0
    > tail(appData)
        blank         date            hteam     ateam score
    115       17.08. 22:00        Besa Peje Trepca 89 3 : 3
    116       17.08. 22:00      Ferronikeli  Hajvalia 2 : 5
    117       17.08. 22:00 Trepca Mitrovice   Ferizaj 1 : 0
    118       17.08. 22:00       Vellaznimi   Drenica 2 : 1
    119       16.08. 22:00  Kosova Vushtrri     Drita 0 : 1
    120       16.08. 22:00        Prishtina    Istogu 2 : 1

Then carry out any further formatting as needed.
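As one example of such further formatting, here is a sketch that splits the `date` column into a real `Date` plus a time, splits `score` into integer home and away goals, and adds an H/D/A result column. It assumes scores always look like `"4 : 0"` and that the season runs from August into the following spring, as the dates in the output above suggest. `tidy_results` and `season_start` are hypothetical names, the two-row `appData` is a stand-in built from the printed rows, and 2014 is only an example value.

```r
# Hypothetical stand-in for the scraped appData: two rows copied from the
# head() and tail() output above, so this runs without a browser.
appData <- data.frame(
  blank = "",
  date  = c("01.04. 18:00", "17.08. 22:00"),
  hteam = c("Ferronikeli", "Besa Peje"),
  ateam = c("Ferizaj", "Trepca 89"),
  score = c("4 : 0", "3 : 3"),
  stringsAsFactors = FALSE
)

# season_start is an assumed parameter: the page omits the year, so matches
# from July onwards get season_start and earlier months get the next year.
tidy_results <- function(df, season_start) {
  parts     <- strsplit(df$date, " ", fixed = TRUE)
  day_month <- vapply(parts, `[`, character(1), 1)  # e.g. "01.04."
  time      <- vapply(parts, `[`, character(1), 2)  # e.g. "18:00"
  month     <- as.integer(substr(day_month, 4, 5))
  year      <- ifelse(month >= 7, season_start, season_start + 1)
  date      <- as.Date(paste0(day_month, year), format = "%d.%m.%Y")

  # "4 : 0" -> c("4", "0"); one row per match
  goals      <- do.call(rbind, strsplit(df$score, " : ", fixed = TRUE))
  home_goals <- as.integer(goals[, 1])
  away_goals <- as.integer(goals[, 2])

  data.frame(
    date = date, time = time,
    home = df$hteam, away = df$ateam,
    home_goals = home_goals, away_goals = away_goals,
    result = ifelse(home_goals > away_goals, "H",
                    ifelse(home_goals < away_goals, "A", "D")),
    stringsAsFactors = FALSE
  )
}

res <- tidy_results(appData, season_start = 2014)
res
```

Rows without a final score in that form (postponed matches, for instance) would likely need to be filtered out before calling the function.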

