Comments on Statistics et al.: R Package Spotlight - stringr

Jack Davis (2015-02-02):

Hi Sharon,

Thanks for the tip about rvest! I can see now how that would work a lot better in a case like this, where the data is already compiled into an HTML table with its own labels.

My experience with cricinfo is mostly with the play-by-play commentary, which is less structured, so I hadn't even considered an HTML-based scraping approach. Thank you!

Sharon (2015-02-01):

Useful explainer of the stringr package, but FYI, this task is a lot easier with Hadley Wickham's rvest package and the incredibly useful Selectorgadget browser add-on. Four lines of code do the trick:

library(rvest)
# read_html() replaces the deprecated html() in current rvest releases
pagehtml <- read_html("http://www.espncricinfo.com/ci/engine/match/287876.html")
# select the player-name nodes using the CSS selectors found with Selectorgadget
playerhtml <- html_nodes(pagehtml, ".to-bat .playerName , .batsman-name .playerName")
# extract the text of each matched node as a character vector
playernames <- html_text(playerhtml)

Here's an explanation of how to use Selectorgadget to find the CSS selectors you want and then how to use them in rvest: http://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html