I wrote a simple spider to collect links to hikes, but it doesn't seem to visit any of the URLs I want it to scrape on the site. The log shows:
```
[scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
```
Here's the spider:
```python
from scrapy.spiders import Spider
from scrapy.selector import Selector
from oregon_hikes_scrapper.items import hikelinkitem

endpoints = [
    'from="%27%27peter_iredale%27%27&to=bonney_meadows-hidden_meadows_trail_junction',
    'from=bonney_meadows-hidden_meadow_trail_junction&to=clatsop_loop_hike',
]

class orhikespider(Spider):
    name = 'or_hikes'
    allowed_domains = "oregonhikers.org"
    start_url = [
        "http://www.oregonhikers.org/field_guide/special:allpages&" + l for l in endpoints
    ]

    def parse(self, response):
        hikes = Selector.xpath('//*[@id="mw-content-text"]/table[2]/tbody/tr[1]/td[1]/div/a')
        for hike in hikes:
            item = hikelinkitem()
            item['hike'] = hike.xpath('@title').extract()
            item['link'] = hike.xpath('@href').extract()
            yield item
```
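It's a typo rather than a true Python syntax error: the attribute must be named `start_urls`, not `start_url`. Scrapy's default `start_requests()` builds the first requests from `start_urls`, so with the misspelled name the spider has no URLs to fetch, which is why the log reports 0 pages crawled.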