I have a simple scraper running. I am trying to scrape the search results for the letter q on sam.gov:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import re
import sys
reload(sys)
sys.setdefaultencoding('utf8')

letter = 'q'
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get("http://sam.gov")

#element = WebDriverWait(driver, 10).until(
#    EC.presence_of_element_located((By.ID, "pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1"))
#    )
#element.click()
driver.find_element_by_id('pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1').click()
driver.find_element_by_id(letter).send_keys(letter)
driver.find_element_by_id('regsearchbutton').click()

def crawl():
    bsobj = BeautifulSoup(driver.page_source, "html.parser")
    tablelist = bsobj.find_all("table", {"class":"width100 menu_header_top_emr"})
    tdlist = bsobj.find_all("td", {"class":"menu_header width100"})
    for table in tablelist:
        item = table.find_all("span", {"class":"results_body_text"})
        print item[0].get_text().strip() + ', ' + item[1].get_text().strip()

if driver.find_element_by_id('anch_16'):
    crawl()
    driver.find_element_by_id('anch_16').click()
    print "going next page"
else:
    crawl()
    print "done last page"

driver.quit()
When I run it, it gives a weird error that is bothering me:
Traceback (most recent call last):
  File "save.py", line 22, in <module>
    driver.find_element_by_id('pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1').click()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 269, in find_element_by_id
    return self.find_element(by=By.ID, value=id_)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errormessage":"unable find element id 'pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1'","request":{"headers":{"accept":"application/json","accept-encoding":"identity","connection":"close","content-length":"153","content-type":"application/json;charset=utf-8","host":"127.0.0.1:40423","user-agent":"python-urllib/2.7"},"httpversion":"1.1","method":"post","post":"{\"using\": \"id\", \"sessionid\": \"eb7dfa50-70a7-11e6-b125-9ff4e2dbd485\", \"value\": \"pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1\"}","url":"/element","urlparsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userinfo":"","authority":"","protocol":"","source":"/element","querykey":{},"chunks":["element"]},"urloriginal":"/session/eb7dfa50-70a7-11e6-b125-9ff4e2dbd485/element"}}
Screenshot: available via screen
I have since tried using an implicit wait of 60 right after I initialize the browser. No luck.
I have also tried WebDriverWait (commented out in the code right below driver.get("http://sam.gov")), and it gave me a TimeoutException.
The weird thing is that if I print driver.page_source right after that call, the source is fine and contains the following code, which includes the element with the id I am searching for. There is no frame or iframe either.
<a id="pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1" href="#" title="search records" onclick="if(typeof jsfcljs == 'function'){jsfcljs(document.getElementById('pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12'),{'pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1':'pbg220e071f_2de75f_2d417d_2d9c61_2d027d324c8fec:_viewroot:j_id12:search1'},'');}return false" class="button">
The id locator of the element looks dynamically generated, so you should try a different locator.
You can try using css_selector as below:
driver.find_element_by_css_selector("a.button[title='search records']").click()
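To check a locator like this against the printed page source without driving a browser, a minimal sketch using only the standard library could look like the following. It assumes Python 3; `AnchorFinder` and `find_anchor_ids` are names made up for illustration, and the HTML fragment is a shortened stand-in for the anchor in the question with a placeholder id.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collect the ids of <a> tags that carry a given class and title."""

    def __init__(self, css_class, title):
        super().__init__()
        self.css_class = css_class
        self.title = title
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # An attribute without a value is reported as None, hence the fallback.
        classes = (attributes.get("class") or "").split()
        # Exact string comparison on the title value.
        if self.css_class in classes and attributes.get("title") == self.title:
            self.matches.append(attributes.get("id"))


def find_anchor_ids(html, css_class, title):
    """Return ids of anchors that match class and title, ignoring the id itself."""
    finder = AnchorFinder(css_class, title)
    finder.feed(html)
    return finder.matches


# Hypothetical fragment: the id is a placeholder for a generated value.
sample = (
    '<a id="generated:j_id12:search1" href="#" '
    'title="search records" class="button">Search</a>'
    '<a id="other" href="#" title="help" class="button">Help</a>'
)
print(find_anchor_ids(sample, "button", "search records"))
```

If this returns an empty list for the real page source, the selector would probably not match in the browser either, for example because the title text differs from what the selector expects.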
Or using WebDriverWait as:
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "a.button[title='search records']")))
element.click()
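For intuition, an explicit wait is essentially a polling loop: a condition is called repeatedly until it returns something truthy or the time budget runs out. The sketch below shows that idea in plain Python 3. It is not Selenium's implementation, and `wait_until`, `WaitTimeout` and the fake condition are hypothetical names used only here.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (stand-in for a timeout error)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Fake condition: the "element" only shows up on the third check.
attempts = {"count": 0}


def element_appears_late():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


found = wait_until(element_appears_late, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

Seen this way, a timeout from the wait means the locator never matched during the whole window, which likely points at the locator or the rendered page rather than at timing.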
Note: before finding the element, make sure it is not inside a frame/iframe. If it is inside a frame/iframe, you need to switch to that frame/iframe before finding the element:
driver.switch_to_frame("frame/iframe id or name")