I have two different for loops that run the same number of times, and each produces a string at each iteration. (I am scraping an HTML file.) I want the string from the first loop to be merged/concatenated/appended with the string from the second loop at each iteration (this is the tricky part). Here is the code I have:
from bs4 import BeautifulSoup

bsobj = BeautifulSoup(open("samfull.html"), "html.parser")
tablelist = bsobj.find_all("table", {"class": "width100 menu_header_top_emr"})
tdlist = bsobj.find_all("td", {"class": "menu_header width100"})

# First loop: one string per header table.
for table in tablelist:
    first_part_of_row_string = ''
    item = table.find_all("span", {"class": "results_body_text"})
    for i in range(len(item)):
        first_part_of_row_string += (item[i].get_text().strip() + ", ")

# Second loop: one string per detail td.
for td in tdlist:
    second_part_of_row_string = ''
    items = td.find_all("span", {"class": "results_body_text"})
    for i in range(len(items)):
        second_part_of_row_string += (items[i].get_text().strip() + ", ")
To give an example, sample results for the "for table in tablelist" loop are:

a, b,
1, 2,
father, mother,

and for the "for td in tdlist" loop they are:

c, d, e,
3, 4, 5,
son, daughter, twin,
I want to combine first_part_of_row_string of each iteration with second_part_of_row_string of each iteration as well, so I want to print out this:

a, b, c, d, e,
1, 2, 3, 4, 5,
father, mother, son, daughter, twin,

which is first_part_of_row_string + second_part_of_row_string for each iteration of both loops.
The lengths of tablelist and tdlist are the same, so both loops return the same number of rows. I would have done this in one loop if the td were inside the same table that tablelist refers to, but unfortunately it is not. In the HTML, the table with the class specified in the tablelist definition is followed by a table that has no class, and that second table contains the td with the class specified in tdlist. A sample occurrence of the HTML is included below. The whole page is thousands of lines long, so I am putting it at a separate link.
<table cellspacing="0" cellpadding="0" style="margin-left: auto; margin-right: auto;" class="width100 menu_header_top_emr">
  <tbody>
    <tr>
      <td style="width:80px;">
        <div style="width:70px;background-color:#b2ee98; border:1px solid grey; padding:2px 5px 2px 5px; text-align:center;">entity</div>
      </td>
      <td style="padding-left:5px;">
        <span class="results_body_text"><h5 style="vertical-align: middle;">rascal x-press, inc.</h5></span>
      </td>
      <td style="width:130px;">
        <div class="right">
          <span class="results_title_text">status:</span>
          <span class="results_body_text"> submitted </span>
        </div>
      </td>
      <td style="width:22px;">
        <a href="" class="more_duns_link_emr right" style="display: inline;"><img id="more_duns_link_emr" src="/samsearch/styles/img/expand-small-blue.png" style="padding:8px 8px 8px 2px;" alt="expand search result rascal x-press, inc."></a>
        <a href="" class="hide_duns_link_emr off right" style="display: none;"><img id="hide_duns_link_emr" src="/samsearch/styles/img/collapse-small-blue.png" style="padding:8px 8px 8px 2px;" alt="collapse search result rascal x-press, inc."></a>
      </td>
    </tr>
  </tbody>
</table>
<table>
  <tbody>
    <tr>
      <td class="menu_header width100">
        <table>
          <tr>
            <td style="width:25%;">
              <span class="results_title_text">duns:</span>
              <span class="results_body_text"> 012361296</span>
            </td>
            <td style="width:25%;"> </td>
            <!-- label cage when territory listed country -->
            <td style="width:27%;">
              <span class="results_title_text">cage code:</span>
              <span class="results_body_text"></span>
            </td>
            <td style="width:15%" rowspan="2">
              <input type="button" value="view details" title="view details rascal x-press, inc." class="center" style="height:25px; width:90px; vertical-align:middle; margin:7px 3px 7px 3px;" onclick="viewentry('4420848', '1472652382619')" />
            </td>
          </tr>
          <tr>
            <td colspan="2">
              <span class="results_title_text">has active exclusion?: </span>
              <span class="results_body_text"> no </span>
            </td>
            <td>
              <span class="results_title_text">dodaac:</span>
              <span class="results_body_text"></span>
            </td>
          </tr>
          <tr>
            <td colspan="2">
              <span class="results_title_text">expiration date:</span>
              <span class="results_body_text"> </span>
            </td>
            <td colspan="2">
              <span class="results_title_text">delinquent federal debt?</span>
              <span class="results_body_text"> no </span>
            </td>
          </tr>
          <tr>
            <td colspan="2">
              <span class="results_title_text">purpose of registration:</span>
              <span class="results_body_text"> federal assistance awards </span>
            </td>
          </tr>
        </table>
        <div class="off_duns_emr" style="display: none;">
          <table class="resultbox1 menu_header width100" style="margin-left: auto; margin-right: auto;" cellpadding="2">
            <tbody>
              <tr>
                <td colspan="3">
                  <span class="results_title_text">address:</span>
                  <span class="results_body_text">1372 state hwy 37</span>
                </td>
              </tr>
              <tr>
                <td style="width:212px;">
                  <span class="results_title_text">city:</span>
                  <span class="results_body_text">west frankfort</span>
                </td>
                <td style="width:200px;">
                  <span class="results_title_text">state/province:</span>
                  <span class="results_body_text">il</span>
                </td>
              </tr>
              <tr>
                <td style="width:130px;">
                  <span class="results_title_text">zip code:</span>
                  <span class="results_body_text">62896-5007</span>
                </td>
                <td style="width:200px;">
                  <span class="results_title_text">country:</span>
                  <span class="results_body_text">united states</span>
                </td>
              </tr>
            </tbody>
          </table>
        </div>
      </td>
    </tr>
  </tbody>
</table>
</td>
</tr>
</tbody>
</table>
</li>
</td>
</tr>
</table>
There are lots of ways to do what you're asking for; here's a simple one:
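tablelist = [["a", "b"], ["1", "2"], ["father", "mother"]]
tdlist = [["c", "d", "e"], ["3", "4", "5"], ["son", "daughter", "twin"]]

# Both lists are stated to have the same length, so indexing by i is safe here.
len_list = max(len(tablelist), len(tdlist))
for i in range(len_list):
    # Concatenate the i-th row of each list, then join the items with ", ".
    print(", ".join(tablelist[i] + tdlist[i]))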
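The same pairing can also be sketched with zip, which walks both lists in step without an index. This is only an illustrative variant, not the answer's own code: combine_rows is a hypothetical helper name, and the literal lists stand in for the text scraped from each table and td.

```python
def combine_rows(first_rows, second_rows):
    """Join the i-th row of first_rows with the i-th row of second_rows."""
    combined = []
    # zip stops at the shorter input, so unequal lengths do not raise IndexError.
    for first, second in zip(first_rows, second_rows):
        combined.append(", ".join(first + second))
    return combined


# Stand-in data, mirroring the example rows from the question.
tablelist = [["a", "b"], ["1", "2"], ["father", "mother"]]
tdlist = [["c", "d", "e"], ["3", "4", "5"], ["son", "daughter", "twin"]]

for row in combine_rows(tablelist, tdlist):
    print(row)
```

With the scraped page, each inner list would hold the stripped get_text() values of the results_body_text spans found in one table or td, in place of the literal lists above.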