html - Extract class name from tag with BeautifulSoup (Python)
I have the following HTML code:
<td class="image">
  <a href="/target/tt0111161/" title="target text 1">
    <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
  </a>
</td>
<td class="title">
  <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161"> </span>
  <a href="/target/tt0111161/"> other text </a>
  <span class="year_type"> (2013) </span>
I am trying to use Beautiful Soup to parse certain elements into a tab-delimited file. I got some great help and now have:
for td in soup.select('td.title'):
    span = td.select('span.wlb_wrapper')
    if span:
        print(span[0].get('data-tconst'))  # `tt0082971`
Now I want to get "target text 1".
I've tried a few things modeled on the code above, such as:
for td in soup.select('td.image'):  # trying to select the <td class="image"> tag
    img = td.select('a.title')  # from inside the td, trying to find the <a> tag that has the word title
    if img:
        print(img[2].get('title'))  # if it finds anything, I want it to return the text in 'title'
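If you're trying to get a different td based on its class (i.e. td class="image" and td class="title"), you can treat each Beautiful Soup tag like a dictionary and check its class attribute.
This will find every td class="image" in the table and print the title of each link inside it; for td class="title" it prints the data-tconst of the wlb_wrapper span.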
from bs4 import BeautifulSoup

page = """
<table>
  <tr>
    <td class="image">
      <a href="/target/tt0111161/" title="target text 1">
        <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
      </a>
    </td>
    <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161"> </span>
      <a href="/target/tt0111161/"> other text </a>
      <span class="year_type"> (2013) </span>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(page, 'html.parser')
tbl = soup.find('table')
rows = tbl.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        # 'class' is a multi-valued attribute, so col['class'] is a list
        if col.has_attr('class') and col['class'][0] == 'image':
            hrefs = col.find_all('a')
            for href in hrefs:
                print(href.get('title'))  # title attribute of the <a> tag
        elif col.has_attr('class') and col['class'][0] == 'title':
            spans = col.find_all('span')
            for span in spans:
                if span.has_attr('class') and span['class'][0] == 'wlb_wrapper':
                    print(span.get('data-tconst'))
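As a shorter alternative, the same values can probably be pulled out with CSS selectors. The attempt in the question used 'a.title', which matches an <a> tag whose class is "title"; the attribute selector 'a[title]' matches an <a> tag that has a title attribute, which seems to be what was wanted. The sketch below is only one way to do it: the helper name extract_rows is made up for illustration, and it assumes each <tr> holds one td.image and one td.title cell.

```python
from bs4 import BeautifulSoup

page = """
<table><tr>
<td class="image">
  <a href="/target/tt0111161/" title="target text 1">
    <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
  </a>
</td>
<td class="title">
  <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161"> </span>
  <a href="/target/tt0111161/"> other text </a>
  <span class="year_type"> (2013) </span>
</td>
</tr></table>
"""


def extract_rows(soup):
    """Hypothetical helper: return (data-tconst, link title) pairs, one per <tr>."""
    rows = []
    for tr in soup.select("tr"):
        # a[title] selects an <a> with a title *attribute* (a.title would mean class="title")
        link = tr.select_one("td.image a[title]")
        span = tr.select_one("td.title span.wlb_wrapper")
        # Assumption: skip rows that are missing either cell
        if link is not None and span is not None:
            rows.append((span.get("data-tconst"), link.get("title")))
    return rows


soup = BeautifulSoup(page, "html.parser")
rows = extract_rows(soup)
for tconst, title in rows:
    # One tab-delimited line per table row
    print("\t".join([tconst, title]))
```

Each printed line joins the two values with a tab, so redirecting the output to a file gives the tab-delimited layout mentioned in the question.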