html - Extract class name from tag with BeautifulSoup in Python

- May 15, 2012

I have the following HTML code:

    <td class="image">
      <a href="/target/tt0111161/" title="target text 1">
        <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
      </a>
    </td>
    <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
      </span>
      <a href="/target/tt0111161/">
        other text
      </a>
      <span class="year_type">
        (2013)
      </span>

I am trying to use Beautiful Soup to parse certain elements into a tab-delimited file. I got some great help and now have:

    for td in soup.select('td.title'):
        span = td.select('span.wlb_wrapper')
        if span:
            print(span[0].get('data-tconst'))  # e.g. `tt0082971`

Now I want to get "target text 1".

I've tried a few things along the lines of the code above, such as:

    for td in soup.select('td.image'):  # trying to select the <td class="image"> tag
        img = td.select('a.title')  # from inside the td, trying to get the <a> tag that has the word "title"
        if img:
            print(img[2].get('title'))  # if it finds anything, I want it to return the text in class 'title'

If you're trying to get a different td based on its class (i.e. td class="image" and td class="title"), you can use Beautiful Soup as a dictionary to get the different classes.
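As a rough sketch of what dictionary-style access looks like (the one-cell `snippet` string below is made up for illustration and is not the asker's full page): in bs4, `class` is a multi-valued attribute, so `tag['class']` gives back a list of class names rather than a single string.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell snippet, used only to illustrate attribute access.
snippet = (
    '<table><tr><td class="image">'
    '<a href="/target/tt0111161/" title="target text 1">x</a>'
    '</td></tr></table>'
)
soup = BeautifulSoup(snippet, 'html.parser')

td = soup.find('td')
classes = td['class']             # list of class names, here ['image']
link_title = td.a['title']        # attribute of the first <a> inside the td
missing = td.get('id')            # .get() returns None instead of raising KeyError
has_class = td.has_attr('class')  # True when the attribute is present

print(classes, link_title, missing, has_class)
```

Indexing with `tag['name']` raises `KeyError` when the attribute is absent, so `.get()` or `has_attr()` is the safer choice when not every tag carries the attribute.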

This will find every td class="image" in the table.

    from bs4 import BeautifulSoup

    page = """
    <table>
        <tr>
            <td class="image">
               <a href="/target/tt0111161/" title="target text 1">
                <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
               </a>
            </td>
            <td class="title">
               <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
               </span>
               <a href="/target/tt0111161/">
                other text
               </a>
               <span class="year_type">
                (2013)
               </span>
            </td>
        </tr>
    </table>
    """

    soup = BeautifulSoup(page, 'html.parser')
    tbl = soup.find('table')
    rows = tbl.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        for col in cols:
            # 'class' is multi-valued, so col['class'] is a list of names
            if col.has_attr('class') and col['class'][0] == 'image':
                hrefs = col.find_all('a')
                for href in hrefs:
                    print(href.get('title'))  # the link's title attribute
            elif col.has_attr('class') and col['class'][0] == 'title':
                spans = col.find_all('span')
                for span in spans:
                    if span.has_attr('class') and span['class'][0] == 'wlb_wrapper':
                        print(span.get('data-tconst'))
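The same lookup can also be sketched with CSS selectors, which is closer to the style used in the question. In a selector, `a.title` matches an `<a>` whose class is `title`, whereas `a[title]` matches an `<a>` that has a `title` attribute, which is what is wanted here. The helper name `extract_rows`, the column order, and writing to an in-memory buffer rather than a file are assumptions made for this example; `select_one` requires a reasonably recent bs4.

```python
import csv
import io

from bs4 import BeautifulSoup

page = """
<table>
  <tr>
    <td class="image">
      <a href="/target/tt0111161/" title="target text 1">
        <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
      </a>
    </td>
    <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161"></span>
      <a href="/target/tt0111161/">other text</a>
      <span class="year_type">(2013)</span>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(page, 'html.parser')


def extract_rows(soup):
    """Hypothetical helper: one (tconst, link title) tuple per table row."""
    rows = []
    for tr in soup.select('tr'):
        # a[title] selects on the attribute, not on a class named "title"
        link = tr.select_one('td.image a[title]')
        span = tr.select_one('td.title span.wlb_wrapper')
        if link is not None and span is not None:
            rows.append((span.get('data-tconst'), link.get('title')))
    return rows


rows = extract_rows(soup)

# Write the rows as tab-delimited text; swap the buffer for an open file to save them.
buffer = io.StringIO()
csv.writer(buffer, delimiter='\t', lineterminator='\n').writerows(rows)
print(buffer.getvalue(), end='')
```

Pairing the two cells per `tr` keeps each link title on the same output line as its `data-tconst` value, which suits a tab-delimited file better than printing the values one by one.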
