html - Using Beautiful Soup to analyze a table in Python


So I've got this table:

    <table border="1" style="width: 100%">
      <caption></caption>
      <col>
      <col>
      <tbody>
        <tr>
          <td>pig</td>
          <td>house type</td>
        </tr>
        <tr>
          <td>pig a</td>
          <td>straw</td>
        </tr>
        <tr>
          <td>pig b</td>
          <td>stick</td>
        </tr>
        <tr>
          <td>pig c</td>
          <td>brick</td>
        </tr>

and I'm trying to return a JSON string of the table's row pairs, like so:

[["pig a", "straw"], ["pig b", "stick"], ["pig c", "brick"]] 

However, my code can't seem to get rid of the HTML tags:

    stable = soup.find('table')

    cells = []
    rows = stable.findAll('tr')
    for tr in rows[1:4]:
        # process body of table
        row = []
        td = tr.findAll('td')
        #td = [el.text for el in soup.tr.findAll('td')]
        row.append(td[0])
        row.append(td[1])
        cells.append(row)

    return cells

    # eventually, I'd like to do this:
    #h = json.dumps(cells)
    #return h

My output is this:

[[<td>pig a</td>, <td>straw</td>], [<td>pig b</td>, <td>stick</td>], [<td>pig c</td>, <td>brick</td>]]

Use the text property to get the inner text of each element, rather than appending the tag object itself:

    row.append(td[0].text)
    row.append(td[1].text)
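Putting the fix together with the json.dumps step from the question, a minimal end-to-end sketch might look like the following. It assumes Beautiful Soup 4 (the bs4 package) with the built-in html.parser backend. The function name table_to_json and the HTML constant are hypothetical names chosen for illustration, and the sample table is closed with </tbody></table> here as an assumption so the snippet is self-contained.

```python
import json

from bs4 import BeautifulSoup  # assumes Beautiful Soup 4 is installed

# Sample table from the question; the closing tags are an assumption.
HTML = """
<table border="1" style="width: 100%">
  <caption></caption>
  <col>
  <col>
  <tbody>
    <tr><td>pig</td><td>house type</td></tr>
    <tr><td>pig a</td><td>straw</td></tr>
    <tr><td>pig b</td><td>stick</td></tr>
    <tr><td>pig c</td><td>brick</td></tr>
  </tbody>
</table>
"""


def table_to_json(html):
    """Return the table's body rows as a JSON string of text pairs."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    cells = []
    # Skip the first row, which holds the column headings.
    for tr in table.find_all("tr")[1:]:
        # get_text() returns the inner text, so no <td> tags remain.
        cells.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return json.dumps(cells)


result = table_to_json(HTML)
print(result)
```

Extracting plain strings also matters for the JSON step: json.dumps cannot serialize Tag objects, so the original list of td elements would have raised a TypeError there.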
