html - Using Beautiful Soup to analyze a table in Python
So I've got this table:
    <table border="1" style="width: 100%">
      <caption></caption>
      <col>
      <col>
      <tbody>
        <tr> <td>pig</td> <td>house type</td> </tr>
        <tr> <td>pig a</td> <td>straw</td> </tr>
        <tr> <td>pig b</td> <td>stick</td> </tr>
        <tr> <td>pig c</td> <td>brick</td> </tr>
      </tbody>
    </table>
and I'm trying to return a JSON string of the table's pairs, like so:
[["pig a", "straw"], ["pig b", "stick"], ["pig c", "brick"]]
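For reference, this target string is exactly what the standard library's json.dumps emits for a plain list of lists of Python strings (with its default separators ", " and ": "). A quick check, assuming the pairs have already been extracted as str values rather than Beautiful Soup objects:

```python
import json

# The target data as plain Python lists of str (not Beautiful Soup Tag objects).
pairs = [["pig a", "straw"], ["pig b", "stick"], ["pig c", "brick"]]

# json.dumps uses ", " between items by default, giving the spacing shown above.
encoded = json.dumps(pairs)
print(encoded)  # [["pig a", "straw"], ["pig b", "stick"], ["pig c", "brick"]]
```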
However, my code can't seem to get rid of the HTML tags:
    # hypothetical wrapper function; the snippet is the body of a function
    def table_pairs(soup):
        stable = soup.find('table')
        cells = []
        rows = stable.find_all('tr')
        for tr in rows[1:4]:  # process body of table
            row = []
            td = tr.find_all('td')
            #td = [el.text for el in soup.tr.find_all('td')]
            row.append(td[0])
            row.append(td[1])
            cells.append(row)
        return cells

        # eventually, I'd like this:
        # h = json.dumps(cells)
        # return h
My output is this:
[[<td>pig a</td>, <td>straw</td>], [<td>pig b</td>, <td>stick</td>], [<td>pig c</td>, <td>brick</td>]]
Use the text property to get the inner text of the element:
    row.append(td[0].text)
    row.append(td[1].text)
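Putting it together, here is a minimal end-to-end sketch. It assumes Beautiful Soup 4 (the bs4 package) is installed and that the table HTML is available as a string; the helper name table_pairs is hypothetical. It skips the header row, takes the text of the first two cells in each remaining row (get_text(strip=True) also trims surrounding whitespace), and serializes the result with json.dumps:

```python
import json

from bs4 import BeautifulSoup

# Table markup from the question.
HTML = """
<table border="1" style="width: 100%">
  <caption></caption>
  <col>
  <col>
  <tbody>
    <tr><td>pig</td><td>house type</td></tr>
    <tr><td>pig a</td><td>straw</td></tr>
    <tr><td>pig b</td><td>stick</td></tr>
    <tr><td>pig c</td><td>brick</td></tr>
  </tbody>
</table>
"""


def table_pairs(soup):
    """Hypothetical helper: return [[col1, col2], ...] for every body row."""
    table = soup.find("table")
    pairs = []
    # [1:] skips the header row ("pig", "house type").
    for tr in table.find_all("tr")[1:]:
        tds = tr.find_all("td")
        # get_text() returns the cell's inner text instead of the Tag itself.
        pairs.append([tds[0].get_text(strip=True), tds[1].get_text(strip=True)])
    return pairs


soup = BeautifulSoup(HTML, "html.parser")
result = json.dumps(table_pairs(soup))
print(result)
```

Slicing with [1:] rather than [1:4] keeps the function working if more rows are added to the table later.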