c# - Null reference exception when trying to get links by class in HtmlAgilityPack
I have an ASP.NET MVC application that parses an HTML page using HtmlAgilityPack. When I try to loop over the elements, I get the following error in the foreach: "Object reference not set to an instance of an object". The code is below. Does anyone know what my mistake is? I'm new to HtmlAgilityPack.
Part of the HTML:
<li class="b-serp-item i-bem" onclick="return {"b-serp-item":{}}">
  <i class="b-serp-item__favicon" style="background-position: 0 -0px"></i>
  <h2 class="b-serp-item__title">
    <b class="b-serp-item__number">1</b>
    <a class="b-serp-item__title-link" href="http://googlescraping.com/google-scraper.php">google</a>
  </h2>
</li>
Code:
DateTime dt = DateTime.Now;
string dtf = String.Format("{0:u}", dt);
string wp = "page" + dtf + ".html";
HtmlDocument hd = new HtmlDocument();
hd.Load(wp);
string output = "";
foreach (HtmlNode node in hd.DocumentNode.SelectNodes("//a[@class='b-serp-item__title-link']"))
{
    output += node.GetAttributeValue("href", null) + " ";
}
The HTML output is shared on Google Drive: https://drive.google.com/file/d/0b3-m-r5ce0gostlzugltt1vbb00/edit?usp=sharing
I ran your code with one slight change: I used HtmlDocument.LoadHtml(stringContents) instead of HtmlDocument.Load(path), and it works flawlessly.
I suspect your code is unable to find the file at that path. Ensure the file exists using File.Exists(wp), and consider using a fully qualified path instead of just the file name, via wp = Path.GetFullPath(wp).
Or read the contents first using string contents = File.ReadAllText(wp); then take those contents and pass them to the LoadHtml method on HtmlDocument.
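Putting these suggestions together, a minimal sketch might look like the following. The class name LinkExtractor and the method name ExtractLinks are hypothetical, chosen only for illustration. The null check assumes that SelectNodes returns null rather than an empty collection when nothing matches, which is how HtmlAgilityPack has behaved in the versions I am aware of and would explain the exception in the foreach.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;

static class LinkExtractor // hypothetical name, for illustration only
{
    // Returns the href values of the matching links, separated by spaces.
    public static string ExtractLinks(string path)
    {
        // Resolve to a fully qualified path and fail early if the file is missing.
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("HTML file not found", fullPath);
        }

        // Read the contents first, then parse them with LoadHtml.
        var doc = new HtmlDocument();
        doc.LoadHtml(File.ReadAllText(fullPath));

        // Assumption: SelectNodes yields null when no node matches the XPath,
        // so guard before iterating.
        var nodes = doc.DocumentNode.SelectNodes("//a[@class='b-serp-item__title-link']");
        if (nodes == null)
        {
            return string.Empty;
        }

        var output = new StringBuilder();
        foreach (HtmlNode node in nodes)
        {
            output.Append(node.GetAttributeValue("href", string.Empty)).Append(' ');
        }
        return output.ToString();
    }
}
```

Calling LinkExtractor.ExtractLinks(wp) in place of the original loop should either return the collected links or throw a clear FileNotFoundException, which makes it easier to tell a missing file apart from an XPath that matches nothing.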