Java: reading information from a website using Jsoup
I've gone through multiple posts about parsing and such. Most of the responses I saw recommended that the person use a library or something else. My problem right now is creating an algorithm that fetches the exact information I want. My purpose is to fetch 2 statuses from a weather website that lists school closings. I started using Jsoup, as recommended, but I need help with it.
webpage: click here
image: click here
example of webpage source: click here
I figured out how to get a line of text within the webpage, since I know the name of the school I'm looking for, but the status I need is 2 lines below it. It would be easy if each school had a status of either closed or two-hour delay, but I can't make a search like that. I want ideas or answers on how I can approach this. I'm going to do this 2 times because I want 2 schools. I have the names and can use them; I just need the status.
Here is an example of what I want to do (pseudocode):

    Document doc = connect(to url);
    Element schoolName1 = doc.lookForText(htmlLineHere/schoolName);
    // Supposed to get the line right after, which should be the status, and clean off the HTML
    String status1 = schoolName1.getNext().text();

This is what I have right now:
    public static SchoolClosing lookupDebug() throws IOException {
        final ArrayList<String> status = new ArrayList<String>();
        try {
            // Connects to the wanted website
            Document doc = Jsoup.connect("http://www.10tv.com/content/sections/weather/closings.html").get();
            // Selects/fetches the line of code I want
            Element schoolName = doc.html("<td valign="+"top"+">athens city schools</td>");
            // An array of strings to which I'm going to add the text I need
            final ArrayList<String> temp = new ArrayList<String>();
            // Checking if it is fetching the text
            System.out.println(schoolName.text());
            // Add the text to the array
            temp.add(schoolName.text());
            for (int i = 0; i <= 1; i++) {
                final String[] tempStatus = temp.get(i).split(" ");
                status.add(tempStatus[0]);
            }
        } catch (final IOException e) {
            throw new IOException("There was a problem loading the school closing status");
        }
        return new SchoolClosing(status);
    }
    Document doc = Jsoup.connect(
            "http://www.10tv.com/content/sections/weather/closings.html")
            .get();
    for (Element tr : doc.select("#closings tr")) {
        Element tds = tr.select("td").first();
        if (tds != null) {
            String county = tr.select("td:eq(0)").text();
            String schoolName = tr.select("td:eq(1)").text();
            String status = tr.select("td:eq(2)").text();
            System.out.println(String.format(
                    "county: %s, schoolname: %s, status: %s",
                    county, schoolName, status));
        }
    }
Output:

    county: athens, schoolname: beacon school, status: two-hour delay
    county: franklin, schoolname: city of grandview heights, status: snow emergency through 8pm thursday
    county: franklin, schoolname: electrical trades center, status: evening activities cancelled
    county: franklin, schoolname: hilock fellowship church, status: pm services cancelled
    county: franklin, schoolname: international christian center, status: evening activities cancelled
    county: franklin, schoolname: maranatha baptist church, status: pm services cancelled
    county: franklin, schoolname: masters commission new covenant church, status: bible study cancelled
    county: franklin, schoolname: new life christian fellowship, status: activities cancelled
    county: franklin, schoolname: epilepsy foundation of central ohio, status: evening activities cancelled
    county: franklin, schoolname: washington ave united methodist church, status: evening activities cancelled
Or in a loop:

    for (Element tr : doc.select("#closings tr")) {
        System.out.println("----------------------");
        for (Element td : tr.select("td")) {
            System.out.println(td.text());
        }
    }

That gives:

    ----------------------
    athens
    beacon school
    two-hour delay
    ----------------------
    franklin
    city of grandview heights
    snow emergency through 8pm thursday
    ----------------------
    franklin
    electrical trades center
    evening activities cancelled
    ...
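Since the question only needs the status of two particular schools, the rows can be filtered by the school-name column once they have been read. The following is a minimal sketch of that step, not part of the original answer. It assumes each row has already been collected as a `String[]` of `{county, schoolName, status}`; the class name `ClosingLookup` and the method `findStatuses` are hypothetical, and the sample rows are copied from the output above. It uses only the standard library, so the Jsoup fetch is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClosingLookup {

    /**
     * Returns the status of each wanted school, keyed by the name as requested.
     * Each row is assumed to hold {county, schoolName, status}, i.e. the three
     * td cells of one table row. Schools that are not listed are left out.
     */
    public static Map<String, String> findStatuses(List<String[]> rows,
                                                   List<String> wantedSchools) {
        Map<String, String> statuses = new LinkedHashMap<>();
        for (String wanted : wantedSchools) {
            for (String[] row : rows) {
                // Skip header or malformed rows that lack the three cells.
                if (row.length < 3) {
                    continue;
                }
                // Match on the name column, ignoring case and outer whitespace.
                if (row[1].trim().equalsIgnoreCase(wanted.trim())) {
                    statuses.put(wanted, row[2].trim());
                    break; // first match wins
                }
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        // Sample rows taken from the output above; a real run would add one
        // array per table row inside the Jsoup loop instead.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"athens", "beacon school", "two-hour delay"});
        rows.add(new String[] {"franklin", "electrical trades center",
                "evening activities cancelled"});

        Map<String, String> result = findStatuses(
                rows, Arrays.asList("Beacon School", "Athens City Schools"));
        System.out.println(result); // prints {Beacon School=two-hour delay}
    }
}
```

A school that has no closing posted simply does not appear in the returned map, so the caller can treat a missing key as "no status listed" rather than relying on a fixed row position.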