parsing - Java reading information from a website using Jsoup


I've gone through multiple posts on parsing and such. Most of the responses I saw recommended that the person use a library or something else. My problem right now is creating an algorithm that will fetch the exact information I want. My purpose is to fetch two statuses from the weather website's school closings page. I started using Jsoup, as it was recommended, but I need help with it.

Webpage: http://www.10tv.com/content/sections/weather/closings.html

I figure I can get a line of text within the webpage, since I know the name of the school I'm looking for, and two lines down is the status I need. It would be easy if each school had a status of either closed or two-hour delay, but I can't just search for that. I want ideas or answers on how I can approach this. I'm going to do this two times because I want two schools. I already have the names and can use them; I just need the status.

Here is an example of what I want to do (pseudocode):

    Document doc = connect(to url);
    Element schoolName1 = doc.lookForText(htmlLineHere/schoolName);
    // Supposed to get the line right after, which should be the status, and clean off the HTML.
    String status1 = schoolName1.getNext().text();
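For reference, that pseudocode maps fairly directly onto Jsoup's selector API. The following is only a minimal sketch, assuming the school name sits in its own `<td>` and the status is the very next `<td>` in the same row. The class name `NextCellLookup`, the helper `statusAfter`, and the sample markup are made up for illustration and are not taken from the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NextCellLookup {
    // Hypothetical sample markup; the real page's structure is an assumption.
    static final String SAMPLE_HTML =
        "<table>"
        + "<tr><td valign=\"top\">Athens</td>"
        + "<td valign=\"top\">Athens City Schools</td>"
        + "<td valign=\"top\">Closed</td></tr>"
        + "</table>";

    /** Returns the text of the cell right after the one holding schoolName, or null if not found. */
    static String statusAfter(String html, String schoolName) {
        Document doc = Jsoup.parse(html);
        // :containsOwn matches elements whose own text contains the given string.
        Element nameCell = doc.select("td:containsOwn(" + schoolName + ")").first();
        if (nameCell == null) {
            return null; // school is not on the page
        }
        // The status is assumed to be the next sibling cell in the same row.
        Element statusCell = nameCell.nextElementSibling();
        return statusCell == null ? null : statusCell.text();
    }

    public static void main(String[] args) {
        System.out.println(statusAfter(SAMPLE_HTML, "Athens City Schools"));
    }
}
```

Note that `:containsOwn` is a substring match, so a name that is contained in another school's name could hit the wrong row, and names containing parentheses would need escaping before being placed in the selector.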

This is what I have right now:

    public static SchoolClosing lookupDebug() throws IOException {
        final ArrayList<String> status = new ArrayList<String>();

        try {
            // Connects to the wanted website
            Document doc = Jsoup.connect("http://www.10tv.com/content/sections/weather/closings.html").get();
            // Selects/fetches the line of code I want
            Element schoolName = doc.html("<td valign=" + "top" + ">athens city schools</td>");
            // An array of strings that I'm going to add the text I need to
            final ArrayList<String> temp = new ArrayList<String>();
            // Checking if it is fetching the text
            System.out.println(schoolName.text());
            // Add the text to the array
            temp.add(schoolName.text());
            for (int i = 0; i <= 1; i++) {
                final String[] tempStatus = temp.get(i).split(" ");
                status.add(tempStatus[0]);
            }
        } catch (final IOException e) {
            throw new IOException("There was a problem loading the school closing status");
        }
        return new SchoolClosing(status);
    }

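You can select the rows of the closings table with a CSS selector and read each cell by its position in the row:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    Document doc = Jsoup.connect(
            "http://www.10tv.com/content/sections/weather/closings.html")
            .get();
    for (Element tr : doc.select("#closings tr")) {
        // Skip rows that have no <td> cells (for example, a header row)
        Element tds = tr.select("td").first();
        if (tds != null) {
            String county = tr.select("td:eq(0)").text();
            String schoolName = tr.select("td:eq(1)").text();
            String status = tr.select("td:eq(2)").text();
            System.out.println(String.format(
                    "county: %s, schoolname: %s, status: %s", county,
                    schoolName, status));
        }
    }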

Output:

    county: athens, schoolname: beacon school, status: two-hour delay
    county: franklin, schoolname: city of grandview heights, status: snow emergency through 8pm thursday
    county: franklin, schoolname: electrical trades center, status: evening activities cancelled
    county: franklin, schoolname: hilock fellowship church, status: pm services cancelled
    county: franklin, schoolname: international christian center, status: evening activities cancelled
    county: franklin, schoolname: maranatha baptist church, status: pm services cancelled
    county: franklin, schoolname: masters commission new covenant church, status: bible study cancelled
    county: franklin, schoolname: new life christian fellowship, status: activities cancelled
    county: franklin, schoolname: epilepsy foundation of central ohio, status: evening activities cancelled
    county: franklin, schoolname: washington ave united methodist church, status: evening activities cancelled

Or, in a loop over every cell of each row:

    for (Element tr : doc.select("#closings tr")) {
        System.out.println("----------------------");
        for (Element td : tr.select("td")) {
            System.out.println(td.text());
        }
    }

That gives:

    ----------------------
    athens
    beacon school
    two-hour delay
    ----------------------
    franklin
    city of grandview heights
    snow emergency through 8pm thursday
    ----------------------
    franklin
    electrical trades center
    evening activities cancelled
    ...
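Since the question only needs the status of two particular schools, the same row loop can be narrowed to a lookup by name. This is a rough sketch rather than a definitive implementation: the class `ClosingLookup`, the method `statusesFor`, and the sample HTML are hypothetical, and the three-column layout (county, name, status) is assumed from the output shown above. It parses an inline fragment so it runs without network access; against the live page you would build the `Document` with `Jsoup.connect(url).get()` instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ClosingLookup {
    // Made-up fragment shaped like the rows printed above (an assumption, not the real page).
    static final String SAMPLE_HTML =
        "<table id=\"closings\">"
        + "<tr><th>County</th><th>Name</th><th>Status</th></tr>"
        + "<tr><td>athens</td><td>beacon school</td><td>two-hour delay</td></tr>"
        + "<tr><td>franklin</td><td>electrical trades center</td>"
        + "<td>evening activities cancelled</td></tr>"
        + "</table>";

    /** Maps each wanted school that appears in the table to its status text. */
    static Map<String, String> statusesFor(Document doc, List<String> wanted) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element tr : doc.select("#closings tr")) {
            Elements tds = tr.select("td");
            if (tds.size() < 3) {
                continue; // header row or a row without the expected three cells
            }
            String name = tds.get(1).text();
            for (String school : wanted) {
                // Exact match on the whole name, ignoring case
                if (school.equalsIgnoreCase(name)) {
                    result.put(school, tds.get(2).text());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        List<String> wanted = Arrays.asList("Beacon School", "Athens City Schools");
        Map<String, String> found = statusesFor(doc, wanted);
        for (String school : wanted) {
            System.out.println(school + ": " + found.getOrDefault(school, "not listed"));
        }
    }
}
```

A school that is missing from the returned map is simply not on the page, which presumably means it has no closing or delay posted, so the caller should handle that case rather than expect every name to be present.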
