I am trying to build a fantasy football web site, but I am not having much luck finding a way to extract football stats from another web site. I can pull in the page and get to the point where the QB stats start. My question: is there a way to strip all the HTML tags and be left with just the players' names and numbers (stats)? I skip down 366 lines to get to the example below. I was thinking of tokenizing with > as the delimiter, then tokenizing again with < as the delimiter to strip away the rest of the tag. Any suggestions are much appreciated, thanks.
CODE
---------------------------------------------------------------------
import java.net.*;
import java.io.*;

public class WebRipper
{
    public static void main(String[] argv)
    {
        int count = 0;
        try {
            // Create a URL for the desired page
            URL url = new URL("http://www.fftoday.com/stats/playerstats.php?Season=2006&GameWeek=1&PosID=10&LeagueID=1");
            // Read all the text returned by the server; try-with-resources
            // closes the reader even if reading fails part-way through
            try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String str;
                while ((str = in.readLine()) != null)
                {
                    // Skip the page header; the QB stats start after this point
                    if (count > 366)
                    {
                        System.out.println(str);
                    }// end if
                    count++;
                }// end while
            }
        }// end try
        catch (MalformedURLException e) {
            // Report the problem instead of silently swallowing it
            System.err.println("Bad URL: " + e.getMessage());
        }
        catch (IOException e) {
            System.err.println("Could not read page: " + e.getMessage());
        }
    }// end main
}// end WebRipper
---------------------------------------------------------------------
OUTPUT (this is just an example of the first few lines of output)
<TD CLASS="sort1" ALIGN="LEFT" BGCOLOR="#ffffff"> 1. <A HREF="playerprofile.php?PlayerID=1607&LeagueID=1">Donovan McNabb</A></TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">PHI</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">1</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">1</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">24</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">35</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">314</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">3</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">1</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">4</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">7</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">0</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#e0e0e0">28.4</TD>
<TD CLASS="sort1" ALIGN="center" BGCOLOR="#ffffff">28.4</TD>
</TR>
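One possible way to do the tag stripping described above, as a minimal sketch rather than a definitive solution: instead of tokenizing twice on > and <, a single regular expression can remove everything between < and >. The class name TagStripper and the helper names stripTags and extractCells are made up for illustration. It assumes each tag opens and closes on the same line (as in the output above) and that no attribute value contains a > character; it does not decode entities such as &amp;.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names here are illustrative only.
class TagStripper {

    // Remove anything of the form <...> and trim surrounding whitespace.
    static String stripTags(String line) {
        return line.replaceAll("<[^>]*>", "").trim();
    }

    // Strip every line and keep only those that still contain text,
    // so tag-only lines such as </TR> are dropped.
    static List<String> extractCells(List<String> lines) {
        List<String> cells = new ArrayList<>();
        for (String line : lines) {
            String text = stripTags(line);
            if (!text.isEmpty()) {
                cells.add(text);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Sample lines copied from the output shown above.
        List<String> sample = Arrays.asList(
            "<TD CLASS=\"sort1\" ALIGN=\"LEFT\" BGCOLOR=\"#ffffff\"> 1. <A HREF=\"playerprofile.php?PlayerID=1607&LeagueID=1\">Donovan McNabb</A></TD>",
            "<TD CLASS=\"sort1\" ALIGN=\"center\" BGCOLOR=\"#ffffff\">PHI</TD>",
            "<TD CLASS=\"sort1\" ALIGN=\"center\" BGCOLOR=\"#e0e0e0\">28.4</TD>",
            "</TR>");
        for (String cell : extractCells(sample)) {
            System.out.println(cell);
        }
    }
}
```

In WebRipper, stripTags(str) could be called in place of printing the raw line. A regular expression like this is fragile against arbitrary HTML, so a real HTML parser may be a safer choice if the page layout changes.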