Word Frequency in News Articles
Hello everyone. I was wondering if anyone could point me in the right direction. I am trying to write a Java method that takes two arguments: a string, 'searchText', and an array of strings, 'keyWords'.
It will then open a URL connection to Google News or some other news website and perform a search using the given 'searchText' string. Then it will open the first 50 news articles returned and count the number of times each of the 'keyWords' strings shows up in each article.
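For running the search and downloading the results, one option that avoids scraping a search results page is to query a news site's RSS search feed, which returns plain XML with one <item> element per article. The sketch below assumes Google News offers such a feed at https://news.google.com/rss/search?q=...; that URL, the class name NewsSearch and its helper methods are illustrative assumptions, so check the actual feed format and the site's terms of use before relying on them. Depending on the feed, the item links may point to redirect pages rather than directly to the publisher's article. It uses only the JDK (java.net.http needs Java 11 or later).

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NewsSearch {
    // Assumption: the news site exposes an RSS search feed at this address.
    private static final String RSS_SEARCH_BASE = "https://news.google.com/rss/search?q=";

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Builds the search URL, percent-encoding the search text. */
    public static String buildSearchUrl(String searchText) {
        return RSS_SEARCH_BASE + URLEncoder.encode(searchText, StandardCharsets.UTF_8);
    }

    /** Returns the <link> text of up to max <item> elements in an RSS document. */
    public static List<String> extractLinks(String rss, int max) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so a hostile feed cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rss)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < items.getLength() && links.size() < max; i++) {
                // Only <link> elements inside an <item>; the channel's own <link> is skipped.
                NodeList link = ((Element) items.item(i)).getElementsByTagName("link");
                if (link.getLength() > 0) {
                    links.add(link.item(0).getTextContent().trim());
                }
            }
            return links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }

    /** Downloads a page and returns its body as a string. */
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String feed = fetch(buildSearchUrl(args.length > 0 ? args[0] : "election"));
        for (String link : extractLinks(feed, 50)) {
            String html = fetch(link);
            System.out.println(link + " (" + html.length() + " characters)");
        }
    }
}
```

Parsing the feed with the JDK's DOM parser rather than string matching keeps the link extraction from breaking on whitespace or attribute changes in the XML.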
I am fairly experienced with Java programming for local applications, but I have never tried to do anything extensive with web access.
Basically, I feel capable of opening a URL connection to the website, but I'm not sure how to:
1. Execute the search
2. Open/download the contents of each returned web page
3. Isolate the main body of the article from the banners, sidebars, comments, etc.
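For isolating the article body, there is no reliable general rule, but a common crude heuristic is to keep only the text inside <p> elements and drop short ones, since navigation, banners and sidebars tend to be lists, links and short snippets. The sketch below does this with JDK regular expressions; the class name ArticleText, the minChars threshold and the short entity list are illustrative choices, not a standard. Regex-based HTML handling breaks on unusual markup, so an HTML parser such as jsoup or a boilerplate-removal library such as boilerpipe is likely to do better on real sites.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArticleText {
    // <script>, <style> and <noscript> blocks never contain article prose.
    private static final Pattern NON_CONTENT = Pattern.compile(
            "<(script|style|noscript)\\b[^>]*>.*?</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Captures the inner HTML of each <p> element (but not <pre> or <param>).
    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p\\b[^>]*>(.*?)</p>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /**
     * Keeps the text of <p> elements that are at least minChars long,
     * treating shorter ones as likely captions, menus or other boilerplate.
     */
    public static String extractBody(String html, int minChars) {
        String cleaned = NON_CONTENT.matcher(html).replaceAll(" ");
        List<String> paragraphs = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(cleaned);
        while (m.find()) {
            String text = decodeEntities(TAG.matcher(m.group(1)).replaceAll(" "))
                    .replaceAll("\\s+", " ")
                    .trim();
            if (text.length() >= minChars) {
                paragraphs.add(text);
            }
        }
        return String.join("\n", paragraphs);
    }

    /** Decodes only the handful of entities that commonly appear in prose. */
    private static String decodeEntities(String s) {
        return s.replace("&nbsp;", " ")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"); // last, so "&amp;lt;" is not decoded twice
    }
}
```

Calling extractBody on a downloaded page returns the surviving paragraphs joined by newlines; the minChars threshold will probably need tuning per site.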
Once I have the raw content from the articles, it should be a breeze.
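Once the text is available, the counting can be as simple as one case-insensitive regular expression per keyword. This sketch (the class and method names are hypothetical) counts whole-word matches, so that "war" is not counted inside "award"; whether that is the right choice depends on whether plurals and other word forms should count too.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordCounter {
    /** Counts whole-word, case-insensitive occurrences of each keyword in text. */
    public static Map<String, Integer> countKeywords(String text, String[] keyWords) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // preserves keyword order
        for (String keyWord : keyWords) {
            // Pattern.quote treats the keyword literally (so "c++" works); the lookarounds
            // require that it is not preceded or followed by a letter, digit or underscore.
            Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(keyWord) + "(?!\\w)",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
            Matcher m = p.matcher(text);
            int n = 0;
            while (m.find()) {
                n++;
            }
            counts.put(keyWord, n);
        }
        return counts;
    }

    public static void main(String[] args) {
        String article = "The war ended. An award was given; War memorials opened.";
        System.out.println(countKeywords(article, new String[] {"war", "award", "peace"}));
    }
}
```

Running main prints {war=2, award=1, peace=0}. Calling countKeywords once per article gives the per-article counts.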
Can anyone point me in the right direction or give me some pointers? I would really appreciate it.
Re: Word Frequency in News Articles