Thread: How to extract text from web

  1. #1

    How to extract text from web

    Hi all,
    I wanted to know what the best way is to extract text from a website and then load it into a database.
    At present I am using a web crawler to get the text from the web. Is there any other way of doing this?


  2. #2

    Re: How to extract text from web

    Originally posted by HelloAll:
    "Hi all, I wanted to know what's the best way to extract text from a website?"

    You can see an example here

  3. #3

    Re: How to extract text from web

    Thanks for the example.
    But I need to crawl text not just from a given URL, but from the URLs within a given website.

  4. #4

    Re: How to extract text from web

    Well, once you can get the text from one URL, you can parse that text, search it for further links, and then open a URL connection to fetch the contents of each link you find. You will have to do some recursive work here, and keep track of the URLs you have already visited so you don't loop forever.
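
    The approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the names LinkCrawler, extractLinks and crawl are made up for this example, links are found with a simple regular expression rather than a real HTML parser, and a queue replaces the recursion so that deep sites cannot overflow the stack. The page fetcher is passed in as a function, so the sketch can be tried against in-memory pages without touching the network.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class LinkCrawler {

    // Rough pattern for href="..." or href='...'; messy HTML may need a real parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the absolute links found in html that stay under siteRoot.
    static List<String> extractLinks(String html, String pageUrl, String siteRoot) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String target = m.group(1).trim();
            int hash = target.indexOf('#');
            if (hash >= 0) {
                target = target.substring(0, hash); // drop #fragment
            }
            if (target.isEmpty()) {
                continue;
            }
            try {
                // Resolve relative links against the page they were found on.
                String absolute = URI.create(pageUrl).resolve(target).toString();
                if (absolute.startsWith(siteRoot)) {
                    links.add(absolute);
                }
            } catch (IllegalArgumentException e) {
                // Skip links that are not valid URIs.
            }
        }
        return links;
    }

    // Visits pages breadth-first, starting at startUrl, until maxPages have been seen.
    // fetcher maps a URL to its HTML source, or null if the page cannot be read.
    static Set<String> crawl(String startUrl, String siteRoot, int maxPages,
                             Function<String, String> fetcher) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            for (String link : extractLinks(html, url, siteRoot)) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website; example.test is a made-up host.
        Map<String, String> pages = new HashMap<>();
        pages.put("http://example.test/", "<a href=\"a.html\">A</a> <a href='/b.html'>B</a>");
        pages.put("http://example.test/a.html", "<a href=\"/\">Home</a>");
        pages.put("http://example.test/b.html", "No links here");
        System.out.println(crawl("http://example.test/", "http://example.test/", 10, pages::get));
    }
}
```

    For a real site, the fetcher argument would be a function that opens a URL connection and returns the page source as a single String. The maxPages limit and the siteRoot prefix check are there to keep the crawl from wandering off the site or running indefinitely.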

  5. #5
    JavaPF

    Re: How to extract text from web

    How about this from the code snippets forum:

    http://www.javaprogrammingforums.com...bsite-url.html

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class GrabHTML {

     public static void connect() throws Exception {

      // Set URL (placeholder: replace with the page you want to fetch)
      URL url = new URL("http://website here.com");
      URLConnection spoof = url.openConnection();

      // Spoof the connection so we look like a web browser
      spoof.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

      // try-with-resources closes the reader even if reading fails
      try (BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()))) {
       String strLine;

       // Loop through every line in the source
       while ((strLine = in.readLine()) != null) {

        // Print each line to the console
        System.out.println(strLine);
       }
      }

      System.out.println("End of page.");
     }

     public static void main(String[] args) {

      try {
       // Call the connect method
       connect();
      } catch (Exception e) {
       // Report the failure instead of silently swallowing it
       e.printStackTrace();
      }
     }
    }

    This grabs the HTML source of a web page. You can then process it however you wish.
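
    As one possible way to process the source, here is a rough sketch that reduces HTML to plain text. It assumes the whole page has already been collected into a single String, and the class name HtmlText and method toPlainText are invented for this example. It relies on regular expressions and decodes only a handful of common entities, so treat it as a starting point and not as a robust HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class HtmlText {

    // Whole <script>...</script> and <style>...</style> blocks, contents included.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");

    // Any remaining tag, including ones that span several lines.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Strips markup from html and returns the visible text on a single line.
    static String toPlainText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");

        // Decode a few common entities; &amp; goes last so "&amp;lt;" is not decoded twice.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");

        // Collapse runs of whitespace left behind by the removed tags.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>Java &amp; the web</p></body></html>";
        System.out.println(toPlainText(html));
    }
}
```

    The resulting text is what you would then store in a database column, for example through a JDBC PreparedStatement.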