How to Grab the HTML source code of a website URL index page?

**JavaPF** · December 1st, 2008, 10:29 AM

This code will grab the HTML source from a given URL.

Change "website here.com" to a real URL starting with http:// and the program will display the index page's source code in the console.

The nice thing about this code is that it spoofs the connection so the request looks like it comes from a web browser: it sends a browser User-Agent request header.
This lets you fetch pages from sites like Google that normally block requests from applications that don't identify themselves as a web browser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class GrabHTML {

    public static void connect(String address) throws IOException {
        // Set URL
        URL url = new URL(address);
        URLConnection spoof = url.openConnection();

        // Spoof the connection so we look like a web browser
        spoof.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

        // try-with-resources closes the stream even if reading fails
        // (assumes the page is UTF-8 encoded)
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(spoof.getInputStream(), StandardCharsets.UTF_8))) {
            String strLine;
            // Loop through every line in the source and print it to the console
            while ((strLine = in.readLine()) != null) {
                System.out.println(strLine);
            }
        }

        System.out.println("End of page.");
    }

    public static void main(String[] args) {
        try {
            // Calling the connect method (replace with a real URL)
            connect("http://website here.com");
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}
```
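If you want to filter the page (prices, links and so on) rather than just print it, it is handier to get the source back as a String. A minimal sketch, assuming the page is UTF-8 encoded; the class name GrabHTMLString and the methods fetch and readAll are hypothetical names for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class GrabHTMLString {

    // Read a whole stream into one String, joining lines with '\n'
    // (assumes UTF-8; BufferedReader.lines() splits on \n, \r and \r\n)
    public static String readAll(InputStream in) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().collect(Collectors.joining("\n"));
    }

    // Fetch a page with the same browser-like User-Agent as the code above
    public static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");
        // try-with-resources closes the connection's stream when we are done
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch("http://website here.com"); // replace with a real URL
        System.out.println(html.length() + " characters downloaded");
    }
}
```

Splitting the reading into readAll keeps the parsing side testable without a network connection: you can feed it any InputStream.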

**sohum** · January 7th, 2010, 01:02 PM

I have found an interesting use for this code, but my problem is that I need to be logged into the site first. Is there any way to pass the cookie information when opening the connection?

I can probably do it with some JavaScript and a web page, but would rather have it all in one app.

Thanks for the source,
Jason

**copeg** · January 7th, 2010, 01:36 PM

Use the URLConnection methods getHeaderField (to read the cookie the server sends back) and setRequestProperty (to send the cookie with your next request). It's a bit more involved than just that, so see the following link for a more detailed description: Handling Cookies Using the java.net API
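The approach copeg describes can be sketched roughly as follows. This is a minimal sketch, not the linked article's code: it assumes the server sets the session cookie in response to a plain request for the first page, it ignores cookie attributes such as Path and Expires, and the URLs, the class name CookieCopy and the methods nameValue and cookiesFrom are all hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

public class CookieCopy {

    // Keep only the "name=value" part of a Set-Cookie value,
    // dropping attributes such as Path, Expires or HttpOnly
    public static String nameValue(String setCookie) {
        int semi = setCookie.indexOf(';');
        return (semi >= 0 ? setCookie.substring(0, semi) : setCookie).trim();
    }

    // Collect every Set-Cookie header of a response into one Cookie header value
    public static String cookiesFrom(URLConnection conn) {
        List<String> pairs = new ArrayList<>();
        String key;
        // For HTTP, index 0 is the status line (null key), so start at 1
        for (int i = 1; (key = conn.getHeaderFieldKey(i)) != null; i++) {
            if (key.equalsIgnoreCase("Set-Cookie")) {
                pairs.add(nameValue(conn.getHeaderField(i)));
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URLs: the page that sets the session cookie, then a protected page
        URLConnection first = new URL("http://example.com/login").openConnection();
        String cookies = cookiesFrom(first);

        URLConnection second = new URL("http://example.com/members").openConnection();
        second.setRequestProperty("Cookie", cookies); // send the cookies back
        System.out.println("Sending Cookie: " + cookies);
        second.getInputStream().close(); // actually make the request
    }
}
```

If you'd rather not manage the header by hand, java.net.CookieManager (Java 6 and later) can do it: after CookieHandler.setDefault(new CookieManager()), URLConnection stores cookies from responses and sends them back on later requests to the same site.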

**sohum** · January 7th, 2010, 02:49 PM

Thanks for the help.

**Duff** · February 3rd, 2010, 12:37 PM

This is really good, and I think it might help me with the application I want to build.

How could I use this to pull only some information out of a site? For example, to filter out everything except the prices of items?
Thanks
Duff

**Json** · February 4th, 2010, 03:25 AM

You would have to grab the whole page and then parse the source code somehow. Maybe a simple regex will work for you.
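As a rough illustration of the regex suggestion: this sketch assumes prices appear in the HTML as a dollar sign followed by digits, with optional thousands commas and an optional two-digit fraction (for example $1,299.99). PriceFilter and findPrices are hypothetical names, and the HTML string stands in for a page you have already downloaded. Regexes over HTML are brittle, so expect to adjust the pattern for the site you are scraping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceFilter {

    // Assumed price format: "$", digits, optional ",ddd" groups, optional ".dd"
    private static final Pattern PRICE =
            Pattern.compile("\\$\\d+(?:,\\d{3})*(?:\\.\\d{2})?");

    // Return every substring of the page that looks like a price, in page order
    public static List<String> findPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(html);
        while (m.find()) {
            prices.add(m.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        // Stand-in for downloaded page source
        String html = "<li>Book <span class=\"price\">$12.99</span></li>"
                + "<li>Laptop <span class=\"price\">$1,299.00</span></li>";
        System.out.println(findPrices(html)); // prints [$12.99, $1,299.00]
    }
}
```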

// Json

**Bryan** · April 22nd, 2010, 02:46 PM

Very nice JavaPF, very nice!

Together with copeg's link and 'login by HTTP POST', this is perfect for me.
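For the 'login by HTTP POST' part, a minimal sketch with HttpURLConnection might look like this. The URL and the form field names (username, password) are hypothetical; read the real ones from the site's login form. It uses the URLEncoder.encode(String, Charset) overload, which needs Java 10 or later, and PostLogin, formBody and post are illustrative names only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class PostLogin {

    // Encode form fields as application/x-www-form-urlencoded, e.g. "user=bob&pass=p%40ss"
    public static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Send the form with POST and return the connection so the response can be inspected
    public static HttpURLConnection post(String address, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we are going to write a request body
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical field names and URL
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "bob");
        fields.put("password", "p@ss word");
        String body = formBody(fields);
        System.out.println(body); // prints username=bob&password=p%40ss+word

        HttpURLConnection conn = post("http://example.com/login", body);
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```

The returned connection still holds the response headers, so any Set-Cookie values the server sends after a successful login can be read from it and sent with later requests, as described earlier in the thread.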
