Thread: How to read URLs from a web page

  1. #1
    Junior Member
    Join Date
    Sep 2009
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    How to read URLs from a web page

    Dear Members,

    I would like to know whether there is a way to read the URLs behind the links on a web page. To make this concrete, here is an example: the web site "http://in.yahoo.com" has many links, such as Finance, Games, Life Style, News and so on.

    If you hover the mouse pointer over any of those links, the URL associated with it appears in the status bar at the bottom of the web browser. For instance, hovering over the link "Games" shows the URL "http://in.yahoo.com/r/ygms" in the status bar of the IE browser.

    Similarly, hovering over the link "Life Style" shows the URL "http://in.yahoo.com/lfs" in the status bar. The page contains many such links.

    I want to write a Java program (something like public class GrabURLs) that takes the URL of any web page (not necessarily "http://in.yahoo.com") in its constructor. Given that URL, the program should check whether the page contains any links and, if so, return all of them as a String array or a Vector.

    For example, I would write code something like:
    GrabURLs webPage = new GrabURLs("http://in.yahoo.com");
    String[] links = webPage.getLinks();

    The links array should then contain elements such as links[0] = "http://in.yahoo.com/r/ygms", links[1] = "http://in.yahoo.com/lfs" and so on.

    What source code should I write for the method getLinks() of the class GrabURLs?

    I would be delighted if someone could offer a solution. It need not be a full Java program; at the least, I would like to know which classes and methods are involved in this challenging task.

    With best regards,

    Abitha.


  2. #2
    Senile Half-Wit Freaky Chris
    Join Date
    Mar 2009
    Location
    Wales, Bangor & England, Warwickshire
    Posts
    820
    My Mood
    Cynical
    Thanks
    7
    Thanked 104 Times in 90 Posts

    Re: How to read URLs from a web page

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.ArrayList;

    public class GrabURLs {
    	private static final String BASE = "<base href=";
    	private static final String ANCHOR = "<a href=";

    	private URL url;

    	public static void main(String[] args) {
    		GrabURLs gu = new GrabURLs("http://in.yahoo.com");

    		for (String link : gu.getLinks()) {
    			System.out.println(link);
    		}
    	}

    	public GrabURLs(String urls) {
    		try {
    			url = new URL(urls);
    		} catch (MalformedURLException e) {
    			e.printStackTrace();
    		}
    	}

    	public ArrayList<String> getLinks() {
    		ArrayList<String> links = new ArrayList<String>();
    		if (url == null) {
    			return links; // the constructor was given a malformed URL
    		}

    		// Read the whole page into a single string.
    		StringBuilder page = new StringBuilder();
    		try (BufferedReader urlIn = new BufferedReader(new InputStreamReader(url.openStream()))) {
    			String line;
    			while ((line = urlIn.readLine()) != null) {
    				page.append(line);
    			}
    		} catch (IOException e) {
    			e.printStackTrace();
    			return links;
    		}
    		String s = page.toString();

    		// Print the page's <base href>, if it declares one.
    		int basePos = s.indexOf(BASE);
    		if (basePos != -1) {
    			String baseHREF = quotedValue(s, basePos + BASE.length());
    			if (baseHREF != null) {
    				System.out.println(baseHREF);
    			}
    		}

    		// Collect the quoted value that follows every "<a href=".
    		int pos = s.indexOf(ANCHOR);
    		while (pos != -1) {
    			String link = quotedValue(s, pos + ANCHOR.length());
    			if (link != null) {
    				links.add(link);
    			}
    			pos = s.indexOf(ANCHOR, pos + ANCHOR.length());
    		}

    		return links;
    	}

    	// Returns the text between the quote character at quotePos and its matching
    	// closing quote, or null if the value is unquoted or never terminated.
    	private static String quotedValue(String s, int quotePos) {
    		if (quotePos >= s.length()) {
    			return null;
    		}
    		char quote = s.charAt(quotePos);
    		if (quote != '"' && quote != '\'') {
    			return null;
    		}
    		int end = s.indexOf(quote, quotePos + 1);
    		return end == -1 ? null : s.substring(quotePos + 1, end);
    	}
    }

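    It's not perfect, but you get the idea: it only finds anchors written exactly as <a href=..., and it returns relative links as they appear in the page.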
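
    One way to loosen the matching is a regular expression plus java.net.URI to turn relative links into absolute ones, which is the form the question asks for. The following is only a rough sketch under some assumptions: the class name LinkExtractor and the method extractLinks are made up for illustration, the page text has already been downloaded into a String, and only http/https links are wanted. A regex is still not a real HTML parser, so unusual markup (unquoted attributes, links inside comments or scripts) will be missed or misread.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the code posted above.
final class LinkExtractor {
    // Finds a quoted href attribute anywhere inside an <a ...> tag,
    // ignoring case and allowing other attributes before it.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every http/https link in html, resolved against base.
    static List<String> extractLinks(String html, URI base) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                // Skip mailto:, javascript: and similar non-web links.
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // The href is not a syntactically valid URI; skip it.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/r/ygms\">Games</a> "
                + "<A class='nav' HREF='lfs'>Life Style</A> "
                + "<a href=\"mailto:someone@example.com\">Mail</a>";
        URI base = URI.create("http://in.yahoo.com/");
        for (String link : extractLinks(html, base)) {
            System.out.println(link);
        }
        // prints:
        // http://in.yahoo.com/r/ygms
        // http://in.yahoo.com/lfs
    }
}
```

    Keeping the download separate from the extraction means the parsing part can be tried on a literal string, as in main above, without touching the network.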

    Regards,
    Chris
    chris[at]javaprogrammingforums[dot]com

    Prifysgol Bangor University, North Wales
