
Thread: HTML Parsing

  1. #1
    Junior Member
    Join Date
    Aug 2010
    Location
    UK
    Posts
    19
    Thanks
    3
    Thanked 0 Times in 0 Posts

    HTML Parsing

    Hi folks, I'm trying to parse some HTML using the third-party Jsoup library, but I get a nasty exception every time I run the program, for reasons unknown to me.


    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
     
     
     
     
     
     
    public class FileHtmlParser{
     
    	public String input;
     
     
    	//constructor
    	public FileHtmlParser(String inputFile){input = inputFile;}
     
     
    	//methods
    	public FileHtmlParser execute(){
     
    		File file = new File(input);
                    System.out.println("The file can be read: " + file.canRead());
     
     
    		Document doc = Jsoup.parse(input, "UTF-8");
     
    		Element content = doc.getElementById("navbar");
    		if(content.hasText()){System.out.println("result is " + content.text());}
    		else System.out.println("nothing!");
     
    		return this;
    	}
     
    }/*endOfClass*/

    public class Prog {
     
     
     
     
    	public static void main(String args []){
     
     
    		FileHtmlParser fhp = new FileHtmlParser("src/resulthtml.html");
    		fhp.execute();
     
     
     
    	}
     
    }

    The HTML file that I'm working with is just a random IMDB page which I downloaded (Airheads (1994) - IMDb) and the message from the console is the following:

    The file can be read: true
    Exception in thread "main" java.lang.NullPointerException
    at FileHtmlParser.execute(FileHtmlParser.java:31)
    at Prog.main(Prog.java:11)



    If anyone can shed some light on this, I would be most grateful.
    Last edited by Bacon n' Logic; March 6th, 2012 at 05:53 PM.


  2. #2
    pbrockway2, Super Moderator
    Join Date
    Jan 2012
    Posts
    987
    Thanks
    6
    Thanked 206 Times in 182 Posts

    Re: HTML Parsing

    Which is line 31 of FileHtmlParser.java?

  3. #3
    Junior Member
    Join Date
    Aug 2010
    Location
    UK
    Posts
    19
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: HTML Parsing

    That's 31:

    if(content.hasText()){System.out.println("result is " + content.text());

  4. #4
    pbrockway2, Super Moderator
    Join Date
    Jan 2012
    Posts
    987
    Thanks
    6
    Thanked 206 Times in 182 Posts

    Re: HTML Parsing

    You will get a NullPointerException whenever you use a variable, method call, or other expression as if it had a non-null value when it is really null. Commonly this happens with array access using [] or with the "dot" operator. For example:

    // Illustration only: each "bad" line throws on its own, so try them one at a time.
    String[] foo = null;
    foo[42] = "?"; // bad []! foo is null
     
    foo = new String[666];
    System.out.println(foo[42].length()); // bad dot! foo[42] is null
     
    foo[42] = "?";
    System.out.println(foo[42].length()); // success!

    In the line "if(content.hasText()){System.out.println("result is " + content.text());" the only thing you use the "dot" operator on that could plausibly be null is content. So my guess would be that content is null. You can check this with:

    Element content = doc.getElementById("navbar");
    System.out.println("About to get the content: content=" + content);
    if(content.hasText()){System.out.println("result is " + content.text());}
    else System.out.println("nothing!");

    Once (or if) you verify that content is the culprit you have to figure out why doc.getElementById() returned null. This may involve checking the API docs for getElementById() to see under what conditions it returns null.
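
    While you investigate, a null check before the "dot" keeps the program from crashing. The sketch below uses only the standard library: findById and the elements map are hypothetical stand-ins for doc.getElementById(), which, per the jsoup API docs, returns null when no element with that id is found. It may also be worth checking which Jsoup.parse overload is being called: as far as I know, parse(String, String) treats its first argument as HTML text and the second as a base URI, whereas parse(File, String) reads the file using the given charset name. The posted code passes input (the path string) rather than file, so the document being searched may contain only the text of the path.

```java
import java.util.HashMap;
import java.util.Map;

public class NullGuardSketch {

    // Hypothetical stand-in for a parsed document: element id -> element text.
    static final Map<String, String> elements = new HashMap<>();

    // Like getElementById(), this returns null when nothing matches.
    static String findById(String id) {
        return elements.get(id);
    }

    // Reports on the lookup result without ever dereferencing a null.
    static String describe(String id) {
        String content = findById(id);
        if (content == null) {
            return "no element with id '" + id + "'";
        }
        return content.isEmpty() ? "nothing!" : "result is " + content;
    }

    public static void main(String[] args) {
        elements.put("navbar", "Home | Movies | TV");
        System.out.println(describe("navbar"));  // id exists
        System.out.println(describe("sidebar")); // id missing, no exception
    }
}
```

    In the real program the same shape would apply to the Element returned by doc.getElementById("navbar"): test it against null first, and only then call hasText() or text().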
    Last edited by pbrockway2; March 6th, 2012 at 06:20 PM. Reason: typos
