Thread: extracting a list of few lines from html file

  1. #1
    Junior Member
    Join Date
    Apr 2012
    Location
    India
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    extracting a list of few lines from html file

    Hello all, I'm a beginner in Java with a few basic skills.
    I want to extract a few fields from an HTML file.

    Let's say a file has the same format as the page at the following link:
    AIRCC - IJWMN - Journal

    Now I want to extract all the topics mentioned on the page as a list and store them in a text file.

    The text file should contain only the following entries:

    Architectures, protocols, and algorithms to cope with mobile & wireless Networks
    Distributed algorithms of mobile computing
    .
    .
    Wireless multimedia systems
    Service creation and management environments for mobile/ wireless systems


    How can I do this?
    I tried using a regexp, but I guess I could not use it to its full extent.
    Please help me get through this.


  2. #2
    Super Moderator Norm
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Re: extracting a list of few lines from html file

    Are you trying to parse the contents of an html page?
    I think there are some third party packages that will help, but I don't have a link or name.
    If you don't understand my answer, don't ignore it, ask a question.

  3. #3
    Junior Member
    Join Date
    Apr 2012
    Location
    India
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: extracting a list of few lines from html file

    Dear Norm,
    Yes, to some extent: I need to parse that file. Alternatively, I can read directly from the URL through a connection object. The problem I'm facing is knowing exactly where to start and where to stop reading in order to get the complete list of topics shown on the web page.
    I hope you can give me a clue about it.

  4. #4
    Super Moderator Norm
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Re: extracting a list of few lines from html file

    Sorry, I don't have the name of any packages to parse html.

  5. #5
    Super Moderator Sean4u
    Join Date
    Jul 2011
    Location
    Tavistock, UK
    Posts
    637
    Thanks
    5
    Thanked 103 Times in 93 Posts

    Re: extracting a list of few lines from html file

    You could do it trivially with javax.swing.text.html.parser.DocumentParser - read the Java SE API doc for that class and implement your own ParserCallback. You'll have to override handleStartTag and look for HTML.Tag.LI and use handleText to capture the text.
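
As a rough illustration of the callback approach, here is a minimal sketch. It assumes the topics sit in a flat (non-nested) list of `<li>` elements, and it goes through `ParserDelegator`, the usual entry point that drives `DocumentParser` internally. The class name `ListItemExtractor` and the method `extractListItems` are made up for this example. A real page will probably have other `<li>` elements (menus and the like), so the result may need filtering.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named only for this example.
class ListItemExtractor {

    /** Returns the trimmed text of every non-empty <li> element in the given HTML. */
    static List<String> extractListItems(Reader html) {
        final List<String> items = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Non-null only while the parser is inside an <li> element.
            private StringBuilder current;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.LI) {
                    flush(); // in case the previous <li> was never closed
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.LI) {
                    flush();
                }
            }

            // Stores the collected text of the current item, if any.
            private void flush() {
                if (current != null) {
                    String text = current.toString().trim();
                    if (!text.isEmpty()) {
                        items.add(text);
                    }
                    current = null;
                }
            }
        };

        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return items;
    }

    public static void main(String[] args) {
        String sample = "<html><body><ul>"
                + "<li>Distributed algorithms of mobile computing</li>"
                + "<li>Wireless multimedia systems</li>"
                + "</ul></body></html>";
        for (String item : extractListItems(new StringReader(sample))) {
            System.out.println(item);
        }
    }
}
```

To read a saved page instead of the sample string, pass a `FileReader` (or an `InputStreamReader` wrapped around a URL stream) to `extractListItems`.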

    It's quite a bit of work to do it that way for just one page. Copy-pasting the list (I used Firefox and vim-gnome on Ubuntu just now) gives me plain text one line per list item. Using a Pattern on the whole document can be hard, but it would be easy to do it line-by-line if you were to read the file with BufferedReader.readLine(), for example.
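
The line-by-line idea could look roughly like the sketch below. It assumes each list item sits entirely on one line of the source (opening and closing `<li>` tags on the same line), which may not hold for every page. The names `LineByLineExtractor` and `extractTopics` are invented for this example. HTML entities such as `&amp;` are not decoded here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class LineByLineExtractor {

    // Matches one complete <li>...</li> element on a single line.
    private static final Pattern LIST_ITEM =
            Pattern.compile("<li(?:\\s[^>]*)?>(.*?)</li>", Pattern.CASE_INSENSITIVE);

    // Matches any remaining tag inside the item, such as <a href="..."> or </a>.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    /** Reads the source line by line and returns the text of each list item found. */
    static List<String> extractTopics(Reader source) {
        List<String> topics = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = LIST_ITEM.matcher(line);
                while (matcher.find()) {
                    // Drop nested tags and surrounding whitespace.
                    String text = ANY_TAG.matcher(matcher.group(1)).replaceAll("").trim();
                    if (!text.isEmpty()) {
                        topics.add(text);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return topics;
    }

    public static void main(String[] args) {
        String sample = "<ul>\n"
                + "<li>Distributed algorithms of mobile computing</li>\n"
                + "<li class=\"topic\"><a href=\"#\">Wireless multimedia systems</a></li>\n"
                + "</ul>\n";
        for (String topic : extractTopics(new StringReader(sample))) {
            System.out.println(topic);
        }
    }
}
```

The resulting list can be written out one item per line with `java.nio.file.Files.write(path, topics)`, where `path` points at the output text file.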

  6. #6
    Junior Member
    Join Date
    Apr 2012
    Location
    India
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: extracting a list of few lines from html file

    OK sir,
    I understand that a simple copy-paste would have been the easier option, but I want to create lists from many such pages, and I would be glad if I could automate it somehow.
    I will try your suggestion for the parser. Thank you.
