
Thread: Regular Expression in Java for HTML pages

  1. #1
    Junior Member
    Join Date
    Feb 2014
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Regular Expression in Java for HTML pages

    Hi,

    I need to parse an HTML web page in Java to extract specific information from its tags. For example,

    <b>Species </b> Strain </td>

    I need to find the Strain info (Strain is variable in length) in the page. The whole web page is stored as one huge string. I need a regular expression that can help me identify all the Species and retrieve their corresponding Strain info.

    Does anyone have a clue how to do this, or can someone propose some clever string manipulation methods in Java?

    Thank you.


  2. #2
    Super Moderator
    Join Date
    Jun 2013
    Location
    So. Maryland, USA
    Posts
    5,132
    My Mood
    Mellow
    Thanks
    187
    Thanked 659 Times in 646 Posts

    Re: Regular Expression in Java for HTML pages

    Welcome to the Forum! Please read this topic to learn how to post code correctly and other useful tips for newcomers.

    What have you tried?

  3. #3
    Member andbin
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    443
    Thanks
    4
    Thanked 122 Times in 114 Posts

    Re: Regular Expression in Java for HTML pages

    Originally posted by pari89:
    I need to parse an html web page to extract specific information from the tags in Java. For example,

    <b>Species </b> Strain </td>

    I need to look for the Strain info (Strain is variable in length) in the page. The whole web page is stored as a huge string.

    In general you have two options: (possibly heavy) use of regular expressions or other String methods, or a specialized library such as jsoup, an HTML parser that you can find with a web search.
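
As a minimal sketch of the regular-expression option, assuming every cell looks exactly like the example in the question (a `<b>` label, then plain text, then `</td>`, with no nested tags in the text), something like the following could work. The class name `StrainExtractor`, the method `extract`, and the second sample row are made up for illustration; only `java.util.regex` from the standard library is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class StrainExtractor {
    // Assumed layout: <b>LABEL</b> VALUE </td>, where LABEL and VALUE contain no tags.
    // Group 1 is the label, group 2 is the value; surrounding whitespace is left outside the groups.
    private static final Pattern CELL = Pattern.compile(
            "<b>\\s*([^<]*?)\\s*</b>\\s*([^<]*?)\\s*</td>",
            Pattern.CASE_INSENSITIVE);

    /** Returns one {label, value} pair for each matching cell, in page order. */
    static List<String[]> extract(String html) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample input; the first row is the fragment from the question.
        String html = "<tr><td><b>Species </b> Strain </td></tr>"
                + "<tr><td><b>Species two </b> Strain with several words </td></tr>";
        for (String[] pair : extract(html)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

This only holds while the markup stays that regular. If the value can contain tags, or the cells are laid out inconsistently, an HTML parser is likely the safer choice.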

    In either case, you need to know the precise rules for finding and extracting the text you want. For example: is "xyz" (the text you want) always at the end of the <td> cell? Is there always a <b> tag before it? Could there be an <i> (or some other tag) instead of <b>? Do you want only the last word, or all the words at the end? Do you care about surrounding spaces?
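
To illustrate how the answers to those questions shape the code, here is a rough sketch under one possible set of answers: the label may be in `<b>` or `<i>`, the tags may carry attributes, everything up to `</td>` is wanted, and surrounding spaces are dropped. The class `CellRules` and its methods are hypothetical names, and the rules in the comments are assumptions, not facts about the real page.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; each rule below is an assumption to adjust for the real page.
class CellRules {
    // Rules encoded in the pattern:
    //  - the label is wrapped in <b> or <i>, optionally with attributes
    //  - the closing tag must name the same element as the opening one (backreference \1)
    //  - the value is all tag-free text between the closing label tag and </td>
    private static final Pattern CELL = Pattern.compile(
            "<(b|i)\\b[^>]*>([^<]*)</\\1\\s*>([^<]*)</td\\s*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns (label, value) entries in page order, with surrounding whitespace trimmed. */
    static List<Map.Entry<String, String>> extractAll(String html) {
        List<Map.Entry<String, String>> cells = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            cells.add(new AbstractMap.SimpleImmutableEntry<>(
                    m.group(2).trim(), m.group(3).trim()));
        }
        return cells;
    }

    /** Alternative rule: keep only the last whitespace-separated word of a value. */
    static String lastWord(String value) {
        String[] words = value.trim().split("\\s+");
        return words[words.length - 1];
    }

    public static void main(String[] args) {
        // Made-up sample input covering the <i> variant and a tag with an attribute.
        String html = "<td><i class=\"name\">Species </i> Strain one </td>"
                + "<td><b>Species </b> Strain </td>";
        for (Map.Entry<String, String> cell : extractAll(html)) {
            System.out.println(cell.getKey() + " -> " + cell.getValue()
                    + " (last word: " + lastWord(cell.getValue()) + ")");
        }
    }
}
```

A list of entries is used rather than a map so that repeated labels, such as several cells all labelled "Species", are not overwritten.
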
    Andrea, www.andbin.net SCJP 5 (91%) SCWCD 5 (94%)

