
Thread: Text Processing with Regular Expressions explained in Java

  1. #1
    JavaPF

    Text Processing with Regular Expressions explained


    A string can be formatted or parsed based on a specified pattern that will be searched in the string.

    To format or parse data (text or data types), you need to be able to tell your code: pick up that data item from there, and do this to it.
    You do this with a search based on pattern matching. The search pattern is described by what is called a Regular Expression (regex for short). In other words, Regular Expressions can be used to process text. A piece of text (a sequence of characters) that corresponds to the search pattern is called a match.

    For example, you may want to say: Find a dot (.) in a string and split the string around each dot you find. That means if there are two dots in a string, the string will be split into three pieces. Or you may want to validate user input such as an email address.
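As a minimal sketch of the dot-splitting case, the standard String.split method takes a regular expression. The class and method names here (SplitOnDots, splitOnDots) are made up for illustration. The dot is escaped because an unescaped . would match any character.

```java
import java.util.Arrays;

class SplitOnDots {
    // Splits the input around every literal dot.
    // "\\." in Java source is the regex \. (an escaped, literal dot).
    static String[] splitOnDots(String text) {
        return text.split("\\.");
    }

    public static void main(String[] args) {
        // Two dots in the input produce three pieces.
        System.out.println(Arrays.toString(splitOnDots("www.example.com")));
        // prints [www, example, com]
    }
}
```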

    Java provides support for Regular Expressions (to define search patterns) and for matching the patterns (to the text in the string) by providing the following elements:

    The java.util.regex.Pattern class

    The regular expression constructs

    The java.util.regex.Matcher class


    The simplest form of a regular expression is a search for a string literal such as "Hello". You may want to look for certain words in a user's input. However, you can also build very sophisticated expressions using what are called Regular Expression Constructs.

    Consider the following expression:

    [A-Za-z0-9]

    Any character in the range of A through Z, a through z, or 0 through 9 (any letter or digit) will match this pattern.

    Any other character will match the pattern described by the following expression:

    [^A-Za-z0-9]

    Notice the ^ character. This ^ character negates the expression.
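The two expressions can be tried out with a small sketch like the following. The class and method names (CharClassDemo, isLetterOrDigit, isOther) are hypothetical helpers, not part of the Java API. Each character is tested on its own with Matcher.matches().

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Matches exactly one letter or digit.
    private static final Pattern ALNUM = Pattern.compile("[A-Za-z0-9]");
    // The leading ^ inside the brackets negates the class.
    private static final Pattern NOT_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    static boolean isLetterOrDigit(char c) {
        return ALNUM.matcher(String.valueOf(c)).matches();
    }

    static boolean isOther(char c) {
        return NOT_ALNUM.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        for (char c : new char[] {'q', '7', '#', ' '}) {
            System.out.println("'" + c + "' letter/digit: " + isLetterOrDigit(c)
                    + ", other: " + isOther(c));
        }
    }
}
```

Every single character matches exactly one of the two patterns, which is what negation means here.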

    Here are some important points about constructs:

    Use backslash (\) as an escape character. For example, \. matches a period, whereas \\ matches a backslash. Inside a Java string literal each backslash must itself be doubled, so these two expressions are written "\\." and "\\\\".

    Use | for logical OR, ^ to match the beginning of a line, and $ to match the end of a line. (By default these two anchors apply to the whole input; they apply to each line only when the MULTILINE flag is set.) Remember that ^ inside [ ] means negation.

    A character class is a set of character alternatives enclosed in brackets; for example, [abc] means a, b, or c. The character - denotes a range, and the character ^ immediately after the opening [ denotes negation (that is, all characters except those specified). For example, the character class [^a-zA-Z] matches any character that is not in the range a through z or A through Z.

    Several character classes are predefined for you; for example, \d matches any digit.
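The points above can be sketched as a few small checks. The class and method names (ConstructsDemo and its helpers) are invented for this example. Note that | has the lowest precedence, so ^cat|dog$ reads as "starts with cat" OR "ends with dog".

```java
import java.util.regex.Pattern;

class ConstructsDemo {
    // Regex \. : a literal period.
    static boolean containsDot(String s) {
        return Pattern.compile("\\.").matcher(s).find();
    }

    // Regex \\ : a literal backslash (four backslashes in Java source).
    static boolean containsBackslash(String s) {
        return Pattern.compile("\\\\").matcher(s).find();
    }

    // ^ anchors the first alternative, $ anchors the second.
    static boolean startsWithCatOrEndsWithDog(String s) {
        return Pattern.compile("^cat|dog$").matcher(s).find();
    }

    // Predefined class \d combined with the + quantifier.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    public static void main(String[] args) {
        System.out.println(containsDot("a.b"));
        System.out.println(containsBackslash("C:\\temp"));
        System.out.println(startsWithCatOrEndsWithDog("hotdog"));
        System.out.println(isAllDigits("2008"));
    }
}
```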

    Character Classes (brackets used as a grouping mechanism)

    [ABC]         Any one of the characters A, B, or C
    [^ABC]        Any character except A, B, or C (negation)
    [a-zA-Z]      a through z or A through Z (range)
    [...&&...]    Intersection of two sets (AND)
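Intersection is the least familiar of these forms, so here is a small sketch. The class and method names (IntersectionDemo, isLowercaseConsonant) are hypothetical. The class [a-z&&[^aeiou]] keeps only characters that are in a-z AND are not vowels.

```java
import java.util.regex.Pattern;

class IntersectionDemo {
    // Lowercase letters intersected with "anything but a vowel".
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    static boolean isLowercaseConsonant(char c) {
        return CONSONANT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseConsonant('b')); // prints true
        System.out.println(isLowercaseConsonant('e')); // prints false
        System.out.println(isLowercaseConsonant('B')); // prints false
    }
}
```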

    Predefined Character Classes

    .     (dot) Any character if the DOTALL flag is set; otherwise any character except line terminators
    \d    A digit: [0-9]
    \D    A non-digit: [^0-9]
    \s    A whitespace character: [ \t\n\x0B\f\r]
    \S    A non-whitespace character: [^\s]
    \w    A word character: [a-zA-Z_0-9]
    \W    A non-word character: [^\w]
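A short sketch of the predefined classes in use follows. The class and method names (PredefinedDemo and its helpers) are assumptions made for this example. In java.util.regex, \w also matches the underscore, and the dot only matches a newline when Pattern.DOTALL is passed to compile.

```java
import java.util.regex.Pattern;

class PredefinedDemo {
    // True if the string is exactly one word character.
    static boolean isWordChar(String s) {
        return s.matches("\\w");
    }

    // Removes every non-digit (\D) from the input.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Tests whether "a.b" matches a, newline, b with or without DOTALL.
    static boolean dotMatchesNewline(boolean dotAll) {
        Pattern p = dotAll
                ? Pattern.compile("a.b", Pattern.DOTALL)
                : Pattern.compile("a.b");
        return p.matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(isWordChar("_"));           // prints true
        System.out.println(digitsOnly("Tel: 555-0100")); // prints 5550100
        System.out.println(dotMatchesNewline(false));  // prints false
        System.out.println(dotMatchesNewline(true));   // prints true
    }
}
```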

    To require a regular expression, say X, to repeat a number of times, put a quantifier immediately after X in the pattern. For example, X+ means one or more occurrences of X.

    Greedy Quantifiers (X represents a regular expression)

    X?      X, zero or one time
    X*      X, zero or more times
    X+      X, one or more times
    X{n}    X, exactly n times
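The quantifiers can be exercised with a sketch like this one. QuantifierDemo, wholeMatch, and firstMatch are illustrative names only. The last call shows what "greedy" means: .+ grabs as much text as it can while still letting the rest of the pattern match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True if the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first substring that matches, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("colou?r", "color"));  // prints true
        System.out.println(wholeMatch("ab*", "a"));          // prints true
        System.out.println(wholeMatch("ab+", "a"));          // prints false
        System.out.println(wholeMatch("\\d{4}", "2008"));    // prints true
        // Greedy: the match runs to the last >, not the first one.
        System.out.println(firstMatch("<.+>", "<b>bold</b>")); // prints <b>bold</b>
    }
}
```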

    Some Other Constructs

    ^        The beginning of a line
    XY       X followed by Y
    X|Y      Either X or Y
    (?:X)    X, as a non-capturing group
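A non-capturing group is useful when parentheses are needed only for grouping. In the sketch below (the class name GroupDemo, the method surnameOf, and the sample pattern are all made up for illustration), the title alternatives are grouped with (?:...) so that the only numbered group is the surname.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // (?:Mr|Mrs|Ms) groups the alternatives without capturing them;
    // (\w+) is therefore capturing group number 1.
    static final Pattern TITLE_AND_NAME = Pattern.compile("(?:Mr|Mrs|Ms)\\. (\\w+)");

    // Returns the surname after the first title found, or null if none.
    static String surnameOf(String text) {
        Matcher m = TITLE_AND_NAME.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(surnameOf("Dear Mrs. Smith,")); // prints Smith
    }
}
```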


    The following is a typical process for pattern matching:

    • Compile the regular expression, specified as a string, into an instance of the Pattern class, for example with a statement like the following:
    Pattern p = Pattern.compile("[^a-zA-Z0-9]");

    • Create a Matcher object that holds the pattern and the input text against which the pattern will be matched:
    Matcher m = p.matcher("myemail@emailaddress.com");


    • Invoke the matches() method or the find() method on the Matcher object to check for a match. matches() succeeds only if the entire input matches the pattern; find() succeeds if any part of the input does.
    boolean b = m.find();
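The three steps can be put together in a small runnable sketch. The class and method names (MatchProcessDemo, firstNonAlnumIndex, isSingleNonAlnum) are hypothetical; the Pattern and Matcher calls are the standard java.util.regex ones. It also shows the difference between find() and matches() on the same input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchProcessDemo {
    // Step 1: compile the pattern once.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Steps 2 and 3 with find(): index of the first character that is
    // not a letter or digit, or -1 if there is none.
    static int firstNonAlnumIndex(String input) {
        Matcher m = NON_ALNUM.matcher(input);
        return m.find() ? m.start() : -1;
    }

    // Steps 2 and 3 with matches(): true only if the WHOLE input is a
    // single non-alphanumeric character.
    static boolean isSingleNonAlnum(String input) {
        return NON_ALNUM.matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "myemail@emailaddress.com";
        System.out.println(firstNonAlnumIndex(text)); // prints 7 (the @ sign)
        System.out.println(isSingleNonAlnum(text));   // prints false
        System.out.println(isSingleNonAlnum("@"));    // prints true
    }
}
```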


    The following is a code example to validate email addresses:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class EmailValidator
    {
        public static void main(String[] args)
        {
            // The address to check must be given as the first command-line argument.
            if (args.length < 1)
            {
                System.out.println("Command syntax: java EmailValidator <emailAddress>");
                System.exit(1);
            }
            String email = args[0];

            // Look for email addresses starting with
            // invalid symbols: dots or @ signs.
            Pattern p = Pattern.compile("^\\.+|^\\@+");
            Matcher m = p.matcher(email);

            if (m.find())
            {
                System.out.println("Invalid email address: starts with a dot or an @ sign.");
                System.exit(1);
            }

            // Look for email addresses that start with www.
            p = Pattern.compile("^www\\.");
            m = p.matcher(email);

            if (m.find())
            {
                System.out.println("Invalid email address: starts with www.");
                System.exit(1);
            }

            // Look for any character that is not a letter, a digit,
            // an @ sign, a dot, or an underscore.
            p = Pattern.compile("[^A-Za-z0-9\\@\\.\\_]");
            m = p.matcher(email);

            if (m.find())
            {
                System.out.println("Invalid email address: contains invalid characters");
            }
            else
            {
                System.out.println(email + " is a valid email address.");
            }
        }
    }


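    This code takes the email address to check as a command-line argument, so you must supply a string when you run it.

    Try a few combinations of email addresses and see what you get. Note that these checks only reject addresses that start with a dot, an @ sign, or www., or that contain characters other than letters, digits, @ signs, dots, and underscores. They do not verify that the address actually contains an @ sign or a domain, so a string such as abc is still reported as valid.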
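As a possible variation, the whole shape of an address can be described by one pattern and checked with matches(). The sketch below is a deliberate simplification, not a full implementation of the email address standard: the class name SimpleEmailCheck, the method looksLikeEmail, and the pattern itself are assumptions made for illustration. It requires a local part of letters, digits, and underscores (optionally dot-separated), exactly one @ sign, and a domain with at least one dot.

```java
import java.util.regex.Pattern;

class SimpleEmailCheck {
    // local part:  word pieces separated by single dots
    // domain:      alphanumeric labels, at least two, separated by dots
    private static final Pattern SIMPLE_EMAIL = Pattern.compile(
            "[A-Za-z0-9_]+(?:\\.[A-Za-z0-9_]+)*@[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    // matches() requires the entire input to fit the pattern,
    // so no ^ or $ anchors are needed.
    static boolean looksLikeEmail(String input) {
        return SIMPLE_EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"myemail@emailaddress.com", "abc", ".a@b.com", "a@b"};
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeEmail(s));
        }
    }
}
```

Real-world addresses allow more characters than this pattern accepts (hyphens in domains, for example), so treat it as a starting point for experimenting with the constructs above.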


  2. #2
    Member

    Re: Text Processing with Regular Expressions explained

    Thanks...Nice explanation

