

JavaPF, 19-05-2008, 10:54 AM

Text Processing with Regular Expressions explained


A string can be formatted or parsed by searching it for a specified pattern.

In order to format or parse data (text or data types), you want to be able to tell your code: pick up that data item from there, and do this to it.
How can we do this? You do it with a search based on pattern matching. The search pattern is described by what is called a Regular Expression (regex for short). In other words, Regular Expressions can be used to process text. A piece of text (sequence of characters) that is found to correspond to the search pattern is called a match.

For example, you may want to say: Find a dot (.) in a string and split the string around each dot you find. That means if there are two dots in a string, the string will be split into three pieces. Or you may want to validate user input such as an email address.
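As a minimal sketch of the dot-splitting example, the snippet below uses String.split, which takes a regular expression. The class and method names (SplitOnDots, splitOnDots) are made up for illustration. The dot has to be escaped, because an unescaped dot in a regex matches any character.

```java
import java.util.Arrays;

class SplitOnDots {
    // Splits the input around each literal dot.
    // "\\." in Java source is the regex \. (an escaped, literal dot).
    static String[] splitOnDots(String input) {
        return input.split("\\.");
    }

    public static void main(String[] args) {
        // Two dots in the string, so three pieces: [www, example, com]
        System.out.println(Arrays.toString(splitOnDots("www.example.com")));
    }
}
```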

Java provides support for Regular Expressions (to define search patterns) and for matching the patterns (to the text in the string) by providing the following elements:

The java.util.regex.Pattern class

The regular expression constructs

The java.util.regex.Matcher class


The simplest form of a regular expression is a search for a string literal such as "Hello". You may want to look for certain words in a user's input. However, you can also build very sophisticated expressions using what are called Regular Expression Constructs.

Consider the following expression:

[A-Za-z0-9]

Any character in the range of A through Z, a through z, or 0 through 9 (any letter or digit) will match this pattern.

Any other character will match the pattern described by the following expression:

[^A-Za-z0-9]

Notice the ^ character. This ^ character negates the expression.

Here are some important points about constructs:

Use backslash (\) as an escape character. For example, \. matches a period, whereas \\ matches a backslash.

Use | for logical OR, ^ to match the beginning of a line, and $ to match the end of a line. Remember that ^ inside [ ] means negation.

A character class is a set of character alternatives enclosed in brackets; for example, [abc] means a, b or c. The character - denotes a range, and the character ^ inside [ ] denotes a negation (that is, all the characters except those specified). For example, the character class [^a-zA-Z] means all the characters that are not included in the range a through z or A through Z.
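The points above can be tried out with a short sketch like the following. The class and method names are hypothetical. Note that inside a Java string literal every regex backslash must itself be doubled, so the regex \. is written "\\." in source code.

```java
class CharacterClassDemo {
    // Removes every character that is NOT a letter or digit.
    static String stripNonAlphanumeric(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // Replaces each literal dot with a dash; the dot is escaped with a backslash.
    static String dotsToDashes(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println("b".matches("[abc]"));               // true: b is in the set
        System.out.println("d".matches("[abc]"));               // false: d is not in the set
        System.out.println("d".matches("[^abc]"));              // true: negated set
        System.out.println(stripNonAlphanumeric("J-a.v_a 8!")); // Java8
        System.out.println(dotsToDashes("a.b.c"));              // a-b-c
    }
}
```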

There are several character classes that are already defined for you; for example, \d matches any digit.

Character Classes (brackets used as a grouping mechanism)

[ABC]         Any one of the characters A, B or C
[^ABC]        Any character except A, B or C (negation)
[a-zA-Z]      a through z or A through Z (range)
[...&&...]    Intersection of two sets (AND)

Predefined Character Classes

.     (dot) Any character if the DOTALL flag is set, else any character except the line terminators
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character
\S    A non-whitespace character
\w    A word character: [a-zA-Z_0-9] (note that the underscore is included)
\W    A non-word character: [^\w]
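Here is a small sketch of the predefined classes in use. The class and method names are invented for this example. As before, each backslash is doubled inside the Java string literal.

```java
class PredefinedClassDemo {
    // Keeps only the digits by deleting every non-digit (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Deletes every whitespace character (\s): spaces, tabs, newlines.
    static String removeWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    // Replaces every non-word character (\W) with an underscore.
    static String nonWordToUnderscore(String s) {
        return s.replaceAll("\\W", "_");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));  // 5550199
        System.out.println(removeWhitespace("a b\tc"));   // abc
        System.out.println(nonWordToUnderscore("a-b c")); // a_b_c
        System.out.println("_".matches("\\w"));           // true: underscore is a word character
    }
}
```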

If you are looking for a regular expression, say X, to repeat itself a number of times, you can say it in the pattern by using a quantifier immediately following X. For example, X+ means one or more X.

Greedy Quantifiers (X represents a regular expression)

X?      X, zero or one time
X*      X, zero or more times
X+      X, one or more times
X{n}    X, exactly n times
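The quantifiers in the table can be checked with a sketch like this one (class and method names are hypothetical). The last method illustrates what "greedy" means: the quantifier grabs as much text as it can while still letting the whole pattern match.

```java
class QuantifierDemo {
    // X? : the "u" is optional, so both spellings match.
    static boolean isColor(String s) {
        return s.matches("colou?r");
    }

    // X{n} : exactly four digits.
    static boolean isFourDigits(String s) {
        return s.matches("\\d{4}");
    }

    // Greedy .+ runs to the LAST ">" in the input, not the first one.
    static String replaceFirstTag(String s) {
        return s.replaceFirst("<.+>", "X");
    }

    public static void main(String[] args) {
        System.out.println(isColor("color"));            // true
        System.out.println(isColor("colour"));           // true
        System.out.println(isFourDigits("2008"));        // true
        System.out.println(isFourDigits("208"));         // false
        System.out.println("".matches("a*"));            // true: zero occurrences allowed
        System.out.println("".matches("a+"));            // false: at least one required
        System.out.println(replaceFirstTag("<a><b>"));   // X
    }
}
```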

Some other Constructs

^        The beginning of a line
XY       Y following X
X|Y      Either X or Y
(?:X)    X, as a noncapturing group
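A rough sketch of these constructs follows; the class and method names are made up for illustration. It uses Matcher.groupCount() to show the difference between a capturing group (X) and a noncapturing group (?:X): both group the expression, but only the first one is counted and remembered.

```java
import java.util.regex.Pattern;

class OtherConstructsDemo {
    // X|Y : either "cat" or "dog".
    static boolean isCatOrDog(String s) {
        return s.matches("cat|dog");
    }

    // ^ : find() succeeds only if "www" sits at the very beginning.
    static boolean startsWithWww(String s) {
        return Pattern.compile("^www").matcher(s).find();
    }

    // Number of capturing groups declared in a regex.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(isCatOrDog("dog"));                // true
        System.out.println(isCatOrDog("bird"));               // false
        System.out.println(startsWithWww("www.example.com")); // true
        System.out.println(startsWithWww("my www"));          // false
        System.out.println("ababab".matches("(?:ab)+"));      // true
        System.out.println(capturingGroups("(ab)+"));         // 1
        System.out.println(capturingGroups("(?:ab)+"));       // 0
    }
}
```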


The following is a typical process for pattern matching:
  • Compile the regular expression, specified as a string, into an instance of the Pattern class, for example, with a statement like the following:
Java Code
Pattern p = Pattern.compile("[^a-zA-Z0-9]");
  • Create a Matcher object that pairs the compiled pattern with the input text to which the pattern will be matched:
Java Code
Matcher m = p.matcher("myemail@emailaddress.com");
  • Invoke the matches() method (the entire input must match the pattern) or the find() method (any part of the input may match) on the Matcher object to find out whether there is a match:
Java Code
boolean b = m.find();
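Put together, the three steps might look like the sketch below. The class and method names are hypothetical; the pattern and input are the ones used in the steps. It also shows how find() and matches() differ on the same input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherProcessDemo {
    // Step 1: compile once; a Pattern can be reused for many inputs.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // find(): true if ANY part of the text matches the pattern.
    static boolean containsNonAlphanumeric(String text) {
        Matcher m = NON_ALPHANUMERIC.matcher(text); // Step 2
        return m.find();                            // Step 3
    }

    // matches(): true only if the ENTIRE text matches the pattern.
    static boolean isSingleNonAlphanumeric(String text) {
        return NON_ALPHANUMERIC.matcher(text).matches();
    }

    public static void main(String[] args) {
        String input = "myemail@emailaddress.com";
        System.out.println(containsNonAlphanumeric(input)); // true: the @ and the dot match
        System.out.println(isSingleNonAlphanumeric(input)); // false: the input is more than one character
        System.out.println(isSingleNonAlphanumeric("@"));   // true
    }
}
```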

The following is a code example to validate email addresses:

Java Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailValidator
{
    public static void main(String[] args)
    {
        if (args.length < 1)
        {
            System.err.println("Command syntax: java EmailValidator <emailAddress>");
            System.exit(1);
        }
        String email = args[0];

        // Look for email addresses starting with
        // invalid symbols: dots or @ signs.
        Pattern p = Pattern.compile("^\\.+|^@+");
        Matcher m = p.matcher(email);

        if (m.find())
        {
            System.err.println("Invalid email address: starts with a dot or an @ sign.");
            System.exit(1);
        }

        // Look for email addresses that start with www.
        p = Pattern.compile("^www\\.");
        m = p.matcher(email);

        if (m.find())
        {
            System.err.println("Invalid email address: starts with www.");
            System.exit(1);
        }

        // Look for any character that is not a letter, digit, @, dot or underscore.
        p = Pattern.compile("[^A-Za-z0-9@._]");
        m = p.matcher(email);

        if (m.find())
        {
            System.err.println("Invalid email address: contains invalid characters");
            System.exit(1);
        }

        System.out.println(email + " is a valid email address.");
    }
}

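This code takes the user input as a command-line argument, so you need to pass the email address as a string when you run it.

Try a few combinations of email addresses and see what you get. Note that the checks are deliberately simple: the program rejects addresses that start with a dot, an @ sign or www., and addresses that contain characters other than letters, digits, @, dot and underscore. Anything else is reported as valid, including a string with no @ sign at all.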
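If you want to experiment further, one possible tightening is sketched below. It is not part of the program above: the class name, the method name and the extra rule (exactly one @ with allowed characters on both sides) are assumptions made for illustration, and the result is still far from a complete email address check.

```java
import java.util.regex.Pattern;

class EmailCheckSketch {
    // Same start checks as the program above: no leading dot, @ or "www."
    private static final Pattern BAD_START = Pattern.compile("^(?:\\.|@|www\\.)");

    // Assumption added for this sketch: one or more allowed characters,
    // a single @, then one or more allowed characters.
    private static final Pattern SHAPE =
            Pattern.compile("[A-Za-z0-9._]+@[A-Za-z0-9._]+");

    static boolean looksLikeEmail(String email) {
        if (BAD_START.matcher(email).find()) {
            return false;
        }
        // matches() requires the whole string to fit the shape.
        return SHAPE.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com"));     // true
        System.out.println(looksLikeEmail(".user@example.com"));    // false
        System.out.println(looksLikeEmail("www.user@example.com")); // false
        System.out.println(looksLikeEmail("user"));                 // false
    }
}
```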



jazz2k8, 06-08-2008, 07:03 AM
Re: Text Processing with Regular Expressions explained

Thanks...Nice explanation