Using Regular Expression (regex) in Java Programming
Hi all,
I'm new to using regular expressions in Java.
For example, I have a String s like this:
String s = "Why John Smith and Alan Smith and Nick Gates are the same?";
How can I extract the substrings "John Smith", "Alan Smith", and "Nick Gates" (people's names, each word starting with an uppercase letter) from s using a regex?
I tried to use a regex in Java, but it doesn't work at all.
This is my code:
Code :
String EXAMPLE_TEST = "Why John Smith and Alan Smith and Nick Gates are the same?";
System.out.println(EXAMPLE_TEST.matches("([A-Z]&&[a-z])"));
String[] splitString = EXAMPLE_TEST.split("([A-Z]&&[a-z])");
System.out.println(splitString.length);
for (String string : splitString) {
    System.out.println(string);
}
Please help me find the exact regex for this case.
Thanks all in advance!
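For reference, here is a small check (not from the thread) of why the attempt above prints false, then 1, then the unchanged sentence: outside square brackets `&&` is not an operator, and `matches()` has to match the entire string. The class name `WhyAttemptFails` is just an illustrative choice.

```java
import java.util.regex.Pattern;

public class WhyAttemptFails {
    static final String TEXT = "Why John Smith and Alan Smith and Nick Gates are the same?";

    public static void main(String[] args) {
        // Outside square brackets, "&&" is simply two literal '&' characters, so
        // "([A-Z]&&[a-z])" means: one capital, then "&&", then one lowercase letter.
        System.out.println("A&&b".matches("([A-Z]&&[a-z])")); // true

        // matches() must match the WHOLE string, and TEXT contains no "&&" at all.
        System.out.println(TEXT.matches("([A-Z]&&[a-z])")); // false

        // With nothing to split on, split() returns the sentence as a single element.
        System.out.println(TEXT.split("([A-Z]&&[a-z])").length); // 1

        // Written as a real character-class intersection, [A-Z&&[a-z]] asks for a
        // character that is both uppercase AND lowercase: an empty set.
        System.out.println(Pattern.compile("[A-Z&&[a-z]]").matcher(TEXT).find()); // false
    }
}
```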
Re: Using Regular Expression (regex) in Java Programming
If all you want are words that begin with capital letters, it's much easier to skip writing your own regex and simply use Scanner.next() to read each word, then check whether its first letter is uppercase.
Code :
import java.util.Scanner;

Scanner reader = new Scanner(EXAMPLE_TEST); // using the string you provided
while (reader.hasNext())
{
    String read = reader.next(); // next whitespace-separated word
    if (Character.isUpperCase(read.charAt(0))) // does it start with a capital?
    {
        System.out.println("Found a name: " + read);
    }
}
reader.close();
There are going to be a few problems with this basic code: it will pick up the first word of the sentence, and it will also pick up abbreviations and other proper nouns that happen to be capitalized. Unfortunately, there's no real way around this without some fairly advanced context handling and/or natural language processing.
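A self-contained version of the Scanner approach, sketched here for illustration (the names `CapitalizedWords` and `capitalizedWords` are hypothetical), makes those limitations concrete: the sentence-initial "Why" is reported, and each name comes back as two separate words rather than as "John Smith".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CapitalizedWords {
    // Returns every whitespace-separated token whose first character is uppercase.
    static List<String> capitalizedWords(String text) {
        List<String> found = new ArrayList<>();
        try (Scanner reader = new Scanner(text)) {
            while (reader.hasNext()) {
                String read = reader.next(); // tokens from next() are never empty
                if (Character.isUpperCase(read.charAt(0))) {
                    found.add(read);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Why John Smith and Alan Smith and Nick Gates are the same?";
        // "Why" is included, and first and last names are separate entries.
        System.out.println(capitalizedWords(text));
        // prints [Why, John, Smith, Alan, Smith, Nick, Gates]
    }
}
```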
Re: Using Regular Expression (regex) in Java Programming
I think this will be pretty hard unless you can always ignore the first word; otherwise you'd end up matching "Why John", which is not supposed to happen, as helloworld explained.
To make it more foolproof you'd need a dictionary of common words to exclude, and maybe even a dictionary of known names: if a word matches a known name, you can safely say it's a name :)
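A minimal sketch of the exclusion-dictionary idea, assuming a hypothetical, hand-picked `COMMON_WORDS` set (a real exclusion list would need to be far larger): find runs of two or more capitalized words, then drop leading words that appear in the dictionary. The known-names dictionary is not attempted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DictionaryFilter {
    // Hypothetical, deliberately tiny list of capitalized words that are not names.
    static final Set<String> COMMON_WORDS =
            new HashSet<>(Arrays.asList("Why", "What", "The", "And"));

    // A run of two or more capitalized words separated by single spaces.
    static final Pattern RUN = Pattern.compile("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)+");

    static List<String> findNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = RUN.matcher(text);
        while (matcher.find()) {
            List<String> words = new ArrayList<>(Arrays.asList(matcher.group().split(" ")));
            // Drop leading dictionary words, e.g. the sentence-initial "Why".
            while (!words.isEmpty() && COMMON_WORDS.contains(words.get(0))) {
                words.remove(0);
            }
            // Keep what is left only if it still looks like "First Last".
            if (words.size() >= 2) {
                names.add(String.join(" ", words));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "Why John Smith and Alan Smith and Nick Gates are the same?";
        // Without the dictionary, a plain two-word pattern reports "Why John".
        Matcher naive = Pattern.compile("[A-Z][a-z]+ [A-Z][a-z]+").matcher(text);
        while (naive.find()) {
            System.out.println("naive: " + naive.group());
        }
        System.out.println(findNames(text)); // prints [John Smith, Alan Smith, Nick Gates]
    }
}
```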
// Json
Re: Using Regular Expression (regex) in Java Programming
Hi all,
Thanks for your opinions.
I figured it out using a regex like this: " [A-Z][a-z]+ [A-Z][a-z]+"
This is my code:
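Code :
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile(" [A-Z][a-z]+ [A-Z][a-z]+");
Matcher matcher = pattern.matcher("Why John Smith and Alan Smith and Nick Gates are the same?");
while (matcher.find()) {
    // group() includes the matched leading space; trim() removes it
    System.out.println("main: " + matcher.group().trim());
}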
Output:
main: John Smith
main: Alan Smith
main: Nick Gates
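A self-contained sketch of the same approach, for anyone who would rather not trim: wrapping the name in a capturing group and reading `group(1)` returns just the name, without the leading space that `group()` includes. `NameFinder` and `findNames` are illustrative names, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameFinder {
    // The leading space keeps the first word of the input ("Why") out of the
    // match; the parentheses capture just the two-word name without that space.
    static final Pattern NAME = Pattern.compile(" ([A-Z][a-z]+ [A-Z][a-z]+)");

    static List<String> findNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAME.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1)); // group 1 = the captured name only
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : findNames("Why John Smith and Alan Smith and Nick Gates are the same?")) {
            System.out.println("main: " + name);
        }
    }
}
```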
Re: Using Regular Expression (regex) in Java Programming
Will it work for this string:
Why John smith and Alan Smith and Nick gates are the same?
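A quick check of that string with the pattern from the previous post (this sketch is not from the thread; `LowercaseSurnames` is an illustrative name): since `[A-Z]` requires each word to start with a capital, "John smith" and "Nick gates" are rejected and only "Alan Smith" is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LowercaseSurnames {
    // Same pattern as in the earlier post: a space, then two capitalized words.
    static final Pattern NAME = Pattern.compile(" [A-Z][a-z]+ [A-Z][a-z]+");

    static List<String> findNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAME.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group().trim()); // drop the matched leading space
        }
        return names;
    }

    public static void main(String[] args) {
        // "smith" and "gates" start with lowercase letters, so [A-Z] rejects them.
        System.out.println(findNames("Why John smith and Alan Smith and Nick gates are the same?"));
        // prints [Alan Smith]
    }
}
```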
Re: Using Regular Expression (regex) in Java Programming
So basically you expect every name to be preceded by a space as well. Fair enough, this might work :D
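A short sketch of that trade-off, using the same pattern with a capturing group (class and method names are illustrative): the required leading space is what skips "Why" at the very start of the input, but a name at the start of the input is skipped for the same reason, and a capitalized word after ". " is not skipped at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingSpaceTradeOff {
    // A space, then two capitalized words; group 1 holds the words only.
    static final Pattern NAME = Pattern.compile(" ([A-Z][a-z]+ [A-Z][a-z]+)");

    static List<String> findNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAME.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Works: nothing precedes "Why", so it can never start a match.
        System.out.println(findNames("Why John Smith and Alan Smith and Nick Gates are the same?"));
        // prints [John Smith, Alan Smith, Nick Gates]

        // Misses "Nick Gates", because it is not preceded by a space.
        System.out.println(findNames("Nick Gates and John Smith are the same?"));
        // prints [John Smith]

        // Only the first word of the whole input is protected, not of each sentence.
        System.out.println(findNames("Hello. Why John Smith?"));
        // prints [Why John]
    }
}
```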
// Json
Re: Using Regular Expression (regex) in Java Programming
The OP has cross-posted on the CodeGuru forums.
Re: Using Regular Expression (regex) in Java Programming
And you've obviously cross-read it :)
// Json
Re: Using Regular Expression (regex) in Java Programming
Quote:
Originally Posted by Json
And you've obviously cross-read it :)
Just a heads-up for anyone who feels they may be wasting their time on a question that may have already been answered elsewhere...