Thread: Need help with a specific regular expression

  1. #1
    Senior Member
    Join Date
    Jul 2013
    Location
    Europe
    Posts
    900
    Thanks
    0
    Thanked 166 Times in 148 Posts

    Need help with a specific regular expression

    Hi there,

    I need a little help with a regular expression which I want to use to split a string.

    The string could look similar to this:
    "a = \"Hello World\" b = \"Some other String\" c = \"test\""
    The String is read from a file where the file contents would be:
    a = "Hello World" b = "Some other String" c = "test"
    After splitting I would like to get the following array:
    String[] splitString = new String[] {"a", "=", "\"Hello World\"", "b", "=", "\"Some other String\"", "c", "=", "\"test\""};

    I know I could just write a simple text parser to go over the string character by character, but I would rather not do it to keep my code clean and simple. No need to reinvent the wheel.
    However, I just can't seem to find the right regular expression for this task. I know that one must exist because this can be solved by a finite automaton.

    If somebody could explain what the RE should look like, that would be great.
    Thank you all very much.


  2. #2
    Member
    Join Date
    Feb 2014
    Posts
    180
    Thanks
    0
    Thanked 48 Times in 45 Posts

    Re: Need help with a specific regular expression

    Hi,

    I can think of 2 ways to do this:

    1. Use String.split() directly

    String[] splitStr = str.split("(?:(?<=\\w) (?==))|(?:(?<==) (?=\"))|(?:(?<=\") (?=\\w))");

    I'm not a regex expert, so it may be possible to refine the above. Note the use of:
    • (?<=X) - zero-width positive lookbehind
    • (?=X) - zero-width positive lookahead
    • (?:X) - non-capturing group
    • | - alternation (logical OR)

    The basic idea is to split on a space character only when the characters immediately before and after it mark a token boundary. This avoids splitting up "Hello World" and other quoted strings that contain spaces.

    See Pattern (Java Platform SE 6) and Regular Expression Tutorial - Learn How to Use Regular Expressions.
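To see the first approach in action, here is a minimal, self-contained sketch that applies the same pattern to the sample string from the first post. The class name LookaroundSplitDemo and the helper method split are illustrative names only, not part of any library.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Same pattern as in the post: a space is a separator only when it sits
    // between a word character and '=', between '=' and a double quote,
    // or between a double quote and a word character.
    static final String SEPARATOR =
            "(?:(?<=\\w) (?==))|(?:(?<==) (?=\"))|(?:(?<=\") (?=\\w))";

    // Hypothetical helper: wraps String.split() with the pattern above.
    static String[] split(String str) {
        return str.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String input = "a = \"Hello World\" b = \"Some other String\" c = \"test\"";
        // prints: [a, =, "Hello World", b, =, "Some other String", c, =, "test"]
        System.out.println(Arrays.toString(split(input)));
    }
}
```

One limitation worth knowing: the pattern only looks at the characters next to each space, so a quoted value that starts with a space followed by a word character (for example " x") would be split right after its opening quote.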

    2. Use Pattern and Matcher

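    	import java.util.ArrayList;
    	import java.util.List;
    	import java.util.regex.Matcher;
    	import java.util.regex.Pattern;
     
    	String[] splitMultiSteps(String str) {
    		System.out.println(str); // debug output: show the input
    		// One capturing group each for the name, the '=' and the quoted value
    		Pattern pattern = Pattern.compile("(\\w+) (=) (\"[^\"]+\")");
    		Matcher matcher = pattern.matcher(str);
     
    		List<String> matches = new ArrayList<>();
    		// Each find() matches one complete name = "value" triple
    		while (matcher.find()) {
    			// Group 0 is the whole match, so start at group 1
    			for (int i = 1; i <= matcher.groupCount(); i++) {
    				String matchedStr = matcher.group(i);
    				matches.add(matchedStr);
    			}
    		}
     
    		return matches.toArray(new String[matches.size()]);
    	}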

    This uses a far simpler regex that matches var = "non-double quote chars", with a capturing group for each of the required elements, and then iterates through each 'find' that the regex matches.

    Personally I'd prefer the latter because it's more maintainable.
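If maintainability is the goal, one possible refinement is to give the capturing groups names so the loop no longer depends on group numbers. The sketch below assumes Java 7 or later, where named groups are available. The group names key, op and value, the class name NamedGroupSplitDemo and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSplitDemo {
    // Same regex as the second approach, but each group has a name.
    // The names (key, op, value) are assumptions chosen for this example.
    private static final Pattern PAIR =
            Pattern.compile("(?<key>\\w+) (?<op>=) (?<value>\"[^\"]+\")");

    static String[] split(String str) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = PAIR.matcher(str);
        // Each find() consumes one complete key = "value" triple.
        while (matcher.find()) {
            tokens.add(matcher.group("key"));
            tokens.add(matcher.group("op"));
            tokens.add(matcher.group("value"));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "a = \"Hello World\" b = \"Some other String\" c = \"test\"";
        for (String token : split(input)) {
            System.out.println(token);
        }
    }
}
```

The result is the same nine-element array as before. The only difference is that adding or reordering groups in the regex later does not silently change which token ends up where.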

  3. The Following User Says Thank You to jashburn For This Useful Post:

    GregBrannon (March 3rd, 2014)

  4. #3
    Senior Member
    Join Date
    Jul 2013
    Location
    Europe
    Posts
    900
    Thanks
    0
    Thanked 166 Times in 148 Posts

    Re: Need help with a specific regular expression

    I have tested the second approach and it works.
    Thank you very much.

    However, I should have mentioned earlier that the file also contains lines like:
    i = 5
    j = 3.677
    k = true
    l = false
    These can't be parsed by that pattern, because it requires the value to be a quoted string.

    I changed your code to account for this. If anybody is interested in my solution, here it is:
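    	import java.util.ArrayList;
    	import java.util.Arrays;
    	import java.util.List;
    	import java.util.regex.Matcher;
    	import java.util.regex.Pattern;
     
    	public static String[] splitConfig(String str) {
    		System.out.println("splitConfig"+str); // debug output: show the input
    		// The value is either a run of non-quote characters or a quoted string
    		Pattern pattern = Pattern.compile("(\\w+) (=) ([^\"]+|\"[^\"]+\")");
    		Matcher matcher = pattern.matcher(str);
     
    		List<String> matches = new ArrayList<>();
    		while (matcher.find()) {
    			// Group 0 is the whole match, so start at group 1
    			for (int i = 1; i <= matcher.groupCount(); i++) {
    				String matchedStr = matcher.group(i);
    				matches.add(matchedStr);
    			}
    		}
    		String[] result = matches.toArray(new String[matches.size()]);
    		System.out.println(Arrays.toString(result)); // debug output: show the tokens
    		return result;
    	}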
    Called once per line, this will parse the following text:
    a = 5
    b = 3.3331
    c = false
    d = "Hello World"
    into these tokens (each call returns the three tokens of its line, shown here together):
    String[] {
    "a", "=", "5",
    "b", "=", "3.3331",
    "c", "=", "false",
    "d", "=", "\"Hello World\""
    }
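Note that the unquoted alternative [^\"]+ matches everything up to the next double quote, including spaces and line breaks, so the pattern above relies on being given one key/value pair per call. If several pairs can appear in one string, a possible variant is to try the quoted form first and otherwise take a run of non-whitespace characters. This is only a sketch under that assumption. The class name ConfigSplitDemo and the method split are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigSplitDemo {
    // Value is either a quoted string (tried first) or a run of
    // non-whitespace characters such as 5, 3.677 or true.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+) (=) (\"[^\"]+\"|\\S+)");

    static String[] split(String str) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = PAIR.matcher(str);
        while (matcher.find()) {
            // Group 0 is the whole match, so start at group 1.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                tokens.add(matcher.group(i));
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "i = 5 j = 3.677 k = true s = \"Hello World\"";
        // prints: [i, =, 5, j, =, 3.677, k, =, true, s, =, "Hello World"]
        System.out.println(Arrays.toString(split(line)));
    }
}
```

Because \S+ stops at any whitespace, the same method also works when the whole file content is passed in at once rather than line by line.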

    Thank you again, very useful answer.
