Thread: How split a text looking for values with regular expression

  1. #1
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default How split a text looking for values with regular expression

    I have to read a text and pull out the values assigned to certain variables. A variable is always either DE plus an identifier from 1 to 99, or DE plus an identifier from 1 to 99 followed by SF plus an identifier from 1 to 99. The value can be alphanumeric. How can I get these values? As a starting point I tried split("DE"), but that gives wrong results when "DE" also appears inside text that doesn't interest me. I looked at Scanner but it didn't help. I guess there is some way, maybe using regex, but I am completely lost (I have never used regex before and I am in a rush to fix this). Basically, the text is similar to the sample below, where DE means data element and SF means sub field. Some data elements have sub fields while others don't. I guess there is a way to match something like DE+anyNumberFrom1To99 = theValueAimed into one array and DE+anyNumberFrom1To99+,+SF+anyNumberFrom1To99 = theValueAimed into another array.

    DE 2, SF 1 = 00 SOME TEXT THAT DOESN'T INTEREST ME
    DE 22, SF 1 = 0 SOME TEXT THAT DOESN'T INTEREST ME
    DE 22, SF 4 = 1 SOME TEXT THAT DOESN'T INTEREST ME
    DE 22, SF 5 = 0 SOME TEXT THAT DOESN'T INTEREST ME
    DE 22, SF 6 = 11 SOME TEXT THAT DOESN'T INTEREST ME
    DE 22, SF 7 = 90x SOME TEXT THAT DOESN'T INTEREST ME
    DE 22, SF 7 = 12ab SOME TEXT THAT DOESN'T INTEREST ME
    DE 99 = 1234 SOME TEXT THAT DOESN'T INTEREST ME


  2. #2
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,424
    My Mood
    Hungover
    Thanks
    144
    Thanked 636 Times in 540 Posts

    Default Re: How split a text looking for values with regular expression

    Is each of these entries really on a separate line, or do they come as one big String without line breaks?

    Do you not care about anything after the equals sign? Will the equals sign always be there?

    If so, split on the = to get at the DE 22, SF 7 part. Then either split on the comma or do the parsing yourself.
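
A minimal sketch of that split-based approach, assuming each entry sits on its own line and the first "=" on the line is the separator. The class name SplitSketch and the helper parseLine are made up for illustration and are not from the thread:

```java
import java.util.Arrays;

public class SplitSketch {

    // Hypothetical helper: returns {de, sf, value}; sf is "" when the line has no sub field.
    static String[] parseLine(String line) {
        // Limit 2 so that any later '=' inside the description stays in the right-hand part.
        String[] sides = line.split("=", 2);
        // Left-hand part is "DE 22" optionally followed by ", SF 7".
        String[] ids = sides[0].split(",");
        String de = ids[0].trim().substring(2).trim();   // drop the "DE" prefix
        String sf = ids.length > 1 ? ids[1].trim().substring(2).trim() : "";
        // Assumption: the value is the first whitespace-delimited token after the '='.
        String value = sides[1].trim().split("\\s+")[0];
        return new String[] { de, sf, value };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                parseLine("DE 22, SF 7 = 90x SOME TEXT THAT DOESN'T INTEREST ME")));
        System.out.println(Arrays.toString(
                parseLine("DE 99 = 1234 SOME TEXT THAT DOESN'T INTEREST ME")));
    }
}
```

This prints [22, 7, 90x] and [99, , 1234]. It only works when every line really starts with a DE entry; if the entries arrive as one long String, a regular expression is probably the easier route.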
    Useful links: How to Ask Questions the Smart Way | Use Code Tags | Java Tutorials
    Static Void Games - Play indie games, learn from game tutorials and source code, upload your own games!

  3. #3
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default Re: How split a text looking for values with regular expression

    Hi. Your idea gave me a lead. This text snippet is the whole content of one spreadsheet cell. I believe there might be a safer way using regex. The basic logic behind each cell is that the user can paste any DE (data element) from 1 to 99 and can add an SF (sub field). First example: DE 50 = 12AB34 and some description that I have to ignore. Second example: DE 50, SF99 = 34CD56 and its description that I have to ignore. In this case the whole cell will be: DE 50 = 12AB34 bla bla DE 50, SF99 = 34CD56 bla bla. If I split on "=", some element will contain bla bla DE 50, SF99. Note that the value is then one position ahead and the variable one position behind. I can parse that, but I have to get DE and its id (1 to 99) and test whether it is only a DE or there is also an SF id. A regex would be better because I can't guarantee that "=" and "DE" will not appear in bla bla. I tried the code below without success:
    Pattern p = Pattern.compile("(DE 3, SF 1 = ) (\\d+)");
    Matcher m = p.matcher(strCell);

  4. #4
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,424
    My Mood
    Hungover
    Thanks
    144
    Thanked 636 Times in 540 Posts

    Default Re: How split a text looking for values with regular expression

    For future reference, saying that something didn't work doesn't give us a lot to work from. Try telling us what your code did instead, or provide an MCVE so we can see it for ourselves.

    I'm not a regex expert, but this seems to work:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
     
    public class Test {
    	public static void main(String... args){
     
    		String data = "DE 2, SF 1 = 00 SOME TEXT THAT DOESN'T INTEREST ME DE 22, SF 1 = 0 SOME TEXT THAT DOESN'T INTEREST ME DE 22, SF 4 = 1 SOME TEXT THAT DOESN'T INTEREST ME DE 22, SF 5 = 0 SOME TEXT THAT DOESN'T INTEREST ME DE 22, SF 6 = 11 SOME TEXT THAT DOESN'T INTEREST ME DE 22, SF 7 = 90x SOME TEXT THAT DOESN'T INTEREST ME DE 22, SF 7 = 12ab SOME TEXT THAT DOESN'T INTEREST ME DE 99 = 1234 SOME TEXT THAT DOESN'T INTEREST ME";
     
    		Pattern pattern = Pattern.compile("DE \\d+(, SF \\d+)* = \\d+");
    		Matcher matcher = pattern.matcher(data);
     
    		while (matcher.find()){
     
    			String wholeThing = matcher.group();
     
    		    System.out.println(wholeThing);
     
    		    String deDigit = wholeThing.substring(3, wholeThing.contains(",") ? wholeThing.indexOf(",") : wholeThing.indexOf("=") - 1);
     
    		    System.out.println("DE digit: " + deDigit);
     
    		    if(wholeThing.contains(",")){
    		    	String sfDigit = wholeThing.substring(wholeThing.indexOf(",") + 5, wholeThing.indexOf("=")-1);
    		    	System.out.println("SF digit: " + sfDigit);
    		    }
     
    		    String value = wholeThing.substring(wholeThing.indexOf("=") + 1).trim(); // trim drops the space after '='
    		    System.out.println("Value: " + value);
     
    		    System.out.println();
     
    		}
    	}
    }

  5. #5
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default Re: How split a text looking for values with regular expression

    Thank you so much. Can you please help me improve your example to get the values from this data?
    String data = "DE 1, SF 2 = A9A DE and SF with 1 digit DE 1, SF 22 = 1234abcd DE with 1 digit and SF with 2 digits DE 11, SF 22 = BC both DE and SF with 2 digits DE 33, SF 4 = 99 DE with 1 digit and SF with 2 digits DE 9 = A123 de without sf";
    I am still not interested in the description, but note that the id of DE or SF can be between 1 and 99 and the value is alphanumeric with a variable number of characters.

  6. #6
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,424
    My Mood
    Hungover
    Thanks
    144
    Thanked 636 Times in 540 Posts

    Default Re: How split a text looking for values with regular expression

    If I understand what you're asking, all you need to do is change the last bit of the regular expression from capturing digits (\\d) to capturing word characters (\\w).

    Pattern pattern = Pattern.compile("DE \\d+(, SF \\d+)* = \\w+");

    All of this is documented in the Pattern API: Pattern (Java Platform SE 7)
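
As a possible variation on the expression above, capture groups can hand back the ids and the value directly, so the substring arithmetic is not needed. This is a sketch only: the class name, the ENTRY constant and the extract helper are made up here, and `?` is used instead of `*` on the assumption that at most one SF follows a DE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {

    // Group 1 = DE id, group 2 = SF id (null when there is no SF), group 3 = value.
    // (?: ... ) groups without capturing; \b keeps "DE" from matching inside a longer word.
    static final Pattern ENTRY =
            Pattern.compile("\\bDE (\\d+)(?:, SF (\\d+))? = (\\w+)");

    // Hypothetical helper: one "DE=.. SF=.. value=.." string per match.
    static List<String> extract(String data) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(data);
        while (m.find()) {
            String sf = m.group(2) == null ? "(none)" : m.group(2);
            out.add("DE=" + m.group(1) + " SF=" + sf + " value=" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "DE 1, SF 2 = A9A DE and SF with 1 digit "
                + "DE 1, SF 22 = 1234abcd DE with 1 digit and SF with 2 digits "
                + "DE 11, SF 22 = BC both DE and SF with 2 digits "
                + "DE 9 = A123 de without sf";
        for (String entry : extract(data)) {
            System.out.println(entry);
        }
    }
}
```

The stray "DE" and "SF" words inside the descriptions are skipped because the pattern requires a number and an "=" right after them.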

  7. The Following User Says Thank You to KevinWorkman For This Useful Post:

    DemeCarv (August 10th, 2014)

  8. #7
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default Re: How split a text looking for values with regular expression

    KevinWorkman, hopefully you still see this thread. First, thank you for all the tips. I am facing a new obstacle: there is a description that I have to ignore between the id and "=". How can I read the values from DE and PDS and completely ignore the description between "(" and ")"?

    DE 4 (Amount, Transaction) = 440,800 COP
    DE 32 (Acquiring Institution ID Code) = 999692
    DE 33 (Forwarding Institution ID Code) = 999692
    PDS 0146 (Amounts, Transaction Fee) = 3,800 COP
    PDS 1012 (Customer ID) = 123456789bb (test value)
    PDS 1002 (Tax[Value Added Tax])= 60,800 COP
    PDS 1003 (Tax Amount Base)= 380,000 COP
    PDS 1013 (Interchange Code)= P
    PDS 1015 (Interchange Percentage)= 00100

  9. #8
    Forum VIP
    Join Date
    Jun 2011
    Posts
    317
    My Mood
    Bored
    Thanks
    47
    Thanked 89 Times in 74 Posts
    Blog Entries
    4

    Default Re: How split a text looking for values with regular expression

    You could get away with doing this using basic String methods. Use int indexOf(int ch) to find the location of the '(' and ')' characters, then concatenate two substrings with the bits that come before and after.

    It's possible to solve this with regular expressions but this is cleaner.
    Computers are fascinating machines, but they're mostly a reflection of the people using them.
    -- Jeff Atwood

  10. The Following User Says Thank You to ChristopherLowe For This Useful Post:

    DemeCarv (August 14th, 2014)

  11. #9
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default Re: How split a text looking for values with regular expression

    ChristopherLowe, thank you. I do need to use regular expressions. Even if I weren't expected to use regex, when I try to match with "DE \\d+(, SF \\d+)* = \\w+" I don't retrieve any group if there is something between the DE id and "=". I carefully looked at the Pattern API but I didn't find how to fix it. Your tip could help me if I could first retrieve the groups (e.g. DE 1 (bla bla), DE 99 (bla bla), PDS 1111 (bla bla)); then I could use your tip to remove the "(bla bla)". But I must first be able to get the groups.

  12. #10
    Forum VIP
    Join Date
    Jun 2011
    Posts
    317
    My Mood
    Bored
    Thanks
    47
    Thanked 89 Times in 74 Posts
    Blog Entries
    4

    Default Re: How split a text looking for values with regular expression

    Do you need to use regular expressions as part of an assignment or learning exercise? If so I'd recommend using an online tool like this to test your regular expression syntax. If not, well, there is a saying we use in my workplace: "I had a problem and used regular expressions. Now I have two problems."

    For this type of problem I like using String methods like .split and .indexOf to convert plain text to usable data.

    So if this is the data;

    DE 4 (Amount, Transaction) = 440,800 COP
    DE 32 (Acquiring Institution ID Code) = 999692
    DE 33 (Forwarding Institution ID Code) = 999692
    PDS 0146 (Amounts, Transaction Fee) = 3,800 COP
    PDS 1012 (Customer ID) = 123456789bb (test value)
    PDS 1002 (Tax[Value Added Tax])= 60,800 COP
    PDS 1003 (Tax Amount Base)= 380,000 COP
    PDS 1013 (Interchange Code)= P
    PDS 1015 (Interchange Percentage)= 00100
    I would start by splitting it up at the newline '\n'

    String[] rows = allThatJunkData.split("\n");

    Now each element of the rows array contains a string representing each row. You can iterate over this array like so;

    for (String row: rows) {
       // do something with the individual row
    }

    As I said above you can use the .indexOf and .substring methods to isolate the bits you are interested in for each row.

    int startOfParenthesis = row.indexOf('(');  // position of (
    int endOfParenthesis = row.indexOf(')');  // position of )
     
    String firstColumn = row.substring(0, startOfParenthesis);  // everything before the (
    String secondColumn = row.substring(endOfParenthesis + 1);  // everything after the )

    This is all junky code I made up on the fly so there are probably bugs but the point is now you will have:

    rows[0] == "DE 4 (Amount, Transaction) = 440,800 COP"
    firstColumn == "DE 4 "
    secondColumn == " = 440,800 COP"
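
Put together, the steps above might look like the following runnable sketch. The class name RowSketch and the helper stripDescription are made up for illustration; it assumes one entry per line and removes only the first "(...)" in each row.

```java
public class RowSketch {

    // Hypothetical helper: removes the first "(...)" description from a row.
    // Returns the row unchanged when it has no complete pair of parentheses.
    static String stripDescription(String row) {
        int open = row.indexOf('(');              // position of the first (
        int close = row.indexOf(')', open + 1);   // first ) after it
        if (open < 0 || close < 0) {
            return row;
        }
        // Everything before the ( plus everything after the ).
        return row.substring(0, open) + row.substring(close + 1);
    }

    public static void main(String[] args) {
        String allThatJunkData = "DE 4 (Amount, Transaction) = 440,800 COP\n"
                + "PDS 1013 (Interchange Code)= P";
        for (String row : allThatJunkData.split("\n")) {
            System.out.println(stripDescription(row));
        }
    }
}
```

Because only the first pair of parentheses is removed, a trailing note such as "(test value)" after the value is left in place.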

  13. #11
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,424
    My Mood
    Hungover
    Thanks
    144
    Thanked 636 Times in 540 Posts

    Default Re: How split a text looking for values with regular expression

    I'm not totally sure what you're asking. Can you show us an MCVE showing the new data, as well as the new regular expression you're trying to use? What does it match? What doesn't it match?

  14. #12
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default Re: How split a text looking for values with regular expression

    There are 4 patterns. The first two I am able to retrieve thanks to KevinWorkman's help. The third I am still struggling to retrieve. Once I fix the third I will try the fourth. Is ("DE \\d+ (w+) (, SF \\d+)* = \\w+") the regex for picking up the ids and values in cell 3?
    Note: bla bla means any text that I want to ignore.

    Spreadsheet:
    cell 1:
    "bla bla can be any text
    DE 1 = 00 bla bla
    DE 22 = A1B bla bla
    bla bla"
    cell 2
    "bla bla
    DE 3, SF 1 = 00 bla bla
    DE 0052, SF 3 = 1 bla bla
    bla bla"
    cell 3
    "bla bla
    DE 4 (bla bla) = ABC
    DE 88 (bla bla) = 1234
    bla bla"
    cell 4
    "bla bla
    DE 4 (bla bla), subfield 12 (bla bla) = ABC
    DE 88 (bla bla), subfield 55 (bla bla) = 1234
    bla bla"

    package com.parse;
     
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
     
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;
     
    public class Import3 {
     
    	public static void main(String[] args) {
    		try {
     
    			/*
    			 There are 4 patterns of identifiers:
    			 1 - DE 99 = value
    			 2 - DE 99, SF 99 = value
    			 3 - DE 99 (some description) = value
    			 4 - DE 99 (some description), subfield 99 (some description) = value
    			 */
    			InputStream inp = new FileInputStream(
    					"/home/deme/the_spreadSheetWith4CellsAbove.xlsx");
    			Workbook wb;
     
    			wb = WorkbookFactory.create(inp);
    			Sheet sheet = wb.getSheetAt(0);
    //			Row row = sheet.getRow(0); //pattern 1 
    //			Row row = sheet.getRow(1); //pattern 2
    			Row row = sheet.getRow(2); //pattern 3
    //			Row row = sheet.getRow(3); //pattern 4
    			Cell cell = row.getCell(0);
    			String strCell = cell.getStringCellValue();
     
    			String data = strCell; 
     
    //			Pattern pattern1_and_2 = Pattern.compile("DE \\d+(, SF \\d+)* = \\w+");
    			Pattern pattern3 = Pattern.compile("DE \\d+ (w+) (, SF \\d+)* = \\w+"); // this is where my real difficulty lies
    //			Pattern pattern4 = Pattern.compile("DE \\d+ (w+) (, subfield \\d+ (w+) )* = \\w+");
    			Matcher matcher = pattern3.matcher(data);
     
    			while (matcher.find()) {
     
    				String wholeThing = matcher.group();
     
    				System.out.println(wholeThing);
     
    				String deDigit = wholeThing.substring(3, wholeThing
    						.contains(",") ? wholeThing.indexOf(",") : wholeThing
    						.indexOf("=") - 1);
     
    				System.out.println("DE digit: " + deDigit);
     
    				if (wholeThing.contains(",")) {
    					String seDigit = wholeThing.substring(wholeThing
    							.indexOf(",") + 5, wholeThing.indexOf("=") - 1);
    					System.out.println("SF digit: " + seDigit);
    				}
     
    				String value = wholeThing
    						.substring(wholeThing.indexOf("=") + 1);
    				System.out.println("Value: " + value);
     
    				System.out.println();
     
    			}
    		} catch (Exception e) {
    			System.out.println(e.getMessage());
    		}
    	}
     
    }
    Last edited by DemeCarv; August 14th, 2014 at 08:15 PM. Reason: better format the code
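
One likely reason the third pattern finds nothing: in a regex, `(w+)` is a capturing group that matches one or more literal "w" characters, not text in parentheses. Literal parentheses have to be escaped as `\\(` and `\\)`, and `\\w+` would not match the space inside "bla bla" anyway. A possible sketch covering all four patterns is below. The class name, the DESC and ENTRY constants and the extract helper are made up for illustration, and it assumes that descriptions never contain a ")" and that values are plain word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourPatternSketch {

    // Optional "(description)": the parentheses are escaped, [^)]* skips the text inside.
    static final String DESC = "(?:\\s*\\([^)]*\\))?";

    // Group 1 = DE id, group 2 = sub field id (null when absent), group 3 = value.
    static final Pattern ENTRY = Pattern.compile(
            "\\bDE (\\d+)" + DESC
            + "(?:, (?:SF|subfield) (\\d+)" + DESC + ")?"
            + "\\s*=\\s*(\\w+)");

    // Hypothetical helper: one "de/sf/value" string per match; sf is "-" when absent.
    static List<String> extract(String cell) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(cell);
        while (m.find()) {
            String sf = m.group(2) == null ? "-" : m.group(2);
            out.add(m.group(1) + "/" + sf + "/" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        String cell3 = "bla bla\nDE 4 (bla bla) = ABC\nDE 88 (bla bla) = 1234\nbla bla";
        String cell4 = "bla bla\nDE 4 (bla bla), subfield 12 (bla bla) = ABC\n"
                + "DE 88 (bla bla), subfield 55 (bla bla) = 1234\nbla bla";
        System.out.println(extract(cell3));
        System.out.println(extract(cell4));
    }
}
```

For the two sample cells this prints [4/-/ABC, 88/-/1234] and [4/12/ABC, 88/55/1234]. Values with punctuation or a unit, such as "440,800 COP" in the earlier PDS data, would need a wider value group than `\\w+`.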

  15. #13
    Member
    Join Date
    Apr 2014
    Posts
    31
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Default Re: How split a text looking for values with regular expression

    I am still looking for help with this challenge and would appreciate any suggestions.
