
Thread: Enhancement in program of removing whitespace from text file

  1. #1
    Junior Member
    Join Date
    Mar 2009
    Posts
    28
    Thanks
    5
    Thanked 0 Times in 0 Posts

    Enhancement in program of removing whitespace from text file

    Right, I'm back again to pick your brains (which normally means JavaPF, lol).

    Basically, I've got some code below which removes all the whitespace from a text file.

    Originally it read through the whole file line by line, but this can take forever, as some of the files I'm dealing with can be quite large.

    What I want the program to do is remove the whitespace only from the XML tag headers, not from the whole file. For instance, the tags appear as < t a g >< / t a g > instead of the normal <tag></tag>, which is a problem when I'm trying to use DOMParse to get the XML from the file.

    As you can see from the code, I added the line
     if (strLine.contains("< M D R - D V D >"))

    This was just to see if the program would pick up that particular tag, which it did. However, my file has loads of different tags.

    My question is: is there any way of modifying the code so that the program picks up all the tags with a single line of code, without my having to enter every single tag name?

    I have a few other questions, but I'll get this one out of the way first.

     
    import java.io.*;
    import java.util.regex.*;

    public class regularexpressions {
      public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Enter file name: ");
        String filename = bf.readLine();
        File file = new File(filename);
        if (!filename.endsWith(".txt")) {
          System.out.println("Usage: This is not a text file!");
          System.exit(0);
        } else if (!file.exists()) {
          System.out.println("File not found!");
          System.exit(0);
        }
        // Compile the whitespace pattern once, outside the loop.
        Pattern p = Pattern.compile("\\s+");
        StringBuilder afterReplace = new StringBuilder();
        String strLine;
        // Read the whole file first; it is reopened for writing afterwards.
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
          while ((strLine = br.readLine()) != null) {
            if (strLine.contains("< M D R - D V D >")) {
              System.out.println(strLine);
              // Strip all whitespace from lines that hold the test tag.
              strLine = p.matcher(strLine).replaceAll("");
            }
            // Keep every line, so unmatched lines are not lost on rewrite.
            afterReplace.append(strLine).append("\r\n");
          }
        }
        // Overwrite the input file with the processed text.
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
          out.write(afterReplace.toString());
        }
      }
    }

    Thanks

    John


  2. #2
    Senile Half-Wit Freaky Chris's Avatar
    Join Date
    Mar 2009
    Posts
    834
    My Mood
    Cynical
    Thanks
    7
    Thanked 105 Times in 90 Posts

    Re: Removing whitespace from text file

    It's too late for me to look into the code right now, but I might have a look in the morning if you still need it. My suggestion is that you look for a < followed by a >, which marks the opening or closing of a tag, i.e. it would pick up <tag> or </tag>, which is what you require, and don't worry about what is in between.

    Hope that makes sense,
    Chris
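
Chris's suggestion can be sketched without any regular expression. The class and method names below (`TagScanner`, `stripInsideTags`) are hypothetical and not from the thread. The sketch assumes every `<` in the text opens a tag and every `>` closes one; it would also remove spaces between attributes, which is acceptable only if the files look like the `< t a g >` example in the first post.

```java
// Hypothetical helper class; not part of the original thread.
class TagScanner {
    // Copies the input, dropping whitespace only between '<' and '>'.
    static String stripInsideTags(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean insideTag = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                insideTag = true;   // a tag starts here
            }
            // Skip whitespace only while we are inside a tag.
            if (!(insideTag && Character.isWhitespace(c))) {
                out.append(c);
            }
            if (c == '>') {
                insideTag = false;  // the tag ends here
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <tag>some text</tag>
        System.out.println(stripInsideTags("< t a g >some text< / t a g >"));
    }
}
```

Text outside the angle brackets, including its spaces, is copied through unchanged.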

  3. #3

    Re: Removing whitespace from text file

    Hi John,

    To remove the white space from the xml tag headers try this:

        String inputString = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a   op/>";
        String regExp = "<.[^(><.)]+>";
        Matcher m = Pattern.compile(regExp).matcher(inputString);
        while (m.find()) {
            inputString = inputString.replaceFirst(m.group(), m.group()
                    .replaceAll(" ", ""));
        }
        System.out.println(inputString);

    The code above will convert this string:

    String inputString = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a   op/>";

    into this:

    askaslk asasas alsklas <sasak>lallala <sasak/> popp <aop/>
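
One caveat with the loop above: `replaceFirst` treats its first argument as a regular expression, so a tag containing metacharacters such as `?` or `$` may not be replaced as intended. A possible variant, sketched here with `Matcher.appendReplacement`, builds the result in one pass instead. The class name `TagWhitespaceRegex` and the simpler pattern `<[^<>]+>` are assumptions of this sketch, not leandro's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch, not the code posted in the thread.
class TagWhitespaceRegex {
    // Assumed pattern: '<', then anything except angle brackets, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    static String stripInsideTags(String text) {
        Matcher m = TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Remove every run of whitespace inside the matched tag.
            String cleaned = m.group().replaceAll("\\s+", "");
            // quoteReplacement keeps '$' and '\' in the tag literal.
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(out); // copy the text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a   op/>";
        // Prints: askaslk asasas alsklas <sasak>lallala <sasak/> popp <aop/>
        System.out.println(stripInsideTags(input));
    }
}
```

Because the matcher keeps working on the original string and only the output buffer changes, the result does not depend on where earlier replacements landed.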

  4. The Following User Says Thank You to leandro For This Useful Post:

    JavaPF (April 26th, 2009)

  5. #4
    Junior Member
    Join Date
    Mar 2009
    Posts
    28
    Thanks
    5
    Thanked 0 Times in 0 Posts

    Re: Removing whitespace from text file

    Originally posted by leandro:
    Hi John,

    To remove the white space from the xml tag headers try this:

        String inputString = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a   op/>";
        String regExp = "<.[^(><.)]+>";
        Matcher m = Pattern.compile(regExp).matcher(inputString);
        while (m.find()) {
            inputString = inputString.replaceFirst(m.group(), m.group()
                    .replaceAll(" ", ""));
        }
        System.out.println(inputString);

    Hey leandro, thanks for the code.

    In my code I replaced the following line:
    p = Pattern.compile("\\s+");

    with the regular expression you provided above:
    p = Pattern.compile("<.[^(><.)]+>");

    It seemed to pull all the data within the tags, but not the tag header itself.
    I think your code is right, but I need to alter some of my code to make it work.
    I messed about with the code for a while but kept getting different errors, so I decided to stick with my original method for now, as I have another question.

    I'll start a new thread, though, and come back to this one after.
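
A likely explanation for that result, assuming the rest of the program still called `m.replaceAll("")`: once the tag pattern takes the place of `\\s+`, the whole tag is the match, so the whole tag is what gets replaced with the empty string. The class name `ReplaceAllDemo` and the sample line are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the sample line is invented for illustration.
class ReplaceAllDemo {
    // Deletes every match of the given regex from the line.
    static String removeMatches(String regex, String line) {
        return Pattern.compile(regex).matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "< n a m e >Star Wars< / n a m e >";
        // Whitespace pattern: every run of whitespace is deleted.
        // Prints: <name>StarWars</name>
        System.out.println(removeMatches("\\s+", line));
        // Tag pattern: every whole tag is deleted, leaving only the data.
        // Prints: Star Wars
        System.out.println(removeMatches("<.[^(><.)]+>", line));
    }
}
```

Getting the intended result needs two steps: find each tag with the tag pattern, then strip the whitespace inside that match only.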

  6. #5

    Re: Removing whitespace from text file

    John, you only need to put all the text from the file into inputString and my code will work.
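
Reading the whole file into one string could look roughly like the sketch below. It assumes Java 9 or later (for `Matcher.replaceAll` with a function), UTF-8 encoded files, and a file small enough to fit in memory. The class name `WholeFileTagFix`, the method `fixFile`, and the pattern `<[^<>]+>` are hypothetical choices, not taken from the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; a sketch under the assumptions stated above.
class WholeFileTagFix {
    // Assumed pattern: '<', then anything except angle brackets, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Rewrites the file in place with whitespace removed inside each tag.
    static void fixFile(Path path) throws IOException {
        // Load the entire file as one string (UTF-8 is an assumption).
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        // For each tag, substitute the same tag with its whitespace removed.
        String fixed = TAG.matcher(text).replaceAll(
                mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s+", "")));
        Files.write(path, fixed.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WholeFileTagFix.java <path-to-file>
        fixFile(Paths.get(args[0]));
    }
}
```

For the very large files mentioned in the first post, holding everything in memory may not be practical, in which case a line-by-line or streaming approach would be the safer choice.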
