Enhancing a program that removes whitespace from a text file
Right, I'm back again to pick your brains (which normally means JavaPF's, lol ;))
Basically, I've got some code below which removes all the whitespace from a text file.
Originally it read through the whole file line by line, but this can take forever, as some of the files I'm dealing with can be quite large.
What I want the program to do is remove the whitespace only from the XML tags, not from the whole file. For instance, the tags appear as < t a g >< / t a g > instead of the normal <tag></tag>, which is a problem when I'm trying to use DOMParse to get the XML from the file.
As you can see from the code, I added the line
Code :
if (strLine.contains("< M D R - D V D >"))
This was just to see whether the program would pick up that particular tag, which it did; however, my file has loads of different tags.
My question is: is there any way of modifying the code so that the program picks up all the tags with a single line of code, without having to enter every single tag name?
I have a few other questions, but I'll get this one out of the way first.
Code :
import java.util.regex.*;
import java.io.*;
public class regularexpressions {
    public static void main(String[] args) throws IOException {
        // Ask for the file name on standard input
        BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Enter file name: ");
        String filename = bf.readLine();
        File file = new File(filename);
        if (!filename.endsWith(".txt")) {
            System.out.println("Usage: This is not a text file!");
            System.exit(0);
        }
        else if (!file.exists()) {
            System.out.println("File not found!");
            System.exit(0);
        }
        FileInputStream fstream = new FileInputStream(filename);
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        Pattern p;
        Matcher m;
        String afterReplace = "";
        String strLine;
        String inputText = "";
        while ((strLine = br.readLine()) != null) {
            // Only lines containing this one hard-coded tag are processed and kept
            if (strLine.contains("< M D R - D V D >")) {
                System.out.println(strLine);
                inputText = strLine;
                p = Pattern.compile("\\s+");
                m = p.matcher(inputText);
                System.out.println(afterReplace);
                // Strip ALL whitespace from the line, not just the whitespace inside the tags
                afterReplace = afterReplace + m.replaceAll("") + "\r\n";
            }
        }
        // Overwrite the input file with the processed lines
        FileWriter fstream1 = new FileWriter(filename);
        BufferedWriter out = new BufferedWriter(fstream1);
        out.write(afterReplace);
        in.close();
        out.close();
    }
}
Thanks
John
Re: Removing whitespace from text file
It's too late for me to look into the code right now, but I might have a look in the morning if you still need it. My suggestion is that you look for a < followed by a >, which marks an opening or closing tag, i.e. it would pick up <tag> or </tag>, which is what you require, and don't worry about what is in between.
Hope that makes sense,
Chris
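The suggestion above can be sketched roughly as follows. This is only an illustration: the class name TagFinder, the method findTags and the pattern <[^<>]+> are assumptions made for the example, not code from the thread. The pattern treats anything between a < and the next > as a tag, without checking that it is valid XML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects everything that looks like a tag.
class TagFinder {
    // '<', then one or more characters that are neither '<' nor '>', then '>'
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Returns every tag-like span in the text, in order of appearance.
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "< M D R - D V D >some data< / M D R - D V D >";
        for (String tag : findTags(line)) {
            System.out.println(tag);
        }
    }
}
```

No tag name is hard-coded here, so the same pattern picks up every tag in a line.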
Re: Removing whitespace from text file
Hi John,
To remove the whitespace from the XML tags, try this:
Code :
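String inputString = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a op/>";
String regExp = "<.[^(><.)]+>";
// m keeps scanning the original string; inputString is reassigned to the cleaned copy
Matcher m = Pattern.compile(regExp).matcher(inputString);
while (m.find()) {
    // replace() treats the tag as literal text, so regex metacharacters inside a tag are safe
    inputString = inputString.replace(m.group(), m.group()
            .replaceAll(" ", ""));
}
System.out.println(inputString);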
The code above will convert this string:
Code :
String inputString = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a op/>";
into this:
Code :
askaslk asasas alsklas <sasak>lallala <sasak/> popp <aop/>
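For larger inputs, the same idea can be done in a single pass with Matcher.appendReplacement and Matcher.appendTail, which avoids rescanning the whole string for every tag. The sketch below is an alternative, not the code posted above: the class name TagWhitespaceStripper, the method stripInsideTags and the simplified pattern <[^<>]+> are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: strips whitespace inside tags only, in one pass.
class TagWhitespaceStripper {
    // Assumed pattern: anything between '<' and the next '>'
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    static String stripInsideTags(String text) {
        Matcher m = TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Remove every whitespace character from the matched tag
            String cleaned = m.group().replaceAll("\\s+", "");
            // quoteReplacement keeps '$' and '\' in the tag from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        // Copy whatever follows the last tag unchanged
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String inputString = "askaslk asasas alsklas <s asa k>lallala <s asa k/> popp <a op/>";
        System.out.println(stripInsideTags(inputString));
    }
}
```

One caveat applies to both versions: whitespace that separates attributes is removed too, so a tag such as <a href="x"> would come out as <ahref="x">.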
Re: Removing whitespace from text file
Hey leandro, thanks for the code.
In my code I replaced the following line:
Code :
p = Pattern.compile("\\s+");
with the regular expression you provided above:
Code :
p = Pattern.compile("<.[^(><.)]+>");
It seemed to pull out all the data within the tags, but not the tags themselves.
I think your code is right; however, I need to alter some of my code to make it work.
I messed about with the code for a while but kept getting different errors, so I've decided to stick with my original method for now, as I have another question ;)
I'll start a new thread, though, and come back to this one afterwards.
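A likely explanation for that result, assuming the rest of the original loop was left unchanged: m.replaceAll("") deletes whatever the pattern matches. With \\s+ the matches are runs of whitespace, so the whitespace goes; with the tag pattern the matches are the tags themselves, so the tags go and only the data is left. The small demo below illustrates this; the class name ReplaceAllDemo and the method removeMatches are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo of what replaceAll("") does with each pattern.
class ReplaceAllDemo {
    // Deletes every match of the regex from the line
    static String removeMatches(String regex, String line) {
        return Pattern.compile(regex).matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "< t a g >some data< / t a g >";
        // Whitespace pattern: all whitespace is deleted, tags and data are kept
        System.out.println(removeMatches("\\s+", line));          // <tag>somedata</tag>
        // Tag pattern: the tags are the matches, so the tags are deleted
        System.out.println(removeMatches("<.[^(><.)]+>", line));  // some data
    }
}
```

So the tag pattern is meant for locating the tags; the whitespace still has to be stripped from each match separately, as in leandro's loop.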
Re: Removing whitespace from text file
John, you only need to put all the text from the file into inputString and my code will work.
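A minimal sketch of that, under some assumptions: the file is UTF-8, it is small enough to hold in memory as one string, and Java 9 or later is available (for Matcher.replaceAll with a function argument). The class name WholeFileTagCleaner and the methods clean and cleanFile are hypothetical; the pattern is the one posted earlier in the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper: reads the whole file, cleans the tags, writes it back.
class WholeFileTagCleaner {
    // Tag pattern taken from the thread
    private static final Pattern TAG = Pattern.compile("<.[^(><.)]+>");

    // Removes whitespace inside each matched tag; text between tags is untouched.
    static String clean(String inputString) {
        return TAG.matcher(inputString).replaceAll(
                mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s+", "")));
    }

    // Assumes UTF-8 and that the file fits in memory.
    static void cleanFile(Path file) throws IOException {
        String inputString = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, clean(inputString).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java WholeFileTagCleaner <file>");
            return;
        }
        cleanFile(Paths.get(args[0]));
    }
}
```

For the very large files mentioned at the start of the thread, loading everything at once may not be practical; in that case the same clean method could be applied line by line, provided no tag is split across lines.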