Welcome to the Java Programming Forums

Thread: Problems with Formatted Output XML File

  1. #1
    Junior Member

    Problems with Formatted Output XML File

    I am trying to use Java to convert a .txt file to XML.
    My program successfully converts the text file to an XML file, but I can't get it to produce all the tags I want.
    Here is a sample of my input text file:

    <DOC>
    <DOCNO>3393</DOCNO>
    <TEXT>
          Biblical Traditions  
    </TEXT>
    </DOC>
     
    <DOC>
    <DOCNO>42027</DOCNO>
    <TEXT>
        Automobiles   
    </TEXT>
    </DOC>
     
    <DOC>
    <DOCNO>7456</DOCNO>
    <TEXT>
         Fruits and Vegetables
    </TEXT>
    </DOC>

    When I run my current code, the output XML file looks like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <DOC>
        <DOC>
            <TEXT></TEXT>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT>      Biblical Traditions            </TEXT>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT>    Automobiles   </TEXT>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>
        <DOC>
            <TEXT/>
        </DOC>

    It's creating redundant <DOC> and <TEXT> tags, and the <DOCNO> values are missing. Here is my current code; I need help modifying it to include the <DOCNO> tag:
    package convert;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.*;

    public class convertToXML {
        BufferedReader in;
        StreamResult out;

        Document xmldoc;
        Element root;

        public static void main(String[] args) {
            new convertToXML().doit();
        }

        public void doit() {
            try {
                in = new BufferedReader(new FileReader("C:\\text.txt"));
                out = new StreamResult("C:\\newXML.xml");
                initXML();
                String str;
                // process() is called once for every line of the input file
                while ((str = in.readLine()) != null) {
                    process(str);
                }
                in.close();
                writeXML();
            }
            catch (Exception e) { e.printStackTrace(); }
        }

        public void initXML() throws ParserConfigurationException {
            // JAXP + DOM: create an empty document whose root element is <DOC>
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            DOMImplementation impl = builder.getDOMImplementation();

            xmldoc = impl.createDocument(null, "DOC", null);
            root = xmldoc.getDocumentElement();
        }

        public void process(String s) {
            // Split the line on '<' (escaped in the regex)
            String[] elements = s.split("\\<");

            // A new <DOC><TEXT> pair is created for EVERY line, including tag-only
            // lines such as "<DOC>" or "</TEXT>", whose elements[0] is empty
            Element e0 = xmldoc.createElement("DOC");

            Element e1 = xmldoc.createElement("TEXT");
            Node n1 = xmldoc.createTextNode(elements[0]);
            e1.appendChild(n1);

            //Element e2 = xmldoc.createElement("TEXT");
            //Node n2 = xmldoc.createTextNode(elements[1]);
            //e2.appendChild(n2);

            e0.appendChild(e1);
            //e0.appendChild(e2);
            root.appendChild(e0);
        }

        public String writeXML() throws TransformerConfigurationException, TransformerException {
            DOMSource domSource = new DOMSource(xmldoc);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();

            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");

            // Write the document to the output file
            transformer.transform(domSource, out);

            // Also serialize it to a String (doit() does not use the return value)
            java.io.StringWriter sw = new java.io.StringWriter();
            StreamResult sr = new StreamResult(sw);
            transformer.transform(domSource, sr);
            return sw.toString();
        }
    }
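The redundant tags come from process() treating each line on its own: every line, even a bare "<DOC>" or blank line, becomes a new <DOC><TEXT> pair. One way to get a single <DOC> per record, with its <DOCNO>, is to keep state across lines. This is a minimal sketch, assuming every record follows the layout of the sample above (tags on their own lines, or the one-line <TEXT>...</TEXT> form the client describes later in the thread). The class name RecordConverter, the helper addChild, and the root element name DOCS are hypothetical; it reads from a String so it can be tried without the C:\ files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RecordConverter {

    // Appends <name>value</name> (value trimmed) to parent.
    private static void addChild(Document doc, Element parent, String name, String value) {
        Element e = doc.createElement(name);
        e.setTextContent(value.trim());
        parent.appendChild(e);
    }

    // Builds one <DOC> per record (not per line), keeping DOCNO and TEXT.
    public static Document convert(BufferedReader in)
            throws IOException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("DOCS"); // hypothetical root element name
        doc.appendChild(root);

        Element current = null;    // the <DOC> currently being built
        StringBuilder text = null; // non-null while between <TEXT> and </TEXT>
        String line;
        while ((line = in.readLine()) != null) {
            String t = line.trim();
            if (t.equals("<DOC>")) {
                current = doc.createElement("DOC");
            } else if (current == null) {
                continue; // ignore anything outside a record, e.g. blank lines
            } else if (t.startsWith("<DOCNO>") && t.endsWith("</DOCNO>")) {
                addChild(doc, current, "DOCNO", t.substring(7, t.length() - 8));
            } else if (t.startsWith("<TEXT>") && t.endsWith("</TEXT>")) {
                addChild(doc, current, "TEXT", t.substring(6, t.length() - 7)); // one-line form
            } else if (t.equals("<TEXT>")) {
                text = new StringBuilder();
            } else if (t.equals("</TEXT>") && text != null) {
                addChild(doc, current, "TEXT", text.toString());
                text = null;
            } else if (t.equals("</DOC>")) {
                root.appendChild(current);
                current = null;
            } else if (text != null && !t.isEmpty()) {
                if (text.length() > 0) text.append(' ');
                text.append(t); // multi-line text is joined with single spaces
            }
        }
        return doc;
    }

    // Serializes the document with indentation, as the original writeXML() does.
    public static String toXml(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String sample = "<DOC>\n<DOCNO>3393</DOCNO>\n<TEXT>\n      Biblical Traditions  \n</TEXT>\n</DOC>\n\n"
                + "<DOC>\n<DOCNO>42027</DOCNO>\n<TEXT>\n    Automobiles   \n</TEXT>\n</DOC>\n";
        Document doc = convert(new BufferedReader(new StringReader(sample)));
        System.out.println(toXml(doc));
    }
}
```

To use it on the real file, pass new BufferedReader(new FileReader("C:\\text.txt")) to convert() and write the result with a StreamResult pointing at the output file, as writeXML() already does.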


  2. #2
    Super Moderator Norm

    Re: Problems with Formatted Output XML File

    If you don't understand my answer, don't ignore it, ask a question.

  3. #3
    Junior Member

    Re: Problems with Formatted Output XML File

    I agree. I'm trying to code it myself because I couldn't find any tools online that suit my needs, and whenever I tried parsing the file later on I ran into errors.
    If anyone could offer some insight into parsing the text tags, that would be great.

    Cheers.

  4. #4
    Super Moderator Norm

    Re: Problems with Formatted Output XML File

    The .txt file already looks like it's in XML format. What happens if you wrap the whole file in a single pair of tags, <OUTER> and </OUTER>?
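Wrapping helps because an XML document must have exactly one root element, and the file as given has one top-level <DOC> per record. A minimal sketch of the idea, assuming the raw records are already in a String (in a real program, for example, read with Files.readString on Java 11+). The class name WrapAndParse is hypothetical. One caveat that is part of the XML rules: an <?xml ...?> declaration is only allowed at the very start of a document, so if the raw text carries one it has to be removed before wrapping.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WrapAndParse {

    // Wraps the raw records in one root element so the parser sees a single document.
    // raw must NOT contain its own <?xml ...?> declaration.
    static Document parseWrapped(String raw) throws Exception {
        String wrapped = "<OUTER>" + raw + "</OUTER>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(wrapped)));
    }

    public static void main(String[] args) throws Exception {
        String raw = "<DOC>\n<DOCNO>3393</DOCNO>\n<TEXT>\n  Biblical Traditions  \n</TEXT>\n</DOC>\n\n"
                + "<DOC>\n<DOCNO>42027</DOCNO>\n<TEXT>\n  Automobiles  \n</TEXT>\n</DOC>\n";
        Document doc = parseWrapped(raw);
        NodeList docs = doc.getElementsByTagName("DOC");
        for (int i = 0; i < docs.getLength(); i++) {
            Element d = (Element) docs.item(i);
            // trim() drops the newlines and spaces around the values
            String no = d.getElementsByTagName("DOCNO").item(0).getTextContent().trim();
            String text = d.getElementsByTagName("TEXT").item(0).getTextContent().trim();
            System.out.println(no + " -> " + text);
        }
    }
}
```

Running main prints "3393 -> Biblical Traditions" and then "42027 -> Automobiles".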

  5. #5
    Junior Member

    Re: Problems with Formatted Output XML File

    I tried wrapping it in <OUTER> tags and parsing it.
    Even with the XML declaration
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    at the top, I still get this error:


    The markup in the document following the root element must be well-formed

    I don't know why it's doing that. The client defined the input file format as:

    <DOC>
    <DOCNO>12345</DOCNO>
    <TEXT>Title of Book</TEXT>
    </DOC>

    There are about 500 of those records in a single file. I don't understand how it isn't well-formed.
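That message is what the JDK's built-in parser reports when more markup follows the end of the first root element, for example a second top-level <DOC>; other parsers word it differently. A misplaced <?xml ...?> declaration (anywhere but the very first characters of the document) is also a fatal error. A small sketch, assuming you can load the text into a String, that reports the line and column of the first problem so you can look at that spot in the 500-record file. The class name WellFormedCheck is hypothetical; SAXParseException.getLineNumber() and getColumnNumber() are standard SAX API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns null if xml is well-formed, otherwise "line:column: message".
    static String check(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors but does not print them to stderr
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String twoRoots = "<DOC><DOCNO>1</DOCNO></DOC>\n<DOC><DOCNO>2</DOCNO></DOC>";
        System.out.println(check(twoRoots));                          // a line:column and the parser's message
        System.out.println(check("<OUTER>" + twoRoots + "</OUTER>")); // prints null
    }
}
```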

  6. #6
    Super Moderator Norm

    Re: Problems with Formatted Output XML File

    The markup in the document following the root element must be well-formed
    What program gives that error message? I opened the file in the Firefox browser,
    and it shows this message:
    This XML file does not appear to have any style information associated with it. The document tree is shown below.

    Here is the file it reads:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <OUTER>
    <DOC>
    <DOCNO>3393</DOCNO>
    <TEXT>
          Biblical Traditions  
    </TEXT>
    </DOC>
     
    <DOC>
    <DOCNO>42027</DOCNO>
    <TEXT>
        Automobiles   
    </TEXT>
    </DOC>
     
    <DOC>
    <DOCNO>7456</DOCNO>
    <TEXT>
         Fruits and Vegetables
    </TEXT>
    </DOC>
    </OUTER>

  7. #7
    Junior Member

    Re: Problems with Formatted Output XML File

    I am now able to parse some of the XML. However, I've run into a new error:
    The entity "eacute" was referenced, but not declared.

    From what I've read online, I need to declare "eacute" in the DTD of the XML file. How would I go about doing that?
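XML predefines only five entities (&lt; &gt; &amp; &quot; &apos;), so HTML names such as &eacute; must be declared, which an internal DTD subset placed before the root element can do. A minimal sketch, assuming the records are wrapped in <OUTER> as Norm suggested and that eacute is the only undeclared entity; every other one that appears in the data (for example &egrave;) would need its own <!ENTITY> line. The class name EntityDeclare is hypothetical. An alternative, not shown, is to replace &eacute; with the character reference &#233; before parsing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDeclare {

    // Internal DTD subset: declares eacute as the character U+00E9.
    // Add one <!ENTITY ...> line per HTML-style entity that occurs in the data.
    static final String DOCTYPE =
            "<!DOCTYPE OUTER [\n"
          + "  <!ENTITY eacute \"&#233;\">\n"
          + "]>\n";

    // The DOCTYPE must come before the root element; its name matches the root (OUTER).
    static Document parse(String records) throws Exception {
        String xml = DOCTYPE + "<OUTER>" + records + "</OUTER>";
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<DOC><DOCNO>1</DOCNO><TEXT>Caf&eacute;</TEXT></DOC>");
        // The parser expands &eacute; while building the DOM
        System.out.println(doc.getElementsByTagName("TEXT").item(0).getTextContent());
    }
}
```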

  8. #8
    Super Moderator Norm

    Re: Problems with Formatted Output XML File

    Sorry, that sounds like an XML question, not a Java programming question.
    I do not know XML. Try asking on an XML forum.
