
Thread: Conversion of a PDF file to text using iText in Java

  1. #1
    Junior Member
    Join Date
    Dec 2013
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Conversion of a PDF file to text using iText in Java

    I have a problem when reading the content of a PDF and writing it to a Word document (doc/docx) through byte streams. The generated Word document contains junk characters instead of the original text.

    Here is my code:

     
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.ContentByteUtils;
    import com.lowagie.text.Document;
    import com.lowagie.text.DocumentException;
    import com.lowagie.text.Paragraph;
    import com.lowagie.text.rtf.RtfWriter2;

    public class Check1 {

        public static void main(String[] args) throws FileNotFoundException, IOException, DocumentException {

            // Open the source PDF and report how many pages it has.
            PdfReader reader = new PdfReader("/home/mujafar/Desktop/NPTEL Transcription Guidelines.pdf");
            int n = reader.getNumberOfPages();
            System.out.println("total no of pages:::" + n);

            // Target document; RtfWriter2 writes RTF output to the given stream.
            Document document = new Document();
            RtfWriter2.getInstance(document, new FileOutputStream("/home/mujafar/Desktop/file.docx"));
            System.out.println("file created");
            document.open();

            byte[] bytes;
            for (int i = 1; i <= n; i++) {
                // Content bytes of page i, turned into a String with the platform default charset.
                bytes = ContentByteUtils.getContentBytesForPage(reader, i);
                String s = new String(bytes);

                // One paragraph per source page, each on its own output page.
                document.add(new Paragraph(s));
                document.newPage();
            }

            document.close();
        }
    }


  2. #2
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Re: conversion of pdf file to text using itext in java

    Try isolating the problem. Is it on the input side or the output side?
    What is in the String s? Add a println(s) to see what is there.
    If you don't understand my answer, don't ignore it, ask a question.
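One way to act on this advice is to look at the decoded string before it ever reaches the output document. The following is a minimal sketch using only the standard library. It assumes that the bytes returned for a page are the page's raw content stream, that is, PDF drawing operators such as `BT`, `Tf`, `Tj` and `ET`, rather than plain readable text. `SAMPLE_PAGE_BYTES` is a made-up stand-in for such bytes, and the class and method names are hypothetical helpers, not part of iText.

```java
import java.nio.charset.StandardCharsets;

public class InspectContentBytes {

    // Hypothetical stand-in for the bytes of one page's content stream.
    // Real streams are much longer but are built from the same kind of operators.
    static final byte[] SAMPLE_PAGE_BYTES =
            "BT /F1 12 Tf 72 712 Td (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1);

    // Decode with an explicit charset so the result does not depend on the
    // platform default, which is what new String(bytes) would use.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Crude heuristic: true if the string contains the PDF "begin text" (BT)
    // and "end text" (ET) operators as separate whitespace-delimited tokens.
    static boolean looksLikeContentStream(String s) {
        boolean hasBegin = false;
        boolean hasEnd = false;
        for (String token : s.trim().split("\\s+")) {
            if (token.equals("BT")) {
                hasBegin = true;
            }
            if (token.equals("ET")) {
                hasEnd = true;
            }
        }
        return hasBegin && hasEnd;
    }

    public static void main(String[] args) {
        String s = decode(SAMPLE_PAGE_BYTES);
        // Print the string in brackets so leading or trailing junk is visible.
        System.out.println("s = [" + s + "]");
        System.out.println("looks like PDF operators: " + looksLikeContentStream(s));
    }
}
```

If the printed string already looks like operators or unreadable glyph codes, the problem is on the input side, and changing the writer or the output file type will not help.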

  3. #3
    Junior Member
    Join Date
    Dec 2013
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: conversion of pdf file to text using itext in java

    Thanks for the reply. In iText, document.add() accepts an "Element" (such as a Paragraph built from a String). That's why I converted the byte array to a String ('s') and passed it to document.add(new Paragraph(s)). But the content of the Word document is junk characters instead of the original text. Any help is appreciated.

  4. #4
    Super Moderator Norm's Avatar
    Join Date
    May 2010
    Location
    Eastern Florida
    Posts
    25,042
    Thanks
    63
    Thanked 2,708 Times in 2,658 Posts

    Re: conversion of pdf file to text using itext in java

    Are the contents of the String s OK?

  5. #5
    Junior Member
    Join Date
    Dec 2013
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: conversion of pdf file to word document using itext in java

    Yes. How can I get the content extracted correctly?
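A possible direction, offered as a sketch rather than a definitive answer: iText 5 ships a text-extraction helper, `PdfTextExtractor.getTextFromPage(reader, pageNumber)` in `com.itextpdf.text.pdf.parser`, which returns a page's text as a `String` instead of raw content bytes. To keep the example below runnable without any library on the classpath, the extraction step is hidden behind a hypothetical `PageTextSource` interface and fed with fake page text. In a real program, `textOfPage` would delegate to the iText call. All names in the block are assumptions made for illustration, and the output is a plain UTF-8 text file, not a Word document.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class PagesToText {

    // Hypothetical abstraction over "something that can return the text of a page".
    // Page numbers are 1-based, matching the loop in the original post.
    interface PageTextSource {
        int pageCount();
        String textOfPage(int pageNumber) throws IOException;
    }

    // Test/demo helper: a source backed by an in-memory list of page texts.
    static PageTextSource fromList(final List<String> pages) {
        return new PageTextSource() {
            @Override
            public int pageCount() {
                return pages.size();
            }

            @Override
            public String textOfPage(int pageNumber) {
                return pages.get(pageNumber - 1);
            }
        };
    }

    // Writes every page's text to 'out' as UTF-8, one page after another,
    // each followed by a line separator.
    static void writeAllPages(PageTextSource source, Path out) throws IOException {
        try (Writer writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (int page = 1; page <= source.pageCount(); page++) {
                writer.write(source.textOfPage(page));
                writer.write(System.lineSeparator());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake page text standing in for what a real extractor would return.
        PageTextSource source = fromList(Arrays.asList("First page text", "Second page text"));
        Path out = Files.createTempFile("pages", ".txt");
        writeAllPages(source, out);
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

Keeping extraction and writing separate follows the earlier suggestion in the thread to isolate the input side from the output side: the writer can be checked with known text before any PDF is involved.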
