Hi there. I want to write a program that takes an image, reads the text in it, and outputs that text in an editable format such as a .txt or .doc file. How should I proceed? Any ideas?
You have posted in the wrong forum. Thread moved.
What you want is far from trivial, and most solutions rely on complex machine learning algorithms. The following is the only library I know of that is written exclusively in Java, and I have not tested how well it works:
Java OCR | Ron Cemer's Blog
Also, google "optical character recognition"
What you need is an OCR (optical character recognition) SDK. As far as I know, there are no free, open-source OCR engines written in pure Java. There are Java APIs that wrap calls to native libraries; for example, Tesseract, one of the most popular open-source OCR engines, has Java wrappers such as tesjeract and Tess4J.
However, open-source engines are rather hard to set up and often don't provide enough quality, so if you are planning commercial software, have a look at ABBYY FineReader Engine. It has a well-written developer guide, a great set of image analysis and preprocessing features, and a Java API. It's not free, but ABBYY is known for very high OCR quality; for example, check out Linux OCR Software Comparison [splitbrain.org], or test it yourself, since it's free to try.
Another option is a cloud service. It requires the end-user application to have an internet connection, but it is independent of your choice of programming language and of local resource limitations. Have a look at ABBYY Cloud OCR SDK, a cloud-based OCR SDK recently launched by ABBYY. It's in beta, so for now it's totally free to use.
Here is a tool; I hope it solves your problem:
Aspose.OCR for Java is a Java OCR component that lets developers add OCR functionality to their Java web applications, web services and Windows applications. It provides a simple set of classes for controlling character recognition tasks and helps developers work with image files from within their Java applications. It lets developers extract text from images and read font and style information quickly, saving time & effort involved in developing an
Here is a related post:
Extract Text from Specific Part of the Image: http://docs.aspose.com/display/ocrne...t+of+the+Image
Hi vaibhav21,
I am working on the same project you were looking into a year ago.
So far I can load an image and binarize it.
Now I am confused about how to extract the text from the image and save it to a .txt file.
You have probably found a solution by now, so please help me out with this.
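For the two ends of the pipeline mentioned in this post (binarizing the image and saving text to a .txt file), a minimal Java sketch using only the standard library might look like the following. The recognition step in between is deliberately not shown, since it depends on whichever OCR engine you pick (for example one of the Tesseract wrappers mentioned earlier in the thread). The class name OcrPipelineSketch, the fixed threshold of 128 and the output file names are assumptions made for illustration only; real scans often need an adaptive threshold instead of a fixed one.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.imageio.ImageIO;

// Hypothetical helper class: not part of any OCR library.
class OcrPipelineSketch {

    // Converts an image to pure black and white using a fixed
    // luminance threshold in the range 0-255 (an assumed, simple method).
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness.
                int gray = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, gray >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    // Writes the recognized text to a plain .txt file as UTF-8.
    static void saveText(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java OcrPipelineSketch <image-file>");
            return;
        }
        BufferedImage input = ImageIO.read(new File(args[0]));
        if (input == null) {
            System.out.println("Unsupported or unreadable image: " + args[0]);
            return;
        }
        BufferedImage bw = binarize(input, 128);
        ImageIO.write(bw, "png", new File("binarized.png"));

        // Placeholder: pass 'bw' to your OCR engine here and use its result.
        String recognized = "(text returned by the OCR engine)";
        saveText(recognized, Paths.get("output.txt"));
    }
}
```

Keeping binarize and saveText as separate methods means the OCR call can be swapped in between them without touching either step.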