Scanning Documents with OCR

I received an email the other day from someone wondering why documents they scan into their computer come out with the text rearranged and the formatting lost. Well, user-whose-name-I-can’t-pronounce, what you’re asking about deals with OCR.

OCR, or Optical Character Recognition, software lets you scan a document and then edit its text. If the OCR software isn’t very good, it probably won’t retain the layout of the original document. You are probably using whatever software came with your all-in-one machine, and I don’t know what brand that is.

The problem is, you’re not going to get really good OCR software unless you pay hundreds of dollars. Probably the best I know of is OmniPage. OmniPage Professional 16 is pitched as the fastest and most precise way to convert high volumes of paper, PDFs and forms into files you can edit and search.

Unfortunately, I don’t know of any open source or cheaper software titles that are on the same level with OmniPage. Do you? If you use something that works just as well, without the price tag, I’d love to hear about it. I’m sure that user-whose-name-I-can’t-pronounce would love to hear your suggestions, as well.

14 thoughts on “Scanning Documents with OCR”

  1. Pingback: Left Of Center
  2. I run into similar problems with OCR. I have tried quite a few applications, including OmniPage (version 14), which I did not like at all. OmniPage 14 does not retain the original settings of a document. The best solution that I found is Solid Converter PDF Professional. Gene should do the following:
    1. Scan the document (at least at 300 dpi)
    2. Save the scan as a pdf file
    3. Open that pdf file in MS Word, after you have installed Solid Converter PDF Professional.

    Solid Converter retains all the original settings of the document. I do not know any better solution at an affordable price. Highly recommended.
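The 300 dpi advice in the steps above translates into concrete image sizes. The sketch below is only rough arithmetic, assuming a US Letter page (8.5 x 11 inches) and 24-bit colour; neither figure comes from the comment itself, and the function names are made up for illustration.

```python
# Rough scan-size arithmetic. Page size and colour depth are assumptions
# chosen for illustration, not values taken from the post.

def scan_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bytes_per_pixel=3):
    """Raw image size before compression (3 bytes per pixel = 24-bit colour)."""
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    for dpi in (150, 300, 400):
        w, h = scan_pixels(8.5, 11, dpi)
        size_mb = uncompressed_bytes(w, h) / 1_000_000
        print(f"{dpi} dpi: {w} x {h} pixels, about {size_mb:.1f} MB uncompressed")
```

Going from 150 to 300 dpi quadruples the pixel count, which is why higher-resolution scans give the OCR engine more detail per character but also produce much larger files.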

  3. Hi,

    Here are a few other OCR programs:

    1) This is one of the best: ABBYY FineReader, an award-winning Optical Character Recognition (OCR) software that allows users to convert paper documents, PDF files, and various images including photographs taken by a digital camera to editable formats for changing and repurposing.

    2) Tesseract is a free optical character recognition engine. It was originally developed at Hewlett-Packard from 1985 until 1995. After ten years with no development, Hewlett-Packard and UNLV released it as open source in 2005. Tesseract is currently developed by Google and released under the Apache License, Version 2.0. The current version of Tesseract is 2.01, released August 30, 2007.

    3) The SimpleOCR freeware demonstrates the power of our engine and is the only OCR application that is completely free.

    Hope this is of help to the person whose name you cannot pronounce…
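For anyone who wants to try the free Tesseract engine mentioned above from a script, here is a minimal sketch. It assumes a reasonably recent Tesseract is installed and on the PATH (the 2.01 release described in the comment was far more limited in the image formats it accepted), and the file names `scan.png` and `scan_out` are hypothetical. It uses only the basic `tesseract <image> <output base> -l <language>` command form.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the basic command line: tesseract <image> <output base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends .txt to the output base name for plain-text output.
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names; replace with a real scanned image.
    print(build_tesseract_command("scan.png", "scan_out"))
```

Plain-text output like this keeps the words but not the page layout, so it does not by itself solve the formatting problem the original question was about.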

  4. As good as OmniPage is, I find that ReadIris Pro is excellent. Amazingly, I sometimes use the older TextBridge Pro (precursor to OmniPage) because it formats complex documents really well. By complex, I mean columns and offsets with photos.

  5. Personally, I really like the ABBYY FINE READER software package that came bundled with my Epson 6400. The program is awesome!

  6. A basic version of Read-Iris is bundled with most HP Inkjet all-in-one printers and the full version of Read-Iris Pro is bundled with select HP Scanjet scanners. The pro version is excellent, as it is the culmination of something like 30 years of research and development. It costs under $130 now, which is well worth the money if you have hoards of documents you need to update on a regular basis.

    As for the bundled copy that comes with the HP printers, it’s basically just something that is there to show the customer that it is indeed possible for their AiO to do OCR. As many all-in-ones can be had for a whopping $40, it wouldn’t add any value to the products for HP to include niche-market software with them. This is especially true now that the full software is so inexpensive (it used to be REALLY expensive not all that long ago). However, with great free programs like HP Photosmart Premier (photo management and manipulation program) and their Snapfish online service, it’s not like HP is just selling you the hardware and throwing you to the wolves!

    Your friendly HP tech support personnel, at 1-800-HPINVENT, will be more than happy to show you how to install and test the functionality of the OCR software bundled with an in-warranty HP all-in-one printer. However, they’ll also politely set your expectations and point you in the direction of Read-Iris Pro if OCR is something you are serious about doing on a regular basis. Again, Read-Iris Pro is worth the money if OCR is something you need – and it works great in combination with an all-in-one printer with an automatic document feeder!

    Read-Iris. Check it out.

    Incidentally, if you don’t need to edit a document and you’re just making a digital archive, your best bet is to scan it to a PDF (Portable Document Format) file, as it will retain the formatting and store multiple pages as a single high-quality file. The PDF format is great, as it is compatible with all major operating systems via free software and will likely be supported for many years to come in its current state.

  7. I also use ABBYY FineReader, and it is excellent. I use it to OCR entire books and then convert them to audio using TextAloud, and when scanned at 300 or 400 dpi, the accuracy is amazing. (TextAloud is fantastic, too, for converting text to audio.)

  8. This might be a stretch, but I think Microsoft OneNote is worth a mention here. You can import pics and OneNote will OCR them in the background, plus the data is searchable. OneNote Mobile comes with OneNote for free and works on a Windows Mobile device: using that app and your device camera, you can snap and sync pics, which then get OCR’d like any other imported pics. This definitely wouldn’t work if you need to manipulate the OCR’d text, but it’s a great solution in certain circumstances.

  9. I agree with Leo. ABBYY Fine Reader is very good. I got my copy free with PC Plus a couple of years ago, fully registered!

  10. A really good and free OCR package is TopOCR. A very interesting feature of this software is that it not only works with scanners but also digital cameras. It also has a text to speech interface that I use to convert images to MP3 files that I then listen to on an iPod on the train every morning. You can find TopOCR at

  11. If you are not interested in layout retention, the most powerful and accurate OCR engine is RecoStar Full Page Reader.
    RecoStar Full Page Reader makes scanned documents and faxes searchable. It is capable of processing all types of documents, but is particularly suited to processing business-related documents.
    For more than 15 years, RecoStar has been renowned for its robustness and reliability. RecoStar is standard in almost all applications defined as “mission-critical”.

    For further information see:
