ocr software

Justin R Ervin jrervin at uncg.edu
Fri Feb 6 11:55:28 EST 1998


>>> I NEED TO BUY A VERY GOOD OCR SOFTWARE FOR WINDOWS NT 4.0,
[...]
>> Paolo: We recently purchased Caere's OmniPage Pro 8.0 and are very
>> happy with it.
[...]
> Does this program scan forms?

OmniPage Pro (OP) works with documents in four steps: scan the page, 
recognise zones (groups) of text, recognise the text itself, and export 
the text. An OCR Wizard can perform all these steps automatically.
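
As a rough illustration of those four steps, here is a minimal Python 
sketch. It is not OmniPage's API: every name in it (Zone, scan_page, 
find_zones, recognise, export_text, ocr_wizard) is hypothetical, and the 
scanning and recognition steps are stubs that work on plain text instead 
of an image. It only shows how the stages hand off to one another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Zone:
    """A hypothetical zone: a block of the page plus its content type."""
    kind: str          # e.g. "alphanumeric", "numeric", "graphic", "table"
    lines: List[str]   # stand-in for the image data inside the zone


def scan_page(raw: str) -> List[str]:
    """Step 1 (stub): pretend the scanner hands back the page as lines."""
    return raw.splitlines()


def find_zones(page: List[str]) -> List[Zone]:
    """Step 2 (stub): treat each run of non-blank lines as one zone."""
    zones: List[Zone] = []
    current: List[str] = []
    for line in page:
        if line.strip():
            current.append(line)
        elif current:
            zones.append(Zone("alphanumeric", current))
            current = []
    if current:
        zones.append(Zone("alphanumeric", current))
    return zones


def recognise(zone: Zone) -> str:
    """Step 3 (stub): turn a zone into one paragraph of text."""
    return " ".join(s.strip() for s in zone.lines)


def export_text(paragraphs: List[str]) -> str:
    """Step 4: plain-text export, one blank line between paragraphs."""
    return "\n\n".join(paragraphs)


def ocr_wizard(raw: str) -> str:
    """Run all four steps in order, as a wizard would."""
    page = scan_page(raw)
    zones = find_zones(page)
    return export_text([recognise(z) for z in zones])
```

Calling ocr_wizard on a page runs the whole chain; zoning a page manually 
would correspond to building the Zone list yourself and skipping 
find_zones.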

I inserted a North Carolina Individual Income Tax Return, form D-400, 
into my scanner and ran the OCR Wizard. (You can view the form online at 
http://www.dor.state.nc.us/DOR/downloads/; choose Form D-400. Careful! 
It's in PDF format!) The wizard asked about the layout of the original 
page and gave me a few choices with descriptions: single column (pages 
with one block of text)? multiple columns (pages with separate blocks of 
text and pictures, like newspapers or magazines)? table or spreadsheet 
(pages with text arranged by rows and columns such as financial forms)? 
mixed page layout? I chose table or spreadsheet. It next asked to what 
degree I wanted to retain the original page's appearance: remove all 
formatting? retain font and paragraph formatting? retain font, paragraph, 
and column formatting? use frames to retain the original appearance as 
closely as possible? I chose the last option.

The light grey shading really threw OP off. It added a lot of periods and 
ellipses in strange places (in addition to the periods that associate 
boxes with lines of text); the spacing turned out really weird; it wasn't 
able to deal with the vertical text and didn't translate the boxes well. I 
estimate that it probably would've taken me about an hour to scan this 
form and clean it up; inserting the boxes would've added considerably to 
this estimate.
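
Part of that cleanup could in principle be scripted after export. The 
following is only a sketch under assumptions: the function name 
strip_dot_runs is made up, and the pattern treats any run of two or more 
periods (with or without spaces between them) as noise. That means it 
also removes the form's genuine dot leaders and any real ellipses, and it 
collapses repeated spaces, so it does nothing for the spacing, vertical 
text, or box problems.

```python
import re

# Two or more periods in a row, optionally separated by whitespace.
DOT_RUN = re.compile(r"(?:\.\s*){2,}")


def strip_dot_runs(line: str) -> str:
    """Replace runs of periods with one space, then tidy the spacing."""
    cleaned = DOT_RUN.sub(" ", line)
    # Collapse the double spaces left behind and trim both ends.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

A single period at the end of a sentence, or a decimal point inside a 
number, is left alone because the pattern needs at least two periods.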

When I scanned the form and zoned the page manually, the result wasn't 
much better. I didn't find any place where I could tell OP that I had 
vertical text to deal with; a quick glance in help revealed nothing. 
I was able to specify whether zones contained alphanumeric, numeric, 
graphical, or tabular information. After zoning, I told OP to perform OCR 
(convert the image and zones into paragraphs of text). It admitted that 
the document was too complex to display and suggested that I go ahead and 
export. I had a wide range of file formats from which to choose, including 
several versions of Word and WordPerfect; a handful of other 
types of applications (Excel, 1-2-3, Harvard Graphics, Quattro Pro, 
PowerPoint, MS Publisher, etc.); and several types of ASCII and ANSI text 
and RTF. I exported the document to MS Word 97 and found that the results 
weren't much better than when I let OP do its own thing.

To be fair, I've had no trouble at all scanning standard letters, memos, 
and other simple documents. I think that OmniPage Pro 8.0's strongest 
points are its ability to recognise multiple languages (setup asks which 
languages you want to install) and its faster and more accurate 
recognition capabilities (as compared to OmniPage 5.x).

I hope that this info helps!
=================Justin R Ervin==================
Computing Support Technician I
Jackson Library Electronic Information Resources, UNCG
jrervin at uncg.edu                 http://www.uncg.edu/~jrervin/


