ocr software
Justin R Ervin
jrervin at uncg.edu
Fri Feb 6 11:55:28 EST 1998
>>> I NEED TO BUY A VERY GOOD OCR SOFTWARE FOR WINDOWS NT 4.0,
[...]
>> Paolo: We recently purchased Caere's OmniPage Pro 8.0 and are very
>> happy with it.
[...]
> Does this program scan forms?
OP works with documents in four steps: scan the page, recognise zones
(groups) of text, recognise the text itself, and export the text. An OCR
Wizard can perform all these steps automatically.
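
For anyone who thinks better in code, here is a rough sketch of that four-step flow. It is only an illustration of the order of operations, not OmniPage's actual interface: the Job class, the step functions, and run_wizard are all hypothetical names that I made up, and each step just records a placeholder instead of doing real scanning or recognition.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical state handed from one step to the next."""
    source: str
    image: str = ""
    zones: List[str] = field(default_factory=list)
    text: List[str] = field(default_factory=list)
    output: str = ""
    log: List[str] = field(default_factory=list)


def scan(job: Job) -> Job:
    # Step 1: acquire the page image (placeholder string, no real scanner).
    job.image = f"image of {job.source}"
    job.log.append("scan")
    return job


def zone(job: Job) -> Job:
    # Step 2: find zones (groups) of text on the page image.
    job.zones = ["zone 1"]
    job.log.append("zone")
    return job


def recognise(job: Job) -> Job:
    # Step 3: recognise the text inside each zone.
    job.text = [f"text from {z}" for z in job.zones]
    job.log.append("recognise")
    return job


def export(job: Job) -> Job:
    # Step 4: export the recognised text (here, just join it).
    job.output = "\n".join(job.text)
    job.log.append("export")
    return job


STEPS: List[Callable[[Job], Job]] = [scan, zone, recognise, export]


def run_wizard(source: str) -> Job:
    """Run all four steps in order, as an automatic wizard would."""
    job = Job(source=source)
    for step in STEPS:
        job = step(job)
    return job


if __name__ == "__main__":
    print(run_wizard("form D-400").log)
```

Running the steps one at a time instead of calling run_wizard corresponds to doing the scan, zoning, recognition, and export by hand.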
I inserted a North Carolina Individual Income Tax Return, form D-400
(You can view one online at http://www.dor.state.nc.us/DOR/downloads/;
choose Form D-400. Careful! It's in PDF format!) into my scanner and ran
the OCR Wizard. It asked about the layout of the original page and gave me
a few choices with descriptions: single column (pages with one block of
text)? multiple columns (pages with separate blocks of text and pictures,
like newspapers or magazines)? table or spreadsheet (pages with text
arranged by rows and columns such as financial forms)? mixed page layout?
I chose table or spreadsheet. It next asked to what degree I wanted to
retain the original page's appearance: remove all formatting? retain font
and paragraph formatting? retain font, paragraph, and column formatting?
use frames to retain the original appearance as closely as possible? I
chose the last option.
The light grey shading really threw OP off. It added a lot of periods and
ellipses in strange places (in addition to the periods that associate
boxes with lines of text); the spacing turned out really weird; it wasn't
able to deal with the vertical text and didn't translate the boxes well. I
estimate that it probably would've taken me about an hour to scan this
form and clean it up; inserting the boxes would've added considerably to
this estimate.
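
If you export to plain text, some of that cleanup could probably be scripted. The following is a minimal sketch, not an OmniPage feature and not something I ran on this form: strip_dot_runs is a hypothetical helper that assumes any run of two or more periods (with or without spaces between them) is noise or a dot leader that you don't need in the text. It leaves single periods, such as sentence endings and decimal points, alone.

```python
import re

# Two or more periods, optionally separated by spaces or tabs,
# e.g. "...." or ". . . ."
_DOT_RUN = re.compile(r"(?:\.[ \t]*){2,}")
# Any run of two or more spaces or tabs left behind after the substitution.
_SPACES = re.compile(r"[ \t]{2,}")


def strip_dot_runs(line: str) -> str:
    """Replace runs of periods with one space and tidy the spacing."""
    line = _DOT_RUN.sub(" ", line)
    line = _SPACES.sub(" ", line)
    return line.strip()


if __name__ == "__main__":
    for sample in ["Taxable income . . . . . 12", "Total.... 5"]:
        print(strip_dot_runs(sample))
```

Note that this throws away the dot leaders along with the stray periods, so it only suits cases where you want the text and not the form's layout.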
When I scanned the form and zoned the page manually, the result wasn't
much better. I didn't find any place where I could tell OP that I had
vertical text to deal with; a quick glance in help revealed nothing.
I was able to specify whether zones contained alphanumeric, numeric,
graphical, or tabular information. After zoning, I told OP to perform OCR
(convert the image and zones into paragraphs of text). It admitted that
the document was too complex to display and suggested that I go ahead and
export. I had a wide range of file formats from which to choose: several
versions of Word and WordPerfect; a handful of other types of applications
(Excel, 1-2-3, Harvard Graphics, Quattro Pro, PowerPoint, MS Publisher,
etc.); several types of ASCII and ANSI text; and RTF. I exported the
document to MS Word 97 and found that the results weren't much better than
when I let OP do its own thing.
To be fair, I've had no trouble at all scanning standard letters, memos,
and other simple documents. I think that OmniPage Pro 8.0's strongest
points are its ability to recognise multiple languages (It asks which
languages you want to install at setup.) and its faster and more accurate
recognition capabilities (as compared to OmniPage 5.x).
I hope that this info helps!
=================Justin R Ervin==================
Computing Support Technician I
Jackson Library Electronic Information Resources, UNCG
jrervin at uncg.edu http://www.uncg.edu/~jrervin/