OCR Service at a Public Library?

Lolis, John jlolis at WHITEPLAINSNY.GOV
Fri Jul 8 12:00:38 EDT 2016


Thanks much for the suggestions.  I was surprised to find that tesseract
was installed by default on my Ubuntu workstation.  I tested it with a
couple of image files that had nothing but text in them.  The first
consisted of text using the sans-serif system font, and it wasn't as
accurate as I had hoped.  I then tried Times New Roman, and it appeared to
like that better, but still the results were disappointing.  For quick and
dirty OCR, I'm going to keep it in mind, but I'd rather offer something
much more accurate to the public.
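
For anyone who wants to repeat this kind of quick test, here is a minimal sketch in Python that wraps the command-line program. It assumes the `tesseract` binary is on the PATH and uses the usual `tesseract IMAGE OUTPUTBASE -l LANG` invocation. The function names `build_tesseract_command` and `ocr_image` are made up for illustration, and this is not necessarily how I ran my own test.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    image_path = Path(image_path)
    # Write the result next to the image; tesseract appends ".txt" itself.
    output_base = image_path.with_suffix("")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Usage: python ocr_test.py page1.png page2.png
    for name in sys.argv[1:]:
        print(ocr_image(name))
```

Comparing the printed text against the known text in each test image gives a rough feel for accuracy per font.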

The first-year cost quoted to me for ABBYY Recognition Server is $2,300,
thereafter $450/year for up to 100,000 pages, and it provides many more
features (hopefully in addition to much better accuracy).  Since that
wouldn't break the bank, I'll at least give the 30-day trial a go.  But
what I'm really looking for is a workflow solution that someone may have
implemented.  Recognition Server can accept email attachments for
processing, but I'll have to find a way to implement some form of access
control, and then come up with a way for patrons to retrieve their output.
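
For budgeting purposes, the pricing quoted above can be tallied over several years. This is only a rough sketch: the function name `cumulative_license_cost` is made up, the figures are just the ones quoted to me, and the 100,000-page allowance is not modeled at all.

```python
def cumulative_license_cost(years, first_year=2300, renewal=450):
    """Total license cost in dollars after a given number of years.

    Assumes one first-year payment followed by a flat yearly renewal.
    """
    if years < 1:
        return 0
    return first_year + renewal * (years - 1)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "year(s):", cumulative_license_cost(n))
```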

I'd like to eventually offer an online OCR service to patrons who would
have to log in to upload image files (perhaps using EZProxy?).


John Lolis
Coordinator of Computer Systems

<http://whiteplainslibrary.org/>
100 Martine Avenue
White Plains, NY  10601

tel: 1.914.422.1497
fax: 1.914.422.1452

http://whiteplainslibrary.org/


On Thu, Jul 7, 2016 at 6:00 PM, Ted Koppel <tpk at auto-graphics.com> wrote:

> There’s also FreeOCR (http://www.freeocr.net/), which is built on the
> Tesseract engine but is a Windows executable.
>
>
> Ted
>
>
>
>
>
> *From:* Web technologies in libraries [mailto:WEB4LIB at LISTSERV.ND.EDU] *On
> Behalf Of *Steffen Schilke
> *Sent:* Thursday, July 7, 2016 5:09 PM
> *To:* WEB4LIB at LISTSERV.ND.EDU
> *Subject:* Re: [WEB4LIB] OCR Service at a Public Library?
>
>
>
> Dear John,
>
> You might want to have a look at Google Tesseract
> (https://github.com/tesseract-ocr), so you have no costs at the start for
> such a server/service. If your IT is good, a Linux box could do the job.
>
> Kind regards
>
> sws
>
> On 07.07.2016 at 23:04, "Lolis, John" <jlolis at whiteplainsny.gov> wrote:
>
> (apologies for cross-posting)
>
>
> We're looking to digitize our microfilm collection, and I thought that we
> might want to go a step further and offer an OCR service to our patrons.
>
> Is there anyone out there who is making OCR available to the public?  And
> if so, how are you going about that?  Is it simply through a standalone
> scanner workstation?  Do you charge a fee?
>
> In particular, I'm looking into ABBYY's Recognition Server (
> https://www.abbyy.com/recognition-server/) for the initial digitization
> project and going forward, for an OCR service.
>
> Many thanks,
>
>
>
> John Lolis
>
> Coordinator of Computer Systems
>
>
>
> <http://whiteplainslibrary.org/>
>
> 100 Martine Avenue
>
> White Plains, NY  10601
>
>
>
> tel: 1.914.422.1497
>
> fax: 1.914.422.1452
>
>
>
> http://whiteplainslibrary.org/
>
>
>
> ============================
>
> To unsubscribe: http://bit.ly/web4lib
>
> Web4Lib Web Site: http://web4lib.org/
>
> 2016-07-07

============================

To unsubscribe: http://bit.ly/web4lib

Web4Lib Web Site: http://web4lib.org/

2016-07-08