Subject: Re: Digitizing text From: gspaff@execpc.com (George Spafford) Date: Fri, 28 Apr 1995 14:15:36 -0400
How the Web Was Won
Sender: owner-online-news@marketplace.com
Precedence: bulk

>We've got 138 years of back issues, about 135 of which exist only in hardcopy
>form. We'd like to digitize the entire archive, so that we make it available
>in various ways as a research tool or editorial product etc online (the web,
>maybe) and/or on cd-rom. This is a long-term project--and as you can
>imagine there are numerous other issues we've got to resolve-- but right now
>the question is, What's the fastest and cheapest way to render 150,000 pages
>of Atlantics into digitized form. We figure that's about 14 person-years of
>typing. Scanning might be a little faster for the issues from 1950 to the
>present, but we've had little luck scanning the older issues which were
>typeset so distinctively that the scanner doesn't convert them at all
>accurately. It's proven faster to type them than to scan and then clean them
>up.

You may have already considered this, but are you planning to offer
full-text search on everything?  Could you scan the older issues as page
images (TIFF, etc.) and start with full-text search on the newer ones?
No matter how you look at it, you have a big job in front of you - good
luck.
--G--


From owner-online-news@marketplace.com Fri Apr 28 18:26:02 1995
Received: from marketplace.com by cnj.digex.net with SMTP id AA05777
  (5.67b8/IDA-1.5 for ); Fri, 28 Apr 1995 18:25:55 -0400
Received: (from majordom@localhost) by marketplace.com (8.6.12/8.6.12) id NAA14380 for online-news-outgoing; Fri, 28 Apr 1995 13:32:11 -0600
Received: from interport.net (interport.net [199.184.165.1]) by marketplace.com (8.6.12/8.6.12) with ESMTP id NAA14374 for ; Fri, 28 Apr 1995 13:32:08 -0600