Date: Sun, 30 Apr 1995 23:50:09 -0400 (EDT)
In-Reply-To: <199504200014.SAA09691@marketplace.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Wed, 19 Apr 1995, amandell wrote:
>
> Mike wrote:
> "I still like the idea sending readers either notices of stories they
> may be interested in, or emailing them the actual stories. Though
> personally, I like notifing them that the site has a story that of
> interest to them rather than emailing the story"
There are 3 common variations of this:
Queries-
Query engines, similar to those used by Mead (Nexis/Lexis), Dow Jones
(News/Retrieval Service), or WAIS (Wide Area Information Server), can
quickly index every significant word of each plain-text article, so that
any combination of words can be used to assemble the optimal
"newspaper". The user then selects, from a list of headlines, the
stories that look most interesting.
WAIS also supports indexing of HTML documents. Most databases also offer
search aids such as "Category Codes", "Industry Codes", and "Company
Codes" (CUSIP numbers and ticker symbols).
Clipping-
When a real-time component is needed, each story can be sent through
an engine that compares it against queries kept on file for each
customer. This takes a bit more processing, since 500+ queries must be
run against each story to "see if it fits". Such an engine is also
sometimes called a profiler, and the queries are called profiles.
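A toy profiler along those lines (the profiles and addresses are
hypothetical; production profilers compile all profiles into a single
matching pass rather than looping over them):

    # Each profile: (subscriber, set of words that must all appear).
    PROFILES = [
        ("alice@example.com", {"ibm", "mainframe"}),
        ("bob@example.com", {"bond", "yields"}),
    ]

    def clip(story_text):
        # Run every profile on file against the incoming story.
        words = set(story_text.lower().split())
        return [who for who, must in PROFILES if must <= words]

    print(clip("IBM ships a new mainframe"))  # -> ['alice@example.com']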
Feed Distribution-
When a customer is hungry for large numbers of stories, it is often more
convenient to send them the text and let them do the profiling. Most
will keep their own databases as well. Wire services and financial
institutions are heavy consumers of this type of service. Some feeds,
such as Dow Jones' abstracts, send only digests of the articles.
Some internet equivalents are:
Queries-
The WAIS gateways available from the Web browsers are a very low-cost
way of letting the user search thousands of databases with one query.
Dow Jones' "DowQuest" product uses these engines to search 2,500
databases simultaneously in a few seconds.
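A sketch of that fan-out pattern, with in-memory dictionaries standing
in for the remote databases (names and contents are invented):

    from concurrent.futures import ThreadPoolExecutor

    # Stand-ins for independent databases: name -> {story ID: text}.
    DATABASES = {
        "wires": {1: "Fed raises rates a quarter point"},
        "tech":  {2: "New workstation ships with 64MB RAM"},
    }

    def search_one(name, word):
        return [(name, sid) for sid, text in DATABASES[name].items()
                if word in text.lower()]

    def search_all(word):
        # One query fans out to every database at once.
        with ThreadPoolExecutor() as pool:
            batches = pool.map(lambda n: search_one(n, word), DATABASES)
        return [hit for batch in batches for hit in batch]

    print(search_all("rates"))   # -> [('wires', 1)]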
Clipping-
Primitive clipping can be built from tools like "grep", "awk", or
"perl" with simple front-ends. Unix manages multiple processes very
efficiently, and message queues, pipes, and tees work well. Hits can be
emailed back to the user in real time.
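The same filter-and-mail loop, sketched in Python instead of grep/awk
(the patterns, the address, and the local mail server are assumptions):

    import smtplib
    import sys

    PATTERNS = ["mainframe", "token ring"]   # hypothetical profile
    SUBSCRIBER = "alice@example.com"         # hypothetical address

    def mail_hit(story):
        msg = ("From: clipper@example.com\r\n"
               "To: %s\r\n"
               "Subject: clipping hit\r\n"
               "\r\n%s\r\n" % (SUBSCRIBER, story))
        # Assumes a mail transfer agent listening on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.sendmail("clipper@example.com", [SUBSCRIBER], msg)

    # One story per line on stdin (e.g. piped from the feed).
    for line in sys.stdin:
        if any(p in line.lower() for p in PATTERNS):
            mail_hit(line.strip())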
Feeds-
E-Mail is very inefficient for processing feeds. This is especially true
if the user is on netcom.com or one of the other 1000+ user
hosts/networks. NNTP (News) provides near real-time delivery, and the
content can be encrypted, with the keys mailed to each subscriber.
Offering good "bulk rates" to the Internet Access Providers can get you
a nice market comparable to ClariNet's. You can post digests to the
newsgroups along with URLs to your web pages.
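The posting side is only a few lines with Python's nntplib (the news
host and the group name are hypothetical, and nntplib has since been
dropped from the newest standard library, so this assumes an older
Python):

    import nntplib
    from io import BytesIO

    ARTICLE = (b"From: feed@example.com\r\n"
               b"Newsgroups: biz.clippings.digest\r\n"
               b"Subject: Daily digest\r\n"
               b"\r\n"
               b"(encrypted digest body goes here)\r\n")

    # Assumes an NNTP server at news.example.com that accepts postings.
    with nntplib.NNTP("news.example.com") as news:
        news.post(BytesIO(ARTICLE))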
> body) of all the articles of the 6 major Italian newspapers and 1
> weekly magazine. My routine is that I read the headlines very quickly
> and go read the full text, on the web or in Nexis, of only the
> articles that I am interested in. I like the idea that the news comes
> to me and not the opposite. It is like skimming through the headlines
> of the paper newspaper delivered to my home; but the difference is
> that I now skim through 6 newspapers (and very quickly too).
Digests are very useful. It is a good idea to break up each item so that
users can "search" the mail using the find features of their mail
readers. Again, newsgroups are better if you are shipping digests to
hundreds of users on the same host (netcom.com, aol.com, prodigy.com,
...). A good rule of thumb is to never send more than 1 megabyte to any
single host.
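One way to make the items findable is a fixed separator line, in the
style of the common mailing-list digests (the items below are invented):

    # Format items so a mail reader's "find" can jump between them.
    SEP = "-" * 30   # the usual digest separator convention

    def format_digest(items):
        parts = []
        for n, (headline, body) in enumerate(items, 1):
            parts.append("%s\n\nItem %d: %s\n\n%s\n"
                         % (SEP, n, headline, body))
        return "\n".join(parts)

    print(format_digest([
        ("Fed raises rates", "The Federal Reserve raised..."),
        ("IBM ships new mainframe", "IBM announced today..."),
    ]))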
> What I do not like, as a daily news consumer, instead, is the idea
> that there should be a pattern of personalization of the content I am
> interested in, so that I should pay someone (human or search engine)
> to choose the stories for me.
Besides, it is very expensive logistically. Good commercial profilers
run about $25,000/server, and each server can only serve about 500
users on a large feed. Optimizing the searched words and other schemes
can increase this substantially, but again there is a finite number of
bytes/second that can be compared, and the disk drive can only spin at
a certain speed.
If you can't send the whole feed, choose an appropriate subset
(digests). Even with 2 gig drives, customers will probably want to
cache only a few hours' worth of data; the rest they will search using
a query engine.
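The arithmetic behind that, as a quick sketch (the $25,000/server and
500 users come from above; the 200 MB/day feed rate is assumed from the
figure later in this note):

    # Back-of-envelope profiler capacity estimate.
    server_cost = 25000           # dollars per server (from above)
    users_per_server = 500        # profiles one server runs (from above)
    feed_rate = 200e6 / 86400     # assumed 200 MB/day feed, in bytes/sec

    cost_per_user = server_cost / users_per_server
    # Every byte of feed is effectively checked against every profile.
    compare_load = feed_rate * users_per_server

    print("cost per user: $%.0f" % cost_per_user)            # -> $50
    print("compare load: %.1f MB/s" % (compare_load / 1e6))  # -> ~1.2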
> What drives me toward buying an article for
> research purposes is a mix of interest in the topic and trust in the
> professional organization which I ask to do the search for me.
If you have ever requested a search from the corporate librarian, you
know that it is very easy to get insufficient or incorrect information.
The information you want is probably there, but she is pruning her
queries so as to reduce the count to only 10 articles (about $50.00
worth on News/Retrieval) in under 2 minutes (at about $2/minute).
I've never met anyone who could read fast enough to keep up with a
DowVision feed, which only drives a 19.2 Kbps X.25 link. About 200
megabytes/day flow through the internet (News, e-mail lists, chat, and
new Web pages).
Another way of putting it: you have a year's worth of reading coming
through every day, and the challenge is to boil that down to something
that can be read in less than two hours. When commercial internet
access first hit the market, the biggest concern was not security; it
was that employees could spend 8 hours/day reading "News". Employees
and employers have since found that 1 hour spent reading the right kind
of news can contribute several thousand dollars to the bottom line; on
the other hand, at $60/hour for a lawyer, financial analyst, or
engineer, 20 hours/week spent reading the alt.sex.* groups turns into a
$1,200/week "hobby".
Rex Ballard (S&P)