Subject: Re: Individualizing the news From: R Ballard Date: Sun, 30 Apr 1995 23:50:09 -0400 (EDT)
How the Web Was Won
In-Reply-To: <199504200014.SAA09691@marketplace.com>
Message-ID: 
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII



On Wed, 19 Apr 1995, amandell wrote:

> 
> Mike wrote:
> "I still like the idea sending readers either notices of stories they  
> may be interested in, or emailing them the actual stories. Though  
> personally, I like notifing them that the site has a story that of  
> interest to them rather than emailing the story"

There are three common variations of this:

Queries-
	Query engines, similar to those used by Mead (Lexis/Nexis), Dow 
Jones (News Retrieval System), or WAIS (Wide Area Information Servers), can
quickly index the plain text of each article so that every significant 
word can be used in a query to assemble the optimal "newspaper".  The user 
then selects the most interesting stories from the resulting list of 
headlines.
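
A minimal sketch of the query idea, not how Mead, Dow Jones, or WAIS 
actually implement it.  It assumes a tiny in-memory set of articles, a 
hypothetical stop-word list, and queries that require every word to be 
present; the names build_index and search are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop list; a real engine would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}


def significant_words(text):
    """Lowercase the text and return its set of words, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def build_index(articles):
    """Map each significant word to the ids of the articles containing it."""
    index = defaultdict(set)
    for article_id, (headline, body) in articles.items():
        for word in significant_words(headline + " " + body):
            index[word].add(article_id)
    return index


def search(index, articles, query):
    """Return headlines of articles that contain every significant query word."""
    words = significant_words(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [articles[i][0] for i in sorted(hits)]


# Made-up sample articles: id -> (headline, body).
ARTICLES = {
    1: ("Bond prices rally", "Treasury bond prices rose as the dollar firmed."),
    2: ("Dollar slips against yen", "The dollar fell in Tokyo trading."),
    3: ("Chip maker posts record profit", "Strong demand for memory chips."),
}
INDEX = build_index(ARTICLES)

for headline in search(INDEX, ARTICLES, "dollar"):
    print(headline)
```

The index is built once per article, so each query costs only a few set 
lookups no matter how many articles are on file.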

WAIS also supports indexing of HTML documents.  Most databases also offer 
indexing aids such as "Category Codes", "Industry Codes", and
"Company Codes" (CUSIP numbers and ticker symbols).

Clipping-
	When a real-time component is needed, each story can be sent through 
an engine that compares it against queries the customer has put on file.  
This requires a bit more processing, since 500+ queries must be run 
against each story to "see if it fits".  Such an engine is sometimes 
called a profiler, and the queries are called profiles.
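
A rough sketch of such a profiler, assuming each profile is just a set of 
words that must all appear in the story.  The customer addresses, the 
PROFILES table, and the clip function are hypothetical; commercial 
profilers support far richer query languages.

```python
import re


def words_of(text):
    """Return the set of lowercase words in a story."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical profiles on file: customer -> list of queries.
# Each query is a set of words that must ALL appear in the story.
PROFILES = {
    "alice@example.com": [{"dollar", "yen"}, {"treasury"}],
    "bob@example.com": [{"chips"}],
}


def clip(story, profiles):
    """Run every stored query against one story; return the customers it fits."""
    story_words = words_of(story)
    return sorted(
        customer
        for customer, queries in profiles.items()
        if any(query <= story_words for query in queries)
    )


print(clip("The dollar fell against the yen in Tokyo.", PROFILES))
```

Note that every query is tested against every incoming story, which is 
where the extra processing comes from.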

Feed Distribution-
	When a customer is hungry for large numbers of stories, it is often more 
convenient to send them the text and let them do the profiling.  Most 
will also keep their own databases.  Wire services and financial 
institutions are heavy consumers of this type of service.  Some feeds,
such as Dow Jones' abstracts, send only digests of the articles.

Some internet equivalents are:

Queries-
The WAIS appendage to the Web browsers.  This is a very low-cost method 
of enabling the user to search thousands of databases with one query.  
Dow Jones' "DowQuest" product uses these engines to simultaneously 
search 2500 databases in a few seconds.

Clipping-
Primitive clipping can be built from tools like "grep", "awk", or "perl" 
with simple front-ends.  Unix manages multiple processes very efficiently, 
and message queues or pipes and tees work well.  Hits can be emailed back 
to the user in real time.

Feeds-
E-mail is very inefficient for processing feeds.  This is especially true 
if the user is on netcom.com or one of the other hosts/networks with 
1000+ users.  NNTP (News) provides near real-time delivery, and content 
can be encrypted, with the encryption keys mailed to each subscriber.  
Offering good "bulk rates" to the Internet Access Provider can get you a 
nice market comparable to ClariNet's.  You can post digests to the 
newsgroups along with URLs to your web pages.

> body) of  all the articles of the  6 Italian major newspapers and 1  
> weekly magazine.   My routine is that I read very quick the headlines  
> and go to read the full-text on the web or in Nexis, only of the   
> articles that I am interested in.  I like the idea that news come to  
> me and not the opposite.  It is like skimming through the  headlines  
> of the paper-newspaper delivered to my home; but the difference is  
> that I skim through 6 newspapers now (and very  quick too).

Digests are very useful.  It is a good idea to break up each item so that 
users can "search" the digest using the find feature of their mail reader.  
Again, newsgroups are better if you are shipping digests to hundreds of 
users on the same host (netcom.com, aol.com, prodigy.com, ...).  A good 
rule of thumb is that you should never send more than 1 megabyte to any 
single host.
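
One way to apply that rule of thumb, sketched under a few assumptions: 
"1 megabyte" is taken as 10**6 bytes, every subscriber on a host gets the 
same digest, and the function name plan_delivery and the sample addresses 
are invented for illustration.

```python
from collections import defaultdict

# The 1 megabyte rule of thumb, taken here as 10**6 bytes (an assumption).
MAX_BYTES_PER_HOST = 1_000_000


def plan_delivery(digest_bytes, recipients, limit=MAX_BYTES_PER_HOST):
    """Split hosts into those safe to e-mail and those better served by news.

    A host is e-mailed only if digest size times subscriber count on that
    host stays within the limit; otherwise it goes on the newsgroup list.
    """
    by_host = defaultdict(list)
    for address in recipients:
        host = address.rsplit("@", 1)[1].lower()
        by_host[host].append(address)

    email_hosts, news_hosts = [], []
    for host, users in sorted(by_host.items()):
        total = digest_bytes * len(users)
        if total <= limit:
            email_hosts.append(host)
        else:
            news_hosts.append(host)
    return email_hosts, news_hosts


# 300 subscribers on one big host plus one subscriber on a small host.
subscribers = ["user%d@netcom.com" % i for i in range(300)]
subscribers.append("reader@small.example")

print(plan_delivery(40_000, subscribers))
```

With a 40,000-byte digest, the 300 netcom.com subscribers would add up to 
12 megabytes on one host, so that host lands on the newsgroup list.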

> What I do not like, as a daily news consumer, instead, is the idea  
> that there should be a pattern of personalization of the content I am  
> interested in, so that I should pay for someone (human or searching  
> engine)  who  chooses the stories for me.

Besides, it is very expensive logistically.  Good commercial profilers 
run about $25,000/server, and each server can serve only about 500 users 
for a large feed.  Optimization of "searched words" and other schemes
can increase this substantially, but there is still a finite number of 
bytes/second that can be compared, and the disk drive can only spin at a 
certain speed.
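
One possible reading of the "searched words" optimization, offered as an 
assumption rather than a description of any commercial product: index the 
profiles by word, so that a story only triggers the queries that share a 
word with it instead of all of them.  The profile format (non-empty sets 
of required words) and all names below are hypothetical.

```python
import re
from collections import defaultdict


def words_of(text):
    """Return the set of lowercase words in a story."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical profiles: customer -> list of non-empty word sets.
PROFILES = {
    "alice@example.com": [{"dollar", "yen"}, {"treasury"}],
    "bob@example.com": [{"chips"}],
}


def index_profiles(profiles):
    """File each query under one of its words.

    A story can match a query only if it contains that word, so queries
    filed under words absent from the story never need to be tested.
    """
    by_word = defaultdict(list)
    for customer, queries in profiles.items():
        for query in queries:
            key = min(query)  # arbitrary but deterministic choice of word
            by_word[key].append((customer, frozenset(query)))
    return by_word


def clip_indexed(story, by_word):
    """Test only the queries filed under words that occur in the story."""
    story_words = words_of(story)
    matched = set()
    for word in story_words:
        for customer, query in by_word.get(word, ()):
            if query <= story_words:
                matched.add(customer)
    return sorted(matched)


BY_WORD = index_profiles(PROFILES)
print(clip_indexed("The dollar fell against the yen in Tokyo.", BY_WORD))
```

The work per story now grows with the number of distinct words in the 
story rather than with the number of profiles on file, though the limits 
on raw throughput still apply.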

If you can't send the whole feed, choose an appropriate subset (digests). 
Even with 2-gigabyte drives, customers will probably want to cache only a 
few hours' worth of data; the rest they will search using a query engine. 

> What  drives me toward buying an article for  
> research purposes is a mix of  interest in the topic and trust in the  
> professional organization which I I ask to do the search for me.

If you have ever requested a search from the corporate librarian, you 
know that it is very easy to get insufficient or incorrect information.  
The information you want is probably there, but the librarian is pruning 
the queries so as to reduce the count to only 10 articles (about $50.00 
worth on NRS) in less than 2 minutes (about $2/minute).

I've never met anyone who could read fast enough to keep up with a 
DowVision feed, and that is driving only a 19.2 kbps X.25 link.  About 
200 megabytes/day flows through the Internet (News, e-mail lists, Chat, 
and new Web pages).
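
To get a feel for that volume, here is some back-of-the-envelope 
arithmetic.  Only the 200 megabytes/day figure comes from the text; the 
6 bytes per word, 250 words per minute, and 8-hour reading day are 
assumptions chosen for illustration.

```python
BYTES_PER_DAY = 200 * 1_000_000  # "about 200 megabytes/day" from the text
BYTES_PER_WORD = 6               # assumption: average word plus a space
WORDS_PER_MINUTE = 250           # assumption: a brisk reading speed
READING_HOURS_PER_DAY = 8        # assumption: reading as a full-time job


def reading_days(bytes_per_day=BYTES_PER_DAY):
    """Full reading days needed to get through one day's traffic."""
    words = bytes_per_day / BYTES_PER_WORD
    hours = words / WORDS_PER_MINUTE / 60
    return hours / READING_HOURS_PER_DAY


print("%.0f reading days per day of traffic" % reading_days())
```

Under those assumptions one day of traffic takes roughly 278 eight-hour 
days to read, which is about a working year.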

Another way of putting it is that you have a year's worth of reading 
coming through every day, and the challenge is to boil that down to 
something that can be read in less than two hours.  When the commercial 
Internet first hit the market, the biggest concern was not security; it 
was that employees could spend 8 hours/day reading "News".  Employees and 
employers have found that 1 hour spent reading the right kind of news can 
contribute several thousand dollars to the bottom line.  On the other 
hand, at $60/hour for a lawyer, financial analyst, or engineer, 20 
hours/week spent reading alt.sex.* groups can turn into a $1200/week 
"hobby".

Rex Ballard (S&P)


From rballard@cnj.digex.net Sun Apr 30 23:58:15 1995