Subject: An Open Letter from Brian Reid About The Rimm Study
From: kieran@interport.net (Aaron Dickey)
Date: Fri, 07 Jul 1995 03:10:20 -0500
Organization: Red Wigglers - The Cadillac of Worms!
X-Newsreader: Yet Another NewsWatcher 2.0b27.4
Sender: owner-online-news@marketplace.com
Precedence: bulk
Status: O
X-Status: 

Just as an example, I'm going to post this one relatively short piece
about the Rimm study that TIME magazine covered.  

For those of you who don't know, Brian Reid of Digital Equipment
Corporation is generally regarded as the undisputed God of Usenet
statistics.  He is no amateur in this area.  Martin Rimm directly cited
Mr. Reid's work on several occasions in an attempt to bolster his own
study.  Given these two facts, it's pretty clear that Mr. Reid is
completely qualified to make the statements that he does below.

The following is an open letter; you are free to distribute it anywhere.

> --forwarded message follows--
> 
> 
>  From: Brian Reid 
>  Subject: Critique of the Rimm study
>  Date: Wed, 05 Jul 95 20:30:49 -0700
>  X-Mts: smtp
> 
>  I have read a preprint of the Rimm study of pornography and I am so
>  distressed by its lack of scientific credibility that I don't even know
>  where to begin critiquing it. Normally when I am sent a publication for
>  review, if I find a flaw in it I can identify it and say "here, in this
>  paragraph, you are making some unwarranted assumptions". In this study
>  I have trouble finding measurement techniques that are *not* flawed.
>  The writer appears to me not to have a glimmer of an understanding even
>  of basic statistical measurement technique, let alone of the
>  application of that technique to something as elusive and ill-defined
>  as USENET.
> 
>  I have been measuring USENET readership, analyzing USENET content,
>  and publishing studies of what I find since April 1986. I have spent
>  years refining the measurement techniques and the data processing
>  algorithms. Despite those 9 years of working on the problem, I still do
>  not believe that it is possible to get measurements whose accuracy is
>  within a factor of 10 of the truth. In other words, if I measure
>  something that seems to be 79, the truth might be 790 or 7.9 or
>  anywhere in between. Despite this inaccuracy, the measurements are
>  interesting, because whatever unknowns it is that they are measuring,
>  these unknowns are similar from one month to the next, so that the
>  study of trends is meaningful. As long as you are aware of what it is
>  that you are taking the ratio of, it is also meaningful to compare
>  USENET measurements, because whatever the errors might be, they are
>  often similar in two numbers from the same measurement set, and they
>  are multiplicative, so they tend to cancel out in the quotient.
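
[Illustration, not part of Mr. Reid's letter.] The cancellation
argument above can be sketched in a few lines of Python. This is only
a toy model under stated assumptions: each measurement is taken to be
the true value times one unknown multiplicative error shared by both
numbers, and the readership figures 300 and 100 are hypothetical. Only
the value 79 and the factor of 10 come from the letter. The function
names are made up for this sketch.

```python
import math

def plausible_range(measured, factor=10):
    """Bounds on the truth if a measurement is only good to within `factor`."""
    return measured / factor, measured * factor

def ratio_of_measurements(true_a, true_b, shared_error):
    """Ratio of two measurements that carry the same multiplicative error."""
    measured_a = true_a * shared_error   # hypothetical error model
    measured_b = true_b * shared_error
    return measured_a / measured_b       # the shared error divides out

# A single-point measurement of 79 says very little on its own.
low, high = plausible_range(79)
print("truth lies somewhere between", low, "and", high)

# The ratio of two numbers from the same measurement set is stable,
# whatever the shared error happens to be.
for shared_error in (0.1, 1.0, 10.0):
    r = ratio_of_measurements(300.0, 100.0, shared_error)
    print("shared error", shared_error, "ratio close to 3:", math.isclose(r, 3.0))
```

The sketch holds only when the two numbers really do share the same
error, which is why the letter limits the claim to numbers from the
same measurement set.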
> 
>  In other words, in the results that I publish, the two kinds of measurements
>  that are meaningful enough to pay attention to for serious scholarship
>  are the normalized month-to-month trends in the readership percentages
>  of a given newsgroup, and the within-the-same-month ratio of the
>  readership of one newsgroup to the readership of another. The reason
>  that I publish the numbers is primarily to enable trend analysis; it is
>  not reasonable to take a single-point measurement seriously.
> 
>  No matter what the level of accuracy you are seeking, it is imperative
>  that you understand what it is that you are measuring. Whenever you
>  cannot measure an entire population, you must find and measure a
>  sample, and the error in your measurement will be magnified if your
>  sample is not a representative sample. A small error in understanding
>  the nature of the sample population will lead to an error like the
>  famous "Dewey defeats Truman" headline in the 1948 US Presidential
>  election. A large error in understanding the nature of the sample
>  population can lead to results that are completely meaningless, such as
>  measuring pregnancy rates in a population whose age and sex are unknown.
>  Rimm has made three "beginner's errors" that, in my opinion, when taken
>  together, render his numbers completely meaningless:
> 
>      1. He has selected a very homogeneous population to measure. While
>         he has chosen not to identify his population, he has included
>         enough of his sample data to allow me to correlate his numbers
>         with my own numbers for the same measurement period. His data
>         correlate exactly with my numbers for Pittsburgh newsgroups in
>         that measurement period; only his own university (Carnegie-Mellon)
>         has widespread enough campus networking to make it possible for
>         him to sample that large a population. It is therefore almost
>         certain that he has measured his own university. I received my
>         Ph.D. in Computer Science from Carnegie-Mellon University, and I
>         am very aware that it is dominantly male and dominantly a
>         technology school.  The behavior of computer-using students at
>         a high-tech urban engineering school might not be very similar
>         to the behavior of other student populations, let alone
>         non-student populations.
> 
>      2. He has measured only one time period, January 1995. Having lived
>         at Carnegie-Mellon University for a number of years, I know
>         first-hand that student interests in January are extremely
>         different from student interests in September or April. When
>         measuring human behavior about which very little is known, it is
>         important to take numerous measurements over time and to look for
>         time series. Taking the last few years' worth of my data and
>         doing a trend analysis in the newsgroups that he has named as
>         pornographic shows an average 3:1 seasonal trend change between
>         low-readership months (November and April) and high-readership
>         months (September and January). But the trends are different in
>         different newsgroups. A single-point measurement is not nearly
>         as meaningful as a series of measurements.
> 
>      3. He makes the assumption that by seeing a data reference to an
>         image or a file, it is possible to tell what the individual did
>         with the file. We in the network measurement business are very
>         careful to explain what it is that our measurements mean. Here
>         is the standard explanation that I publish with my monthly
>         measurements to talk about the number that Rimm calls "number
>         of downloads".
> 
>            To "read" a newsgroup means to have been presented with the
>            opportunity to look at at least one message in it. Going
>            through a newsgroup with the "n" key counts as reading it.
>            For a news site, "user X reads group Y" means that user
>            X's .newsrc file has marked at least one unexpired message
>            in Y.
> 
>         Rimm used my network measurement software tools to take his data,
>         and he did not anywhere in his article state that he had made changes
>         to them, so I must conclude that his numbers and my numbers are
>         derived from the same software. But the number that he is using for
>         "number of downloads" is the same number that I call "number of
>         readers" by the above definition. It has nothing to do with the
>         number of downloads. In fact, it is not possible for this
>         measurement system to tell whether or not a file has been downloaded;
>         it can tell whether or not a person has been presented with
>         the opportunity to download a file but it cannot tell whether the
>         user answered "yes" or "no".
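
[Illustration, not part of Mr. Reid's letter.] The definition of
"read" quoted above can be sketched as follows. This is not Mr. Reid's
measurement software. It assumes the common .newsrc line layout of a
group name, a ":" or "!" marker, and a comma-separated list of article
numbers or ranges (for example "alt.test: 1-120,125"), and it assumes
the range of unexpired article numbers for the group is already known.
All function names and sample data are hypothetical.

```python
import re

# Assumed layout: "<group>: 1-120,125" (subscribed) or "<group>! 1-40" (unsubscribed).
LINE_RE = re.compile(r"^([^\s:!]+)[:!]\s*(.*)$")

def parse_newsrc_line(line):
    """Return (group, [(low, high), ...]) for one .newsrc line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    group, spec = match.group(1), match.group(2)
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        ranges.append((int(low), int(high or low)))
    return group, ranges

def reads_group(ranges, first_unexpired, last_article):
    """True if at least one marked article number is still unexpired."""
    return any(low <= last_article and high >= first_unexpired
               for low, high in ranges)

def count_readers(newsrc_texts, group, first_unexpired, last_article):
    """Count users whose .newsrc marks an unexpired message in `group`."""
    readers = 0
    for text in newsrc_texts:
        for line in text.splitlines():
            parsed = parse_newsrc_line(line)
            if (parsed is not None and parsed[0] == group
                    and reads_group(parsed[1], first_unexpired, last_article)):
                readers += 1
                break  # count each user at most once
    return readers

# Hypothetical data: three users' .newsrc contents.
sample = ["alt.test: 1-120\nrec.pets: 1-5", "alt.test: 1-40", "alt.test! 95-101"]
print(count_readers(sample, "alt.test", first_unexpired=90, last_article=130))
```

Nothing in this input records what a user did with a message. A count
built this way can say that a message was marked in a .newsrc file; it
has no field from which a download could be inferred.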
> 
>  In summary, I do not consider Rimm's analysis to have enough technical rigor
>  to be worthy of publication in a scholarly journal.
> 
>  Brian Reid, Ph.D.
>  Director, Network Systems Laboratory
>  Digital Equipment Corporation
>  Palo Alto, California
>  reid@pa.dec.com
>  http://www.research.digital.com/nsl/people/reid/bio.html
