Lies, Damned Lies and Statistics

Some decades ago, some serious scientists “proved” that it was impossible for a bumblebee to fly.

All that proved is that the scientists didn’t know everything that needed to be known.

Pretty much the same sentiment ought to hit you when you hear about a study that “proves” that file-sharing doesn’t affect record sales.

This will become a legendary study in some circles, but of those who have cited it and will cite it, not one in a hundred will read the study, and not one in a thousand will comprehend it.


Conceptually, the question is simple: “On the whole, do people who download music buy more or less music than they would have if they didn’t download?”

Finding the answer is not so simple, because you’re inherently dealing with a “what if.” Until we can access alternate universes or something like that, we’ll never have a perfect answer.

So we must come up with approximations, and operate with certain assumptions. This is not easy to do, and no matter how you approach it, the results will be arguable.

Those who put together this study basically tracked downloads as best they could from a sampling of downloaders (they ended up tossing over 80% of the data gathered because they couldn’t match the download to a song), then compared those downloads to record sales.

To keep this really, really simple, I’ll just mention a few of the assumptions they made.

First, they assumed that it was a record sale OR a download. This obviously doesn’t account for those who use P2P to sample music, then buy.

Second, they assumed that the length of a song had a significant deterrent effect on someone downloading a song. Yeah, most Zeppelin fans I know refuse to download “Stairway to Heaven” because it’s too long.

Third, using German students off on school holidays as their yardstick, they assumed that record sales ought to go down when those students were downloading more, and because sales didn’t go down, they concluded there must not be any effect.

Well, when you have more free time, yes, you have more free time to download, but you also have more free time to do other things, too, like go to the record store for some Christmas or post-Christmas shopping? Oh, there was no effort to track how many albums the people being tracked bought, only total record sales (which do have a habit of going up towards the end of the year).
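To make that point concrete, here is a toy model, nothing more. Every number in it is made up for illustration (none comes from the study), and the function and parameter names are hypothetical. It only shows that if free time pushes up both downloading and shopping, total sales can stay flat even while downloads are costing real sales.

```python
# Toy model only: all figures are invented for illustration, not taken from the study.
def weekly_sales(free_hours, lost_per_10_downloads):
    """Albums sold in a week, given free time and an assumed displacement rate."""
    downloads = 10 * free_hours                          # more free time, more downloads
    shopping_boost = 2 * free_hours                      # more free time, more record-store trips
    lost_sales = downloads * lost_per_10_downloads / 10  # sales displaced by downloading
    return 100 + shopping_boost - lost_sales

school_week = weekly_sales(free_hours=5, lost_per_10_downloads=2)
holiday_week = weekly_sales(free_hours=20, lost_per_10_downloads=2)
holiday_without_downloads = weekly_sales(free_hours=20, lost_per_10_downloads=0)

print(school_week, holiday_week, holiday_without_downloads)
```

In this made-up world, sales are identical in the school week and the holiday week, yet the holiday week would have sold noticeably more had nobody downloaded. Flat totals by themselves tell you nothing about whether the effect is there.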

There are many, many other assumptions made in this study. Some are reasonable, others are as dubious as the ones mentioned above. It really looks like a house of cards, and I’m afraid I’m insulting the solidity of houses of cards by saying that: assumption built on assumption built on assumption. Even if this survey had concluded that filesharing chewed up record sales, I’d doubt the “proof” of that just as much.

Nor does it help the credibility of the study for its authors to spend many more words praising file-sharing and including the URL of an anti-RIAA site than justifying or even explaining the validity of their key equations.

Of course, the use of this survey won’t be scientific at all. It will be used as a political weapon, a factoid to be tossed in opponents’ faces. People aren’t interested in the truth here. If this survey had said filesharing hurt record sales, do you think 99%+ of those praising it now would be doing the same thing? Of course not.

Other Views

There’s some other recent statistical work you probably haven’t heard about, probably because the results weren’t quite politically correct. One recent thesis concluded that P2Ping had a negative effect on record sales for the young but a positive effect for the old.

Again, though, it seems like data is being pulled out of thin air and important factors are being left out. For instance, Internet access does not mean P2P access. A 50-year-old may be just about as likely to have a computer and Internet access as a 15-year-old, but that doesn’t make him about as likely to have Kazaa.

Work done based on survey information (i.e. people filling out questionnaires), on the other hand, generally shows a decrease in the number of albums bought, though that decrease only explains a portion of the decline in CD sales over the last few years.

When you have results all over the place, something’s wrong somewhere. This isn’t necessarily the statistician’s fault; this is a hard phenomenon to measure, again, because of the “what if?” and because record buying is often a lot more whimsical than, say, home sales.

Statistical analysis has its limits. It’s based on levels of abstraction and simplifications of data meant to measure relatively simple situations. Sometimes the abstractions and simplifications go too far or (more likely) the situation becomes too complex to be accurately measured.

That is precisely what happened with the bumblebee. All the “laws” that predicted the bumblebee couldn’t fly are based on the equivalent simplifications and abstractions as they apply to flight. The bumblebee has certain characteristics too complex to be modeled into a relatively simple “law.”

It’s much the same here. When that’s the case, you can generate all the statistical reports you like, but the value of what you get is summed up by a certain old computer adage.

Garbage In, Garbage Out.

The Last Word

One useful analysis of this conflicting data can be found here. It’s written by Professor Edward W. Felten of Princeton University, and well, read what he has to say first.

Let me emphasize one very important point Professor Felten makes:

“But what happens in the future? It all depends on what happens to today’s Free-riders. Perhaps today’s Free-riders will mature into Samplers, to be replaced by a new generation of Free-riders, so that the effects of the two groups continue in a rough balance. Or perhaps today’s Free-riders, never having known anything else, will keep Free-riding as they get older, and the balance will tip toward Free-riders.”

As Hamlet once said, “That is the question.”

From the survey information available today, even if you give the content makers all the benefits of the doubt, it’s hard to claim that P2Ping is responsible for all or even most of the drop in record sales. It’s reasonable to assume that it’s responsible for some of it, but not more than that.

However, the future is a different story, and nobody knows which way this is going to pan out.

We are (or soon will be) capable of providing any and all entertainment on demand freed from the prior constraints of space, time, and matter. But just like the Internet didn’t free the dot.coms from having to make money, none of this changes in the least the very old-fashioned need to feed the entertainment goose that lays the golden eggs.

