FAIR USE NOTICE. This document contains copyrighted material whose use has not been specifically authorized by the copyright owner. The Managerial Economics course is making this material available as part of our mission to promote critical thinking about economic issues. We believe that this constitutes a `fair use' of the copyrighted material as provided for in section 107 of the US Copyright Law. If you wish to use this copyrighted material for purposes of your own that go beyond `fair use', you must obtain permission from the copyright owner.
Privacy, Shmivacy.
Corporations don't want to know us, they want to know our data
Bret Dawson
Stay Free!
THE DIGITAL RACE
(Data Mining For Demand Analysis)
Chuck Petrakis is excited. "Let me give you an example," he says. "Here's Mrs. Smith, who's just been in to see her
doctor for a checkup. Say you're her insurance company. You run her records, and the software tells you that she's
likely to develop diabetes. Well, that gives you an opportunity to be really proactive."
The software Petrakis is talking about is a new package from Orlando's MedAI. "Chronic Disease Identification," as it's
called, represents the cutting edge of healthcare data systems; it crunches through huge volumes of medical records
and, using artificial-intelligence algorithms, actually predicts life-threatening diseases. For American health
insurers, this is big news.
"As an insurer," Petrakis continues, "you can say, `Well, we need Mrs. Smith to do ten or fifteen things right now. We
need her to do something about her weight, we need her to watch her intake of certain foods, we need her to make
appointments for blood work,' and so on.
"Most insurance companies already have call centers set up. So you could have a nurse or a clinician phone up Mrs.
Smith--to find out what she's been eating, to find out whether she's sticking to her program, to set up appointments
for her, to monitor her progress," Petrakis says. "You don't have to wait for her to call; you can get moving right
away. And with this software, you can take those steps before she develops diabetes. That's a much more cost-effective
way to handle things."
As MedAI's sales director, Petrakis unveiled the software this spring at a medical industry conference. He won't talk
numbers, but slyly assures me that CDI caught the undivided attention of several big insurance industry players. He's
betting on a big sales year.
"Insurance companies," he says, "are already very proactive. Not just in terms of getting the best possible care but in
doing that in the most cost-effective manner possible. And this software lets them take that to the next level."
This, I think, hanging up the phone, is an obscene understatement. Health care in the U.S. is an unabashedly
moneymaking undertaking, and HMOs and insurance companies make no bones about their focus on the bottom line. Thus
those call centers: When you're diagnosed with an expensive illness in the States, you can expect to be harassed about
your lifestyle, lest you cost your insurer unbudgeted-for dollars. In HMO marketing-speak, this is variously called the
"wellness" or "managed care" approach.
But CDI isn't just about managing the care of people with serious illnesses. It's about computers deciding who's going
to get sick: not by examining patients but by playing statistical games with data. None of this is unique to MedAI; the
direct-marketing industry has used similar techniques--massaging personality profiles out of large databases--for some
time now.
The real issue here, though, is not that an insurance company can fool around with Mrs. Smith's medical records, or
target her for junk mailings. It's about the way our society is undergoing a fundamental shift. It's about how
institutions--banks, hospitals, governments--would now rather deal with our data than with us. Our digital
profiles are taking on traits that may have nothing to do with our real-world selves, and these profiles are now
beginning to live our lives for us.
Back in October, Undercurrents (the Canadian media and technology program I used to work for) sent me to Ottawa for a
day to attend Privacy International's "Advanced Surveillance Technologies II" conference. In a plain hotel meeting
room, I listened to lectures from and rubbed shoulders with some of the rising stars of privacy activism: Phil Agre,
Simon Davies, David Banisar, Ann Cavoukian, that Garfinkel guy who writes for Wired. There was a lot of frightening
talk, but also a lot of backpatting; people casually talked about privacy issues as "the new environmentalism."
It's true: Those people are certainly succeeding in raising public awareness, if only in the
ozone-layer-good-styrofoam-cups-bad way that the environmental lobby has succeeded. So, in true Earth Day fashion,
we're facing a rising tide of media scaremongering. Articles in Wired, The New York Times, Details, and a parade of
blue-ribbon-bedecked websites all approach the issue in similar ways--they rattle off long lists of ways people can
find out stuff about you. Your grocery-store loyalty card tracks where and when you bought what. Surveillance cameras
follow you almost everywhere you go. Your bank pays attention to where you use ATMs. Your credit file is widely
distributed. And, uh, that sucks.
What's largely escaped the mass-media take on privacy is something called "database enrichment," or "merge-and-purge."
This is a big oversight because merge-and-purge turns a person's dataset into a fully formed electronic character
sketch.
Here's how it works: Take two or more databases and combine them. (For this example, let's use U.S. News and World
Report's subscriber list and a "wealthy householders" list available from any number of national brokers.) Keep only
the names that appear on both lists. Now combine the new database with another--say, a list of households with no
children--and, again, keep only the names that appear on both lists. As the process goes on, and the list is enriched
with more and more information, a sort of superrecord emerges. It's now a list of profiles, a set of personalities, if
you will. (This is no idle example, by the way. U.S. News is currently engaged in a multitrial legal battle with a Virginia
man named Ram Avrahami over precisely this practice.)
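For readers who think in code, the procedure above can be sketched in a few lines of Python. This is a minimal illustration only, not any broker's actual software: the three lists, the names on them, and the field labels are all invented, and real list processing also has to cope with misspellings, duplicate entries, and changed addresses, which this sketch ignores.

```python
# Hypothetical toy lists; every name and field here is invented for illustration.
subscribers = {
    "A. Jones": {"magazine": "newsweekly"},
    "B. Lee": {"magazine": "newsweekly"},
    "C. Ortiz": {"magazine": "newsweekly"},
}
wealthy_householders = {
    "A. Jones": {"income_band": "high"},
    "C. Ortiz": {"income_band": "high"},
    "D. Patel": {"income_band": "high"},
}
no_children = {
    "C. Ortiz": {"children": 0},
    "D. Patel": {"children": 0},
}


def merge_and_purge(*lists):
    """Keep only names present on every list, merging their fields into one record."""
    common = set(lists[0])
    for lst in lists[1:]:
        common &= set(lst)  # purge: drop anyone missing from this list
    profiles = {}
    for name in sorted(common):
        record = {}
        for lst in lists:
            record.update(lst[name])  # merge: the record grows with each list
        profiles[name] = record
    return profiles


profiles = merge_and_purge(subscribers, wealthy_householders, no_children)
print(profiles)
```

Only one name survives all three passes, and the record attached to it now carries a field from every list it appeared on. That widening record is the "superrecord" described above.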
This business--the creation and sale of electronic profiles--has turned firms such as Chicago's Metromail into
multimillion-dollar operations. Metromail got its start back in 1948 as a printer for mass mailings--a "lettershop."
Over the following decades, the company branched into direct mail and database marketing, selling mailing lists and
list-processing services. Today, it's a multifaceted data-and-marketing powerhouse, with $281 million in annual sales
(U.S.), and more than 3,000 employees. It collects and purchases personal data from a variety of sources--public
records, surveys, warranty-card registrations, the U.S. Postal Service's change-of-address files, and so on--and sells
the data in a variety of forms: mailing lists, reference services, and, of course, merge-and-purge processing. The
company's central database now contains records on nearly every household in the United States. If you have a list of
names, and you want to attach income, marital status, home ownership, or nearly any other kind of data to those
names--in short, you want to buy a set of profiles--you go to a company like Metromail.
Tim Fitzpatrick, the firm's VP of corporate communications, is a little defensive on the phone. He's heard the privacy
lobby's concerns, and he's got answers.
"It's very important," he says, "to keep one thing in mind. This is about finding out what your customers have in
common. I mean, marketers may care about knowing me, Tim Fitzpatrick, but they care a lot more about knowing about a
group of Tim Fitzpatricks. This is about learning what your customers' needs are, so that you can do a better job of
serving them. And this is the fundamental truth: customers' needs are being satisfied. People are voting with their
wallets."
If I understand him correctly, Fitzpatrick is saying this: yes, many businesses have detailed files about their
customers. Yes, it's easy to enhance those files with outside data sources. No, it's not the Big-Brother threat the
privacy lobby would have us believe. Nobody at your grocery store is looking at your individual purchase history and
saying, "Uh oh. That's the third time Bret's bought Preparation H this month." Customer data is important--and
useful--only in the aggregate. The data traders of the direct-marketing industry (and now, the healthcare industry)
aren't interested in knowing you at all. They're happy just deciding which aggregate your profile belongs in, and what
it says about your future behavior.
Roger Clarke runs an information-systems consulting business and is a visiting fellow at the Australian National
University in Canberra. In 1994, he wrote an article for The Information Society entitled "The Digital Persona and its
Application to Data Surveillance." The piece introduced one of the most interesting--and most widely ignored--concepts
in all of privacy activism. Here's the basic idea:
Think of the digital persona as the shadow you cast into cyberspace. It's a profile of you that grows more detailed as
databases are merged and as you interact with ever more systems. In time, this persona develops its own personality; it
makes certain kinds of purchases at certain times on certain days of the week, and it has an employment history. It's
been preapproved for a new credit card, and it subscribes to three or four magazines. It owes a bit of money on its
Visa, and a lot on its student loans. It uses Sprint for long distance.
It's a chilling thought. A profile with this kind of detail gives away a lot about what sort of person you are. But
there's more to it. Your digital persona doesn't just describe you, it is you.
Public life doesn't happen in streets and offices and shops anymore because the arena of public life--birth, school,
work, death--has moved. Public life is data. Making a purchase, applying for a job, voting, placing a phone call,
buying insurance--these are activities our digital personae now do for us by proxy.
"You can trace it back quite some distance," Clarke says, "through two trends. The first is the increasing intensity of
data exchange between people and institutions. The second, which doesn't necessarily involve technology, is a trend
toward centralization of authority. In the past, my bank manager had to know me before deciding whether I was worthy of
a loan. Well, in practice, that authority is no longer in that manager's hands."
Decision-making processes at financial institutions have become so automatic, so data-centric, so uninterested in the
personal details of their customers' transactions, that there's no longer any need for physical branches. Our digital
persona is now so detailed that machines are in positions to make decisions about our creditworthiness.
"We sometimes say that traditional database analysts sit in smart air," chuckles Rick Makos, the VP of sales and
marketing for the Toronto-based Angoss Knowledge Engineering. "They have to build a model, then test it, over and over
again. If you come up with the model that works, you must be sitting in smarter air. But it's kind of a backward
process. You shoot, then you aim. Our approach is more data-driven. It's more based on the reality that's in the data."
Makos is arguably at the cutting edge of "data mining": a new kind of information analysis that makes plain old
merge-and-purge look positively timid by comparison. Data mining uses artificial intelligence software to hunt for
patterns (in marketing-speak, "actionable characteristics") in large databases. The basic theory is simple: any large
set of data holds patterns, some of which are obvious, and some of which may not be. The goal is to have a computer
find those nonobvious connections and then exploit them to your financial advantage. (For example, early data miners at
a grocery chain found that people who buy diapers also tend to buy a lot of beer. The result was "Parties for
Parents.")
What really distinguishes data mining from ordinary database analysis is that data mining systems don't need
hypotheses. They don't need to be asked, "Is it true that people who buy diapers buy more beer?" They're designed to
answer tougher, more open-ended questions like, "Who buys a lot of beer?"
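The difference can be made concrete with a small Python sketch. It is an assumption-laden toy, not a description of how KnowledgeSeeker or any commercial package works: the shopping baskets are invented, and the scoring measure (often called "lift," the beer-buying rate among people who bought an item divided by the overall beer-buying rate) is a stand-in for the decision-tree methods the vendors actually describe. The point is that nobody has to name diapers in advance; the program scores every other item against the target and ranks them.

```python
from collections import Counter

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
    {"milk", "diapers", "beer"},
    {"bread", "milk"},
    {"chips", "beer"},
]


def rank_by_lift(baskets, target):
    """Score every non-target item by how much it raises the odds of the target."""
    base_rate = sum(target in b for b in baskets) / len(baskets)
    seen = Counter()         # baskets containing each item
    with_target = Counter()  # of those, baskets that also contain the target
    for basket in baskets:
        for item in basket - {target}:
            seen[item] += 1
            if target in basket:
                with_target[item] += 1
    lifts = {item: (with_target[item] / seen[item]) / base_rate for item in seen}
    # Highest lift first: the strongest candidate "actionable characteristics".
    return sorted(lifts.items(), key=lambda pair: pair[1], reverse=True)


ranking = rank_by_lift(baskets, "beer")
for item, lift in ranking:
    print(f"{item}: {lift:.2f}")
```

On this toy data, diapers come out on top and bread at the bottom, without anyone having asked about either. A lift above 1 means buyers of that item purchase beer more often than shoppers in general; a lift below 1 means less often.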
Practical data mining is only a few years old. It grew out of the wave of academic research into artificial
intelligence that started in the early 1980s. At the time, algorithms for machine learning--decision-tree
generators and so on--existed only as theoretical concepts. But as high-test processors got cheaper, data-analysis
firms began to write AI into their custom software. Then, approximately five years ago, some marketing genius slapped
the name "data mining" on the process, and a new industry was born. Today, with the proliferation of Sun workstations
and Pentium Pro-based PCs, it has begun to show up in everyday business.
Makos has been in the trenches since the early days, doing database-query demonstrations for hardware companies, and
then data mining for banks as the owner of his own consulting firm. He joined Angoss in October of last year, working
on the company's pride and joy--a data-mining package called KnowledgeSeeker.
Among his biggest clients is the Canadian Imperial Bank of Commerce's "risk management" division. The CIBC uses the
package to track the bank's mortgage customers, finding, in Makos's words, "which buckets of behavior yield what
results."
Risk management being what it is, data mining at CIBC came to revolve around predicting which types of mortgage
accounts were most likely to slip into delinquency. Surprisingly, perhaps, the bank found that people with a history of
late mortgage payments were not those who tended to default. It was those who'd always paid on time but were suddenly
late with a single check who tended to fall into the deep end of the financial pool. For Jim Carswell, the bank's
managing director of credit scoring, the result meant a big shift of priorities.
"When we first discovered this, we thought it was an error," he says. "But then it dawned on us: Almost everyone falls
behind once in a while, if you're talking about credit cards or phone bills. But people who take their mortgages
seriously take them really seriously. Someone like that isn't going to miss a payment unless they're in some
difficulty."
What does this mean for individual mortgage holders? Well, he notes, if you've got a spotless record, and, for once,
you're two days late with a payment, you can expect the risk-management division to target you much more aggressively
than ever before. "It might make the difference between a form letter and a phone call. Or it might mean that we'd call
you today, rather than three days from now."
For reasons I don't quite understand, this grates on me. If my payment record has been flawless, I figure I'm owed the
benefit of the doubt when my check's a day or two late. And I certainly don't want the laggard who's always late to
have an easier time of it than me, no matter what KnowledgeSeeker thinks I'm going to do. "I understand your point,"
Carswell says, "and I can see why some people might be uncomfortable with that. But the reality is this: even in this
higher-risk group, people still pay you. My position is that, if we talk to people, work things out, maybe spot a
problem early, that's better for everyone."
In early March of 1997, a Boston-area woman named Wendy Eldredge found a mysterious envelope in her mailbox. There was
no return address on the envelope. Just a typewritten address, a California postmark, and a standard-issue 32-cent U.S.
stamp. Inside was a full-page ad, torn from a newspaper, for a weight-loss pill called "Berry Trim Plus." At the top of
the page, someone had written, "Wendy, try it. It works!" in blue pen.
"I was just crushed," she says. "Crushed. I mean, I've had two kids. I could lose twenty pounds. But that, oh, man. I
was crying leaving the post office, and my four year old was asking, `Mommy, what's wrong?' and I didn't even know what
to say to her.
"So then I thought, `Okay. Some wacko's bought himself a mailing list.' So I got on the phone to Health Labs of North
America [the company selling Berry Trim Plus] to ask them if they knew that someone was using their ad like that. And
the woman I spoke to said, `Oh, I know. This is one of our advertising campaigns.' Well, I just blew up. I said, `How
dare you insult me like that?' And she said, `We're trying to help you.' "
This didn't sit well either, and Eldredge launched a private campaign against Health Labs. By the time it was over, the Boston Globe had run two separate pieces about her situation, and she'd become something of a local celebrity.
I've never met anyone who actually liked junk mail, so it's hardly surprising that Eldredge reacted so badly to the mailing. But it's telling, I think, to look at why she reacted the way she did. Eldredge is a self-described weight-loss candidate, and Health Labs was selling a weight-loss product, so it wasn't unreasonable for her name to appear on the mailing list. It had, no doubt, been cross-referenced across myriad other databases to make sure that her income, housing, marital status, and occupation fit the ideal demographic for Berry Trim Plus. In short, the data was correct.
The problem, I would argue, is that data is not a very good tool for describing real people. The digital persona is a complex thing; as it is merged-and-purged, mined and manipulated, it acquires character traits that may have nothing to do with its real-world namesake. In Eldredge's case, the error was only humiliating, but it's not tough to imagine a situation with much uglier consequences.
Before CDI, Florida's MedAI cut its data-mining teeth with something called the "Myocardial Infarction Predictor." (A myocardial infarction is a heart attack.) This is a software system designed to be used by emergency room doctors to diagnose patients suffering from severe chest pain.
According to the company's president, Steve Epstein, the package grew out of a genuine shortcoming in ER procedures.
"This is the thing," he says. "People tend to be over-conservative with chest pain. They'll spend huge amounts of money running really expensive tests trying to rule out heart attacks." In practical terms, he says, this means that a majority of the people admitted from emergency rooms into coronary-care units are not actually having heart attacks. The MI Predictor's job is to spot those people before they're admitted into intensive care, before those costly tests are performed.
MedAI isn't actively marketing the MI Predictor. The system was developed, tested, and is now being used at a single
hospital in Florida. Regulatory difficulties will probably keep it there for the near future. But the project was
really intended as a kind of pilot, one that set the stage for the nationwide rollout of CDI. That system's
goal, Epstein promises, is nothing short of revolutionary change.
"The old techniques are just not good enough. Before artificial intelligence, you only had actuarials--you might only
know that five people out of a given population are going to develop an illness. Now, you can know which five it's
going to be."
This is pure medicine-by-statistics, a frightening parallel universe where the digital persona's health determines the
real person's treatment.
"The digital persona is a bit like a voodoo doll," Roger Clarke says. "A kind of crude model of you that can be used,
from a distance, to put a curse on you."
As comforting as it might be to think so, the dangers of the digital persona--its arbitrariness, its inaccuracy--are
not just by-products of well-meaning data manipulation. In fact, there's an entire industry--something called
"segmentation"--whose sole job is to tack arbitrary personality types onto individuals' datasets.
The idea is to cut the population up into a few dozen categories, and people inside each will have similar incomes,
tastes, residences, and behaviors. Find out which segments like your products and you'll know which people to chase
with your new marketing campaign.
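In code, the logic is almost embarrassingly short. The Python sketch below is a rough illustration under invented assumptions: the segment labels, zipcodes, and purchase flags are all made up, and a real segmentation product assigns its categories from far richer data than one yes-or-no column.

```python
from collections import defaultdict

# Hypothetical customer file; segments, zipcodes, and purchases are invented.
households = [
    {"zip": "00001", "segment": "Segment A", "bought": True},
    {"zip": "00002", "segment": "Segment A", "bought": True},
    {"zip": "00003", "segment": "Segment A", "bought": False},
    {"zip": "00004", "segment": "Segment B", "bought": False},
    {"zip": "00005", "segment": "Segment B", "bought": False},
    {"zip": "00006", "segment": "Segment C", "bought": True},
    {"zip": "00007", "segment": "Segment C", "bought": False},
    {"zip": "00008", "segment": "Segment C", "bought": False},
]


def purchase_rate_by_segment(households):
    """Return the share of households in each segment that bought the product."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for household in households:
        totals[household["segment"]] += 1
        buyers[household["segment"]] += int(household["bought"])
    return {segment: buyers[segment] / totals[segment] for segment in totals}


rates = purchase_rate_by_segment(households)
best_segment = max(rates, key=rates.get)
# Everyone in the winning segment becomes a target, buyer or not.
targets = [h["zip"] for h in households if h["segment"] == best_segment]
print(best_segment, targets)
```

Notice the last step: the household at zipcode 00003 never bought anything, but it gets chased anyway, because its segment did.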
One of the biggest players in this field is "MicroVision," a product of Equifax National Decision Systems. (Yes, the
same Equifax that maintains your credit rating.) The system fits every household in the U.S. into one of fifty
demographic categories, each with a nickname like "A Good Step Forward" or "Metro Mix."
If you can give MicroVision a zipcode, the system will tell you which category it fits, and will happily provide lists
of other zipcodes where people behave in the same way. And if you want names for mailing lists, well, you can have
those, too. By selling these profiles, MicroVision does more than $40 million worth of business each year.
MicroVision assembles its segments by merging hundreds of data sources, among them the 1990 U.S. census (which gives
average ages, incomes, and home ownership), consumer and financial data from the Equifax credit-rating databases, and a
lifestyle survey of 20,000 people which asks about data as specific as restaurants, oil changes, long-distance
companies, and TV shows.
The research is unquestionably thorough, but the results are an absolute howl to read. Here's a taste:
People in category #9, "Building a Home Life," are supposedly do-it-yourselfers who spend a good deal of money on home
improvement and car repair projects. "They also tend to eat dinner at upscale restaurants and watch college football
bowl games on television."
Those in segment #14, "Middle Years," "are the most likely to be a member of a frequent-flyer program, maintain a
municipal bond fund, own a hot tub and have a gold Mastercard. They also like to read travel magazines and listen to
all-news radio." "Stars and Stripes" eat at Taco Bell and play lots of Nintendo.
The system is positively elegant in its self-assured completeness. But as Eldredge discovered, people who look
identical in data often won't behave anything like each other in real life. These are the great thorns in the side of
the marketing industry: subtlety, unpredictability, fickleness. They're also the qualities that make us human.
Consider the official description of MicroVision segment #49, "Anomalies":
"Functionally, these zipcodes represent a small number of unusual areas which should not be included in a marketing
plan. While data exists for the zipcodes in this segment, by definition, they are not homogeneous and cannot be
expected to behave in a consistent manner."
In the midst of all this, the privacy lobby remains woefully focused on soft-headed scare stories: We're being tracked.
We're under surveillance. Big Brother is watching us. There oughta be a law.
The harpings of the digerati notwithstanding, it's more complex than that. We're seeing a profound shift in who
represents "us" in public space, and the consequences could be utterly devastating. Mrs. Smith, Wendy Eldredge, the
mortgage customers of CIBC, these are people who've lost control of their virtual personae, who've really had their
privacy violated.
In the September 25, 1996, issue of his email journal, Netfuture, Stephen Talbott--a prominent critic of
technology--described privacy as a fundamental respect for the sovereignty of others over their own affairs: as "a
certain willingness to lower one's eyes and hold sacred what one knows about the other person." I like this definition.
But it flies in the face of just about everything our economy holds sacred. As Talbott wrote in his FAQ on Computerized
Technology and Human Responsibility:
"It is possible--although it will be a tremendous stretch--for us to extend our gestures of human respect to the
abstract, placeless, and timeless data representations of other people. But it isn't conceivable that we will succeed
in this greater challenge while failing the lesser and more familiar one. We cannot--as programmers, application users,
corporate employees, consumers--enlarge our respect for persons to embrace data when we are forgetting what respect for
persons means in the first place."
Big Brother isn't watching us at all. He's playing with our voodoo dolls.