The data fight

The issues in the fight over telephone companies releasing data to the NSA aren’t so simple as they are being reported and spun under the dark cloud of privacy violation.

From what we know, data was released to the NSA so it could be analyzed for patterns, and thus for anomalies that might point to suspicious communications and, in turn, to suspects. In other words, you can’t tell what’s abnormal until you define normal, and it is our collective behavior that defines normal.
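The idea that "normal" has to be defined before anything can look abnormal can be sketched as a simple baseline-and-threshold check. This is a minimal illustration under stated assumptions, not a description of what the NSA actually does, which has not been disclosed. The input is a hypothetical table of how many calls each subscriber placed to one region in a week. The z-score rule and the threshold of 3 are arbitrary choices.

```python
from statistics import mean, pstdev


def flag_anomalies(counts: dict[str, int], threshold: float = 3.0) -> list[str]:
    """Return subscribers whose count is more than `threshold` standard
    deviations above the mean of all subscribers.

    `counts` is hypothetical: subscriber ID -> calls placed to one region
    in one week. The baseline ("normal") comes from everyone's behavior.
    """
    values = list(counts.values())
    baseline = mean(values)   # what "normal" looks like across the crowd
    spread = pstdev(values)   # population standard deviation
    if spread == 0:
        return []             # identical behavior everywhere: nothing stands out
    return [sub for sub, n in counts.items() if (n - baseline) / spread > threshold]


# Twenty hypothetical subscribers who place zero or one such call a week,
# plus one who places thirty.
counts = {f"sub{i:02d}": i % 2 for i in range(20)}
counts["sub99"] = 30
print(flag_anomalies(counts))  # ['sub99']
```

The flagged subscriber is "abnormal" only in comparison with everyone else's counts, and that is the post's point: without the crowd's data there is no baseline. A flag like this is a lead, not evidence of wrongdoing.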

If, in fact, it is aggregate data they are using to discover those exceptions, then we need to ask a new question that isn’t really being addressed in the networked world: Who owns the wisdom of the crowd? If the people own it, then one could argue that the government, acting as the people, may seek and use that data unless we, the people, forbid it through law. There is, of course, a proper debate about whether the law does allow it. There is also a proper debate over whether this is a necessary and prudent weapon in finding terrorists (and whether that is being done effectively). Indeed, a Washington Post poll says that 63 percent of Americans consider this an “acceptable way for the federal government to investigate terrorism.” And didn’t we protest that our government did not do a good enough job analyzing data and intelligence to prevent 9/11? If someone had been analyzing patterns of enrollment in flight schools — hmm, why are an abnormally high number of Saudis suddenly learning how to fly passenger jets? — then could we have stopped them? A further question is whether we have a right to know that all this is going on or whether that public knowledge cripples this investigation and our safety. Finally, it is not clear that releasing aggregate data necessarily violates individuals’ privacy. My point is that this isn’t as simple as raising the tattered-from-overuse privacy flag. Neither is this as simple as raising the also tattered war-on-terrorism flag.

This is about a new asset that is created in the networked world — the aggregate knowledge generated by our aggregate behavior — and who has a right to that.

Using aggregate data this way is certainly not new, only more efficient. Insurance companies have long used our health and mortality data in aggregate to set rates. Marketers use our aggregate data to adjust products and ad campaigns. Google uses our aggregate data to improve its search engine. So Google owns, analyzes, and exploits the data we create through our actions. In the case of the kiddie porn investigation, Google tried to refuse to hand over random aggregate data about our searches to the government; other search engines complied. The same thing occurred in the NSA case; some phone companies complied and Qwest did not.

The bottom line is that there isn’t yet a bottom line: The law and ethics around aggregate data are not clear.
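The distinction at the heart of this argument, between records about individuals and knowledge aggregated from them, can be made concrete. The sketch below assumes a hypothetical call log of (caller, callee) pairs using 10-digit North American numbers. It collapses that log into call counts between area codes. The numbers and the choice of area code as the unit of aggregation are illustrative only.

```python
from collections import Counter


def area_code(number: str) -> str:
    """First three digits of a hypothetical 10-digit North American number."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return digits[:3]


def aggregate_by_area(log):
    """Collapse record-level (caller, callee) pairs into counts per
    (caller area code, callee area code). Individual numbers are dropped."""
    return Counter((area_code(a), area_code(b)) for a, b in log)


log = [
    ("212-555-0101", "312-555-0102"),
    ("212-555-0103", "312-555-0104"),
    ("415-555-0105", "212-555-0101"),
]
print(aggregate_by_area(log))
```

The output names no individual numbers, but two caveats cut both ways in this debate. Someone still has to hold the record-level log to compute the table. And a cell with a count of 1, like the last pair here, can come close to identifying a single caller.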

See also this New York Daily News editorial:

Well, here we go again with the horrified screams from the crowd that’s inclined to believe the big bad government is peeping through every keyhole and recording every streetcorner chat about whether or not it looks like rain.

Revelations that the National Security Agency has been collecting a database of every telephone call in America – numbers dialed, that is, not conversations parsed – happen to come as British probers report that July’s London transit bombings might have been prevented if only security forces had been aware that one of the bombers regularly called Pakistan in the days before the blasts.

No, it’s no crime to call Pakistan. But when the call is part of a pattern that suggests a security risk, this is worth red-flagging and perhaps eavesdropping on – with a warrant and court supervision, as all right up to the commander in chief agree would be necessary.

Anyway, the idea that phone companies have been turning over raw logs to the NSA somehow doesn’t strike us as all that revelatory. Of course they have been, and they have been doing it legally. If the purpose is synthesizing data, then certainly the NSA would be keeping a database from which to synthesize. And where did you think the NSA was going to go to collect log data? …

See also this Washington Post story on the privacy buggabuzzword:

“I wish I could say I was bothered by it but I’m not,” said Jacques Domenge, a 28-year-old Potomac man who visited a Cingular Wireless store in Rockville yesterday to replace a stolen phone.

“If it’s only done to protect people and find patterns that help the government find terrorists — I don’t think it will work, by the way, but let’s say it will — then I am all for it,” he said, adding that he had no problems with Cingular — or any other phone company — turning over records.

According to a Washington Post-ABC News poll released yesterday, 63 percent of Americans said they found the NSA program to be an acceptable way to investigate terrorism, including 44 percent who strongly endorsed the effort. Another 35 percent said the program was unacceptable, including 24 percent who strongly objected to it.

“The value of fighting terrorism, in a lot of our research, seems to be more important to the public than what they perceive as violations of their privacy — so far,” said Frank Newport, editor in chief of the Gallup Poll and vice president of the Gallup Organization in Princeton, N.J.

Newport said views of the NSA program — which was disclosed on Thursday by USA Today — should be viewed in the broader context of Americans grappling with more and more of their personal data being collected and analyzed by businesses. “When we ask what’s the most important problem facing the country, we don’t see any signs that privacy is beginning to percolate up,” he said.

  • Jeff, there is nothing that this program can do that couldn’t be done under FISA oversight except for data-mining patterns in phone records to uncover previously unknown terrorists. The potential for abuse is HUGE and the pay-off is extremely sketchy.

  • This isn’t just “aggregate knowledge generated by our aggregate behavior.” This is a federal spy agency amassing a list of all the phone calls made from my home, and your home, and your parents’ home, and your neighbor’s home, etc. This is specific and personal information – the fact that they have collected millions and millions of such records doesn’t make it less of an individual invasion – quite the opposite. This isn’t a problem of access to some faceless “wisdom of the crowd” – this is about each and every one of us, individually, being impacted by a gov’t action.

  • rick gregory

    Actually, the data issue is interesting, but really not the central issue at all in this brouhaha.

    The real issue is whether the executive branch can do whatever it wants with no oversight, simply because the President wants to do it. If there are no limits, then the checks and balances are gone and we are left a few arbitrary steps from dictatorship. I know, that sounds like hyperbole, but if there are no limits, then why can’t the President order arrests of dissidents? Call out the military for domestic use (oh, hold it, he’s about to do that one).

    If there are limits, then we need to raise this issue loudly and debate what those limits are, what needs oversight and legal checks such as warrants – and what doesn’t.

  • Rick Gregory is correct.

    Besides the point that Jeff raises about whether the NSA project is a wise one, a second, different, question is whether it was an accountable one. Both tests have to be passed before it warrants our approval. The ABC/WaPo poll question was lazy because it acted as if the second question does not even exist.

    As for the New York Daily News, when it said this–“But when the call is part of a pattern that suggests a security risk, this is worth red-flagging and perhaps eavesdropping on – with a warrant and court supervision, as all right up to the commander in chief agree would be necessary.”–it seems to have wilfully ignored the very point in dispute, namely that the Commander in Chief had ordered the eavesdropping to proceed without the required warrant and court supervision.

    On the first question–whether the anomaly crunching of such a vast database is wise–a relevant consideration would be a cost/benefit analysis. There are trillions of calls being studied. Can any BuzzMachine reader with a knowledge of computer costs enlighten us about the order of magnitude in costs to run this model? Is it thousands or millions or billions? The answer is relevant. People might consider such a far-fetched scheme reasonable if it costs pennies; if it is a vast expensive permanent bureaucracy, then not so much.

  • Old Grouch

    It’s interesting to note how much our trust in government has diminished since World War II. Back then, information like this was probably provided to the FBI as a matter of course, and most people, if asked, would have said that they not only expected it to be, but also thought that doing so was the right thing to do. Different times and circumstances!

    We have to distinguish between “traffic” information– the phone-numbers-time-and-duration data the NSA got from the telcos– and “wiretap” (call content) information (which clearly requires court authorization when it involves purely domestic communication). The law concerning privacy of “traffic” data is much foggier. (For example, traffic data routinely passes between telcos, for billing purposes and to investigate fraud. Also recall the flap this past January when it was revealed that average citizens were buying cell phone traffic data from brokers and using it to stalk people.) While so-called “pen register” laws limit the use of this information by domestic law enforcement, I’m not enough of a lawyer to know what restrictions apply to espionage/terrorism investigations.

    To attempt to answer Andrew’s cost/benefit question, once you get the traffic data into machine-readable form (which it is already), finding the patterns is relatively easy. This post by Kim DuToit (who’s in favor of it) gives the procedure. Pull quote:

    [W]hen you establish that (357) 243-3006 belong[s] to Abdul El-Bomba, who received a call from his brother Aziz, a known member of Hezbollah in Syria, you now have the ability to focus only on all the calls Abdul made and received… That would be a couple hundred calls, out of the (literally) tens of billions of records you’ve collected. Most… will be innocent… [b]ut out of the couple hundred calls, you may find five that are [suspicious]. [I]t’s NOW when you, as the investigator, [] get a warrant for a wiretap so you can start listening to actual content…

    The initial traffic analysis can easily be done by computer (you could do it on your laptop, except there’s too much data). But consider that, at this stage, computer analysis means greater privacy for innocent bystanders than hand-correlation, because computers only pull out the relevant data, and don’t get idly curious when they run across records involving somebody they know.

    Of course, the concern is that, now that you’ve assembled all that data, it’s easy for someone who wants to “get” Andrew Tyndall to start with Andrew’s number and go on a fishing expedition. All it takes is a couple of keystrokes.

    Right now the bottleneck is listening to the wiretaps, once you have them (unless the NSA’s voice-analysis technology is a lot more effective than we know). Which means that the traffic analysis is a good thing, because it helps to concentrate our investigative resources on likely suspects.

    Ed Felten has begun a series of posts examining the technical and privacy issues of all this in light of Moore’s law/increased computing power. [First] [Second] Here’s his prediction:

    “Before too much longer, Moore’s Law will enable government to record every email and phone call it knows about, and to keep the recordings forever. The cost of storage will no longer be a factor. Indeed, if storage is free but analysts’ time is costly, then the cost-minimizing strategy is to record everything and sort it out later, rather than spending analyst time figuring out what to record.”

    Add even rudimentary speech-recognition, and that’s what’s scary to me, which is why this discussion is a good thing.

  • Andrew

    Your point about aggregate data – current law covers it, because anything that includes private information falls under privacy law. If it is truly “aggregate,” it doesn’t hold private information, so if they analyse it and uncover patterns, they wouldn’t be able to relate it to an individual or group. The examples you give of insurance companies, etc., are nonsensical because you choose to enter an agreement with an insurance company. While that choice might be almost entirely required, you still have a CHOICE. And that’s the point – if the government is able to act indiscriminately, how do you know that they won’t further their law enforcement process by arresting anyone that threatens to “kill” another? People have said that and not meant it, although supposedly “if you have nothing to hide, you have nothing to worry about.” What’s further concerning is that voicing your opinion against initiatives that take away more of one’s privacy is more and more touted as un-American. A government’s power is the voice of the people. If freedom of speech and expression is diluted or reduced by the government over a period of time, self-perpetuating the government’s own views, how does that not descend into communism?

  • goof

    You know, I’m pretty steamed. Every time I mail an envelope, I know that a government employee is determining to whom I mailed it. The government has really stolen my privacy.

  • cab

    Don’t muck up the issue. The main point of all this is that Bush is doing it without a warrant, which is illegal. You can argue all you want about whether spying is correct or not, but it doesn’t change the fact that the Pres. believes he can break the laws of our country. His one and only job is to protect the Constitution of the US. Every American should be up in arms over a Pres. putting himself above the law.

  • Don’t police worldwide regularly trawl ‘innocent’ license numbers in their databases when they are searching for a suspect car/person and they don’t have the complete license details from a witness to a crime? Or scan fingerprints to match up with those from a crime scene? Do they ask for a warrant then? This is so obviously not fraught with illegality that one wonders at the irresponsibility of the Democrats in trying to use it as part of their general attack on the president’s executive powers in national security matters. And this from people who so severely attacked President Bush for ‘not connecting the dots’ that could have prevented 9/11.

  • Uh Cab

    Do the police need a warrant to watch you when you leave your house and go to the market/the mall/a friend’s home/school/etc?

    No. They don’t.

    Understand that police work entails a lot of boring information gathering. From a pattern that citizen A has lots of different cars coming and going at all hours of the day/night and with the visitors only staying a few minutes a visit, the police – legally having said house under surveillance (no warrant needed) – THEN develop enough probable cause to seek a search warrant for the inside of the house.

    Geez, I’m beginning to think that some people believe suspects are identified by calling the Psychic Hotline.

  • how do you know that they won’t further their law enforcement process by arresting anyone that threatens to “kill” another.


  • RonP

    funny how no one seems to care about the intrusiveness of the IRS. each year you hand over a tremendous amount of personal financial information about your income, spending habits, etc. the IRS by virtual fiat can determine whether you are telling the truth or not and launch an investigation of your records and personal life with the power to seize your assets if they think you have violated the law. i guess i better not hold my breath waiting for the likes of the NYT or politicians like pelosi, schumer, et al to raise a stink about that. this is seinfeld politics – a show about nothing.

  • RonP

    the previous lengthy post suffers from a lack of understanding of digital vs. analog intercept technology. the larger issue is one of pre 9/11 thinking vs. post 9/11 thinking. the west is doomed. embrace the horror. this reminds me so much of the string quartet continuing to play while the titanic was sinking.

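Old Grouch's comment quotes a traffic-analysis procedure: start from one number of interest, then pull only the calls that touch it. That amounts to a neighborhood lookup over call records. Below is a minimal sketch. It assumes each record holds only traffic data (caller, callee, start time, duration) and no call content. The record layout, the function names, and the 555-01xx numbers are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class CallRecord(NamedTuple):
    """Traffic data only: who called whom, when, and for how long. No content."""
    caller: str
    callee: str
    start: str    # ISO-8601 timestamp kept as text for simplicity
    seconds: int


def build_index(records):
    """Index each record under both endpoints so a lookup avoids a full scan."""
    index = defaultdict(list)
    for r in records:
        index[r.caller].append(r)
        if r.callee != r.caller:
            index[r.callee].append(r)
    return index


def calls_touching(index, number):
    """Every call `number` made or received (the "couple hundred calls")."""
    return list(index.get(number, []))


def contacts_of(index, number):
    """Distinct numbers that `number` exchanged calls with."""
    return {r.callee if r.caller == number else r.caller
            for r in calls_touching(index, number)}


records = [
    CallRecord("555-0100", "555-0101", "2006-05-01T10:00", 120),
    CallRecord("555-0102", "555-0100", "2006-05-01T11:00", 60),
    CallRecord("555-0103", "555-0104", "2006-05-01T12:00", 30),
]
index = build_index(records)
print(sorted(contacts_of(index, "555-0100")))  # ['555-0101', '555-0102']
```

Only the records touching the starting number come back. That is the privacy argument for letting a computer do the first pass. But the index covers everyone, so the same lookups work just as well for any number someone decides to start from. That is the fishing-expedition worry raised in the same comment.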
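The cost question raised in the comments ("Is it thousands or millions or billions?") can be answered for raw storage with back-of-envelope arithmetic. The sketch below assumes about 100 bytes per call record. It brackets the record count between the "tens of billions" in the quoted DuToit post and the "trillions" mentioned in the same thread. Both figures are illustrative assumptions, not reported numbers. The estimate also ignores indexes, replication, and compression.

```python
def traffic_storage_bytes(num_records: int, bytes_per_record: int) -> int:
    """Raw storage for call-detail records, ignoring indexes and compression."""
    return num_records * bytes_per_record


def human(n_bytes: float) -> str:
    """Format a byte count with decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000:
            return f"{n_bytes:.0f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.0f} EB"


# Assumption: ~100 bytes per record (two numbers, a timestamp, a duration,
# plus overhead). Record counts are illustrative, not reported figures.
for records in (10**10, 10**12):
    print(f"{records:.0e} records -> {human(traffic_storage_bytes(records, 100))}")
# 1e+10 records -> 1 TB
# 1e+12 records -> 100 TB
```

On these assumptions, traffic records alone run from about a terabyte to about a hundred terabytes. Recording call content, as in Felten's prediction, would be far larger, because audio takes many more bytes per call than a single metadata record.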