Open Data

I’m at Seth Goldstein’s Open Data confab at the Reuters building. I love the mission on the wall: “Open data is to media what open source is to technology. Open data is an approach to content creation that explicitly recognizes the value of implicit user data. The internet is the first medium to give a voice to the attention that people pay to it. Successful open data companies listen for and amplify the rich data that their audiences produce.”

Katie Neiderhofer of BuzzMetrics is presenting and is asked about opening up their data (because, of course, in the end, it is our data). She doesn’t quite get it, talking about sharing data with a company. Who owns the wisdom of the crowd?

She shows a chart that associates words with the concept safety and groups them: children, life, police, work, home… Bush, president, American, administration… terrorism, Iraq, military, attacks… And she finds that the emotional words (dangerous, risk, fear, ensure) are associated with the personal words: children, life, etc. This is fascinating data that also becomes useful for associating words and concepts (and, I’d say, behind them, the sites and people that talk about them). She shows something called Floodgate with a live view of blog tag clusters; unfortunately, this, too, is closed.

I ask whether they have tied together the work DataMining blogger Matt Hurst did when he was at BuzzMetrics, mapping the social (linking) associations of bloggers, with what she shows: the mapping of topics. In other words, have advertisers come to them to find, for example, the most influential food bloggers? Yes, she says. So, Seth says, this becomes a “media planning tool for social media.” But there is also discussion about this being closed. If there is an influence metric, who owns that? I would benefit by knowing that I am an influential food blogger, and if I am not given that information, I might stop the closed network from exploiting me or I might join an open, competitive network. See: The open-source ad network.

There is much discussion about the sale of our aggregate and/or anonymous behavioral data and issues of both privacy and PR.

Sanjiv Das from Morgan Stanley is about to explain algorithms. He says that one cannot disrupt markets but must anticipate them (hello, Viacom). He says that data will become commoditized but organization will be proprietary. Amen.

Barak Pridor of ClearForest presents text analysis. For example, he shows search results that occur only in documents that meet some test. I ask whether he could give us items that have the tag X but only if they also have the tag Y. This would be extremely valuable for such things as Outside.in and Edgeio (e.g., show me posts tagged ‘mexican’ but only if they’re also tagged ‘restaurant’ and ‘new york’). I’m dying for that kind of multilayer search and analysis. It enables so much more.
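
To make that tag-X-only-if-also-tag-Y request concrete, here is a minimal sketch in Python. It is my own illustration, not anything ClearForest, Outside.in, or Edgeio showed: the sample posts, the `POSTS` list, and the `posts_with_all_tags` function are all made up, and it assumes each post simply carries a set of tags.

```python
# Hypothetical sample data: titles and tags are invented for illustration.
POSTS = [
    {"title": "Best tacos in the East Village",
     "tags": {"mexican", "restaurant", "new york"}},
    {"title": "Mexican election roundup",
     "tags": {"mexican", "politics"}},
    {"title": "New bistro opens in SoHo",
     "tags": {"french", "restaurant", "new york"}},
    {"title": "Taqueria review, Austin",
     "tags": {"mexican", "restaurant", "austin"}},
]


def posts_with_all_tags(posts, required_tags):
    """Return the posts that carry every one of the required tags.

    Tags are compared case-insensitively. A post matches only if the
    required tags are a subset of its own tags.
    """
    required = {tag.lower() for tag in required_tags}
    return [
        post for post in posts
        if required <= {tag.lower() for tag in post["tags"]}
    ]


if __name__ == "__main__":
    # 'mexican', but only if also 'restaurant' and 'new york'.
    for post in posts_with_all_tags(POSTS, ["mexican", "restaurant", "new york"]):
        print(post["title"])
```

The point of the subset test is that each added tag narrows the result rather than widening it, which is what separates this from the usual "anything tagged mexican" listing.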