Posts about open-source

Google Base v. microformats

I’ve read the little background material on Google’s Base and still can’t see whether the material you put there can be found by other search engines. I also cannot find evidence of an API that shares any standards for tags and structure. Is Base open or closed? So far, closed.

What we need instead is a means of letting you tag and structure your data so it can be found reliably by any search engine no matter where it is on the internet. That would stay true to the distributed internet Google has so masterfully exploited.

I wish I were hearing more noise from the microformats guys to act as competitors — or at least as pressure on Google for openness and standards.

Imagine if you could go to a page that lets you put in your resume or house ad or job ad and it spits out tagged XML you could put on the web anywhere to be found by anyone.

Or imagine putting tags on restaurant reviews you post on your blog so anyone could aggregate or search for, say, all the cuisine=mexican restaurants in location=chicago. Well, you don’t really have to imagine that. If you agreed on the tags, you could start doing that today via Del.icio.us and Technorati.
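
To make the cuisine=mexican, location=chicago idea concrete, here is a minimal Python sketch of what an aggregator could do once people agree on the tags. Everything in it is an assumption for illustration: the example.com URLs, the shape of the post records, and the find_posts helper are all hypothetical, and it says nothing about how Del.icio.us or Technorati actually store or query tags.

```python
# Hypothetical data: each post, wherever it lives, carries key=value tags
# chosen by its author from an agreed vocabulary.
posts = [
    {"url": "http://example.com/blog-a/taqueria-review",
     "tags": {"cuisine": "mexican", "location": "chicago"}},
    {"url": "http://example.com/blog-b/sushi-night",
     "tags": {"cuisine": "japanese", "location": "chicago"}},
    {"url": "http://example.com/blog-c/tacos",
     "tags": {"cuisine": "mexican", "location": "chicago"}},
    {"url": "http://example.com/blog-d/burritos",
     "tags": {"cuisine": "mexican", "location": "austin"}},
]


def find_posts(posts, **wanted):
    """Return the URLs of posts whose tags match every wanted key=value pair."""
    return [
        post["url"]
        for post in posts
        if all(post["tags"].get(key) == value for key, value in wanted.items())
    ]


# Any aggregator, not just one company's, could run the same query.
matches = find_posts(posts, cuisine="mexican", location="chicago")
print(matches)
```

The point of the sketch is that the query needs nothing from a central service, only the shared tag names.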

And imagine if you could go to Google or other services — e.g., Indeed and SimplyHired for jobs or Baristanet for three Jersey towns — and see the tags they use so you can swarm around those tags and find and be found. That’s the openness we need. If Google spearheads that with a truly open API that can be adapted by the community, then great. That is our distributed marketplace. But if not, then Google is only trying to recreate the centralized marketplaces of old — otherwise known as newspapers. That worked for newspapers when they had monopolies. They don’t anymore. Does Google think it has a monopoly?

: Mark Pincus hopes Google is not trying to recreate Walmart. It’s a heartfelt, practically tear-wrenching ode to what Google coulda shoulda been:

my other big question is whether google is opening this service to the same crawling it has benefitted from to the tune of $108 billion? …

my take is google has chosen between two paths. one which i thought they were on was to be a platform to enable great things on the web. google could have powered everything with its search engine, ad infrastructure, massive crawling and computing power. it could have been a democratizing force, enabling small services to flourish in being found and in serving them a platform on which to innovate.

instead google has chosen to be merely another big corporate titan. like microsoft, it’s choosing to go for the gold, enriching their shareholders rather than enabling industries….

like msft, google is now going after every other oppty around it, taking advantage of its trojan horse position. suddenly every company is at risk. companies as far away as walmart have to have a ‘google strategy’. today, vc’s ask every new startup how they will compete with google. (at least we dont have to answer the msft question any more.) …

in fact, google feels a like walmart today. once the excitement over trying out their latest release wears off we are left with the realization that they are going to ultimately put the corner grocer (being craigslist) out of business, and suck value out of an economy not add back. …

one last thought to all those ‘web 2.0’ers’ listening. WHEN ARE WE ALL GOING TO WAKE UP AND REALIZE THAT NONE OF US COMPETE WITH EACH OTHER? WE ALL COMPETE WITH GOOGLE, MSFT AND YAHOO. the only chance we have of enabling an independent industry is to come together, leverage s resources, create and protect a level playing field. otherwise, we are all in the business of creating great products in the hope we can sell to them before they build it. how fucking boring is that?

Right. That is precisely why some of us are working on figuring out open ad marketplaces and why I wish the microformats guys were getting more ink pixels.

The answer to any monopoly — water to wicked witches everywhere — is openness.

: I’ve been meaning to link to this PC4Media post on microformats for months; now I have the excuse and the memory to do it:

MicroFormats Enable Distributed Applications!

Exactly. Microformats could be as big an innovation as databases were.

If databases let us store information. Microformats let us access the world’s databases. Potentially!

Yes, APIs do this too. But, microformats make the database (or data store) distributed. Not controlled by one entity.

This could be as big as “http”.

If you don’t get how microformats can change your business, prepare to be outdone.

: See also Fred Wilson on base.

: And see Umair Haque:

There’s only one question that matters, strategically: is Base the AOL-style walled garden of the 00s?

That is, are returns to info owned by Google going to be lower than decentralized info? …

What that means is that Google keeps indexing the world’s information, albeit at increasingly costly factor prices; while superior returns begin flowing to reconstructors and smart aggregators. This scenario devalues centralized mechanisms/walled gardens, like Base – because they’re not part of the attention ecosystem; they’re part of GoogleWorld (we really do need a name for all the info Google owns)….

But I think what it does do is begin to point to a growing vital point competitors can strike….

Then there’s Amazon, eBay, VCs, and media – all attention economy players, who seem totally intent on missing the tectonic shifts right under their feet, which are eroding all their returns.

The key question for any company today is: How do you play in the distributed world? How do you stop the 1.0 insistence of having to control and own and how do you instead make money by enabling others? That was where Google’s own gigantic growth was. But sometimes it’s hardest to learn the lessons you yourself teach.

Umair adds:

Another, marginally related point – it also points to the uncooling of Google. I mean, Base? Can you get more Orwellian, lame, sinister, connected to all the wrong stuff?

EG: Al Qaeda means “the Base”.

See also: base instincts.

: SEE ALSO: The comments. Good notes there from ROR and SimplyHired.

Measure this

The means of measurement used by advertisers for every other medium — newspaper, magazine, radio, TV, and now online — will not — will never — work in the world of citizens and distributed media. That is why we must create our own measurement standards.

To get apples-to-apples numbers for those other, older, major media, advertisers rely on allegedly representative samples.

But you can never get a sample big enough to deal with the mass of niches.

Hell, the samples aren’t big enough to deal with local online newspaper sites. The Online Publishers Association just released a study that found various means of measuring those sites disagree drastically:

Differences arose between the two primary methodologies, surveys and panels….

The paper analyzes data from five services. Firms conducting panel research include comScore Media Metrix and Nielsen//NetRatings-MegaView Local. Firms measuring local audience through a combination of online, phone and postal mail surveys include Nielsen//NetRatings @Plan (online); Scarborough Research (phone and mail); and The Media Audit (phone).

When data from the two forms of collection were analyzed, survey-based methodologies, on average, reported 70 percent higher visitor numbers than panel-based research.

An example cited in the study looked at the number of visitors to LATimes.com. Visitor data differed by one million between two services.

When I was at Advance, we found that these sampling methodologies would find no audience in some markets that we knew from our server stats were actually much bigger than other markets they did measure. They simply didn’t have enough people in Alabama.

Well, they’ll never get enough knitters to measure the knitting bloggers. They can measure a few of the biggest bloggers. But that’s not what this medium is all about. It is about, as someone said at my Web 2.0 ad panel, the “big butt” attached to the fabled long tail of passionate niches that add up to a mass far bigger than the biggest bloggers. So we need to be able to add them up.

This is why it is doubly important for us in this world to create and use our own means of measurement. I’m talking with some folks who are better at getting things done than I am — and working with Burst Media’s coincidentally named Jarvis Coffin to set up a trade group — to work on open-source collection and reporting.

This isn’t just about collecting and verifying audience and pageview numbers — and demographics and behavior — though all that is important.

This is also about collecting data that can be collected only in this medium of the people and gives us unique value: authority, influence, conversation-starting, relationships, loyalty, engagement.

And this is about additional data that cuts across sites — from the likes of Technorati, Icerocket, Blogpulse — and how all this data will be munged together by various parties doing their own analytics.

So, in the end, when an advertiser wants to reach top food influencers they’ll be able to do so through influential food bloggers … and those bloggers will be able to recognize their value as well.

But it won’t happen through the survey or panel research that has become advertisers’ crutch.

The value of networks of trust

The most valuable and necessary networks of the next economy will be built around trust.

I just had lunch with my VC friend Ed Sim and I was boring him with my view of the future of advertising. The days of one-stop shopping for a mass of “consumers” will soon be over and advertisers will be faced with the opportunity and challenge of putting together smaller, more targeted, more efficient networks: the mass market replaced by the mass of niches. The opportunity is greater value. But the challenge is far greater effort and cost: It’s not going to be easy to put together and manage these small and ad hoc networks.

So I have been arguing that a good way to do this — once the infrastructure is in place — is to rely on human networks of trust: The advertiser or its agency can’t go and find and manage every damned little site (aka audience aggregator, aka community), so they choose a starting point: They trust me and my site (let’s say it’s a big-media site with a sales staff), and I trust you and your site (let’s say you’re a popular blogger), and we trust perhaps one more degree of separation out (let’s say those are your friends who write about the same things in more specialized but related ways). But if your friend messes up and you don’t fix it, then I don’t trust you anymore and I’ll find a new friend to trust — or else the advertiser won’t trust me anymore.

This way, we get to scale while distributing the work and the benefit with the trust. So in the end, the advertisers benefit by putting together the best networks at the lowest cost and effort and risk. And the participants of the networks benefit by attaching themselves, like atoms to molecules, to the highest value buys. (Oh, how I wish this blog had a whiteboard.)

We need some such way to operate in the age when small is the new big.

Then Ed and I were talking about similar challenges for investors and entrepreneurs in the small-is-the-new-big age: Today, it’s much, much easier to start a new company on far, far less capital than it used to be. But this also means that it’s easier for someone else to start a competitor. So speed is more important than ever: You have to develop your business as quickly and nimbly as possible to build your product and then perfect it after it’s out so you quickly establish your value. This means that the VCs need to be able to act just as nimbly to invest as quickly as possible. The good news is that the investments are smaller and the risk is thus less. But the bad news, of course, is that it costs more effort and attention to manage many more smaller investments and it’s hard to act quickly at scale. Early bird, worm, and all that.

So I wonder whether a network of trust is a solution here, too: The VC with the money trusts you to bring in deals and you trust someone else to bring in more deals and whoever brings in the most value gains the most value and grows biggest fastest. The work, risk, and benefit are all distributed.

In a way, I wonder whether that’s what VCs are doing by blogging: They’re going open-source, sort of, to state their interests and bring in more of the right deals more efficiently. But it’s still not efficient enough for a world of companies that need six figures instead of eight to succeed. And there needs to be a means to share benefit with the trust.

I think it can work in news, too: If I trust Sally’s reports on my school board more than Joe, I’ll send traffic her way and she’ll make more money on advertising from the newspaper (see Pincus’ world, below) and maybe she’ll send traffic my way for my reports on the town council if she trusts mine, too.

Where else?

: All of this is my clumsy, imprecise, philosophy-not-math-major’s re-expression of the discussion about Reed’s Law vs. Metcalfe’s Law vs. Oren’s Doubts vs. Evslin’s Postulate, none of which I understand above a kindergarten level. I was just trying to get my head around Reed’s Law, which Evslin explained to me on a napkin, when suddenly he and Wilson and Oren are abandoning it. (Cue Tom Lehrer’s New Math.)

I’ll try to summarize this badly: Metcalfe’s Law says the value of a network increases as more nodes are added to it (i.e., one fax machine is worthless, two fax machines are each worth a lot more, a large network of fax machines is truly valuable). Reed says (I think) that if a network includes social sub-groups, it grows exponentially faster. All the wise gentlemen listed above are now debating whether the math works and I leave that to them.
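
For the fellow liberal arts majors: the two laws are usually stated as counting problems. Metcalfe counts the possible pairs among n members, n(n-1)/2, which grows roughly as n squared. Reed counts the possible sub-groups of two or more members, 2^n - n - 1. Here is a small Python sketch of just that arithmetic. The function names are mine, and the counts measure potential connections only, not anyone's actual value, which is precisely what the gentlemen above are debating.

```python
def metcalfe_pairs(n):
    """Possible one-to-one connections among n members: n(n-1)/2."""
    return n * (n - 1) // 2


def reed_groups(n):
    """Possible sub-groups of two or more members: 2^n - n - 1.

    That is every subset of the n members (2^n), minus the n
    single-member subsets and the one empty subset.
    """
    return 2 ** n - n - 1


# Print the two counts side by side for a few network sizes.
for n in (2, 5, 10, 20):
    print(n, metcalfe_pairs(n), reed_groups(n))
```

With two members the counts are the same (one pair, one group). By twenty members there are 190 possible pairs and more than a million possible groups, which is the whole of Reed's point as I understand it.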

But to me, the humanities guy with the damned liberal arts degree, it’s obvious: A network built on trust is clearly more valuable than a network built on technology.

Repeat after me, after Butterfield, after Mayfield, after Soylent Green: Web 2.0 — It’s made of people. It’s not about controlling scarce assets in a post-scarcity world. It’s about trust.

And it’s hard to chart trust. It’s hard to give it a metric. It’s hard to give it a market value. But it’s damned easy to lose.

: UPDATE: Here’s what Ed took from lunch (besides the check…).

And here’s Tim O’Reilly on both.

We take over the zoo

Bob Garfield writes another magnum opus for Ad Age. The last was on his chaos scenario for advertising. This is on the open-source revolution. Great lead:

Hear that?

In the distance? It’s a crowd forming — a crowd of what you used to call your “audience.” They’re still an audience, but they aren’t necessarily listening to you. They’re listening to each other talk about you. And they’re using your products, your brand names, your iconography, your slogans, your trademarks, your designs, your goodwill, all of it as if it belonged to them — which, in a way, it all does, because, after all, haven’t you spent decades, and trillions, to convince them of just that?

Congratulations. It worked. The Great Consumer Society believes deeply that it has a proprietary stake in you. And like stakeholders everywhere, they are letting their voices be heard.

Why? Because the information society is reversing flow. What began as an experiment among a few software nerds has, thanks to the Internet, expanded into other disciplines, notably media and law. But it won’t stop there. Advertising. Branding. Distribution. Consumer research. Product development. Manufacturing. They will all be turned upside down as the despotism of the executive suite gives way to the will, and wisdom, of the masses in a new commercial and cultural epoch, namely: The Open Source Revolution.

Here’s the Ad Age link, though that won’t work without blood tests and security clearances. Don’t tell anybody, but a blogger put the piece up here. Open-source revolution, indeed.

Web 2.0: Launchpad

13 companies in 90 minutes.

Zimbra: An open-source collaboration suite. Lots of Ajax. Everything is Ajax. He’s getting lots of awws from the crowd for allowing you to see where an appointment is or what you have on a date without having to leave the email. Ajax and Google map mashup and Skype mashup. Can’t lose, eh?

Never mind my Ajax gags. This really looks wonderful: very smart use of interface to let you get around your data (show me just the emails from the guy between these dates that have this kind of attachment; show me a FedEx tracking number and go ahead and get the status dynamically, and so on). In six minutes, it looks like a winner. Best of the bunch. Everybody in the audience wanted it.

Flock: A social browser. The web is not just content or shopping but a stream of events among people, they say. So they built an open-source browser on top of Mozilla; the first, alpha release comes in a few weeks.

It combines favorites and RSS feeds: you click a star on the address bar and it’s a bookmark and you’ve subscribed if there’s a feed. With a story on the page, you can take content and drag it onto a “shelf” (the demo devil is bedeviling them). There’s also a “blogging top bar” within the browser — important for bloggers — that allows you to open a blogging editor and drag content from a page onto your post. Very nice.

Zvents: “Takes the search approach to events.” It’s live for the Bay Area. They’re trying to do deals with old-style local publishers, which is smart, since local sites tend to suck at this. They have what-where-when searches that deliver into maps, lists, and calendars. And the lists are exportable to your blog; it’s distributed.

Socialtext: Ross Mayfield says that Socialtext, the first wiki company, will go open-source. It’s coming full-circle: Wikis came from open-source and now a wiki company goes open-source. He says that wikis are happening inside companies at larger scales than before; organizations are sharing information. “Now we’re giving it all away.” Marc Canter screams: “Awwright.”

Wikiwyg.net, the wysiwyg open-source for wikis, is the first step (I think it’s quite neat). Then they add SyncroEdit.com: real-time synchronous editing for the web. Now add in Atom and microformats for offline editing.

Rollyo: Dave Pell, big blogger: “This is going to be the shortest nonsexual performance of my life.”

He shows Rollyo: roll your own search engine. I’m on the beta list: you add a list of the sites you want to search on a regular basis. You can also get people to come to your personal search engines. And you can explore others’ search rolls.

Orb: Shows you all your content from home on any web-connected device anywhere. Works only on PC now; Mac by the end of the year. Very nice.

Wink: Combines search with user interactivity: “people-powered search.” (Well, in a sense, Google is just that, eh?) You can tag search and add that into tags on Delicious et al. They say this means it’s spam free (if tags don’t get spammed, I suppose).

Joyent: A network suite of applications with email, calendar, contacts, files and binders. The data is tagged and smart filtered and can be turned into RSS feeds. The data is open and transportable. It’s focused on small groups of 2-20 people. So, for example, you can overlay other people’s calendars onto your own. So far, I shrug.

Bunchball: It tries to solve the “social application gap” and the “replication of reality.” Didn’t know I had those problems. He’s saying that entering into new social applications is hard because there’s an investment. It’s a platform for starting social applications. I suspect this is a bad-timing award against the announcement this week of Marc Andreessen’s Ning.

RealTravel: “Real travel. Real advice. Real experiences.” It enables people to put up travel journals and ratings. Not sure what’s different from TripAdvisor, which is already huge.

Knownow: It’s a Kleiner-funded company that’s about dynamic distribution of content. I don’t know what that means yet. It’s a notification service using RSS. I frankly don’t get it.

AllPeers: A web development platform based on Firefox.

Structured Blogging: From the PubSub guys comes a plug-in to WordPress that gets people to publish structured data. It basically adds prepopulated tags — not loose-form — to get people to add the fact that this is a restaurant review, for example. Wish it would work; we’ll see whether it does. I think the key is that people will do this if it helps their stuff be discovered — e.g., to get a restaurant review on your blog aggregated with all your neighbors’ restaurant reviews.

: A slicker version of this report over at Lifehacker, where I’m flattered to be reporting.

Web 2.0: Tagging

At Web 2.0 for the tagging session: SRO.

Tony Stubblebine of O’Reilly says they are the first customer to use Del.icio.us data to find out more about their content. That is precisely the right use of Del.icio.us for media/content sites.

Josh Schachter, founder of Del.icio.us, says he sort of started the tagging thing when he called tags tags instead of keywords.

The first question: “I use Del.icio.us, but I’m not sure I get it.” Familiar applause from everybody in the audience. Fred Wilson, who invested in the company, has said that he didn’t get it either until I sent him a feed of somebody’s tags of media stories. Then he bought into the razor company.

I confess that I now get Del.icio.us but I don’t get how to tag well because you can tag just for yourself or for the world or to find stuff, you can tag micro or tag macro. Caterina Fake says: “Isn’t it because we’re overthinking it?” Josh says it is split up by use or intent: tagging for others (Technorati) or for yourself (Delicious) or a combination (Flickr). Jeff Veen says that’s not quite right; he uses Delicious [I'm giving up on the damned dots] as a publishing tool.

We’re at that cusp of geekcool to peoplecool; the world will make sense of it. I told Josh before the session that Delicious should go mainstream now and take down the velvet rope, as a VC described the hard-to-grok UI of the service. Josh said there is no intention to have a velvet rope. It’s a geek rope. And they’ll change it.

There’s now a research lab at Yahoo and Berkeley Research Labs working on automatic tagging. Josh says Ojos (he thinks) is working on tagging via face recognition.

Someone says that a key benefit of tagging vs. metakeywords on web pages is that they are visible and you can see whether they are credible and not spam and manipulation. Similarly, Google chose not to use metakeywords but instead gave weight to the words inside a hyperlink and that’s better because it’s visible, not invisible. So we find out what the world thinks content is about instead of what the author thinks it is about.

It’s not just tags, then. When you link to something and describe it in that link (which means you should pick your link words carefully) you create data about the meaning of that to which you link. Ditto tags. That’s transparent. And anybody can do it.
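
Here is a rough Python sketch of that idea: read other people's pages, and for each link record the words they chose to describe the thing they linked to. It uses only the standard library's html.parser. The class name, the sample pages, and the example.com address are made up for illustration, and this is in no way a description of how Google or anyone else actually weighs link text.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect, for each linked URL, the words other pages used to describe it."""

    def __init__(self):
        super().__init__()
        self.labels = defaultdict(list)  # href -> list of link texts
        self._href = None                # href of the <a> we are inside, if any
        self._text = []                  # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the pieces and collapse runs of whitespace.
            label = " ".join("".join(self._text).split())
            if label:
                self.labels[self._href].append(label)
            self._href = None


# Two hypothetical blog posts linking to the same (made-up) restaurant page.
pages = [
    '<p>Try the <a href="http://example.com/taqueria">best tacos in Chicago</a>.</p>',
    '<p><a href="http://example.com/taqueria">great Mexican food</a> near the Loop</p>',
]

collector = AnchorTextCollector()
for html in pages:
    collector.feed(html)

labels = dict(collector.labels)
print(labels)
```

The restaurant never said a word about itself here; everything we know about the page comes from what the linkers wrote, in plain view.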

Caterina talks about a new metric Flickr uses: interestingness, which tries to capture how much people have seen, tagged, linked to something. And she says you can pivot that around a person or a social group: What interests them? Add that to the metrics we as an unmedium need to capture and deliver: Where’s the good stuff? That’s where we want to be (and advertisers, too).

Someone asks about using tagging in a closed corporate environment. Wisely, the group tends to shy away from the enterprise trap. Josh says it’d be interesting for a company to find the people who find good stuff first. O’Reilly says that’s the customers.

We see Consumating.com, where people tag themselves.

Esther asks about time and the decay of popularity. Josh says that Delicious cares about the vector: It’s not interesting that 10,000 people tagged “google” but that this tag is hot now; Caterina says the same for the hot tags on Flickr. She says Yahoo research labs will have something on this later.
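
One way to picture "the vector" is to rank tags by how fast they are growing rather than by how big they already are. The Python sketch below compares two time windows. It is my guess at the general idea, not Delicious's or Flickr's formula: the hot_tags function, the +1 smoothing, the min_recent cutoff, and all the numbers are assumptions for illustration.

```python
from collections import Counter


def hot_tags(previous, recent, min_recent=5):
    """Rank tags by growth from the previous window to the recent one.

    previous, recent: mappings of tag -> number of times it was applied
    in each window. Tags with fewer than min_recent recent uses are
    ignored so that one-off tags do not look "hot".
    """
    scores = {}
    for tag, now in recent.items():
        if now < min_recent:
            continue
        before = previous.get(tag, 0)
        # Relative growth; the +1 avoids dividing by zero for brand-new tags.
        scores[tag] = (now - before) / (before + 1)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up counts: "google" is huge but flat, "flock" is small but surging.
previous = Counter({"google": 10000, "web2.0": 40, "flock": 2})
recent = Counter({"google": 10100, "web2.0": 120, "flock": 60})

ranking = hot_tags(previous, recent)
print(ranking)
```

By raw totals "google" wins every time. By direction of travel it comes last, which is the distinction Josh was drawing.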

Beyond porkbusters: Paramedia

I like Porkbusters (and I’m about to hear Glenn Reynolds plug the movement on Reliable Sources). It was born the way things are online: a sudden need, a sudden inspiration clicks with a critical mass and movement moves. This is a great example of our distributed world swarming together to accomplish something. Remember: The internet isn’t a medium. It is a means.

So how could the Porkbuster example be extended? At the MT&R fest the other day, Jay Rosen lauded the similar example of Josh Marshall having bloggers uncover the secret vote on the DeLay rule — a movement of the moment much like Porkbusters. Then Jay said he wanted to come up with another idea:

There wasn’t time for me to explain my suggestion for a next big project in open source journalism– a blog-organized, red-blue, 50-state coalition of citizen volunteers who would read and attempt to decipher every word of every bill Congress votes on and passes next year.

Or, in the vein of Porkbusters, start with the budget and create the wiki-annotated view of federal spending.

All it takes is a leader to push the notion the first time and then a lot of people agreeing and willing to pitch in… and maybe a tag or a microformat to help it come together.

This is the smart mob as a new newsroom. Not the new newsroom, mind you: another new newsroom.

On the way into Manhattan this morning, I listened to Mitch Ratcliffe’s podcast version of this post, in which he argues that we are witnessing the growth of “paramedia.” This is parajournalism.

The exploding classroom

Will Richardson, one of the most forward-thinking educators I know, has been insisting that open-source sharing will come to education. That and this story on CNet made me check into Wikibooks, Jimmy Wales’ effort to revolutionize textbooks, and even though it’s only beginning, it’s already an amazing collection. Of course, I can’t vouch for the quality, having neither read them nor knowing nearly enough. But there can be little doubt that capturing the wisdom of the wisest crowds, freeing it from its ivy bonds, will create amazing resources. I only wish there were a text for journalism.