Google Base v. microformats

I’ve read what little background material there is on Google Base and still can’t tell whether the material you put there can be found by other search engines. I also cannot find evidence of an API that shares any standards for tags and structure. Is Base open or closed? So far, closed.

What we need instead is a means of letting you tag and structure your data so it can be found reliably by any search engine no matter where it is on the internet. That would stay true to the distributed internet Google has so masterfully exploited.

I wish I were hearing more noise from the microformats guys to act as competitors — or at least as pressure on Google for openness and standards.

Imagine if you could go to a page that lets you put in your resume or house ad or job ad and it spits out tagged XML you could put on the web anywhere to be found by anyone.
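Such a page wouldn’t need to be complicated. Here is a minimal Python sketch of its core step: turning a filled-in form into a tagged XML fragment. The element names (`job`, `title`, `location`, `type`) and the function `listing_to_xml` are hypothetical. There is no agreed vocabulary yet, which is exactly the problem, and this is not Google Base’s format.

```python
import xml.etree.ElementTree as ET


def listing_to_xml(kind: str, fields: dict[str, str]) -> str:
    """Turn a simple form submission into a tagged XML fragment.

    Element names come straight from the field names. The vocabulary is
    hypothetical, standing in for whatever tags a community agrees on.
    """
    root = ET.Element(kind)
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value  # ElementTree escapes &, < and > for us
    return ET.tostring(root, encoding="unicode")


print(listing_to_xml("job", {"title": "Copy editor", "location": "Chicago", "type": "permanent"}))
# <job><title>Copy editor</title><location>Chicago</location><type>permanent</type></job>
```

The output is plain XML that could sit on any web page, and any crawler that knows the tags could read it back.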

Or imagine putting tags on restaurant reviews you post on your blog so anyone could aggregate or search for, say, all the cuisine=mexican restaurants in location=chicago. Well, you don’t really have to imagine that. If you agreed on the tags, you could start doing that today via and Technorati.
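Once people agree on key=value tags, the aggregation side is almost trivial. Here is a minimal Python sketch, assuming the tagged posts have already been collected from wherever they live. It does not call any real tagging or search service, and the URLs, tags and helper names (`parse_tags`, `matches`) are made up for illustration.

```python
def parse_tags(tags):
    """Turn tags like 'cuisine=mexican' into a dict, normalizing case and spaces.

    Tags without '=' are ignored; a repeated key keeps its last value.
    """
    attrs = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep:
            attrs[key.strip().lower()] = value.strip().lower()
    return attrs


def matches(tags, **query):
    """True if the post carries every key=value pair in the query (an AND)."""
    attrs = parse_tags(tags)
    return all(attrs.get(k.lower()) == v.lower() for k, v in query.items())


# Hypothetical posts gathered from three different blogs.
posts = [
    ("http://blog-a.example/tacos", ["cuisine=mexican", "location=chicago"]),
    ("http://blog-b.example/pho", ["cuisine=vietnamese", "location=chicago"]),
    ("http://blog-c.example/mole", ["cuisine=Mexican", "location=oakland"]),
]

hits = [url for url, tags in posts if matches(tags, cuisine="mexican", location="chicago")]
print(hits)  # ['http://blog-a.example/tacos']
```

The query is the easy part. The hard part is getting everyone to use the same keys.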

And imagine if you could go to Google or other services — e.g., Indeed and SimplyHired for jobs or Baristanet for three Jersey towns — and see the tags they use so you can swarm around those tags and find and be found. That’s the openness we need. If Google spearheads that with a truly open API that can be adapted by the community, then great. That is our distributed marketplace. But if not, then Google is only trying to recreate the centralized marketplaces of old — otherwise known as newspapers. That worked for newspapers when they had monopolies. They don’t anymore. Does Google think it has a monopoly?

Mark Pincus hopes Google is not trying to recreate Walmart. It’s a heartfelt, practically tear-wrenching ode to what Google coulda shoulda been:

my other big question is whether google is opening this service to the same crawling it has benefitted from to the tune of $108 billion? …

my take is google has chosen between two paths. one which i thought they were on was to be a platform to enable great things on the web. google could have powered everything with its search engine, ad infrastructure, massive crawling and computing power. it could have been a democratizing force, enabling small services to flourish in being found and in serving them a platform on which to innovate.

instead google has chosen to be merely another big corporate titan. like microsoft, it’s choosing to go for the gold, enriching their shareholders rather than enabling industries….

like msft, google is now going after every other oppty around it, taking advantage of its trojan horse position. suddenly every company is at risk. companies as far away as walmart have to have a ‘google strategy’. today, vc’s ask every new startup how they will compete with google. (at least we dont have to answer the msft question any more.) …

in fact, google feels like walmart today. once the excitement over trying out their latest release wears off we are left with the realization that they are going to ultimately put the corner grocer (being craigslist) out of business, and suck value out of an economy not add back. …

one last thought to all those ‘web 2.0’ers’ listening. WHEN ARE WE ALL GOING TO WAKE UP AND REALIZE THAT NONE OF US COMPETE WITH EACH OTHER? WE ALL COMPETE WITH GOOGLE, MSFT AND YAHOO. the only chance we have of enabling an independent industry is to come together, leverage s resources, create and protect a level playing field. otherwise, we are all in the business of creating great products in the hope we can sell to them before they build it. how fucking boring is that?

Right. That is precisely why some of us are working on figuring out open ad marketplaces and why I wish the microformats guys were getting more ink pixels.

The answer to any monopoly — water to wicked witches everywhere — is openness.

I’ve been meaning to link to this PC4Media post on microformats for months; now I have the excuse and the memory to do it:

MicroFormats Enable Distributed Applications!

Exactly. Microformats could be as big an innovation as databases were.

If databases let us store information, microformats let us access the world’s databases. Potentially!

Yes, APIs do this too. But microformats make the database (or data store) distributed. Not controlled by one entity.

This could be as big as “http”.

If you don’t get how microformats can change your business, prepare to be outdone.
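To see what a “distributed database” means in practice, take hCard, the microformat for people and organizations, which marks up ordinary HTML with class names such as `vcard`, `fn`, `org` and `tel`. The sketch below uses only Python’s standard `html.parser` to pull those records out of pages that could be hosted anywhere and pool them. It is a hypothetical, deliberately tiny parser (`HCardParser` and `extract_cards` are illustrative names) that handles a small subset of hCard and assumes tidy markup. It is not a conforming implementation, and the sample pages are invented.

```python
from html.parser import HTMLParser

PROPS = {"fn", "org", "tel"}  # a small subset of hCard property class names
VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag


class HCardParser(HTMLParser):
    """Collect simple hCard records (elements with class="vcard") from one page.

    Minimal on purpose: no nested vcards, no nested properties, and it
    assumes every non-void element is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.cards = []   # finished records
        self.stack = []   # one (opens_card, prop_or_None) entry per open element
        self.card = None  # record currently being filled in
        self.prop = None  # property currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        opens_card = self.card is None and "vcard" in classes
        if opens_card:
            self.card = {}
        prop = None
        if self.card is not None and self.prop is None and classes & PROPS:
            prop = min(classes & PROPS)  # pick one property deterministically
            self.prop = prop
            self.card.setdefault(prop, "")
        self.stack.append((opens_card, prop))

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        opens_card, prop = self.stack.pop()
        if prop is not None:
            self.prop = None
        if opens_card:
            # Collapse runs of whitespace in each collected value.
            self.cards.append({k: " ".join(v.split()) for k, v in self.card.items()})
            self.card = None

    def handle_data(self, data):
        if self.card is not None and self.prop is not None:
            self.card[self.prop] += data


def extract_cards(pages):
    """Pool the hCards found in many independently hosted HTML pages."""
    cards = []
    for html in pages:
        parser = HCardParser()
        parser.feed(html)
        parser.close()
        cards.extend(parser.cards)
    return cards


page_a = '<div class="vcard"><span class="fn">Jane Doe</span>, <span class="org">Example Diner</span></div>'
page_b = '<p class="vcard"><b class="fn">Joe Roe</b><br><span class="tel">555-0100</span></p>'
print(extract_cards([page_a, page_b]))
# [{'fn': 'Jane Doe', 'org': 'Example Diner'}, {'fn': 'Joe Roe', 'tel': '555-0100'}]
```

No one owns the pooled list. Anyone who can fetch the pages and knows the class names can build the same one.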

See also Fred Wilson on Base.

And see Umair Haque:

There’s only one question that matters, strategically: is Base the AOL-style walled garden of the 00s?

That is, are returns to info owned by Google going to be lower than decentralized info? …

What that means is that Google keeps indexing the world’s information, albeit at increasingly costly factor prices; while superior returns begin flowing to reconstructors and smart aggregators. This scenario devalues centralized mechanisms/walled gardens, like Base – because they’re not part of the attention ecosystem; they’re part of GoogleWorld (we really do need a name for all the info Google owns)….

But I think what it does do is begin to point to a growing vital point competitors can strike….

Then there’s Amazon, eBay, VCs, and media – all attention economy players, who seem totally intent on missing the tectonic shifts right under their feet, which are eroding all their returns.

The key question for any company today is: How do you play in the distributed world? How do you stop the 1.0 insistence of having to control and own and how do you instead make money by enabling others? That was where Google’s own gigantic growth was. But sometimes it’s hardest to learn the lessons you yourself teach.

Umair adds:

Another, marginally related point – it also points to the uncooling of Google. I mean, Base? Can you get more Orwellian, lame, sinister, connected to all the wrong stuff?

EG: Al Qaeda means “the Base”.

See also: base instincts.

SEE ALSO: The comments. Good notes there from ROR and SimplyHired.


  1. ROR says:

    ROR is exactly that, an XML format for describing content on websites (products, reviews, articles, [your object here], etc).

    In fact we are shocked to see how similar Google Base is to ROR, except for the openness issue.

    The ROR guys

  2. Dave McClure says:

    hi Jeff –

    the angle on microformats vs GoogleBase is right on target: those are 2 distinct alternatives to consider for the future. one is a centrally-hosted option (which has benefits, but significant downside if not searchable by anyone but Google), and the other is a distributed option, but “centrally-searchable”.

    in other words, in a microformat world i post my data (job listing, classified ad, restaurant review, etc) on my site or blog, and using tags & microformats, that data is discoverable by anyone who knows to look for those tags & can recognize the data structure defined by the microformat. this is what i call “open access”.

    in a GoogleBase (or Craigslist or eBay) world, i upload my data to a hosted DB, specify the structure as i post the listing there, and then the data is discoverable by anyone who uses the associated portal — and, if that data is searchable by other engines & crawlers, then it can be discovered by anyone using other meta-search or vertical search services. but if it is *not* open to search by others, then i’d say it’s more of a “walled garden”.

    what is interesting about these 2 alternatives is that at first glance they look somewhat similar — they both allow anyone to post data that could be discovered & searched by a large audience. however, in the 2nd case, unless the data is searchable by other services, the size of that audience is limited by the hosting portal.

    furthermore, depending on the web services offering that host portal provides, there could be a limited ability to build application workflow on top of that portal. from one perspective, if the web services offering is very rich, perhaps that could be an advantage you’d trade off for a more limited audience size. however i think as vertical search engines and vertical applications develop, assuming a one size fits all strategy will work is unlikely.

    ultimately, i believe the tradeoff is whether you trust a centrally-hosted data service to provide access to data for others to use and enrich further, or whether you’ll buy into a potential ‘walled garden’ due to other benefits (ease of use, richness of web services, audience size, etc).

    however, as the variety of other services grow, and data becomes available / searchable / remixable, and other web services offerings debut, it becomes increasingly unlikely that any one portal will offer a rich enough platform to offset the downside of the ‘walled garden’ — unless they choose to be truly open.

    as powerful as Google is (or Yahoo or Microsoft), i don’t believe there can be only one.

    – Dave McClure
    Master of 500 Hats

  3. Hi Jeff,

    You’re asking the right kind of questions.

    I wanted to direct your attention to a slightly broader topic, that of semantic markup. I wrote an article on Digital Web Magazine about it:

    In this article I mention microformats as well as structured blogging (similar to ROR) and RSS. All of these technologies are doing the same thing: providing a semantic structure for markup. Microformats, though they have a lot of press, aren’t necessarily the best way to solve this problem. I’m not sure what is, but there are definitely alternatives…

    Also, thanks for the comparison with Google Base, it’s a very interesting one. My optimistic side hopes that they simply rewrite their templates to spit out more structured code…


  4. Danny says:

    Ok, so let’s say you’ve prepared this bunch of data about your site, all neatly encoded in one of their format options. So you give it to Google. So why not give it to the rest of the Web as well?

    More at:

  5. Robert Young says:

    Here’s my 2 cents:

    1st penny– GOOG is launching carefully to test their semantic tagging scheme… which is structured at the high level and unstructured at lower levels. In order to validate their scheme, they first need critical mass of meta-data.

    2nd penny– assuming the test works, they’ll open up APIs for the rest of web to mashup (e.g. a la GoogleMaps). However, it’ll be critical for them to tie the API to a GoogleWallet to share transaction revenues. *But*, the transaction dollars will be ad dollars, not consumer purchases… this is the reason why I believe Eric Schmidt has been saying that GoogleWallet will not be a direct competitor to PayPal.

  6. Jon Tan says:

    The final question for companies (or rather, humans) concerns me. If I understand correctly, this closed information capture system as a pre-requisite to pseudo-syndication or inclusion in the Google aggregate is yet another profit layer between humans and information from source. It’s a web within the Web with profits being shaved off with “applicable ads”. This is fast becoming an issue of collective ownership versus Google ownership. There are serious issues here for me around access to the human information matrix and its sustainability through distributed, open channels. From what I read, and from my own scepticism, this smacks of the creation of a false data interface as a profit farm.

    Microformats indeed need more pixels to offer a decentralised, empowering version. I will continue to evangelise about MF as a component of what I call SIDE, but if a client comes to me and starts asking about Google Base submission, I’m going to balk. IMHO, we (meaning all of us “2.0-ers”) have a duty of care towards the information we shepherd, and we should get our collective arses in gear to develop robust open standards so access to the more meaningful, aggregated and categorised lexicon of all human thought and knowledge doesn’t one day become synonymous with Google.

  7. I am not sure if this is of interest, but there is a lot you can do already with the URL in Google Base. It is not really an API, but you can make structured queries, for example “get all jobs where job type equals permanent.” I found a way to make AND statements but not OR yet.

  8. Alex Barnett says:

    I agree with Danny. Why does this need to be an either / or conversation?

    Why couldn’t I do / use both (and others)?

  9. One glimmer of openness: You can submit your data to Google Base via public RSS. If people choose that solution, rather than typing their data straight into Google, the data will be available to other robots. There is still an issue of Google namespaces and stuff, but I think Google will have to keep the format quite stable.

    But we need a carrot to motivate webmasters to choose the RSS solution: Yahoo!, and the gang must be able to do something useful with these feeds asap.

    Google Base – Atom 0.3 Specification

  10. Mario Rizzuti says:

    Your thoughts about open vs closed are interesting.

    I guess that the larger the database, the more you need to think about ranking. Also, the fewer relations there are between the items, the more difficult ranking becomes.

    What about ranking items in an enormous (“world’s”) database where items have virtually no linking between them (ie. products, classifieds, images, etc)?

    I think that 2 simple aspects of digg (and other projects) are inspiring and worth exploration:

    1. extracting ranking directly from traffic/users
    2. owning traffic/users by taking manual submissions (as opposed to crawling/rss, etc).

    A not-so-unrealistic scenario could be one in which the leader is not the one controlling the items-db, but the one controlling the largest users-feedback-db and extracting the most value from it.

    If that’s at least partially true, the leader could be completely open about search results without sharing its real assets.

    I don’t see Google Base as an innovator or even a bookmark. Probably, the only potential value it is showing right now is free exposure (and just because of the brand).

  11. PHP API for submitting to Google Base says:

    I’ve found an Open Source library that helps PHP developers generate valid Google Base data and upload it through FTP.

    The website is

    It has a plug-in style for attributes and schemes, but currently only the Housing scheme is defined.

  12. Aubrey says:

    I am the sole developer of an online classifieds site called Safarri. Instead of being afraid of Google Base, I work with it!

    Whenever ads are submitted to Safarri, I host them then submit them to Oodle and Base.
    This sends people browsing either site to Safarri.

    When people are browsing Safarri, I fill in Base ads under the Safarri ads. They are not quite as relevant, but I have developed some algorithms that make them pretty good.
    Since Safarri’s interface is awesome, this makes browsing Safarri even better than browsing Base. This sends more people to Safarri.

    In my opinion, Base is a win/win situation!
    (And if you don’t believe me, check out
