Google’s syndication protocol

Maurice says that Google’s new syndication protocol, which I asked about yesterday, is an indication that Google is putting a gate in the wall around its garden to enable queries into its data. That would be good. But what it really should allow is not just queries into its data but scraping of it, for that data is our data that we put there in services like Google Base. Why do I care about this? So that new services can come along and aggregate distributed posts — classified ads, listings, reviews, whatever — wherever they are on the internet, in blogs or in other still-closed services. Now that Google is trying to become a repository of our data, that data should be open to the world to aggregate and analyze, just as Google aggregates others’ data. The Golden Rule of the Google Age should be: Scrape unto others as you would have them scrape unto you. I hope that’s what the new syndication protocol does, but I’m still not sure.

  • I’ve not really dug deep yet, but I get the impression that anything that you put into GData you can access directly – no need for scraping or search-style queries; it’s available as Atom through simple requests over HTTP (they don’t appear to support APP introspection, but that’s not a showstopper).

    Whether or not Google decide to follow this path in their other products is another matter, but it looks to me like as far as GData is concerned, the garden has no wall.
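    The point above — plain Atom over plain HTTP — is easy to sketch. Here is a minimal, hypothetical example: the feed URL and sample document below are made up for illustration (a real URL would come from the service itself), but the mechanics are just an ordinary HTTP GET followed by ordinary Atom parsing:

    ```python
    # Sketch, not real GData output: reading a feed as plain Atom over HTTP.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    ATOM_NS = "{http://www.w3.org/2005/Atom}"

    def parse_atom_entries(xml_text):
        """Return (title, link href) pairs for each <entry> in an Atom feed."""
        root = ET.fromstring(xml_text)
        entries = []
        for entry in root.findall(ATOM_NS + "entry"):
            title = entry.findtext(ATOM_NS + "title")
            link_el = entry.find(ATOM_NS + "link")
            href = link_el.get("href") if link_el is not None else None
            entries.append((title, href))
        return entries

    def fetch_feed(url):
        # An ordinary HTTP GET is all it takes -- the response body is Atom XML.
        with urlopen(url) as resp:
            return parse_atom_entries(resp.read())

    # A tiny Atom document, standing in for a real response:
    sample = """<?xml version="1.0"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>My Items</title>
      <entry>
        <title>First listing</title>
        <link href="http://example.com/items/1"/>
      </entry>
    </feed>"""

    print(parse_atom_entries(sample))  # [('First listing', 'http://example.com/items/1')]
    ```

    No API key, no special client library, no scraping — any aggregator that can speak HTTP and parse XML can consume the feed.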

  • Andy Freeman

    > I’ve not really dug deep yet, but I get the impression that anything that you put into GData you can access directly – no need for scraping or search-style queries

    Scraping is what search engines do to create their indexes.

    Jarvis is asking if Google will make this data available to Yahoo and the like, not to its owners. (Its owners don’t need access – they can keep a copy of what they upload.)

  • I agree that allowing massive scraping would be awesome, but I don’t think it’s currently possible using GData alone – it’s something that the underlying service would have to provide.

    It seems that GData is only used to interact with a pre-existing feed provided by the underlying service (e.g., Google Calendar gives you the URL of the feed of your calendar items, and you send GData requests to that URL).

    The scraping you describe would require an easy way to get a large list of readable feeds. Looking at the calendar API, there doesn’t seem to be a way to do this for Google Calendar feeds.

    This makes sense, I think – calendar data is inherently more private than items placed in Base, which is meant for public viewing and searching. Perhaps when they add a GData interface to Google Base, the Base service will extend GData to let you get a list of feeds. (AtomPub, one of the emerging standards GData is based on, already lets you do this via its introspection features.)
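    For what that introspection looks like: an AtomPub service document is itself just XML listing the collections (feeds) an endpoint exposes. The document below is a made-up example, not real Google output, using the namespace from the AtomPub spec:

    ```python
    # Sketch: discovering feeds from a hypothetical AtomPub service document.
    import xml.etree.ElementTree as ET

    APP_NS = "{http://www.w3.org/2007/app}"
    ATOM_NS = "{http://www.w3.org/2005/Atom}"

    def list_collections(service_doc):
        """Return (title, href) for every collection in an AtomPub service doc."""
        root = ET.fromstring(service_doc)
        found = []
        for coll in root.iter(APP_NS + "collection"):
            title = coll.findtext(ATOM_NS + "title")
            found.append((title, coll.get("href")))
        return found

    sample = """<?xml version="1.0"?>
    <service xmlns="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom">
      <workspace>
        <atom:title>Public Listings</atom:title>
        <collection href="http://example.com/feeds/classifieds">
          <atom:title>Classifieds</atom:title>
        </collection>
        <collection href="http://example.com/feeds/reviews">
          <atom:title>Reviews</atom:title>
        </collection>
      </workspace>
    </service>"""

    for title, href in list_collections(sample):
        print(title, href)
    ```

    That one document is the “list of readable feeds” the comment above is asking for — if Base ever published one, an aggregator could walk every public collection from a single entry point.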


  • I think they are moving toward distribution of their data, but they are going to charge for it through their API services. They are trying out pay-per-transaction API calls through their AdWords product, and it would not surprise me if they expand the API to other data and charge per ‘token’ – right now it is $0.25 per 1,000 tokens.

  • janice

    I agree with Craig about the $$ point.

    It’d be cool though if they put out embeddable feeds. I looked at the microformats local stuff this week — their integration of Yellow Pages, Local & Reviews from across the web. Being able to grab that stuff back into a site as a feed would rock — putting your ad on their page is not the same.

    Hm. It’s like reverse advertising. Take our formatted content and pay for it – and keep the ad dollars on your own site. Syndicatizing.

    In any case, it’ll be interesting to see how it plays out.