Zap the zombies

Om Malik and John Battelle, among others, are following the blog plagiarism/blog zombie problem of reverse spammers taking our content and slapping it onto fake blogs to get Adsense revenue and Google links. John says, “We need to address this.” Actually, we have to make Google address this. First, Google’s Blogger is being used for this fraud. Second, Google is paying people for this; they know who the fraudsters are. So perhaps the victims need to gang up and file suit, which means they can subpoena Google for the identities of those whom Google is paying, which might make Google sit up and pay attention.

  • I think you’re on to something. A post I ran earlier today cites a software program for the very purpose of creating splogs. The ad for this thing (ominously called “VooDooBlogger”) says that it ONLY uses Blogger to build its blogs.

  • Jeff- Holding Google responsible in court for the actions of sploggers is a legal answer to a technology question.

    This isn’t really even about Google. All crawlerbot indexes (such as Google and many others) attempt to model real-world factors: they seek to create algorithms that assign an importance value proximate to the actual (but intangible) importance value of millions of sites and trillions of documents on the web. Because there is a potential financial reward to the value assignment, bad actors will almost invariably attempt to game the system. There is no way to prevent the bad guys from trying to do this, and there is probably no sure-fire way to prevent them from being successful at least some of the time.

    We’d be better served trying to come up with a technology solution to catch some large chunk of the offending sites and harass them off the net. There is reconciliation software used by financial institutions that looks for patterns and generates exception handling messages in money movement – I would imagine that something similar could be set up for blogs. I can envision a fairly similar workflow that would index subscriber blogs, locate excerpted text and check for proper citation. Lacking proper citation, exceptions would be generated, and notices transmitted to the subscriber, the webhost, and the administrative contact for the domain. Any contact email address listed on the site could be notified to remove or properly cite the offending content. Subscribers would have a choice of copyright protections for their content.

    (The service provider would have to charge at least $5/month for this or it would not scale: too many free riders and the system would bog down. The service provider would also need a contractual relationship with subscribers to effectively pursue sploggers who subscribed to the service in an attempt to game the system.)
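A minimal sketch of the excerpt-and-citation check described in the comment above, assuming a simple word-shingle comparison. The function names, the 5-word window, the 0.3 threshold, and the "citation means the source URL appears on the page" rule are all illustrative assumptions, not anything the commenter specified.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(original, candidate, k=5):
    """Fraction of the original's shingles that also appear in the candidate."""
    source = shingles(original, k)
    if not source:
        return 0.0
    return len(source & shingles(candidate, k)) / len(source)


def check_page(original, source_url, page_text, threshold=0.3):
    """Return an exception record if page_text copies original without citing it.

    Hypothetical rule: a page counts as "properly cited" if it contains
    the subscriber's source URL. Returns None when no exception is raised.
    """
    score = overlap(original, page_text)
    if score >= threshold and source_url not in page_text:
        return {
            "score": score,
            "notify": ["subscriber", "webhost", "domain administrative contact"],
        }
    return None
```

A real service would run a check like this across an index of crawled pages and the subscriber's full archive. Exact-phrase matching of this kind only catches verbatim copying; reworded or translated copies would need a different approach.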

  • DensityDuck

    Sterling: I love the way that your solution to the splogging problem is to make the splogg-ee jump through all kinds of hoops to avoid it. When your dog bites someone, do you sue them for not taking anti-dog provisions?

  • JohnS

    Conceptually being done – see Turn It In web site. “A proprietary system that instantly identifies papers containing unoriginal material and acts as a powerful deterrent to stop student plagiarism before it starts.”
    It’s amazing what students try to submit as their own work.

  • Jeff,

    You sound exactly like the RIAA. The RIAA would like to force every search engine (Napster, Gnutella, Kazaa, etc.) to do exactly what you’re hoping Google will do for you. This is not a good solution.

    You could, however, take a technology solution into your own hands. If you don’t want others to profit from your work, you could make the effort to track down illegal copies of it using one of the forgery-finding tools above and send a DMCA takedown request to the host, which in this case is Google. That’s the balance/compromise/deal that the early search engines struck with the copyright owners, and now that you’re the one on the negative end of it, it doesn’t seem as simple as “file sharing should be OK,” does it?

    The good news in all of this is that it is in Google’s economic interest to start finding and deleting splogs, as they will hurt Google’s search quality and click-through/ad value. Give them time to figure out how to delink the cheaters without delinking you, in an automated way that doesn’t require hiring the Mongol hordes to read billions of links.


  • I watch Technorati and Google blogsearch feeds for references to one of my sites. Two things I’m seeing in Splogs that are worth noting.

    1) Make Poverty History banners that conveniently cover the Blogspot “Flag” button.
    2) An increasing number of splogs that use WordPress.

    I don’t have a solution but I can tell you that it’s making Google blogsearch almost useless for tracking references to our site. I’m getting an average of 2-3 new splogs per day.

  • Gene… Yes, there’s that danger.
    But Google is benefitting from this; they’re making money from the poor advertisers who end up on the splogs. It’s the advertisers who should be taking multiple parties to court for fraud.
    But the point is that Google is in the best position to stop this, not only because Blogger is being used (though something else could be) but because Google is enabling this through both Adsense and search. If Google made it a priority to help the community by at least refusing to support this fraud, then that would go a long way to killing it.
    We need Google’s help. I would have thought they would have volunteered it already. But since they haven’t, we need to push them to volunteer it. One subpoena would go a long way.

  • Isn’t the idea of blogging to get as much exposure as possible for one’s ideas? And don’t splogs help with that? Or is it to make max ad money? Seems like it’s a bit of a compliment to be ripped off in this way. I for one welcome our new splogging overlords.

  • Sterling: I love the way that your solution to the splogging problem is to make the splogg-ee jump through all kinds of hoops to avoid it.

    Ummm…who do you think is going to do all the work of protecting your copyright without being paid to do so? If your IP is valuable to you, you ought to be willing to subscribe to a service to defend it. The service in question would search the net probably once a week to find plagiarized entries from the whole history of your blog, and then would run a series of workflows when it found one. That kind of thing gets expensive, fast.

    Also, and I hate to be the one to break this to you, but some sploggers are very likely using cheap translation tools to convert English-language posts into Spanish, Russian, Japanese and other high-traffic languages. Others are probably using blog text as context fodder to substitute some nouns with high-value ones like “mesothelioma,” “workman’s comp” and “Lance Armstrong”. Both of these tactics would defeat efforts to detect them.

    No one is going to protect you for free. There isn’t going to be any government agency to protect against splogs (not that such a thing would be free, either.) If you want to protect your IP you’re going to have to ante up to someone who invents a clever technical workaround to the problem.

  • tom brandt

    Your link to Om Malik 404’s.

  • Non-attributed full posts are theft, plain and simple, and should attract takedown notices. However, it would be wise for content producers, if they are worried about this, to set their RSS feeds up with excerpts rather than full text.

    However, as Adsense and its clones begin to fade – which is already happening among the leading edge web users – the issue will become moot because there will be less and less economic incentive for sploggers to grab feeds and put them up.

  • Jeff,

    Then you must support the RIAA when they say that all peer to peer file sharing networks should self police and remove all the clearly stolen content.

    Is that assumption correct?

    You should also know that Google already has a healthy department of people to handle subpoenas and take down notices and all the other nasty detritus of running a successful web enterprise. Your subpoena will just be one in a constant flow.


  • Jeff,

    I don’t get it. You’re so upset by this, but it’s way out of scale to the problem. These scraper sites *make no money*. Who visits them? No one. Who clicks on the ads? No one. We watch them all the time, and they go away very quickly because they have no value and they never get anywhere. No one is “losing” any revenue. No regular reader of your site who might click on your ads is suddenly being magically sucked away by these scraper sites. The only time people visit these sites is when others complain about them.

    Worrying about it is a waste of time. Time that could be put towards much more productive purposes.

  • >You’re so upset by this, but it’s way out of scale to the problem. These scraper sites *make no money*. Who visits them? No one. Who clicks on the ads? No one.

    Sorry Mike, it’s hard to imagine there would be so many splog sites if they were unvisited and unprofitable. Actually I know quite a few people who have made money using this model.

    As Jeff writes, Google is benefitting from this. Arguably, though, so are the advertisers. When users arrive at these pages, the content is often so useless that they have little choice but to click the adverts in an effort to find what they are seeking.

    It’s basic advertising 101 that the important thing is to get your message across to the target group. The exposure that advertisers get on splogs is arguably more beneficial than on many other sites, because next to such useless content their adverts are relatively more attractive.
