Interestingness

Thomas Hawk wonders whether Flickr’s interestingness will allow Yahoo to leap ahead of Google in at least one arena of search: photos.

I wonder two things:

First, why just photos? Couldn’t interestingness become valuable in an overall search algorithm?

But second, in an interesting comment discussion under my post yesterday on interestingness, KirkH asks whether interestingness requires that the content being judged be hosted on one site. That’s a good question, for interestingness appears to be about both vectors of interest and relationships, and I’m not sure whether or how the data to feed that algorithm can be gathered across a distributed network.

  • http://thomashawk.com Thomas Hawk

    Where interestingness works for Flickr is that there is a great deal of community interaction built around the photos on Flickr. All of this data (favorites, comments, notes, views, etc.) represents activity and interest. Especially when people mark a photograph as a favorite, it says that they think it is good. When you have enough unrelated people saying something is good, there is a good chance that it is interesting.

    Flickr is like an addiction for many. I’ve got over 11,000 photos marked as favorites (and believe it or not they are all truly amazing). You get sucked in and can spend hours and hours and hours participating in it. So it is this massive amount of free labor that allows a finely tuned human filter to run through the photos of Flickr. There are still some little things that Flickr can do to get the tagging better (a spellchecker for tags, promoting and incentivizing tagging even more, actually hiring college interns or other low cost labor to tag their top photographers with additional tags — especially highly searched tags where appropriate, etc.). But what makes the interestingness algorithm most compelling is that it works really really well. And it works really really well because of the massive amount of human time that goes into reviewing the photographs on Flickr.

    Even so, image search at Flickr today still has a long way to go. The biggest problem with image search at Flickr is simply that the library is too small. Image Search at Google and Yahoo! is completely exhaustive. If I want to find a photo of some rare river in Africa, there is a good chance that I can find one. It won’t be a very good one, but it will be there. Flickr on the other hand only has something over 1 million members (I haven’t seen any updates on members since the Flickr blog announced their one millionth member a while back). Although one million sounds like a lot, it’s not. One million people are fantastic at getting you the very best photo possible of the Empire State Building, or of a rose, or of the Golden Gate Bridge, or of “San Diego”, etc. You get the idea. But you won’t find that rare river in Africa on Flickr… yet.

    The key to Flickr’s continued dominance in image search will be to continue to recruit the very best photographers. Eventually one of them will make their way to Africa and we’ll actually get that fantastic photo of the rare river that we are looking for. I’ve suggested in the past that Flickr create a very inexpensive rewards type program to incentivize referral activity to the site. Certainly giving top photographers even free Pro accounts, etc. could help. But these are the ones you want essentially shooting your images for your future image searches at Flickr.

    Flickr also most likely assigns (and should assign) rankings to its members. I’m not certain of this, as interestingness is “secret sauce,” or so they say. But I suspect that Flickr is also looking at how reliably a person’s comments point toward superior photography. If I’m a somewhat uninvolved Flickr member but my Aunt Mary posts her wedding photos on Flickr and I fav them all, those favs should carry less weight than if I’m involved in the community and consistently rank well. Reputation ranking will also be something to watch in the future.
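
    The reputation idea above can be sketched in a few lines. This is only an illustration of weighting favorites by who gave them, not Flickr's actual algorithm, which is not public. The function name, the user names, and every weight below are made-up assumptions.

```python
from collections import defaultdict


def interestingness(favorites, reputation, default_weight=0.1):
    """Sum reputation-weighted favorites per photo.

    favorites:  iterable of (user, photo) pairs
    reputation: dict mapping user -> weight (hypothetical values)
    Users without a reputation entry count for default_weight.
    """
    scores = defaultdict(float)
    seen = set()
    for user, photo in favorites:
        # Count each user at most once per photo.
        if (user, photo) in seen:
            continue
        seen.add((user, photo))
        scores[photo] += reputation.get(user, default_weight)
    return dict(scores)


# Hypothetical data: an uninvolved nephew versus two active members.
favorites = [
    ("nephew", "aunt_mary_wedding"),
    ("regular_a", "golden_gate"),
    ("regular_b", "golden_gate"),
]
reputation = {"regular_a": 1.0, "regular_b": 0.8}

scores = interestingness(favorites, reputation)
```

    Here the nephew's fav still counts, but far less than favs from members whose past picks have proven reliable.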

    Perhaps the thing that is most important to consider with all search though is how little anything beyond the 2nd page matters. Almost all searches stop after two pages and this is as deep as any human filter needs to go really. The rest of the stuff will be there for the rare times that it’s needed. And a lot of the long tail stuff as well.

    So there are two questions really. 1. Why in the blazes has Yahoo! still not integrated Flickr’s interestingness into their own image search where applicable? and 2. How do you translate the superior model of image search that Flickr has achieved to other areas of the internet?

    Perhaps the closest thing I’ve seen that may have potential is Digg (which, congrats, just received more funding). Similar to Flickr, the idea behind Digg is that people either dig a story or they don’t. They seem to be building a large community quickly, and for the posts of mine that have shown up there ranked highly, the traffic has been massive. Mining Digg’s top stories (which are run through their members’ human filter) and assigning high search page ranks to highly ranked stories would make sense.

    Like Flickr though, Digg will depend on a community. And you need to keep this community happy and incentivized to continue to be your human filter. I think that building online communities will represent great opportunity in the next five years and the ones that catch on will have much more value in the form of the byproduct of smart filters than people realize today. Yahoo! was very smart to pick up Flickr. I suspect that Digg may be the next one scooped up.

    Flickr and Digg can never fully replace search as we know it today. They can merely enhance it. Neither is exhaustive enough to cover the breadth needed. But when horses race and the winner by a nose gets all the money, any small enhancement in search is magnified.

    Thanks for the link Jeff.

  • http://unbeknownst.net KirkH

    For an example of distributed interestingness in the form of MP3s go here:
    http://del.icio.us/popular/system:filetype:mp3

    That is Flickr interestingness with decentralized content. Instead of clicking “Favorite”, Del.icio.us bases interestingness on how many people tag something.
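
    A minimal sketch of that kind of ranking, assuming popularity is simply the number of distinct people who bookmarked a URL. That rule is an assumption for illustration, not a description of how Del.icio.us actually computes its popular page, and the users and URLs are invented. The point is that the URLs can live on any host.

```python
from collections import defaultdict


def popular(bookmarks, top=10):
    """Rank URLs by how many distinct users bookmarked them.

    bookmarks: iterable of (user, url) pairs; the URLs may point anywhere.
    Returns a list of (url, user_count), most bookmarked first.
    """
    users_by_url = defaultdict(set)
    for user, url in bookmarks:
        users_by_url[url].add(user)  # a set ignores repeat bookmarks
    ranked = sorted(
        users_by_url.items(),
        key=lambda item: (-len(item[1]), item[0]),  # count desc, then URL
    )
    return [(url, len(users)) for url, users in ranked[:top]]


# Hypothetical bookmarks pointing at files hosted on different sites.
bookmarks = [
    ("alice", "http://example.org/a.mp3"),
    ("bob", "http://example.org/a.mp3"),
    ("alice", "http://example.net/b.mp3"),
    ("bob", "http://example.org/a.mp3"),  # repeat, counted once
]

ranking = popular(bookmarks)
```

    Only the bookmarks are centralized here; the content being ranked stays wherever it was published.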

    Similarly, http://del.icio.us/popular does the same thing Digg does and often has the same content, except it doesn’t have the fancy interface of Digg. The community aspect of Digg, seen in the comments, could be replicated by extracting the notes from Del.icio.us posts and presenting them above the notes form field, creating a decentralized discussion forum. I’m going to try to do this in my spare time. I’m not sure if it’d be a first, but things get interesting when you start to tag things with sentences instead of just single words.

    That type of conversation wouldn’t be totally decentralized because the content is all on the Del.icio.us servers but… I found an academic paper about decentralized search engines:

    http://security.riit.tsinghua.edu.cn/share/coopeer.pdf
    “Insertion of personalized factor for searching results by routing in self-organized user community. These advantages are only possible in P2P network where the information and cost is shared among all the members.”

    If you look at the performance problems del.icio.us is having, then it seems a truly distributed system is the only solution. I think it’ll soon be possible to distribute the work of a search engine to the edges. What if web browsers acted as agents in a big distributed search engine? Maybe that’s what Google’s trying to accomplish with their toolbar, but their search algorithm is as secret as the recipe for Coke and I doubt they’ll ever have the nerve to send it out to us. The Google toolbar did integrate the Folding@Home distributed simulation client for a while; maybe they were thinking about this stuff a couple of years ago.

    From News.com

    “We’re never going to open-source PageRank,” DiBona said, referring to the algorithm the company uses to choose which search results to present. “It’s the thing that makes Google Google.”

    They can keep buying servers and dark fiber, but at some point the BitTorrent of search and the CPUs of the masses will eat their lunch.

    There have been cases where DUIs and speeding tickets were overturned because the manufacturers (radar guns, breathalyzers) wouldn’t allow the defense to see the code or algorithms that determine guilt. Why shouldn’t we hold Google to the same standards when it comes to their authority?

  • http://unbeknownst.net KirkH

    Last idea: a well-designed user interface could automatically tag comment tags with some standardized reference system so you could create a structured discussion.

    So what word might be used to tag a tagged tag? I don’t know, it makes my head hurt :)

  • Not very interesting

    Flickr’s interestingness really doesn’t work. The most interesting photos according to Flickr are often not interesting. In fact they tend to be technically skilled photos of fairly boring things – sunsets, boats on a lake, flower macros, lame “arty” photos of pretty girls, etc. Flickr actually shows off its most interesting pictures via a feature called “Explore” where you can see the most interesting photos from the most recent days, and it’s not very impressive.
