The Citizens’ Survey: Open-source polling

: Question: Is it possible to create an open-source polling service that gets a more accurate picture of public opinion and also allows the public to not only answer questions but also to ask them?

Yes___ No___ Don’t know/care___

I think we could create a community-based polling service that answers many needs and deficiencies in the opinion industry today:

: Lots of questions that need to be asked aren’t getting asked: I have my own pet polling wishes: I’d like to see more questions that really determine whether we are a nation at red-v-blue war or whether that is the fantasy of the fringes and media. I’d like to see how many people think the Parents Television Council speaks for them. I know you have your own questions.

: There is what I sense to be a growing storm about bias in polls: One response to that is to have competing polls.

: Polling is inaccurate: Exit the exit polls. I have a theory they’re screwed up because people are gaming the system out of a hostility to polling not unlike the hostility to big media.

: Polls are affecting — and can thus skew — the public discourse and media coverage: Would politicians govern better and the press cover better if they had a truer sense of what their constituencies think?

: New businesses cannot afford market research: Imagine the businesses that could be created if entrepreneurs could sniff out new needs.

: Polling is too damned expensive.

So why shouldn’t the opinion industry find new competition in an open-source citizens’ effort?

I first started thinking about this as I wondered how to get my pet polling questions asked and answered. I wondered whether there could be a way to get established pollsters to charitably add questions to their surveys. But, of course, that’s utterly unworkable: politics, money, work, logistics, and self-interest all get in the way. But then it occurred to me that it would be possible to set up a system for the people to take over polling:

It’s the wikification, blogification, Craigsification, bittorrentification, linuxification of opinion.

The idea is simply that an open polling service allows anyone to answer questions (when they meet sampling requirements) and ask questions (with gating by the community). The requirements (I think):

: It must have scientific sampling: The system has to know the demographics of the nation, gather the demographics of the respondents, and create samples to accurately reflect the public or a slice of it (e.g., young people, women, parents, Democrats, FoxNews viewers…). That requires expertise.
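To make the sampling requirement concrete, here is a minimal quota-sampling sketch in Python. The census shares, field names, and single age axis are all invented for illustration; a real system would stratify across many demographics at once and get its target shares from actual census data.

```python
import random

# Illustrative population shares for one demographic axis (invented numbers).
CENSUS_SHARES = {"18-29": 0.21, "30-44": 0.27, "45-64": 0.32, "65+": 0.20}

def build_quota_sample(respondents, sample_size, shares=CENSUS_SHARES):
    """Fill each demographic cell in proportion to its population share,
    drawing randomly among the available respondents in that cell."""
    quotas = {group: round(sample_size * share) for group, share in shares.items()}
    by_group = {group: [] for group in shares}
    for person in respondents:
        if person["age_group"] in by_group:
            by_group[person["age_group"]].append(person)
    sample = []
    for group, quota in quotas.items():
        pool = by_group[group]
        # If a cell is short of people, the sample is simply under quota there,
        # which tells the system which demographics it still needs to recruit.
        sample.extend(random.sample(pool, min(quota, len(pool))))
    return sample
```

With shares like these, a 1,000-person sample would need 320 respondents aged 45 to 64, and the system keeps asking until each cell is full.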

: It must defeat gaming: This is the killer. The obvious fear is that Republicans will masquerade as Democrats, or vice versa, to fake the results (“95 percent of Democrats think Hillary is an alien”). I think the way to deal with this is to align the interests of giving answers and giving personal data: We create a methodology to define “conservative” not just by asking you whether you’re conservative (though we can do that) but primarily by your answers to a larger series of questions (taxes, federalism, etc.) and you won’t know which are questions aimed at determining your categorization or questions from polls to be published. If you want to get your conservative views into this statement of American public opinion, then you are disincented from lying about whether you want to raise taxes, for example.

Also, Wikipedia style, we let the collective define the questions that define the descriptive words: Does tolerance of a federal deficit make you conservative or liberal today? If the collective can’t agree on that, then deficit attitudes do not make up a definition of political ideology.
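One way to sketch this categorize-by-answers idea: score each respondent only on the items the community has agreed are diagnostic. Every question id, direction, and threshold below is a hypothetical placeholder, not a proposed instrument.

```python
# Hypothetical classifier items agreed on by the community, each mapped to a
# direction: +1 if agreement signals "conservative", -1 if it signals "liberal".
# Contested items (like deficit tolerance, per the post) never enter this map,
# so they contribute nothing to anyone's label.
CLASSIFIER_ITEMS = {
    "raise_taxes": -1,
    "states_rights": +1,
    "expand_medicare": -1,
}

def ideology_score(answers, items=CLASSIFIER_ITEMS):
    """answers maps question id -> +1 (agree) or -1 (disagree).
    Returns a score from -1.0 (liberal) to +1.0 (conservative)."""
    scored = [direction * answers[q] for q, direction in items.items() if q in answers]
    return sum(scored) / len(scored) if scored else 0.0

def ideology_label(score, threshold=0.3):
    if score >= threshold:
        return "conservative"
    if score <= -threshold:
        return "liberal"
    return "moderate"
```

Because respondents cannot tell which questions feed this score and which feed published polls, lying about taxes to skew one result would also mislabel the liar’s own ideology, which is the disincentive described above.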

And, importantly, we do not allow respondents to pick the questions they will answer: You can’t come — USAToday poll-like — to flood the results. The system asks you what it wants to ask you based on its needs to fill certain samples and ask certain questions.
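That assignment logic might look something like this sketch, where the quota bookkeeping and field names are assumptions:

```python
import random

def next_questions(respondent, open_slots, k=5):
    """The system, not the respondent, chooses what to ask: a question is
    eligible only if the respondent's demographic cell is still short of
    quota for it. open_slots maps question id -> set of cells still needed."""
    eligible = [q for q, cells in open_slots.items()
                if respondent["cell"] in cells]
    random.shuffle(eligible)  # present an apparently random, short batch
    return eligible[:k]
```

A respondent can flood nothing this way; showing up more often just means answering more of whatever the system happens to need.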

: It must attract a critical mass of respondents: Why would anyone bother answering questions here? I think they would if they knew they were participating in a poll that could influence discourse, government, and industry. I think most of us have enough ego to love being asked what we think about anything (or is that just bloggers and pundits?). And I think the apparent randomness of the questions you’re asked — so long as they’re not too laborious — makes this fun.

Also, if you want to ask questions, you have to answer them to get influence points to push your questions. Perhaps you can align yourself with groups trying to get questions asked and donate your influence points to that group so the group’s questions have a better chance of being selected.

: It must prioritize questions: It can’t ask every version of every question of everyone. So the system needs a means of first pooling similar questions. (Think Flickr: You tag your question “Democrats, Hillary Clinton” to see what questions have already been submitted.) Then the system needs a means for the community to decide which questions get asked (this is where your answer points come in) based on a calculation of system capacity (based on total available respondents in demographic groupings and the number of questions a respondent can be expected to answer) and community demand.
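A toy version of that pooling-and-prioritizing step, with invented tags and point values:

```python
def prioritize_questions(submissions, capacity):
    """Pool near-duplicate submissions by their tag set, sum the influence
    points backing each pool, then greedily take the most-demanded pools
    until the system's capacity for new questions is used up."""
    pools = {}
    for sub in submissions:
        key = frozenset(sub["tags"])  # tag order doesn't matter
        pool = pools.setdefault(key, {"questions": [], "points": 0})
        pool["questions"].append(sub["text"])
        pool["points"] += sub["points"]
    ranked = sorted(pools.values(), key=lambda p: p["points"], reverse=True)
    return [pool["questions"][0] for pool in ranked[:capacity]]
```

Picking the first question in a winning pool is a simplification; in practice the community and its expert editors would merge and polish the pooled wordings before anything gets fielded.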

: It must edit questions: Asking polling questions is both a science and an art. The system needs to collect the wisdom of both the crowd and of experts so that questions are formed in a way that makes results meaningful. I think this could be done in Wikipedia style with experts — pollsters, statisticians — getting added juice. There is also a need for FAQs and even wizards to guide people through the best way to ask a particular kind of question.

: It must be transparent: Otherwise, there is no way to ferret out bias and opinion spam.

: It must be free: So to support the system, I suggest that companies and campaigns be allowed to use the system for a fee. And wouldn’t this be a heckuvan environment for advertising (you can reach soccer moms for sure).

Could this be a for-profit company? Perhaps. But I don’t think this will work unless the community believes it owns it. I think it needs to be public and transparent.

There is another important benefit that comes from doing this online with a large and ongoing poll of respondents:

The system brings context to polling. It can create panels of individuals and ask them the same questions over time to get a sense of shifting opinion. It can become self-correcting: It can find out whether likely Bush voters really voted for Bush so it can start to measure the likelihood of “likely.” It can also link to other data as a check on results (e.g., the poll says people don’t care about Michael Jackson news and Technorati links back that up among bloggers but the Tyndall Report finds lots of Jackson coverage).
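The “likelihood of likely” check could be as simple as this sketch, assuming the system can verify turnout (say, against public voter files, itself a privacy question); the field names are invented:

```python
def likelihood_of_likely(panel):
    """Self-correction: among panelists who called themselves likely voters
    before the election, what share actually voted? That ratio can then be
    used to discount future 'likely voter' screens."""
    said_likely = [p for p in panel if p["said_likely"]]
    if not said_likely:
        return None  # nothing to measure yet
    return sum(1 for p in said_likely if p["voted"]) / len(said_likely)
```

If only two of every three self-described likely voters turn out, the system learns to weight “likely” accordingly next cycle.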

I can’t build this. And for all I know — and I’m sure you’ll tell me — elements of this already exist somewhere. But I’d love to see some brilliant stats wonks bring it all together.

So what do you think?

Like it___ Hate it___ Don’t know/care___

  • Present polling by telephone skews the results toward the older population, as few other members of the population own landline phones, answer calls from anyone they don’t know, or give answers to questions from strangers.
    Isn’t internet polling possibly a balance to that, though not taken exclusively as giving all the answers?
    I like the idea, but again it presupposes that those answering the poll have the means to maintain a system that the poll uses.
    In my experience, the young and the indigent are left out in polling that uses established landline phones with constant numbers – cell phones by rental are much used by daily laborers and young or impermanent populations. Wouldn’t the same be true of internet polling?

  • cormaggio

    I really like this idea – it seems to really fit with the Live Journal etc. generation – people love to get a chance to air their views and this is what open-source and the internet in general has made possible. I do think though that sampling would be a critical issue. Blogs (like political pundits) have the problem of acting like an echo chamber sometimes – I’m thinking about what will motivate people to participate and what type of people will come. Would it be representative? I suppose that depends on the questions themselves, so it might sort itself out in a circular kind of way. Nice idea

  • Jeff, how much do you know about the mathematical and scientific theories and practices behind survey research, aka polling?
    The biggest problem with your design is that it violates a couple of the most important assumptions and demands of polling and surveying.
    There are polls and there are polls. In theory, the polls done by news outlets are “real” polls that seek to sample a population that represents the true population, or that weight the results in such a way as to give a mathematical representation of a subset of the population.
    On the fly polls sometimes done by campaigns are more about “taking the temperature” of a few selected voters, rather than actually reflecting the population as a whole.
    Honestly, open source polling has zero reliability. Run it by Mystery Pollster for more.
    Also, with regard to 2004 pre-election and exit polls, there’s some evidence that the exit polls were biased due to respondent motivation and poor research design. The poll was wrong… because it wasn’t executed with care.

  • Jenny: You’re discarding without considering. This method, just like phone polls or in-person polls, would rely on creating scientifically based samples. The only difference is that you make it known you are available for polling. That’s it. As I said in the post, you cannot — as in online polls — pick the questions you answer. It is random.
    And it is an open system that lets you or me get a question asked of the nation.

  • Big problem: Your construction of samples is tainted by the “volunteer” effect. You cannot have a survey of people who have volunteered in advance to be surveyed and then generalize to the population.
    Part of creating a sample population is that it must be randomly selected from the population you want to generalize to. If, for any reason, the sample population is not random, then your survey goes to pot. That’s why the exit polls were wrong.
    Until you can overcome that necessity for validity, this looks shaky. I just emailed Mystery Pollster, who can speak quite authoritatively to such matters….

  • Like many above have said…it would be biased by internet users, who like it or not, are somewhat “different” than the existing population. Remember the argument about “missing cell phone users” complaints – ie. the young, mobile, no-landline crew?
    You’d have to re-weight (which is where the problems come in) your samples on some known characteristics. Where the interweb respondents are different than the real world would require recalculations: off the top of my head, that means age, sex, political party, income…
    For more on this type of stuff, check out the Mystery Pollster.

  • ThunderDad

    There’s some geek out there who’s probably about two thirds of the way through correcting all the snags you mentioned, sweating bullets as he sees his potential millions vanish into the blogosphere.

  • And wouldn’t this be a heckuvan environment for advertising
    You just destroyed your entire concept. Turning polling into an advertising delivery medium defeats whatever lofty goal you hold for polling in general.
    In fact, I fail to understand the overall goal of such an effort. Do you think better polling is really what our society needs in order to engage in meaningful discourse and political involvement?

  • Bill Bradbrooke

    – etc.
    total agreement!

  • Great thinking.
    Along this line, I’d love to see a new tradition to go with candidate debates: Sets of written questions from an “open source” presented to every candidate.
    The debates give us the chance to watch them under pressure, but the questions are a joke, as candidates always “stay on message” and ignore the questions to simply regurgitate the official line. Well, you’d be a fool to do otherwise given how things work.
    But I’d still like to make them answer a few questions, in writing: no “break for commercial” or “90 seconds” or other convenient excuses.

  • JennyD

    Jeff, this post has bothered me all morning and I still have things to say.
    First, I’ve read here that you are deeply concerned about transparency in journalism and the doings of the press. Good. There isn’t enough transparency there. The methodology of the press is invisible, which is why we don’t trust it, why it seems not to work anymore. We’re too savvy to believe things we can’t see.
    What you seem to be concerned about is that polls claim to show how people think, feel, react to something–whatever is asked about. You sound unhappy with your perception that polls are biased and non-transparent.
    But polls are quite transparent. The methodology is nearly always included in the report of the poll itself. For anyone interested in such things, there it is. The New York Times, to its credit, always publishes a box with small type describing the methodology. There’s nothing mysterious about it.
    Look at Gallup’s website today, for example. In its report on public opinion on the Schiavo case, it includes a few lines about methodology at the end. I suspect that anyone interested could easily obtain the much longer, much more detailed design for the survey.
    Survey researchers, then, have the ability to see what’s up with a poll, where bias could be introduced. Also, the best polls include the actual wording of the questions in their reports, which allows for analysts and everyone to consider how that affected outcomes. (The NYT does not always do that, unfortunately.)
    As I said before, internal polls done by campaigns aren’t published except as tips to journalists from insiders, and the methodology used in the polls is invisible.
    My big concern about polling is that what isn’t discussed is how many attempts it requires to get a sample of 909 adults, or whatever. It concerns me if it takes 5000 attempts to get that sample. That might introduce bias, just as having a pool of only volunteers would bias the other way.
    In the end, my real worry is that journalists and the press fail to report the entire poll and its methodology. They pick and choose, and thus the reports they give on polls may be totally unreliable, even though the polls themselves may be quite valid.

  • Jim in Texas

    Don’t companies like Harris have a stable of individuals that could be used?
    I’m a known polling respondent to Harrisonline, having volunteered and after filling out a questionnaire. They send me surveys and I respond to them.
    Would some scheme like that be sufficient? Or would it lack the spontaneity you seem to want?

  • EverKarl

    IIRC, the Mystery Pollster has written that the 2004 polls were not skewed by the “younger people-cellphone” effect — at least not in a statistically significant way. But internet usage might be a different case.
    An interim suggestion might be to have blogs raise funds for traditional polling. I think Zogby (for one, albeit with controversial methodology) allows people (for a fee) to ask questions in his regular polling.

  • Avatar

    Precisely how would you go about doing something like that?
    If you do it based on volunteers, you get a good idea of how the volunteers feel, but you -cannot- extrapolate out to the general population. (This is something of a problem with traditional polling as well, and one reason exit polls skew off from results. But both of those types of polls at least attempt to ask people at random; they get people who say “no”, but they at least ask people who haven’t said “yes”.)
    If you take a non-random sample and attempt to weight the results in order to reflect the population as a whole, you introduce weighting error. If you think there are more Republicans or Democrats than there really are, and weight accordingly, you’re going to end up pretty wrong, no?
    If you attempt any kind of a “random” sampling, through e-mail or whatever, you’ve got no idea of whether the person on the other end is a human or not. In other words, it’d be pretty easy to set up a “poll-bot” to pollute the results the way you wanted. This is one reason that online polls are even more unreliable than a conventional straw poll with self-selected participants. If anybody cares at all about the results, they’ll fix those same results.
    One specific problem is that you have to, HAVE TO have adults running the store, so to speak, or your poll is going to be full of annoying questions designed to piss off the respondent. “Has George Bush stopped beating his wife? Yes / No” But then, unless you’ve got some selfless and dedicated managers who are actively trying towards nonpartisan results, you introduce a lot of bias in the design/editorial phase and the project dies horribly anyway. I submit that the Wikipedia model has shown that mass participation with a few editors is NOT a good model to follow on topics of political controversy.
    I hate to be a naysayer, but to do it right, you really DO need dedicated personnel, which means doing it for a living and charging people money.

  • Mavis Beacon

    Good idea. Here are some problems I foresee:
    1. True random sampling, already a problem for the big boys, will be nonexistent in this case. So it’s a poll of internet users who want their opinions voiced in “yes” or “no” form. Not useless, but hardly a real window into America.
    2. You want, “scientific sampling,” meaning that a webmaster or the program itself will divide things by demographic data. So participants will have to register and honestly provide their age, ethnicity, location, religion, gender, etc. That takes time and requires a level of honesty I’m not so sure exists on the net. (Who doesn’t live in Ohio in November 2008?)
    3. I can’t imagine, as a poll consumer, trusting it any more than an ABC online poll. It’d be an interesting quick measure of public opinion and useful on questions that don’t get asked, but when there’s a conflict between open source and Gallup, I’ll go Gallup every time.

  • Captain Mainline, what you seek already exists on Project Vote-Smart – it’s called the National Political Awareness Test or NPAT. It should come as no surprise that the majority of politicians do not complete it.

  • good idea.

  • garrett

    Let the participants decide what it means to be a conservative, or a liberal (and any other descriptive categories that you want), but don’t ask me for a self-description. On many issues I am conservative; on a few, I remain flaming liberal. Opinion will change over time as to who fits in what category. Include regular updates of category definitions.
    Transparency is essential. Include some open material about the polling methods. Questions should be as neutral as possible. The most destructive thing about modern polling is the way questions are phrased to elicit the desired response.
    I like the business applications. Support of entrepreneurs is badly needed in this country as industry goes away and corporations become less reliable employers.

  • I think it would be interesting to ask participants to self identify and to also include a series of questions that could “index” their political ideologies. This way you would know what they think they are and what they actually are. It could be part of a membership profile that participants could change if their views shifted.
    I think it would be very interesting to find out the difference between how people self identify and how their views define them.

  • Jeff,
    Poll analysis is my forte. The respondent who told you that your proposal cannot work was correct. You correctly identified that there must be scientific sampling for it to work. But your proposal is based on self-selection. Self-selection and scientific sampling are, by definition, mutually exclusive. It would be like saying that you want a red shirt that is 100% green.

  • Craig Walters

    Great idea.
    I would love to see polling without the media or political party bias.

  • Pollsters face the same problem that the Justice Department faces in addressing terrorism, where it is no longer tasked with the apprehension of criminals but with the prevention of crimes. Gathering the information needed to predict a crime effectively before the fact requires the invasion of privacy on a massive scale. That’s the essential, unacknowledged Catch-22 inherent in “scientific sampling.”
    Determining whether or not your sample is, in fact, representative of the whole, requires the actual accumulation of demographic data that statistical sampling is ostensibly designed to obviate. In other words, if you don’t know everything about everybody, you don’t really know enough to design a representative sample.
    There’s a double-whammy when you’re trying to sample opinion which you can neither compel nor confirm independently. It’s a moving target which may or may not conform to demographic data, as we know it, at any given point in time. You can never fit all the actual variables into a single package, which is where the controversial art of weighting comes in.
    My Step #1 is the reverse of yours: First, abandon the self-limiting concept of a “scientific” sample as your basis for extrapolation. It may well be part of the problem, not the solution. It could also be the very (self-imposed) constraint which closes the door you’re trying to open here.

  • david

    I founded a company in 1998 to build a system for finding voter information and creating a database of voters that could be randomly sorted for opinion polling. When the Internet bubble burst, we went down, but the software and concept could still work. We have been discussing resurrecting the idea. Jeff, if you’d like to email me, I can describe the idea in greater detail.

  • Gerry —
    In re: “Self selection and scientific sampling are, by definition, mutually exclusive.”
    And that’s a fundamental fly in the scientific sampling ointment. You may be able to control self-selection IN, but where opinion in particular is concerned, controlling for those who self-select OUT (e.g. hang-ups, cell-phones — on the obvious end of the opt out spectrum) is an entirely different proposition.

  • david

    You can handle the “opt-in” problem of self-selection by creating a polling sample that isn’t re-created every time you poll. In other words, a database large enough (that’s essentially what a phone book is to the phone polling approach) that you don’t need to worry about self-selected opinions for each poll. Then you have the problem of identifying the demographics of the population you select for the poll. That can be solved, but it’s tricky and expensive. Hint: geographic information system (GIS) software!

  • “but where opinion in particular is concerned, controlling for those who self-select OUT (e.g. hang-ups, cell-phones — on the obvious end of the opt out spectrum) is an entirely different proposition.”
    True dat. From a purely theoretical point of view, you are 100% correct.
    However, in practice to date it has not worked out that way. Despite the fact that the refusal rate has become quite high for traditional polling techniques (I think the typical hit rate is around 25 percent), validation studies and the actual results of elections compared to topline horserace numbers have shown that, in practice, the distortion injected by people self-selecting out (by refusing to participate, hanging up, etc.) after being selected at random (as sampling theory dictates) has yet to become a problem.
    The same is not true for purely self-selected samples. The distortion there is so high that it completely overwhelms the margin of error due to sampling.
    For Jeff’s idea to be implemented, it would require some takeoff on the model Harris uses for their online polling – which is similar to what Zogby has tried to do, but without the egregious conceptual errors of the latter. Basically, Harris collected email addresses from non-political sources for a significant amount of time. They did it from retailers, from contests, you name it. From anything but politics, or from recruiting drives. Then they pull random samples from there and contact them by email. They have managed to get things to work to some degree (they were accurate down to the state level in 2000; if they repeated the effort in 2004, they did not do so publicly). Zogby, on the other hand, built his registry of email addresses from his own website and from partnerships with other political websites – and as such his sample universe is disproportionately interested in politics, and as such not representational of the general public, and as such his “Interactive” polls were every bit as inaccurate as the extremely flawed Mitofsky/Edison exit polls. Heck, he even had Kerry ahead in Tennessee for a portion of the campaign.

  • Gerry–
    “Basically, Harris collected email addresses from non-political sources for a significant amount of time.”
    Let’s assume I stipulate that self-selected randomness is an oxymoron! In adjusting for opt-outs, you’re essentially back to my original point in an earlier comment:
    You have to start with sufficient data on the known universe of samplees, which includes enough info to identify and eliminate samplees who will distort your results (theoretical randomness bites the dust again, eh?). Even so, things may only “work to some degree,” which you happen to know in this case because you actually have the means (an election) for ascertaining accuracy. In addition, the veracity of the answers you get in pre-election polling may actually be less pivotal than the pattern or array of the answers you’re interpreting.
    Elections are a relatively simple either/or proposition compared to what Jeff is looking for. For example, we all know that the “values” vote went to Bush in the last election, but folks are still arguing about what that actually means. The problem with recent exit polling may, or may not, be entirely a matter of sampling errors.
    Tough to brainstorm something new and different when you (in the general sense) start out imposing the same old framework now in use. You’re just asking for better, cheaper (and maybe even bigger) polls so you can afford to ask your own questions. Of course, maybe that’s all Jeff really has in mind.

  • By same old framework in the above, I meant requiring a scientific sampling based solution, of course.

  • Todd David

    Why not “open source” all professional services?
    If you’re one of the “brilliant stats wonks,” why would you want to give away your services?
    Comrade Jarvis, methinks you’re agitating for a digital commune. Last I checked we were living in a capitalist economy.

  • David Bennett

    Since the sample is self selected you are not going to get a set of views that mirror the population.
    This isn’t necessarily bad, you get the input from people who care about the issue. Most important decisions are made by a few.
    One thing that interests me more than a static sample is the shifting and evolution of concepts. For example, you mention the red/blue divide: it’s real to many (in part because it’s become a mass truth); it reflects a number of important debates; it also goes away when problems are defined with different priorities and contexts. I would argue that this is why the comments section of blogs is so important. Especially systems like those of the Scoop engine (open source) that allow complex structuring of comments, independent addressing, and weighting, and can support things like polls.
    Personally I would find more value in an approximate system which gathered together a fairly complete record of blogs and other net interactive forums, had crude methods (XML tags, what little AI we have, polls) for gathering numbers, used a lot of human intervention to organize this data and spotted not only current opinions, but shifts in ideas as they evolve.
    Commercially these tools are valuable for companies. One of the things they need to do is put less money into surveys, have their executives scan the blogs and elsewhere for references, try to spot trends as they develop, and so on.
    There are so many variables that go into deciding opinions; for example, our politicians give greater weight to an originally composed letter delivered by the US postal service. Standard polls were crude approximations with supposedly solid numbers. With more accurate tools the numbers will be less solid, but the measurements will start to sort out the intricacies and the complex flow of “memes” (I use this word not because it has any specific meaning but because it gives a poetic sense of the complexity of idea transmission and effect).
    To do this kind of thing we will need people who use semi-rigorous methods to examine domains and subdomains (e.g., popular, expert, you name the category) of various media and show some integrity in putting together reports. They can also attempt polling by asking the various publishers to inform their members; almost everyone is eager for attention. The results will be imperfect, and obviously so, but I will argue far more accurate than a system which pretends mathematical rigor but maps so little of the complexity.
    Rewarding this kind of work is another problem. I would suggest a “nickel and dime” system such as the following:

  • Gerry

    “Tough to brainstorm something new and different when you (in the general sense) start out imposing the same old framework now in use.”
    Well, put it this way. The new and different things I saw suggested aren’t new, and one important way that they are different than existing polls is that they are much less accurate.

  • CiderJack

    I think it’s an excellent idea! Clearly the results wouldn’t (couldn’t?) by any means be “scientific,” but it would certainly stimulate discussion.
    Check out OK Cupid (yeah, yeah, a singles site). They allow users to write their own questions, and I believe they have cycled over 10,000 through their system in the last year alone. A very innovative approach that I believe could easily be modified to address what you are proposing here.
    (btw, this form mistakenly calls my website (www.freewebsDOTcom/fightterrornetwork) “questionable content,” fwiw)