A few unpopular opinions about AI

In a conversation with Jason Howell for his upcoming AI podcast on the TWiT network, I came to wonder whether ChatGPT and large language models might give all of artificial intelligence cultural cooties, for the technology is being misused by companies and miscast by media such that the public may come to wonder whether they can ever trust the output of a machine. That is the disaster scenario the AI boys do not account for.

While AI’s boys are busy thumping their chests about their power to annihilate humanity, if they are not careful — and they are not — generative AI could come to be distrusted for misleading users (the companies’ fault more than the machine’s); filling our already messy information ecosystem with the data equivalent of Styrofoam peanuts and junk mail; making news worse; making customer service even worse; making education worse; threatening jobs; and hurting the environment. What’s not to dislike?

Below I will share my likely unpopular opinions about large language models — how they should not be used in search or news, how building effective guardrails is improbable, how we already have enough fucking content in the world. But first, a few caveats:

I do see limited potential uses for synthetic text and generative AI. Watch this excellent talk by Emily Bender, one of the authors of the seminal Stochastic Parrots paper and a leading critic of AI hype, suggesting criteria for acceptable applications: cases where language form and fluency matter but facts do not (e.g., foreign language instruction), where bias can be filtered, and where originality is not required.

Here I explored the idea that large language models could help extend literacy for those who are intimidated by writing and thus excluded from discourse. I am impressed with Google’s NotebookLM (which I’ve seen thanks to Steven Johnson, its editorial director), as an augmentative tool designed not to create content but to help writers organize research and enter into dialog with text (a possible new model for interaction with news, by the way). Gutenberg can be blamed for giving birth to the drudgery of bureaucracy and perhaps LLMs can save us some of the grind of responding to it.

I value much of what machine learning makes possible today — in, for example, Google’s Search, Translate, Maps, Assistant, and autocomplete. I am a defender of the internet (subject of my next book) and, yes, social media. Yet I am cautious about this latest AI flavor of the month, not because generative AI itself is dangerous but because the uses to which it is being put are stupid and its current proprietors are worrisome.

So here are a few of my unpopular opinions about large language models like ChatGPT:

It is irresponsible to use generative AI models as presently constituted in search or anywhere users are conditioned to expect facts and truthful responses. Presented with the empty box on Bing’s or Google’s search engines, one expects at least a credible list of sites relevant to one’s query, or a direct response based on a trusted source: Wikipedia or services providing the weather, stock prices, or sports scores. To have an LLM generate a response — knowing full well that the program has no understanding of fact — is simply wrong.

No news organization should use generative AI to write news stories, except in very circumscribed circumstances. For years now, wire services have used artificial intelligence software to generate simple news stories from limited, verified, and highly structured data — finance, sports, weather — and that works because of the strictly bounded arena in which such programs work. Using LLMs trained on the entire web to generate news stories from the ether is irresponsible, for it only predicts words, it cannot discern facts, and it reflects biases. I endorse experimenting with AI to augment journalists’ work, organizing information or analyzing data. Otherwise, stay away.

The last thing the world needs is more content. This, too, we can blame on Gutenberg (and I do, in The Gutenberg Parenthesis), for printing brought about the commodification of conversation and creativity as a product we call content. Journalists and other writers came to believe that their value resides entirely in content, rather than in the higher, human concepts of service and relationships. So my industry, at its most industrial, thinks its mission is to extrude ever more content. The business model encourages that: more stuff to fill more pages to get more clicks and more attention and a few more ad pennies. And now comes AI, able to manufacture no end of stuff. No. Tell the machine to STFU.

There will be no way to build foolproof guardrails against people making AI do bad things. We regularly see news articles reporting that an LLM lied about — even libeled — someone. First note well that LLMs do not lie or hallucinate because they have no conception of truth or meaning. Thus they can be made to say anything about anyone. The only limit on such behavior is the developers’ ability to predict and forbid everything bad that anyone could do with the software. (See, for example, how ChatGPT at first refused to go where The New York Times’ Kevin Roose wanted it to go and even scolded him for trying to draw out its dark side. But Roose persevered and led it astray anyway.) No policy, no statute, no regulation, no code can prevent this. So what do we do? We try to hold accountable the user who gets the machine to say bad shit and then spread it, just as we would if you printed out nasty shit on your HP printer and posted it around the neighborhood. Not much else we can do.

AI will not ruin democracy. We see regular alarms that AI will produce so much disinformation that democracy is in peril — see a recent warning from Jon Naughton of The Guardian that “a tsunami of AI misinformation will shape next year’s knife-edge elections.” But hold on. First, we already have more than enough misinformation; who’s to say that any more will make a difference? Second, research finds again and again that online disinformation played a small role in the 2016 election. We have bigger problems to address about the willful credulity of those who want to signal their hatreds with misinformation and we should not let tropes of techno moral panic distract us from that greater peril.

Perhaps LLMs should have been introduced as fiction machines. ChatGPT is a nice parlor trick, no doubt. It can make shit up. It can sound like us. Cool. If that entertaining power were used to write short stories or songs or poems and if it were clearly understood that the machine could do little else, I’m not sure we’d be in our current dither about AI. Problem is, as any novelist or songwriter or poet can tell you, there’s little money in creativity anymore. That wouldn’t attract billions in venture capital and the stratospheric valuations that go with it whenever AI is associated with internet search, media, and McKinsey finding a new way to kill jobs. As with so much else today, the problem isn’t with the tool or the user but with capitalism. (To those who would correct me and say it’s late-stage capitalism, I respond: How can you be so sure it is in its last stages?)

Training artificial intelligence models on existing content could be considered fair use. Their output is generally transformative. If that is true, then training machines on content would not be a violation of copyright or theft. It will take years for courts to adjudicate the implications of generative AI on outmoded copyright doctrine and law. As Harvard Law Professor Lawrence Lessig famously said, fair use is the right to hire an attorney. Media moguls are rushing to do just that, hiring lawyers to force AI companies to pay for the right to use news content to train their machines — just as the publishers paid lobbyists to get legislators to pass laws to get search engines and social media platforms to pay to link to news content. (See how well that’s working out in Canada.) I am no lawyer but I believe training machines on any content that is lawfully acquired so it can be inspired to produce new content is not a violation of copyright. Note my italics.

Machines should have the same right to learn as humans; to say otherwise is to set a dangerous precedent for humans. If we say that a machine is not allowed to learn, to read, to extract knowledge from existing content and adapt it to other uses, then I fear it would not be a long leap to declare what we as humans are not allowed to read, see, or know some things. This puts us in the odd position of having to defend the machine’s rights so as to protect our own.

Stopping large language models from having access to quality content will make them even worse. Same problem we have in our democracy: Pay walls restrict quality information to the already rich and powerful, leaving the field — whether that is news or democracy or machine learning — free to bad actors and their disinformation.

Does the product of the machine deserve copyright protection? I’m not sure. A federal court just upheld the US Copyright Office’s refusal to grant copyright protection to the product of AI. I’m just as happy as the next copyright revolutionary to see the old doctrine fenced in for the sake of a larger commons. But the agency’s ruling was limited to content generated solely by the machine and in most cases (in fact, all cases) people are involved. So I’m not sure where we will end up. The bottom line is that we need a wholesale reconsideration of copyright (which I also address in The Gutenberg Parenthesis). Odds of that happening? About as high as the odds that AI will destroy mankind.

The most dangerous prospect arising from the current generation of AI is not the technology, but the philosophy espoused by some of its technologists. I won’t venture deep down this rat hole now, but the faux philosophies espoused by many of the AI boys — in the acronym of Émile Torres and Timnit Gebru, TESCREAL, or longtermism for short — is noxious and frightening, serving as self-justification for their wealth and power. Their philosophizing might add up to a glib freshman’s essay on utilitarianism if it did not also border on eugenics and if these boys did not have the wealth and power they wield. See Torres’ excellent reporting on TESCREAL here. Media should be paying attention to this angle instead of acting as the boys’ fawning stenographers. They must bring the voices of responsible scholars — from many fields, including the humanities — into the discussion. And government should encourage truly open-source development and investment to bring on competitors that can keep these boys, more than their machines, in check.