Gibberish from the machine

I’m honored that Germany’s Stern asked me to write about AI and journalism for a 75th anniversary edition. Here’s a version prior to final editing and trimming for print and translation. And I learned a new word: Kauderwelsch (“the variety of Romansch spoken in the Swiss town of Chur (Kauder) in canton Graubünden”) means gibberish.

We have Gutenberg to blame. It is because of his invention, print, that society came to think of public discourse, creativity, and news as “content,” a commodity to fill the products we call publications or lately websites. Journalists believe that their value resides primarily in making content. To fill the internet’s insatiable maw, reporters at some online sites are given content quotas, and their news organizations no longer appoint editors-in-chief but instead “chief content officers.” For the record, Stern still has actual editors, many of them.

And now here comes a machine — generative artificial intelligence or large language models (LLMs), such as ChatGPT — that can create no end of content: text that sounds just like us because it has been trained on all our words. An LLM maps the trillions of relationships among billions of words, turning them and their connections into numbers a computer can calculate. LLMs have no understanding of the words, no conception of truth. They are programmed only to predict the next most likely word to occur in a sentence.

A New York lawyer named Steven Schwartz had to learn his lesson about ChatGPT’s factual fallibility the hard way. In a now-infamous case, attorney Schwartz asked ChatGPT for precedents in a lawsuit involving an errant airline snack cart and his client’s allegedly injured knee. Schwartz needed to find cases relating to highly technical issues of international treaties and bankruptcy. ChatGPT dutifully delivered more than a half-dozen citations.

As soon as Schwartz’s firm filed the resulting legal brief in federal court, opposing counsel said they could not find the cases, and the judge, P. Kevin Castel, directed the lawyers to produce them. Schwartz returned to ChatGPT. The machine is programmed to tell us what we want to hear, so when Schwartz asked whether the cases were real, ChatGPT said they were. Schwartz then asked ChatGPT to show him the complete cases; it did, and he sent them to the court. The judge called them “gibberish” and ordered Schwartz and his colleagues into court to explain why they should not be sanctioned. I was there, along with many more journalists, to witness the humbling of the attorneys at the hands of technology and the media.

“The world now knows about the dangers of ChatGPT,” the lawyers’ lawyer told the judge. “The court has done its job warning the public of these risks.” Judge Castel interrupted: “I did not set out to do that.” The problem here was not with the technology but with the lawyers who used it, who failed to heed warnings about the dubious citations, who failed to use other tools — even Google — to verify them, and who failed to serve their clients. The lawyers’ lawyer said Schwartz “was playing with live ammo. He didn’t know because technology lied to him.”

But ChatGPT did not lie because, again, it has no conception of truth. Nor did it “hallucinate,” in the description of its creators. It simply predicted strings of words, which sounded right but were not. The judge fined the lawyers $5,000 each and acknowledged that they had suffered humiliation enough in news coverage of their predicament.

Herein lies a cautionary tale for news organizations that are rushing to have large language models write stories — because they want to be cool and trendy, or save work, or perhaps to eliminate jobs, and manufacture ever more content. The news companies CNET and G/O Media have gotten into hot water for using AI to produce content that turned out to be less than factual. America’s largest newspaper chain, Gannett, just turned off artificial intelligence that was producing embarrassing sports stories that would call a football game “a close encounter of the athletic kind.” I have heard online editors plead that they are in a war to produce more and more content to attract more likes and clicks so they may earn more digital advertising pennies. Their problem is that they think their mission is only to make content.

My advice to editors and publishers is to steer clear of large language models for writing the news, except in well-proven use cases, such as turning highly structured financial reports into basic news stories, which must be checked before release. I would give the same advice to Microsoft and Google about connecting LLMs with their search engines. Fact-free gibberish coming out of the machine could ruin the authority and credibility of both news and technology companies — and affect the reputation of artificial intelligence overall.

There are good uses for AI. I benefit from it every day in, for example, Google Translate, Maps, Assistant, and autocomplete. As for large language models, they could be useful to augment — not replace — journalists’ work. I recently tested a new Google tool called NotebookLM, which can take a folder filled with a journalist’s research and summarize it, organize it, and allow the writer to ask questions of it. LLMs could also be used in, for example, language education, where what matters is fluency, not facts. My international students use these programs to smooth out their English for school and work. I even believe LLMs could be used to extend literacy, to help people who are intimidated by writing to communicate more effectively and tell their own stories.

Ah, but therein lies the rub for writers, like me. We believe we are special, that we hold a skill — a talent for writing — that few others can boast. We are storytellers and wield the power to tell others’ tales, to decide what tales are told, who shall be heard in them, and how they will begin and neatly end. We think that gives us the ability to explain the world in what journalists like to call the first draft of history — the news.

Now writers and journalists see both the internet and AI as competition. The internet enables the silent mass of citizens who were not heard in media to at last have their say — and to create a lot of content. And by producing credible prose in seconds, AI devalues writing and robs writers of their special status.

This is one reason why I believe we see hostile coverage of technology in media these days. News organizations and their proprietors claim that Google, Facebook, et al. steal away audience, attention, and advertising money (as if God granted publishers those assets in perpetuity). Journalists are engaged in their latest moral panic — another in a long line of panics over movies, television, comic books, rock lyrics, and video games. They warn about the dangers of the internet, social media, our phones, and now AI, claiming that these technologies will make us stupid, addict us, take away our jobs, and destroy democracy under a deluge of disinformation.

They should calm down. A 2020 study found that in the US no age group “spent more than an average of a minute a day engaging with fake news, nor did it occupy more than 0.2% of their overall media consumption.” The issue for democracy isn’t so much disinformation as the willingness — the eagerness — of some citizens to believe lies that stoke their own fears and hatreds. Journalism should be reporting on the roots of bigotry and extremism rather than simplistically blaming technology.

In my book, The Gutenberg Parenthesis, I track society’s entry into the age of print as we now leave it for the digital age that follows. Print’s development as an institution of authority took time. Not until fifty years after Gutenberg’s Bible, around 1500, did the book take the shape we know today, with titles, title pages, and page numbers. It took another century, a few years either side of 1600, before the technology and its technologists — printers — faded into the background, making way for tremendous innovation with print: the birth of the modern novel with Cervantes, the essay with Montaigne, and the newspaper. A business model for print did not arrive until one century more, in 1710, with the advent of copyright. Come the 1800s, the technology of print — which had hardly changed since Gutenberg — evolved at last with the arrival of steam-powered presses and typesetting machines, leading to the birth of mass media. The twentieth century brought print’s first competitors, radio and television. And here we are today, just over a quarter century past the introduction of the commercial web browser. This is to say that we are likely at just the beginning of a long transition into the digital age. It is only 1480 in Gutenberg years.

In the beginning, rumor was trusted more than print because any anonymous printer could produce a book or pamphlet — just as anyone today can make a website or tweet. In 1470 — only fifteen years after Gutenberg’s Bible came off the press — Latin scholar Niccolò Perotti made what is said to be the first call for censorship of print. Offended by a bad translation of Pliny, he wrote to the Pope demanding that a censor be assigned to approve all text before it came off the press. As I thought about this, I realized Perotti was not seeking censorship. Instead, he was anticipating the establishment of the institutions of editing and publishing, which would assure quality and authority in print for centuries.

Like Perotti in his day, media and politicians today demand that something must be done about harmful content online. Governments — like editors and publishers — cannot cope with the scale of speech now, so they deputize platforms to police and censor all that is said online. It is an impossible task.

Journalists must be careful using AI to produce the news. At the same time, there is a danger in demonizing the technology. In the best case, the rise of AI might force journalists to examine their role in society, to ask how they improve public discourse. The internet provides them with many new ways to connect with communities, to build relationships of trust and authority with them, to listen to their needs, to discover and share voices too long not heard in the public sphere, to expand the work of journalism past publishing to the wider canvas of the internet.

Journalists think their content is what makes them valuable, and so publishers and their lawyers and lobbyists are threatening to sue AI companies, dreaming of huge payments for machines that read their content. That is no strategy for the future of journalism. Neither is Axel Springer’s plan to replace journalists in content factories with AI. That is not where the value of journalism lies. It lies with reporting on and serving communities. Like Niccolò Perotti, we should anticipate the creation of new services to help internet users cope with the abundance of content today, to verify the truth and falsity of what we see online, to assess authority, to discover more diverse voices, to nurture new talent, to recommend content that is worth our time and attention. Could such a service be the basis of a new journalism for the online, AI age?

Copyright and AI and journalism

The US Copyright Office just put out a call for comment on copyright and artificial intelligence. It is a thoughtful document based on listening sessions already held, with thirty-four questions on rights regarding inclusion in learning sets, transparency, the copyrightability of generative AI’s output, and use of likeness. Some of the questions — for example, on whether legislation should require assent or licensing — frighten me, for reasons I set forth in my comments, which I offer to the Office in the context of journalism and its history:

I am a journalist and journalism professor at the City University of New York. I write — speaking for myself — in reply to the Copyright Office’s queries regarding AI, to bring one perspective from my field, as well as the context of history. I will warn that precedents set in regulating this technology could impinge on freedom of expression and quality of information for all. I also will share a proposal for an updated framework for copyright that I call creditright, which I developed in a project with the World Economic Forum at Davos.

First, some context from present practice and history in journalism. It is ironic that newspaper publishers would decry AI reading and learning from their text when journalists themselves read, learn from, rewrite, and repurpose each other’s work in their publications every day. They do the same with sources and experts, without remuneration and often without credit. This is the time-honored tradition in the field.

The 1792 US Post Office Act provided for newspapers to send copies to each other for free for the express purpose of allowing them to copy each other, creating a de facto network of news in the new nation. In fact, many newspapers employed “scissors editors” — their actual job title — to cut out stories to reprint. As I recount in my book, The Gutenberg Parenthesis: The Age of Print and Its Lessons for the Age of the Internet (Bloomsbury Academic, 2023, 217), the only thing that would irritate publishers was if they were not credited.

As the Office well knows, the Copyright Act of 1790 covered only books, charts, and maps, and not newspapers or magazines. Not until 1909 did copyright law include newspapers, but even then, according to Will Slauter in Who Owns the News?: A History of Copyright (Stanford University Press, 2019), there was debate as to whether news articles, as opposed to literary features, were to be protected, for they were often anonymous, the product of business interest more than authorship. Thus the definition of authorship — whether by person, publication, or now machine — remains unsettled.

As to Question 1, regarding the benefits and risks of this technology (in the context of news), I have warned editors away from using generative AI to produce news stories. I covered the show-cause hearing for the attorney who infamously asked ChatGPT for citations for a federal court filing. I use that tale as an object lesson for news organizations (and search platforms) to keep large language models far away from any use involving the expectation of facts and credibility. However, I do see many uses for AI in journalism and I worry that the larger technological field of artificial intelligence and machine learning could be swept up in regulation because of the misuse, misrepresentation, factual fallibility, and falling reputation of generative AI specifically.

AI is invaluable in translation, allowing both journalists and users to read news around the world. I have tested Google’s upcoming product, NotebookLM; augmentative tools such as this, used to summarize and organize a writer’s research, could be quite useful in improving journalists’ work. In discussing the tool, the project’s editorial director, author Steven Johnson, and I saw another powerful use and possible business model for news: allowing readers to query and enter into dialogue with a publisher’s content. Finally, I have speculated that generative AI could extend literacy, helping those who are intimidated by the act of writing to tell — and illustrate — their own stories.

In reviewing media coverage of AI, I ask you to keep in mind that journalists and publishers see the internet and now artificial intelligence as competition. In an upcoming book, I assert that media are embroiled in a full-fledged moral panic over these technologies. The arrival of a machine that can produce no end of fluent prose commodifies the content media produce and robs writers of our special status. This is why I teach that journalists must understand that their value is not resident in the commodity they produce, content, but instead in qualities of authority, credibility, independence, service, and empathy.

As for Question 8 on fair use, I am no lawyer, but it is hard to see how reading and learning from text and images to produce transformative works would not be fair use. I worry that if these activities — indeed, these rights — are restricted for the machine as an agent for users, precedent is set that could restrict use for us all. As a journalist, I fear that by restricting learning sets to viewing only free content, we will end up with a problem parallel to that created by the widespread use of paywalls in news: authoritative, fact-based reporting will be restricted to the privileged few who can and choose to pay for it, leaving too much of public discourse vulnerable to the misinformation, disinformation, and conspiracies available for free, without restriction.

I see another potential use for large language models: to provide researchers and scholars with a window on the presumptions, biases, myths, and misapprehensions reflected in the relationships of all the words analyzed by them — the words of those who had the power and privilege of publishing them. To restrict access skews that vision and potentially harms scholarly uses that have not yet been imagined.

The speculation in Question 9, about requiring affirmative permission for any copyrighted material to be used in training AI models, and in Question 10, regarding collective management organizations or legislatively establishing a compulsory licensing scheme, frightens me. AI companies already offer a voluntary opt-out mechanism, in the model of robots.txt. As media report, many news organizations are availing themselves of that option. To legally require opt-in or licensing sets up unimaginable complications.

Such complication raises the barrier to entry for new and open-source competitors and the spectre of regulatory capture — as does discussion in the EU of restricting open-source AI models (Question 25.1). The best response to the rising power of the already-huge incumbent companies involved in AI is to open the door — not close it — to new competition and open development.

As for Questions 18–21 on copyrightability, I would suggest a different framework for considering both the input and output of generative AI: as an intellectual, cultural, and informational commons, whose use and benefits we cannot predict. Shouldn’t policy encourage at least a period of development, research, and experimentation?

Finally, permit me to propose another framework for consideration of copyright in this new age in which connected technologies enable collaborative creation and communal distribution. In 2012, I led a series of discussions with multiple stakeholders — media executives, creative artists, policymakers — for a project with the World Economic Forum in Davos on rethinking intellectual property and the support of creativity in the digital age. In the safe space of the mountains, even entertainment executives would concede that copyright law could be considered outmoded and is due for reconsideration. The WEF report is available here.

Out of that work, I conceived of a framework I call “creditright,” which I write about in Geeks Bearing Gifts (CUNY Journalism Press, 2014) and in The Gutenberg Parenthesis (221–2): “This is not the right to copy text but the right to receive credit for contributions to a chain of collaborative inspiration, creation, and recommendation of creative work. Creditright would permit the behaviors we want to encourage to be recognized and rewarded. Those behaviors might include inspiring a work, creating that work, remixing it, collaborating in it, performing it, promoting it. The rewards might be payment or merely credit as its own reward. I didn’t mention blockchain; but the technology and its automated contracts could be useful to record credit and trigger rewards.” I do not pretend that this is a fully thought-through solution, only one idea to spark discussion on alternatives for copyright.

The idea of creditright has some bearing on your Questions 15–17 on transparency and recordkeeping — what might ledgers of credit in creation look like? — though I am trying to make a larger argument about the underpinnings of copyright. As I have come to learn, 1710’s Statute of Anne was not formulated at the urging of — or to protect the rights of — authors, so much as it was in response to the demands of publishers and booksellers, to create a marketplace for creativity as a tradable asset. Said historian Peter Baldwin in The Copyright Wars: Three Centuries of Trans-Atlantic Battle (Princeton University Press, 2016, 53–6): “The booksellers claimed to be supporting authors’ just and natural right to property. But in fact their aim was to take for themselves what nature had supposedly granted their clients.”

I write in my book that the metaphor of creativity as property — of art as artifact rather than an act — “might be appropriate for land, buildings, ships, and tangible possessions, but is it for such intangibles as creativity, inspiration, information, education, and art? Especially once electronics — from broadcast to digital — eliminated the scarcity of the printed page or the theater seat, one need ask whether property is still a valid metaphor for such a nonrivalrous good as culture.”

Around the world, copyright law and doctrine are being mangled to suit the protectionist ends of those lobbying on behalf of incumbent publishers and producers, who remain flummoxed by the challenges and opportunities of technology, of both the internet and now artificial intelligence. In the context of journalism and news, Germany’s Leistungsschutzrecht or ancillary copyright law, Spain’s recently superseded link tax, Australia’s News Media Bargaining Code, the proposed Journalism Competition and Preservation Act in the US, and lately Canada’s C-18 Online News Act do nothing to protect the public’s interest in informed discourse and, in Canada’s case, will end up harming news consumers, journalists, and platforms alike as Facebook and Google are forced to take down links to news.

I urge the Copyright Office to continue its process of study as exemplified by this request for comments and not to rush into the frenzied discussion in media over artificial intelligence, large language models, and generative AI. It is too soon. Too little is known. Too much is at stake.