Downstream Implications of AI Authorship | The Columbia Journal of Law & the Arts

Christopher TenEyck

The past few years have seen rapid growth in the popularity of artificial intelligence programs designed to mimic human writing. These programs are increasingly effective. In one recent study, participants were asked to distinguish between human-authored and AI-authored articles. Even after being told explicitly that half of the articles in front of them were written by AI, participants correctly identified the AI-written articles only 52% of the time, little better than blind guessing.[1] The use of language-based AI models is not limited to populating websites with skimmable articles, however. Authors are increasingly using AI to help produce and hone long-form written work. While most authors who use AI employ it as a brainstorming aid or to produce a high volume of text that the author can then edit and refine, recent months have also seen the proliferation of books written exclusively by AI programs.[2]

Recent court decisions concerning AI have provided some clarity about how courts will view AI authorship going forward. For example, a federal court recently determined that an AI program could not be listed as an inventor on a patent application, despite playing a crucial role in formulating the invention.[3] Similarly, both the U.S. Patent and Trademark Office and the Copyright Office have repeatedly denied applications for AI-generated work because the programs are not human, and copyright protection in particular requires human authorship. Yet while a robust conversation exists in legal circles about AI authorship and intellectual property protections, there has been surprisingly little discussion of the serious downstream policy implications of the current jurisprudence on AI authorship. In short, the current system may deepen the divide between well-established authors, who can easily train AI to write in their style, and less-established authors, who will have a harder time eliciting helpful responses from the same AI programs and will therefore struggle to match an AI-assisted author’s volume.

To understand this phenomenon, one must first understand the basics of how autoregressive language-based AI models operate. It may be helpful to think of these programs as the next step in a very familiar technology: anyone with a cell phone is familiar with the predictive texting function that helps users compose their messages. This predictive function works by analyzing the text the user has already entered and anticipating what the next word is most likely to be. Autoregressive AI programs work in essentially the same way. These programs analyze an input (for example, a prompt that the user enters) and produce an output. The output depends on the training that the AI has received. Training an AI model involves giving the model tokens (small pieces of text) and allowing the program to search the tokens for patterns. Eventually, a well-trained model can effectively predict the words that follow a given prompt, and can even capture the individual style of a specific author. For example, a prompt on an autoregressive AI model like GPT-3 could read: “Write an essay, in the style of David Foster Wallace, about a couple breaking up as they walk through Riverside Park.” To accomplish this, the program will generate an output based on the style of David Foster Wallace that it has recognized in the text on which it was trained. In fact, an open-source model would probably do well with this prompt. David Foster Wallace was an incredibly prolific writer during his lifetime and produced enough written work to easily train an AI program on his style and linguistic patterns. However, a college student attempting to write her debut novel will receive no comparable help from a similar program; at the very least, any output that she receives will be generic and will lack her signature literary style.
This hypothetical author has two choices: hastily produce a large body of work with which to train an AI program, or accept that AI will not be as helpful to her as it will be to a more established author with a large oeuvre.
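
The predictive mechanism described above can be illustrated with a deliberately simplified sketch. This is not how GPT-3 or any production model is built: those systems use neural networks trained on subword tokens, whereas the toy below only counts which word most often follows another. The function names and the training sentence are hypothetical, invented for this illustration. The sketch shows two things: the autoregressive loop, in which each predicted word is fed back in as the next input, and the way the program has nothing to offer for words that never appeared in its training text.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, prompt, max_new_words=5):
    """Autoregressive loop: each predicted word is appended and becomes the next input."""
    output = prompt.lower().split()
    for _ in range(max_new_words):
        next_word = predict_next(model, output[-1])
        if next_word is None:
            break  # nothing in the training text followed this word
        output.append(next_word)
    return " ".join(output)


# Hypothetical training text, invented for this illustration.
corpus = "the park was quiet and the park was cold and the river was quiet"
model = train_bigram_model(corpus)

print(generate(model, "the", max_new_words=3))  # the park was quiet
print(predict_next(model, "novel"))             # None
```

The second call returns nothing because the word "novel" never appears in the training text. In this toy, as in the hypothetical above, the quality of the prediction depends entirely on how much relevant writing the program has already seen.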

Outcomes like the above hypothetical will create a tiered system of authorship where prolific and established writers become more and more prolific and established, and authors who have not yet created a large body of work or achieved commercial success may struggle to break into a market that is already skewed in favor of previously published authors. This reality is also troubling for readers. A market dominated by an upper class of writers receiving heavy, meaningful assistance from AI could drown out new voices with fresh perspectives and lead to int

[1] Tom B. Brown et al., Language Models are Few-Shot Learners, Advances in Neural Info. Processing Sys. 1, 26 (July 22, 2020), https://arxiv.org/pdf/2005.14165.pdf [https://perma.cc/S7P4-EZCQ] [https://web.archive.org/web/20220921182200/https://arxiv.org/pdf/2005.14165.pdf].

[2] Erik Hoel, I Got an Artificial Intelligence to Write My Novel, Electric Literature (July 10, 2021), https://electricliterature.com/i-got-an-artificial-intelligence-to-write-my-novel/ [https://perma.cc/M7PF-8ALT] [https://web.archive.org/web/20220921182200/https%3A%2F%2Felectricliterature.com%2Fi-got-an-artificial-intelligence-to-write-my-novel%2F].

[3] See Thaler v. Vidal, 43 F.4th 1207, 1212 (Fed. Cir. 2022).