Vanderbilt Journal of Entertainment & Technology Law

First Page: 323

Abstract

Pointing to Authors Guild, Inc. v. Google Inc., Authors Guild, Inc. v. HathiTrust, Sega Enterprises Ltd. v. Accolade, Inc., and other leading technology-driven fair use precedents, artificial intelligence (AI) companies and those who advocate for their interests claim that mass unauthorized reproduction of books, music, photographs, visual art, news articles, and other copyrighted works to train generative AI systems is a fair use of those works. Though acknowledging that works are copied without permission for the training process, proponents of fair use maintain that an AI machine learns only uncopyrightable information about the works during that process. Once trained, they say, the model does not incorporate or make use of the content of the training works. As such, they contend, copying for the purposes of AI training is a fair use under US law.

This Article challenges the above narrative by examining generative AI training and functionality. Despite the widespread use of anthropomorphic terms to describe their behavior, AI machines do not learn or reason as humans do. Instead, they employ an algorithmic process to store the works they are fed during training. They do not “know” anything independently of the works on which they are trained, so their output is a function of the copied materials.

More specifically, large language models (LLMs) are trained by breaking textual works down into small segments, or “tokens” (typically individual words or parts of words), and converting the tokens into vectors—numerical representations of the tokens and where they appear in relation to other tokens in the text. The training works do not vanish, as suggested, but instead are encoded, token by token, into the model and relied upon to generate output. AI image generators are trained somewhat differently through a “diffusion” process in which they learn to reconstruct particular training images in conjunction with associated descriptive text. Like an LLM, however, an AI image generator relies on encoded representations of training works to generate its output.
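The tokenization-and-vector process the abstract describes can be sketched in a few lines of Python. This is a toy illustration only, with a hypothetical five-word vocabulary and randomly initialized vectors; real LLMs learn subword vocabularies of tens of thousands of tokens, and the vectors are learned during training rather than random:

```python
import numpy as np

# Hypothetical toy vocabulary mapping each token (here, a whole word)
# to an integer ID. Real models use learned subword vocabularies.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

# One vector per token. Randomly initialized here for illustration;
# in a real model these values are adjusted during training.
embedding_dim = 8
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(vocab), embedding_dim))

def tokenize(text):
    """Break text into token IDs (naive whitespace split for this sketch)."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

def encode(text):
    """Convert each token ID to its vector: a (num_tokens, dim) array."""
    ids = tokenize(text)
    return embeddings[ids]

vectors = encode("the cat sat on the mat")
print(vectors.shape)  # six tokens, each represented as an 8-dimensional vector
```

The sketch shows only the representational step at issue in the abstract: the text does not disappear, but is re-expressed, token by token, as numerical vectors inside the model.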

Included in: Law Commons
