Fair(ly) Use(less)? A New Approach to Assess Copyright Infringement in Machine Learning

Jake Glendenning

Though once only familiar to those in the tech industry, the phrase “machine learning” is now increasingly appearing in new and unexpected places. Those interested in art may find the term familiar, since machine learning is now often used to create new artworks replicating the styles of famous human artists.[1] The same is true for the finance world, as some analysts predict that machine learning could contribute up to $15.7 trillion to the global economy by 2030.[2] A similar transformation is taking place in the transportation,[3] medicine,[4] and insurance industries.[5]

At its core, “machine learning” is a process “that uses statistics to find patterns.”[6] To use machine learning effectively, one must rely on immense quantities of data.[7] Generally, such data is referred to in the context of “data sets.”[8] These data sets contain training data, which are used to create a computational model.[9] The model is then used to generalize to new data so that it can perform tasks such as creation, classification, or prediction.[10] For example, if someone wants to train a model to identify ducks, they could use many images of ducks as their training data so their model could “learn” how to identify them.[11] Such images and other training data, like songs or written works, are often copyrighted.[12] However, though machine learning methods involve copying images, songs, and written works, there have been surprisingly few lawsuits, so far, claiming copyright infringement.[13]
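To make the duck example concrete, the short Python sketch below walks through the same steps: training data goes in, a model comes out, and the model is then applied to new data. It is a toy illustration under stated assumptions, not how any production system works. The two-number "features" standing in for images, the labels, and the function names train and classify are all hypothetical.

```python
# Illustrative only: made-up two-number "features" stand in for real images.
from statistics import mean

# Hypothetical training data: (features, label) pairs.
training_data = [
    ((0.9, 0.8), "duck"),
    ((0.8, 0.9), "duck"),
    ((0.2, 0.1), "not duck"),
    ((0.1, 0.3), "not duck"),
]


def train(examples):
    """Build a model: the average feature vector for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(model, features):
    """Return the label whose average is closest to the new example."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_data)
print(classify(model, (0.85, 0.75)))  # a new example the model has not seen
```

In this sketch, every training example must be read in order to compute the averages, but the resulting model stores only those averages rather than the examples themselves. Real systems are far more complex, and whether they retain anything recognizable from their training data varies from system to system.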

The Fair Use Doctrine

To explain why there have not yet been many high-profile copyright cases involving machine learning, experts have pointed to the “fair use” doctrine. As one scholar put it, fair use has led to a situation where “quietly, invisibly, almost by accident, copyright has concluded that reading by robots doesn’t count.”[14] The author attributes this phenomenon to a historically human-centric understanding of copyright, in which tests like “substantial similarity,” for infringement purposes, turn on readers' perceptions of works rather than on anything inhering in the works themselves.[15] In the words of Judge Learned Hand, a defendant's work infringes on the plaintiff's if “the ordinary observer, unless he set out to detect the disparities, would be disposed to overlook them, and regard their aesthetic appeal as the same.”[16] Over time, this human-centric understanding of copyright law has manifested in judicial opinions holding that mass copying by computers is transformative fair use.[17] To illustrate this point, the author identifies two commonalities in many recent copyright cases. First, courts often find fair use when copying is not done for the purpose of using copyrighted material for its expressive content, but to “extract some unprotected, functional, non-expressive information contained within [the works].”[18] Second, courts often consider a use fair when copying is done en masse by machines.[19]

This period of relative quiet may be coming to an end, however. Another scholar recently analyzed two copyright cases and predicted an incoming “loud” and “visibl[e]” pushback “against the copyright system’s permissive attitude toward machine copying.”[20] The author observed that Associated Press v. Meltwater U.S. Holdings and Fox News v. TVEyes both dealt with a party that bulk copied content to make it searchable. In both cases, however, the courts rejected the argument that bulk copying was fair use, unlike courts in similar preceding cases.

These scholars, among several others, have focused on copyright’s “fair use” doctrine to determine whether it might protect machine learning.[21] Although much of the literature on machine learning and copyright has focused on fair use, surprisingly little research addresses whether copies created in the process of machine learning are infringing copies in the first place.

Infringement

Whether such copies constitute infringement appears to be the subject of a circuit split.[22] The Seventh,[23] Ninth,[24] and D.C.[25] Circuits have held that merely downloading an unauthorized reproduction onto a computer creates an infringing copy.[26] Conversely, the Second[27] and Fourth[28] Circuits have held that more than mere downloading is required to constitute an infringing copy.[29] In particular, the Ninth Circuit’s decision in MAI Systems and the Second Circuit’s decision in Cartoon Network seem to present the clearest disagreement.[30] Neither provides a perfectly clear answer when applied to machine learning, however. The decision in Cartoon Network critiques a common interpretation of MAI Systems: that any copy is infringing, even if it is intermediate and quickly deleted after use.[31] It proceeds to offer an interpretation of the MAI Systems decision that does not conflict with its own ruling, but that is inconsistent with many lower courts’ interpretations of MAI Systems, before rendering a decision that scholars believe is more consistent with the Copyright Act.[32] Complicating matters, some argue that Congress amended the Copyright Act to endorse MAI Systems,[33] and that Cartoon Network fails to provide a clear rule that courts can apply to future cases. Therefore, it is still unclear which case best reflects the law, and which best accounts for the complexities of machine learning.

Looking Forward

The question of whether copies made during the machine learning process infringe will likely emerge soon, as courts uphold fair use defenses less often and as machine learning is used more widely. Courts would do well to apply the logic of Cartoon Network to questions of whether machine learning creates infringing copies. They should do so because Cartoon Network reads the Copyright Act in a way that is more consistent with the Act’s legislative history by finding that copies made for a transitory duration, like those made during the process of machine learning, are not “fixed.”[34] Therefore, such copies do not infringe merely because they are copied to a hard drive for a short time.[35] Further, though some argue that Congress adopted the rationale of MAI Systems into the Copyright Act, such an argument ignores the relevant legislative history in which Congress declined to endorse the MAI Systems interpretation.[36]

Courts can either adopt the rationale of Cartoon Network and continue to take machine learning questions on a case-by-case basis, or they can use the question as an opportunity to draw clearer contours around the “transitory duration” requirement. One example of how to draw such contours is provided in an article in which the author argues for an interpretation of Cartoon Network that focuses on the typical characteristics of a given medium to determine whether a work within that medium exists for more than a transitory period.[37] This approach would also apply qualitative questions, such as whether a work is created automatically in a machine’s operation and deleted automatically after use. In any case, by choosing to apply Cartoon Network, and thereby limiting MAI Systems, the legal field can keep up with the many industries incorporating “machine learning” into their lexicons, and help the law adapt alongside technology.

------------------------------------------------------

[1] See, e.g., The Next Rembrandt, https://www.nextrembrandt.com (last visited Nov. 3, 2020) (describing a project to use machine learning to create new paintings in the style of Rembrandt); Will Knight, This AI-generated Musak Shows Us the Limit of Artificial Creativity, MIT TECH. REV. (Apr. 26, 2019), https://www.technologyreview.com/s/613430/this-ai-generated-musak-shows-us-the-limit-of-artificial-creativity/ (describing a neural network that can reproduce music in the style of famous artists and composers).

[2] See Sizing the prize: PwC’s Global Artificial Intelligence Study: Exploiting the AI Revolution, PwC Global, https://www.pwc.com/gx/en/issues/data-and-analytics/publications/artificial-intelligence-study.html.

[3] See Carolyn Said, Yes, you’re seeing more robot cars in San Francisco. Here’s why self-driving is picking up, SF Chronicle (Oct. 24, 2020), https://www.sfchronicle.com/business/article/Self-driving-cars-in-San-Francisco-Cruise-15671419.php.

[4] See Nicola Davis, AI Equal with Human Experts in Medical Diagnosis, Study Finds, THE GUARDIAN (Sept. 24, 2019), https://www.theguardian.com/technology/2019/sep/24/ai-equal-with-human-experts-in-medical-diagnosis-study-finds.

[5] See Jason Pontin, How AI-Driven Insurance Could Reduce Gun Violence, WIRED (Feb. 27, 2018), https://www.wired.com/story/how-ai-driven-insurance-could-reduce-gun-violence/.

[6] Karen Hao, What Is Machine Learning?, MIT TECH. REV. (Nov. 17, 2018), https://www.technologyreview.com/2018/11/17/103781/what-is-machine-learning-we-drew-you-another-flowchart/.

[7] See TRAINING AND TEST SETS: SPLITTING DATA, GOOGLE, https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data.

[8] Id.

[9] Id.

[10] Ayush Pant, Introduction to Machine Learning for Beginners, TOWARDS DATA SCIENCE (Jan. 7, 2019), https://towardsdatascience.com/introduction-to-machine-learning-for-beginners-eed6024fdb08.

[11] Id.

[12] See 17 U.S.C. § 101 (defining “copies” as “material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device”).

[13] James Grimmelmann, Copyright for Literate Robots, 101 IOWA L. REV. 657, 658 (2016).

[14] Id.

[15] Id. at 658.

[16] Id. (citing Peter Pan Fabrics, Inc. v. Martin Weiner Corp., 274 F.2d 487, 489 (2d Cir. 1960)).

[17] Id. at 660.

[18] Id. at 662.

[19] Id.

[20] Mark A. Lemley & Bryan Casey, Fair Learning (Jan. 30, 2020), available at SSRN: https://ssrn.com/abstract=3528447 or http://dx.doi.org/10.2139/ssrn.3528447 (citing Fox News Network, LLC v. TVEyes, Inc., 883 F.3d 169 (2d Cir. 2018); Associated Press v. Meltwater U.S. Holdings, Inc., 931 F. Supp. 2d 537, 543-44 (S.D.N.Y. 2013)).

[21] See 17 U.S.C. § 107.

[22] Jessica L. Gillotte, Copyright Infringement in AI-Generated Artworks, 53 U.C. DAVIS L. REV. 2655, 2672-73 (2020).

[23] See NLFC, Inc. v. Devcom Mid-America, Inc., 45 F.3d 231, 235 (7th Cir. 1995).

[24] See MAI Sys. Corp. v. Peak Comput., Inc., 991 F.2d 511, 518-19 (9th Cir. 1993).

[25] See Stenograph L.L.C. v. Bossard Associates, Inc.

[26] Jessica L. Gillotte, Copyright Infringement in AI-Generated Artworks, 53 U.C. DAVIS L. REV. 2655, 2672-73 (2020).

[27] See Cartoon Network v. CSC Holdings, 536 F.3d 121, 127 (2d Cir. 2008).

[28] See CoStar Grp., Inc. v. LoopNet, Inc., 373 F.3d 544, 551 (4th Cir. 2004).

[29] Jessica L. Gillotte, Copyright Infringement in AI-Generated Artworks, 53 U.C. DAVIS L. REV. 2655, 2673 (2020).

[30] See Aaron Perzanowski, Fixing Ram Copies, 104 Nw. U.L. Rev. 1067 (2010).

[31] Cartoon Network v. CSC Holdings, 536 F.3d 121, 128 (2d Cir. 2008).

[32] Aaron Perzanowski, Fixing Ram Copies, 104 Nw. U.L. Rev. 1067 (2010).

[33] See Jane C. Ginsburg, Copyright Legislation for the “Digital Millennium”, 23 Colum. J.L. & Arts 137, 141 n.14 (1999) (suggesting that § 117(c) endorses the MAI Systems RAM holding).

[34] Aaron Perzanowski, Fixing Ram Copies, 104 Nw. U.L. Rev. 1067, 1076-77 (2010).

[35] Id. at 1077.

[36] Jonathan Band & Jeny Marcinko, A New Perspective on Temporary Copies: The Fourth Circuit's Opinion in Costar v. Loopnet, 2005 Stan. Tech. L. Rev. 1, 18 (noting that Congress declined to endorse the decisions that “determined a RAM copy was a copy for copyright purposes” but “simply acknowledged that the courts had so found”).

[37] Id.