Generative AI Training and Copyright: U.S. Copyright Office’s Pre-Publication Report

Summary

The U.S. Copyright Office released the third part of its AI and Copyright series. This part deals with the use of copyrighted content to train generative AI models. This post provides a simple overview of what this pre-publication report covers.

“A model is not a magical portal that pulls fresh information from some parallel universe into our own.”

— A. Feder Cooper & James Grimmelmann, The Files are in the Computer: Copyright, Memorization and Generative AI at 23-24

Background

The U.S. Copyright Office has been examining the intersection of copyright law and artificial intelligence (AI) through a three-part report series titled Copyright and Artificial Intelligence. This initiative, launched in early 2023, aims to address the legal and policy challenges posed by AI technologies. Each part of the report focuses on a different aspect of AI’s impact on copyright law. The parts have been divided as follows:

Part 1: Digital Replicas (published on July 31, 2024): This Part addressed the rise of digital replicas, realistic but false depictions of individuals created through AI, commonly known as deepfakes. This part may be accessed at: Part 1

Part 2: Copyrightability (published January 29, 2025): This Part addressed the copyrightability of works generated using generative AI. It reaffirmed that human authorship is a fundamental requirement for copyright protection under U.S. law. This part may be accessed at: Part 2

Part 3: Generative AI Training (pre-publication version released on May 9, 2025): This Part addresses the use of copyrighted materials to train generative AI models.

Overview of Part 3: Generative AI Training

The third and final Part, released in pre-publication form on May 9, 2025, examines the legal implications of using copyrighted materials to train generative AI models. The Report has been divided into six sections, as follows:

Introduction

This section sets out what the report is about and why the Copyright Office is looking into how AI and copyright law interact. It mentions the growing concerns from the public, ongoing court cases, and interest from lawmakers, and gives an overview of what the report will cover.

Technical Background

This section explains in simple terms how generative AI systems work, starting with basic machine learning concepts and continuing through model training and deployment. It covers how these systems are built and trained on large amounts of data, notes that the training data often includes copyrighted material, and describes how the systems are used in practice.

Prima Facie Infringement

This section looks at which steps in the AI development process might infringe copyright. It focuses on the reproduction and use of copyrighted works during data collection, training, retrieval-augmented generation (RAG), and the creation of outputs, and explains how these steps might implicate the copyright owner's exclusive rights, especially the rights to reproduce or adapt the works.

Fair Use

This is an important section of the report. It explores whether the use of copyrighted material in AI training might be permitted under the fair use doctrine. It goes through the four factors the law considers and outlines the arguments both for and against fair use in this context. The four factors are:

    1. The purpose and character of the use – Is the material being used for something new, like education, commentary, or research? Is the use transformative (i.e. does it add something new or change the original)? Also, is the use commercial or non-commercial?
    2. The nature of the copyrighted work – Is the original work more creative or expressive (like novels, music, or visual art) or more factual or functional (like news articles, computer code, or technical writing)?
    3. The amount and substantiality of the portion used – How much of the original work is being used, and is it the “heart” or most important part of the work? Other considerations include how much of each work is used, whether that amount is reasonable in light of the purpose of the use, and how much is made accessible to the public.
    4. The effect on the market for the original work – Does the use harm the market for the original work? The enquiry must take account not only of harm to the original but also of harm to the market for derivative works.

Licensing for AI Training

This part discusses ways that creators might give permission (or licences) for their work to be used in AI training. It looks at voluntary licensing, including its feasibility, potential for fair compensation, and legal barriers to collective licensing. It also examines statutory approaches such as compulsory licensing, extended collective licensing, and opt-out mechanisms. The section concludes with an analysis and recommendations on how these models could be implemented to support both innovation and copyright protection.

Conclusion

The final section sums up the main points. It highlights the open issues and stresses the need for further monitoring, clearer laws, and possible new policies as AI technology continues to develop.

Disclaimer

This version of Part 3 comes with the following disclaimer:

“The Office is releasing this pre-publication version of Part 3 in response to congressional inquiries and expressions of interest from stakeholders. A final version will be published in the near future, without any substantive changes expected in the analysis or conclusions.”

You can access the US Copyright Office’s pre-publication report here: Copyright and Artificial Intelligence – Part 3: Generative AI Training (Pre-publication).

Accessibility Review: Ms. Benita Alphonsa Basil