While many governments around the world are still working out how to regulate AI, the challenges raised by its use are intensifying in the real world. As generative artificial intelligence (AI) systems that use language models to create art, music, and other content are developed and disseminated, concerns about copyright protection grow.
The first legal cases pitting copyright against artificial intelligence emerged about a year ago; since then, the number of such disputes has multiplied. In the United States, plaintiffs ranging from individuals to major companies are contesting the use of their works by AI companies to train their models1.
In France, the cultural sector shares the same concerns. On November 17, 80 organizations from the audiovisual, publishing, music, visual arts, and photography sectors submitted a text to the French government requesting transparency regarding both the training data and the content generated by generative AI models, considering it an absolute imperative for the development of an ethical AI.
It appears that the European Parliament and the Council have responded to their demands, as the provisional agreement on the AI Act they reached last December 9th2 upholds transparency obligations for general-purpose AI systems, including drawing up technical documentation, complying with EU copyright law, and publishing detailed summaries of the content used for training.
Nevertheless, achieving transparency in AI systems is one of the main unanswered questions at the intersection of intellectual property (IP) rights and AI. In this context, we wonder whether transparency in AI, as requested by the authors and imposed by the European Parliament, can actually be achieved. If feasible, would such transparency suffice to ensure the advancement of an innovative AI while also safeguarding copyright and other IP rights?
Generative AI systems are large, pre-trained models serving as a starting point for further application in varied domains. Due to their nature, they rely on extensive datasets for training, which often contain great amounts of copyrighted content. The way these models function is precisely what lies at the core of the conflict between authors and AI developers, and it also forms the basis of the French authors’ recent claims.
In this context, in the text recently forwarded to the French government, the authors consider that “only the inviolable principle of transparency on both training data and generated content will be able to provide citizens and creators with guarantees that their rights will be respected”. The text also expresses the authors’ “astonishment” at France’s purported recent position regarding the regulation of foundation models.
Amid the final “trilogue” negotiations to approve the AI Act’s final version, France, Germany, and Italy reportedly reached an agreement supporting “mandatory self-regulation through codes of conduct”, according to a non-official document that Reuters states it has seen. The document reportedly also outlines that developers of foundation models would need to define model cards providing information about their machine learning models. In addition, an AI governance body would assist in developing guidelines and check the application of model cards, with no sanctions imposed initially, the document reportedly suggests.
The three countries have indeed engaged in extensive discussions to enhance their cooperation in the AI domain and have “agreed on the need to reduce administrative burdens and simplify European procedures for projects involving several member states”3. However, the specific conditions of such cooperation have not been confirmed. Additionally, while the position adopted by the French, German, and Italian ministers may suggest that they would not favor imposing transparency obligations relating to copyright, that position was not incorporated into the provisional agreement on the AI Act recently settled by the European Parliament and the Council.
In this sense, the provisional agreement maintains the conditions of the AI Act’s text voted in June 2023 by the European Parliament, which requires AI providers not only to disclose whether content has been artificially generated or manipulated (under Article 52.3) but also to specify whether the data used to train the AI system is protected by copyright. Generative AI providers are also required to publish a sufficiently detailed summary of the uses made of works protected by copyright under Article 28 ter4,5.
As AI progresses, its intersection with IP, in particular copyright, becomes increasingly evident. While the relationship between IP and AI should be reciprocal, their overlap can be conflictual, notably concerning the use of copyright-protected content for AI training and the associated lack of transparency.
In this sense, even though the use of copyrighted material in the training of AI systems can fall within the scope of the mandatory exceptions created by the European lawmakers through Articles 3 and 4 of Directive 2019/7906, identifying that material, and its creators or rightsholders, is currently a real challenge. Although the legislator has granted rightsholders a right of opt-out, exercising it may prove very difficult in the AI context.
The core issue lies in both the unavoidable use of copyrighted data in the training of AI systems and the current substantial obstacles to tracking it, as generative AI models usually require huge and varied amounts of data scraped from the entire web to generate new content7.
While the volume and variety of data used as input obscure its origin, the very nature of copyright also contributes to the difficulty of fulfilling the authors’ transparency requests, as well as of complying with the transparency obligation imposed by Article 28 ter of the European Parliament’s text of the AI Act.
Firstly, as copyright protection may cover a wide array of content, such transparency conditions would create substantial administrative complexities for foundation model providers, as an enormous amount of content would have to be documented and disclosed. Moreover, as the criteria for determining whether copyright protection applies are subjective, foundation model providers may not be in the best position to assess whether the content used to train their models is protected by copyright.
Thus, the feasibility of disclosing training data, whether for technological or legal reasons, remains uncertain. As a result, there may be no practical way for a copyright owner to verify whether their works have been used in training data for commercial purposes, or whether their opposition to commercial data training has been effective. This uncertainty calls into question the effectiveness of imposing transparency obligations on AI developers.
It appears that both the transparency request made by French authors and the transparency obligations imposed by the European Parliament and Council may be based on a flawed understanding of how AI systems function. As these measures are currently very difficult to comply with, there is a concern that they may hinder AI innovation in the EU. That is why more guidance on their implementation is much needed, particularly concerning the interpretation of the concept of a “sufficiently detailed summary”.
As with any disruptive technology, it is natural that the development of AI evokes both apprehension and high expectations, which is why the concerns expressed by the authors are understandable. However, AI should not be vilified; rather, it should be seen and used as a tool for fostering our creativity.
In this sense, to favor the interests of authors and copyright holders, a viable solution may be found in establishing agreements between AI developers, collective management societies, cultural and entertainment companies, and the authors or copyright owners themselves – similar to the recent collective agreement signed by the Writers Guild of America (WGA) and Hollywood producers securing screenwriters’ rights in the face of generative AI use in audiovisual productions8.
These agreements would regulate the use of AI in the creation of cultural products and should provide for fair remuneration for the authors and/or copyright owners, according to the reality and specific needs of each cultural sector. We believe that such measures can foster innovation while encouraging authors to continue creating, thus promoting a harmonious coexistence of creation and innovation.
1 For example, in October 2023, Universal Music, ABKCO, and Concord Publishing filed a lawsuit in Tennessee federal court against the artificial intelligence company Anthropic, accusing it of misusing a vast number of copyrighted song lyrics to train its chatbot Claude. In September 2023, the Authors Guild (a group representing a large number of authors in the U.S.) and authors including George R.R. Martin, whose novels the TV show “Game of Thrones” is based on, and John Grisham filed a lawsuit against OpenAI in the Southern District of New York claiming copyright infringement for the unauthorized use of their works to train the language models behind its chatbot ChatGPT.
2 After months of negotiations among the European Commission, the EU Council, and the European Parliament (the “trilogue”), a provisional agreement on the AI Act was finally reached last Saturday, December 9th, 2023. For further information about the AI Act, please check our previous article: Artificial Intelligence: Joe Biden’s executive order vs. the European AI Act – Common Goals?
3 As stated in the joint press release from France, Germany, and Italy of October 30, 2023.
4 We clarify that we refer to the articles of the text approved in June by the European Parliament. Please note that the text agreed upon by the Parliament and the Council still needs to be formally adopted by both Parliament and Council to become an EU Regulation.
5 Please see "File Regulation on AI" and "MEPs ready to negotiate first-ever rules for safe and transparent AI".
6 Articles 3 and 4 of Directive 2019/790 of the European Parliament and Council on copyright and related rights in the digital single market (adopted on April 17, 2019) created an exception allowing the reproduction of content protected by copyright for “text and data mining” purposes, i.e., the collection of data in order to transform it. Article 3, already existing in French law, is a mandatory academic exception, benefiting research bodies and cultural heritage institutions conducting mining for scientific research purposes, against which rightsholders cannot object. Article 4, transposed into French law through Ordonnance No. 2021-1518 of November 24, 2021, extends the exception to all uses, regardless of purpose (including commercial), provided that the copyright holder has not expressed opposition.
7 Except for narrow, specialized systems, whose scope is limited and whose training relies on specific datasets.
8 On September 27, 2023, the WGA concluded a preliminary agreement with the producers of the Alliance of Motion Picture and Television Producers (AMPTP) following a 148-day strike. This accord includes key commitments from the producers: ensuring that screenwriters are not replaced by AI; permitting writers to use AI with employer approval without diminishing their final compensation; prohibiting AI developers from using scripts authored by unionized writers to train their models; and scheduling biannual meetings with the WGA to discuss AI utilization in film development and production. A parallel agreement was also reached with actors in early November 2023, specifically aimed at preventing the use of their images for training AI systems.