Copyright Infringement by AI Training? Key Takeaways from the Hamburg Court Hearing

It is probably the first case of its kind ever brought to court in Germany, perhaps even in the EU: A photographer is suing the developer of a huge AI training dataset for infringing his copyright in an image crawled from the Internet. This post outlines the underlying facts and summarizes the preliminary and non-binding views of the judges, who prepared the case very thoroughly and provided a detailed analysis of the most important legal aspects.

The case:

The defendant is a non-profit organization that offers a free dataset to commercial and non-commercial licensees. The dataset is based on an earlier Common Crawl dataset consisting of more than five billion online photo URLs and corresponding descriptions. The defendant refined this dataset by downloading all of these photos from their online sources and verifying the accuracy of the descriptions. The refined dataset does not contain the photos themselves, only the links to them together with the validated descriptions. The downloaded copies were deleted after the descriptions had been validated.
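The workflow described above can be pictured with a minimal sketch. It is not the defendant's actual code: the names `Entry`, `refine`, `download` and `matches` are hypothetical, the download and validation steps are replaced by stand-ins, and dropping entries whose description fails validation is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Entry:
    """One dataset record: a link to a photo plus its description (no image data)."""
    url: str
    description: str


def refine(
    entries: Iterable[Entry],
    download: Callable[[str], bytes],      # hypothetical: fetches the photo at a URL
    matches: Callable[[bytes, str], bool], # hypothetical: checks description vs. image
) -> Iterator[Entry]:
    """Yield only URL/description pairs; the downloaded copy is never stored."""
    for entry in entries:
        image_bytes = download(entry.url)  # temporary reproduction of the photo
        keep = matches(image_bytes, entry.description)
        del image_bytes                    # copy discarded once validation is done
        if keep:                           # assumption: unvalidated entries are dropped
            yield entry


# Stand-ins so the sketch runs without network access.
fake_web = {
    "https://stock.example/a.jpg": b"cat",
    "https://stock.example/b.jpg": b"dog",
}
raw = [
    Entry("https://stock.example/a.jpg", "a photo of a cat"),
    Entry("https://stock.example/b.jpg", "a photo of a cat"),
]
refined = list(
    refine(raw, fake_web.__getitem__, lambda img, text: img.decode() in text)
)
```

The point of the sketch is only that the output consists of links and descriptions, while the image copy exists just for the duration of the validation step.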

The plaintiff found out that one of his photos was part of the dataset. The relevant URL led to a stock photo platform where the image was freely available, but only in a watermarked version. The website of this stock photo platform contained license terms that included a very broad prohibition of any use of the photos except under a license granted by the platform owner; there was no explicit reference to text and data mining.

The plaintiff claims that the defendant’s dataset is likely to be used for commercial GenAI applications that will compete with him in creating images. Therefore, he sued the defendant and demanded that the defendant cease and desist from using his photo (or its URL) in the dataset.

The law:

Works such as photographs are protected by copyright law. Use that is neither permitted by law nor covered by a license may constitute copyright infringement. However, German and EU copyright laws provide for an exception to copyright protection if works are used for the purposes of text and data mining. In a nutshell, the exception works as follows: Unless the rightholder has expressly reserved its rights by adding a so-called “usage reservation”, everyone is allowed to make reproductions of lawfully accessible works for text and data mining purposes. However, these reproductions must be deleted when they cease to be necessary for the mining process. For works available online, the rightholder’s usage reservation is effective only if it is presented in a machine-readable format. In addition, there is a broader right to make reproductions of lawfully accessible works for text and data mining that serves (mainly non-profit) scientific research. Under this broader right, the text and data mining exception applies even if the rightholder has expressly reserved the reproduction right for text and data mining purposes.

The legal aspects and the preliminary views of the court:

The case focuses on the question of whether the reproduction of the photo for the purpose of refining and validating the defendant’s dataset was legal under German copyright law, which in this respect is based on EU directives and in particular the 2019 DSM Directive (Directive (EU) 2019/790).

The core of the legal discussion is the so-called text and data mining exception, which is defined in Arts. 3 and 4 of the DSM Directive (Sections 44b, 60d of the German Copyright Act). In this respect, the preliminary views of the Court are as follows:

– Works do not require a license from the copyright holder to be “lawfully accessible”; a watermarked representation of a photo or thumbnail is lawfully accessible on the Internet.

– Crawling the Internet for the purpose of creating a corpus for an AI model is an “automated analytical technique aimed at analyzing text and data” and serves the purpose of generating “correlations” in the present case; however, it may be worth discussing whether the legislature had AI training purposes in mind when it introduced the text and data mining exception in 2019.

– Art. 5 para. 5 of the InfoSoc Directive, which protects the rightholder against applications of copyright exceptions that “unreasonably prejudice the legitimate interests of the rightholder”, cannot be invoked against preparatory acts where the later use of an AI model is not in the hands of the person performing the crawl.

The Court leans towards an interpretation of the law that would in principle allow crawling for AI training purposes. However, it stated that it will focus on whether the license terms are to be understood as an express reservation of rights in machine-readable form. This issue will be critical to the Court’s decision. The Court and the parties weighed the arguments for both a broad understanding of machine-readability (covering any text made available on a website) and a narrower, more technical understanding (pointing to robots.txt entries as an example).
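To illustrate what the narrower, technical understanding has in mind, here is a minimal sketch using Python's standard `urllib.robotparser` module. The crawler name `example-ai-crawler`, the URL and both sample texts are made up for illustration and are not taken from the case; the sketch says nothing about how the Court will decide.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt content the way a standard parser would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one crawler and admits everyone else.
ROBOTS_TXT = """\
User-agent: example-ai-crawler
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical natural-language terms, as they might appear on a website.
PROSE_TERMS = "Any use of the photos is prohibited without a license."

URL = "https://stock.example/photos/123.jpg"

blocked_crawler = crawl_allowed(ROBOTS_TXT, "example-ai-crawler", URL)
other_visitor = crawl_allowed(ROBOTS_TXT, "some-browser", URL)
prose_only = crawl_allowed(PROSE_TERMS, "example-ai-crawler", URL)
```

Here `blocked_crawler` is `False` and `other_visitor` is `True`, because the parser understands the structured directives. `prose_only` is `True`: this parser finds no directive in the natural-language sentence and therefore sees no restriction. Whether such text nevertheless counts as “machine-readable” in the legal sense is exactly the question the Court has to answer.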

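If the Court were to find that the license terms constitute an effective reservation of rights, the general text and data mining exception would no longer help the defendant. The case would then shift to the question of whether the defendant is a research organization and could therefore rely on the broader text and data mining rights for scientific research.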

In either case, the decision may have a huge impact on the interpretation and scope of the text and data mining exception in EU copyright law. The Court’s decision may not only influence the legal debate in Germany and abroad, but is also likely to lay the foundation for the interplay between creators’ rights and the evolving AI industry. The Court also indicated that it will not itself refer the case to the European Court of Justice and will leave that decision to the Higher Regional Court or the German Federal Court of Justice.

The Court will deliver its judgment or decide on the next steps on September 27. We will keep you informed!