Text and Data Mining: Hamburg Court Propagates Broad Understanding of Machine-Readability

1 October 2024 Alexander Tribess

It was the first AI training case ever brought to court in Germany, perhaps even in Europe! The Hamburg court has ruled that web scraping for AI training purposes falls within the scope of the copyright exception for text and data mining (Articles 3, 4 of Directive (EU) 2019/790). However, the AI industry is not out of the woods yet, as the court advocates a very broad understanding of the machine-readability of the copyright holder’s reservation of rights. Read more about the court’s ruling in our blog.

Background

In this case, a photographer sued a non-profit organization for infringement of his copyright in an image that was scraped from an online platform and then used to refine a massive dataset of images and associated text descriptions. This dataset was made available for free and used to train generative AI applications. The defendant invoked Articles 3 and 4 of Directive (EU) 2019/790 (implemented in Germany as §§ 60d and 44b of the German Copyright Act (UrhG), respectively). These provisions provide an exception to the rights of copyright holders when works are used for text and data mining. (For more information on the case and the legal background, see our previous blog post.)

The Court’s Ruling

In its judgment of 27 September 2024, the Hamburg court sided with the defendant and dismissed the claims. It held that the defendant must be considered a research organization and is therefore allowed to reproduce works to which it has lawful access in order to carry out text and data mining for the purposes of scientific research.

The court clarified that “scientific research” should not be interpreted narrowly. It is sufficient that the activity is aimed at eventually acquiring knowledge, even if the immediate step does not itself result in new knowledge. Thus, creating a dataset for AI training qualifies as scientific research.

Moreover, the court determined that the defendant’s activities constituted “scientific research” even though the dataset was used for commercial AI applications and at least some of the defendant’s chairs and active members were involved in the AI industry.

The Court Taking a Broader Look

Apart from these specifics of the case, the court took the opportunity to express its views on the narrower exception for text and data mining for non-scientific purposes (Article 4 of Directive (EU) 2019/790). In particular, it issued an obiter dictum regarding the right of the copyright holder to reserve the right to text and data mining for itself. Where works are publicly available online, such a reservation must be machine-readable.

Legal scholars and practitioners have debated this provision ever since its enactment. The majority tends towards a narrow interpretation, under which machine-readability requires an instruction that a computer can process directly (e.g. an entry in a robots.txt file). The Hamburg court, however, advocates a broader understanding and argues that mere license texts should suffice to satisfy the machine-readability requirement, even if those license texts do not explicitly mention text and data mining.
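To illustrate what the narrow interpretation has in mind, the following is a minimal sketch of how a crawler could evaluate a robots.txt instruction using Python's standard library. The robots.txt content, the crawler name `ExampleTdmBot` and the URL are hypothetical and chosen only for illustration. Whether such an instruction amounts to a valid reservation of rights under Article 4(3) is exactly the legal question under debate; the sketch only shows that a program can evaluate it without having to interpret natural language.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the site operator excludes one
# (made-up) text-and-data-mining crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTdmBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/images/photo.jpg"  # hypothetical URL
    print(may_fetch(ROBOTS_TXT, "ExampleTdmBot", url))
    print(may_fetch(ROBOTS_TXT, "SomeOtherBot", url))
```

A license text written in ordinary prose offers no comparable fixed syntax. A program would first have to interpret the wording, which is why the court's broad reading turns on tools capable of “understanding” such texts.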

Discussion of the Ruling’s Shortcomings

This view was not decisive in the present case. Still, it may motivate other courts to follow the Hamburg court’s example in future cases. In my personal opinion, however, the arguments presented by the Hamburg court are not convincing and rest on a false interpretation of the underlying EU law:

The court states that machine-readability must be interpreted in view of the technical means available at the time of the alleged infringement. Hence, machine-readability is a dynamic concept, and its prerequisites may change over time. This is a valid foundation for the interpretation of the law.

However, the court then refers to Article 53 para. 1 lit. c of the AI Act, which imposes an obligation on providers of general-purpose AI models to “put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790”. It concludes that “state-of-the-art technologies” would include AI tools that are capable of identifying and “understanding” license texts.

This assumption may be correct. However, it cannot serve as an argument in the context of Article 4 para. 3 of Directive (EU) 2019/790, for two reasons.

First, users of text and data mining techniques are not necessarily providers of general-purpose AI models. In the same judgment, the court explains that a clear distinction must be made between (a) the creation of a dataset that can also be used to train the AI, (b) the subsequent training of the artificial neural network with this dataset, and (c) the subsequent use of the trained AI to create new content. Unfortunately, the court conflates these three steps when, as an argument for step (a), it relies on provisions that are clearly designed for and directed at players at a later stage.

Secondly, the reference does not work because text and data mining is not necessarily related to general-purpose AI models but can serve as preparation for many other purposes. Article 53 para. 1 lit. c of the AI Act is a special provision with which the EU legislator intends to address the fact that text and data mining is the basis for most general-purpose AI models. The provision emphasizes that providers of such models must take special measures due to the inherent risk. Therefore, Article 53 para. 1 lit. c of the AI Act implies the exact opposite of what the Hamburg court found: state-of-the-art technologies are not the general benchmark for machine-readability within the meaning of Article 4 para. 3 of Directive (EU) 2019/790; they are the benchmark only for providers of general-purpose AI models.

Key Takeaways and Next Steps

  • In the first-ever German case dealing with the copyright exception for text and data mining in the field of AI, the Hamburg court has rejected the claim.
  • The court held that the defendant was a research organization and that it had the right to use the plaintiff’s work without seeking consent.
  • As an obiter dictum, the court also expressed its views on the machine-readability of the rightsholder’s reservation of text and data mining, advocating a broad understanding of the term.
  • The court’s main argument, though, is not convincing because the court conflates the different steps in the AI training process and misapplies EU law.

The plaintiff has already announced that he is considering an appeal, in which case the Hamburg Higher Regional Court may have to weigh the facts and arguments again.