EU’s AI Office holds promise for right holders to enforce copyright rules

Copyright holders are pushing the EU's new AI Office to take swift action to ensure that OpenAI's ChatGPT, Dall-E and other general-purpose AI systems respect EU copyright law when they scrape the Internet to train foundation models. Copyright is a major issue with the rise of generative AI, which trains on vast amounts of data — some of which is copyright protected. This tension has sparked lawsuits in the US and the UK.


19 February 2024
By Matthew Newman, Mike Swift and Jakub Krupa

Copyright holders are pushing the EU's new AI Office to take swift action to ensure that OpenAI's ChatGPT, Dall-E and other general-purpose AI systems respect EU copyright law when they scrape the Internet for vast amounts of content to train foundation models.

The EU’s AI Act, the world’s first comprehensive AI legislation, includes specific measures to ensure that foundation models respect the EU’s 2019 Copyright Directive.

Right holders scored a victory in December, when EU negotiators agreed to include obligations for foundation models to respect copyright law.

The European Commission’s 2021 proposal didn’t include copyright protection measures because the EU executive argued the Copyright Directive already covered the issue. However, European Parliament members insisted on adding them, particularly after ChatGPT's release in November 2022 made generative AI widely popular.

Copyright protection has emerged as a key issue around the world, with lawsuits pitting right holders against AI providers making front-page news, including The New York Times and other publishers’ fight with OpenAI in the US, and Getty Images’ legal tussle with Stability AI in the UK.

Policymakers are trying to balance conflicting interests: how to encourage AI developers and preserve creative industries. While politicians want to promote a technology that promises to boost productivity, attract massive investment and spawn new industries and jobs, they also want to avoid undermining the future of actors, writers, video creators, filmmakers and musicians.

AI Office

The EU’s AI Office, a new unit within the commission’s digital department, will start work on Feb. 21, with a staff of around 100. It will hit the ground running.

Its first urgent task is to draft a code of practice for general-purpose AI models. The code will lay out how providers of those models can comply with their specific obligations, which become applicable only a year after the AI Act formally comes into force — set to happen this summer.

Providers of foundation models must also draw up a detailed summary of the content used to train their general-purpose AI models, and make this summary publicly available. This was a key demand of right holders, who want as much detail as possible to enforce copyright law.

Copyright holders were delighted with the AI Act's transparency obligation, but they view it as only the first step in exercising their rights. They must know which material was used for foundation models' training, so regulators can check that it wasn't unlawfully exploited.

The AI Office’s work is vital for copyright holders because it’s responsible for developing a template for the detailed summary of content used to train foundation models.

Without that summary, the office won’t be able to verify whether AI providers have put in place policies to respect copyright holders’ decisions to reserve their rights, or whether there was any unlawful access to their material.

Text and data mining

The advent of generative AI, which uses machine-learning algorithms that analyze massive amounts of data, poses a particular challenge for copyright holders.

While the EU’s Copyright Directive contains specific measures that allow “text and data mining” — systems that identify patterns and associations from large sets of unrelated data — right holders want to make sure that the AI Act reinforces their right to be excluded from mining.

First, the AI Act requires providers of foundation models to put in place a policy to comply with EU copyright law.

Second, they must identify and respect any “reservation of rights” that copyright holders have invoked to exclude their material from scraping by AI systems.

Under Article 3 of the Copyright Directive, non-commercial “research organizations and cultural heritage institutions” can use copyright-protected content without authorization from right holders. The exception covers material that's been lawfully accessed and is meant to encourage scientific research.

However, under Article 4(3) of the directive, copyright holders can prevent the use of their material for commercial text and data mining by expressly reserving their rights through state-of-the-art technologies, such as machine-readable means.

While the AI Office isn’t specifically tasked with coming up with a standard for reserving these rights, copyright holders are pushing the EU to develop a commonly accepted standard, so they can efficiently prevent the scraping of text, images and videos for commercial use.
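No EU-wide standard has been mandated yet, but machine-readable reservations are already being tried in practice. As a purely illustrative sketch, and without suggesting that either mechanism has been found to satisfy Article 4(3), a publisher might today combine a robots.txt rule aimed at a known AI crawler with the tdm-reservation response header defined by the W3C community group's TDM Reservation Protocol (TDMRep):

    # robots.txt: ask OpenAI's GPTBot crawler not to collect the site's content
    User-agent: GPTBot
    Disallow: /

    # HTTP response header under the TDM Reservation Protocol (TDMRep): rights reserved
    tdm-reservation: 1

Whether signals like these count as the “machine-readable means” the Copyright Directive envisages is precisely the question a commonly accepted standard would settle.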

Enforcement

Among right holders' main victories in the AI Act was that AI systems must respect EU copyright law, no matter where their models have been trained. This “extra-territorial” aspect of the AI Act was essential so that AI companies wouldn’t race to countries with lower copyright standards to train their models and then make them available in the EU.

The AI Office will also be essential in enforcing the new rules, particularly with general-purpose models, the systems that could pose the biggest risks.

Right holders don’t believe that the text- and data-mining exceptions cover the scraping of copyright-protected images, text and videos to train commercial AI models.

That’s because, in their view, commercial foundation-model providers aren’t research organizations, and the copyright-protected material hasn’t been lawfully accessed.

“We simply need to get the EU AI Office to say what these companies are doing is clearly a breach of the basic framework of European copyright framework,” John Phelan, director general of the International Confederation of Music Publishers, said in an interview.

This means that copyright holders are counting on another key duty of the AI Office: its role in monitoring foundation models’ compliance with EU copyright law and the publication of their training data summary.

“Article 4 [of the EU Copyright Directive] says you can't text and data mine unless you have lawful access to the content,” Phelan said. “Google, Microsoft and OpenAI may have access to YouTube music, but they are expressly prohibited within our licensing agreements for further usage. They do not have the legal right to text and data mine in Europe for what they’re doing.”

The AI Act has teeth. The regulator can impose fines of up to 35 million euros ($37.7 million) or 7 percent of global sales, whichever is higher, if an organization violates the law's provisions.

US lawsuits

The enforcement of copyright law is front of mind in the US as well.

OpenAI, Google, Meta Platforms and Microsoft have faced a wave of lawsuits alleging copyright and privacy violations since the middle of 2023. These cases remain early in the litigation cycle.

The US Congress, however, doesn’t appear anywhere close to passing legislation that would mirror the EU’s AI Act.

For now, important cases have been filed in the Southern District of New York and the Northern District of California. On copyright claims, a few early decisions suggest that for copyright litigation to succeed, plaintiffs will need to show that the outputs — not just the training data — of an AI model contain copyrighted works.

Other suits pressing privacy claims against companies including Google, Microsoft and OpenAI have yet to face their first test in a motion to dismiss. But Google, Microsoft and OpenAI have filed motions to dismiss in recent days that are scheduled for oral argument in April and May, suggesting that US judges could hand down their first privacy rulings on AI systems by early summer.

“Plaintiffs imagine they have a property or privacy right in information they shared publicly on the Internet that entitles them to stop anyone from gathering and using such information in ways they don’t like, such as for generative AI,” Google argued in its motion to dismiss litigation filed in US District Court in San Francisco. “But outside copyright law (including its protection for fair use), there is no general right to control publicly available information.”

UK debate

The copyright debate is also causing friction in the UK, which is continuing its post-Brexit drive to avoid over-regulation and hopes to broker a voluntary agreement between AI developers and right holders.

The country has aligned itself more with the US's principles-based approach than with the EU's prescriptive rulebook. Ministers have rejected multiple calls for legislation and instead adopted a lax — or "pro-innovation," as they would argue — attitude to attract investment from Big Tech and emerging AI companies.

The UK government has struggled to resolve some of the tensions arising from this approach, despite experts warning that the rapid rise of AI left the copyright legal regime "in tatters" and in need of urgent legal clarity.

Earlier this month, ministers confirmed that months-long talks between developers and right holders to devise a voluntary code of conduct on copyright had collapsed as the two sides could not find a way forward.

The negotiations — hosted by the Intellectual Property Office, and bringing together the likes of DeepMind, Microsoft and Stability AI on one side, and the Alliance for IP, the Publishers Association and the Premier League on the other — turned sour, despite the government's attempts to raise pressure on them by threatening to intervene if no agreement were reached.

Key government ministers insisted they had an idea of how this issue could be resolved and would set out proposals soon, but "did not want to rush" in a bid to get the right balance.

But these delays only aggravate the existing problem, with lawmakers also becoming impatient and urging the government to make up its mind — and ideally, to side with the right holders.

Moreover, the UK’s pending general election puts a question mark over how any future government — likely to be formed by the current opposition party — will look at this issue, only adding to the uncertainty faced by all sides of the debate.

In the meantime, some clarity may come from Getty Images’ copyright lawsuit against Stability AI. In December, the High Court in London allowed the case to proceed to trial, making it the first of its kind in UK courts.
