Red Tape and Reluctance: The case of Europe’s AI lag

18 Oct 2024  –  Written by Federico Dante De Falco

On 1 August 2024, the new European Artificial Intelligence Act (AI Act) entered into force. With this legislation, the European Union (EU) and its Member States are among the first jurisdictions to horizontally regulate so-called ‘high-risk’ AI systems (e.g. those in education and employment), as well as ‘General Purpose AI’ such as large language models used for Generative AI applications.  

Despite its regulatory efforts, Europe seems to lack a market for AI and GenAI development. The US and China lead with $248.9 billion and $95.1 billion respectively in total private AI investment between 2013 and 2022; the first European countries on the list are Germany and France, with $7 billion and $6.6 billion respectively (Stanford AI Index Report 2023). Consequently, major commercial GenAI models such as GPT-4, Gemini, Claude, and Llama are all developed outside of Europe – with a few notable exceptions such as Mistral AI.

This article briefly explores three aspects of Europe’s delay in AI development and how the EU’s regulatory and policy initiatives are likely to influence this situation.


I. AI Infrastructure: Semiconductors and cloud

The AI supply chain is increasingly complex and multi-layered, extending well beyond the model developers themselves. The OECD identified four macro-layers in the AI chain:

  1. Design of AI chips and processors, such as GPUs;
  2. Fabrication and packaging of these chips;
  3. Computing power providers – or cloud computing companies – that integrate chips;
  4. AI developers.

While a lack of consistent European presence can be observed across all four levels of the AI supply chain (save for exceptions such as the Dutch chip-equipment maker ASML), it is with cloud companies and data centres that Europe’s lag becomes most evident.

Lehdonvirta, Wu and Hawkins (2024) observed a ‘global compute divide’: the US and China are home to most of the regions hosting cloud services with advanced AI capabilities, while the EU-27 Member States trail behind. This can be attributed to various factors, from path dependency arising from established industrial clusters (especially in the US) to technology-specific drivers such as the impracticality of duplicating investment in data centres.

The scarcity of AI-capable public cloud computing in Europe has one immediately observable effect from a policy and regulatory standpoint: it narrows the options available to EU and national legislators for fostering AI development. Public funding, tax incentives, and R&D support remain viable policy choices, but fewer levers are available for strict regulatory frameworks – which is arguably what the EU legislators had in mind when drafting the AI Act.

For instance, using compute providers as ‘intermediate regulators’ – to oversee AI risks, prevent bad actors from training models, and impose certain standards on AI developers – would be impractical in Europe, given the EU’s limited territorial jurisdiction over major cloud providers and the international location of data centres.

The result of this limited leverage has been the EU legislators’ reliance on a ‘product safety’ approach, under which AI systems are treated like any other product to be made available on the EU market and subjected to a set of obligations (e.g. conformity assessment, CE marking) – in the hope of triggering a ‘Brussels effect’. Whether this materialises remains to be seen, but it can be argued that a Brussels effect of the AI Act is unlikely, given the Act’s reliance on the EU’s domestic product safety framework and the lack of international consensus around policy options.


II. The use of personal data for training

Large language models used to power GenAI applications are trained on vast amounts of data – broadly speaking, the larger the training dataset, the more capable the AI model. The question of where and how to retrieve training data is therefore crucial for AI development.

As training data can consist of web text, images, videos, or scientific articles, it is estimated that – at current rates of scaling of AI models – developers could run into data scarcity by around 2030.

While the data privacy aspect of AI training is complex and multi-faceted, one issue worth exploring is the use of synthetic data – i.e. artificially generated data that ‘imitates’ real-world data and that can be created using machine learning algorithms.
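To make this concrete, below is a minimal illustrative sketch – not drawn from the article or any specific product – of the simplest form of this idea: fitting a basic statistical model (a multivariate Gaussian) to a handful of hypothetical ‘real’ records and sampling artificial rows that imitate their statistical properties. Production-grade synthetic data pipelines use far more sophisticated generative models, but the principle is the same.

```python
import numpy as np

# Hypothetical 'real' records: each row is (age, income) for one person.
# If these were genuine personal data, this initial processing would
# itself likely fall under the GDPR, as discussed below.
real_data = np.array([
    [34, 42_000.0],
    [29, 38_500.0],
    [45, 61_000.0],
    [52, 70_250.0],
    [41, 55_300.0],
])

# Fit a simple multivariate Gaussian to the real records.
mean = real_data.mean(axis=0)
cov = np.cov(real_data, rowvar=False)

# Sample new, artificial rows that mimic the statistical properties of
# the originals without copying any single record verbatim.
rng = np.random.default_rng(seed=42)
synthetic_data = rng.multivariate_normal(mean, cov, size=1000)

print(synthetic_data[:3])  # plausible but non-existent (age, income) pairs
```

Note that statistical resemblance alone does not guarantee anonymity: a generator that overfits its source records can reproduce them almost verbatim, which is precisely why the anonymisation standards discussed below matter.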

In Europe, the use of synthetic data can be significantly more burdensome than in other regions, in light of obligations stemming from the General Data Protection Regulation (GDPR). For example, if synthetic data is generated from personal data, the initial processing of that personal data will likely be subject to GDPR obligations. Moreover, for synthetic data to fall outside the scope of the GDPR, specific anonymisation standards must be met.

While the extent to which synthetic data are subject to the GDPR is still debated (see, for example, Beduschi, 2024), an immediate implication for AI developers is that using such datasets will be more costly than in other regions and surrounded by legal uncertainty. A chilling effect on AI development in Europe could follow.

In this context, the EU AI Act does not provide more certainty, as it is – rightfully – reliant on the GDPR for data protection aspects.


III. AI and Intellectual property (IP)

The AI Act opened a Pandora’s box of copyright protection for online material used to train AI models. The legislation grants rightsholders the possibility to opt their content out of online ‘text and data mining’ (TDM) activities, including for AI training. As discussed above, GenAI models must be fed large datasets during their training phase, with mining techniques allowing the automated scraping of the internet for materials such as books, articles, and images.
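The Act does not prescribe a single technical format for this opt-out, but one widely used machine-readable mechanism is the robots.txt file, which several AI crawlers already honour. The sketch below is purely illustrative – the crawler name and publisher URL are hypothetical – and shows how a scraper could check a publisher’s robots.txt before mining a page:

```python
from urllib import robotparser

# Hypothetical crawler identifier and target site, for illustration only.
CRAWLER_USER_AGENT = "ExampleAITrainingBot"
SITE = "https://publisher.example"

# robots.txt is one common machine-readable way for rightsholders to
# signal a text-and-data-mining opt-out to AI crawlers.
parser = robotparser.RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetch and parse the file

url = f"{SITE}/articles/some-story.html"
if parser.can_fetch(CRAWLER_USER_AGENT, url):
    print("No opt-out signalled: the page may be mined, subject to other law.")
else:
    print("Rightsholder has opted out: skip this page for AI training.")
```

A real crawler would additionally need to handle network failures, caching, and other opt-out channels (such as metadata embedded in pages or HTTP headers), all of which this sketch omits.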

The AI Act’s regulatory intervention on this matter raises two practical issues. On the one hand, it will arguably make training data more difficult to retrieve in the EU territory – a tension already visible in the recent wave of litigation between media publishers and AI developers worldwide. By comparison, US copyright law allows the ‘fair use’ of copyrighted materials, inter alia when unlicensed use does not harm the existing or future market for the copyright owner’s original work (US Copyright Office, 2023). While this more economic approach gives AI developers leeway for text and data mining, enforcement remains with the courts, which can grant more protection to rightsholders on a case-by-case basis – and arguably favour licence agreements between the parties.

On the other hand, the AI Act claims an extraterritorial effect: it applies to training happening anywhere in the world whenever the output of the AI model is intended for use in the EU. Rather than fostering the right environment for AI development in Europe, this could well lead to ‘regulatory shopping’ – a situation where companies choose to train their models in more favourable jurisdictions (regardless of the above considerations on the availability of AI-capable cloud computing) and, where necessary, litigate infringement claims in the EU.

Finally, and incidental to the AI and IP issue, it is worth noting that Europe has only one major patent owner in the field of GenAI: the German company Siemens, with 208 patents published between 2014 and 2023 – compared to the 2,074 patents owned by the Chinese software company Tencent.

IDRN does not take an institutional position and we encourage a diversity of opinions and perspectives in order to maximise the public good.

Recommended citation:

De Falco, F. D. (2024) Red Tape and Reluctance: The case of Europe’s AI lag, IDRN, 18 October. Available at: https://idrn.eu/red-tape-and-reluctance-the-case-of-europes-ai-lag/ [Accessed dd/mm/yyyy].