Artificial Intelligence (AI) has seen remarkable advancements in recent years, revolutionising various industries and transforming the way we live and work. However, as AI becomes more sophisticated and powerful, concerns regarding its unpredictability and potential dangers have emerged. 

Emad Mostaque, the CEO of Stability AI, one of the UK’s leading AI start-ups, highlights the urgent need for greater control over the training data used for large AI models. 

The current practice of training models like OpenAI’s GPT-4 and Google’s LaMDA on the vast expanse of the internet is leading to increasingly unpredictable outcomes, raising questions about the safety and potential risks associated with these technologies.

Unpredictability and the Existential Threat

Mostaque echoes the concerns expressed by experts, emphasising that continuing to train large AI models on vast amounts of uncontrolled internet data could pose an existential threat to humanity. 

Sam Altman, the head of OpenAI, explicitly warned the United States Congress about the potential dangers associated with AI, calling for regulation to mitigate the risks. 

The unpredictable nature of AI technologies, if left unchecked, could lead to unforeseen consequences and a loss of control over AI systems, amplifying the need for responsible development and governance.


Invidious and Dangerous Nature of AI

The headteacher of Epsom College has also expressed apprehension about the invidious and dangerous nature of AI. 

As AI technology becomes more integrated into various aspects of society, including healthcare, finance, and transportation, it becomes paramount to address the potential risks associated with its deployment. 

While AI offers immense potential for progress and innovation, it also brings forth challenges that must be carefully navigated to ensure its benefits outweigh the risks.

The Predicament of Training Data

Stability AI, known for its text-to-image AI products, has faced legal challenges due to its reliance on scraped internet data for training. The company faces copyright infringement lawsuits over the millions of copyrighted images allegedly included in that data.

This raises important questions about ownership and the ethical use of data within AI development. 

Stability AI’s involvement in developing Stable Diffusion, a prominent text-to-image AI, further underscores its commitment to advancing AI technologies and addressing the challenges they present.

Deep Floyd: Advancements and Safety Measures

Stability AI recently introduced Deep Floyd, claiming it to be the most advanced image-generating AI to date. Recognising the importance of AI safety, a senior researcher at Stability AI highlights the necessity of removing illegal, violent, and pornographic images from the training data.

Doing so prevents the AI from “imagining” and potentially recreating harmful or explicit content. Despite these precautions, the training process still requires a massive dataset of two billion online images, raising concerns about the biases and ethical issues that come with training data on that scale.

Respecting Data Ownership and Rights

Stability AI acknowledges the significance of training AI models on datasets that respect people’s rights to their data. However, the company currently faces a lawsuit filed by Getty Images, alleging the unauthorised use of 12 million copyrighted images in its training dataset.

Stability AI argues that its use of the images falls under the “fair use” rules, highlighting the complex intersection of copyright laws and AI development. This legal battle sheds light on the need to establish clear guidelines and ethical frameworks for utilising copyrighted materials in AI training processes.

The Web’s Integrity at Stake

The exponential growth of AI-generated content poses challenges to the integrity of the web. Increasingly, AI systems are generating vast amounts of online content, including news reports. This development has given rise to concerns about the proliferation of “fake news” websites generated by AI. 

These sites disseminate deliberately misleading or harmful information, creating a significant risk to the authenticity and trustworthiness of online content. The interplay between human-generated and AI-generated content further exacerbates these concerns, emphasising the importance of responsible AI development and content verification mechanisms.

Reevaluating Data Selection for AI Training

To address the challenges posed by large AI models, Mostaque suggests a reevaluation of the data used for training. He argues that AI models should be trained on data that is more specific to the intended users. 

By tailoring the training data, AI systems can better align with users’ needs and produce more reliable and personalised outcomes. Additionally, Mostaque emphasises the importance of diversifying the locations of AI development beyond California, where the majority of AI development currently takes place. 

The hope is that a broader geographic base will foster a more inclusive and representative approach to AI development and training.

What Could All This Mean for the Future of AI?

The concerns raised regarding the unpredictability and potential dangers of large AI models have significant implications for the future of artificial intelligence. As AI continues to advance and permeate various aspects of society, it becomes essential to address these issues to ensure the safe and responsible development of AI technologies. 

The calls for regulation and open discussions reflect a growing recognition of the need to establish safeguards and ethical guidelines to mitigate the risks associated with AI. 

By proactively controlling and monitoring the training data used for AI models, developers can improve transparency and accountability and tailor models more closely to their users, fostering a more trustworthy and beneficial AI ecosystem.

As AI becomes an increasingly integral part of our lives, it is crucial to shape its future in a way that prioritises human well-being, protects data rights, and upholds the integrity of the information ecosystem.


The concerns surrounding the unpredictability and potential dangers of large AI models underscore the need for comprehensive regulation and open discussions within the AI community. 

Striking a balance between innovation and safety is crucial to ensure the responsible development and deployment of AI technologies. Efforts to respect data ownership, address copyright concerns, and maintain the integrity of online content are essential to foster trust and reliability in AI systems. 

By reevaluating data selection and focusing on user-specific training, AI can be harnessed to serve the best interests of society while minimising the potential risks associated with its rapid advancement.