A New Signal for AI Startups: Internet Content Is No Longer a Free Resource

Generative artificial intelligence has developed rapidly in recent years, and one of the key drivers of that development has been the massive datasets collected from the internet. Texts, books, images, and other digital content have often been used to train AI models under the assumption that public availability also implies permission for their use.

However, a growing number of disputes between authors, publishers, and technology companies shows that this assumption is now being challenged. Creative industries increasingly insist that the use of their works to train commercial AI systems requires a clear legal basis and appropriate compensation.

One example of this trend comes from the United States, where a court in California recently granted preliminary approval to a settlement worth approximately $1.5 billion in a dispute between a group of authors and the AI company Anthropic over the alleged use of books to train models without the permission of the rights holders. Although case law in this area is still developing, such cases send a strong signal to the entire market.

For companies developing AI products, this means that the way data is collected and used is becoming just as important as the technological development itself. A dataset is no longer merely a technical resource, but also a legal matter that can affect investments, partnerships, and the long-term sustainability of a product.

This is particularly relevant for AI startups, which often build models using large volumes of heterogeneous data. It is becoming increasingly important to understand the origin of the content used to train systems, as well as the potential obligations toward copyright holders. Current trends suggest that the market may evolve toward content licensing models or other forms of compensation for authors and publishers.

In such an environment, a legal strategy regarding data becomes part of the AI infrastructure itself. As the regulatory framework and case law continue to develop, it is becoming clear that the future of artificial intelligence will depend not only on the quality of algorithms, but also on the lawful and transparent use of the data on which these systems are trained.

The law firm Injac Attorneys, with a particular focus on IT law, artificial intelligence, and digital technologies, closely follows regulatory and judicial trends affecting the development of the AI industry and provides legal support to technology companies and startups in matters related to dataset structuring, copyright, and legal risk management in the development of AI products.

The information in this document does not constitute legal advice regarding any specific issue and is provided for general information purposes only.

Need legal support? Get in touch — our team is here to guide you every step of the way. When the law gets complicated, we make things clear — and get things done.

Email: inquiry@injac.rs

Tel: +381 11 2458 945

Address: Makenzijeva 17, 11000 Belgrade - Serbia
