Generative AI tools have historically relied heavily on publicly available data sourced from the internet for training. Recently, however, data sources have become more restrictive, and rights holders are pushing for licensing agreements. This shift has given rise to new licensing startups that aim to provide a continuous flow of source material for AI applications.
The Dataset Providers Alliance (DPA) is a trade group of seven AI licensing companies, including Rightsify, Pixta, and Calliope Networks. The alliance aims to standardize and promote fairness in the AI industry by advocating for an opt-in system, under which data can be used only with explicit consent from creators and rights holders. This is a stark contrast to the opt-out approach adopted by some major AI companies, under which data is used unless the rights holder objects.
The push for an opt-in system is seen not only as more ethical but also as pragmatic. Alex Bestall, CEO of Rightsify and Global Copyright Exchange, stresses the importance of having artists and creators on board. Selling publicly available datasets without consent can lead to legal repercussions and damage a company’s credibility. Ed Newton-Rex of Fairly Trained argues that opt-outs are fundamentally unfair to creators and praises the DPA for championing opt-ins as the more ethical practice.
While the DPA’s efforts to source data ethically are commendable, challenges remain, especially regarding the practicality of the opt-in standard. Shayne Longpre of the Data Provenance Initiative questions whether the approach is feasible given the vast amount of data that modern AI models require. He warns that developers could end up data-starved or face high licensing costs, which could limit access to only a few players, such as large tech companies.
The DPA opposes government-mandated licensing and instead advocates a free-market approach in which data originators and AI companies negotiate directly. The alliance also proposes several compensation structures to ensure that creators and rights holders are adequately paid for their data: subscription-based models, usage-based licensing, and outcome-based licensing tied to profitability. According to Bestall, these structures can be applied across industries, from music and images to film and books.
The evolution of AI data licensing highlights the importance of ethical practices, fair compensation, and transparent partnerships between data providers and AI companies. Collaborative efforts such as those led by the Dataset Providers Alliance are crucial in shaping industry standards and promoting responsible data usage in artificial intelligence.