Investigation Reveals Apple, NVIDIA, Anthropic, and More Used Thousands of YouTube Transcripts Without Permission to Train AI Models

According to an investigation by Proof News, several prominent AI companies have used transcripts from thousands of YouTube videos to train AI models, despite YouTube’s strict rules against sourcing material from the platform without permission.

The transcripts come from a dataset created by EleutherAI, a non-profit, that is part of a larger compilation called The Pile. The dataset contains no images or videos, only transcripts, and it is free to access. As a result, it has been used by Apple, NVIDIA, Anthropic, and other notable companies.

The dataset includes more than 170,000 YouTube transcripts from over 48,000 channels, covering creators and outlets such as MrBeast, Marques Brownlee, PewDiePie, The Wall Street Journal, BBC, Jimmy Kimmel Live, and more.

Representatives of EleutherAI have declined to comment on the allegations of using content without consent.

Apple, NVIDIA, and Salesforce have stated in their research papers that they used The Pile to train AI models, as has Anthropic. Anthropic and Salesforce have commented on the allegations, but NVIDIA, Apple, Databricks, and Bloomberg have remained silent on the matter.

Along with these companies, OpenAI was also found to have used YouTube videos without consent. However, company representatives neither confirmed nor denied the allegations.

The most concerning part is that the creators whose work is being used this way are often unaware of it.

Dave Farina, the host of Professor Dave Explains, said: “If you’re profiting off of work that I’ve done [to build a product] that will put me out of work or people like me out of work, then there needs to be a conversation on the table about compensation or some kind of regulation.”

The situation has left YouTubers worried that AI will eventually be able to generate similar content that simply parrots the original creators.

Akriti Rana

Tech Journalist
