Elon Musk says all human data for AI training was exhausted in 2024

Started by bosman, 2025-01-09 09:25

Tesla CEO Elon Musk has said that all human data available for AI training, including books, was exhausted last year, joining other experts who have reached similar conclusions.
Musk, who also owns an AI company, xAI, made the remarks during a live chat with Stagwell CEO Mark Penn, which aired on X.
Former OpenAI chief scientist Ilya Sutskever hinted at this in December, saying the AI industry had reached what he called "peak data" and predicting that the scarcity of training data would force a shift away from the way models are developed today.
Moving to synthetic data
According to Musk, the next option available for AI training is now synthetic data, which is data generated by the AI itself. "AI is advancing on the hardware side, and on the software side it's now moving to synthetic data, because we've exhausted all human data. We've literally exhausted the entire internet, every book ever written, and every interesting video.
"We have exhausted the cumulative sum of human knowledge when it comes to AI training, and this happened last year. So the only way to supplement that is with synthetic data, which the AI creates.
"It'll write an essay or come up with a thesis, and then it'll assess itself and go through this self-learning process with synthetic data," Musk said.
Challenges of using synthetic data
The Tesla CEO, however, noted that using synthetic data to train AI comes with its own challenges, particularly in verifying the accuracy of the model's responses.
"It's always a challenge, because how do you know if the response is hallucinated or real? Then it's hard to find the underlying truth," he said.
Some researchers have also warned that training on synthetic data can lead to model collapse, in which a model's outputs become less "creative" and more biased, eventually severely degrading its performance.
Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated. Microsoft's Phi-4, which was open-sourced early Wednesday, was trained on synthetic data alongside real-world data. The same goes for Google's Gemma models. Anthropic used synthetic data to develop one of its most successful systems, Claude 3.5 Sonnet, and Meta refined its latest Llama model series with AI-generated data.

[attachment deleted by admin]