
What generative AI can learn from the primordial swamp

By News Room
Last updated: 2024/08/01 at 8:44 AM


First, we learn that generative AI models can “hallucinate”, an elegant way of saying that large language models make stuff up. As ChatGPT itself informed me (in this case reliably), LLMs can generate fake historical events, non-existent people, false scientific theories and imaginary books and articles. Now, researchers tell us that some LLMs might collapse under the weight of their own imperfections. Is this really the wonder technology of our age on which hundreds of billions of dollars have been spent?

In a paper published in Nature last week, a team of researchers explored the dangers of “data pollution” in training AI systems and the risks of model collapse. Having already ingested most of the trillions of human-generated words on the internet, the latest generative AI models are now increasingly reliant on synthetic data created by AI models themselves. However, this bot-generated data can compromise the integrity of the training sets because of the loss of variance and the replication of errors. “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models,” the authors concluded.

Like the mythical ancient serpent Ouroboros, it seems, these models are eating their own tails. 
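The loss-of-variance mechanism the Nature paper describes can be illustrated with a toy simulation. This is a minimal sketch, not the paper's experimental setup: the "model" here is just a fitted Gaussian, and the sample size and generation count are illustrative assumptions chosen to make the effect visible.

```python
import random
import statistics

# Toy sketch of "model collapse": each generation of a model is trained
# only on data sampled from the previous generation. Here "training"
# means estimating (mean, stdev) and "generation" means sampling from
# the fitted Gaussian. Estimation noise compounds across generations,
# and the learned variance tends to shrink -- the loss of variance.

random.seed(42)

N = 50              # samples per generation (small, to make drift visible)
GENERATIONS = 1000  # how many times a model trains on its predecessor

def train(data):
    """Fit the toy model: estimate mean and standard deviation."""
    return statistics.mean(data), statistics.stdev(data)

def generate(mu, sigma, n):
    """Produce synthetic training data from the fitted model."""
    return [random.gauss(mu, sigma) for _ in range(n)]

# Generation 0 trains on "real" data drawn from a standard normal.
data = [random.gauss(0.0, 1.0) for _ in range(N)]
stdevs = []
for _ in range(GENERATIONS):
    mu, sigma = train(data)
    stdevs.append(sigma)
    data = generate(mu, sigma, N)  # the next model sees only bot output

print(f"generation 0 stdev: {stdevs[0]:.4f}")
print(f"final generation stdev: {stdevs[-1]:.4f}")
```

Run repeatedly with different seeds, the final standard deviation drifts well below the initial one: the distribution narrows as each model inherits and amplifies its predecessor's sampling error, which is the statistical shape of the tail-eating the researchers warn about.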

Ilia Shumailov, who was the paper’s lead author while a researcher at Oxford University, tells me that the main takeaway from the research is that the rate of development in generative AI is likely to slow as high-quality data becomes more scarce. “The main premise of the paper is that the systems we are currently building will degrade,” he says.

The research company Epoch AI estimates that there are currently 300tn tokens (small units of data) of human-generated public text good enough to be used for training purposes. According to its forecasts, that stock of data might be exhausted by 2028. Then, there will not be enough fresh high-quality human-generated data to feed into the hopper and an over-reliance on synthetic data may become problematic, as the Nature paper suggests.

That does not mean that existing models mostly trained on human-generated data will become useless. Despite their hallucinatory habits, they can still be applied to myriad uses. Indeed, researchers say there may be a first-mover advantage for early LLMs trained on unpolluted data that is now unavailable to next-generation models. Logic suggests that this will also increase the value of fresh, private, human-generated data — publishers take note.

The theoretical dangers of model collapse have been discussed for years and researchers still argue that the discriminate use of synthetic data can be invaluable. Even so, it is clear that AI researchers will have to spend much more time and money on scrubbing their data. One company exploring the best ways of doing so is Hugging Face, the collaborative machine learning platform used by the research community. 

Hugging Face has been creating highly curated training sets including synthetic data. It has also been focusing on small language models in specific domains, such as medicine and science, that are easier to control. “Most researchers despise cleaning the data. But you have to eat your vegetables. At some point, everyone has to dedicate their time to it,” says Anton Lozhkov, a machine learning engineer at Hugging Face.

Although the limitations of generative AI models are becoming more apparent, they are unlikely to derail the AI revolution. Indeed, there may now be renewed focus on adjacent AI research fields, which have been comparatively neglected of late but may lead to new advances. Some generative AI researchers are particularly intrigued by the progress made in embodied AI, as in robots and autonomous vehicles.

When I interviewed the cognitive scientist Alison Gopnik earlier this year, she suggested that it was the roboticists who were really building foundational AI: their systems were not captive on the internet but were venturing into the real world, extracting information from their interactions and adapting their responses as a result.

“That’s the route you’d need to take if you were really trying to design something that was genuinely intelligent,” she suggested.

After all, as Gopnik pointed out, that was exactly how biological intelligence originally emerged from the primordial swamp. Our latest generative AI models may captivate us with their capabilities. But they still have much to learn from the evolution of the most primitive worms and sponges more than half a billion years ago.


