Indebta

What generative AI can learn from the primordial swamp

Last updated: 2024/08/01 at 8:44 AM
By News Room

First, we learn that generative AI models can “hallucinate”, an elegant way of saying that large language models make stuff up. As ChatGPT itself informed me (in this case reliably), LLMs can generate fake historical events, non-existent people, false scientific theories and imaginary books and articles. Now, researchers tell us that some LLMs might collapse under the weight of their own imperfections. Is this really the wonder technology of our age on which hundreds of billions of dollars have been spent?

In a paper published in Nature last week, a team of researchers explored the dangers of “data pollution” in training AI systems and the risks of model collapse. Having already ingested most of the trillions of human-generated words on the internet, the latest generative AI models are now increasingly reliant on synthetic data created by AI models themselves. However, this bot-generated data can compromise the integrity of the training sets because of the loss of variance and the replication of errors. “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models,” the authors concluded.
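
The "loss of variance" mechanism can be illustrated with a toy simulation. The sketch below is not the paper's experimental setup; it is a minimal illustration that assumes each "model" is just a Gaussian fitted to a small sample drawn from the previous model. The function name `simulate_collapse` and its parameters (`generations`, `sample_size`, `seed`) are hypothetical choices made for this example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation at every generation,
    starting with the original ("human") distribution.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stand-in for human-generated data
    history = [std]
    for _ in range(generations):
        # "Synthetic data": samples produced by the current model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Training": the next model is fitted only to that synthetic data.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = simulate_collapse()
print(f"initial std: {history[0]:.3f}, final std: {history[-1]:.3g}")
```

Because each generation sees only a finite sample of the one before it, rare values in the tails tend to drop out and the fitted spread shrinks over time. Real language models are vastly more complex, so this should be read only as an intuition for why indiscriminate recursive training can narrow what a model is able to produce.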

Like the mythical ancient serpent Ouroboros, it seems, these models are eating their own tails. 

Ilia Shumailov, who was the paper’s lead author while a researcher at Oxford University, tells me that the main takeaway from the research is that the rate of development in generative AI is likely to slow as high-quality data becomes more scarce. “The main premise of the paper is that the systems we are currently building will degrade,” he says.

The research company Epoch AI estimates that there are currently 300tn tokens (small units of data) of human-generated public text good enough to be used for training purposes. According to its forecasts, that stock of data might be exhausted by 2028. Then, there will not be enough fresh high-quality human-generated data to feed into the hopper and an over-reliance on synthetic data may become problematic, as the Nature paper suggests.

That does not mean that existing models mostly trained on human-generated data will become useless. Despite their hallucinatory habits, they can still be applied to myriad uses. Indeed, researchers say there may be a first-mover advantage for early LLMs trained on unpolluted data that is now unavailable to next-generation models. Logic suggests that this will also increase the value of fresh, private, human-generated data — publishers take note.

The theoretical dangers of model collapse have been discussed for years and researchers still argue that the discriminate use of synthetic data can be invaluable. Even so, it is clear that AI researchers will have to spend much more time and money on scrubbing their data. One company exploring the best ways of doing so is Hugging Face, the collaborative machine learning platform used by the research community. 

Hugging Face has been creating highly curated training sets including synthetic data. It has also been focusing on small language models in specific domains, such as medicine and science, that are easier to control. “Most researchers despise cleaning the data. But you have to eat your vegetables. At some point, everyone has to dedicate their time to it,” says Anton Lozhkov, a machine learning engineer at Hugging Face.

Although the limitations of generative AI models are becoming more apparent, they are unlikely to derail the AI revolution. Indeed, there may now be renewed focus on adjacent AI research fields, which have been comparatively neglected of late but may lead to new advances. Some generative AI researchers are particularly intrigued by the progress made in embodied AI, as in robots and autonomous vehicles.

When I interviewed the cognitive scientist Alison Gopnik earlier this year, she suggested that it was the roboticists who were really building foundational AI: their systems were not captive on the internet but were venturing into the real world, extracting information from their interactions and adapting their responses as a result.

“That’s the route you’d need to take if you were really trying to design something that was genuinely intelligent,” she suggested.

After all, as Gopnik pointed out, that was exactly how biological intelligence originally emerged from the primordial swamp. Our latest generative AI models may captivate us with their capabilities. But they still have much to learn from the evolution of the most primitive worms and sponges more than half a billion years ago.
