
What generative AI can learn from the primordial swamp

By News Room
Last updated: 2024/08/01 at 8:44 AM


First, we learn that generative AI models can “hallucinate”, an elegant way of saying that large language models make stuff up. As ChatGPT itself informed me (in this case reliably), LLMs can generate fake historical events, non-existent people, false scientific theories and imaginary books and articles. Now, researchers tell us that some LLMs might collapse under the weight of their own imperfections. Is this really the wonder technology of our age on which hundreds of billions of dollars have been spent?

In a paper published in Nature last week, a team of researchers explored the dangers of “data pollution” in training AI systems and the risks of model collapse. Having already ingested most of the trillions of human-generated words on the internet, the latest generative AI models are now increasingly reliant on synthetic data created by AI models themselves. However, this bot-generated data can compromise the integrity of the training sets because of the loss of variance and the replication of errors. “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models,” the authors concluded.

Like the mythical ancient serpent Ouroboros, it seems, these models are eating their own tails. 
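
To see the mechanism in miniature, here is a toy simulation in the spirit of the paper's argument rather than a reproduction of its experiments: each "generation" of a model is fitted to samples drawn from the previous generation, the way a model trained mostly on its predecessor's output would be, and the fitted variance steadily decays.

```python
import numpy as np

# Toy illustration of recursive training on synthetic data. This is a
# hedged sketch, not the Nature paper's actual experiment: each generation
# fits a Gaussian to samples drawn from the previous generation's fit.
rng = np.random.default_rng(0)

data = rng.normal(loc=0.0, scale=1.0, size=50)  # the original "human" data

for generation in range(1, 501):
    mu_hat, sigma_hat = data.mean(), data.std()     # fit this generation's "model"
    data = rng.normal(mu_hat, sigma_hat, size=50)   # next generation trains only on its output
    if generation % 100 == 0:
        print(f"gen {generation}: mean={mu_hat:+.4f}  std={sigma_hat:.6f}")

# The fitted standard deviation follows a downward-biased random walk, so
# over enough generations the tails of the original distribution vanish and
# it collapses towards a point: the "loss of variance" the authors describe.
```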

Ilia Shumailov, who was the paper’s lead author while a researcher at Oxford University, tells me that the main takeaway from the research is that the rate of development in generative AI is likely to slow as high-quality data becomes more scarce. “The main premise of the paper is that the systems we are currently building will degrade,” he says.

The research company Epoch AI estimates that there are currently 300tn tokens (small units of data) of human-generated public text good enough to be used for training purposes. According to its forecasts, that stock of data might be exhausted by 2028. Then, there will not be enough fresh high-quality human-generated data to feed into the hopper and an over-reliance on synthetic data may become problematic, as the Nature paper suggests.
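
Epoch AI’s forecast is, at heart, stock-versus-flow arithmetic. The sketch below illustrates the logic using the 300tn stock figure from the article; the starting demand and growth rate are purely illustrative assumptions, not Epoch AI’s numbers.

```python
# Back-of-envelope sketch of the stock-versus-demand logic. The stock
# figure comes from the article; demand and growth below are hypothetical.
STOCK = 300e12          # ~300tn usable human-generated tokens (per Epoch AI)
demand = 15e12          # hypothetical tokens consumed for training in 2024
growth = 2.0            # hypothetical: training demand doubles each year

year, used = 2024, 0.0
while used + demand <= STOCK:
    used += demand
    demand *= growth
    year += 1
print(f"stock exhausted during {year}")  # ~2028 under these assumptions
```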

That does not mean that existing models mostly trained on human-generated data will become useless. Despite their hallucinatory habits, they can still be applied to myriad uses. Indeed, researchers say there may be a first-mover advantage for early LLMs trained on unpolluted data that is now unavailable to next-generation models. Logic suggests that this will also increase the value of fresh, private, human-generated data — publishers take note.

The theoretical dangers of model collapse have been discussed for years and researchers still argue that the discriminate use of synthetic data can be invaluable. Even so, it is clear that AI researchers will have to spend much more time and money on scrubbing their data. One company exploring the best ways of doing so is Hugging Face, the collaborative machine learning platform used by the research community. 

Hugging Face has been creating highly curated training sets including synthetic data. It has also been focusing on small language models in specific domains, such as medicine and science, that are easier to control. “Most researchers despise cleaning the data. But you have to eat your vegetables. At some point, everyone has to dedicate their time to it,” says Anton Lozhkov, a machine learning engineer at Hugging Face.
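
What “eating your vegetables” looks like in practice is heuristic scrubbing at scale. The sketch below gives the flavour of it, assuming simple exact-deduplication and repetition filters; it is an illustration with arbitrary thresholds, not Hugging Face’s actual pipeline.

```python
import hashlib

# A minimal, illustrative data-scrubbing pass: drop very short documents,
# drop heavily repetitive ones, and deduplicate exact matches by hash.
# Thresholds are assumptions for demonstration only.
def clean(docs):
    seen = set()
    for text in docs:
        words = text.split()
        if len(words) < 5:                      # too short to be useful
            continue
        if len(set(words)) / len(words) < 0.3:  # heavily repetitive, likely junk
            continue
        digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if digest in seen:                      # exact duplicate
            continue
        seen.add(digest)
        yield text

sample = ["Buy now " * 20,
          "Too short.",
          "Clean data is the vegetable course of machine learning research.",
          "Clean data is the vegetable course of machine learning research."]
print(list(clean(sample)))  # only one copy of the clean sentence survives
```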

Although the limitations of generative AI models are becoming more apparent, they are unlikely to derail the AI revolution. Indeed, there may now be renewed focus on adjacent AI research fields, which have been comparatively neglected of late but may lead to new advances. Some generative AI researchers are particularly intrigued by the progress made in embodied AI, as in robots and autonomous vehicles.

When I interviewed the cognitive scientist Alison Gopnik earlier this year, she suggested that it was the roboticists who were really building foundational AI: their systems were not captive on the internet but were venturing into the real world, extracting information from their interactions and adapting their responses as a result.

“That’s the route you’d need to take if you were really trying to design something that was genuinely intelligent,” she suggested.

After all, as Gopnik pointed out, that was exactly how biological intelligence originally emerged from the primordial swamp. Our latest generative AI models may captivate us with their capabilities. But they still have much to learn from the evolution of the most primitive worms and sponges more than half a billion years ago.

[email protected]
