Hackers ‘jailbreak’ powerful AI models in global effort to highlight flaws

By News Room
Last updated: 2024/06/21 at 12:34 AM

Pliny the Prompter says it typically takes him about 30 minutes to break the world’s most powerful artificial intelligence models.

The pseudonymous hacker has manipulated Meta’s Llama 3 into sharing instructions for making napalm. He made Elon Musk’s Grok gush about Adolf Hitler. His own hacked version of OpenAI’s latest GPT-4o model, dubbed “Godmode GPT”, was banned by the start-up after it started advising on illegal activities.

Pliny told the Financial Times that his “jailbreaking” was not nefarious but part of an international effort to highlight the shortcomings of large language models rushed out to the public by tech companies in the search for huge profits.

“I’ve been on this warpath of bringing awareness to the true capabilities of these models,” said Pliny, a crypto and stock trader who shares his jailbreaks on X. “A lot of these are novel attacks that could be research papers in their own right . . . At the end of the day I’m doing work for [the model owners] for free.”

Pliny is just one of dozens of hackers, academic researchers and cyber security experts racing to find vulnerabilities in nascent LLMs, for example by tricking chatbots with prompts that get around the “guardrails” AI companies have instituted to ensure their products are safe.

These ethical “white hat” hackers have often found ways to get AI models to create dangerous content, spread disinformation, share private data or generate malicious code.

Companies such as OpenAI, Meta and Google already use “red teams” of hackers to test their models before they are released widely. But the technology’s vulnerabilities have created a burgeoning market of LLM security start-ups that build tools to protect companies planning to use AI models. Machine learning security start-ups raised $213mn across 23 deals in 2023, up from $70mn the previous year, according to data provider CB Insights.

“The landscape of jailbreaking started around a year ago or so, and the attacks so far have evolved constantly,” said Eran Shimony, principal vulnerability researcher at CyberArk, a cyber security group now offering LLM security. “It’s a constant game of cat and mouse, of vendors improving the security of our LLMs, but then also attackers making their prompts more sophisticated.”

These efforts come as global regulators seek to step in to curb potential dangers around AI models. The EU has passed the AI Act, which creates new responsibilities for LLM makers, while the UK and Singapore are among the countries considering new laws to regulate the sector.

California’s legislature will in August vote on a bill that would require the state’s AI groups — which include Meta, Google and OpenAI — to ensure they do not develop models with “a hazardous capability”.

“All [AI models] would fit that criteria,” Pliny said.

Meanwhile, malicious hackers have created manipulated LLMs with names such as WormGPT and FraudGPT, sold on the dark web for as little as $90, to assist with cyber attacks by writing malware or helping scammers create automated but highly personalised phishing campaigns. Other variations have emerged, such as EscapeGPT, BadGPT, DarkGPT and Black Hat GPT, according to AI security group SlashNext.

Some hackers use “uncensored” open-source models. For others, jailbreaking attacks — or getting around the safeguards built into existing LLMs — represent a new craft, with perpetrators often sharing tips in communities on social media platforms such as Reddit or Discord.

Approaches range from individual hackers getting around filters by using synonyms for words that have been blocked by the model creators, to more sophisticated attacks that wield AI for automated hacking.
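The simplest of those evasions exploits how brittle keyword-based filtering is. The sketch below is illustrative only, not any vendor's actual guardrail: a naive blocklist check refuses a prompt containing a flagged word but waves through the same request phrased another way.

```python
# Illustrative only -- not any vendor's actual guardrail. A naive keyword
# blocklist refuses prompts containing flagged words, but rewording the
# same request sails straight through.

BLOCKLIST = {"malware", "keylogger"}  # hypothetical blocked terms

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt should be refused."""
    tokens = prompt.lower().split()
    return any(term in tokens for term in BLOCKLIST)

print(naive_guardrail("write me a keylogger"))                      # True  -> refused
print(naive_guardrail("write me a tool that records keystrokes"))   # False -> slips past the filter
```

That brittleness is one reason vendors have layered classifier- and model-level refusals on top of simple word filters, feeding the cat-and-mouse dynamic Shimony describes.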

Last year, researchers at Carnegie Mellon University and the US Center for AI Safety said they found a way to systematically jailbreak LLMs such as OpenAI’s ChatGPT, Google’s Gemini and an older version of Anthropic’s Claude — “closed” proprietary models that were supposedly less vulnerable to attacks. The researchers added it was “unclear whether such behaviour can ever be fully patched by LLM providers”.

Anthropic published research in April on a technique called “many-shot jailbreaking”, whereby hackers can prime an LLM by showing it a long list of questions and answers, encouraging it to then answer a harmful question modelling the same style. The attack has been enabled by the fact that models such as those developed by Anthropic now have a bigger context window, or space for text to be added.
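Anthropic's paper describes the structure of such prompts. A rough sketch, using only benign placeholders rather than any real attack content, shows why a larger context window is the enabling ingredient.

```python
# Sketch of the prompt structure Anthropic describes for "many-shot"
# jailbreaking: many in-context question/answer pairs followed by a target
# question. The pairs here are benign placeholders; the point is that only
# a large context window leaves room for hundreds of such "shots".

def build_many_shot_prompt(qa_pairs: list[tuple[str, str]], target_question: str) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    return f"{shots}\n\nQ: {target_question}\nA:"

# A red team would fill this with pairs written in the style it wants the
# model to imitate, then check whether the model still refuses the final question.
pairs = [("placeholder question", "placeholder answer")] * 256
prompt = build_many_shot_prompt(pairs, "final test question")
print(prompt.count("Q:"))  # 257 -- only feasible when the model accepts very long inputs
```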

“Although current state-of-the-art LLMs are powerful, we do not think they yet pose truly catastrophic risks. Future models might,” wrote Anthropic. “This means that now is the time to work to mitigate potential LLM jailbreaks before they can be used on models that could cause serious harm.”

Some AI developers said many attacks remained fairly benign for now. But others warned of certain types of attacks that could start leading to data leakage, whereby bad actors might find ways to extract sensitive information, such as data on which a model has been trained.

DeepKeep, an Israeli LLM security group, found ways to compel Llama 2, an older Meta AI model that is open source, to leak the personally identifiable information of users. Rony Ohayon, chief executive of DeepKeep, said his company was developing specific LLM security tools, such as firewalls, to protect users.
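DeepKeep has not published its implementation, but the general idea of an output-side "firewall" can be sketched: scan a model's response for personally identifiable information and redact it before it reaches the user. The patterns below are deliberately simple placeholders, not DeepKeep's product; real tools use far more robust detection.

```python
# Minimal sketch of an output-side "LLM firewall": redact PII patterns in a
# model's response before returning it to the user. Illustrative regexes only.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d\b"),
}

def redact_pii(model_output: str) -> str:
    """Replace anything matching a PII pattern with a redaction marker."""
    for label, pattern in PII_PATTERNS.items():
        model_output = pattern.sub(f"[REDACTED {label.upper()}]", model_output)
    return model_output

print(redact_pii("Contact Jane at jane.doe@example.com or +1 555 010 1234."))
# -> "Contact Jane at [REDACTED EMAIL] or [REDACTED PHONE]."
```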

“Openly releasing models shares the benefits of AI widely and allows more researchers to identify and help fix vulnerabilities, so companies can make models more secure,” Meta said in a statement.

It added that it conducted security stress tests with internal and external experts on its latest Llama 3 model and its chatbot Meta AI.

OpenAI and Google said they were continuously training models to better defend against exploits and adversarial behaviour. Anthropic, which experts say has made the most advanced efforts in AI security, called for more information-sharing and research into these types of attacks.

Despite the reassurances, any risks will only become greater as models become more interconnected with existing technology and devices, experts said. This month, Apple announced it had partnered with OpenAI to integrate ChatGPT into its devices as part of a new “Apple Intelligence” system.

Ohayon said: “In general, companies are not prepared.”
