
AI start-up Anthropic accused of ‘egregious’ data scraping

News Room
Last updated: 2024/07/26 at 10:51 PM
By News Room

Artificial intelligence start-up Anthropic has been accused of aggressively scraping data from websites to train its systems, potentially breaching publishers’ terms of service in the process, according to those affected.

AI developers rely on ingesting vast quantities of data drawn from a wide variety of sources to create large language models, the technology behind chatbots such as OpenAI’s ChatGPT and Anthropic’s rival, Claude.

Anthropic was founded by a group of former OpenAI researchers on the promise to develop “responsible” AI systems.

However, Matt Barrie, the chief executive of Freelancer.com, accused the San Francisco-based company of being “the most aggressive scraper by far” of his portal for freelancers, which has millions of daily visits.

Other web publishers have echoed Barrie’s concerns that Anthropic is swarming their sites and ignoring their instructions to stop collecting their content to train its models.

Freelancer.com received 3.5mn visits from an Anthropic-linked web “crawler” in the space of four hours, according to data shared with the Financial Times. That makes Anthropic “probably about five times the volume of the number two” AI crawler, Barrie said.

Visits from its bot continued to increase even after Freelancer.com attempted to refuse its access requests, using standard web protocols for guiding crawlers, he added. After that, Barrie decided to block traffic from Anthropic’s internet addresses altogether.

“We had to block them because they don’t obey the rules of the internet,” Barrie said. “This is egregious scraping [which] makes the site slower for everyone operating on it and ultimately affects our revenue.”

Anthropic said it was investigating the case and that it respected publishers’ requests and aimed not to be “intrusive or disruptive”.

Scraping publicly available data from across the web is generally legal. But the practice is contentious, can breach websites’ terms of service and can be costly for site hosts.

Kyle Wiens, chief executive of iFixit.com, said his electronic repairs site received 1mn hits from Anthropic bots in the space of 24 hours. “We have a load of alarms [for high traffic], people get woken up at 3am. This set off every alarm we have,” he said.

iFixit’s terms of service prohibited the use of its data for machine learning, said Wiens. “My first message to Anthropic is: if you’re using this to train your model, that’s illegal. My second is: this is not polite internet behaviour. Crawling is an etiquette thing.”

Websites use a protocol known as ‘robots.txt’ to try to keep crawlers and other web robots off portions of their sites. However, it relies on voluntary compliance.
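
As a rough illustration of how that voluntary check works, the sketch below uses Python's standard `urllib.robotparser` module to read a robots.txt file and decide whether a crawler may fetch a page. The robots.txt contents, the crawler names `ExampleAIBot` and `OtherBot`, and the `example.com` URLs are all invented for this example and are not taken from any site named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

# Parse the rules from text instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


# The named crawler is shut out of the whole site.
blocked = may_fetch("ExampleAIBot", "https://example.com/guides/")
# Any other crawler may read public pages but not /private/.
allowed = may_fetch("OtherBot", "https://example.com/guides/")
private = may_fetch("OtherBot", "https://example.com/private/page")
# Seconds the site asks other crawlers to wait between requests.
delay = parser.crawl_delay("OtherBot")

print(blocked, allowed, private, delay)
```

Nothing in this check is enforced by the website. A crawler that never runs it, or ignores the answer, can still request the pages, which is why site owners sometimes fall back on blocking addresses outright.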

“We respect robots.txt and our crawler respected that signal when iFixit implemented it,” said Anthropic. The company also said its crawlers respected “anti-circumvention technologies” such as CAPTCHAs, and that “our crawling should not be intrusive or disruptive. We aim for minimal disruption by being thoughtful about how quickly we crawl the same domains”.

Data scraping is not a new practice but it has ramped up dramatically in the last two years as a result of the AI arms race. That has imposed new costs on websites.

“AI crawlers have cost us a significant amount of money in bandwidth charges, and caused us to spend a large amount of time dealing with abuse,” wrote Eric Holscher, co-founder of document hosting website Read the Docs, in a blog post on Thursday. “AI crawlers are acting in a way that is not respectful to the sites they are crawling, and that is going to cause a backlash against AI crawlers in general,” he added.

Anthropic has created some of the world’s most advanced chatbots — rivalling OpenAI’s ChatGPT — which can respond to an array of prompts in natural language, while positioning itself as a more ethical actor than some rivals. Anthropic’s stated purpose is “the responsible development and maintenance of advanced AI for the long-term benefit of humanity”.

As leading AI companies compete to create ever more capable and dexterous models, they are pushing deeper into untapped corners of the web, partnering with publishers or creating synthetic training data.

OpenAI has struck a number of deals in recent months with publishers and content providers including Reddit, The Atlantic and The Financial Times. Anthropic has not publicly announced similar partnerships.

“The search engines have always done a lot of scraping,” said Barrie, “but it’s gone up a whole level with training generative AI.”

iFixit’s mission “is to give information away”, said Wiens, to encourage people to repair their own devices. “We’re not opposed to them using our content to train models, we just want to be part of the conversation.”

He added: “I’m not a crusader on this topic, I’m just trying to keep a website online.”
