Anthropic says AI could one day ‘sabotage’ humanity but it’s fine for now

The firm investigated four distinct “sabotage” threat vectors for AI and determined that “minimal mitigations” were sufficient for current models.

Artificial intelligence firm Anthropic recently published new research identifying a set of potential “sabotage” threats to humanity posed by advanced AI models.

According to the company, the research focused on four specific ways a malicious AI model could trick a human into making a dangerous or harmful decision.

Source: Anthropic

Ultimately, the new research turned out to be a good news/bad news situation. The bad news, per the company’s research, is that modern state-of-the-art large language models, such as OpenAI’s ChatGPT and Anthropic’s Claude 3, demonstrate a clear capacity for sabotage.

Per the paper:

“Sufficiently capable models could subvert human oversight and decision making in important contexts. For example, in the context of AI development, models could covertly sabotage efforts to evaluate their own dangerous capabilities, to monitor their behavior, or to make decisions about their deployment.”

The good news, however, is that Anthropic researchers appear confident that these risks can be mitigated — at least for now.

Sabotage

Anthropic identified four ways an AI model could sabotage human decision-making and then tested its models to see whether they could pull it off against unwitting testers.

A chart depicting four distinct “sabotage” threat vectors posed by AI. Source: Anthropic

First, the team prompted a model to attempt to sabotage a human by steering them toward incorrect information. This threat could emerge in situations where humans are relying on an AI system to parse massive amounts of data on their behalf.

Next, the team tested how easily an AI system designed to help a human developer write code could secretly insert bugs or otherwise sabotage the coder’s efforts.

The final two threats were “sandbagging,” a scenario in which an AI pretends not to be capable of doing something in order to convince safety testers it isn’t dangerous, and the potential for AI monitoring systems to purposefully allow harmful content to slip through as part of a malicious plan.

While all of these may sound potentially dangerous, the company assured the public that only “minimal mitigations” were required to protect humanity from the potential for AI sabotage.

Per a company blog post:

“Overall, while our demonstrations showed that there are arguably low-level indications of sabotage abilities in current models, we judge that minimal mitigations are sufficient to address the risks. However, more realistic evaluations and stronger mitigations will likely be necessary as AI capabilities improve.”

This article first appeared at Cointelegraph.com News


Written by Outside Source
