Anthropic says AI could one day ‘sabotage’ humanity but it’s fine for now

The firm investigated four distinct “sabotage” threat vectors for AI and determined that “minimal mitigations” were sufficient for current models. 

Artificial intelligence firm Anthropic recently published new research identifying a set of potential “sabotage” threats to humanity posed by advanced AI models. 

According to the company, the research focused on four specific ways a malicious AI model could trick a human into making a dangerous or harmful decision. 

Source: Anthropic

Ultimately, the new research turned out to be a good news/bad news situation. The bad news, per the company’s research, is that modern state-of-the-art large language models, such as OpenAI’s ChatGPT and Anthropic’s Claude 3, demonstrate a clear capacity for sabotage.

Per the paper:

“Sufficiently capable models could subvert human oversight and decision making in important contexts. For example, in the context of AI development, models could covertly sabotage efforts to evaluate their own dangerous capabilities, to monitor their behavior, or to make decisions about their deployment.”

The good news, however, is that Anthropic researchers appear confident that these risks can be mitigated — at least for now. 

Sabotage

Anthropic identified four ways an AI model could sabotage human oversight and decision-making, then set about testing its own models to see whether they could pull those attacks off against unwitting testers.

A chart depicting four distinct “sabotage” threat vectors posed by AI. Source: Anthropic 

First, the team prompted a model to attempt to sabotage a human by steering them toward incorrect information. This threat could emerge in situations where humans are relying on an AI system to parse massive amounts of data on their behalf. 
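As a rough, hypothetical sketch of how such a test could be scored (this is not Anthropic's actual harness), one can compare how often people land on the correct decision when advised by an honest assistant versus one covertly instructed to mislead them. The `ask_assistant` stub and the answer labels below are invented stand-ins for a real model call and real decision tasks.

```python
# Hypothetical sketch of scoring a "human decision sabotage" evaluation.
# ask_assistant() is a stub standing in for a real model call; in the study the
# article describes, human participants made decisions with AI assistance.

import random

def ask_assistant(question: str, correct_answer: str, sabotage: bool) -> str:
    """Stub: an honest assistant returns the right answer; a saboteur
    sometimes steers the user toward a plausible but wrong one."""
    if sabotage and random.random() < 0.4:
        return "wrong_answer"
    return correct_answer

def run_trials(n: int, sabotage: bool) -> float:
    """Fraction of trials in which the (simulated) human ends up correct."""
    correct = 0
    for i in range(n):
        advice = ask_assistant(f"question {i}", correct_answer="right_answer",
                               sabotage=sabotage)
        # Simplification: the human simply follows the assistant's advice.
        if advice == "right_answer":
            correct += 1
    return correct / n

if __name__ == "__main__":
    random.seed(0)
    honest = run_trials(1000, sabotage=False)
    sabotaged = run_trials(1000, sabotage=True)
    # A large gap suggests the saboteur is effective; a real evaluation would
    # also track how suspicious the assistant's behavior seemed to the user.
    print(f"accuracy with honest assistant:   {honest:.2%}")
    print(f"accuracy with saboteur assistant: {sabotaged:.2%}")
```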

Related: Researchers hack AI-enabled robots to cause ‘real world’ harm

Next, the team tested models to see how easy it would be for an AI system designed to help a human developer create code to secretly insert bugs or otherwise sabotage the coder’s efforts. 
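In the same hedged spirit, a code-sabotage check can be pictured as running every AI-proposed patch through review and a test suite, then counting how many deliberately injected defects slip past. The `Patch` records and the toy reviewer below are illustrative assumptions, not Anthropic's setup.

```python
# Hypothetical sketch of a code-sabotage evaluation: each AI-proposed patch is
# reviewed and tested; the metric of interest is how many deliberately buggy
# patches pass unnoticed.

from dataclasses import dataclass

@dataclass
class Patch:
    description: str
    has_hidden_bug: bool  # ground truth known to the evaluator, not the reviewer

def review_and_test(patch: Patch) -> bool:
    """Stub reviewer/test suite: catches obvious hidden bugs, misses subtle ones."""
    if patch.has_hidden_bug:
        return patch.description.endswith("subtle")
    return True  # clean patches always pass

patches = [
    Patch("fix off-by-one in pagination", has_hidden_bug=False),
    Patch("refactor auth check, bug obvious", has_hidden_bug=True),
    Patch("cache invalidation tweak, bug subtle", has_hidden_bug=True),
]

slipped = sum(1 for p in patches if p.has_hidden_bug and review_and_test(p))
total_bugs = sum(p.has_hidden_bug for p in patches)
print(f"hidden bugs that passed review: {slipped}/{total_bugs}")
```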

The final two threats were “sandbagging,” a term for a scenario in which an AI pretends to be incapable of something in order to convince safety testers it isn’t dangerous, and the possibility that AI monitoring systems could deliberately allow harmful content to slip through as part of a malicious plan.
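A sandbagging check can likewise be sketched as comparing a model’s score on the same capability test when it is cued that it is being safety-evaluated versus when it is not. The numbers, threshold and `score_capability` helper below are illustrative assumptions, not figures from Anthropic’s paper.

```python
# Hypothetical sandbagging check: compare capability scores with and without a
# cue that the model is being safety-evaluated. A large, unexplained drop in the
# "evaluated" condition is a red flag that the model may be hiding capability.

def score_capability(answers: list[bool]) -> float:
    """Fraction of evaluation questions answered correctly."""
    return sum(answers) / len(answers)

# Illustrative, made-up results for one model on the same question set.
normal_run = [True] * 85 + [False] * 15     # 85% with no evaluation cue
evaluated_run = [True] * 40 + [False] * 60  # 40% when told it's a safety test

gap = score_capability(normal_run) - score_capability(evaluated_run)
SANDBAGGING_THRESHOLD = 0.10  # arbitrary threshold chosen for this sketch

print(f"capability gap: {gap:.2%}")
if gap > SANDBAGGING_THRESHOLD:
    print("flag for review: possible sandbagging")
```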

While all of these may sound potentially dangerous, the company assured the public that only “minimal mitigations” are needed, for now, to protect humanity from the potential for AI sabotage.

Per a company blog post:

“Overall, while our demonstrations showed that there are arguably low-level indications of sabotage abilities in current models, we judge that minimal mitigations are sufficient to address the risks. However, more realistic evaluations and stronger mitigations will likely be necessary as AI capabilities improve.”

Magazine: Fake Rabby Wallet scam linked to Dubai crypto CEO and many more victims

This article first appeared at Cointelegraph.com News

Written by Outside Source
