|
Back in the saddle again
Join Date: Oct 2001
Location: Central TX west of Houston
Posts: 58,323
|
Interesting (OK, terrifying) stuff
https://www.bbc.com/news/articles/cpqeng9d20go
short excerpt (the first few ¶)
Quote:
Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue "extremely harmful actions" such as attempting to blackmail engineers who say they will remove it.
The firm launched Claude Opus 4 on Thursday, saying it set "new standards for coding, advanced reasoning, and AI agents."
But in an accompanying report, it also acknowledged the AI model was capable of "extreme actions" if it thought its "self-preservation" was threatened.
Such responses were "rare and difficult to elicit", it wrote, but were "nonetheless more common than in earlier models."
Potentially troubling behaviour by AI models is not restricted to Anthropic.
Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable.
Commenting on X, Aengus Lynch - who describes himself on LinkedIn as an AI safety researcher at Anthropic - wrote: "It's not just Claude.
"We see blackmail across all frontier models - regardless of what goals they're given," he added.
|
https://www.anthropic.com/glasswing
short excerpt (the first few ¶)
Quote:
Today we’re announcing Project Glasswing1, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software.
We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos2 Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.
Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.
As part of Project Glasswing, the launch partners listed above will use Mythos Preview as part of their defensive security work; Anthropic will share what we learn so the whole industry can benefit. We have also extended access to a group of over 40 additional organizations that build or maintain critical software infrastructure so they can use the model to scan and secure both first-party and open-source systems. Anthropic is committing up to $100M in usage credits for Mythos Preview across these efforts, as well as $4M in direct donations to open-source security organizations.
|
I have also heard that some AI systems that have been virtually isolated from a larger environment, but have intentionally broken out and into the larger environments.
__________________
Steve
'08 Boxster RS60 Spyder #0099/1960
- never named a car before, but this is Charlotte.
'88 targa  SOLD 2004 - gone but not forgotten
|
04-10-2026, 09:15 AM
|
|