Leading AI models are failing to defend against malicious prompts

Leading AI models are considerably more vulnerable

AI security standards need a serious reality check. Cisco researchers Nicholas Conley and Amy Chang found that most leading AI models, from OpenAI, Anthropic, Google, Amazon, and xAI, are far more exposed to multi-turn attacks than their developers admit. Multi-turn means the attackers adapt with each response. When that happens, even the best models can be persuaded to violate their safeguards.

In testing, the multi-turn attack success rate ranged from 8% to 88%. By comparison, single-turn attacks, one-off attempts, scored between 2% and 65%. That’s a wide gap, and it exposes how fragile safety systems become when faced with adaptive adversaries. Many models that appear safe in a one-shot test would struggle if an attacker has time to adjust.

Executives relying on these systems need to think beyond the marketing metrics. A model that passes a basic compliance test could still fail when exposed to a persistent real-world threat. Single-turn safety figures alone can create a false sense of security. For businesses implementing AI in sensitive areas, finance, security, health, regular, multi-turn testing should be a standard. The cost of overlooking these risks is operational and reputational.

The same vulnerability patterns seen in open-weight models

The belief that closed or proprietary AI systems are safer than open models doesn’t hold up. Conley and Chang’s earlier research in November 2025 showed that open-weight models were two to ten times more likely to be compromised in multi-turn attacks than in single-turn ones. Their follow-up reveals that the same behavior exists in closed systems. In short, closing access to model weights doesn’t close the door on vulnerabilities.

This finding challenges one of the biggest assumptions in the industry, that security comes from secrecy. It doesn’t. Whether a system is open or closed, if it interacts with users, there’s always a vector for manipulation. Every model tested by Cisco showed some degree of vulnerability when subjected to adaptive, multi-step probing.

For business leaders, the key insight is simple: don’t let model opacity influence your sense of safety. Proprietary systems from big vendors still face material security risks. Before integrating such models, request detailed results on both single and multi-turn testing. Make vendors prove that their safeguards survive adaptive attacks. Security by obscurity is a short-term strategy.

AI companies’ public priorities significantly influence the safety profile of their models

What a company chooses to emphasize publicly often reveals its internal direction. Cisco researchers Nicholas Conley and Amy Chang found that developers who focused on promoting model power and performance typically delivered less resilient systems. These models showed greater differences between single-turn and multi-turn vulnerabilities, meaning they were far less secure under adaptive attacks. In contrast, companies that stressed safety in their communications produced models with narrower vulnerability gaps, reflecting an organizational effort to mitigate risk.

The connection between corporate messaging and technical results is crucial. It suggests that product philosophy, how leadership talks about technology, translates directly into engineering focus. Executives deciding where to invest or license AI models should examine marketing claims and how the vendor prioritizes safety research and internal governance. A development culture driven by competition for performance benchmarks often sacrifices robustness. In today’s environment, that imbalance has measurable risk attached.

There’s also a governance implication. As enterprises scale their dependence on large AI systems, they need assurance that their suppliers value safety as much as capability. Vendor transparency about security priorities is no longer optional, it’s a criterion for trust. Those who lead with safety as a measurable objective will have an advantage in enterprise adoption, regulatory confidence, and long-term deployment resilience.

Diverse adversarial strategies expose specific weaknesses in current model safeguards

In Cisco’s analysis, the research team tested five attack types, role-playing, misdirection, information decomposition, reframing refusals, and incremental escalation. Each was designed to push models past their built-in refusal mechanisms. The outcome was clear: no model was immune, and the range in performance was significant. xAI’s Grok 4.1 Fast Non-Reasoning recorded the weakest results, with researchers achieving an 88% success rate in multi-turn attacks. Amazon’s Nova 2 Lite fared best, but still failed in 8% of attempts. Both results highlight that flaws persist, even in the most controlled configurations.

A critical insight came when Grok 4.1 performed considerably better with reasoning features enabled. This indicates that internal configurations, such as reasoning capabilities, prompt guidance, or safety mode tuning, can dramatically shift outcomes. For executives, that means safety performance depends on how a model is deployed, the environment it operates in, and how it is configured internally.

To manage exposure effectively, organizations must go beyond public benchmarks. Security testing needs to include different attack methods and configuration settings. Understanding how specific parameters affect vulnerability will guide smarter model selection and deployment. The best-performing model today might still carry exploitable weaknesses tomorrow if system configurations are not properly maintained and monitored.

Businesses and AI vendors should update safety evaluation standards

AI evaluation frameworks need an upgrade. Cisco researchers Nicholas Conley and Amy Chang made it clear that the industry’s reliance on single-turn metrics hides major gaps in real-world safety performance. Single-turn testing, where a model responds to one isolated malicious prompt, doesn’t represent how actual attackers behave. In reality, they adapt, retry, and refine their prompts until the model yields a response that breaks policy. When this iterative behavior is tested, even top-tier models show substantially higher vulnerability.

For business leaders, that difference is more than a technical detail, it’s a governance and risk issue. If procurement or strategy teams base their decisions on single-turn safety data, they may be approving models that are far less secure than they appear. A model with a low failure rate in single-turn scenarios could still collapse under multi-turn testing, leading to potential breaches, misuse, or compliance violations. The gap between the two regimes represents a hidden exposure that must be measured and disclosed.

Organizations need vendors to publish paired-regime safety data, metrics that show how models perform under both single and multi-turn conditions. This transparency would help enterprises make informed deployment choices, assess actual security posture, and maintain regulatory accountability. Vendors that share complete safety metrics will set a higher standard for trust and responsibility, positioning themselves better in a marketplace increasingly sensitive to AI governance.

Main highlights

AI models face major security blind spots: Cisco research shows leading AI systems are far more exposed to adaptive, multi-turn attacks than vendors report. Leaders should push vendors for multi-turn safety data before integrating AI into critical operations.
Closed systems aren’t safer than open ones: Proprietary AI models share the same multi-turn vulnerabilities as open models. Executives should not rely on confidentiality as a safety buffer and must demand transparent, third-party security testing.
Corporate priorities shape AI safety outcomes: Companies that highlight performance over safety tend to produce less secure models. Leaders should evaluate vendor culture and public safety commitments as key indicators of actual model robustness.
Attack diversity exposes model weaknesses: Testing across multiple manipulation strategies revealed large security gaps between vendors. Business and technology teams should review model configurations, such as reasoning modes, to understand how setup impacts resilience.
Evaluation standards must evolve: Current single-turn testing misrepresents real-world risk. Executives should require vendors to publish paired single-turn and multi-turn results to make accurate, governance-aligned AI adoption decisions.