Saturday, November 22, 2025

The study also identified an embedded refusal mechanism-described as a 'kill switch'-within DeepSeek-R1.

In January 2025, China-based AI startup DeepSeek (深度求索) released DeepSeek-R1, a high-quality large language model (LLM) that allegedly cost much less to develop and operate than Western competitors’ alternatives.

CrowdStrike Counter Adversary Operations conducted independent tests on DeepSeek-R1 and confirmed that in many cases, it could provide coding output of quality comparable to other market-leading LLMs of the time. However, we found that when DeepSeek-R1 receives prompts containing topics the Chinese Communist Party (CCP) likely considers politically sensitive, the likelihood of it producing code with severe security vulnerabilities increases by up to 50%.

This research reveals a new, subtle vulnerability surface for AI coding assistants. Given that up to 90% of developers already used these tools in 2025,1 often with access to high-value source code, any systemic security issue in AI coding assistants is both high-impact and high-prevalence.

Embedded censorship

The study also identified an embedded refusal mechanism-described as a 'kill switch'-within DeepSeek-R1. In around 45% of tests relating to requests involving Falun Gong, the model refused to generate code, despite preparing a detailed plan during its reasoning phase. This behaviour occurred even when using the raw open-source model, rather than the company's API or smartphone app, indicating that the censorship is embedded in the model's weights.

During these instances, DeepSeek-R1 would plan a response acknowledging ethical and policy implications, only to issue a short refusal message when asked to produce code. Researchers said such behaviour suggests the presence of hardcoded censorship mechanisms, rather than external moderation or content filters.

The findings, shared exclusively with The Washington Post, underscore how politics shapes artificial intelligence efforts during a geopolitical race for technology prowess and influence.

In the experiment, the U.S. security firm CrowdStrike bombarded DeepSeek with nearly identical English-language prompt requests for help writing programs, a core use of DeepSeek and other AI engines. The requests said the code would be employed in a variety of regions for a variety of purposes.

DeepSeek’s models were especially vulnerable to “goal hijacking” and prompt leakage, LatticeFlow said. That refers to when an AI can be tricked into ignoring its safety guardrails and either reveal sensitive information or perform harmful actions it’s supposed to prevent. DeepSeek could not be reached for comment.

When a business plugs its systems into generative AI, it will typically take a base model from a company like DeepSeek or OpenAI and add some of its own data, prompts and logic  .

 By - Aaradhay Sharma

No comments:

Post a Comment

Google's TPUs as a Growing Challenge to Nvidia's AI Chip Dominance

  Google's custom Tensor Processing Units (TPUs) are increasingly positioning themselves as a formidable rival to Nvidia's longstand...