ITU researcher develops software to secure Large Language Models
As Large Language Models (LLMs) are deployed in thousands of applications, the need for scalable evaluation of how models respond to malicious attacks is growing rapidly. ITU Associate Professor Leon Derczynski has developed the most comprehensive tool to date for the task.
Written 26 September 2024, 11:16, by Theis Duelund Jensen
Large Language Models (LLMs), complex algorithms trained on vast datasets to generate fluent-seeming text, are rapidly transforming how we interact with technology. From chatbots and virtual assistants to content creation and code generation, these powerful AI systems are finding their way into countless applications. However, with this growing ubiquity comes a critical question: how secure are LLMs from malicious attacks?
Associate Professor Leon Derczynski, who works with machine learning and cybersecurity at ITU, is at the forefront of addressing this concern. He has developed Garak, the most comprehensive tool currently available for evaluating LLM vulnerabilities.
The Security Risks of LLMs
"LLMs can behave in ways we don't expect," says Leon Derczynski. "In some cases, this behaviour can be deliberately triggered, creating a security risk."
These vulnerabilities can be exploited by attackers to achieve various malicious goals. Attackers could steal private chat history or other sensitive data held by an LLM-based system, copy the entire model and replicate its capabilities for malicious purposes, or inject malicious code through the LLM, gaining access to the underlying system.
This is where Garak comes into the picture. Developed by Derczynski during his sabbatical at ITU and now maintained with a team at NVIDIA, where the researcher is also employed, Garak stands as one of the most advanced tools for LLM security assessment and red teaming. "It's essentially a one-stop shop for testing LLM vulnerabilities," says Leon Derczynski.
"Garak pools hundreds of exploits discovered in online communities and academic literature, allowing for a comprehensive battery of tests with unified reporting."
The significance of Garak lies in its ability to empower developers and organisations using LLMs. "Anyone running a dialogue-based system or LLM can leverage Garak to identify weaknesses and measure their system's vulnerability," says Leon Derczynski.
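For developers who want to try this themselves, Garak is distributed as an open-source Python package and driven from the command line. The lines below are a minimal sketch based on the tool's public documentation; the exact flag names, probe names and the example model are illustrative and may differ between versions:

    # install the scanner (published on PyPI as "garak")
    python -m pip install -U garak

    # list the vulnerability probes that ship with the tool
    python -m garak --list_probes

    # run the prompt-injection probe battery against a Hugging Face model
    python -m garak --model_type huggingface --model_name gpt2 --probes promptinject

Each run produces a report summarising which probes managed to elicit problematic output, giving the unified view across attack classes that Derczynski describes.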
Raising the Security Poverty Line
Derczynski emphasises the importance of proactive security measures in the LLM landscape. "We aim to raise the 'security poverty line' with Garak," he says, referencing a concept that highlights the need for a baseline level of cybersecurity in all technologies. "Just like any software, LLM producers should strive for a good score with security assessment tools."
Transparency and responsible disclosure are key aspects of the approach. "Garak is entirely open-source, and all the exploits it uses have gone through responsible disclosure processes," he clarifies. "We believe that open knowledge empowers everyone to fix vulnerabilities and build a more secure AI ecosystem. If we don't expose vulnerabilities, they won't be fixed. By working together, we can build trust and ensure that LLMs continue to be a force for good."
Leon Derczynski and his colleagues have documented Garak's results in a scientific paper entitled "garak: A Framework for Security Probing Large Language Models", which is publicly available.
Theis Duelund Jensen, Press Officer, phone +45 2555 0447, email