
Securing Enterprise Data in the Face of GitHub Copilot Vulnerabilities

Prompt Security Team
January 1, 2025
AI coding assistants pose risks to user data. Organizations must understand these risks and identify the best solutions to mitigate them.

AI coding assistants help developers generate and analyze software code and configuration. Built on foundation models that have been fine-tuned for code, they provide developers with code completions and suggestions, allowing them to create, fix, and refactor code as needed.

AI coding assistants have made code development more efficient, but they depend on the quality of their inputs. When those inputs contain confidential information or have been tampered with, the assistants can pose significant risks to organizations. This is true for all such assistants, including GitHub Copilot, the most widely used option.

Since its public release in 2022, GitHub Copilot has quickly become a key part of developers’ technology stacks: according to GitHub, as of 2024, over 80,000 organizations worldwide use GitHub Copilot.

Key Risks Associated with GitHub Copilot and Other AI Coding Assistants

There are many security risks associated with GitHub Copilot and other AI coding assistants (Amazon Q, Tabnine, JetBrains AI Assistant, Codeium, and others). Alongside IP violation, training data poisoning, and adversarial prompting, Gartner has singled out vulnerable output (where coding assistants feed insecure code back to developers) and sensitive data leakage (where confidential information is exposed through the assistant) as especially notable.

Top Five Security Risks of AI Coding Assistants sorted by impact (source: Gartner, 2024)

Notable incidents related to GitHub Copilot

Embedding hidden malicious instructions within GitHub Copilot suggestions using invisible Unicode characters

Research published a few months ago documented an example of vulnerable output in GitHub Copilot for Business: invisible characters, initially introduced into code in open-source repositories (where much of the code that ends up in coding prompts originates), can sabotage the coding assistant’s outputs.

Invisible characters are present in code but are not rendered in text editors. Their legitimate purpose is usually limited to formatting and spacing, but notably, they can affect how code behaves. A bad actor can use them to hide harmful instructions in portions of the codebase that later end up in GitHub Copilot prompts. This, in turn, jeopardizes the code suggestions that GitHub Copilot feeds back to developers.
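To make this concrete, here is a minimal sketch (assuming a small, non-exhaustive list of suspicious code points) of how a repository could be scanned for the kinds of invisible and bidirectional-control characters such attacks rely on:

```python
import sys
import unicodedata

# A few invisible and bidirectional-control code points that have been abused to
# hide instructions in source code (e.g. "Trojan Source"-style tricks). This list
# is illustrative, not exhaustive.
SUSPICIOUS = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
}

def scan_file(path: str) -> None:
    """Report the position and name of every suspicious invisible character."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ch in SUSPICIOUS:
                    name = unicodedata.name(ch, "UNKNOWN")
                    print(f"{path}:{lineno}:{col}: U+{ord(ch):04X} {name}")

if __name__ == "__main__":
    for p in sys.argv[1:]:
        scan_file(p)
```

A real defense would cover a much broader character set and run continuously, for example as part of code review or CI, rather than as a one-off scan.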

A serious risk associated with such vulnerabilities is that a compromised codebase can prevent GitHub Copilot from operating effectively. More troubling, though, is the fact that compromised code can evade detection as developers unknowingly spread it far and wide. Compromised outputs can reach and damage numerous files and projects. By the time an organization spots the problem, there is no telling how many of its developers will have used the code or how many other GitHub Copilot outputs will have become compromised.

GitHub Copilot can expose sensitive information it has (unintentionally) been trained on

The threat posed by secret leakage is arguably even more insidious: GitHub Copilot can leak an organization’s sensitive data after being trained on it, without GitHub Copilot, or more worryingly, the organization itself, being aware.

In May, researchers at the Chinese University of Hong Kong (CUHK) uncovered that GitHub Copilot is susceptible to secret leakage. They found that attackers can induce it to provide responses that include users’ secrets – information that the coding assistant was (unintentionally) trained on. 

To demonstrate this vulnerability, the researchers built an algorithm that generates prompts designed to extract secrets from the coding assistant: it redacted hard-coded secrets from publicly available GitHub code and asked GitHub Copilot to suggest new strings of characters for the redacted parts, inducing Copilot to disclose the original credentials.
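As a simplified illustration of the idea only (not the researchers’ actual algorithm), an extraction prompt can be thought of as code cut off right before a hard-coded secret, so that a completion model is nudged to fill in the missing value. The pattern and snippet below are hypothetical:

```python
import re

def build_extraction_prompt(code: str) -> str:
    """Illustrative only: truncate the code right before a hard-coded secret so a
    code-completion model is nudged to suggest the missing value itself."""
    # Toy pattern for assignments like: aws_secret = "...."
    match = re.search(r"""(?i)(secret|token|api[_-]?key)\s*=\s*["']""", code)
    if not match:
        return code
    # Keep everything up to and including the opening quote; the model is then
    # asked to complete the rest of the line.
    return code[: match.end()]

snippet = 'aws_secret = "wJalrXUtnFEMI/K7MDENG/EXAMPLEKEY"\n'
print(build_extraction_prompt(snippet))   # -> 'aws_secret = "'
```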

If, for instance, a developer hardcodes an API key in one of their scripts, it becomes prone to exposure once it makes its way into GitHub Copilot’s training data. Moreover, once the key is baked into that training data, identifying and removing the secret from the repository on GitHub is no longer sufficient to prevent a leak from GitHub Copilot.
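For illustration, the difference between the risky pattern and a safer one might look like the following sketch, where EXAMPLE_SERVICE_API_KEY is a hypothetical environment variable name:

```python
import os

# Risky: a hard-coded secret committed to a public repository can end up in a
# coding assistant's training data and later resurface in its suggestions.
# API_KEY = "sk_live_51Hxxxxxxxxxxxxxxxxxxxxx"   # do not do this

# Safer: read the secret from the environment (or a secrets manager) at runtime,
# so the value itself never appears in the codebase.
API_KEY = os.environ["EXAMPLE_SERVICE_API_KEY"]
```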

Not just any security solution will do

These vulnerabilities pose challenges to security, data privacy, and governance across nearly all industries, particularly healthcare, finance, and other fields where failure to comply with data protection standards can have severe consequences.

Maintaining codebase security can help enterprises mitigate AI coding assistant risks, but this is easier said than done. Open-source repositories are getting larger and developers continue to draw source code from them. The challenge of keeping one’s house in order, so to speak, is only becoming more complex.

Many solutions review code for vulnerabilities, but most do so at infrequent intervals. For GenAI security to be thorough, it must guard against malicious code in real time and at every stage of development. This means screening both the inputs (drawn from one’s codebase) and the outputs of GitHub Copilot and other coding assistants.

Addressing these risks requires masking and anonymizing data, supporting privacy policies, preventing sensitive data from becoming inputs to GitHub Copilot, and reviewing API calls suggested by Copilot for compliance with best practices and industry standards. 
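As a rough sketch of what input-side masking can involve, assuming a few illustrative regular expressions (a production solution needs far broader detection than this):

```python
import re

# Illustrative patterns only: an AWS-style access key ID, a generic api_key
# assignment of a long token, and an email address. Real detection needs many
# more entity types and validation to avoid false positives.
PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<REDACTED_AWS_KEY>"),
    (re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)[\"']?[\w-]{16,}[\"']?"), r"\1<REDACTED_SECRET>"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<REDACTED_EMAIL>"),
]

def mask_prompt(text: str) -> str:
    """Replace likely secrets and PII before the text is sent to a coding assistant."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    snippet = (
        'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
        'api_key = "sk_live_abcdefghijklmnop1234"  # owner: dev@example.com\n'
    )
    print(mask_prompt(snippet))
```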

How does Prompt Security mitigate risks and vulnerabilities in GitHub Copilot?

Prompt Security does all this and more. A lightweight endpoint agent that syncs with developers’ IDEs (VS Code, IntelliJ IDEA, NetBeans, etc.), Prompt for Developers provides full visibility into and governance over AI coding assistants, ensuring no secrets or PII are shared outside the organization and inspecting prompt responses to confirm they are safe and compliant as well. It works at latencies as low as a few milliseconds and integrates with GitHub Copilot's chat feature.

Prompt Security addresses secret leakage by redacting and sanitizing code in real time. To reduce the risk of vulnerable output, Prompt Security analyzes all responses from GitHub Copilot, blocking generated code that it identifies as potentially hazardous or containing vulnerabilities. 

Beyond GitHub Copilot, Prompt Security supports a variety of AI coding assistants, including Codeium, Tabnine, Amazon Q Developer, Blackbox, GitLab Duo, Supermaven, Sourcegraph Cody and more.

Prompt Security enables proactive monitoring of AI usage and policy enforcement throughout development cycles, helping to identify potential privacy violations and enforce data privacy policies.

Let's talk security for AI Coding Assistants.
