In 2024 we looked into leveraging open-weight LLMs for source code analysis. The answer was clearly negative: even a small code base could easily take 200K tokens, more than any context window offered by open-weight models at the time.
The table below summarizes the top LLMs by context window as of today. Context windows have increased significantly compared to the roughly 32,000 tokens that were typical last year. However, we are still orders of magnitude away from being able to feed entire code bases to LLMs.
| Model | Tokens | Open Weight |
|---|---|---|
| Gemini 1.5 Pro (Google) | 2,000,000 | No |
| GPT-4.1 (OpenAI) | 1,000,000 | No |
| Claude (Anthropic) | 200,000 | No |
| DeepSeek | 128,000 | Yes |
| Llama 3.1 (Meta) | 128,000 | Yes |
Later in the year, a tool named vulnhuntr was published (https://github.com/protectai/vulnhuntr). It works around limited context windows by analyzing code paths from source to sink and feeding the code blocks along the call chain to the LLM iteratively, in a multi-step process. Initially, the LLM is given the code blocks of a set of files (e.g. sources where API entry points are defined). The LLM then explicitly asks for the code blocks of the functions and classes it needs to continue the analysis.
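Conceptually, the iteration looks roughly like the sketch below. Note that `llm.ask`, the `repo` helpers, and the response schema are hypothetical names assumed for illustration; they do not reflect vulnhuntr's actual API.

```python
import json

def build_prompt(context):
    # Hypothetical prompt assembly: concatenate the gathered code blocks.
    return "\n\n".join(f"# {name}\n{code}" for name, code in context.items())

def analyze(llm, repo, entry_file):
    # Start from a potential source, e.g. a file defining API entry points.
    context = {entry_file: repo.read_file(entry_file)}
    while True:
        # Ask the LLM to analyze the code gathered so far.
        response = json.loads(llm.ask(build_prompt(context)))
        requested = response.get("context_code", [])
        if not requested:
            # The LLM has seen the full call chain; return its final report.
            return response
        for item in requested:
            # Resolve each requested function/class and add it to the context.
            context[item["name"]] = repo.find_symbol(item["name"])
```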
On top of the clever approach, the authors published a track record of vulnerabilities discovered with the tool, making it even more appealing.
There is one relevant limitation though: only Python code bases are supported. Statically determining the call chain from source to sink is in fact very difficult for dynamically typed languages.
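A toy example illustrates the problem: without type annotations, a static analyzer cannot tell which concrete implementation a call resolves to (`run_query` below is a hypothetical database helper).

```python
class SafeHandler:
    def process(self, data):
        print(data)  # harmless

class DbHandler:
    def process(self, data):
        run_query(f"SELECT * FROM t WHERE x = {data}")  # injectable sink

def dispatch(handler, data):
    # Without type information, a static analyzer cannot tell whether this
    # call reaches the harmless print() or the injectable query above.
    handler.process(data)
```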
We decided that it was worth extending the tool to support additional languages. Therefore we created xvulnhuntr (https://github.com/CompassSecurity/xvulnhuntr), a fork of the original project where the ‘x’ stands for extended.

xvulnhuntr also supports C#, Java, and Go. For each language there is a dedicated helper tool, developed in the corresponding language. Given a repository path and a class or function name as input, the helper returns JSON containing the file name of the match and the source code of the matching function or class. This modular approach makes it easy to extend support to other typed languages.
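For illustration, the main tool might call such a helper along the following lines; the binary name, flags, and JSON keys shown here are assumptions for the sketch, not xvulnhuntr's actual interface.

```python
import json
import subprocess

def lookup_symbol(repo_path, symbol):
    # Invoke the language-specific helper (placeholder name and flags).
    result = subprocess.run(
        ["java-helper", "--repo", repo_path, "--symbol", symbol],
        capture_output=True, text=True, check=True,
    )
    # Assumed output shape: {"file": "...", "source": "..."}
    match = json.loads(result.stdout)
    return match["file"], match["source"]
```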
We also put deliberate effort into making xvulnhuntr easier for developers to contribute to. In contrast to the original project, xvulnhuntr can be run against a local test suite with mocked API responses (see the sketch after the list below). This provides multiple advantages:
- reproducibility: while LLMs are not deterministic (even with temperature set to zero), mocked responses allow easier debugging
- speed: mocked responses avoid latency from the LLM provider
- cost reduction: no need to waste tokens during development
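As a rough sketch of what mocked runs enable (the class and method names here are illustrative, not xvulnhuntr's actual test API):

```python
class MockLLM:
    """Replays canned responses instead of calling a real LLM provider."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def ask(self, prompt):
        # Deterministic, instant, and free: no provider latency or token cost.
        return next(self._responses)

def test_run_is_reproducible():
    canned = ['{"context_code": [], "vulnerabilities": ["SQLI"]}']
    # Unlike a live LLM, identical inputs always yield identical outputs.
    assert MockLLM(canned).ask("prompt") == MockLLM(canned).ask("prompt")
```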
Next Steps
So far we have primarily focused on developing the tool itself. We expect bugs to surface and refinements to follow as the tool is run against a variety of code bases. We are also interested in evaluating how analyses from different LLM providers compare to each other. Finally, we welcome and encourage contributions. Until then, happy LLM-powered hacking!