Published: 2025-11-21
Overview
A high-severity vulnerability, identified as CVE-2025-62164, has been discovered in vLLM, a popular inference and serving engine for large language models (LLMs). The flaw affects versions 0.10.2 up to, but not including, 0.11.1 and could allow attackers to cause a denial of service (DoS), and potentially achieve remote code execution (RCE), on systems running a vulnerable release.
Technical Details
The vulnerability resides in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint deserializes the submitted payload into tensors with torch.load(). Prior to version 0.11.1, vLLM performed insufficient validation of the deserialized tensors, and a change introduced in PyTorch 2.8.0 disabled sparse tensor integrity checks by default, so a malformed sparse tensor could pass through the loading step unchecked.
A malicious actor can therefore craft a specially designed sparse tensor that bypasses internal bounds checks during the subsequent call to to_dense(), triggering an out-of-bounds memory write and corrupting memory. At minimum, the corruption can crash the vLLM server and cause a denial-of-service condition; if an attacker can control the corruption precisely enough, it also opens a potential path to remote code execution.
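The underlying behavior is easy to observe in isolation. The snippet below is a minimal sketch, independent of vLLM's code, showing how PyTorch's optional invariant checking rejects a sparse COO tensor whose index falls outside its declared size; with that check disabled, the same malformed tensor is accepted and, as described above, only misbehaves later at densification.

import torch

# Minimal demonstration, not vLLM code: a sparse COO tensor whose second
# index lies far outside the declared size of 4. With invariant checking
# enabled, PyTorch rejects it at construction time; with checking disabled,
# the bad index goes undetected until to_dense().
indices = torch.tensor([[0, 10_000_000]])
values = torch.tensor([1.0, 1.0])

try:
    with torch.sparse.check_sparse_tensor_invariants():
        torch.sparse_coo_tensor(indices, values, size=(4,))
except RuntimeError as err:
    print(f"rejected malformed sparse tensor: {err}")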
CVSS Analysis
- CVE ID: CVE-2025-62164
- Severity: HIGH
- CVSS Score: 8.8
A CVSS score of 8.8 places this flaw at the upper end of the high-severity range. Exploitability is high because the Completions API is network-reachable and a single crafted request carries the malicious payload; combined with the potential for denial of service and remote code execution, this makes the vulnerability a significant risk for any exposed deployment.
Possible Impact
Successful exploitation of CVE-2025-62164 can have severe consequences:
- Denial of Service (DoS): The vLLM server crashes, rendering the LLM inference service unavailable.
- Remote Code Execution (RCE): An attacker gains the ability to execute arbitrary code on the server hosting vLLM, potentially compromising sensitive data and systems.
Mitigation and Patch Steps
The vulnerability has been patched in vLLM version 0.11.1. It is strongly recommended that all users of vLLM versions 0.10.2 through 0.11.0 upgrade to version 0.11.1 or later immediately.
To upgrade vLLM, use your preferred package manager. For example, using pip:
pip install vllm==0.11.1
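After upgrading, confirm that the environment actually resolved to a fixed release; the reported version should be 0.11.1 or later:
pip show vllm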
If upgrading is not immediately possible, consider implementing temporary mitigation measures, such as:
- Restricting access to the Completions API to trusted sources only.
- Adding validation of the deserialized prompt embeddings before they are densified; one approach is sketched after this list. However, this is unlikely to be a complete mitigation on its own.
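As a rough illustration of that second point, the sketch below shows the kind of guard a deployment could place around its own handling of untrusted serialized tensors. The function name, shape expectations, and overall design are hypothetical and not taken from vLLM: it simply refuses to densify anything that did not deserialize to a plain dense tensor.

import io
import torch

def load_untrusted_embedding(payload: bytes) -> torch.Tensor:
    # Hypothetical hardened loader, not vLLM's implementation.
    # weights_only=True limits unpickling to tensor data, but it does not by
    # itself validate sparse tensor indices.
    tensor = torch.load(io.BytesIO(payload), weights_only=True, map_location="cpu")
    if not isinstance(tensor, torch.Tensor):
        raise ValueError("payload did not deserialize to a tensor")
    # A prompt embedding has no legitimate reason to arrive as a sparse tensor,
    # and densifying an unvalidated sparse tensor is the operation this CVE
    # abuses, so reject any non-strided layout outright.
    if tensor.layout != torch.strided:
        raise ValueError("only dense (strided) prompt embeddings are accepted")
    return tensor

Re-enabling PyTorch's global invariant checks with torch.sparse.check_sparse_tensor_invariants.enable() at process start is another defense-in-depth option, but neither measure is a substitute for upgrading to 0.11.1.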
