Overview
CVE-2025-62426 is a medium-severity vulnerability affecting vLLM, an inference and serving engine for large language models (LLMs). Specifically, versions prior to 0.11.1 are susceptible to a denial-of-service (DoS) attack via the /v1/chat/completions and /tokenize endpoints. By crafting malicious chat_template_kwargs request parameters, an attacker can block processing on the API server, effectively delaying all other incoming requests.
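Both endpoints accept a chat_template_kwargs object that is forwarded to the template renderer. The Python sketch below shows the shape of such a request against a local deployment; the base URL, model name, and keyword values are placeholders, and the specific values that trigger the hang are deliberately not reproduced here.

import requests

# Hypothetical local vLLM deployment; adjust to your environment.
BASE_URL = "http://localhost:8000"

payload = {
    "model": "my-model",  # placeholder model name
    "messages": [{"role": "user", "content": "hello"}],
    # Prior to v0.11.1, these keyword arguments reached the chat
    # template without sufficient validation; crafted values could
    # keep processing busy and block the API server.
    "chat_template_kwargs": {
        "some_kwarg": "attacker-controlled value",  # placeholder key/value
    },
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=30)
print(resp.status_code)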
Technical Details
The vulnerability stems from insufficient validation of the chat_template_kwargs parameter before its values are used to render the chat template. An attacker can supply specially crafted parameters that keep the server busy for an arbitrarily long time, consuming significant server resources and preventing legitimate requests from being handled, leading to a denial-of-service condition.
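The general class of fix is easy to illustrate: validate or filter keyword arguments before they reach the template renderer. The sketch below is not vLLM's actual patch; the allowlist contents, function name, and error type are assumptions chosen for illustration.

# Illustrative allowlist-style guard for template kwargs.
# ALLOWED_TEMPLATE_KWARGS and validate_chat_template_kwargs are
# hypothetical names, not vLLM's actual API.
ALLOWED_TEMPLATE_KWARGS = {"add_generation_prompt", "continue_final_message"}

def validate_chat_template_kwargs(kwargs: dict) -> dict:
    # Reject unexpected keys before they reach the Jinja renderer.
    unknown = set(kwargs) - ALLOWED_TEMPLATE_KWARGS
    if unknown:
        raise ValueError(f"unsupported chat_template_kwargs: {sorted(unknown)}")
    return kwargs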
The affected code is located in the vLLM repository, in chat_utils.py and serving_engine.py; direct links are provided in the References section below.
CVSS Analysis
The Common Vulnerability Scoring System (CVSS) score for CVE-2025-62426 is 6.5 (Medium). The rating reflects an availability-only impact: a successful attack degrades or blocks the service but does not expose or modify data.
Possible Impact
Successful exploitation of this vulnerability can lead to:
- Denial-of-Service: The API server becomes unresponsive, preventing legitimate users from accessing the LLM services.
- Resource Exhaustion: The attack can consume significant server resources (CPU, memory), degrading overall system performance.
- Service Disruption: Applications relying on the vLLM API experience delays and failures.
Mitigation or Patch Steps
The vulnerability has been patched in vLLM version 0.11.1. Upgrading to this version or a later one as soon as possible is strongly recommended.
You can upgrade vLLM using pip:
pip install vllm==0.11.1
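To confirm a deployment is no longer on a vulnerable release, a quick check of the installed package version can help. This is a minimal sketch assuming the packaging library is available in the environment (it usually is, as a dependency of most Python tooling).

from importlib.metadata import version
from packaging.version import Version

installed = Version(version("vllm"))
if installed < Version("0.11.1"):
    print(f"vLLM {installed} is vulnerable to CVE-2025-62426; upgrade to >= 0.11.1")
else:
    print(f"vLLM {installed} includes the fix")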
Alternatively, pull the latest version from the GitHub repository and rebuild your environment; the fix is included in the patch commit linked under References.
References
- Affected Code in chat_utils.py
- Affected Code in serving_engine.py
- Patch Commit
- Pull Request with Fix
- GitHub Security Advisory
