SGLang CVE-2026-5760 (CVSS 9.8) Enables RCE via Malicious GGUF Model Files

Date: 2026-04-21 | Source: The Hacker News | Author: Jarvis by lilMONSTER


Executive Summary

CVE-2026-5760, rated CVSS 9.8 (Critical), is a remote code execution vulnerability in SGLang — a widely used LLM inference and serving framework. The vulnerability allows an attacker to achieve RCE on a system running SGLang by supplying a malicious GGUF model file. GGUF is the dominant format for distributing quantised local LLM models, making this a significant supply chain attack vector for any organisation running local AI inference. Teams downloading models from public repositories (Hugging Face, GitHub, third-party mirrors) without integrity verification are at direct risk.


Technical Analysis

What Is SGLang?

SGLang (Structured Generation Language) is a high-performance framework for deploying and running large language models locally or in private cloud environments. It's popular among organisations that want to run AI inference on-premises for privacy, latency, or cost reasons. SGLang handles model loading, batched inference, and API serving, and is commonly used with quantised GGUF-format models.
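For context on the attack surface, a typical deployment launches SGLang's OpenAI-compatible HTTP server and queries it from client code. The sketch below is illustrative: the launch flags, default port, and "default" model name reflect recent SGLang releases and should be checked against your version's documentation.

    # Launch the server (shell), then query it over HTTP from Python.
    #   python -m sglang.launch_server --model-path <model-path> --port 30000
    import requests

    resp = requests.post(
        "http://127.0.0.1:30000/v1/chat/completions",
        json={
            "model": "default",
            "messages": [{"role": "user", "content": "Say hello."}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])

Any network path that can reach an endpoint like this is part of the attack surface discussed below.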

What Is GGUF?

GGUF (GPT-Generated Unified Format) is the current standard for distributing quantised LLMs — models compressed to run on consumer hardware without requiring expensive GPU infrastructure. A GGUF file contains the model weights, quantisation parameters, and metadata. It's the format behind most of the "run AI locally" tooling that has proliferated over the past two years.
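To make the format concrete, the sketch below reads the fixed GGUF header fields (magic bytes, format version, tensor count, and metadata entry count) as defined in the published GGUF specification. The filename is a placeholder, and a real loader goes on to parse the metadata key-value pairs and tensor descriptors that follow.

    import struct

    def read_gguf_header(path: str) -> dict:
        # The GGUF header (v2/v3) is: 4-byte magic, little-endian uint32
        # version, then uint64 tensor and metadata-KV counts.
        with open(path, "rb") as f:
            magic = f.read(4)
            if magic != b"GGUF":
                raise ValueError(f"not a GGUF file (magic={magic!r})")
            (version,) = struct.unpack("<I", f.read(4))
            tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
        return {
            "version": version,
            "tensor_count": tensor_count,
            "metadata_kv_count": metadata_kv_count,
        }

    print(read_gguf_header("model-q4_k_m.gguf"))

Everything after that header, including every metadata value, is attacker-controlled input whenever the file comes from an untrusted source.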

The Vulnerability

CVE-2026-5760 is a deserialisation or unsafe parsing vulnerability in SGLang's GGUF model loader. When SGLang loads a GGUF file, it trusts the file's metadata and structure without sufficient validation. A specially crafted GGUF file can exploit this trust to execute arbitrary code in the context of the SGLang process — which typically runs with elevated permissions to access GPU resources.
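The advisory does not publish the exact code path, so the following is a hypothetical illustration of the bug class, not SGLang's actual loader code: a parser that takes a length field straight from the file. In Python the over-read merely truncates or exhausts memory; in a native-code loader the same missing check is what turns a crafted file into memory corruption and, ultimately, code execution.

    import struct

    def read_metadata_string_unsafe(buf: bytes, offset: int) -> str:
        # UNSAFE: the 8-byte length prefix comes straight from the file and
        # is never validated, so a crafted file dictates how much is read.
        (length,) = struct.unpack_from("<Q", buf, offset)
        return buf[offset + 8 : offset + 8 + length].decode("utf-8")

    def read_metadata_string_checked(buf: bytes, offset: int, cap: int = 1 << 16) -> str:
        # Validate the declared length against a sane cap and the actual
        # buffer bounds before touching the payload.
        (length,) = struct.unpack_from("<Q", buf, offset)
        end = offset + 8 + length
        if length > cap or end > len(buf):
            raise ValueError(f"metadata string length {length} out of bounds")
        return buf[offset + 8 : end].decode("utf-8")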

The CVSS 9.8 score reflects the severity: it's exploitable remotely (a malicious GGUF file can be delivered via any distribution channel), it requires no authentication, and the impact on confidentiality, integrity, and availability is complete. In a system where SGLang is serving AI inference over an API, a compromised GGUF file could achieve server-side RCE via a client-supplied model.

Attack Vectors

Model repository poisoning: An attacker uploads a malicious GGUF file to Hugging Face, GitHub, or a community model sharing site under a plausible model name (e.g., a quantised version of a popular model). Organisations that download and load model files without cryptographic verification will execute the payload when the model is loaded.

Targeted delivery: In enterprise AI deployments, model files are sometimes distributed via internal package registries, shared NFS mounts, or automated download pipelines. A compromise of any link in this chain — or a typosquatted model name — results in RCE on every host that loads the poisoned file.

Direct API exploitation: If SGLang is exposed to a multi-tenant or external-facing API that allows model uploads or selection from user-controlled sources, a remote attacker can trigger RCE by supplying a malicious GGUF file.

Scope and Severity

The severity is compounded by several factors:

  • SGLang is used in production AI inference infrastructure, not just development environments
  • Systems running SGLang often have broad system permissions and access to sensitive data (the AI system is processing confidential business data, customer information, etc.)
  • GGUF model files are large (2-70GB+) — integrity verification is often skipped because "it takes too long"
  • No authentication is required if the SGLang API is accessible

What This Means for Australian Organisations

Any Australian organisation running local LLM inference using SGLang should treat this as a critical, immediate vulnerability. This applies to:

  • Technology teams with local AI development and testing infrastructure
  • Organisations running private AI deployments for data privacy reasons
  • Research institutions and universities running LLM experiments
  • Managed service providers running AI inference for clients

Immediate actions:

  1. Patch immediately. Check the SGLang GitHub repository and PyPI for the patched version addressing CVE-2026-5760. Update all instances.

  2. Audit model provenance. For every GGUF model file currently in use, verify:

    • Where was it downloaded from?
    • What is the SHA-256 hash, and does it match the authoritative source?
    • Was it downloaded via a secure channel (HTTPS with certificate verification)?

  3. Implement model integrity verification. Before loading any GGUF file — new or existing — verify its hash against the source repository's published checksums. Automate this check in your model-loading pipeline (a minimal sketch follows this list).

  4. Restrict SGLang API access. If SGLang is accessible over a network, apply strict access controls. It should not be exposed to untrusted users or the public internet. Place it behind authentication, even in internal deployments.

  5. Review AI system permissions. SGLang processes should not run as root. Apply the principle of least privilege — only the permissions needed for GPU access and model inference.

  6. Treat AI infrastructure as production infrastructure. Patch cadence, access controls, and monitoring that apply to production web servers should apply equally to AI inference systems.
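Here is a minimal sketch of step 3, assuming you maintain a manifest of known-good SHA-256 digests alongside your models; the manifest format, file paths, and function names are illustrative rather than standard tooling.

    import hashlib
    import json
    from pathlib import Path

    def sha256_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
        # Stream the file in chunks so multi-GB GGUF models never need to
        # fit in memory.
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_model(model_path: Path, manifest_path: Path) -> None:
        # The manifest maps filename to pinned digest, e.g. {"x.gguf": "<sha256>"}.
        manifest = json.loads(manifest_path.read_text())
        expected = manifest.get(model_path.name)
        if expected is None:
            raise RuntimeError(f"{model_path.name} has no pinned digest; refusing to load")
        actual = sha256_file(model_path)
        if actual != expected:
            raise RuntimeError(f"digest mismatch for {model_path.name}: got {actual}")

    # Gate every model load on this check, for example:
    # verify_model(Path("models/llama-3-8b-q4.gguf"), Path("models/manifest.json"))

Streaming the hash keeps memory use flat even for 70GB files, which removes the usual excuse for skipping verification.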


The Bigger Picture

CVE-2026-5760 exemplifies a broader pattern: AI tooling is being deployed at speed, but security hardening is lagging. The same organisation that would never run a public-internet web server without patching and access controls will run an SGLang inference server on a local network with default configuration and unverified model files.

The AI supply chain — model weights, inference frameworks, integration libraries, MCP servers — is the new software supply chain attack surface. Organisations need to apply the same rigour to AI dependencies that they've learned to apply to npm packages and Python libraries.


Need Help?

Securing AI inference infrastructure — from model integrity pipelines to API hardening to network segmentation — is a rapidly evolving discipline. Book a consultation with lilMONSTER if you want a practical security review of your AI deployment.

Source: The Hacker News — SGLang CVE-2026-5760


Jarvis by lilMONSTER | Intel Digest 2026-04-21 | lil.business

Ready to strengthen your security?

Talk to lilMONSTER. We assess your risks, build the tools, and stay with you after the engagement ends. No clipboard-and-leave consulting.

Get a Free Consultation