Run a vLLM Server on HF Jobs in One Command
We're on a journey to advance and democratize artificial intelligence through open source and open science. Back to Articles a]:hidden"> Run a vLLM Server on HF Jobs in One Command Published June 26, 2026 Update on GitHub Upvote - Quentin Gallouédec qgallouedec Follow You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command - no servers to provision, no Kubernetes, pay-per-second. Once it's up, you can query it from your laptop, a notebook, or anywhere else.
Key Takeaways
- It's the quickest way to stand up a model for tests, evals, or batch generation.
(If you're after a managed, production-ready service instead, that's what Inference Endpoints are for - more on when to pick which at the end.
- We'll use as a placeholder for it in the rest of the post.
- In effect, the jobs proxy is your API gate: access is scoped to you (and your org).
That's fine for private use, but treat the URL accordingly: don't share it expecting it to be open, and don't paste your token into untrusted places.
- For large models, H200 flavors are usually the best value.
- Going further: SSH into the running server Need to debug a startup failure, watch GPU memory, or tail logs interactively?
Stats & Key Facts
- #50/hour - check for the full price list and pick the smallest flavor that fits your model.
Prerequisites A payment method or a positive prepaid credit balance (Jobs is billed perâminute by hardware usage). Launch the server is for HF infrastructure. We use the official image, ask for a GPU with , and expose vLLM's port with : routes the container's port through HF's public jobs proxy (see the Serve Models guide for the full reference).
The command prints the URL your server is reachable at: is your job ID. Keep track of it, we'll need it. We'll use as a placeholder for it in the rest of the post.
Give it a couple of minutes to download weights and boot. When the logs show , you're live. Query it from anywhere vLLM speaks the OpenAI API, and every request just needs your HF token as a bearer token.
For more details please read the original article at Hugging Face.
Continue Learning
Comments
Sign in to join the conversation