Back to News Hub
đŸ€—Hugging Face
June 26, 2026
Society & Culture

Run a vLLM Server on HF Jobs in One Command

Overview

We're on a journey to advance and democratize artificial intelligence through open source and open science. Back to Articles a]:hidden"> Run a vLLM Server on HF Jobs in One Command Published June 26, 2026 Update on GitHub Upvote - Quentin Gallouédec qgallouedec Follow You can spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command - no servers to provision, no Kubernetes, pay-per-second. Once it's up, you can query it from your laptop, a notebook, or anywhere else.

Key Takeaways

  • It's the quickest way to stand up a model for tests, evals, or batch generation.

    (If you're after a managed, production-ready service instead, that's what Inference Endpoints are for - more on when to pick which at the end.

  • We'll use as a placeholder for it in the rest of the post.
  • In effect, the jobs proxy is your API gate: access is scoped to you (and your org).

    That's fine for private use, but treat the URL accordingly: don't share it expecting it to be open, and don't paste your token into untrusted places.

  • For large models, H200 flavors are usually the best value.
  • Going further: SSH into the running server Need to debug a startup failure, watch GPU memory, or tail logs interactively?

Stats & Key Facts

  • #50/hour - check for the full price list and pick the smallest flavor that fits your model.

Prerequisites A payment method or a positive prepaid credit balance (Jobs is billed per‑minute by hardware usage). Launch the server is for HF infrastructure. We use the official image, ask for a GPU with , and expose vLLM's port with : routes the container's port through HF's public jobs proxy (see the Serve Models guide for the full reference).

The command prints the URL your server is reachable at: is your job ID. Keep track of it, we'll need it. We'll use as a placeholder for it in the rest of the post.

Give it a couple of minutes to download weights and boot. When the logs show , you're live. Query it from anywhere vLLM speaks the OpenAI API, and every request just needs your HF token as a bearer token.

For more details please read the original article at Hugging Face.

Continue Learning

Originally published by Hugging Face
Read the original

Comments

Sign in to join the conversation