AI's easy on-ramp has become a costly exit problem for enterprises, says Red Hat

Overview

Red Hat argues that AI's easy on-ramp through frontier model providers has become a costly exit problem as enterprises scale inference. Red Hat's Stephen Watt says companies should start on a frontier model from a provider such as OpenAI or Anthropic but would be unwise to stay once token economics get expensive at scale. The company positions horizontal cloud, a shared foundation for running workloads, at the center of its AI strategy. Red Hat's vLLM Semantic Router directs inference requests to purpose-trained open-weight models to improve accuracy and lower cost.

Key Takeaways

Red Hat says scaling inference turns AI's easy on-ramp into a costly exit problem.
Stephen Watt advises starting on a frontier model but leaving once token economics get expensive at scale.
Horizontal cloud, one shared foundation across storage, compute, and management, is central to the strategy.
Red Hat's vLLM Semantic Router sends inference requests to purpose-trained open-weight models by query type.
One New Zealand Group cut delivery time by 40% and reduced operational costs by 30 to 45% on Red Hat OpenShift.

Stats & Key Facts

Delivery time cut by 40% (One New Zealand Group)
Operational costs reduced by 30 to 45% (One New Zealand Group)
Red Hat AI 3.4 extends model-as-a-service and distributed inferencing


The exit problem

Scaling inference forces a rethink of infrastructure.

›The cost and complexity of running inference at scale are forcing enterprises to rethink how infrastructure is designed, governed, and sourced.
›The market has become dependent on a small number of frontier model providers, including Anthropic and OpenAI.
›Watt said you would be crazy not to start on a frontier model, but also crazy to stay once you hit a certain scale in token economics.

Watt framed the dilemma as two questions: what your options are when you want to leave a frontier provider, and how to navigate that transition.
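Watt does not give figures for the "certain scale" at which staying becomes unwise, but the shape of the trade-off can be illustrated. The sketch below is a deliberately simplified cost model: a per-token API price versus a self-hosted deployment with a fixed monthly cost plus a lower per-token cost. The function names, parameters, and every number in it are hypothetical assumptions, not figures from Red Hat.

```python
def break_even_tokens(
    fixed_monthly_cost: float,
    api_price_per_million: float,
    selfhost_price_per_million: float,
) -> float:
    """Hypothetical monthly token volume at which self-hosting matches API cost.

    API cost:       tokens / 1e6 * api_price_per_million
    Self-host cost: fixed_monthly_cost + tokens / 1e6 * selfhost_price_per_million
    Setting the two equal and solving for tokens gives the break-even volume.
    """
    margin = api_price_per_million - selfhost_price_per_million
    if margin <= 0:
        # If self-hosting is not cheaper per token, it never catches up.
        return float("inf")
    return fixed_monthly_cost / margin * 1_000_000


def cheaper_option(
    tokens_per_month: float,
    fixed_monthly_cost: float,
    api_price_per_million: float,
    selfhost_price_per_million: float,
) -> str:
    """Return which hypothetical option costs less at a given monthly volume."""
    api = tokens_per_month / 1_000_000 * api_price_per_million
    selfhost = fixed_monthly_cost + tokens_per_month / 1_000_000 * selfhost_price_per_million
    return "api" if api <= selfhost else "self-host"


if __name__ == "__main__":
    # Illustrative numbers only: $20,000/month fixed, $10 vs $2 per million tokens.
    print(break_even_tokens(20_000, 10, 2))
    print(cheaper_option(1_000_000_000, 20_000, 10, 2))
    print(cheaper_option(5_000_000_000, 20_000, 10, 2))
```

With these made-up inputs the crossover sits at 2.5 billion tokens a month: below it the API is cheaper, above it self-hosting wins. A real comparison would also weigh staffing, utilization, and accuracy, which this model ignores.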

Horizontal cloud

Red Hat positions a shared platform as the answer.

›Horizontal cloud is one shared foundation for running workloads across the enterprise.
›It spans storage, compute, and management as a single layer.
›Red Hat AI 3.4 extends model-as-a-service and distributed inferencing capabilities.

Watt spoke with theCUBE at Red Hat Summit 2026 about inference routing, agentic AI governance, and horizontal cloud architecture.

A real-world deployment

One New Zealand Group shows the stakes.

›One New Zealand Group deployed a horizontal telco cloud platform built on Red Hat OpenShift.
›The operator cut delivery time by 40% and reduced operational costs by 30 to 45%.
›Processes that once took weeks or months collapsed into days.

The key enabler was treating the platform as a shared foundation rather than a collection of isolated pilots.

Moving past isolated pilots

Central IT can consolidate scattered experimentation.

›Watt said every department runs its own experimentation and pilots, each in a different way.
›Once teams move past the pilot phase, common findings emerge across them.
›Central IT can then decide which platform to buy, driving down total cost of ownership and improving efficiency.

vLLM Semantic Router

Routing directs queries to the best model.

›The vLLM Semantic Router gives organizations a mechanism for navigating the transition off a single large model.
›It directs inference requests to purpose-trained open-weight models, such as one tuned for physics and another for history, based on each query.
›This approach improves accuracy while lowering cost.

Watt said the router ensures inference requests always go to the highest-performing model for the task. He also described open source as giving organizations all the ingredients to build a custom solution.
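The routing idea can be sketched in a few lines. The article does not describe how the vLLM Semantic Router classifies queries, so this toy version swaps in simple keyword overlap as a stand-in; the model names, keyword sets, and the route() function are all hypothetical and are not the project's API. It only shows the general pattern of mapping each query to a specialized model, with a general-purpose fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str                 # hypothetical model name
    keywords: frozenset[str]   # stand-in signal for the query's topic


# Hypothetical routes mirroring the article's physics/history example.
ROUTES = [
    Route("physics-tuned-model", frozenset({"quantum", "force", "energy", "velocity", "relativity"})),
    Route("history-tuned-model", frozenset({"empire", "war", "century", "dynasty", "revolution"})),
]
DEFAULT_MODEL = "general-model"  # fallback when no route matches


def route(query: str) -> str:
    """Pick the route whose keywords overlap most with the query (ties go to the first route)."""
    tokens = {t.strip(".,?!").lower() for t in query.split()}
    best_model, best_score = DEFAULT_MODEL, 0
    for r in ROUTES:
        score = len(tokens & r.keywords)
        if score > best_score:
            best_model, best_score = r.model, score
    return best_model


if __name__ == "__main__":
    for q in ["What is the energy of a photon?",
              "Which empire fell in the fifth century?",
              "Tell me a joke"]:
        print(q, "->", route(q))
```

A production router would replace the keyword match with a trained classifier or embedding similarity, but the dispatch structure stays the same: classify, pick a model, fall back when confidence is low.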

Frequently Asked Questions

What is the costly exit problem?

Red Hat argues that starting on a frontier model provider is easy, but as inference scales, token costs mount and leaving the provider becomes difficult and expensive.

What does Stephen Watt advise?

He advises starting on a frontier model from a provider such as OpenAI or Anthropic, then leaving once token usage reaches a scale where staying becomes too expensive.

What is horizontal cloud?

It is one shared foundation for running workloads across the enterprise, a single layer spanning storage, compute, and management.

What results did One New Zealand Group report?

On a horizontal telco cloud platform built on Red Hat OpenShift, the operator cut delivery time by 40% and reduced operational costs by 30 to 45%.

What does the vLLM Semantic Router do?

It directs inference requests to purpose-trained open-weight models based on the query, such as one tuned for physics and another for history, improving accuracy while lowering cost.

Red Hat positions horizontal cloud and inference routing as the escape hatch from costly frontier-model dependence.