New Server Hopes to Break Through AI's "Memory Wall"
Majestic Labs is developing a new AI server called Prometheus, which aims to overcome the memory limitations faced by large language models (LLMs). With up to 128 terabytes of memory and a unique architecture that prioritizes DRAM, Prometheus seeks to enhance AI inference performance significantly.
Key Takeaways
- Prometheus features up to 128 terabytes of memory, significantly surpassing Nvidia's DGX B300 server.
- The server's architecture uses a proprietary memory interface with miniature copper cables to improve memory access.
- Majestic Labs' Ignite AI processing unit combines ARM and RISC-V cores for efficient LLM processing.
- Prometheus is designed to support existing AI frameworks like PyTorch and Triton without requiring code changes.
- The server's design is modular, allowing for customization based on memory needs.
Stats & Key Facts
- 128 terabytes of memory in Prometheus
- 25.6 terabytes per second memory bandwidth
- 120 kilowatts power draw per server rack

The Challenge of Memory in AI
Memory constraints are a significant barrier to the performance of large language models.
- LLM token generation is a memory-bound task, limiting text output rates.
- As model sizes increase, the memory bottleneck becomes more pronounced.
Large language models (LLMs) face a critical challenge known as the "memory wall": the rate at which a model can generate text is limited by how quickly its data can be read from memory, not only by how fast the processor can compute. As models grow, this bottleneck becomes more severe and drags down overall performance.
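A rough sketch of why token generation is memory-bound: if each generated token requires reading every model weight from memory once, the decode rate for a single sequence cannot exceed memory bandwidth divided by the size of the weights. The 25.6 terabytes per second figure is the bandwidth quoted for Prometheus; the 1-trillion-parameter model, the 1 byte per parameter, and the function name are illustrative assumptions. The estimate ignores KV-cache traffic, batching, and compute limits.

```python
TB = 1e12  # decimal terabyte, in bytes (assumption: the quoted figures are decimal)


def max_decode_tokens_per_second(bandwidth_bytes_per_s: float,
                                 num_params: float,
                                 bytes_per_param: float) -> float:
    """Upper bound on single-sequence decode rate when weight reads dominate.

    Assumes every weight is streamed from memory once per generated token.
    """
    weight_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical model: 1 trillion parameters stored at 1 byte each.
bound = max_decode_tokens_per_second(25.6 * TB, 1e12, 1.0)
print(f"{bound:.1f} tokens/s")  # prints "25.6 tokens/s"
```

Under these assumptions, doubling the model size halves the bound, which is the sense in which the bottleneck worsens as models grow.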
Introducing Prometheus: A New AI Server
Majestic Labs is addressing the memory wall with its innovative server design.
- Prometheus offers up to 128 terabytes of memory, far exceeding competitors.
- The server's design focuses on a DRAM-centric architecture.
Majestic Labs is developing the Prometheus server, which aims to break through the memory wall that limits LLM performance. With up to 128 terabytes of memory, Prometheus is set to provide a significant advantage over existing solutions like Nvidia's DGX B300 server. This drastic increase in memory is expected to enhance the efficiency of LLM inference.
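To put the capacity figure in perspective, the sketch below estimates how many model parameters would fit in 128 terabytes at a few common storage precisions. It assumes decimal terabytes and that the whole pool holds weights only; in practice the KV cache and activations take a share. The precision table and the function name are illustrative, not part of Majestic Labs' specification.

```python
# Bytes needed to store one parameter at each precision (illustrative set).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def max_params(capacity_bytes: float, dtype: str) -> float:
    """Number of parameters that fit if the entire pool holds weights only."""
    return capacity_bytes / BYTES_PER_PARAM[dtype]


capacity = 128e12  # 128 terabytes, assuming decimal units
for dtype in BYTES_PER_PARAM:
    trillions = max_params(capacity, dtype) / 1e12
    print(f"{dtype}: about {trillions:.0f} trillion parameters")
```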
Innovative Memory Architecture
The architecture of Prometheus sets it apart from traditional AI servers.
- Majestic Labs utilizes LPDDR6 DRAM in a unified architecture.
- A proprietary memory interface allows for greater memory placement flexibility.
Prometheus employs a unique architecture that focuses on dynamic random access memory (DRAM), specifically LPDDR6. Unlike Nvidia's servers, which rely on high-bandwidth memory (HBM) for model weights, Prometheus's design is centered around DRAM, enabling larger memory pools. The proprietary memory interface, constructed with miniature copper cables, enhances memory access over longer distances, allowing for more extensive memory configurations.
AI Acceleration with Ignite
Prometheus features a custom processing unit to enhance AI performance.
- The Ignite chip combines ARM and RISC-V cores for optimized processing.
- It supports multiple aspects of LLM inference on a single die.
To complement its impressive memory capabilities, Prometheus includes the Ignite AI processing unit, which serves as the server's compute engine. This custom chip integrates both ARM application cores and RISC-V vector and tensor cores, allowing for efficient processing of LLM tasks without the need for inter-processor communication. This design aims to streamline the inference process and improve overall performance.
Compatibility and User Adoption
Majestic Labs is focused on ensuring easy integration for users.
- Prometheus supports popular AI frameworks like PyTorch and Triton.
- Existing models can run without code modifications.
Majestic Labs recognizes the importance of software compatibility in the adoption of new hardware. Prometheus is designed to work seamlessly with established AI frameworks such as PyTorch, vLLM, and OpenAI's Triton. This means that users can run their existing models on Prometheus without needing to modify their code, reducing friction and encouraging adoption.
Server Design and Power Management
Prometheus is built for efficiency and scalability.
- The server is Open Compute Project-compliant.
- Modular memory design allows for customization based on user needs.
The Prometheus server complies with the standards of the Open Compute Project (OCP), which publishes open specifications for data-center hardware. A single rack holds up to four servers and draws up to 120 kilowatts. The memory is modular, so users can size each server's memory capacity to their specific requirements, making it a flexible solution for various AI applications.
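As a quick arithmetic check on the rack figures, the sketch below divides the rack's power draw across its servers. It assumes a fully populated rack of four servers and that the 120 kilowatts is shared evenly among them with nothing reserved for networking or cooling equipment; neither assumption is confirmed by Majestic Labs, and the function name is illustrative.

```python
def per_server_power_kw(rack_power_kw: float, servers_per_rack: int) -> float:
    """Average power per server, assuming the rack budget is split evenly."""
    if servers_per_rack <= 0:
        raise ValueError("servers_per_rack must be positive")
    return rack_power_kw / servers_per_rack


# Up to 120 kW per rack and up to four servers per rack, as quoted above.
print(per_server_power_kw(120.0, 4))  # prints 30.0
```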
Frequently Asked Questions
What is the main advantage of the Prometheus server?
The main advantage of the Prometheus server is its ability to provide up to 128 terabytes of memory, significantly enhancing the performance of large language models.
How does Prometheus differ from Nvidia's AI servers?
Prometheus utilizes a DRAM-centric architecture and a proprietary memory interface, allowing for greater memory capacity and flexibility compared to Nvidia's high-bandwidth memory systems.
What types of AI frameworks does Prometheus support?
Prometheus supports popular AI frameworks such as PyTorch, vLLM, and OpenAI's Triton, enabling users to run existing models without code modifications.
What is the Ignite chip used for in Prometheus?
The Ignite chip serves as the compute engine for Prometheus, combining ARM and RISC-V cores to efficiently handle LLM processing tasks.
How is power managed in the Prometheus server?
Prometheus is designed to draw up to 120 kilowatts per rack and employs cold-plate liquid cooling to manage heat effectively.
Majestic Labs is poised to redefine AI server capabilities with Prometheus.