Kubernetes GPU Management Just Got a Major Upgrade

The New Stack · Updated 3 days ago

“As a low-level systems engineer, if you do your job right, no one knows you exist — but the minute you do your job wrong, everybody knows you exist.” That observation from Nvidia Distinguished Engineer Kevin Klues underlines why the Kubernetes open source community has been quietly building foundational features and abstractions that will shape how organizations run AI workloads for the next decade.

At KubeCon + CloudNativeCon North America 2025 in Atlanta, New Stack Founder and Publisher Alex Williams led a panel discussion with Klues and Jesse Butler, principal product manager at Amazon Web Services, about two developments that deserve more attention: dynamic resource allocation (DRA) and an upcoming workload abstraction that could transform multinode AI deployments.

DRA: GPUs That Work Like Storage

Dynamic resource allocation (DRA), which reached general availability in Kubernetes 1.34, solves the long-standing frustration around requesting GPU resources in Kubernetes. “The only knob you had in the old way of requesting access to resources was a simple count,” Klues said. “You could say, ‘I want two GPUs,’ but you couldn’t say what type of GPU. You couldn’t say how you might want that GPU to be configured once it’s given to you.”

DRA, which Butler called “one of the most elegant things I’ve ever seen,” borrows its conceptual model from persistent volumes and persistent volume claims — familiar abstractions that storage teams have used for years. The difference is that DRA works with any specialized hardware, not just storage, meaning that third-party vendors can now bring their own device drivers and make hardware accessible to Kubernetes users in standardized ways.
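To make the storage analogy concrete, here is a minimal sketch of the claim model under DRA's resource.k8s.io/v1 API, which went GA in Kubernetes 1.34. The driver name gpu.example.com, the device class, the model attribute, and the container image are hypothetical stand-ins for whatever a real vendor driver would publish; the shape mirrors the PersistentVolume/PersistentVolumeClaim pattern the panel describes, with a ResourceClaim playing the role of the PVC.

```yaml
# A DeviceClass groups devices published by a driver, much as a
# StorageClass groups volumes. "gpu.example.com" is a hypothetical
# driver name, not a real vendor driver.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"
---
# A ResourceClaim asks for devices by class, count, and attributes,
# rather than the bare count of the old device-plugin model.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: two-gpus
spec:
  devices:
    requests:
    - name: gpus
      exactly:
        deviceClassName: example-gpu
        count: 2                  # not just "how many" ...
        selectors:
        - cel:
            # ... but also "which kind": "model" is a hypothetical
            # example of vendor-published device metadata.
            expression: device.attributes["gpu.example.com"].model == "example-80gb"
---
# The pod references the claim by name; the scheduler only places it
# on a node where the claim can actually be satisfied.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpus
    resourceClaimName: two-gpus
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # hypothetical image
    resources:
      claims:
      - name: gpus
```

If every pod replica needs its own devices rather than a shared claim, a ResourceClaimTemplate can be referenced via resourceClaimTemplateName instead, so each pod gets a fresh claim stamped out for it.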

A New Workload Abstraction for Smart Scheduling

But DRA alone isn’t enough for complex AI deployments. Sometimes you need multiple pods across multiple nodes to all come online simultaneously or, conversely, not at all. That’s the problem a new Kubernetes abstraction (called, simply, “the workload abstraction”) aims to solve. “You want to be able to express things like, I can have some subset of these pods come up, but if I can’t get all of them, I don’t want any of them to come up,” Klues said. “And, at least today, you can’t really express that in the Kubernetes world.”

A basic implementation is slated for the Kubernetes 1.35 release on Dec. 17, though Klues emphasized there’s significant work ahead. The abstraction will let users define pod groupings with scheduling constraints and topology requirements, kind of like node selectors on steroids.
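The Workload API was still being shaped at the time of the panel, so there is no published schema to quote. Purely as an illustration of the all-or-nothing gang semantics Klues describes, a grouped workload might look something like the sketch below; every field name here is invented, and the real 1.35 API will almost certainly differ.

```yaml
# HYPOTHETICAL sketch of gang-scheduling semantics only. This is NOT
# the actual Kubernetes 1.35 Workload API; all field names below are
# invented for illustration.
apiVersion: example.k8s.io/v1alpha1
kind: Workload
metadata:
  name: distributed-training
spec:
  podGroups:
  - name: workers
    replicas: 8        # eight pods spread across multiple nodes ...
    minCount: 8        # ... scheduled all together, or not at all
    # "Node selectors on steroids": constrain the group as a whole,
    # not just individual pods, to a shared topology domain.
    topology:
      colocate: topology.kubernetes.io/zone
    template:
      spec:
        containers:
        - name: worker
          image: registry.example.com/trainer:latest  # hypothetical
```

Until the abstraction lands, operators typically approximate these semantics with external projects such as Kueue or the scheduler-plugins coscheduling plugin.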

“It’s going to shape the future of how all of this works for the next 10 years of Kubernetes,” Klues said, stressing that the Device Management Working Group, where these features take shape, strongly invites community participation.

For the full conversation — including discussion of agentic AI architectures, small language models, and why Unix philosophy still matters in the age of large language models — check out the complete interview.

Source: This article was originally published on The New Stack.
