KServe Joins CNCF To Standardize AI Model Serving on Kubernetes

The New Stack (2 weeks ago)

At KubeCon+CloudNativeCon North America last month, the Cloud Native Computing Foundation accepted the open source KServe software as an incubating project. KServe’s prominence in the cloud native space illustrates how much Kubernetes has become a bedrock for AI computing, offering a scalable open source platform for enterprises to run their own generative AI and predictive workloads.

“The rising complexity of modern AI workloads drives an urgent need for robust, standardized model serving platforms on Kubernetes,” said TOC sponsor Kevin Wang, in a statement. “Its focus on scalability, particularly multinode inference for large language models, is key to providing efficient serving and deployment solutions for cloud native AI infrastructure.”

The KServe development team will work through the CNCF Graduation Criteria with the goal of becoming “a fully abstracted, elastic inference platform where users solely focus on models and pre/post-processing while KServe handles the orchestration, scaling, resource management, and deployment,” according to the CNCF.

The Origins and Evolution of KServe

What does KServe do? It defines how a model is served within an organization, providing a single API for access. It “gives us a standard scalable way to run self-hosted models on-prem and it gives every model a stable internal endpoint that the gateway can talk to,” explained Bloomberg senior engineer for AI infrastructure Alexa Griffith in a presentation at KubeCon.

Google, IBM, Bloomberg, Nvidia and Seldon Technologies LLC collectively created KServe, launching it in 2019 under the Kubeflow project as “KFServing.” In September 2022, the project graduated from Kubeflow, rebranded as the standalone KServe, and was donated to the LF AI and Data Foundation. It was then submitted to the CNCF, which accepted it as an incubating project in September 2025.

The software was originally built for predictive inference, but was expanded for LLM-based generative AI usage when ChatGPT caught the public’s imagination. Every problem Bloomberg encountered running LLMs helped the team build support for generative AI work into KServe, Griffith said.

Although KServe was built for predictive inference, the project “created all these new features for generative AI” – Bloomberg’s Alexa Griffith

Understanding KServe’s Core Components

KServe actually has three components. The first is the namesake KServe Kubernetes controller, which reconciles KServe custom resource definitions (CRDs) that define ML resources and other Kubernetes objects. The InferenceService CRD manages predictive inference, and the LLMInferenceService CRD covers the GenAI use cases. The second, ModelMesh, is the management and routing layer for models, built to rapidly change out model use cases. And the third, the Open Inference Protocol, provides a standard way, via either HTTP or gRPC, to perform machine learning model inference across serving runtimes for different ML frameworks.
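Because every model answers on the same protocol, a client call looks identical regardless of the serving runtime behind it. The sketch below shows a minimal Open Inference Protocol (v2) REST request in Python; the hostname and model name are hypothetical placeholders for a deployed InferenceService.

```python
# Minimal Open Inference Protocol (v2) REST call to a KServe-served model.
# The hostname and model name below are hypothetical placeholders.
import requests

# KServe gives each model a stable, predictable path:
#   http://<service-host>/v2/models/<model-name>/infer
url = "http://sklearn-iris.models.example.com/v2/models/sklearn-iris/infer"

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],          # two rows of four features
            "datatype": "FP32",
            "data": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6],
        }
    ]
}

resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["outputs"])  # v2 responses carry a list of output tensors
```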
“On the technical front, KServe’s rich integration with Envoy, Knative, and the Gateway API anchors it powerfully within the CNCF ecosystem,” explained Faseela K, CNCF Technical Oversight Committee sponsor, in a statement. “The community’s welcoming nature has made it easy for new contributors and adopters to get involved, which speaks volumes about its health and inclusiveness.”

Key Features for Predictive and Generative AI

For predictive modeling jobs, KServe offers the following (a sketch of a declared service follows the list):

- Multi-Framework support, spanning TensorFlow, Python’s PyTorch and scikit-learn, XGBoost, ONNX, and others.
- Intelligent Routing that understands the routing requirements for predictor, transformer, and explainer components, with automatic traffic management.
- Advanced Deployment patterns for canary rollouts, inference pipelines, and ensembles with InferenceGraph.
- Autoscaling, including scale-to-zero capabilities.
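Those capabilities are declared on the InferenceService resource itself rather than wired up by hand. Below is a minimal sketch using the KServe Python SDK to create a scikit-learn predictor with scale-to-zero enabled; the name, namespace, and storage URI are hypothetical placeholders, and the sketch assumes the SDK’s v1beta1 classes.

```python
# Declare an InferenceService with scale-to-zero autoscaling.
# Name, namespace, and storage_uri are hypothetical placeholders;
# assumes the kserve Python SDK's v1beta1 API (pip install kserve).
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="models"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            min_replicas=0,  # scale-to-zero: idle models release their pods
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://example-bucket/models/sklearn/iris"
            ),
        )
    ),
)

KServeClient().create(isvc)  # the KServe controller reconciles the rest
```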
And for generative AI, the software provides the following (an example call appears after the list):

- LLM-Optimized: OpenAI-compatible inference protocol for seamless integration with large language models.
- GPU Acceleration: High-performance serving with GPU support and optimized memory management for large models.
- Model Caching: Intelligent model caching to reduce loading times and improve response latency for frequently used models.
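Because the generative path speaks the OpenAI-compatible protocol, a stock OpenAI client can point at a self-hosted model unchanged. A minimal sketch, assuming a hypothetical gateway route and model name:

```python
# Talk to a KServe-hosted LLM through its OpenAI-compatible endpoint.
# base_url and model are hypothetical placeholders for a real deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm-gateway.models.example.com/openai/v1",  # assumed route
    api_key="unused",  # self-hosted endpoints typically ignore the key
)

completion = client.chat.completions.create(
    model="llama-3-8b-instruct",
    messages=[{"role": "user", "content": "What does KServe do?"}],
)
print(completion.choices[0].message.content)
```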
At present, the project has 19 maintainers, along with more than 300 contributors. Over 30 companies have adopted the technology, some as contributors and others simply as users, and the project has gathered over 4,600 GitHub stars.

Source: This article was originally published on The New Stack
