IBM Research and the University of Illinois Urbana-Champaign are collaborating to make artificial intelligence more accessible by addressing the fragmented software ecosystem for building AI platforms. Carlos Costa, an IBM Distinguished Engineer, leads efforts to bridge large-scale, high-performance distributed computing and modern cloud platforms through open-source projects for scaling AI. Dr. Costa has led and contributed to many projects in the field, including co-leading the creation and development of the llm-d project for scaling AI inference, leading the CodeFlare project for data and AI pipelines, and contributing to the Ray framework ecosystem. The IBM-Illinois Discovery Accelerator Institute partnership demonstrates how industry-academia collaboration advances generative AI, machine learning infrastructure, and distributed computing systems.
Written by Cassandra Smith
Carlos Costa
Research through the partnership between IBM and The Grainger College of Engineering at the University of Illinois Urbana-Champaign is addressing the challenge of making artificial intelligence accessible and cost-effective by repairing a fragmented, expensive ecosystem of AI platforms.
Carlos Costa, a Distinguished Engineer at IBM Research and technical lead for AI platforms focusing on distributed inference, has made bridging the gap between traditional high-performance computing and modern cloud practices his mission. Coming from a background in building some of the world's largest supercomputers, Costa observed a significant disconnect between two camps: HPC experts focused on hand-tuned, low-level optimization, and a newer generation leveraging cloud computing abstractions for simpler scaling and greater productivity.
"I always felt like there was a very big opportunity to bridge these domains: How I could leverage more of the cloud native stack into high-performance distributed computing?" he said. He built middleware and runtimes to enable data analytics on the high-scale supercomputers he had previously helped to develop, eventually finding himself at the intersection of AI during the early days of deep learning.
The power of open standards
Costa credits much of cloud computing's value to common abstractions and standards not controlled by any single entity. He points to Kubernetes, which, after being donated by Google, became a community-driven technology and the de facto control plane for the cloud, as an example of how open standards allow multiple players to build value on top of a common foundation.
He approached the AI stack with the same spirit but found it more fragmented because of early dominance by single players and the emergence of specialized accelerators. This fragmentation led Costa to launch several projects.
Building community through open source
Costa engaged early with Ray, a popular open-source project creating an agnostic runtime for distributing Python-based applications. He also created the CodeFlare project, an effort initiated at IBM Research and later joined by Red Hat, to scale data and AI pipelines. While CodeFlare gained traction, it proved challenging to align interests and attract as many participants as hoped, prompting Costa to look for more fundamental components that could rally broader support.
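To give a sense of the kind of workload distribution Ray provides, here is a minimal, illustrative sketch rather than code from Costa's projects: a plain Python function becomes a parallel task with a single decorator.

import ray

ray.init()  # start a local Ray runtime, or connect to an existing cluster

@ray.remote
def square(x):
    # Ray schedules each call as a task on whichever worker has capacity
    return x * x

# launch eight tasks in parallel and collect the results
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]

The same decorator-based model runs unchanged on a laptop or a cluster, which is part of what makes Ray attractive as an agnostic runtime for distributed Python.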
The emergence of generative AI provided that inflection point. Costa noted that the transformer architecture for large language models was the fundamental innovation behind modern generative AI systems like ChatGPT. At IBM, which was already training large models, it became clear that serving those models would present significant challenges, especially in terms of scalability.
Scaling inference with llm-d
Costa co-created llm-d in a joint effort initially between IBM and Red Hat to address the problem of scaling inference: serving models to many users by managing replicas, resource allocation, and traffic routing specific to generative AI workloads. The project combined IBM's research on inference routing, cache management, and resource allocation with practical application.
The partnership with Red Hat proved pivotal. Google joined as a core contributor, and the project launched with a total of 12 founding members, including Nvidia, CoreWeave, AMD, Intel and others. Together, they developed a hardware-agnostic, common control plane for scaling and serving LLMs in production.
"We don't see this in the industry very often... when you get all the major players joining and contributing to the same project, and especially when you have competitors, like AMD and Nvidia, joining a common initiative," Costa said. The community traction has been significant, and sparked numerous collaborations and joint explorations, from startups and academic groups to large-scale enterprises.
Expanding horizons through collaboration
Partnership, particularly with academia, has been a driving theme in Costa's work, providing a venue for innovation, experimentation, and contribution to the broader community.
Costa said the partnership between IBM and the university through the IBM-Illinois Discovery Accelerator Institute has expanded his research into new domains. Projects related to Ray and CodeFlare have been applied to scientific fields, complementing IBM's business focus with areas more aligned with the university's mission.
"In the time we've been working with UIUC, there have been a lot of joint projects that helped us to find new areas to explore," Costa said. "We are only getting started. There's a lot of skill within the Institute that can help us expand even more."