The New Stack | DevOps, Open Source, and Cloud Native News (https://thenewstack.io/blog/)

Choosing the Right Red Hat AI Solution: RHEL AI vs. OpenShift AI
https://thenewstack.io/choosing-the-right-red-hat-ai-solution-rhel-ai-vs-openshift-ai/

Some projects need minimal overhead and fast results. Others require large-scale orchestration and deep integration. For your project, the ideal AI setup will fit your immediate needs without hampering your future ambitions.

Red Hat addresses these challenges with two paths: Red Hat Enterprise Linux (RHEL) AI for simpler deployments and OpenShift AI for scaling complex environments. RHEL AI integrates with existing workflows and aims at smaller workloads, while OpenShift AI enables advanced pipelines and cluster-level coordination for bigger projects. Both solutions align with different stages of an AI journey.

This guide will unpack the strengths of each and help you decide which is best for your project and when to deploy it.

Comparing RHEL AI and OpenShift AI

For organizations evaluating Red Hat’s AI solutions, this list highlights the core differences between RHEL AI and OpenShift AI in terms of deployment, scalability and automation.

  • Deployment model
    • RHEL AI: Single production server
    • OpenShift AI: Distributed Kubernetes across hybrid clouds
  • Complexity and setup
    • RHEL AI: Straightforward
    • OpenShift AI: Advanced capabilities, more robust features
  • Scalability
    • RHEL AI: Good for smaller AI projects
    • OpenShift AI: Designed for mid- to large-scale AI
  • Machine learning operations (MLOps) automation
    • RHEL AI: Integrated but simpler
    • OpenShift AI: Comprehensive pipeline automation
  • Open source tools
    • RHEL AI: Granite LLMs, InstructLab, vLLM, Docling
    • OpenShift AI: Granite LLMs, InstructLab, Open Data Hub, vLLM, KubeFlow, partner integrations
  • Ideal use cases
    • RHEL AI: On-premises, smaller scale
    • OpenShift AI: Hybrid cloud, enterprise AI at scale
  • Cloud and partner integration
    • RHEL AI: Limited support from partners
    • OpenShift AI: Extensive cloud and partner ecosystem integration

With this comparison in mind, let’s take a closer look at each solution, starting with RHEL AI.

RHEL AI: A Foundation for Individual Servers

RHEL AI is an easy-to-deploy, server-centric AI platform that runs efficiently on standalone servers (on premises or in the cloud) for organizations seeking a straightforward generative AI (GenAI) solution. It removes the overhead of large-scale orchestration, making it ideal for teams that want to focus on AI development, and on maintaining data privacy and security, without managing distributed infrastructure.

Some of its key benefits include:

  • IBM Granite LLMs, which allow for rapid prototyping with pre-trained AI models.
  • InstructLab, which simplifies model alignment and fine-tuning with minimal setup.
  • Ability to run pre-trained AI models locally without requiring dynamic scaling.
  • A simpler architecture that reduces maintenance and lowers consumption of IT resources.

Smaller teams, research institutions and businesses with strict data governance policies can benefit from RHEL AI. For many organizations, especially those in the early stages of AI adoption, this lightweight yet capable platform is more than enough to get started.

Why Starting With RHEL AI Makes Sense

The best approach is often to start small, and RHEL AI allows this with its easy setup, lower expense and incremental adoption of AI. It’s good for teams exploring AI without committing to complex platforms. Although powerful, Kubernetes orchestration can add unnecessary complexity early on. This makes RHEL AI a practical choice before scaling up.

Aside from its ease of use, RHEL AI also offers flexibility. It accommodates open source AI frameworks, allowing you to test AI models without being held hostage to vendors. This makes it a good fit for research teams and startups that must prove AI use cases prior to scaling.

However, while RHEL AI is effective for smaller projects, it lacks features for large-scale AI operations. Some of its limitations are:

  • No distributed, multicluster AI training — it’s not suitable for organizations handling complex, high-volume workloads.
  • Limited automation — it lacks the advanced MLOps tools available in OpenShift AI.

Organizations with long-term AI ambitions may start with RHEL AI but should plan for a transition to a more scalable solution as workloads expand.

OpenShift AI: Built for Scalable, Enterprise-Grade AI

OpenShift AI provides a platform for building, training, deploying and monitoring predictive and generative AI models. It offers orchestration, automation and scalability for large-scale AI workloads on multiple hybrid cloud environments. It also includes Kubernetes-native scalability, making it capable of effectively scheduling and carrying out resource allocation for demanding AI applications.

OpenShift AI offers a number of advantages, including:

  • Dynamically scaling AI workloads across distributed infrastructure.
  • Automating AI model training, deployment and monitoring through data science pipelines, making it easier to run operations.
  • Supporting a unified platform for AI model management from on-premises to the cloud with minimal manual overhead.
  • Conforming to security and compliance practices including role-based access control (RBAC), trustworthy AI for bias detection and guardrails to protect organizations from harm.
  • Adding custom cluster images to enhance collaboration on notebooks, plus a model registry to track and share data science projects.

Organizations with multiple models or medium to large AI workloads need a platform that offers scalability, security and compliance. OpenShift AI is good for businesses looking to build ML pipelines and those with firm regulatory requirements, such as:

  • Large public-sector organizations that run multiple AI applications on hybrid cloud platforms.
  • Financial institutions where AI security, compliance and risk management are crucial.
  • Health and biotech firms that rely on AI for drug development and medical diagnostics.

For companies focusing more on high availability and resilient AI operations, OpenShift AI is the better platform for scalable, production-grade AI deployments.

Not Every AI Project Needs This Much Overhead

While OpenShift AI offers quite a number of benefits, including scalability and orchestration, it comes with a steep learning curve and infrastructure requirements that not every organization is prepared to take on. Here are some tradeoffs associated with OpenShift AI:

  • Managing Kubernetes-based AI workloads requires specialized expertise.
  • Higher operational complexity means that advanced features require more setup, maintenance and monitoring.
  • Robust automation and multicloud features typically demand greater investments in infrastructure and resources.

The overhead may outweigh the benefits for smaller teams or those just starting with AI. However, for companies concentrating on scalability, automation and resilience, OpenShift AI remains a strategic long-term option.

For example, a retail company managing AI-driven recommendations across multicloud infrastructure would benefit from OpenShift AI’s model monitoring and performance optimization to achieve a cost-effective solution for AI workloads at scale. Meanwhile, a research institution with strict data privacy requirements may choose RHEL AI for its lightweight, on-premises deployment, avoiding cloud complexity.

Which AI Solution Is Right for You?

Selecting between RHEL AI and OpenShift AI depends on your AI development strategy and scalability needs.

  • RHEL AI is ideal for server-centric AI workloads, individual deployments and simpler AI use cases.
  • OpenShift AI thrives in multicloud environments, offering enterprise-grade AI orchestration, MLOps automation and large-scale AI model training and inferencing.

For Red Hat shops, a balanced strategy involves starting with RHEL AI for experimental or small-scale AI models. Organizations can then transition to OpenShift AI when AI workloads demand hybrid cloud infrastructure, scalable AI and enterprise support.

Making the right AI platform choice improves adoption and scalability as your needs evolve. The key to success is planning ahead for AI expansion.

Five Critical Shifts for Cloud Native at a Crossroads
https://thenewstack.io/five-critical-shifts-for-cloud-native-at-a-crossroads/

As enterprises run ever-more-complex workloads on Kubernetes, they’re facing a new set of challenges: how to ensure security requirements are met, budgets are deployed efficiently and operational complexity stays manageable. Many are finding that realizing the full potential of their cloud native investments now requires fundamental changes to the way they approach infrastructure, starting with the operating system itself.

With technical leaders evaluating cloud native strategies for the next era, I see five interconnected forces reshaping what’s possible for how cloud native infrastructure is built, secured and operated.

Purpose-Built OSes as a More Secure Foundation

General-purpose operating systems can become a Kubernetes bottleneck at scale. Traditional OS environments are designed for a wide range of use cases, carry unnecessary overhead and bring security risks when running cloud native workloads. Instead, enterprises are increasingly turning to specialized operating systems that are purpose-built for Kubernetes environments, finding that this shift has advantages across security, reliability and operational efficiency.

The security implications are particularly compelling. While traditional operating systems leave many potential entry points exposed, specialized cloud native operating systems take a radically different approach. By designing the OS specifically for container workloads, organizations can dramatically reduce their attack surface with security controls that align precisely with Kubernetes security best practices.

More granularly, these specialized systems include built-in automated network-level encryption, using technologies like WireGuard and KubeSpan to secure cluster communications with lean, efficient cryptography. API-based management replaces traditional interfaces like Bash and SSH, enforcing consistency with Kubernetes’ declarative model while eliminating many of the common sources of human error. Communications between components are secured through Mutual TLS (mTLS) encryption, ensuring that only properly authenticated services can interact within the cluster.

For those ready to modernize their cloud native infrastructure, the criteria for selecting these specialized operating systems should include alignment with CIS Benchmarks for container security and, for Linux distributions, adherence to Kernel Self-Protection Project (KSPP) guidelines. These standards ensure that security is engineered into the foundation of the operating system, rather than added as an afterthought.

Moving Kubernetes Beyond Public Cloud Dependencies

Cost-conscious organizations (Is there another kind?) are discovering that running Kubernetes workloads solely in public clouds isn’t always the best approach. Momentum has continued to grow toward pursuing hybrid and on-premises strategies for greater control over both costs and capabilities. This shift isn’t just about cost savings; it’s about building infrastructure precisely tailored to specific workload requirements, whether that’s ultra-low latency for real-time applications or specialized configurations for AI/machine learning workloads.

The key to making this transition successful lies in the infrastructure stack. Organizations are selecting operating systems and tools specifically designed for bare metal Kubernetes deployments, enabling them to achieve cloud-like flexibility without the traditional overhead of public cloud environments. These purpose-built platforms improve operational efficiency while maintaining the portability that cloud native architectures promise. The result is true infrastructure flexibility: Workloads can move seamlessly between on-premises, edge and cloud environments as business needs dictate, avoiding vendor lock-in while optimizing for specific performance and cost requirements.

Declarative Principles as the New Infrastructure Standard

Kubernetes introduced enterprises to the power of declarative configurations. Now that approach is expanding beyond container orchestration to reshape the entire infrastructure stack. Forward-thinking organizations are applying declarative principles to operating systems, networking and security, creating truly cloud native environments where infrastructure itself is treated as code.

Shifting toward declarative operations goes beyond technical elegance. The strategy yields tangible business benefits by reducing operational complexity and human error. When infrastructure components follow the same declarative model as Kubernetes, teams can manage complex environments more consistently and reliably. Organizations are finding that adopting lightweight, purpose-built operating systems designed for declarative management amplifies these benefits, further simplifying operations while improving security and performance.

The result is a more cohesive cloud native stack where every layer — from the operating system to application deployment — follows consistent principles of Infrastructure as Code (IaC). This approach is freeing technical teams from routine maintenance tasks, allowing them to focus on innovations that drive business value.

Cloud Native Architecture as a Sustainability Driver

Compute infrastructure’s environmental impact has become impossible to ignore, particularly as organizations scale their cloud native workloads and AI initiatives. In response, teams are discovering that the principles that make cloud native architectures efficient (namely, minimalism, automation and precise resource allocation) also make them more environmentally sustainable.

I’ve seen more organizations setting aggressive efficiency targets for their Kubernetes environments, recognizing that optimized infrastructure delivers both environmental and economic benefits. This optimization starts at the OS level, where lightweight, purpose-built distributions can significantly reduce resource consumption compared to general-purpose alternatives. When combined with intelligent workload scheduling and automated scaling, these optimized environments can then improve infrastructure utilization while reducing energy consumption.

The sustainability benefits of this approach extend beyond energy efficiency. Streamlined, container-optimized operating systems require fewer compute resources to operate, enabling organizations to run more workloads on existing hardware. This not only reduces operational costs but also minimizes the environmental impact of hardware procurement and disposal.

The Edge as the Next Evolution

The divide between cloud and edge computing is rapidly dissolving as organizations push Kubernetes deployments closer to where data is generated and consumed. This shift is more than reducing latency; it’s about applying cloud native principles to solve complex distributed computing challenges. Organizations are now deploying Kubernetes at the edge (even in single-node clusters) to bring consistency and simplified operations to their most remote infrastructure.

But success at the edge demands infrastructure designed for distributed operations. The same principles I’ve discussed — specialized operating systems, declarative management and resource efficiency — become even more critical in edge environments where physical access is limited and reliability is paramount. Teams are finding that lightweight, security-focused operating systems designed for Kubernetes workloads are particularly well-suited for edge deployments, offering automated updates, minimal attack surfaces and efficient resource utilization.

This convergence of edge and cloud native technologies marks a significant evolution in enterprise infrastructure. By extending Kubernetes-based operations to the edge, organizations can maintain consistent practices across their entire infrastructure footprint while optimizing for local computing needs.

Act Today to Build Tomorrow’s Cloud Native Infrastructure

These five trends signal a fundamental shift in the way enterprises approach cloud native infrastructure. The time to act is now. Organizations that move decisively to modernize their infrastructure stack will be better positioned to scale their cloud native operations while maintaining security, controlling costs, and driving innovation.

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon Europe in London on April 1–4.

Pagoda: A Web Development Starter Kit for Go Programmers
https://thenewstack.io/pagoda-a-web-development-starter-kit-for-go-programmers/

In 2020, Mike Stefanello fell in love with Go.

“It’s the first time I ever said that I loved a programming language or even a technology or tool, but it was that kind of reaction — I just really, really fell in love with it,” the software engineer said. “I knew I wanted to work with it.”

Stefanello also did a lot of web development in the form of personal projects and at that time, most web development was PHP-based. One day, he saw a Hacker News post asking developers what their web stack of choice was for personal projects.

“I sat there just thinking, ‘I actually don’t have an answer to that question,’” he said. “I don’t want to go back to all the stuff in PHP that I was using previously. I love Go. I’m very obsessive and I couldn’t not have an answer to that question.”

Out of that frustration and exploration, Pagoda was born.

“It was more about the love of Go and the love of web development,” he said. “It wasn’t like I was thinking about which language should I use to get into web — I started in web development.”

Pagoda: A Starter Kit for Go

Pagoda is not a framework — Stefanello stressed this repeatedly. It’s a starter kit for web development that provides frontend and backend libraries glued together by Go code. The Go code generates the HTML server-side to create the web pages.

The starter pack approach may seem strange to frontend developers who are used to a universe of JavaScript frameworks, but backend developers want to keep it simple if they have to work with the frontend, Stefanello told The New Stack.

“Most of us don’t want to have to switch languages, especially if you’re not using JavaScript in the backend,” he said. “I can understand if you are, then you’re used to that ecosystem. But if you’re not used to it — and I’m really not, I haven’t really gotten hands-on with JavaScript in a very long time — it feels like a bit of a chaotic ecosystem, and it’s really hard to grasp.”

But that left him with the quandary of how to build a modern, sleek web app without having to open up JavaScript and commit to big frameworks, such as React and Vue, he said.

“Nothing at all against them,” he added. “They’re all amazing projects, obviously, but it just comes down to personal choice. If you’re a backend developer, you want to be focusing on the frontend as little as possible.”

But why not make Pagoda a Go framework? Because the Go community doesn’t seem interested in that, he said. Whereas in PHP, there’s a mega singular framework — Laravel — there’s not really an equivalent in Go, he said.

A sample home page made in Pagoda by Mike Stefanello. The functional website is included in the repo.

“If you’re new to Go, it’s confusing: Why isn’t there one and why don’t people like it?” Stefanello said. “But I think the more you use Go, the more you begin to appreciate that.”

Pushup is an exception in that it’s a Go web development framework. There is also GoBuffalo, which notes on its site that it could be a framework, but instead describes itself as a “Go web-development ecosystem” that’s “mostly an ecosystem of Go and JavaScript libraries curated to fit together.”

Stefanello made the choice early on not to create a framework because he didn’t like the idea of being bound by a singular, mega framework.

“They tend to be overly bloated,” he said. “They tend to really force patterns where you have to do it.”

Also, developers tend to outgrow frameworks but then are locked in by the framework, he added. And then there’s the risk that the framework authors will stop maintaining it.

By comparison, starter kits let web developers quickly jumpstart web development without the drawbacks of a full framework, he added.

“The nice thing about the starter kit is it solves all those problems,” he said. “There are no strict patterns to follow. I provide some ideas and patterns and I glue things together just to make things easy and kind of get you up and running. But none of that is forced. There’s nothing strict about it.”

Even if he stops maintaining Pagoda, web developers have what they need to continue.

“I’m basically just doing a lot of the work for you, and then you can take over from there,” he said. “You don’t have to worry about if I stop maintaining it because as soon as you copy the starter kit, it’s yours — 100% of it’s yours. ”

The Pagoda Frontend

Pagoda includes three libraries for the frontend:

  1. HTMX, which provides access to AJAX, CSS transitions, WebSockets and server-sent events directly in HTML. “The beauty of something like HTMX is that it enables you to have that Ajax-type behavior where you don’t have to do full-page reloads,” he said. “It’s the kind of functionality you expect or you see a lot on the JavaScript-driven single-page apps. You can use as much or as little as you want but without having to write a line of JavaScript, you can take regular HTML and create really good interactivity on your site.”
  2. Alpine.js, which Stefanello said was much like jQuery, but for the modern web. It’s a minimal tool for composing behavior directly in markup. “What’s really nice about it, too, is that it all works inside of — for the most part — your HTML, so you don’t even have to actually write standalone JavaScript,” he said. “You can just add a bunch of Alpine tags and some declarations and tell the HTML what to do. And it’s really quite remarkable how far you can get with that. It’s a project that I really enjoy using.”
  3. Bulma, an easy-to-use CSS framework. “It’s just really easy — you just throw some classes and you have a pretty decent-looking UI,” he said.

If you don’t like those libraries, you can switch them out. For instance, Tailwind could replace Bulma, he said, and it can be done in minutes.

The Backend of Pagoda

On the backend, Pagoda includes:

  1. Echo: A high-performance, extensible, minimalist Go web framework.
  2. Ent: A powerful ORM for modeling and querying data.
  3. Gomponents: HTML components written in pure Go. They render to HTML 5, and make it easy to build reusable components.

Again, Stefanello stressed that any of these libraries could be replaced — in fact, this month, he replaced Go templates with Gomponents.

“If you ever go through any of the Go communities, whether it’s Reddit or Slack or Discord or whatever, there’s a lot of frustration with the template — especially when it comes to HTML, they do leave a lot to be desired,” he said.

He enumerated the problems: They aren’t type-safe; if the code has an error, you can’t tell until you run the application; it’s difficult to pass data between different templates; and finally, it’s tricky to get them to compile in a way that’s easy to use within a web application.

Gomponents is a library created by Markus Wüstenberg, and it renders to HTML 5, making it easy to build reusable components.

“That was probably the single biggest code change that I made in the project […] since the project started,” he said. “This was a big fundamental change, to go away from the Go standard library templates to using a third-party solution.”

SQLite provides the primary data storage as well as persistent, background task queues, according to the documentation. However, it can be swapped out if a developer prefers to use Postgres or Redis.

The project has even been forked to create GoShip, which is a Go plus HTMX boilerplate with all the essentials for SaaS, AI tools or web apps.

CIQ Previews a Security-Hardened Enterprise Linux
https://thenewstack.io/ciq-previews-a-security-hardened-enterprise-linux/

CIQ is best known as the founding company behind the CentOS-variant Rocky Linux. Recently, though, it’s been flexing its muscles in the enterprise Linux space; first when it started to offer a business support contract for Rocky Linux from CIQ (RLC) and now with a technical preview for Rocky Linux from CIQ – Hardened.

Rocky Linux from CIQ – Hardened is an Enterprise Linux designed to meet the most stringent security requirements. This enhanced operating system is tailored for mission-critical environments, offering robust security features to combat the increasing sophistication and volume of cyberthreats.

How, you ask? This security-first version comes with:

  1. System Level Hardening: This version minimizes zero-day and CVE risks by eliminating many potential attack surfaces and common exploit vectors. It includes code-level hardening that blocks commonly used exploit paths, reducing the risk of successful attacks.
  2. Advanced Threat Detection: It utilizes the Linux Kernel Runtime Guard (LKRG) to detect sophisticated intrusions that evade traditional security measures. This proactive approach helps identify and mitigate threats before they become major issues.
  3. Strong Access Controls: Featuring advanced password hashing, strict authentication policies, and hardened access controls. These measures enhance the security of user authentication and access to system resources.
  4. Accelerated Risk Mitigation: The system addresses security threats ahead of standard updates, significantly reducing exposure time. This ensures that organizations are protected from vulnerabilities more quickly than with traditional update cycles.
  5. Secure Supply Chain: All packages are validated and delivered via a secure supply chain, ensuring that the operating system is delivered securely and is always up to date.
  6. Proactive Security Approach: Unlike many distributions that focus on fixing individual CVEs, Rocky Linux from CIQ – Hardened aims to proactively mitigate entire classes of similar bugs that are not yet discovered or patched.
  7. Premium Support: It offers premium support from experienced Linux security experts, providing assistance in troubleshooting and addressing unique security needs.

In a statement, Alexander Peslyak (aka Solar Designer), lead for the Openwall project for two decades and now a CIQ employee, said: “While most distributions still fix individual CVEs one at a time, Rocky Linux from CIQ — Hardened will fix CVEs and also learn and introduce changes so it can proactively mitigate entire classes of similar bugs that are not yet discovered or patched.”

The business motivation for this distro, according to CIQ CEO Gregory Kurtzer, was driven by conversations with security-concerned IT executives. The goal is to provide a fortified software infrastructure that addresses vulnerabilities and enhances the security of enterprise applications and services.

The technical preview is already available for sign-up, with an official launch planned for March 20. Want to know more? A webinar discussing detailed features of Hardened is scheduled for March 19.

RamaLama Project Brings Containers and AI Together
https://thenewstack.io/ramalama-project-brings-containers-and-ai-together/

The RamaLama project stands at the intersection of AI and Linux containers and is designed to make it easier to develop and test AI models on developer desktops.

With the recent launch of RamaLama’s website and public invitation to contribute, I decided to catch up with two of the founders of the project, Eric Curtin and Dan Walsh. Dan and Eric previously worked together on the Podman container management tool, recently accepted as a Cloud Native Computing Foundation (CNCF) project.

How RamaLama Got Started

Scott McCarty: How did you get involved with RamaLama?

Eric Curtin, software engineer at Red Hat: RamaLama was a side project I was hacking on. We started playing around with LLaMA.cpp, making it easy to use with cloud native concepts. I’m also a LLaMA.cpp maintainer these days. I have a varied background in software.

Dan Walsh, senior distinguished engineer at Red Hat: I now work for the Red Hat AI team. For the last 15 years, I have been working on container technologies including the creation of Podman, now a CNCF project. For the last year or so, I’ve worked on bootable containers and this led to working on Red Hat Enterprise Linux AI (RHEL AI), which used bootable containers for AI tools. I also worked on the AI Lab Recipes, which used containers for running AI workloads. I worked with Eric a couple of years ago on a separate project, so we have kept in touch.

Scott: How and when did the RamaLama project get started?

Dan: Eric wrote up some scripts and was demonstrating his tools last summer, when I noticed the effort. I was concerned that the open source AI world was ignoring containers and was going to trap AI developers into specific laptop hardware and operating systems. And more importantly, exclude Linux and Kubernetes.

Eric: The initial goal of RamaLama was to make AI boring (easy to use) and use cloud native concepts. It was called podman-llm at the time. We had two main features planned back then: pull the AI accelerator runtime as a container and support multiple transport protocols (OCI, Hugging Face, Ollama). The diagram today in the README.md hasn’t really changed since.

Dan: I started suggesting changes like moving it to Python to make it easier for contributors and line up with most AI software. We renamed the project “RamaLama.” I also suggested we move the tools to the containers org on GitHub, where we had our first pull request merged on July 24, 2024.

Scott: Where did the name come from?

Eric: (laughs) I’ll leave that to Dan.

Dan: A lot of open AI content is using some form of Llama, spearheaded by Meta’s Llama2 AI model. We based some of the technology in RamaLama on Ollama, and the primary engine we use inside of the containers is LLaMA.cpp. So we wanted to somehow have a “llama” name. A silly song I recalled from when I was young was “Rama Lama Ding Dong,” so we picked the name RamaLama.

How RamaLama Works

Scott: What’s the advantage of using container images for AI models on the desktop?

Eric: We already use Open Container Image (OCI) as a distribution mechanism for things like application containers, bootc and AI runtimes. OCI registries are designed to transfer large data, and it’s a mature transport mechanism that’s already available in a lot of places.

Dan: Enterprises want to be able to store their AI content on their infrastructure. Many enterprises will not allow their software to pull directly from the internet. They will want to control the AI models used. They will want their models signed, versioned and with supply chain security data. They will want them to be orchestrated using tools like Kubernetes. Therefore being able to store AI models and AI content as OCI images and artifacts makes total sense.

Scott: How does RamaLama work?

Eric: RamaLama attempts to autodetect the primary accelerator in a system; it will pull an AI runtime based on this. Then it will use or pull a model based on the model name specified — for example, ramalama run granite3-moe — and then serve a model. That’s the most basic usage; there’s functionality for Kubernetes, Quadlet and many other features.

Dan: Another goal for RamaLama is to help developers get their AI applications into production. RamaLama makes it easy to convert an AI model from any transport into OCI content and then push the model to an OCI registry such as Docker Hub, Quay.io or Artifactory. RamaLama can not only serve models locally but will generate Quadlets and Kubernetes deployments to easily run the AI models in production.

Scott: Why is RamaLama important?

Dan: We make it easy for users to just install RamaLama and get an AI model up and running as a chatbot or to serve an AI-based service in a simple command, as opposed to the user having to download and install, and in some cases build, the AI tools before pulling a model to the system. One of the key ideas of RamaLama is to run the model within a container, protecting the user’s host machine from the model or the software running it. Users running random models is a security concern.

Eric: It has given the community an accessible project for AI inferencing using cloud native concepts. We are also less opinionated about things like inferencing runtimes, transport mechanisms, backend compatibility and hardware compatibility, letting developers use and build on AI on their chosen systems.

RamaLama’s Support for Hardware and Other Tools

Scott: Are you able to support alternative hardware?

Eric: This is one area where RamaLama differs. Many projects have limited support for hardware and support just one or two types of hardware, like Nvidia or AMD. We will work with the community to enable alternate hardware on a best effort basis.

Dan: RamaLama is written in Python and can probably run anywhere Python is supported and supports Podman or Docker container engines. As far as accelerators, we currently have images to support CPU-only, as well as Vulkan, CUDA, ROCm, Asahi and Intel GPU. A lot of these were contributed by the community, so if someone wants to contribute a containerfile (Dockerfile) to build the support for a new GPU or other accelerator, we will add it to the project.

Scott: What other tools does RamaLama integrate with?

Eric: RamaLama stands on the shoulders of giants and uses a lot of pre-existing technologies. From the containers perspective, we integrate with existing tooling like Podman, Docker, Kubernetes and Kubernetes-based tools. From the inferencing perspective, we integrate with LLaMA.cpp and vLLM, so we are compatible with tooling that can integrate with those APIs. There’s probably ways it’s being used that we are unaware of.

Scott: Does RamaLama work with the new DeepSeek AI model?

Eric: Yes, we were compatible with DeepSeek on the day the model was released. It’s one of the more impressive models; it’s interesting how it shows its thought process.

Dan: We have found very few GGUF (GPT-Generated Unified Format) models that it does not work with. When we have, we worked with the LLaMA.cpp project to get them fixed, and we have them working within a few days. We plan on supporting other models for use with vLLM as well.

What’s Ahead for AI?

Scott: Any other thoughts on RamaLama or the future of AI?

Dan: I see our AI adventure as being a series of steps. First, we play and serve AI models. RamaLama does this now. We want to enhance this by adding other ways of using AI models like Whisper. Next, we are actively working on helping users convert their static documents into retrieval-augmented generation (RAG) databases using open source tools like Docling and Llama Stack. After that, we add support for running and serving models along with RAG data to improve the ability of AI models to give good responses. All this will be done focusing on containerizing the AI data as well.

The next step after that is support for AI agents. These agents allow AI models to interact with random APIs and databases all over the internet. We are seeing a ton of work going on in this field in the open source world. We want to make it easy for developers to take advantage of these tools and to eventually put them into production.

Eric: We welcome the community to get involved. I still see RamaLama as being in its infancy. We’ve only barely touched on things like RAG, AI agents, speech recognition and Stable Diffusion. I’m looking forward to seeing how the community will use it. Podman at the start was used for things like servers; now we see more creative usages of it like Podman Desktop, toolbox and bootc. I’m looking forward to seeing how RamaLama evolves for unprecedented use cases.

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon Europe in London from April 1-4.

After DeepSeek, NVIDIA Puts Its Focus on Inference at GTC
https://thenewstack.io/after-deepseek-nvidia-puts-its-focus-on-inference-at-gtc/

Earlier this year, the news that DeepSeek built a highly competitive reasoning model with only minimal training cost sent NVIDIA’s stock into a nosedive as analysts started wondering whether the age of large-scale AI hardware investments was coming to an end. It’s maybe no surprise then that much of this year’s keynote at GTC, NVIDIA’s annual conference, felt like a reaction to this. Jensen Huang, NVIDIA’s CEO and co-founder, announced the usual slew of new software and hardware, including the next generation of its flagship accelerators and an interesting array of desktop-scale AI systems for developers. But at the core of all of this was one message: the next generation of applications will be AI-based — and making that work, especially in the age of reasoning models and agents, will take massive amounts of compute power, which NVIDIA is more than happy to provide.

Indeed, NVIDIA says it expects demand for AI compute to increase by 100x compared with previous estimates. Interestingly, when Huang wanted to demonstrate how much compute power the new reasoning models will need, he chose to compare Meta’s more traditional Llama model to DeepSeek R1. That was surely no coincidence. DeepSeek, as it turns out, used 150 times more compute and generated 20 times more tokens.

“Inference at scale is extreme computing,” Huang said. There is always a trade-off between latency and compute cost to be made here. Either way, Huang argued, the amount of tokens generated will only continue to increase. Training wasn’t completely left out of the keynote, of course, but it was hard to look at a large part of the presentation and not think that at least the first half or so was a reaction to DeepSeek.

Huang also argued that there is a general platform shift happening from hand-coded software built on general-purpose computing to machine learning software built on accelerators and GPUs. This also means — and that’s good for NVIDIA — that the future of software development means capital investments. Before, Huang noted, you wrote software and you ran it. Now, “the computer has become a generator of tokens,” he said, and in his view, most enterprises will soon build what he likes to call “AI factories” that will run in parallel to their physical plants.

For developers, NVIDIA announced the DGX Spark and the DGX Station. The Spark can run in parallel to an existing desktop or laptop and looks somewhat akin to a Mac Studio. The DGX Station, meanwhile, is essentially a full-blown desktop workstation for data scientists with up to 500 teraFLOPS of compute power.

To speed up inferencing and drive down the cost in the data center, NVIDIA announced several new accelerators, including the Blackwell Ultra family and the upcoming Vera Rubin, Rubin Ultra and the Feynman generations of its chips, all of which will boast significant increases in compute performance and memory bandwidth over their respective predecessors.

NVIDIA is clearly going for a somewhat Intel-like tick-tock cadence here with a new generation of chip every year and then an optimized ‘ultra’ version soon after. To put a point on this, Huang joked that he was the “chief destroyer of revenue” at NVIDIA because nobody should be buying the current generation of its Hopper chips anymore.

NVIDIA’s Dynamo inferencing framework.

Another new project the company announced today is Dynamo, an “open source inference software for accelerating and scaling AI reasoning models in AI factories,” as NVIDIA describes it. The idea here is to provide an optimized framework for running reasoning models in the enterprise data center.

“Industries around the world are training AI models to think and learn in different ways, making them more sophisticated over time,” said Huang. “To enable a future of custom reasoning AI, NVIDIA Dynamo helps serve these models at scale, driving cost savings and efficiencies across AI factories.”

And as if to stress its overall focus on inference even more, NVIDIA is also launching its own family of reasoning models, Llama Nemotron, which is optimized for inferencing speed (and boasts a 20% increase in accuracy over the Llama model it’s based upon).

Overall, the reaction to this year’s GTC keynote seemed a bit more muted than to last year’s event. In part, that may be because there just weren’t as many announcements as in previous years, or that they were technically impressive but also a bit esoteric (like its photonics-based networking hardware) — but also because the show felt more reactionary than visionary this time around.

From Basics to Best Practices: Python Regex Mastery
https://thenewstack.io/from-basics-to-best-practices-python-regex-mastery/

Regex, short for regular expressions, is a powerful tool for matching and manipulating text. It automates various text-processing tasks, such as validating email addresses, extracting data from log files and cleaning messy datasets. While regex syntax is quite similar across programming languages, this tutorial will focus on how it works specifically in Python.

What Does Regex Do?

• Data extraction: Extracts data points like email addresses, phone numbers and error codes from text
• Input validation: Ensures that user input (e.g., email addresses, phone numbers and passwords) is in the correct format
• Search and replace: Modifies text without human intervention
• Task automation: Automates processing of logs, files and large datasets

Regex Best Practices

• Build incrementally: Develop regex patterns step by step to avoid confusion in complex code blocks.
• Test efficiency: Avoid slow executions by testing regex efficiency.
• Use raw strings: Prevent backslashes from being interpreted as escape characters by using raw strings (e.g., r"\d+").
• Debug with tools: Use online tools like Regex101 to help debug and refine patterns.

The re Module

In Python, regex functionality is provided by the re module. This module supports pattern matching, searching and string manipulation. Built-in functions like re.search(), re.match() and re.sub() allow for complex pattern matching. Without the re module, Python supports only basic substring matching using string methods like .find(), .startswith(), .endswith() and .replace(). While these built-in methods allow simple matching, the re module is necessary for more advanced regex operations.

You can import the re module using the same syntax as all other Python imports.

View the code on Gist.
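
The embedded Gist isn’t reproduced here; a minimal equivalent is simply the import, shown with an illustrative sanity check (the sample string is an assumption):

    import re  # Python's built-in regular expression module

    # Quick sanity check: find the first run of digits in a string
    print(re.search(r"\d+", "Order #42 shipped").group())  # 42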

Commonly used regex built-in functions:

The re module provides many useful functions, including:
re.match(): Matches the pattern at the start of the string
re.search(): Finds the first occurrence of the pattern
re.findall(): Returns all occurrences of the pattern
re.finditer(): Returns an iterator of match objects
re.sub(): Replaces pattern matches with a specified string
re.subn(): Replaces matches and returns the number of replacements
re.split(): Splits the string by the pattern
re.compile(): Compiles the pattern into a regex object
re.fullmatch(): Checks if the entire string matches the pattern
re.escape(): Escapes special characters in a string

Regex Categories and Their Applications

Characters and Literals

Searching for characters and literals finds exact matches for specified characters or sequences in a string. This is useful for finding fixed patterns, such as an error code in a log file or a product ID on an invoice.

Basic syntax:
.: Matches any character except a newline
a, b, 1: Matches the literal characters a, b or 1

Code example:
View the code on Gist.
Output: cat
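
Since the Gist itself isn’t shown above, here is a minimal sketch consistent with that output (the sample sentence is an assumption):

    import re

    text = "The cat sat on the mat."
    match = re.search(r"cat", text)  # literal match for the characters c, a, t
    if match:
        print(match.group())  # cat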

Character Classes

Character classes allow searches for any character within a defined set (e.g., digits, letters). This category is helpful when you need to match patterns with varying characters, such as extracting customer phone numbers or dates.

Basic syntax:
[abc]: Matches any of the characters a, b or c
[^abc]: Matches any character except a, b or c
[0-9]: Matches any digit
[a-z]: Matches any lowercase letter
\d: Matches any digit (equivalent to [0-9])
\D: Matches any non-digit
\w: Matches any word character (letters, digits and underscores)
\W: Matches any nonword character
\s: Matches any whitespace character (spaces, tabs and newlines)
\S: Matches any non-whitespace character

Code example:
View the code on Gist.
Output: ['12345']
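
A minimal illustrative sketch that produces that output (the input string is assumed):

    import re

    text = "Customer ID: 12345"
    digits = re.findall(r"[0-9]+", text)  # one or more digits in a row
    print(digits)  # ['12345']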

Quantifiers

Quantifiers control how many times a pattern should repeat, which is useful when data varies in length. They come in handy when matching repeated words or phrases, or when searching for data like email addresses where the length can vary.

Basic syntax:
*: Matches 0 or more of the preceding element
+: Matches 1 or more of the preceding element
?: Matches 0 or 1 of the preceding element (optional)
{n}: Matches exactly n instances of the preceding element
{n,}: Matches n or more instances
{n,m}: Matches between n and m instances

Code example:
View the code on Gist.
Output: ['Hello', 'world']
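
One possible version consistent with that output (the input text is an assumption):

    import re

    text = "Hello world"
    words = re.findall(r"\w+", text)  # + means one or more word characters
    print(words)  # ['Hello', 'world']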

Anchors

Anchors are used to match positions in the string rather than characters. They are helpful for verifying patterns in fixed positions, such as checking if an email address ends with a specific domain or if a sentence ends with a question mark.

Basic syntax:
^: Matches the start of a string (or line if in multiline mode)
$: Matches the end of a string (or line if in multiline mode)
\b: Matches a word boundary
\B: Matches a nonword boundary

Code example:
View the code on Gist.
Output:

<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(6, 11), match='world'>
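
A minimal sketch that prints those two match objects (the test string is assumed):

    import re

    text = "Hello world"
    print(re.search(r"^Hello", text))  # anchored to the start of the string
    print(re.search(r"world$", text))  # anchored to the end of the string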

Groups and Captures

Groups and captures allow you to extract and manipulate parts of a string by capturing portions of the matched text for later use. This is particularly useful when you need to extract specific data, like names or error codes from logs.

Basic syntax:
(abc): Captures the group abc as a match
\1: Refers back to the first captured group
(?:abc): Matches abc but does not capture it (non-capturing group)

Code example:
View the code on Gist.
Output: My
name
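
An illustrative version consistent with that output (the sentence and the choice of groups are assumptions):

    import re

    text = "My name is Ada"
    match = re.search(r"(\w+) (\w+)", text)  # capture the first two words
    if match:
        print(match.group(1))  # My
        print(match.group(2))  # name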

Alternation

Alternation is useful for matching one of multiple patterns. It’s often used when you need to match different possibilities, such as searching for multiple error codes in log files.

Basic syntax:
a|b: Matches either a or b

Code example:
View the code on Gist.
Output: ['cat', 'dog']
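
A minimal sketch producing that output (the sample string is assumed):

    import re

    text = "The cat chased the dog."
    print(re.findall(r"cat|dog", text))  # ['cat', 'dog']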

Escaping Special Characters

In regex, certain characters (like ., * or ?) have special meanings. Escaping these characters allows you to match the literal characters themselves, which is helpful when they appear in your input but should not be interpreted as metacharacters.

Basic syntax:
\.: Matches a literal dot (.)
\*: Matches a literal asterisk (*)

Code example:
View the code on Gist.
Output: ['.', '.']
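
One illustrative version, assuming an input that contains two literal dots:

    import re

    text = "example.com and test.org"
    print(re.findall(r"\.", text))  # ['.', '.'] - the escaped dot matches only literal dots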

Modifiers or Flags

Modifiers (or flags) modify how regex patterns are applied, such as when making searches case-insensitive or enabling multiline matching. These are useful for adjusting search behaviors based on context.

Basic syntax:
i: Case-insensitive matching (re.IGNORECASE)
g: Global matching (find all matches, implicitly handled in Python by re.findall())
m: Multiline mode (matches start ^ and end $ of each line)

Code example:
View the code on Gist.
Output:
None
Hello
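
A rough sketch consistent with that output (the text being searched is an assumption):

    import re

    text = "Hello world"
    print(re.search(r"hello", text))  # None: the search is case-sensitive by default
    match = re.search(r"hello", text, re.IGNORECASE)  # flag makes it case-insensitive
    print(match.group())  # Hello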

Conclusion

Regex is an essential tool for text processing, enabling tasks like data extraction, validation and replacement. Whether you’re cleaning data, automating tasks or extracting valuable information from text, understanding regex syntax and best practices is key. By leveraging Python’s re module, you can easily match complex patterns and automate repetitive tasks, improving both efficiency and accuracy in your work.

The ROI of Speed: How Fast Code Delivery Saves Millions
https://thenewstack.io/the-roi-of-speed-how-fast-code-delivery-saves-millions/

There is an odd tension among software engineering leaders when it comes to thinking about productivity, according to Rob Zuber, CTO of CI/CD platform provider CircleCI. They have to weigh the joy of creating products that companies can use against the need to make sure that it’s done in a way that best benefits their own businesses.

“Most engineering leaders grew up as engineers and value the personal reward and fulfillment of being really productive, because nobody likes to fight through toil and wrestle with tools and all those sorts of things,” Zuber told The New Stack. “They like to deliver product. Whether they’re really into scaling backend systems or putting frontends in front of customers, whatever that might be, that’s what engineers really value.”

But when it comes to being measured, there’s “sort of a curious, allergic reaction” from engineers, who feel like they’re always being judged, he said. Organizations need to solve for that, because engineering leaders and other executives have to wrestle with the tug and pull of gauging what they’re getting from an efficiency perspective vs. whether they’re moving forward fast enough relative to their peers.

Adding ROI to the Mix

That’s a reason why San Francisco-based CircleCI is taking an expanded approach to its annual State of Software Delivery report, with the sixth edition released Tuesday. The report still looks at the key metrics used for defining performance – duration, throughput, mean time to recovery (MTTR), and success rate – but the vendor also is measuring the ROI organizations derive from them, a key measuring tool for business leaders and stakeholders.

That also becomes an important metric as AI permeates almost every tier of software development, just as its foothold in every other aspect of IT and business grows.

“Certainly, it’s having an impact on our ability to deliver in some way and [it’s important] knowing, are we being successful? Are we keeping up with the competition?” Zuber said. “Those sorts of things are on the minds of engineering leaders, and we want to give them as much information as we can to help them understand where they are and where they can focus their energy.”

The report is based on the vendor’s analysis of almost 15 million workflows of teams building software on CircleCI’s platform. It also explores what such advanced technologies as automation in CI/CD, Infrastructure as Code (IaC), and AI mean for delivering software.

CircleCI metrics

Speed Is Key

Top-level findings were that the top 25% of performers were continuing to separate themselves from the rest of the pack, in large part due to speed. For example, they shipped updates three times as fast as teams in the bottom 25%, giving them a market advantage in development velocity.

They saved millions in annual development costs, again due to speed: they completed critical workflows five times faster than lower-performing units – which freed resources for strategic initiatives – and debugged products in minutes rather than days, freeing up more time for developers.

“The parts that have been consistent over the years but are still really important to us [is] that moving quickly wins the marketplace and moving quickly is dependent on great systems, processes, [and] approaches,” he said.

This is where those advanced technologies – particularly AI and automation – come in. They’re ramping up the speed of software delivery, and those organizations at the top of the list are the ones adapting to the rapidly evolving nature of engineering and delivering value to their users.

“Historically, we always had to choose between speed and quality, and it’s finally set in that going fast is actually something that drives quality, because you just can’t do it unless you’re really good at quality,” Zuber said.

Other Metrics Factor In

However, he cautioned that it’s not the only gauge. A development team can churn out products quickly, but that will do no good if the product is faulty or if the organization is slow to make fixes. That’s where other metrics matter, from workflow duration to recovery speed to success rate. The aim of the report is to give developers and team leaders details they can sort through in a nuanced way.

“It allows people to see a little more of themselves and that, ‘Oh, this blend of metrics looks a little bit like us. What does that say?’ We’re really great at going quick until … something breaks and then it takes us a really long time to fix it,” the CTO said. “‘Is that because our systems are complex? Does it say something about our culture?’ It could be a number of different things. At that point, it really is on teams and leaders to ask the questions. Data is the beginning of a conversation, not the end of a conversation.”

RecurShip and ROI

That data includes the ROI numbers. In the report, the researchers quantify it through a fictional company called “RecurShip,” complete with 500 developers distributed around the world who are responsible for three commits a week – such as updates and optimization – and who are paid $180,000, or $1.50 per developer minute.

Duration – the time from when a workflow is triggered until all steps are complete – is an important metric. The data in the report indicates that the median duration is 2 minutes and 43 seconds, with the fastest 25% of teams completing their workflows in less than 38 seconds and 75% of teams finishing in 8 minutes or less. The fastest times could be the result of lighter workflows with fewer validation steps or other factors, while some teams had duration times of 25 minutes or more.

For RecurShip, optimizing workflows to cut duration from 20 minutes to 10 minutes would recover 750,000 minutes of developer time a year, which at $1.50 per minute translates into roughly $1.1 million in annual productivity gains.
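
The arithmetic behind that figure is easy to reproduce. Here is a minimal sketch in Go; the working-year assumption (50 weeks of three workflows per developer per week) is mine, not CircleCI’s published model, and the numbers are purely illustrative.

```go
package main

import "fmt"

func main() {
	const (
		developers       = 500
		workflowsPerWeek = 3    // assumed: one workflow per commit, three commits a week
		workingWeeks     = 50   // assumed working weeks in a year
		minutesSaved     = 10.0 // duration cut from 20 minutes to 10 minutes
		costPerMinute    = 1.50 // $180,000 a year works out to roughly $1.50 per developer minute
	)

	workflowsPerYear := developers * workflowsPerWeek * workingWeeks // 75,000
	recoveredMinutes := float64(workflowsPerYear) * minutesSaved     // 750,000
	annualSavings := recoveredMinutes * costPerMinute                // about $1.1 million

	fmt.Printf("recovered developer minutes per year: %.0f\n", recoveredMinutes)
	fmt.Printf("annual productivity gain: $%.2f million\n", annualSavings/1e6)
}
```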

Another example is throughput, which measures the average number of workflow runs on a project per day. In the projects running on CircleCI, the median throughput is 1.64 runs a day, with 25% of developers reaching 2.7. Among the 20 most productive organizations, the daily throughput reaches 3,762, creating a “delta between average and top performers [that] suggests significant untapped potential in most software teams,” the report says.

At RecurShip, adding 25 engineers tasked with removing friction from development pipelines to its 500 developers (who currently run 300 workflows per day, or 0.6 per developer) would boost throughput to 394 daily workflows, or 0.75 per developer per day.

The company would see a 25% improvement in productivity per developer from a 5% increase in headcount, with productivity gains equivalent to adding 156 full-time developers. This investment in optimization and developer experience delivers $28.4 million back to the company in productivity gains.
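
A similar back-of-the-envelope sketch, under the same hypothetical assumptions as above, shows where the equivalent-headcount figure comes from; it lands near, but not exactly on, the report’s $28.4 million because the report’s unrounded inputs aren’t published.

```go
package main

import "fmt"

func main() {
	const (
		baseDevs       = 500.0
		addedEngineers = 25.0
		baselinePerDev = 0.6  // 300 workflows per day across 500 developers
		improvedPerDev = 0.75 // after the platform-engineering investment
		annualSalary   = 180_000.0
	)

	totalDevs := baseDevs + addedEngineers
	dailyWorkflows := totalDevs * improvedPerDev // about 394 workflows per day

	// How many developers at the old rate would it take to match that output?
	equivalentDevs := dailyWorkflows / baselinePerDev // about 656
	extraHeadcount := equivalentDevs - baseDevs       // about 156 full-time developers

	fmt.Printf("productivity gain per developer: %.0f%%\n", (improvedPerDev/baselinePerDev-1)*100)
	fmt.Printf("equivalent extra developers: %.0f\n", extraHeadcount)
	fmt.Printf("approximate annual value: $%.1f million\n", extraHeadcount*annualSalary/1e6)
}
```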

AI Will Bring Promise, Change

The need for speed will only increase as AI and automation become more commonplace. Engineers and developers know that the new tooling used for building software will change how they operate, and that creates a lot of uncertainty. Given that, the job for engineering leaders is less about predicting the AI future and more about creating teams that can adjust to the changes when they come.

“I need to be able to adapt to that very quickly, so it helps us think about how we build, how we ship the kinds of systems that we build,” Zuber said. “Everyone is facing that right now.

“It’s much more interesting to think about how would I prepare myself for a set of possible outcomes and be prepared to adapt and adjust, because those are the folks that are really going to succeed. Succeeding in uncertainty, it’s agility, and if you’re not thinking about how we deliver software effectively and how we collect that feedback, then you’re not really setting yourself up for that level of change.”

Strategies, Big and Small

In the meantime, CircleCI mapped out steps that companies can take now to become faster and more efficient in developing and delivering software. Smaller companies need to build resilient pipelines that can run autonomously while their teams work on other tasks, and to invest in automated testing to reduce the burden of debugging software.

For midsize companies, the job is keeping up the quick recovery times, standardizing processes, and replicating across the organization the practices used by high-throughput teams. Larger organizations need to streamline change management and approval flows and balance build speed optimization with processes that scale across business units, the report’s authors wrote.

(Team) Size Matters

In addition, all companies should be aware of the size of their teams.

“Smaller teams tend to move quickly with less coordination overhead, while larger teams must navigate dependencies, standardization, and process complexity as they scale,” they wrote. “Understanding these trade-offs is key to optimizing development velocity and reliability.”

Recommended strategies include creating autonomous development teams of five to 10 engineers, while companies scaling beyond 100 developers can use standardized tools and processes to keep their fast MTTR.

Companies with 50 to 100 engineers tend to hit throughput and other barriers when they scale to this size and then get their groove back as they grow beyond those numbers, the CTO said. Given that, these companies need to invest in automation to help drive higher throughput and push past what CircleCI calls “complexity barriers” common at this size.

The post The ROI of Speed: How Fast Code Delivery Saves Millions appeared first on The New Stack.

]]>
AI Coding Trends: Developer Tools To Watch in 2025 https://thenewstack.io/ai-powered-coding-developer-tool-trends-to-monitor-in-2025/ Tue, 18 Mar 2025 19:00:57 +0000 https://thenewstack.io/?p=22780909

With virtually every coding tool now infused with AI, developers are increasingly asking themselves: what type of coding tool should

The post AI Coding Trends: Developer Tools To Watch in 2025 appeared first on The New Stack.

]]>

With virtually every coding tool now infused with AI, developers are increasingly asking themselves: what type of coding tool should be my default now? Do I need one of those new-fangled “agentic IDEs” or is Visual Studio Code good enough? What role does the cloud play in AI tooling?

To answer these questions, I’ve surveyed the dev tool landscape and picked out some trends for developers to watch. To start with, let’s assess the main options for developers when it comes to adapting to AI:

  1. Your normal IDE, augmented with an AI assistant plugin: The most common option seems to be to stick with your existing IDE (like VS Code, JetBrains, or Neovim) while integrating an AI assistant such as GitHub Copilot, Google’s Gemini Code Assist, or JetBrains AI. (Although, if you’re a Visual Studio Code user, the question becomes: how do you stop different AI plug-ins from talking over each other?)
  2. Keeping AI separate from your editor: If you prefer a clean, distraction-free code editor, you might opt to use chatbots like ChatGPT or Anthropic’s Claude 3.7 Sonnet externally as a coding assistant, rather than embedding AI directly into your workflow.
  3. Switching to an “agentic IDE”: Tools like Bolt, Cursor and Windsurf promise to do most of the coding for you, acting more like an AI-powered co-developer rather than a simple autocomplete assistant. These environments aim to reduce manual coding by taking high-level instructions and generating a full application. (see also: vibe coding)
  4. Relying on an AI-native cloud IDE: Instead of a traditional desktop IDE, some developers are embracing options like Replit (Ghostwriter), Amazon CodeCatalyst, or Google Cloud Workstations, where AI is deeply integrated into a cloud-based development environment.
  5. Using an AI-powered terminal: If you live in the command line, you might prefer an AI-enhanced terminal like Warp or Ghostty, or even AI-driven CLI tools like ShellGPT or Copilot CLI, which generate commands and scripts on the fly.
  6. Going fully AI-free: A shrinking but passionate group of developers are choosing to avoid AI-assisted coding altogether, preferring to write code the old-fashioned way (which, to be fair, has worked just fine for decades for knowledgeable developers).
GitHub Copilot: options to install in “your favorite code editor” as a plugin.

One Dev Tool To Rule Them All

What some of these options have in common is that their representatives think they will be the only AI-assisted coding tool a developer will need.

I recently spoke to Warp’s founder and CEO, Zach Lloyd, about the company’s new Windows version of its terminal app. We also discussed how Warp is positioning itself among the raft of AI coding tools that have hit the market recently. His reply made clear that he thinks terminal apps like Warp are capable of much more than command-line interaction now.

“Warp is a highly differentiated, opinionated approach to the next generation of AI tooling,” he told me. “You know, today we’re a terminal — today that is what we are. But the vision that we have […] is that we believe that the command line is a great place for developers to do anything using AI. It’s like this really low-level interface that has this vast array of tools available to it. The tools are already written, for the most part, to be usable by people and by machines, like CLIs [Command Line Interface] are for both. So we feel like it’s this awesome, differentiated non ‘VS Code clone’ approach to the future of AI.”

Windows UI for Warp Drive; image via Warp

Probably the most prominent fork of VS Code — or “clone” to use Lloyd’s term — is Cursor. Unlike VS Code itself, which relies on plugins like GitHub Copilot or Gemini Code Assist for AI features, Cursor embeds AI capabilities directly into the development environment. And similar to what Warp is aiming to do, with Cursor you can do almost all developer tasks inside the app.

As The New Stack’s Janakiram MSV explained last September:

“What I absolutely loved about Cursor is the ability to deal with the end-to-end application life cycle without having to leave the development environment. While features like Composer and Tab tackle code generation, the chat window within the terminal is a real game changer. It can generate and run shell scripts, Docker and Kubernetes commands, and any other CLI-related tools.”

This “one app to rule them all” approach — a vision being pursued by Warp, Cursor, and several other coding apps — is only possible because of the ever-increasing reasoning abilities of the latest large language models.

The Ideal Prototyping Tool

Not all apps are trying to be all things to all (AI) developers.

Bolt is a browser-based app that leverages StackBlitz’s proprietary WebContainers technology. But when I spoke to its CEO Eric Simons, he acknowledged that many developers will still want to use an IDE like VS Code or any of the JetBrains options.

Bolt screenshot.

Firstly, it’s worth noting that the majority of Bolt’s users are not professional developers — Simons estimated that 60-70% of Bolt users are “non-technical.” But for the professional developers who do use the product, Bolt “is not a wholesale replacement […] and that’s not what we intend to be either,” he said. Instead, pro devs tend to use Bolt as a kind of prototyping aid.

“A lot of the companies that we’re now selling to, they’re using this as a kind of a replacement for Figma, almost,” Simons told me. “Where, instead of doing all of your prototypes and stuff as designs in Figma, like, let’s just get the components made in Figma, and then drop them into Bolt […] as code, and then just prompt it to make apps for us. It’s way faster to have the AI just go and build this stuff, and then what you get is real code.”

It’s worth mentioning Google and Microsoft in this ‘prototyping’ category, because both companies are aiming to expand the developer market well beyond professional developers. Not to mention that both have the ability to massively scale their AI coding tools. As Google’s Ryan J. Salva told me in a recent interview:

“We’re laying the foundation for, how do we get just the basic tools and the IDEs out to as many people as possible, with really generous usage limits, and with effectively no requirement other than an email address.”

Gemini Code Assist; image via Google.

Cloud Native Tooling for AI

Another trend we’re seeing in AI development is, for want of a better phrase, the Cloud-Native-ication of AI tools. For instance, the creator of Docker Compose, Ben Firshman, has created a technology that wraps AI models into containers — it’s called Cog and Firshman describes it as “Docker for machine learning.” On the back of that he co-founded a company called Replicate, which offers a cloud platform to share these models.

We’ve also seen various serverless platforms emerge that specialize in AI. Recently I profiled Modal, which specializes in providing serverless infrastructure tailored for compute-heavy and long-running AI, ML, and data workflows. It’s aimed squarely at developers who don’t want to deal with the massive computing demands of LLMs and other AI infrastructure.

Modal playground.

Conclusion

It feels like we’re at an inflection point with AI coding tools. While I expect most experienced developers will stick with their favorite full-fledged IDE (why wouldn’t you, if you can simply add an AI plugin to get that functionality?), it’s junior developers and the next wave of developers that we should watch.

Many recent or new entrants to the developer job market will likely choose a tool like Cursor or Warp as their default app, and run with it. They’re also more likely to pick up tools like Bolt and Windsurf to prototype their apps. We’ll continue to track these AI dev tool trends here on The New Stack over the rest of 2025.

The post AI Coding Trends: Developer Tools To Watch in 2025 appeared first on The New Stack.

]]>
Coming Soon: New Ebook on Cloud Sustainability https://thenewstack.io/new-ebook-on-cloud-sustainability/ Tue, 18 Mar 2025 18:00:52 +0000 https://thenewstack.io/?p=22780603

As the effects of climate change escalate around the world, what role do software developers play in protecting the environment?

The post Coming Soon: New Ebook on Cloud Sustainability appeared first on The New Stack.

]]>

As the effects of climate change escalate around the world, what role do software developers play in protecting the environment? More than you might think, particularly when it comes to the choices you make for your cloud infrastructure.

According to Charles Humble, one of The New Stack’s top contributors, sustainability should join cost, performance, security, regulatory concerns and reliability as one of the top-level considerations when optimizing your cloud workloads.

And his forthcoming ebook, “The Developer’s Guide to Cloud Infrastructure, Efficiency and Sustainability,” written in collaboration with AMD, Google and The New Stack, will dive deep into these topics. He’ll share his research and expertise on sustainable software, tools for measuring carbon emissions, ways to reduce the environmental impact of your software infrastructure, and the key trends that you need to consider going forward.

Be among the first to preregister and get early access to this ebook before its official release on April 1, 2025.

What You’ll Learn

By reading “The Developer’s Guide to Cloud Infrastructure, Efficiency and Sustainability,” you will learn:

  • How developers are helping build a sustainable future.
  • Developer best practices for choosing a cloud VM.
  • The positive and negative environmental impact of cloud infrastructure.
  • The factors that can make cloud computing “green.”
  • How to measure carbon emissions from your cloud infrastructure.

Table of Contents

Specifically, the book’s contents include:

  • What is sustainable software? Three definitions of “sustainability” that engineers need to know.
  • Why carbon emissions matter: The impact of energy proportionality.
  • IT’s role in sustainability: Defining key concepts, including carbon scopes, carbon neutrality and carbon zero.
  • The developer’s role in sustainability: Coding vs. operational efficiency and ways to improve them.
  • Unpacking CPU performance in the cloud: How to choose the best virtual machine (VM) for your workload.
  • Measuring carbon emissions: Tools for assessing hardware utilization, cloud costs and carbon footprint.
  • Taking steps to reduce emissions: Implementing strategies for reducing carbon emissions.
  • Putting it all together: Finding allies inside and outside your organization.
  • Trends and future considerations: Looking ahead at renewable energy certificates, carbon taxation and water usage.

Don’t miss this engaging, informative and accessible look at the impact of cloud computing on carbon emissions, and how the choices you make affect your environmental footprint. Preregister today!

The post Coming Soon: New Ebook on Cloud Sustainability appeared first on The New Stack.

]]>
AI Agents in Doubt: Reducing Uncertainty in Agentic Workflows https://thenewstack.io/ai-agents-in-doubt-reducing-uncertainty-in-agentic-workflows/ Tue, 18 Mar 2025 17:00:45 +0000 https://thenewstack.io/?p=22780587

It’s easy to get excited about the power of AI agents and everything they can accomplish autonomously, saving us considerable

The post AI Agents in Doubt: Reducing Uncertainty in Agentic Workflows appeared first on The New Stack.

]]>

It’s easy to get excited about the power of AI agents and everything they can accomplish autonomously, saving us considerable amounts of time and effort. State-of-the-art agents are already useful in many ways, and we can imagine even more capabilities that seem just around the corner. Academic research is showing us what is possible, and it’s only a matter of time until these new capabilities appear on the market in software products.

This gap between what’s theoretically possible and what we can successfully put into production comes down to two opposing forces: confidence and uncertainty.

Of all the factors that influence an AI agent’s success — software, data, architecture and more — perhaps one of the most critical but least discussed is uncertainty. Without high confidence that an agent will make good decisions and take an appropriate course of action, we shouldn’t trust it in production. And, this high confidence develops only if the agent is provably effective at dealing with uncertainty in its many forms.

Let’s discuss the concept of uncertainty within agentic workflows and why it is important to explicitly acknowledge and address it. I will describe some strategies that can reduce critical uncertainties and prevent related problems from arising.

What Is Uncertainty in the Context of Agentic Workflows?

Uncertainty, in general, is the condition of not knowing something that would be relevant, whether it be facts from the past/present or something that might happen in the future.

In agentic architectures, uncertainty can arise from ambiguous instructions, missing or unreliable data, limitations in the agent’s ability to reason through complex decisions and for many other reasons. When an AI agent tries to operate in the presence of high uncertainty — and if the agent doesn’t acknowledge and handle it properly — it may struggle to determine the correct course of action, leading to mistakes, inaction or negative outcomes.

In order to handle uncertainties that arise, agents must be able to assess the situation, including all relevant information and resources, at each stage of a task — whether interpreting a request, retrieving necessary information, or making a decision. If there is too much uncertainty to make a decision and take action safely, the agent needs to realize this and take steps to either reduce it — by seeking clarification, checking additional resources or refining its reasoning — or hold off on acting until it can proceed safely.

In agentic architectures, uncertainties can fall into the following categories:

  1. Uncertainty in goals – Goals may be ambiguous or unclear.
  2. Uncertainty in resources – Some necessary resources may not be available or reliable.
  3. Uncertainty around unknowable information – Relevant future events and other facts that can’t be known (yet) for any other reason.
  4. Uncertainty in reasoning – Complex or difficult reasoning tasks may be too tough to solve with certainty.

When an issue around uncertainty presents itself, it is helpful to determine which type of uncertainty is present, because that helps us form a strategy to find a resolution — and it also can help us design an agent to find its own resolution autonomously. Even if an uncertainty doesn’t fit squarely into one category, which is common, noticing which of the categories align most closely with the issue can help devise a strategy to address it.

Uncertainty is not a flaw in AI agents; it’s an inherent part of decision-making in dynamic real-world environments. To build effective AI agents, we don’t need to eliminate uncertainty entirely, but we do need to design systems that can recognize and manage it. Strategies such as confidence estimation, fallback mechanisms, human-in-the-loop intervention, risk-aware reasoning, guardrails and specialization can help reduce uncertainty as well as risk.

Let’s look more closely at several of these strategies with which agentic architectures can improve reliability, ensuring that agents make well-informed decisions and act in ways that align with user expectations.

Strategies to Reduce Uncertainty in Agents

There are many ways to reduce uncertainty in agent workflows. None of them are foolproof, but some can be very effective, depending on the use case. Some helpful strategies for reducing uncertainty are:

  1. Human in the loop: Pass the most difficult cases to a person.
  2. High awareness of uncertainty: The agent realizes it’s uncertain and stops to think a bit more.
  3. Secondary checking: Reconsider a decision, or get a second opinion from another agent.
  4. Specialization: Agents with expertise make decisions only within their domain.
  5. Menu of possible actions: An agent can perform only a finite set of specific actions.

Human in the Loop

Having human involvement may seem like it contradicts the notion that an agent is an independent actor, but in practice, it is often a reliable way to decrease risk without greatly decreasing workflow efficiency. If the agent can identify when it is most uncertain, it can request that a human review the situation and give input or make a decision, allowing the agent to continue in relative certainty and safety. If the agent is good at identifying the uncertain cases that require human help, and if these situations are not very common, then the whole process becomes only marginally less efficient, with the agent still doing the vast majority of the work.

Having a human in the loop is particularly helpful when there is uncertainty in goals or uncertainty in reasoning. In many ways, humans are still better than AI at navigating nuances in language, intent and reasoning.

In some industries, this approach is already standard practice. Since well before modern AI systems, for instance, banks and other industries have used automated phone answering systems to provide a menu of items that the caller can select, including a request to speak with an employee of the company. AI agents can take this type of approach to the next level, allowing far more complex and interactive cases to be automated before human involvement is required.

For example, an AI agent for personal banking might manage routine queries from users, but could request human review for complex or critical tasks involving external transfers, large sums of money or payments to new recipients. Ideally, such an AI agent should recognize when uncertainty is high and refer the task to a human rather than making a potentially costly mistake.
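
As a rough illustration of that routing pattern, here is a minimal Go sketch; the transfer thresholds, the confidence field and the escalation rules are hypothetical stand-ins, not any particular vendor’s API.

```go
package main

import "fmt"

// TransferRequest is a hypothetical banking task the agent has been asked to perform.
type TransferRequest struct {
	AmountUSD    float64
	NewRecipient bool
	External     bool
	Confidence   float64 // the agent's own 0-1 estimate that it understood the request correctly
}

// needsHumanReview encodes the escalation policy: anything large, novel,
// external or low-confidence goes to a person instead of being executed automatically.
func needsHumanReview(r TransferRequest) bool {
	const (
		maxAutoAmount = 1_000.0
		minConfidence = 0.85
	)
	return r.External || r.NewRecipient || r.AmountUSD > maxAutoAmount || r.Confidence < minConfidence
}

func main() {
	requests := []TransferRequest{
		{AmountUSD: 120, Confidence: 0.97},
		{AmountUSD: 9_500, NewRecipient: true, External: true, Confidence: 0.91},
	}
	for _, r := range requests {
		if needsHumanReview(r) {
			fmt.Printf("escalate to human review: %+v\n", r)
		} else {
			fmt.Printf("handle autonomously: %+v\n", r)
		}
	}
}
```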

NVIDIA has published a thorough article on building a human-in-the-loop AI agent for social media content.

High Awareness of Uncertainty

AI agents operate best when they are aware of their own confidence levels and can adjust their behavior accordingly. When an agent encounters tasks with mixed certainty levels — where some aspects of a decision are highly confident while others are more ambiguous — it should take additional steps to reduce uncertainty or escalate the task to a human for review.

The strategy of maintaining awareness of uncertainty is particularly useful when there is uncertainty in resources or uncertainty in unknowable information. In these cases, we should design the agent to continually ask itself (or other systems) if it has enough information to make a confident decision. When there are signs that information is unreliable or missing, the agent can take action to gather what is needed.

For example, an AI agent designed for legal analysis might use a scoring system to evaluate the certainty of its responses. If a contract analysis tool determines that a particular clause matches a known precedent with high confidence, it can proceed without intervention. However, if the tool is unsure whether a clause introduces new legal risks, it can highlight it for further review rather than presenting a potentially incorrect interpretation as fact.
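
One simple way to express that behavior is a per-clause confidence score with graduated cutoffs: proceed, flag for review, or gather more context. The sketch below assumes the score comes from some upstream model; the clause examples and threshold values are invented for illustration.

```go
package main

import "fmt"

// ClauseAssessment is a hypothetical result from an upstream contract-analysis model.
type ClauseAssessment struct {
	Clause     string
	Precedent  string  // closest known precedent, if any
	Confidence float64 // 0-1 score from the model
}

// decide buckets the result: act, flag for human review, or gather more information first.
func decide(a ClauseAssessment) string {
	switch {
	case a.Confidence >= 0.9:
		return "proceed: matches precedent " + a.Precedent
	case a.Confidence >= 0.6:
		return "flag for human review"
	default:
		return "gather more context before answering"
	}
}

func main() {
	assessments := []ClauseAssessment{
		{Clause: "termination for convenience", Precedent: "standard MSA clause 12", Confidence: 0.95},
		{Clause: "novel indemnification carve-out", Precedent: "", Confidence: 0.48},
	}
	for _, a := range assessments {
		fmt.Printf("%s -> %s\n", a.Clause, decide(a))
	}
}
```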

There’s ongoing academic research around this strategy, with some specific implementations called “uncertainty quantification” and “uncertainty-guided planning.”

Secondary Checking

One of the paradoxes of large language models (LLMs) is that while they sometimes generate incorrect or hallucinated responses, they can also correct themselves when prompted to verify their own output. A secondary checking mechanism — whether self-verification or cross-checking with another model — can significantly reduce uncertainty in AI-generated responses.

This strategy is particularly effective when dealing with uncertainty in resources or uncertainty in reasoning. With these types of uncertainty, the agent can review the available resources or its own reasoning process a second time, with additional scrutiny or focus as needed, to confirm that the subsequent conclusions hold up.

For example, an AI agent for customer support using an LLM might generate an initial response based on a user inquiry. Before presenting the answer, the agent could prompt itself to double-check the response against its documentation using a retrieval-augmented generation (RAG) system. If discrepancies are detected, the response could be refined or flagged for human review.
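
A minimal sketch of that loop might look like the following, with generateAnswer, retrieveDocs and verifyAgainstDocs standing in for the LLM and retrieval calls a real system would make; they are placeholders here, not real library functions.

```go
package main

import (
	"fmt"
	"strings"
)

// generateAnswer and retrieveDocs are placeholders for real LLM and retrieval calls.
func generateAnswer(question string) string {
	return "Refunds are available within 30 days of purchase."
}

func retrieveDocs(question string) []string {
	return []string{"Policy: refunds are available within 30 days of purchase with a receipt."}
}

// verifyAgainstDocs stands in for a second pass that checks the draft answer
// against retrieved documentation; here it just does a crude substring test.
func verifyAgainstDocs(answer string, docs []string) bool {
	for _, d := range docs {
		if strings.Contains(strings.ToLower(d), "30 days") && strings.Contains(strings.ToLower(answer), "30 days") {
			return true
		}
	}
	return false
}

func main() {
	question := "What is the refund window?"
	draft := generateAnswer(question)
	docs := retrieveDocs(question)

	if verifyAgainstDocs(draft, docs) {
		fmt.Println("answer verified:", draft)
	} else {
		fmt.Println("discrepancy detected: flag for human review")
	}
}
```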

LangChain has a nice introduction to some popular implementations of “reflection agents,” which are examples of this strategy.

Specialization

A reliable way to prevent AI agents from making poor decisions is to ensure they operate strictly within their intended domain. An agent designed for one purpose shouldn’t be making decisions in areas where it lacks expertise or relevant context.

Specialization is particularly useful when there’s uncertainty in goals or uncertainty in reasoning. If the goals or the reasoning process don’t seem to be familiar and aligned with the specialization of the agentic workflow, the agent can refuse or defer action until a more clearly appropriate situation appears.

For example, a personal assistant AI focused on scheduling and calendar management should not be making decisions about email content beyond scheduling-related communication. Without proper restrictions, a scheduling AI might attempt to respond to emails outside its scope or make commitments it shouldn’t be authorized to handle. Guardrails can be implemented through well-structured prompt templates, explicit model instructions or additional verification layers that prevent the agent from stepping outside its designed function.

Specialization is typically achieved through some combination of prompt engineering, fine-tuning or model augmentations like retrieval-augmented generation (RAG), and it is also an ongoing area of academic research, such as this paper on guardrails and off-topic prompt detection.
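
In practice, a guardrail like this can start as an allowlist of in-scope topics checked before the agent acts. The classifyTopic stub and topic names below are invented for illustration; a production system would use a proper classifier or off-topic detector rather than keyword matching.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedTopics is the scheduling assistant's domain; anything else is deferred.
var allowedTopics = map[string]bool{
	"scheduling": true,
	"calendar":   true,
	"meetings":   true,
}

// classifyTopic is a crude keyword stand-in for a real topic classifier.
func classifyTopic(request string) string {
	r := strings.ToLower(request)
	switch {
	case strings.Contains(r, "meeting") || strings.Contains(r, "reschedule"):
		return "scheduling"
	case strings.Contains(r, "contract") || strings.Contains(r, "salary"):
		return "other"
	default:
		return "unknown"
	}
}

// handle refuses anything outside the agent's specialization.
func handle(request string) string {
	if !allowedTopics[classifyTopic(request)] {
		return "outside my scope: deferring to a human or another agent"
	}
	return "handling request: " + request
}

func main() {
	fmt.Println(handle("Reschedule Tuesday's meeting to 3pm"))
	fmt.Println(handle("Negotiate the vendor contract renewal terms"))
}
```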

Menu of Possible Actions

Instead of allowing an AI agent to freely decide how to use tools and APIs in an open-ended manner, restricting it to a set of well-defined actions can improve reliability and safety. This is especially useful when tools and APIs are complex or when improper usage could lead to unintended outcomes.

Similarly to the specialization strategy above, the agent is restricted in some way, but instead of limiting itself to a particular domain of inputs or expertise, here an agent limits its possible outputs or actions. These limits can be like intelligent guardrails that are as restrictive as necessary to balance effectiveness and safety.

A menu of possible actions is a strategy that is particularly effective when there is uncertainty in goals or uncertainty in resources. The menu acts like a final filter: whatever inputs and reasoning have already been considered, the last step is to confirm that the chosen action is well-defined and familiar, and that the agent has all of the necessary inputs and resources to execute that particular menu item.

For example, in software development, AI coding assistants can be restricted to modifying specific file types or suggesting changes, rather than writing or executing code autonomously. Or, in health care, AI agents assisting in diagnostics might be limited to providing reference data and symptom-matching rather than making treatment decisions outright.
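
In code, the “menu” often reduces to an explicit, closed set of actions, each with its own validation, rather than free-form tool calls. The action names and checks in this sketch are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is one entry in the agent's closed menu of permitted operations.
type Action string

const (
	SuggestEdit   Action = "suggest_edit"   // propose a code change for review
	LookupSymptom Action = "lookup_symptom" // return reference data only
	RunTests      Action = "run_tests"      // execute an existing, sandboxed test suite
)

var allowed = map[Action]bool{SuggestEdit: true, LookupSymptom: true, RunTests: true}

// execute refuses anything that is not on the menu or is missing required inputs.
func execute(a Action, args map[string]string) error {
	if !allowed[a] {
		return errors.New("action not on the menu: " + string(a))
	}
	if a == SuggestEdit && args["file"] == "" {
		return errors.New("suggest_edit requires a target file")
	}
	fmt.Printf("executing %s with %v\n", a, args)
	return nil
}

func main() {
	if err := execute(SuggestEdit, map[string]string{"file": "handler.go"}); err != nil {
		fmt.Println("rejected:", err)
	}
	if err := execute(Action("deploy_to_prod"), nil); err != nil {
		fmt.Println("rejected:", err) // not on the menu
	}
}
```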

Uncertainty in the Future of Agentic Workflows

With agents as well as with humans, uncertainty is a natural part of solving real-world problems. Rather than jumping straight to the “best” known action, agents need to be aware of the level of confidence in their goals, resources, unknowns and reasoning processes — and when high uncertainty is present, they should take steps to reduce it whenever possible. It is also often a good decision to take no action until the situation changes and the level of uncertainty decreases. In either case, the agent needs to be able to recognize and acknowledge the uncertainty before it can address it at all.

Agentic systems and workflows should embrace the concept of uncertainty at the design level, even as they attempt to reduce it. Ignoring uncertainty leads to overconfidence, bad decisions and actions with negative consequences. Designing agentic systems to be aware of uncertainty and to handle it well will continue to grow in importance as we allow our agents to do an increasing number of tasks autonomously on our behalf.

We may not know what agents will be able to do in one, five, or 10 years, but we know they will continue to routinely encounter uncertainty in goals, resources, unknowns and reasoning. And AI agents will need to admit when they don’t know enough so they can stop themselves before making bad choices that lead to adverse outcomes.

You can get started building AI agents with Langflow by following this guide.

The post AI Agents in Doubt: Reducing Uncertainty in Agentic Workflows appeared first on The New Stack.

]]>
Can You Trust Your Dashboard? The Critical Role of Data Freshness  https://thenewstack.io/can-you-trust-your-dashboard-the-critical-role-of-data-freshness/ Tue, 18 Mar 2025 16:00:56 +0000 https://thenewstack.io/?p=22780803

Picture this: You’re making a crucial business decision based on your analytics dashboard. But how do you know if those

The post Can You Trust Your Dashboard? The Critical Role of Data Freshness  appeared first on The New Stack.

]]>

Picture this: You’re making a crucial business decision based on your analytics dashboard. But how do you know if those analytics tell a story based on data from this morning, yesterday, or last week? In today’s data-driven world, the age of your data matters just as much as its accuracy. This is where data freshness comes in — it is no longer just a technical metric but a vital sign of your data’s relevance and reliability.

While the allure of real-time data excites users, the truth is more nuanced. Consider a stock trading platform where fractions of a second matter: it tracks market positions in real time while its annual compliance reporting updates monthly. Or a retail giant where Black Friday inventory levels refresh every few minutes, but real estate expansion plans coast along with quarterly updates. In the world of data, not everything needs to move at the speed of light — and that’s precisely the point.

Data freshness isn’t about making everything instant. It’s about understanding that different business needs demand different frequencies of updates. When a trading algorithm needs data that is current to the second, it can’t rely on old data. Yet for that same company, updating employee benefits documentation weekly is sufficient. This selective approach to data freshness isn’t just practical — it’s essential for building robust and efficient systems.

In our organization, the wake-up call came during a customer escalation meeting. “I just made a major infrastructure change yesterday,” our customer explained, frustration evident in their voice, “but I can’t tell if these dashboard numbers reflect that change or if I’m looking at last week’s data.”

As our cloud analytics platform grew, we faced an uncomfortable truth: while we had built powerful dashboards for tracking cloud spending and usage, customers couldn’t trust what they were seeing. Not because the data was wrong but because they did not know how fresh it was.

The impact was widespread — technical teams hesitated to make time-sensitive decisions, finance teams questioned their cloud cost visibility, and our support teams struggled to answer a seemingly simple question: “How current is this data?”

This felt akin to a weather app showing you the temperature but without telling you when that temperature was last updated, leaving you uncertain whether to grab a jacket. Our customers made significant business decisions based on our insights, but this uncertainty could erode trust.

We could process vast amounts of cloud usage data, transform it into actionable insights, and present it — but without revealing the freshness of that data, we were leaving our customers in the dark about a crucial dimension of data relevance and reliability. It became clear that transparency into data freshness was essential for rebuilding trust and enabling genuinely informed decision-making.

How We Solved the Problem: ‘When Was This Updated?’

Our Design Philosophy

In tackling the opacity of data pipelines and its ripple effects on customer experience, we needed more than a quick fix — we needed a comprehensive framework for lasting transparency. These core principles guided our approach:

Customer-Centric Transparency: Like a well-designed GPS, our solution needed to show users exactly where their data was (e.g., in the ingestion pipeline or data platform) without overwhelming them with technical complexity. Each status update focused on answering the critical question: “When will my data be ready?”

Role-Based Access: The requirements of an internal data engineer debugging a pipeline bottleneck differ vastly from those of a customer awaiting their dashboard refresh. Our design provided depth where necessary, ensuring each user group could effectively act on the information presented.

A Framework for the Technical Solution

Our answer materialized as the Data Freshness Visibility Framework — an integrated system bringing unprecedented clarity to pipeline operations. Here’s how we architected each component:

Pipeline Instrumentation: We approached monitoring like a nervous system, placing observers at critical junctures across our data pipeline, data platform and compute platforms to capture the metadata behind our data freshness metrics. These asynchronous observers ensure minimal impact on pipeline throughput.

Status Management Service: At the heart of our framework lies a purpose-built service that aggregates pipeline status information. This service maintains a reliable record of events while serving real-time updates, and it is optimized for high-volume queries through aggressive caching strategies.

Contextual Interfaces: We developed targeted interfaces for different user groups:

Customer View: A streamlined interface showing data progress through the ingestion pipeline, data platform, and compute platforms, with precise timestamps and estimated completion times.

Internal Dashboard: A comprehensive view featuring detailed performance metrics and error traces for rapid problem resolution.

Proactive Notification Engine: Our intelligent notification system understands context, considering factors like historical processing times and current pipeline load to provide accurate estimates and proactive alerts about potential delays.

Continuous Improvement Pipeline: Structured feedback channels for customers and internal teams ensure our solution evolves with user needs, driving improvements from granular status updates to smarter alerting thresholds.
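
Under the hood, the customer-facing status largely comes down to comparing a dataset’s last successful update against its expected refresh interval. This sketch shows only that comparison; the dataset names, intervals and thresholds are invented, and a real implementation would read them from the status management service described above.

```go
package main

import (
	"fmt"
	"time"
)

// DatasetStatus is a simplified record a status service might serve.
type DatasetStatus struct {
	Name            string
	LastUpdated     time.Time
	ExpectedRefresh time.Duration // how often this dataset is supposed to refresh
}

// freshness classifies a dataset relative to its own expected cadence,
// since "fresh" means something different for billing data than for inventory.
func freshness(s DatasetStatus, now time.Time) string {
	age := now.Sub(s.LastUpdated)
	switch {
	case age <= s.ExpectedRefresh:
		return fmt.Sprintf("fresh (updated %s ago)", age.Round(time.Minute))
	case age <= 2*s.ExpectedRefresh:
		return fmt.Sprintf("stale (updated %s ago, expected every %s)", age.Round(time.Minute), s.ExpectedRefresh)
	default:
		return "delayed: notify the customer and page the pipeline owners"
	}
}

func main() {
	now := time.Now()
	datasets := []DatasetStatus{
		{Name: "cloud_spend_hourly", LastUpdated: now.Add(-45 * time.Minute), ExpectedRefresh: time.Hour},
		{Name: "cloud_spend_hourly_lagging", LastUpdated: now.Add(-5 * time.Hour), ExpectedRefresh: time.Hour},
	}
	for _, d := range datasets {
		fmt.Printf("%s: %s\n", d.Name, freshness(d, now))
	}
}
```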

We’ve seen tangible results in the few months since rolling out the solution, and the impact of these changes exceeded our expectations. Within weeks of launching the Data Freshness Framework, we saw a 60% reduction in data freshness-related support tickets. Our internal support teams transformed from fielding anxious “Where’s my data?” queries to providing precise pipeline insights and accurate ETAs. Even more significantly, we watched customers leverage data freshness insights to make real-time cost optimization decisions. We caught and addressed spend anomalies within days instead of weeks, preventing tens of thousands of dollars in unnecessary cloud costs.

In building the solution, we learned that data freshness isn’t just a technical metric — it’s a fundamental component of data trust. As organizations increasingly rely on data-driven decisions, the ability to understand and trust the timeliness of that data becomes crucial. In doing so, we’ve helped bridge the gap, empowering our customers to act with clarity.

The post Can You Trust Your Dashboard? The Critical Role of Data Freshness  appeared first on The New Stack.

]]>
Let Productivity Metrics and DevEx Drive Each Other https://thenewstack.io/let-productivity-metrics-and-devex-drive-each-other/ Tue, 18 Mar 2025 15:00:19 +0000 https://thenewstack.io/?p=22780511

Developer productivity metrics have long been a point of contention in tech. Measuring productivity is difficult and improving it even

The post Let Productivity Metrics and DevEx Drive Each Other appeared first on The New Stack.

]]>

Developer productivity metrics have long been a point of contention in tech. Measuring productivity is difficult and improving it even more so. But no company can succeed without a thoughtful approach to productivity. Getting it wrong has myriad knock-on effects, not least of which is that it’s bad for employee well-being.

One argument is that when we focus on developer experience (DevEx), developer productivity (DevProd) will follow, but what measurable changes does that involve?

Metrics are only useful if they help us work toward bigger goals. The right metrics can help you identify what’s holding your developers back. When you remove those roadblocks, metrics improve, but more importantly, your developers are happier and more effective.

The answer to the debate isn’t to choose one side or the other; it’s to recognize that both are means toward the same end. Choose the right metrics, make the right DevEx investments and better DevProd — and, crucially, better products — will follow.

How Are Teams Measuring Developer Productivity?

Most productivity metrics focus on lines of code (LoC) and completed tasks, but a developer’s work is much more than this. It involves enjoyable work, like building environments and learning the latest technology, but it also involves frustrations, the most significant among them being technical debt and tech stack complexity.

Beyond that, developers’ work styles and responsibilities can vary immensely, even for developers of comparable rank who work at the same company. The way they choose to tackle problems can also be very different.

The long-term value a developer provides comes from their ability to work through challenges, build on organizational knowledge and use their individual expertise to solve problems creatively. Your senior developers may spend far more time thinking and researching than writing code, especially where application performance is a priority.

How do you measure developer productivity given those challenges?

If you pose this question to a dozen tech leaders, you’ll likely get as many different answers. Factors that influence the choice include how big your team is, how diverse their assignments are, and how much time and tooling you have available. Other than LoC, companies have traditionally focused on:

  • Coding speed
  • Number of commits and pull requests
  • Cycle time (how many hours to go from pull request to production)
  • Deployment frequency per service
  • Mean time to recovery (MTTR)

Two popular frameworks build on these metrics to measure fundamental aspects of developer productivity: DORA and SPACE. Both focus on code, with DORA prioritizing outcomes and SPACE looking at optimizing development teams.
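
To make those building blocks concrete, here is a small sketch that derives cycle time, deployment frequency and MTTR from a list of deployment events; the event shape is invented, and real teams would pull this data from their CI/CD and incident tooling.

```go
package main

import (
	"fmt"
	"time"
)

// Deployment is a hypothetical event record pulled from CI/CD and incident tooling.
type Deployment struct {
	PROpened    time.Time
	Deployed    time.Time
	Failed      bool
	RecoveredAt time.Time // zero value if the deployment never failed
}

func main() {
	base := time.Date(2025, 3, 1, 9, 0, 0, 0, time.UTC)
	deploys := []Deployment{
		{PROpened: base, Deployed: base.Add(6 * time.Hour)},
		{PROpened: base.Add(24 * time.Hour), Deployed: base.Add(30 * time.Hour), Failed: true, RecoveredAt: base.Add(31 * time.Hour)},
		{PROpened: base.Add(48 * time.Hour), Deployed: base.Add(52 * time.Hour)},
	}

	var totalCycle, totalRecovery time.Duration
	failures := 0
	for _, d := range deploys {
		totalCycle += d.Deployed.Sub(d.PROpened) // pull request opened to production
		if d.Failed {
			totalRecovery += d.RecoveredAt.Sub(d.Deployed)
			failures++
		}
	}

	days := deploys[len(deploys)-1].Deployed.Sub(deploys[0].Deployed).Hours() / 24
	fmt.Printf("average cycle time: %s\n", totalCycle/time.Duration(len(deploys)))
	fmt.Printf("deployment frequency: %.2f per day\n", float64(len(deploys))/days)
	if failures > 0 {
		fmt.Printf("MTTR: %s\n", totalRecovery/time.Duration(failures))
	}
}
```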

McKinsey recently set off a new round of arguments on DevProd when it outlined its own set of metrics to complement DORA and SPACE, claiming an “end-to-end view of software developer productivity.”

These “opportunity-focused metrics” include a Developer Velocity Index and a talent capability score. The idea is to optimize in-demand talent by tracking developers’ time in both the inner and outer loop.

Other teams have opted to quantify “soft” factors, focusing on project hygiene and motivators to measure employee well-being and team morale.

Consider the diversity of responsibilities under the “developer” job title and how many obstacles can get in the way. It’s difficult to see which metrics could accurately measure the contributions of any significant number of devs.

So, are we even asking the right question?

Make Changes Where You Know They’ll Matter

Metrics might be a part of team member evaluation, but that shouldn’t be their primary purpose. They’re a tool to help you reach the real goal: to improve processes so you can deliver better products.

The question isn’t which metrics to use to measure your developers’ productivity; it’s how metrics can reveal what’s getting in the way of your developers doing their work. To do that, you must tie your metrics to the work environment where your developers spend their time: the inner dev loop.

The Inner Dev Loop Is Evolving

McKinsey’s approach hits on a crucial fact: The inner dev loop is where developer productivity lives. Unfortunately, we’ve lost sight of its value. Thanks to containerization, the inner dev loop has lengthened, and the tasks of the outer dev loop require developers to spend even less time in this focused zone.

Developers’ work has changed, and it will change more with the rise of AI tools like GitHub Copilot and other AI-driven code-gen tools. As the pressure on technical organizations increases, we must make developers’ work environments as efficient as possible. Metrics can help us do that.

First, we must shorten the inner dev loop and reduce the container tax. If you look closely at what sucks up the most time in your developers’ days, you’ll find factors like incompatible local and deployment environments, slow container build times and CI/CD process failures. The data you uncover can direct you toward the right tools and process changes.

Second, we must take a broader view of what makes an effective developer. As AI takes on more of the rote tasks of the old inner dev loop, the developers who contribute the most value won’t be the ones who write the most lines of code. They’ll be the ones taking advantage of the smoother inner dev loop to test out bigger, bolder ideas.

It’s about time we started thinking of developer productivity in terms of business goals and design innovations, not lines of code or commit histories.

Can Productivity Be a Feel-Good Goal?

Productivity shouldn’t be a dirty word in tech — it feels good to be productive. Measuring productivity is pointless if you do it in isolation and worse if it limits creative problem-solving. But there are many ways it can make a positive difference:

  • It helps ensure alignment: Setting metrics allows development efforts to align with business expectations. Metrics give developers a better understanding of what’s expected of them. Highlighting specific goals and standards ensures you spend time and resources wisely.
  • It accelerates development: You can use metrics to evaluate the entire development workflow, identifying inefficiencies and bottlenecks. Highlighting delays in the pipeline facilitates streamlining processes and speeds up development, which is crucial for improving the inner dev loop experience.
  • It motivates developers: People like to know that their contributions make a difference. Setting metrics lets them track their progress toward specific goals, get recognition and substantiate their ideas for process improvements.

The greatest value of metrics is that they can help you identify processes and environments that aren’t working. It’s not about measuring people: No one wants to be valued only for their output, and managers should take it in good faith that most developers want to be efficient and make meaningful contributions. Don’t just look at the metrics; look at the whole picture. But, the metrics can help you diagnose where bigger issues might be hiding.

The fact remains that being productive at work is necessary. Getting things done and seeing the proof before you is also gratifying. Metrics can and should serve that purpose: finding the obstacles that keep people from doing creative, innovative work.

Improving Internal DevEx Feeds Productivity

Companies don’t make money by writing the most code. Developers don’t build good applications that way either. You make money by building great products efficiently. To do that, you must retain your best developers and empower them to do their best work.

This is an argument for building better DevEx, but it’s not an argument for giving up metrics. You need one to get the other.

Internal DevEx Focuses on the Inner Dev Loop

Often, we think about DevEx in terms of product design — what’s it like for developers to use our product? It’s critical to think about the experience of your developers, too — what’s it like for the developers who build our product?

If you think your developers could be more productive, ask yourself: What’s preventing them from being as productive as possible?

No matter how much developers want to improve their productive capacity, they face many obstacles outside of their control. Metrics can help you spot issues, but making life better for your developers means acting on the knowledge you gain. Considering the friction that typically occurs throughout the inner dev loop, it’s a good place to start if you see problems in your metrics.

What Metrics Matter for Internal DevEx?

Too many companies neglect internal DevEx, but it can pay big dividends in productivity gains. In the short term, developers get more done and code quality is higher. In the long term, developers stay engaged at work and avoid burnout. However, many leaders struggle with how to measure DevEx or how to improve it.

You can start with a few key DevProd metrics:

  • Environment build time and crash/error rates: Do developers lose valuable time waiting for cloud environments to build? Do they depend on other teams to set up development and test environments, or can they do it on their own? How much time do they spend debugging environment setups? Problems here also affect DevOps and QA teams, so look for tools that decouple production and development environments.
  • CI/CD pipeline delays: Do legacy tools or incompatible processes create duplicate work or add manual steps? Modernize and streamline your tool stack to eliminate tedious and repetitive tasks, such as creating standardized specs, API mocks and boilerplate code. Automating repetitive tasks frees developers to focus on more creative and challenging work.
  • Time in and length of the inner dev loop: Developers need tools and environments that reduce the container tax and shorten the build stage of the inner dev loop. They need more time on focused work and fewer delays to see their ideas in action.
  • Time spent waiting for code reviews or other collaboration requests: Provide tools that allow teams to communicate and collaborate more effectively. Encourage teams to share knowledge and engage in problem-solving, which fosters teamwork.

Productivity obstacles are also morale killers. To get the most out of your metrics and your DevEx investments, focus on the end goal: empowering your developers to build better products with fewer frustrations.

Build a More Resilient Team for Whatever Comes Next

We need a paradigm shift in how the tech industry defines and approaches productivity for developers. You’ve recruited talented developers for your projects, so let them see the big picture and think beyond code execution to find better ways to solve problems.

The work of development is changing, but the most productive developers will always be the ones who see their work as part of a larger ecosystem. Your best people can help you weather the changes ahead. They have crucial organizational knowledge and a deeper understanding of the real-world problems your company solves. Treating them like machines and bogging them down with outdated metrics won’t serve anyone’s needs.

Instead, invest in solutions that make work more enjoyable for your developers and motivate them to keep improving. Key in on the metrics that relate most closely to what’s happening in the inner dev loop, and work to make the time developers spend there as valuable as possible.

The post Let Productivity Metrics and DevEx Drive Each Other appeared first on The New Stack.

]]>
Microsoft TypeScript Devs Explain Why They Chose Go Over Rust, C# https://thenewstack.io/microsoft-typescript-devs-explain-why-they-chose-go-over-rust-c/ Tue, 18 Mar 2025 14:00:42 +0000 https://thenewstack.io/?p=22780704

Last week Microsoft announced that TypeScript’s compiler is being ported to a new programming language — Go. And then a

The post Microsoft TypeScript Devs Explain Why They Chose Go Over Rust, C# appeared first on The New Stack.

]]>

Last week Microsoft announced that TypeScript’s compiler is being ported to a new programming language — Go. And then a round of discussions kicked off online… Everyone’s fascinated by their high-stakes decision, and a few commenters couldn’t resist second-guessing the development team’s choice of a new programming language.

Why Go? Why not Microsoft’s own C# or the hot language of the day, Rust?

TypeScript’s developers soon found themselves in a kind of ad hoc colloquium taking place on GitHub, Reddit, YouTube, and Hacker News, explaining their decision-making process and all the clear and undeniable advantages of Go. Amid all the back-and-forth discussion, together they offered a surprisingly educational exploration of the various merits of several different programming languages.

And along the way, they even delivered a detailed explanation for why they’ve decided to port TypeScript’s compiler to Go…

But Why Not C#?

TypeScript’s current compiler is written in the TypeScript language. But on Reddit there was a funny yet irrefutable comeback when one commenter suggested that C# “has almost all you need” for rewriting TypeScript code.

“Well you can tell that to the guy who created Typescript and C#, but he disagrees with you.”

Indeed, it’s a question Anders Hejlsberg addressed during a video announcing the move. “Some of you might ask, ‘Well why not my favorite language? Why not C#? Why not Rust? Why not C++?”

Hejlsberg answered that Go was “the lowest-level language we can get to that gives us full, optimized, native-code support on all platforms, great control over data layout, the ability to have cyclic data structures and so forth. It gives you automatic memory management with a garbage collector and great access to concurrency.”

Hejlsberg also addressed the question in a special Zoom interview for the YouTube channel of a monthly TypeScript Meetup in Ann Arbor, Michigan. Hejlsberg described C# as “bytecode-first, if you will.” Plus, when it comes to performance, C#’s ahead-of-time compilation isn’t available on all platforms, and “it doesn’t have a decade or more of hardening… ”

“And then I think Go has a little more expressiveness when it comes to data structure layout and inline structs and so forth.”

But there was also something unique about Microsoft’s original TypeScript codebase for its compiler. While C# is an object-oriented language, TypeScript uses “very few classes,” Hejlsberg said in the Zoom interview. “In fact, the core compiler doesn’t use classes at all.

“Go is functions and data structures, where C# is heavily object-oriented programming (OOP)-oriented. And we would sort of have to switch to an OOP paradigm to move to C#… There’s just more friction in that transition than there is in the transition to Go.”

On Reddit one commenter even collected all the official answers from Microsoft into a handy chart — showing that Go offered a well-tested, native-first option supporting all of their desired platforms (unlike C#).

Why Microsoft chose Go for Corsa TypeScript — screenshot of Reddit user syklemil's summation

Why Not Rust?

In a Reddit comment that received 1,305 upvotes, TypeScript development lead Ryan Cavanaugh acknowledged the groundswell of curiosity. “We definitely knew when choosing Go that there were going to be people questioning why we didn’t choose Rust (or others).

“It’s a good question because Rust is an excellent language, and barring other constraints, is a strong first choice when writing new native code.”

Both Rust and Go are good at representing data, have “excellent” code generation tools, and perform well on single-core systems, Cavanaugh wrote. “In our opinion, Rust succeeds wildly at its design goals, but ‘is straightforward to port to Rust from this particular JavaScript codebase’ is very rationally not one of its design goals.

“It’s not one of Go’s either, but in our case given the way we’ve written the code so far, it does turn out to be pretty good at it.”

Cavanaugh shared some insider info: that they did try Rust. But they’d wanted the new codebase to be “algorithmically similar to the current one,” and Rust just wasn’t a fit. “We tried tons of approaches to get to a representation that would have made that port approach tractable in Rust, but all of them either had unacceptable trade-offs (performance, ergonomics, etc.) or devolved into ‘write your own Garbage Collection’-style strategies. Some of them came close, but often required dropping into lots of unsafe code…”

Cavanaugh explains in the project’s FAQ on GitHub that they’d also tried other languages too, and even “did deep investigations into the approaches used by existing native TypeScript parsers like swc, oxc, and esbuild.” Cavanaugh says they reached two important conclusions.

  • “Many languages would be suitable in a ground-up rewrite situation.”
  • “Go did the best when considering multiple criteria that are particular to this situation…”

On Reddit someone pointed out Go gives “more fine-grained control over memory” — and Ryan Cavanaugh jumped in to agree that Go “has excellent control” over memory allocation. Bypassing Go’s built-in allocator with a pool allocator “is extremely straightforward (and easy to experiment with without changing downstream usage).”

In comparison, Cavanaugh adds, “Rust has better control of ‘When do you free memory,’ but in a type checker there is almost nothing you can free until you’re done doing the entire batch, so you don’t really gain anything over a GC model in this scenario.”

In the FAQ Cavanaugh also applauds Go’s control of memory layout — and that it does all this “without requiring that the entire codebase continually concern itself with memory management.”

So when it comes to Rust, Cavanaugh explained on Reddit that they’d face two options:

  • “Do a complete from-scratch rewrite in Rust, which could take years and yield an incompatible version of TypeScript that no one could actually use…”
  • “Just do a port in Go and get something usable in a year or so and have something that’s extremely compatible in terms of semantics and extremely competitive in terms of performance.”

In a comment on Hacker News, Cavanaugh reminded readers that in 2022 the SWC (Speedy Web Compiler) project also chose Go for a port of the TypeScript type checker tsc. (Project founder DongYoon Kang noted that tsc “uses a lot of shared mutability and many parts depend on garbage collection. Even though I’m an advocate and believer in Rust, it doesn’t feel like the right tool for the job here.”)

“kdy1 definitely hit on the right general approach with his initial Go port,” Cavanaugh said in a comment on Reddit.

Microsoft's Ryan Cavanaugh jokes about TypeScript rewrite on Reddit (March 13 2025)

Pros and Cons

In the Zoom interview Anders admits that TypeScript has a “much richer” type system than what’s available in Go. But on the other hand, Go “does actually have excellent support for bit-fiddling and packing flags into ints. And in fact it has much, much, much better support than JavaScript for all of the various data types… You can have bytes and short ints and ints and 64-bit ints and what have you, both signed and unsigned… In JavaScript, everything is a floating point number. Period.” He laughs. “I mean, you want to represent true or false? Yeah, that’s eight bytes for you right there…”

Hejlsberg says there’s more than just making use of all the bits in Go. “We can also lay them out as structs — you know, inline, in arrays. And it shows! Our memory consumption is roughly half of what the old compiler was… If you can condense your data structures, you’re going to go faster.”

And here Hejlsberg paused to reflect on what a long, strange trip it’s been, saying he’s often laughed about what he would’ve said if people had told him, ‘Anders, you’ll be writing compilers in JavaScript for a decade’…

‘You are nuts.’

“JavaScript was never really intended to be the language for compute-intensive, system-level workloads…” he says. “Whereas Go was precisely intended to be that… Go is a system-level tool, and we are a systems-level program.”

Hejlsberg also found himself weighing in on the ‘Why Go’ discussion on GitHub, emphasizing that the decision to use Go “underscores our commitment to pragmatic engineering choices.”

“At Microsoft, we leverage multiple programming languages including C#, Go, Java, Rust, C++, TypeScript, and others, each chosen carefully based on technical suitability and team productivity.”

Celebrate the Strength

As the original designer of C#, Hejlsberg must’ve taken some pride in pointing out that “C# still happens to be the most popular language internally, by far.” Further down, he even emphasized Microsoft remains invested in C# and .NET “due to their unmatched productivity, robust ecosystem, and strong scalability.”

And he said they’d even tried a C# prototype for TypeScript’s compiler, but “Go emerged as the optimal choice, providing excellent ergonomics for tree traversal, ease of memory allocation, and a code structure that closely mirrors the existing compiler, enabling easier maintenance and compatibility.”

Starting from scratch would’ve been an entirely different question, Hejlsberg adds, but “this was not a green field — it’s a port of an existing codebase with 100 man-years of investment…” With a codebase that was “all functions and data structures” — with no classes — Go just turned out to be “more one-to-one in its mapping… Idiomatic Go looked just like our existing codebase, so the port was greatly simplified.”

Hejlsberg used the moment to make the case that “at Microsoft, we celebrate the strength that comes from diversity in programming languages.”

And his comment also closed with some historical perspective. “Let’s be real. Microsoft using Go to write a compiler for TypeScript wouldn’t have been possible or conceivable in years past.” But “over the last few decades, we’ve seen Microsoft’s strong and ongoing commitment to open-source software, prioritizing developer productivity and community collaboration above all.” Today, Hejlsberg argued, Microsoft sought to “empower developers with the best tools available, unencumbered by internal politics or narrow constraints.

“This freedom to choose the right tool for each specific job ultimately benefits the entire developer community, driving innovation, efficiency, and improved outcomes. And you can’t argue with a 10x outcome!”

The post Microsoft TypeScript Devs Explain Why They Chose Go Over Rust, C# appeared first on The New Stack.

]]>
Report: OpenSearch Bests ElasticSearch at Vector Modeling https://thenewstack.io/report-opensearch-bests-elasticsearch-at-vector-modeling/ Tue, 18 Mar 2025 13:00:20 +0000 https://thenewstack.io/?p=22780297

A recent research report from analysis firm Trail of Bits highlights some of the key differences — representing critical considerations

The post Report: OpenSearch Bests ElasticSearch at Vector Modeling appeared first on The New Stack.

]]>

A recent research report from analysis firm Trail of Bits highlights some of the key differences — representing critical considerations for contemporary information retrieval — between OpenSearch and Elasticsearch. OpenSearch and the OpenSearch Project were created by Amazon; OpenSearch’s search and analytics platform was forked from Elasticsearch.

The offerings were evaluated with the OpenSearch Benchmark, which compares solutions according to various workloads. The report indicates that OpenSearch v2.17.1 (the latest version at the time the research was performed) was 11 percent faster on the Vectorsearch workload than Elasticsearch v8.15.4.

It also reveals that OpenSearch was 1.6x faster on the Big5 workload. These results were found when aggregating the geometric mean of each solution’s queries. Both platforms have since been updated to other versions.

Trail of Bits chose to spotlight the results of these workloads in a recent blog partly because of their meaningfulness to the enterprise. According to Evan Downing, Trail of Bits senior security engineer, AI/ML, and one of the preparers of the report, “Big5’s kind of your generic workload that will satisfy most users and the Vectorsearch workload will evaluate things that have to do with machine learning and vector embeddings.”

The Vectorsearch workload directly correlates to generative AI applications and applications of vector similarity search. According to Trail of Bits Engineering Director William Woodruff, the Big5 workload involves “things like searching for terms over a product database.”

An examination of the different approaches OpenSearch and Elasticsearch invoke for meeting these workloads, and others in the OpenSearch Benchmark, illustrates some of the most useful capabilities in search today.

Multiple Search Engines

Although the solutions were assessed with the OpenSearch Benchmark, “To my knowledge, OpenSearch Benchmark was forked from the Elasticsearch benchmarking suite,” Downing said. Despite the fact that OpenSearch itself was forked from Elasticsearch, the report indicates that a comparison between the two solutions isn’t apples to apples.

One of the chief differences is that, at the time of the research (most of which occurred between September and December of 2024), OpenSearch supported a variety of search engines—including those designed for vector embedding retrieval use cases—while Elasticsearch supported just one, Apache Lucene. OpenSearch users can avail themselves of Lucene, Facebook AI Similarity Search (Faiss), and Non-Metric Space Library (NMSLIB).

This three-to-one ratio of engines could have contributed to OpenSearch’s favorable results in the Vectorsearch workload.

Vector Search Algorithms and Quantizations

The various search engines assessed in the benchmark employ different approaches to information retrieval — which is not a monolithic process. According to Downing, Lucene, Faiss, and NMSLIB “support different algorithms for doing vector search and also different quantizations. So basically, you can think of this as a compression for the dataset size and the requirements that are required by the users of these algorithms.”

Quantization techniques are one of the factors that influence the performance of vector search databases. The compression to which Downing referred can impact the cost of using vector search systems, particularly in terms of storage. Although there are a host of differences between these three engines, for the actual benchmark, it was pertinent that “each of those workload engines requires different parameters in order to run, based on different API requirements and other things,” Downing said. “So, when we’re comparing this all on the line, we’re comparing OpenSearch with Lucene, OpenSearch with NMSLIB, OpenSearch with Faiss, and Elasticsearch with Lucene.”
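
To make the engine distinction concrete, here is a rough sketch, not taken from the report, of how that choice surfaces when creating a k-NN index with the opensearch-py client; the index name, vector dimension and parameters below are illustrative assumptions.

from opensearchpy import OpenSearch

# Hypothetical index: the "engine" field is what selects Lucene, Faiss or NMSLIB.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="docs-knn",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 768,
                    "method": {"name": "hnsw", "space_type": "l2", "engine": "faiss"},
                }
            }
        },
    },
)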

Smart Metadata Filtering

Of the three, Lucene may be the most widely known engine. It’s an open source search engine library maintained by the Apache Software Foundation. For solutions that have multiple engines to choose from, as OpenSearch does, there are some applications for which Lucene is particularly appropriate. “It is my understanding that Lucene is generally a good option for smaller deployments,” Downing commented.

One of the more notable facets of Lucene is its metadata filtering. Typically, users can filter the results of vector database searches based on metadata about the actual embeddings. There are options for filtering metadata before searches and after searches, which can affect the overall quality of the results.

The distinction with Lucene is that it “offers some benefits, as does Faiss, with some things like smart filtering, where the optimum filtering strategy, like pre-filtering, or post-filtering, or exact K-Nearest Neighbors, is automatically applied depending on the different situation,” Downing said. Faiss is a software library (with few third-party dependencies) for vector similarity search and other applications that underpin use cases for generative models. NMSLIB is a vector embedding search library and toolset for assessing similarity search methods. “NMSLIB and Faiss are built mostly for large-scale use cases,” Downing said.

Big5 Workload

The Big5 workload illustrates how far information retrieval has come today. It encompasses aspects of text querying, sorting, date histograms, range queries, and term aggregations. These capabilities are useful for searching through documents, product and customer information, structured and unstructured data, and more.

OpenSearch outperformed Elasticsearch in all Big5 categories and was 16.55 times faster than Elasticsearch in the date histogram component. Date histogram features provide temporal aggregations. “This is sort of a chronological grouping, you could say, where you’re dividing the dataset into buckets or intervals,” Downing commented. “So, for example, we want to say give me all the documents from a specific day on this month.”

Text queries are predicated in part on lexical, or keyword, search capabilities and are commonly applied to use cases involving user IDs, email addresses, or names.  Range queries “are based on a specific range of values in a given field,” Downing explained. With these capabilities, users can retrieve results from a dataset in which the temperature is between 70 and 85 degrees, for example. Sorting enables organizations to order the results of queries according to any number of factors, which might include chronological, numeric, or alphabetical order.
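
For readers who haven’t used these query types, a short sketch of what Downing’s examples look like in practice; the opensearch-py client, the "weather" index and its fields are assumptions for illustration, not part of the benchmark.

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Range query: readings between 70 and 85 degrees, then a date histogram
# aggregation that buckets the matching documents by day.
response = client.search(
    index="weather",
    body={
        "query": {"range": {"temperature": {"gte": 70, "lte": 85}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "timestamp", "calendar_interval": "day"}
            }
        },
    },
)
print(response["aggregations"]["per_day"]["buckets"])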

Meaningful Findings

For the enterprise user, the most meaningful findings from the recent benchmark between OpenSearch and Elasticsearch have less to do with the performance of these solutions and more to do with their capabilities. The report indicates that all vector search platforms are not the same. They incorporate different engines that support respective features.

Some of those distinctions pertain to libraries for vector embedding search and pivotal considerations like metadata filtering, as well as versatility for quantization and compression. Moreover, capabilities for sorting search results, aggregating search terms, issuing range queries, and other facets of the Big5 workload are also worthy of consideration when assessing search and analytics platforms — and their performance.

The post Report: OpenSearch Bests ElasticSearch at Vector Modeling appeared first on The New Stack.

]]>
SUSECON 25: AI Gets Practical, Secure https://thenewstack.io/susecon-25-ai-gets-practical-secure/ Mon, 17 Mar 2025 20:00:08 +0000 https://thenewstack.io/?p=22780397

ORLANDO — You probably know SUSE best as a Linux powerhouse. You may well also know that, thanks in no

The post SUSECON 25: AI Gets Practical, Secure appeared first on The New Stack.

]]>

ORLANDO — You probably know SUSE best as a Linux powerhouse. You may well also know that, thanks in no small part to its Rancher acquisition, it’s a major cloud player as well. Recently, however, SUSE has also been pushing its way toward becoming a capable AI partner for its customers. This trend was on full display at SUSECON 25.

As Abhinav Puri, SUSE’s general manager of portfolio solutions and services, said in his presentation, “Through close collaboration with our customers and partners since the launch of SUSE AI last year, we’ve gained additional and invaluable insights into the challenges of deploying production-ready AI workloads. This collaborative journey has allowed us to bolster our offerings and continue to provide customers with strong transparency, trust, and openness in AI implementation.”

SUSE has done this by rolling out several key enhancements and partnerships in its AI offerings. These include:

  • AI-Specific Observability: The new SUSE AI release includes enhanced observability features, providing real-time insights into AI workloads, LLM token usage, and GPU performance. This allows enterprises to predict costs better, improve scalability, and enhance performance by quickly identifying and resolving system issues.
  • Support for Agentic Workflows: The platform supports the development of agentic AI workflows, enabling proactive decision-making and automation of repetitive tasks. This accelerates innovation by allowing enterprises to focus on high-value activities.
  • Expanded AI Library: The SUSE AI Library has been expanded to include validated open-source AI components such as Open WebUI Pipelines, custom Retrieval-Augmented Generation (RAG), and PyTorch for image classification and natural language processing. Put it all together and SUSE customers should be able to achieve faster time-to-value by leveraging these curated components.

SUSE isn’t doing all this on its own. As Puri told me in an interview, “Through close collaboration with our customers and partners, we’ve gained invaluable insights into deploying production-ready AI workloads. These enhancements reflect our commitment to delivering greater value and strengthening SUSE AI.”

The European open-source company is also working to secure AI workloads by using confidential computing. This technology encrypts data in memory so that your AI data and analysis are safe from snoopers, even within public cloud containers or on unsecured edge servers.

Retain Control of Your Data

As Manuel Sammeth, managing director at FIS-ASP GmbH, a German SAP integrator, said at the conference, “With SUSE AI, we help customers innovate while retaining control over sensitive data and meeting strict regulatory requirements.”

Additionally, SUSE also announced a collaboration with Infosys to facilitate AI adoption securely across businesses. This partnership builds on their existing relationship to enhance private cloud adoption and optimize Linux environments for SAP applications.

Finally, SUSE has integrated its security platform, SUSE Security, with Microsoft Sentinel to provide a unified security approach across hybrid IT environments, leveraging AI-driven threat mitigation.

None of this may be as flashy as many recent AI announcements, which promise a revolutionary world out of science fiction. Instead, SUSE is focused on delivering practical results to help businesses securely make the most of the state of AI today, not what it may be — possibly, hopefully — tomorrow.

The post SUSECON 25: AI Gets Practical, Secure appeared first on The New Stack.

]]>
Is AI a Bubble or a Revolution? Human[X] Asks: Why Not Both? https://thenewstack.io/is-ai-a-bubble-or-a-revolution-humanx-asks-why-not-both/ Mon, 17 Mar 2025 19:00:45 +0000 https://thenewstack.io/?p=22780756

LAS VEGAS — The big takeaways from the first-ever Human[X] conference: AI agents are everywhere, and need orchestration and governance.

The post Is AI a Bubble or a Revolution? Human[X] Asks: Why Not Both? appeared first on The New Stack.

]]>

LAS VEGAS — The big takeaways from the first-ever Human[X] conference: AI agents are everywhere, and need orchestration and governance. Models are improving rapidly — but trust is a work in progress.

And man, there’s a lot of money sloshing around in the AI space these days: the industry grew to a global market size of $184 billion in 2024, up from about $134 billion the previous year, according to Statista figures.

The Human[X] conference, founded by Stefan Weitz and Jonathan Weiner, is dedicated to showcasing how organizations are using AI — and helping investors and business decision-makers learn more about this fast-growing industry.

Human[X] — launched last spring with an initial $6 million investment by VCs like Primary Venture Partners, Foundation Capital, FPV Ventures and Andreessen Horowitz — drew more than 6,500 registered attendees to its initial outing in Las Vegas.

In his welcome speech last Monday, Weitz told the crowd why he and Weiner started Human[X]: to focus on real-world implementation and its attendant benefits and challenges.

“The conversation is, frustratingly to me, binary,” Weitz said. “It’s either utopia or dystopia. It’s either a benevolent robot overlord or it’s going to be Skynet with a LinkedIn profile. There is no apparent middle ground between Utopia and Judgment Day.

“And reality, as we all know, is more complex. So that’s why we try to build Human[X], not to feed the hype, not to create more fear, but to engage in conversations about what’s actually happening, what’s worked, what hasn’t and what we do next.”

On Wednesday, the conference unveiled the results of a survey of more than 1,000 U.S. business leaders, conducted by the pollster HarrisX. Seventy-five percent of respondents said their organization has a dedicated AI strategy.

Most leaders surveyed said they are spending between 10% and 25% of their budgets on AI initiatives; 37% of survey participants said they expect their AI investments to grow significantly over the next three years.

Meanwhile, the field’s biggest players keep churning out new advancements. On Tuesday, OpenAI unveiled its new AI agent framework; on Wednesday, Google dropped Gemma 3, a collection of lightweight, open models.

In his opening address on Monday, Weitz addressed the question of whether the AI boom was a bubble or the opening shots of a coming revolution in the way we work and live.

While he acknowledged that “signs of a bubble are trending,” he noted that previous bubbles have nevertheless resulted in long-lasting societal change; for instance, in the early 1900s, roughly 2,000 companies made cars. While nearly all of those makers failed, cars still transformed society. So, too, did the digital startups of the 1990s, most of which folded.

The AI boom might be “a little frothy,” Weitz said, but it’s still a potential revolution.

However, he cautioned: “Hype is dictating decisions, and the race to not get left behind is pushing companies and governments to move fast, whether or not they actually understand what they’re building. So the problem isn’t just that AI is overhyped, it’s that the hype itself is making us irresponsible. It’s kind of like the Fyre Festival with better algorithms.”

The next Human[X], slated for April 7-9, 2026, will be held in San Francisco, Weitz announced Tuesday — an acknowledgment of how Bay Area-centric the industry and its funders are.

AI Agents and Governance

Because Human[X] attracted lots of investors and business leaders, it made for a noticeably more extroverted crowd than is usual at tech events. For example, tablemates at Human[X] actually asked each other questions at lunch and breakfast, rather than remaining buried in their phones.

Through those conversations, along with the sessions, some key themes emerged. Among them: agentic AI’s current hotness is presenting governance challenges.

There’s a reason why agents are on the rise, and it’s got at least as much to do with profit as productivity, Yash Sheth, co-founder and COO of Galileo, a generative AI evaluation company, told The New Stack.

The interest in AI software “has been in an assistive manner so far,” Sheth said. “But RAG and chatbots generating documents and briefs still have the human in the loop. The true [return on investment] from AI will be only through automation, that you can automate massive work streams and transform your backend processes to be more efficient.”

Yash Sheth, of Galileo, said it’s no mystery why AI agents have taken off: “That’s the true ROI of AI.”

“You can have multiple businesses interact with each other and really perform complex actions on their own. That’s the true ROI of AI.”

In essence, AI agents infuse AI into robotic process automation, Sheth said, “to automate some of the hard-coded processes and make it more robust.”

He added, “What AI is bringing to the table is a generalization of rules in that automation process. So I think fundamentally, if you understand, why are people so crazy about agents, it’s because that’s going to accomplish tasks end to end.”

Governing all those AI agents is a big task — and an opportunity for vendors. On Monday, Boomi, which specializes in integration as a service, announced a beta trial of its AI Studio platform, which it plans to move into general availability in May.

Boomi’s customers have deployed more than 25,000 agents, Mani Gill, the company’s vice president of product, told The New Stack. “We got our customers thinking of agents and using agents, and naturally they’re like, ‘Hey, how do I better understand what data these agents have access to?’”

Mani Gill, of Boomi, said AI agents will sprawl, just as apps, APIs and data have, in modern enterprise systems.

The company also saw a pattern emerging: Just as applications, data and APIs sprawled, so would AI agents. “So we started talking to our customers about, ‘Hey, as you’re thinking about your agentic journey, would this be of value to you, to be able to manage across all of these agents?’ And the concept is very similar to API management where I’ve got all these APIs. How do I understand them across my landscape?”

He added, “We led our customers there a little bit, but it also is unfolding in front of them.”

Boomi AI Studio provides a platform for AI agent design, governance and orchestration. There are four components:

  • Agent Designer: Lets users create and deploy AI agents using Generative AI prompts, through no-code templates, using trusted data and security guardrails.
  • Agent Control Tower: The centerpiece of AI Studio, according to Gill, the control tower “provides governance, but also compliance and auditing,” with all that monitoring of both Boomi and third-party AI agents in a central place.
  • Agent Garden: A space that allows users to interact with their AI agents using natural language. Design, testing, deployment and tool development are enabled in the Agent Garden. “They can learn, continuously learn and nurture that,” Gill said.
  • Agent Marketplace: “We’re working with our partners to use that design capability to create agents that then our customers can just use as templates.” The Agent Marketplace resides in Boomi Marketplace (formerly Boomi Discover).

Other players are also crowding into the AI governance space. Among them is Holistic AI, a Software as a Service company that offers end-to-end AI deployment management. Raj Patel, Holistic AI’s AI transformation lead, told The New Stack that the company, founded only five years ago, has been seeing 50% to 60% growth each year, fueled by customers like Unilever.

The Holistic AI platform, Patel said, includes “observability and evidence-based backing for your decisions as a business — whether you should deploy AI or not, and when you do deploy it, do you have the responsible AI guardrails, ethics, observability in place.”

The idea is to not only govern AI applications and agents but also determine if the applications and agents should be built in the first place.

“Data science teams cost hundreds of thousands of dollars, in order to build a team and spend six months testing and then deploying,” Patel said. “You want to know very early on if this is something that you want to explore and what are the mitigations that you need to put in place in order to make this happen.”

Governance, Patel said, is a gap waiting to be filled as more organizations take up generative AI.

“One of the key deficiencies in the market is they see governance as a checkbox exercise,” he said. “At the moment, it’s something that should be one and done. It’s really not like that anymore.

“If you want to be able to effectively deploy AI in your business, there is a continuum of checks that need to be done, and you need to have a system in place that supports an AI governance strategy that allows that.”

Making LLMs More Human

On Wednesday morning, Sean White, CEO of Inflection AI, a three-year-old company that specializes in training and tuning large language models for enterprises, spoke to the Human[X] audience about his company’s ongoing efforts to make LLM-based chatbots more conversational.

The company’s Pi.ai, a personal assistant chatbot used by 35 million people, began as a way to release its frontier models, a term used for cutting-edge models. White, formerly chief research and development director at the Mozilla Foundation, joined Inflection a year ago. “When I joined, a large part of the shift was taking all of that and seeing if we could then apply that to the enterprise.”

Sean White, of Inflection AI, said a lot of language models result in user experiences that “either start off where they are just book reports, or they actually are just not good to talk to.”

Pi.ai has been intentionally developed as an emotionally intelligent online assistant. It’s central to Inflection’s mission, White said.

“We really believe this is a new generation of user interfaces and experiences,” he told TNS. “It’s not just the computational system or the UX system — we don’t want to build a crappy user experience. A lot of these systems either start off where they are just book reports, or they actually are just not good to talk to.”

Inflection AI, he said, has put a lot of effort into making models less like “book reports” and more, well, human, fine-tuning for nuance and context.

“We have collected over 10 million of these examples of good conversation, of emotional intelligence. We have this very large dimensional space in which we kind of want the qualities of, is it keeping the conversation going? Is that utterance sarcastic?”

Alongside newer startups like Inflection AI at Human[X] were companies that began before the post-2022 explosion in demand for generative AI tools. Unbabel, for instance, started in 2013 doing machine translation. “We got a community of translators from all around the world that would post-edit this machine translation,” Gil Coelho, head of product at Unbabel, told The New Stack.

And now, because machine translation has improved so substantially, Coelho said, “We have a generation of models, which we call Tower LLMs, and they’re state of the art right now for machine translation across most of the languages.”

Gil Coelho, of Unbabel, said his company wants developers to start building on top of AI components it has released for commercial use through Widn.ai.

In addition, Unbabel can perform quality estimation — using another AI model to predict the confidence the company has in a particular translation. ”So basically, one model does the translation, and then the other model will say, ‘Hey, I have high confidence,’ or ‘I have low confidence on the translation,’” Coelho said. “And if I have low confidence, I’m going to send this to a human” to check the translation.

Unbabel has now released for commercial use some of its state-of-the-art AI components, through Widn.ai, and encourages developers to build on its components.

“That’s something that was a big shift in terms of our strategy,” said Coelho. “We just thought it made sense. We’ve been building these and we want to make it available to a lot more people, a lot more developers, a lot more builders, and not just keep it within the Unbabel platform.”

Moving Beyond ‘Prompt and Pray’

Ahead of Human[X], the eight-year-old company AI21 Labs unveiled Jamba 1.6, the latest iteration of its open LLM based on the hybrid transformer-Mamba-mixture of experts (MoE) architecture.

And in alignment with Human[X]’s emerging theme of AI orchestration, the company introduced Maestro, an AI planning and orchestration system, on Tuesday.

The problem that Maestro is meant to address: the issues of trust that hamper AI adoption in production at enterprises.

While consumer adoption of AI tools is rising, “in the enterprise, it’s a very different story,” Ori Goshen, AI21 co-founder and co-CEO, told The New Stack. “You see a lot of experimentation, a lot of these charismatic demos, very little workloads that actually go to production.”

While it could be that the enterprise market simply isn’t educated enough yet, Goshen said, “There’s a more fundamental issue here: to get these to work in mission-critical workflows, you have to build trust around those systems. They have to be robust.

“That’s kind of the basic piece. And we’ve been working with customers; we’re seeing their pain. It’s really painful to get something from a flashy demo to actual workflows that actually work in production.”

The current approach to building AI applications, Goshen said, is the underlying culprit. Typically, an AI builder within the enterprise takes an agentic framework such as ReAct, LangChain, CrewAI or AutoGen, and then uses a language model or a reasoning model to figure out what the system is going to do.

“So it lets the language model basically plan and operationalize the workflow, which, again, works for demos but breaks in reality. We call this method ‘prompt and pray.’”

Ori Goshen of AI21 Labs: “It’s really early days. I think there are lots of questions of, how do you govern? How do you create more control?”

Workarounds are possible, but not sustainable as the system grows and gets more complex. Developers might hard-code the workflow instead, Goshen said: they “put clear checkpoints within, they call the LLM to get the dynamic part of the processing. That method indeed gives you more control, but it’s rigid and brittle and it’s hard to scale.”

Maestro is a model-agnostic system, Goshen said: “It learns the specific enterprise environment. So it learns the APIs, the tools, the data sources … it understands the environment, and then it’s training by doing offline simulation. And then, when a task is received by the system, it creates a structured plan that is explicit, so you can actually trace it.”

The Maestro system is currently in private preview, Goshen said, with the expectation that it will roll out to general availability in Q2 of 2025.

As for the conference, Goshen cautioned against letting the hype get ahead of the reality engineering teams face.

“It’s really early days,” he said. “I think there are lots of questions of, how do you govern? How do you create more control? But I think the fundamental, the real fundamental part, is, how do we trust these systems?”


Correction: In discussing the company Unbabel, a previous version of this article incorrectly stated that Widn.ai was an open source project, and misidentified the generation of models known as Tower LLMs.

The post Is AI a Bubble or a Revolution? Human[X] Asks: Why Not Both? appeared first on The New Stack.

]]>
How We Built a LangGraph Agent To Prioritize GitOps Vulns https://thenewstack.io/how-we-built-a-langgraph-agent-to-prioritize-gitops-vulns/ Mon, 17 Mar 2025 18:00:30 +0000 https://thenewstack.io/?p=22780530

In today’s complex Kubernetes environments, managing and prioritizing vulnerabilities can quickly become overwhelming. With dozens or even hundreds of containers

The post How We Built a LangGraph Agent To Prioritize GitOps Vulns appeared first on The New Stack.

]]>

In today’s complex Kubernetes environments, managing and prioritizing vulnerabilities can quickly become overwhelming. With dozens or even hundreds of containers running across multiple services, how do you decide which vulnerabilities to address first?

This is where AI can help, and in this article we’ll share our experience building HAIstings, an AI-powered vulnerability prioritizer, using LangGraph and LangChain, with security enhanced by CodeGate, an open source AI gateway developed by Stacklok.

Too Many Vulnerabilities, Too Little Time

If you’ve ever run a vulnerability scanner like Trivy against your Kubernetes cluster, you know the feeling: hundreds or thousands of common vulnerabilities and exposures (CVEs) across dozens of images, with limited time and resources to address them. Which ones should you tackle first?

The traditional approach relies on severity scores (i.e., critical, high, medium, low), but these scores don’t account for your specific infrastructure context. For example, a high-severity vulnerability in an internal, non-critical service might be less urgent than a medium-severity vulnerability in an internet-facing component.

We wanted to see if we could use AI to help solve this prioritization problem. Inspired by Arthur Hastings, the meticulous assistant to Agatha Christie’s detective Hercule Poirot, we built HAIstings to help infrastructure teams prioritize vulnerabilities based on:

  1. Severity (critical/high/medium/low).
  2. Infrastructure context (from GitOps repositories).
  3. User-provided insights about component criticality.
  4. Evolving understanding through conversation.

Building HAIstings With LangGraph and LangChain

LangGraph, built on top of LangChain, provides an excellent framework for creating conversational AI agents with memory. Here’s how we structured HAIstings:

1. Core Components

The main components of HAIstings include:

  • k8sreport: Connects to Kubernetes to gather vulnerability reports from trivy-operator (see the sketch after this list).
  • repo_ingest: Ingests infrastructure repository files to provide context.
  • vector_db: Stores and retrieves relevant files using vector embeddings.
  • memory: Maintains conversation history across sessions.
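
As a rough sketch of what the k8sreport step boils down to (our illustration, not HAIstings’ actual code), the trivy-operator reports are ordinary Kubernetes custom resources that can be listed with the official Python client:

from collections import Counter
from kubernetes import client, config

# Pull trivy-operator's VulnerabilityReport custom resources and tally severities.
# Field names follow trivy-operator's CRD; the real k8sreport module may differ.
config.load_kube_config()
api = client.CustomObjectsApi()

reports = api.list_cluster_custom_object(
    group="aquasecurity.github.io", version="v1alpha1", plural="vulnerabilityreports"
)

for item in reports["items"]:
    vulns = item.get("report", {}).get("vulnerabilities", [])
    counts = Counter(v.get("severity", "UNKNOWN") for v in vulns)
    print(item["metadata"]["namespace"], item["metadata"]["name"], dict(counts))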

2. Conversation Flow

HAIstings uses a LangGraph state machine with the following flow:

from langgraph.graph import StateGraph, START, END

graph_builder = StateGraph(State)
# Nodes
graph_builder.add_node("retrieve", retrieve)  # Get vulnerability data
graph_builder.add_node("generate_initial", generate_initial)  # Create initial report
graph_builder.add_node("extra_userinput", extra_userinput)  # Get more context

# Edges
graph_builder.add_edge(START, "retrieve")
graph_builder.add_edge("retrieve", "generate_initial")
graph_builder.add_edge("generate_initial", "extra_userinput")
graph_builder.add_conditional_edges("extra_userinput", needs_more_info, ["extra_userinput", END])


This creates a loop where HAIstings:

  1. Retrieves vulnerability data.
  2. Generates an initial report.
  3. Asks for additional context.
  4. Refines its assessment based on new information.
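
Two names in the graph code above, State and needs_more_info, are defined elsewhere in the project. A minimal sketch of what they might look like (an assumption on our part, not HAIstings’ exact code):

from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import END
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]  # conversation history, appended to
    vulnerability_report: str                # raw data gathered by "retrieve"
    done: bool                               # set once the user has nothing to add

def needs_more_info(state: State):
    """Loop back to extra_userinput until the user signals they're finished."""
    return END if state.get("done") else "extra_userinput"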

3. RAG for Relevant Context

One of the challenges was efficiently retrieving only the relevant files from potentially huge GitOps repositories. We implemented a retrieval-augmented generation (RAG) approach:

from typing import Dict, List

def retrieve_relevant_files(repo_url: str, query: str, k: int = 5) -> List[Dict]:
    """Retrieve relevant files from the vector database based on a query."""
    vector_db = VectorDatabase()
    documents = vector_db.similarity_search(query, k=k)
    
    results = []
    for doc in documents:
        results.append({
            "path": doc.metadata["path"],
            "content": doc.page_content,
            "is_kubernetes": doc.metadata.get("is_kubernetes", False),
        })
    
    return results


This ensures that only the most relevant files for each vulnerable component are included in the context, keeping the prompt size manageable.
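
The ingestion side looks roughly like the sketch below. FAISS and OpenAI embeddings are stand-ins here (HAIstings wraps this in its own VectorDatabase class), but the metadata keys match what retrieve_relevant_files reads back:

from pathlib import Path

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def ingest_repo(repo_path: str):
    """Index a checked-out GitOps repo so manifests can be matched to vulnerable components."""
    docs = []
    for path in Path(repo_path).rglob("*"):
        if not path.is_file() or path.suffix not in {".yaml", ".yml", ".tf"}:
            continue
        docs.append(Document(
            page_content=path.read_text(errors="ignore"),
            metadata={"path": str(path), "is_kubernetes": path.suffix in {".yaml", ".yml"}},
        ))
    chunks = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100).split_documents(docs)
    # Assumes OPENAI_API_KEY is set; any embedding model supported by LangChain would do.
    return FAISS.from_documents(chunks, OpenAIEmbeddings())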

Security Considerations

When working with LLMs and infrastructure data, security is paramount. The vulnerability reports and infrastructure files we’re analyzing could contain sensitive information like:

  • Configuration details.
  • Authentication mechanisms.
  • Potentially leaked credentials in infrastructure files.

This is where the open source project CodeGate becomes essential. CodeGate acts as a protective layer between HAIstings and the LLM provider, offering crucial protections:

1. Secrets Redaction

CodeGate automatically identifies and redacts secrets like API keys, tokens and credentials from your prompts before they reach the large language model (LLM) provider. This prevents accidental leakage of sensitive data to third-party cloud services.

For example, if your Kubernetes manifest or GitOps repo contains:

apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
type: Opaque
data:
  username: YWRtaW4=  # "admin" in base64
  password: c3VwZXJzZWNyZXQ=  # "supersecret" in base64


CodeGate redacts these values from prompts before reaching the LLM; then it seamlessly unredacts them in responses.

You may be saying, “Hang on a second. We rely on things like ExternalSecretsOperator to include Kubernetes secrets, so we’re safe… right?”

Well, you might be experimenting with a cluster and have a token stored in a file in your local repository or in your current working directory. An agent might be a little too ambitious and accidentally add it to your context, as we’ve often seen with code editors. This is where CodeGate jumps in and redacts sensitive info before it is unintentionally shared.
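
To build intuition for what redaction at the gateway means, here is a deliberately simplified sketch. None of this code is needed when using CodeGate, whose detection is far more thorough than a handful of regular expressions:

import re

# Toy illustration only; real secret scanning covers many more credential formats.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),            # shape of an AWS access key ID
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # shape of a GitHub token
]

def redact(prompt: str) -> str:
    """Replace anything that looks like a credential before the prompt leaves the machine."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("<REDACTED>", prompt)
    return prompt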

2. PII Redaction

Beyond secrets, CodeGate also detects and redacts personally identifiable information (PII) that might be present in your infrastructure files or deployment manifests.

3. Controlled Model Access

CodeGate includes model multiplexing (muxing) capabilities that help ensure infrastructure vulnerability information goes only to approved, trusted models with appropriate security measures.

Model muxing allows you to create rules that route specific file types, projects or code patterns to different AI models. For example, you might want infrastructure code to be handled by a private, locally hosted model, while general application code can be processed by cloud-based models.

Model muxing enables:

  • Data sensitivity control: Route sensitive code (like infrastructure, security or authentication modules) to models with stricter privacy guarantees.
  • Compliance requirements: Meet regulatory needs by ensuring certain code types never leave your environment.
  • Cost optimization: Use expensive, high-powered models only for critical code sections.
  • Performance tuning: Match code complexity with the most appropriate model capabilities.

Here’s an example model muxing strategy with an infrastructure repository:

  • Rule: *.tf, *.yaml or *-infra.* can be muxed to a locally hosted Ollama model.
  • Benefit: Terraform files and infrastructure YAML never leave your environment, preventing potential leak of secrets, IP addresses or infrastructure design.

4. Traceable History

CodeGate maintains a central record of all interactions with AI models, creating an audit trail of all vulnerability assessments and recommendations.

Configuring HAIstings With CodeGate

Setting up HAIstings to work with CodeGate is straightforward. Update the LangChain configuration in HAIstings:

# HAIstings configuration for using CodeGate
self.llm = init_chat_model(
    # Using CodeGate's Muxing feature
    model="gpt-4o",  # This will be routed appropriately by CodeGate
    model_provider="openai",
    # API key not needed as it's handled by CodeGate
    api_key="fake-api-key",
    # CodeGate Muxing API URL
    base_url="http://127.0.0.1:8989/v1/mux",
)

The Results

With HAIstings and CodeGate working together, the resulting system provides intelligent, context-aware vulnerability prioritization while maintaining strict security controls.

A sample report from HAIstings might look like:

# HAIsting's Security Report

## Introduction

Good day! Arthur Hastings at your service. I've meticulously examined the vulnerability reports from your Kubernetes infrastructure and prepared a prioritized assessment of the security concerns that require your immediate attention.

## Summary

After careful analysis, I've identified several critical vulnerabilities that demand prompt remediation:

1. **example-service (internet-facing service)**
   - Critical vulnerabilities: 3
   - High vulnerabilities: 7
   - Most concerning: CVE-2023-1234 (Remote code execution)
   
   This service is particularly concerning due to its internet-facing nature, as mentioned in your notes. I recommend addressing these vulnerabilities with the utmost urgency.

2. **Flux (GitOps controller)**
   - Critical vulnerabilities: 2
   - High vulnerabilities: 5
   - Most concerning: CVE-2023-5678 (Git request processing vulnerability)
   
   As you've noted, Flux is critical to your infrastructure, and this Git request processing vulnerability aligns with your specific concerns.

## Conclusion

I say, these vulnerabilities require prompt attention, particularly the ones affecting your internet-facing services and deployment controllers. I recommend addressing the critical vulnerabilities in example-service and Flux as your top priorities.

Performance Considerations

LLM interactions are slow by themselves, and you shouldn’t rely on them for real-time and critical alerts. Proxying LLM traffic will add some latency into the mix. This is expected since these are computationally expensive operations. That said, we believe the security benefits are worth it. You’re trading a few extra seconds of processing time for dramatically better vulnerability prioritization that’s tailored to your specific infrastructure needs.

Secure AI for Infrastructure

Building HAIstings with LangGraph and LangChain has demonstrated how AI can help solve the problem of vulnerability prioritization in modern infrastructure. The combination with CodeGate ensures that this AI assistance doesn’t come at the cost of security. You get intelligent, context-aware guidance without compromising security standards, freeing up your team to focus on fixing what matters most.

As infrastructure becomes more complex and vulnerabilities more numerous, tools like HAIstings represent the future of infrastructure security management, providing intelligent, context-aware guidance while maintaining the strictest security standards.

You can try HAIstings by using the code in our GitHub repository.

Would you like to see how AI can help prioritize vulnerabilities in your infrastructure? Or do you have other ideas for combining AI with infrastructure management? Jump into Stacklok’s Discord community and continue the conversation.

The post How We Built a LangGraph Agent To Prioritize GitOps Vulns appeared first on The New Stack.

]]>
How to Run Docker in Rootless Mode https://thenewstack.io/how-to-run-docker-in-rootless-mode/ Mon, 17 Mar 2025 17:00:23 +0000 https://thenewstack.io/?p=21689561

Although it’s possible to deploy Docker containers without root privileges, that doesn’t necessarily mean it’s rootless throughout. That’s because

The post How to Run Docker in Rootless Mode appeared first on The New Stack.

]]>

Although it’s possible to deploy Docker containers without root privileges, that doesn’t necessarily mean the stack is rootless throughout. That’s because other components within the stack (such as runc, containerd, and dockerd) do require root privileges to run. That can open the door to privilege escalation attacks.

Sure, you can add your user to the docker group and deploy containers without the help of sudo, but that really doesn’t solve the problem. There are other ways to run docker that seem like a good idea but, in the end, they’re just as dangerous as running docker with sudo privileges.

So, what do you do? You can always go rootless.

How Rootless Works

Effectively, running rootless Docker takes advantage of user namespaces. This subsystem provides both privilege isolation and user identification segregation across processes. The feature has been available in the Linux kernel since version 3.8 and can be used with docker to map a range of user IDs so the root user within the innermost namespace maps to an unprivileged range in a parent namespace.
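
Concretely, the mapping is driven by the /etc/subuid and /etc/subgid files that the uidmap tools read. An entry for a hypothetical user named jack might look like:

jack:100000:65536

That grants the account a block of 65,536 subordinate IDs starting at 100000. Processes inside the namespace are mapped into that unprivileged range on the host, so "root" in a container never corresponds to the real UID 0 on the host.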

Docker has been able to take advantage of the user namespace feature for some time. This is done using the --userns-remap option. The only problem with this is the runtime engine is still run as root, so it doesn’t solve our problem.

That’s where rootless docker comes into play.

Limitations

Privileged Port Access

Unfortunately, rootless mode isn’t perfect. The first issue is that rootless docker will not have access to privileged ports, which are ports below 1024. That means you’ll need to remember to expose your containers on ports above 1024; otherwise, they will fail to run.

Resource Limitation of Containers

Another issue is that limiting resources with options such as --cpus, --memory, and --pids-limit is only supported when running with cgroup v2 and systemd.

Other limitations you might run into include:

  • No support for AppArmor, checkpoint, overlay network, and SCTP port exposure.
  • Limited storage driver support (only the overlay2, fuse-overlayfs, and vfs storage drivers are supported).
  • Doesn’t support --net=host.

With all of that said, how do we install docker such that it can be run in rootless mode? It’s actually quite simple. Let me show you how.

I’ll be demonstrating on my go-to server of choice, Ubuntu Server 20.04, but you can do this on nearly any Linux distribution. The only difference will be the installation command to be run for the one dependency.

Installing the Lone Dependency

The first thing we must do is install the sole dependency for this setup. That dependency is uidmap, which handles the user namespace mapping for the system. To install uidmap, log into your server and issue the command:

sudo apt-get install uidmap -y

That’s all there is for the dependencies.

Installing Docker

Next, we install Docker. We don’t want to go with the version found in the standard repository, as that won’t successfully run in rootless mode. Instead, we need to download a special installation script that will install rootless Docker.

Download and run the Docker rootless installer

We can download and install the rootless version of docker with a single command:

curl -fsSL https://get.docker.com/rootless | sh

Add the necessary variables

When that installation finishes, you then need to add a pair of environment variables to .bashrc. Open the file with:

nano ~/.bashrc

In that file, add the following lines to the bottom:

export PATH=/home/jack/bin:$PATH
export DOCKER_HOST=unix:///run/user/1000/docker.sock


NOTE: Make sure to use your own home directory in the PATH line and your own user ID in the DOCKER_HOST line. In the above code, my username was jack and my ID was 1000. To find your user ID, issue the command:

id

You’ll want to add the number after uid= in the line:

export DOCKER_HOST=unix:///run/user/ID/docker.sock


Where ID is your user ID number.

Save and close the file.

Log out and log back into the server (so the changes will take effect) and you’re ready to test out rootless docker.

Testing Rootless Docker

We’ll deploy our trusty NGINX container as a test. Remember, we’ve not added our user to the docker group. If this were a standard Docker installation, we wouldn’t be able to successfully deploy the NGINX container without either adding our user to the docker group or running the deploy command with sudo privileges.

Testing rootless Docker with NGINX

To test rootless mode (deploying NGINX in detached mode), issue the command:

docker run --name docker-nginx -p 8080:80 -d nginx

Open a web browser and point it to http://SERVER:8080 (Where SERVER is the IP address of your Docker server) and you should see the NGINX welcome page.

This container was deployed without using root, so the entire stack is without those elevated privileges.

Testing rootless mode with a Ubuntu container

You can even deploy a full Linux container and access its bash shell with a command like:

docker run -it ubuntu bash

All of this done without touching root privileges.

Conclusion

This is obviously not a perfect solution to all of the security issues surrounding Docker containers. And you might even find Podman a better solution, as it can run rootless out of the box. But for those who are already invested in Docker, and are looking to gain as much security as possible, running Docker in rootless mode is certainly a viable option.

Give rootless Docker a try and see if it doesn’t ease your security headaches a bit.

Rootless Mode FAQ

1. What is Docker Rootless Mode?

A: Docker Rootless Mode allows you to run containers without requiring superuser privileges, by utilizing namespaces and cgroups provided by the Linux kernel.

2. Why Use Docker Rootless Mode?

A: Running Docker in rootless mode provides several benefits:

  • Security: Reduces potential security risks since no processes are running with elevated permissions.
  • Isolation: Improves system isolation, as each container runs in its own user namespace.
  • Flexibility: Allows for the use of non-root users and avoids conflicts with existing root-based applications.

3. Do I Still Need Dockerd?

A: Yes, you still need a docker daemon (dockerd). You can start it as follows:

dockerd-rootless-setuptool.sh install --non-suid

This command starts dockerd in rootless mode.

4. How Do I Run Containers?

A: Once Docker Rootless Mode is set up, you can run containers using the standard docker command, such as:

docker run -it ubuntu bash

5. Can I Use Docker Compose with Rootless Mode?

A: Yes, you can use Docker Compose in rootless mode. Just make sure that both Docker and Docker Compose are installed.

6. What About Network Configuration?

A: In rootless mode, network setup is different from root mode. By default, dockerd-rootless-setuptool.sh configures a user-specific network stack using slirp4netns for networking. This setup can be customized by modifying the configuration files under /home/USER/.local/share/docker/rootless (where USER is your username).

7. Can I Share Docker Volumes with Host?

A: Yes, but you need to mount volumes that are accessible from your user namespace. For example, you might do:

docker run -v /host/data:/container/data ubuntu bash

Do note that some features of shared volumes may not be fully supported in rootless mode.

8. Do I Have Access to docker system prune and Other Commands?

A: Not all commands work directly in rootless mode. For example, you cannot use docker system prune, as it requires access to the host kernel that is not available to non-root users.

You can run these commands by using a containerized version of Docker:

The post How to Run Docker in Rootless Mode appeared first on The New Stack.

]]>
2.8 Million Reasons Why You Can’t Trust Your VPN https://thenewstack.io/2-8-million-reasons-why-you-cant-trust-your-vpn/ Mon, 17 Mar 2025 16:00:44 +0000 https://thenewstack.io/?p=22780505

There are 2.8 million IP addresses, meaning 2.8 million unique sources are currently hammering away at virtual private network (VPN)

The post 2.8 Million Reasons Why You Can’t Trust Your VPN appeared first on The New Stack.

]]>

There are 2.8 million IP addresses, meaning 2.8 million unique sources are currently hammering away at virtual private network (VPN) devices worldwide, trying to guess their way into corporate networks. Perimeter-based security’s poster child, the VPN, is under siege, and the numbers are staggering, with attack vectors originating across the globe.

Your company’s entire security posture shouldn’t collapse because someone guessed a password. Yet that’s exactly what perimeter-based security offers: Crack one set of credentials and you’ve breached the trusted zone.

Enter Zero Trust Security

The zero trust security model operates on a simple principle: never trust, always verify. Unlike traditional security models that trust anything inside the network perimeter, zero trust verifies every access request regardless of where it originates.

Think about getting on an airplane. At the airport, your every step is verified: Check-in confirms your booking, security screens you and your belongings, and gate agents ensure you’re boarding the right flight at the right time. Your boarding pass works for your one specific flight to your designated destination, not the entire airport or anywhere in the world. Try to enter the wrong gate or board too early? You’ll be stopped, even if you have valid credentials.

Continuous verification at every step. That’s zero trust security in a nutshell.

Now compare this to perimeter-based security: It’s like showing a stolen passport at the first point of entry in an airport, skipping all other checks and suddenly having access to every gate, plane and restricted area in the airport. No questions asked — because you’re already inside.

Sounds absurd? That’s exactly how perimeter security works. Check once, trust forever.

It’s All About Context

While VPNs create a secure tunnel and trust everything inside it, zero trust takes a fundamentally different approach through identity-aware proxies.

Every access request passes through this proxy, which evaluates:

  • Who is making the request. (Identity)
  • What they are trying to access. (Resource)
  • Where they are connecting from. (Location)
  • What device they are using. (Device posture)
  • When they are making the request. (Time)

Think of it like this: A senior engineer’s credentials entered at 3 a.m. from an unknown device in a new country should raise flags, even if the password is correct.

Making Decisions

Behind the scenes, a policy engine processes these factors in real time, making instant decisions about access. Instead of maintaining complex firewall rules, you define simple, clear policies like: “Engineers can access production systems only during their on-call shifts, from managed devices with multifactor authentication.”
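
As a thought experiment, that example policy reduces to a check like the one below. This is a toy sketch in application code; an actual identity-aware proxy expresses the same thing declaratively:

from dataclasses import dataclass

@dataclass
class AccessRequest:
    role: str             # who is asking
    resource: str         # what they want to reach
    device_managed: bool  # device posture
    mfa_passed: bool      # strong authentication
    on_call: bool         # time/shift context

def allow(req: AccessRequest) -> bool:
    # "Engineers can access production systems only during their on-call
    # shifts, from managed devices with multifactor authentication."
    if req.resource == "production":
        return (req.role == "engineer" and req.on_call
                and req.device_managed and req.mfa_passed)
    return False  # default deny: anything not explicitly allowed is refused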

Microsegmentation

Rather than having broad network access, each application and service is protected individually.

This means:

  • A compromise of one service doesn’t expose others.
  • Access is granular and specific.
  • Lateral movement is restricted by default.

Improved Security Posture on Day One

Identity-aware proxies can instantly modernize your security posture without touching your legacy applications. Some critical internal tools built years ago might not even support modern authentication methods like single sign-on (SSO).

Adding an identity-aware proxy in front of these applications:

  • Enforces strong authentication instantly.
  • Adds SSO capabilities without application changes.
  • Defines context-based access (device, role, etc.).
  • Provides audit logs out of the box.
  • Enables modern security policies.

Wrapping Up

The shift to zero trust isn’t just a security upgrade; it’s a fundamental rethinking of how we protect our most valuable digital assets. By moving away from perimeter-based models that create a false sense of security, organizations can build resilience against the evolving threat landscape. Whether you’re dealing with remote workers, cloud migrations or legacy applications, zero trust principles provide a flexible framework that grows with your needs while maintaining consistent security standards across your entire infrastructure.

There are 2.8 million reasons not to trust a VPN, and they’re all hammering networks worldwide right now. Zero trust offers a better way: Never trust, always verify — at every access, every time.

For more information, watch Pomerium’s YouTube short about zero trust.

The post 2.8 Million Reasons Why You Can’t Trust Your VPN appeared first on The New Stack.

]]>
Meet The New Stack’s New Editor for AI https://thenewstack.io/meet-the-new-stacks-new-editor-for-ai/ Mon, 17 Mar 2025 15:00:14 +0000 https://thenewstack.io/?p=22780700

The New Stack has been building out its coverage of AI — large language models, AI agents, and all the

The post Meet The New Stack’s New Editor for AI appeared first on The New Stack.

]]>

The New Stack has been building out its coverage of AI — large language models, AI agents, and all the latest innovations that have impact on the work of developers and engineers — for the past few years. Today, we will be joined by a new member of our team, who will steer that coverage.

Frederic Lardinois is TNS’s new senior editor for AI. If you’ve been in the tech industry over the last 17 years, you’re probably familiar with his work — at his most recent post, as senior enterprise editor at TechCrunch, as the founder of the SiliconFilter newsletter, or as a writer for ReadWriteWeb.

“TNS already does a great job covering AI, especially how it is reshaping the day-to-day work of developers,” Frederic told me in an email interview Friday. “AI now touches every stage of the software life cycle, from idea to deployment and beyond, and enterprises are rapidly going from experimenting with AI to putting it into production.

“There is a real opportunity here to cover what I believe is going to be the stack that the next generation of software will be built upon. In practical terms, I think this means I can help TNS expand its coverage of, for example, how the underlying models are built and who is building them, the growing ecosystems around many of these tools and how AI will fundamentally alter how we think about software development.”

He sees a lot of rich territory to mine in the AI space and looks forward to sharing it with New Stack readers.

“Agents are obviously hot right now and may, among many other things, finally realize some of the promises of what the robotic process automation companies tried to sell many years ago,” he wrote. “I’m also really interested in how developers are using smaller, more specialized models to extend their services.

“There’s also the question of how, in the long run, on-device AI chips will change the equation for the cloud-based AI services. In addition, I also think the discussion about open source models is far from over.”

The ReadWriteWeb Gang Reunited

Frederic will help us build upon the work of TNS Senior Editor Richard MacManus, who has developed our AI coverage over the last few years. Richard will also continue to help us track the trends and innovations in AI, since it’s a big and rapidly expanding space.

Richard, who founded ReadWriteWeb, expressed excitement about reuniting with a former RWW staff member.

“Frederic worked for me at ReadWriteWeb from 2008-2010 and he was one of our leading writers during that time,” said Richard. “So I’m thrilled to see he will be joining The New Stack to help extend our AI coverage. That makes three RWW alumni on the core editorial team, including myself and TNS founder Alex Williams!”

Alex, who is also TNS’s publisher, highlighted the capabilities Frederic would bring to the TNS audience.

“Frederic will offer our readers deeper explanation and analysis of at-scale development, deployment and management,” he said. “These are concepts that resonate deeply in the developer community, all the more complex with the advent of LLMs, AI agents and the host of requirements needed to deploy models for the enterprise.

“We heartily welcome Frederic as our senior editor for AI at The New Stack. He’s one of the best in the business. It’s great to work with him again, following his long and successful run as a senior editor at TechCrunch.”

Our new team member is a longtime TNS reader. “I’ve been following the site since day one — and, in some ways, even before that, given that Alex and I worked together at ReadWriteWeb and TechCrunch,” Frederic told me. “There’s a depth and knowledge here that’s unsurpassed in the industry. It also helps, of course, that I got to meet quite a few TNS team members over the years, at events all around the world.”

In the Air and On the Go

Frederic joins our all-remote staff from his home in Portland, Ore. An avid traveler and hiker, he also acquired the obligatory nerdy pandemic hobby: in his case, collecting custom keyboards.

And he’s a licensed pilot.

“Like so many in tech, I always had a fascination with planes,” he wrote. “After dreaming about it for years, I started taking lessons at a small private airport near Portland in late 2019 and had my first solo flight right before Christmas. Then Covid hit and shut everything down, but I passed my checkride — with masks on — in July.

“Ever since, I’ve been trying to fly the experimental plane I helped build as part of our flying club, as often as my schedule allows.”

Frederic will be on the go right from the start at TNS, covering the NVIDIA GTC conference for us. Check out his posts on The New Stack this week.

The post Meet The New Stack’s New Editor for AI appeared first on The New Stack.

]]>
Tools and Talks Worth Checking Out at KubeCon Europe https://thenewstack.io/tools-and-talks-worth-checking-out-at-kubecon-europe/ Mon, 17 Mar 2025 14:05:16 +0000 https://thenewstack.io/?p=22780490

Are you heading to KubeCon + CloudNativeCon Europe 2025? With the event fast approaching, now’s the perfect time to plan

The post Tools and Talks Worth Checking Out at KubeCon Europe appeared first on The New Stack.

]]>

Are you heading to KubeCon + CloudNativeCon Europe 2025? With the event fast approaching, now’s the perfect time to plan your experience. This massive gathering held over four days is packed with cutting-edge cloud native technologies and expert-led technical talks, so making a clear agenda is essential.

The challenge? Navigating the 229 sessions, 215 vendors and countless networking opportunities without feeling overwhelmed. Let me help you curate the perfect itinerary so you can maximize your KubeCon experience.

Here are five projects that are making significant waves in the cloud native ecosystem, along with the must-attend talks that should be on every attendee’s radar.

Projects That Are Worth Checking Out

Chainguard

If you are already spending countless hours trying to patch and triage the CVEs from container images to secure your systems, you need to check out this tool.

In an era where software supply-chain attacks are on the rise, Chainguard is redefining security with its focus on minimal, hardened and continuously verified container images. By eliminating vulnerabilities before they reach production, Chainguard helps organizations enhance their security posture without compromising efficiency.

Chainguard enhances security through several key approaches:

  • Distroless container images — These minimal images contain only what’s necessary to run an application, eliminating unnecessary packages, shells and utilities that could introduce vulnerabilities.
  • Continuous verification — Automated scanning and attestation systems cryptographically verify the integrity and provenance of container images throughout the development life cycle.
  • Software bills of materials (SBOM) integration — Chainguard generates and maintains detailed SBOMs for its images, providing transparency about all components and dependencies.
  • Vulnerability elimination — Rather than just detecting vulnerabilities, it focuses on removing them at the source by carefully curating dependencies and maintaining strict update policies.

Chainguard offers publicly available container images that you can explore and use, providing a hands-on experience with their secure, minimal and continuously verified approach.

Crossplane

Crossplane transforms the way organizations manage their cloud infrastructure by bringing the declarative resource model of Kubernetes to multicloud environments. This powerful tool is reshaping infrastructure management through:

  • Infrastructure as Code via Kubernetes API — Crossplane allows teams to provision and manage cloud resources using the same Kubernetes API and tooling they already use for applications, eliminating the need to juggle multiple cloud provider CLIs and consoles.
  • Composable infrastructure abstractions — It enables platform teams to create custom, organization-specific abstractions that hide cloud-specific implementation details, allowing developers to self-service infrastructure without needing to understand the underlying cloud providers.
  • Multicloud resource orchestration — Crossplane provides a unified control plane for managing resources across AWS, Azure, GCP and other cloud providers through consistent Kubernetes custom resource definitions (CRDs).
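
Because Crossplane resources are ordinary Kubernetes objects, developers can request infrastructure with the same API clients they already use for applications. The sketch below, using the Kubernetes Python client, creates a hypothetical claim; the API group, kind and spec fields are placeholders that depend entirely on the composite resource definitions and compositions your platform team installs.

from kubernetes import client, config
config.load_kube_config()  # or load_incluster_config() when running inside a pod
api = client.CustomObjectsApi()
# Hypothetical claim: the group, kind and spec fields below are examples only.
claim = {
    "apiVersion": "database.example.org/v1alpha1",
    "kind": "PostgreSQLInstance",
    "metadata": {"name": "team-a-db", "namespace": "team-a"},
    "spec": {
        "parameters": {"storageGB": 20},
        "compositionSelector": {"matchLabels": {"provider": "aws"}},
    },
}
api.create_namespaced_custom_object(
    group="database.example.org",
    version="v1alpha1",
    namespace="team-a",
    plural="postgresqlinstances",
    body=claim,
)

The developer asks for "a database" in abstract terms; Crossplane's controllers reconcile that claim into concrete cloud resources behind the scenes.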

Kubescape

Kubescape is revolutionizing Kubernetes security with its comprehensive, open source security platform built by Armo and designed to identify and remediate risks across your entire Kubernetes stack. In an environment where container security breaches are increasingly common, Kubescape stands out with:

  • Comprehensive security scanning — It performs scans against multiple frameworks including NSA-CISA, MITRE ATT&CK and CIS Benchmarks, providing holistic visibility into security posture from cluster configuration to runtime behavior.
  • Risk-based prioritization — Kubescape analyzes vulnerabilities within your specific context, calculating risk scores that help teams focus on what matters most instead of drowning in alerts.
  • DevSecOps integration — It seamlessly integrates into CI/CD pipelines and existing workflows with support for CLI, WebUI and Kubernetes native components, enabling security to shift left in the development process.
  • Compliance automation — Automatically generates compliance reports and provides remediation guidance, significantly reducing the manual effort needed to maintain regulatory compliance.

vCluster

vCluster is transforming multitenancy in Kubernetes by introducing a revolutionary approach to cluster isolation and resource optimization. In environments where traditional namespace isolation falls short, vCluster enables:

  • Virtual Kubernetes clusters — vCluster creates fully functional Kubernetes clusters that run inside the namespace of another Kubernetes cluster, providing true isolation without the operational overhead of managing separate physical clusters.
  • Resource optimization — By running virtual clusters on shared infrastructure, vCluster dramatically reduces resource consumption compared to dedicated clusters, cutting cloud costs while maintaining isolation.
  • Environment standardization — Development, staging and production environments can maintain consistent configurations while running as virtual clusters, eliminating the “works on my machine” problem.
  • Team independence — Development teams gain full admin rights to their own virtual clusters without affecting other teams or requiring privileged access to the underlying host cluster.

vCluster solves the critical challenge of providing strong isolation in multitenant environments without the prohibitive costs and operational complexity of managing multiple physical clusters, making it ideal for organizations looking to optimize their Kubernetes infrastructure while maintaining security and autonomy for development teams.

Devtron

“When AI makes your developers two times faster, your Kubernetes shouldn’t hold them back.”

Devtron is an open source application life-cycle management platform that enables teams to move faster without engaging with the complexities of Kubernetes. Devtron orchestrates Kubernetes and related operations into a single intuitive UI from where developers and DevOps teams can accelerate their Kubernetes operations.

Devtron makes managing Kubernetes easier and has revolutionized the way they are managed with features like:

  • Automated and reliable CI/CD — Devtron comes with integrated GitOps-enabled CI/CD workflows that ensure precise feature deployment in target environments. Key components like deployment windows, predeployment approvals and application promotion enhance reliability while reducing tool sprawl. This provides developers with a streamlined path to deployment.
  • Simplified and fine-grained RBAC — Devtron addresses Kubernetes role-based access control (RBAC) configuration challenges through its intuitive UI. The platform offers simplified, yet powerful RBAC controls, enabling fine-grained user access down to specific pods within specific clusters.
  • Multicluster management — Easily onboard multiple Kubernetes clusters to a single Devtron instance and manage them through a unified dashboard. This eliminates the complexity of managing clusters via kubectl commands and provides clear visibility into cluster operations.
  • Policies and governance — The platform includes robust governance features such as approval policies, configuration locks, built-in infrastructure controls and comprehensive compliance and audit logging capabilities.

With this, Devtron ensures that your infrastructure-based operations keep pace with the speed and efficiency of AI-accelerated development.

Top Tracks and Talks Worth Attending

AI and Machine Learning

“A Practical Guide to Benchmarking AI and GPU Workloads in Kubernetes” — Yuan Chen, NVIDIA, and Chen Wang, IBM Research

The talk covers benchmarks for a range of use cases, including model serving, model training and GPU stress testing, using tools like NVIDIA Triton Inference Server; fmperf, an open source tool for benchmarking LLM-serving performance; MLPerf, an open benchmark suite to compare the performance of machine learning systems; GPUStressTest; gpu-burn; and cuda benchmark. The talk will also introduce GPU monitoring and load-generation tools.

“Orchestrating AI Models in Kubernetes: Deploying Ollama as a Native Container Runtime” — Samuel Veloso, Cast AI, and Lucas Fernández, Red Hat

In this talk, you’ll discover how a custom container runtime integrated with Ollama streamlines AI model deployment in Kubernetes. It will explore how this approach simplifies operations, enhances efficiency and removes the complexity of traditional model-serving solutions. Through real-world examples and a live demonstration, you’ll gain insight into using this innovative runtime to natively run open source AI models in Kubernetes with ease.

Security

“Open Source Malware or a Vulnerability? The Philosophical Debate and How to Mitigate” — Brian Fox, Sonatype; Madelein van der Hout, Forrester Research Inc.; Santiago Torres-Arias, Purdue University

This talk will shed light on the growing threat of open source malware, distinguishing it from traditional vulnerabilities and exploring why it often goes undetected by conventional security tools. A panel of experts, including researchers, analysts and industry veterans, will break down real-world examples, discuss the challenges of securing open source software and provide actionable strategies for mitigating risks. Attendees will leave with a deeper understanding of the evolving security landscape and practical steps to protect their software supply chain.

Observability

“An Exemplary Path: Leveraging eBPFs and OpenTelemetry to Auto-instrument for Exemplars” — Charlie Le and Kruthika Prasanna Simha, Apple

This talk will explore how eBPF and OpenTelemetry can work together to automate exemplar generation without requiring manual instrumentation. You’ll learn how eBPF’s in-kernel aggregation capabilities enable real-time metric and trace collection, seamlessly integrating with OpenTelemetry to enhance observability.

“The Missing Metrics: Measuring Memory Interference in Cloud Native Systems” — Jonathan Perry, PerfPod

This session presents the latest research on detecting memory interference, including findings from Google, Alibaba and Meta’s production environments. We’ll explore how modern CPU performance counters can identify noisy neighbors, examine real-world patterns that trigger interference (like garbage collection and container image decompression) and demonstrate practical approaches to measure these effects in Kubernetes environments.

The post Tools and Talks Worth Checking Out at KubeCon Europe appeared first on The New Stack.

]]>
Meet Kagent, Open Source Framework for AI Agents in Kubernetes https://thenewstack.io/meet-kagent-open-source-framework-for-ai-agents-in-kubernetes/ Mon, 17 Mar 2025 13:00:22 +0000 https://thenewstack.io/?p=22780681

Solo.io, a cloud native application networking company, today announced kagent, a new open source framework designed to help users build

The post Meet Kagent, Open Source Framework for AI Agents in Kubernetes appeared first on The New Stack.

]]>

Solo.io, a cloud native application networking company, today announced kagent, a new open source framework designed to help users build and run AI agents to speed up Kubernetes workflows.

Kagent, aimed at DevOps and platform engineers, offers tools, resources and AI agents that can help automate tasks such as configuration, troubleshooting, observability and network security.

The framework integrates with other cloud native tools through an architecture built on the Model Context Protocol (MCP). MCP, introduced by Anthropic in November, is intended to standardize how AI models integrate with APIs.

Kagent, built on Microsoft’s open source framework AutoGen, holds an Apache 2.0 open source license.

The project began as an internal solution to a customer problem, according to Lin Sun, Solo.io’s senior director of open source.

“We have hundreds of customers who are either running our gateway or mesh solutions,” Sun told The New Stack. “So as we were working with these customers, we have our internal supporting team. They are the face to work with these customers, and they help customers figure out what’s the right solution in the cloud native ecosystem? They help customers solve general problems and also domain-specific problems.”

In the wake of Hurricane Helene’s destructive path last fall through the U.S. Southeast, an insurance company that was a client of Solo.io’s reached out for help, after the insurer’s customers began trying to file online claims on their damaged homes.

“That weekend our team was called in for help, because there was some issue with the production side, and we were jumping to troubleshooting and try to identify where there are, like, 10 network hops. And where is the problem in that network hub?”

Solo.io’s customer-facing engineers wound up tapping its resident experts — who have deeper knowledge of Istio, Envoy, etc. — to help untangle the insurance company’s problems.

“That’s why we started thinking: how can we make ourselves more productive, as we continue to scale out as a company?” Sun said. “We added more customers to our list. So how do we leverage our in-house expertise more efficiently?

“We were thinking, how can we actually clone some of these experts? So we don’t have to pull them [in] for these critical situations, so that they can be either having a peaceful weekend, or they could focus on writing code and make their focus on innovation.”

A Wish List for the Community

Solo.io, which builds its products on open source projects, intends to donate kagent to the Cloud Native Computing Foundation (CNCF), Sun said. If this happens, it would follow the company’s donation in November of Gloo Gateway, a popular open source API gateway, to the CNCF.

The project, now called kgateway, was named an official CNCF Sandbox project this month.

Kagent’s initial launch includes tools for Argo, Helm, Istio, and Kubernetes, along with a Grafana and Prometheus observability tool. It also includes a cloud native expert knowledge base that can extend with any MCP-compatible tool server.

The framework includes three layers:

  • Tools: AI agents can use pre-defined functions, including a curated knowledge base, availability and performance metrics for services, controls for app deployment and life cycle, utilities for platform administration and debugging, and app security guardrails.
  • Agents: Autonomous systems that can plan and implement such tasks as canary deployments for new versions of a user’s applications, establishing a Zero Trust security policy for every service in a Kubernetes cluster, and debugging service failures.
  • A declarative API and controller: This allows the user to build and run agents via their UI, CLI and declarative configuration.

“What we are hoping for is this kagent as an inspiration for the community,” Sun said. “We seeded the project with a few sample agents, a few tools and also a framework integrated with Kubernetes. And then we’re hoping the rest of the community can help us enhance what we build, and also help us add additional agents to the catalog to greatly benefit the rest of the ecosystem.

“What I’m envisioning is, for every critical CNCF projects or cloud native projects out there, we have an agent in the catalog, so that when a new user comes to the cloud native landscape, they can have a project-specific agent sitting next to them, and they could even call multiple agents.”

Sun has an extensive wish list. In addition to urging users to try the existing tools and agents and contribute improvements, she offers other ideas.

“We want to have some tracing capability and maybe have some integration with [OpenTelemetry]. We want to have more metrics for kagent. We would also like to have a feedback system.”

And there’s more: ”We also would love to add multi-agent support. Right now, as part of the initial launch, we focus on single agent, but the framework is designed to support multiple agents.”

Sun would also like to add support for multiple large language models. “Right now the support is focused on OpenAI, which we believe is one of the best large language model out there. We do think it will work for other large language models as well. But we’ve only focused on testing.”

Developers interested in contributing to the project can connect via the CNCF Slack’s #kagent channel. Sun also encouraged people to stop by Solo.io’s booth — number S150 — at KubeCon + CloudNativeCon Europe 2025, April 1-4.

The post Meet Kagent, Open Source Framework for AI Agents in Kubernetes appeared first on The New Stack.

]]>
AI in Network Observability: The Dawn of Network Intelligence https://thenewstack.io/ai-in-network-observability-the-dawn-of-network-intelligence/ Sun, 16 Mar 2025 15:00:28 +0000 https://thenewstack.io/?p=22780551

Let’s face it. The modern network is a beast — a sprawling, complex organism of clouds, data centers, SaaS apps,

The post AI in Network Observability: The Dawn of Network Intelligence appeared first on The New Stack.

]]>

Let’s face it. The modern network is a beast — a sprawling, complex organism of clouds, data centers, SaaS apps, home offices, and, depending on your industry vertical, factories, offices, retail locations, or branches. Mix in the internet as the backbone to connect them all, as well as an ever-increasing volume and velocity of data, and it becomes clear that traditional monitoring tools are now akin to peering through a keyhole to look at a vast landscape.

They simply can’t see the bigger picture, and a new approach is needed: Enter Artificial Intelligence (AI), the game-changer ushering in a new era of Network Intelligence.

From Reactive to Intelligent: The AI Revolution

Remember the days of watching hundreds of dashboards, sifting through endless logs, and deciphering cryptic alerts? Those days are fading fast. Machine Learning and Generative AI are transforming network observability from a reactive chore to a proactive science.

ML algorithms, trained on vast datasets of enriched, context-savvy network telemetry, can now detect anomalies in real-time, predict potential outages, foresee cost overruns, and even identify subtle performance degradations that would otherwise go unnoticed. Imagine an AI that can predict a spike in malicious traffic based on historical patterns and automatically trigger mitigations to block the attack and prevent disruption. That’s a straightforward example of the power of AI-driven observability, and it’s already possible today.
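
As a generic illustration of this kind of anomaly detection (not Kentik's implementation), the sketch below trains an Isolation Forest on historical per-minute traffic volumes and flags a suspicious spike. The data is synthetic and the thresholds are arbitrary choices for the example.

import numpy as np
from sklearn.ensemble import IsolationForest
rng = np.random.default_rng(42)
# Synthetic history: per-minute byte counts for one interface.
history = rng.normal(loc=500_000, scale=50_000, size=(10_000, 1))
model = IsolationForest(contamination=0.001, random_state=0).fit(history)
# New observations: two normal readings and one suspicious spike.
new_points = np.array([[510_000.0], [495_000.0], [2_400_000.0]])
for value, label in zip(new_points.ravel(), model.predict(new_points)):
    status = "ANOMALY" if label == -1 else "ok"  # predict() returns -1 for outliers
    print(f"{value:>12,.0f} bytes/min -> {status}")

Production systems enrich the telemetry with far more context (interface, application, geography), but the underlying idea is the same: learn the baseline, then surface what departs from it.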

But AI’s role isn’t limited to number crunching. GenAI is revolutionizing how we interact with network data. Natural language interfaces allow engineers to ask questions like: “What’s causing latency on the East Coast?” and receive concise, insightful answers.

Kentik Journeys takes this further, offering an approachable, AI-augmented user experience with deep network context. By leveraging ML and GenAI and using our vast and uniquely enriched data, we provide insights into network behavior: surfacing anomalies, providing probable causes, and offering actionable recommendations for cost and performance optimizations.

Agentic AI Takes Center Stage

But the real revolution is yet to come. Imagine a network in which AI isn’t just a tool but an active participant, a digital colleague working alongside human engineers. This is the promise of agentic AI.

These aren’t your typical AI algorithms. Agentic AI systems possess a degree of autonomy, allowing them to make decisions and take actions within a defined framework. Think of them as digital network engineers, initially assisting with basic tasks but constantly learning and evolving, making them capable of handling routine assignments, troubleshooting fundamental issues, or optimizing network configurations.

For example, an agentic AI, noticing asymmetric routing in a cloud environment (which can add unnecessary cost), could initiate (or recommend to the human for final approval) a configuration change, creating an appropriate route in the cloud account to leverage existing VPC peering to reduce costs and improve performance.

At Kentik, our robust analytics and visualization capabilities provide the perfect foundation for developing an agentic AI for networks. Our deep understanding of network state and behavior can be used to train these AI agents, enabling them to make increasingly complex decisions and to take appropriate actions, leveraging agentic AI directly and with partners.

Efficiency, Scalability, and Intelligence

The advantages of AI-driven network intelligence include the following:

  • Proactive Insights: Detect anomalies before they impact users, preventing costly downtime, recommending fast remediation, and ensuring a seamless user experience.
  • Enhanced Efficiency: Reduce manual effort, freeing engineers to focus on strategic initiatives.
  • Improved Scalability: You can effortlessly handle the ever-growing volume of network data, simplifying the management of complex hybrid and multi-cloud deployments.
  • Operational Intelligence: Gain a holistic view of network health, enabling data-driven decisions for capacity planning and cost and performance optimization.

Use Cases: From Troubleshooting to Autonomous Optimization

The applications of AI in network observability are vast and varied:

  • Root Cause Analysis: Pinpoint the source of network problems, correlating events and metrics to identify the root cause.
  • Predictive Analytics: Anticipate potential issues from historical trends and proactively take steps to mitigate them.
  • Cost and Performance Optimization: Identify bottlenecks and optimize traffic flows to ensure optimal application performance and minimize cost.
  • Security Enhancement: Detect and respond to security threats in real-time, protecting critical infrastructure from attacks.

With the advent of agentic AI, these use cases will expand even further. Imagine AI agents collaborating with human engineers, eliminating the drudgery and allowing humans to focus their attention and creativity on what matters.

Navigating the AI Landscape

While the potential of AI is immense, there are challenges to address:

  • Data Quality: AI agents and algorithms are only as good as the data they are trained on. Ensuring data accuracy and completeness is crucial. Validation is a necessary part of agentic AI systems.
  • Explainability: Understanding how AI models arrive at their conclusions is essential for building trust and ensuring responsible use.
  • Ethical Considerations: As AI agents become more autonomous over time, it’s critical to establish clear guidelines and ensure they operate within defined boundaries.

Addressing these challenges upfront as part of the design and development phases of AI initiatives is paramount (and that is precisely how we have gone about it at Kentik).

The Future: Network Intelligence

The future is network intelligence underpinned by AI. This will eventually enable networks that can self-heal, self-optimize, and adapt to changing conditions with little human intervention. In the near term, Agentic AI, personified by digital network engineers, will become an integral part of network operations, collaborating with human engineers to create a more efficient, reliable, and secure network infrastructure. This, in turn, will pave the way for a new and exciting era of digital innovation.

This is not science fiction. It’s the future we are building today.

The post AI in Network Observability: The Dawn of Network Intelligence appeared first on The New Stack.

]]>
Garuda Linux Might Be the Best Looking Desktop OS on the Market https://thenewstack.io/garuda-linux-might-be-the-best-looking-desktop-os-on-the-market/ Sun, 16 Mar 2025 14:00:02 +0000 https://thenewstack.io/?p=22779902

I’m not going to lie, I’m a sucker for a pretty desktop, and within the realm of Linux, there are

The post Garuda Linux Might Be the Best Looking Desktop OS on the Market appeared first on The New Stack.

]]>

I’m not going to lie, I’m a sucker for a pretty desktop, and within the realm of Linux, there are puh-lenty of them. On top of that, even a bland Linux desktop can be customized until it could very likely become prom queen.


And then there are other desktop distributions that go above and beyond to set themselves apart to become something totally “other.” That’s exactly what Garuda Linux has done for a long time, but the most recent release (Broadwing) is truly something special.

Garuda is based on Arch Linux and offers several versions, such as dr460nized, Hyprland, i3, and Sway, but it’s the default version I decided to take on this time.

I was not disappointed.

I’ve experienced the dr460nized version before and it’s a work of art, so I expected the Broadwing base release not to fall short. Garuda Broadwing uses the KDE Plasma desktop, complete with the Catppuccin theme which makes this release as beautiful as any on the market.

The developers have customized the KDE Plasma desktop with a top bar and a dock, so it slightly resembles the macOS desktop (only with a lot more panache). It’s one of the few desktops that ship with a dark(ish) theme that I don’t mind.

It really is beautiful.

Not Just a Pretty Face

But Garuda isn’t just a pretty face. This is a full-fledged desktop environment that is ready for any type of user. From average users to developers, Garuda has something for everyone. Even during the initial setup, you can add printer, scanner, and Samba support; add additional wallpapers; include pentesting software; and configure a large number of aspects, making the Garuda Setup Assistant one of the best welcome apps on the market.

The latest release includes the following features:

  • Kernel 6.13 for enhanced responsiveness and reduced latency.
  • Introduces Garuda Rani (Reliable Assistant for Native Installations), which is the new welcome app.
  • A new Garuda Mokka edition (the version I tested).
  • Dr460nized (KDE Plasma) received improved Panel Colorizer integration.
  • Hyprland has an updated screenshot script and special workspaces.
  • i3 includes FontAwesome support and an improved CPU temperature display.
  • Sway gets a new greeter, lock screen, and revamped Waybar.

You can read all about the latest changes in this official Garuda announcement.

But what’s it like using Garuda Broadwing (Mokka edition)?

The first thing that impressed me was the onboarding app (Figure 2), where I could select from a list of kernels (even the Linux Hardened kernel), select from several office applications (office suites and finance apps), choose my browser of choice, choose an email client, add various communication apps, media players, and graphic tools, and even choose any development tool I might need.

Figure 2: Screenshot of the Garuda setup app.

This onboarding app is about as complete as you can imagine and it makes onboarding Garuda painless.

The end result was an exciting desktop that was as functional as it was beautiful. And when you’ve taken care of the onboarding, you’re then greeted with yet another Welcome app which is even more impressive than the first (Figure 3).

Figure 3: Welcome to Garuda, part 2.

The goodness of Garuda keeps going and going and going.

The new Welcome app gives you quick access to various features, such as maintenance, system settings, gaming apps, and so much more. Scan through that app, do whatever you need, and then close it to see all the beauty that is Garuda Mokka. Click the Garuda icon at the top left of the display to reveal the main menu (Figure 4), which further highlights the aesthetic of this desktop distribution.

Figure 4: This desktop is just stunning.

Keep in mind that Garuda is an Arch-based Linux, so it’s not going to be as user-friendly as a distribution based on Debian or Ubuntu. And even with this being a KDE Plasma desktop, you’ll need to jump through a hoop or two to get the Discover package manager front end to work. Said hoop is installing Flatpak, which can be done with the sudo command:

sudo pacman -S flatpak


Once you’ve installed Flatpak, you can open Discover and find all the Flatpak apps you need. If, however, you want to install from the standard repositories, you’ll either have to use the command line or a GUI like Octopi.

As far as the basic apps, you’ll find the Firedragon browser, Snapper tools (a powerful snapshot management tool for Btrfs and LVM volumes), an AppImage launcher, and so much more.

Who Is Garuda Broadwing (Mokka Edition) for?

I wouldn’t say that the Garuda Broadwing Mokka edition is for everyone. No matter how beautiful it is, it’s still Arch Linux, which requires a bit more Linux experience to enjoy without a certain level of frustration. I’ve also found this release a bit more demanding on system resources. The listed requirements are:

  • Storage: A minimum of 30 GB storage space is required.
  • RAM: At least 4 GB of RAM is recommended.
  • Video Card: A video card with OpenGL 3.3 or higher is required.
  • System Architecture: A 64-bit system is necessary.
  • Installation Media: A thumb drive of at least 4 GB is needed for regular versions, while the gaming edition requires 8 GB.

I ran my VM on 3 GB of RAM and found it to be less than responsive. After bumping that up to 5 GB with 2 CPU cores, it ran much better but still wasn’t as responsive as I would have liked. I’m chalking that up to running Garuda as a virtual machine (with VirtualBox).

My Final Take

If you’re not as concerned about aesthetics as I am, Garuda Broadwing might not be the distribution for you. If, however, you like to wow your friends and family with a gorgeous desktop, I would highly recommend you give this distribution a try. Of every distribution I’ve tested, this might be the most beautiful to date.

The post Garuda Linux Might Be the Best Looking Desktop OS on the Market appeared first on The New Stack.

]]>
The Growing Significance of Observability in Cloud Native Environments  https://thenewstack.io/the-growing-significance-of-observability-in-cloud-native-environments/ Sat, 15 Mar 2025 15:00:43 +0000 https://thenewstack.io/?p=22780560

Imagine this scenario: It’s midnight, and a global online retail platform suddenly faces a spike in transaction failures. The operations

The post The Growing Significance of Observability in Cloud Native Environments  appeared first on The New Stack.

]]>

Imagine this scenario: It’s midnight, and a global online retail platform suddenly faces a spike in transaction failures. The operations team rushes to identify the problem, yet their conventional monitoring tools yield only high-level metrics without pinpointing the underlying cause. After several hours of troubleshooting, they discover a latency problem in a third-party payment API. This type of scenario is becoming increasingly frequent as contemporary cloud architectures grow more intricate. This is the point where cloud native observability comes into play.

In 2025, observability will move beyond fundamental logs, metrics and traces, incorporating AI, open source frameworks and security-oriented strategies to generate deeper insights into system behavior. Let’s examine the crucial trends shaping the future of observability.

AI-Enabled Observability

Foreseeing Problems Prior to Their Occurrence

The period of reactive observability has become a relic of the past. By incorporating AI and machine learning into observability platforms, teams can move toward predictive monitoring. AI-enabled observability solutions assess historical data, pinpoint patterns and predict potential issues before they impact users. For example, AI-driven anomaly detection can spot subtle changes in microservice response times and alert engineers ahead of service outages. Companies like New Relic and Dynatrace are at the forefront of improving AI-driven insights, and we expect significant progress in 2025 in automated root cause analysis, autonomous systems and dynamic observability dashboards.
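
As a simple, vendor-neutral illustration of the idea, the sketch below flags a microservice whose response time drifts well outside its recent baseline using a rolling z-score. The window size and alert threshold are arbitrary choices for the example.

from collections import deque
from statistics import mean, stdev
class LatencyMonitor:
    """Flag samples that sit far outside the recent baseline (rolling z-score)."""
    def __init__(self, window=60, threshold=3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold
    def observe(self, latency_ms):
        anomalous = False
        if len(self.samples) >= 30:  # wait for a usable baseline
            mu, sigma = mean(self.samples), stdev(self.samples)
            anomalous = sigma > 0 and abs(latency_ms - mu) / sigma > self.threshold
        self.samples.append(latency_ms)
        return anomalous
monitor = LatencyMonitor()
for value in [120, 118, 125, 122, 119] * 10 + [480]:  # steady latencies, then a spike
    if monitor.observe(value):
        print(f"Alert: {value} ms is far outside the recent baseline")

Commercial platforms replace the simple statistics here with learned seasonal baselines and correlation across services, but the goal is the same: surface the drift before users feel it.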

Primary Benefits of AI in Observability

  • Quicker Incident Resolution: AI decreases mean time to detection (MTTD) and mean time to recovery (MTTR) by refining the root cause analysis process.
  • Proactive Performance Enhancement: Predictive analytics enable engineering teams to adjust applications before potential performance issues.
  • Alert Noise Mitigation: AI differentiates significant alerts from non-critical ones, focusing attention on essential matters while minimizing alert fatigue.

OpenTelemetry and Open Source Observability Standards

Vendor lock-in has long presented considerable challenges in the observability sector, and OpenTelemetry (OTel) and open source observability standards are now positioned to reshape the industry. As the leading standard for gathering distributed traces, metrics and logs, OpenTelemetry is seeing robust adoption among cloud service providers and enterprises.

By 2025, OpenTelemetry’s ecosystem is anticipated to further broaden, featuring improved integrations, enhanced trace visualization capabilities, and better support for event-driven architectures. An increasing number of organizations are likely to transition from proprietary agents, opting instead for OTel’s versatility in instrumenting applications throughout hybrid and multicloud setups.

Significance of OpenTelemetry

  • Standardization: An all-encompassing framework for collecting telemetry data in multiple environments.
  • Interoperability: Smooth integration with cloud native observability tools, including Prometheus, Grafana and Jaeger.
  • Cost Efficiency: Lowers operational expenses by removing the necessity for various proprietary agents.

Setting Up OpenTelemetry for Distributed Tracing Within Kubernetes

To assist you in getting started with OpenTelemetry, here’s a detailed guide on how to implement distributed tracing in a Kubernetes setting.

Step 1: Deploy the OpenTelemetry Collector

Create a Kubernetes namespace specifically for observability.

kubectl create namespace observability


Deploy the OpenTelemetry Collector via Helm.

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts 
helm repo update
helm install otel-collector open-telemetry/opentelemetry-collector -n observability


Step 2: Instrument Your Application

Incorporate OpenTelemetry SDKs into your application (example in Python).

pip install opentelemetry-sdk opentelemetry-exporter-otlp


Configure the application to relay traces to the OpenTelemetry Collector.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Export spans to the collector service over OTLP/gRPC and register the provider globally.
tracer_provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
tracer_provider.add_span_processor(processor)
trace.set_tracer_provider(tracer_provider)


Step 3: Visualize Traces in Jaeger

Deploy Jaeger for trace visualization:

kubectl apply -n observability -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/all-in-one/jaeger-all-in-one-template.yml


Access the Jaeger UI:

kubectl port-forward svc/jaeger-query 16686:16686 -n observability
# Then open http://localhost:16686 in your browser to view traces.


By following these steps, you can gain real-time visibility into microservices interactions and detect performance bottlenecks more effectively.

DevSecOps: The Convergence of Security and Observability

Security is no longer a separate function — it’s becoming an integral part of observability. As organizations implement DevSecOps workflows, security monitoring shifts left, allowing for earlier detection of security vulnerabilities within the software development life cycle. For instance, observability tools now feature real-time threat detection by scrutinizing application logs for irregular patterns that may indicate a security compromise. By 2025, security observability will encompass:

  • SBOM (Software Bill of Materials) Monitoring to uncover vulnerabilities in software dependencies (see the sketch after this list).
  • Runtime Security Observability for the identification and mitigation of threats as they occur.
  • Compliance Automation to guarantee that cloud environments comply with regulatory standards such as GDPR and HIPAA.
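
As a small illustration of SBOM monitoring, the snippet below reads a CycloneDX-style SBOM and lists the components a scanner would then check against a vulnerability feed. The file path is a placeholder, and the fields shown follow the standard CycloneDX JSON layout.

import json
# "sbom.json" is a placeholder for a CycloneDX SBOM generated for an image or service.
with open("sbom.json") as f:
    sbom = json.load(f)
for component in sbom.get("components", []):
    name = component.get("name", "unknown")
    version = component.get("version", "unknown")
    purl = component.get("purl", "")
    # Next step (not shown): look up each package URL in a vulnerability database such as OSV.
    print(f"{name}=={version}  {purl}")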

The Influence of FinOps on Observability Expenditures

Observability incurs significant costs, and as organizations enhance their telemetry data collection, cloud spending can rapidly escalate. This is where FinOps (Cloud Financial Management) becomes indispensable.

In 2025, many companies will embrace cost-conscious observability, balancing visibility against financial limitations. FinOps-informed observability tactics will encompass:

  • Smart Data Retention: Preserving high-value telemetry information while eliminating superfluous logs.
  • Dynamic Sampling Rates: Adapting trace sampling in response to system workload fluctuations (a short sketch follows this list).
  • Cloud-Based Cost Analytics: Delivering insights regarding observability expenditures for effective cost management.
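
To illustrate the dynamic sampling idea, here is a toy sketch that lowers the trace sampling probability as request volume grows, so the telemetry budget stays roughly flat. The budget number is an assumption, and a real deployment would plug this logic into the sampler hooks of its tracing SDK rather than a standalone function.

import random
TARGET_TRACES_PER_MINUTE = 600  # illustrative telemetry budget, not a recommendation
def sampling_probability(requests_last_minute):
    """Keep roughly a fixed number of sampled traces per minute, whatever the load."""
    if requests_last_minute <= 0:
        return 1.0
    return min(1.0, TARGET_TRACES_PER_MINUTE / requests_last_minute)
def should_sample(requests_last_minute):
    return random.random() < sampling_probability(requests_last_minute)
for load in (200, 5_000, 100_000):
    print(f"{load:>7} req/min -> sample {sampling_probability(load):.1%} of traces")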

Final Reflections

As the adoption of cloud native technologies accelerates, observability has become an essential factor in ensuring performance, reliability and security. With the emergence of AI-driven analytics, open source telemetry, security integrations and cost-conscious approaches, organizations are well-positioned to strengthen their observability practices.

In the years ahead, engineering teams that embrace these innovations will be more adept at managing the complexities associated with modern cloud environments. Whether you are a DevOps engineer, Site Reliability Engineer (SRE), or security analyst, now is an opportune moment to reassess your observability strategies and prepare for the forthcoming innovations.

The post The Growing Significance of Observability in Cloud Native Environments  appeared first on The New Stack.

]]>