SecDevOps.com
How Can We Solve Observability’s Data Capture and Spending Problem?

The New Stack · Updated 3 weeks ago

The so-called “DevOps practitioner” can be a developer, an operations team member, a site reliability engineer or a business stakeholder. What they all have in common is that they must be able to interpret telemetry data in order to make business decisions. What they face is sprawl, siloed teams and cost concerns. Add to the list making sure heavy investments in observability are at least paying for themselves.

“If you’re not collecting the data about what the application is doing at per-second granularity, and at 100% capture, you end up missing a life cycle, or multiples, of what was actually occurring inside the application ecosystem,” said Jacob Yackenovich, director of product management at IBM, in this On the Road episode of The New Stack Makers. “You’ll miss those peaks and valleys that give you those key signals about something that is erroneous or anomalous in nature, because you’re polling or you’re doing a derivative or sampling. What ends up happening is you end up with a combined set of technology in an application that is heritage technology, along with cloud native technology, in the same application.”

In this episode of TNS Makers, recorded at KubeCon + CloudNativeCon North America in Atlanta, I sat down with Yackenovich to discuss how observability must adapt and improve to meet often radically changing needs — and rising cost concerns. Improvements are required so observability systems not only offer effective analysis but are also accessible to any stakeholder, whether they are a technical user or not.

“I think of it as a service that I provide to my end users, and I want to know: Is my service healthy and performant from the end user’s perspective? And if there is a problem, are my operations teams working from the intellectual property between their ears, a judgment call of their fiefdom, or what they think is more important?” Yackenovich said.
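Yackenovich’s point about polling versus 100% capture can be illustrated with a small sketch (the latency numbers are invented for illustration): a metric polled every 10 seconds can miss a one-second spike that per-second, full-capture collection would record.

```python
# Hypothetical per-second latency series (ms) with a one-second spike at t=25.
full_capture = [20] * 60
full_capture[25] = 900  # transient error spike

# Per-second, 100% capture sees the anomaly.
print(max(full_capture))  # 900

# Polling the same series every 10 seconds samples t = 0, 10, 20, 30, 40, 50
# and misses the spike entirely.
polled = full_capture[::10]
print(max(polled))  # 20
```

The same blind spot appears with averaging or derivative-based sampling: the peaks and valleys that signal anomalous behavior are smoothed away before anyone sees them.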
“No, they’re working on particular issues — be it P3s or P2s — in the context of the relative impact that circumstance has on the overall business.”

Integrating AI: Add-On Style and Blocking Style

It is almost impossible to exclude AI from conversations about DevOps today. To wit: how the most recent release of Kubernetes can handle AI workloads was a major topic of discussion at KubeCon. So where does that leave observability?

The introduction of AI and large language models (LLMs) into application ecosystems, Yackenovich said, presents novel challenges for observability. He described two primary approaches technologists are using to integrate these capabilities: add-on style and blocking style.

The add-on style involves adding a new service or experience, such as a chatbot, to an existing application workload. If this generative AI component is unresponsive or commits errors, the end user can still “conduct the business of the application,” Yackenovich said.

The blocking style involves integrating generative AI for critical tasks, such as “to check for false information that’s coming in the application or fraud details,” he said. In this case, the AI microservice is part of the “overall workflow of the job to be done in the given application.”

Tradeoffs Between Costs and System Health

Spending remains a central concern for organizations, particularly as cloud and data-ingestion fees from major vendors continue to rise. The enormous cost associated with collecting all telemetry data — specifically the price of ingress — has become a significant challenge. Many observability tools attempt to filter or limit data ingestion to manage these expenses, shifting toward business-impact-based prioritization, even though doing so can make it harder to diagnose system problems.

Cost pressures should not force a business to turn a “blind eye into your operations ecosystem,” Yackenovich said.
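The add-on and blocking styles described above differ chiefly in how a generative AI failure propagates to the end user. A rough sketch (the handler names and order shape are hypothetical, not from IBM or the episode):

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a generative AI call; raises to simulate an outage."""
    raise TimeoutError("LLM backend unresponsive")

def handle_request_addon(order: dict) -> dict:
    """Add-on style: the chatbot suggestion is optional, so an LLM
    failure degrades the experience but the order still goes through."""
    try:
        order["suggestion"] = call_llm(f"upsell for {order['item']}")
    except Exception:
        order["suggestion"] = None  # degrade gracefully
    order["status"] = "placed"
    return order

def handle_request_blocking(order: dict) -> dict:
    """Blocking style: the fraud check sits in the critical path, so an
    LLM failure blocks the job the application exists to do."""
    verdict = call_llm(f"fraud check for {order}")  # no fallback
    order["status"] = "placed" if verdict == "ok" else "rejected"
    return order

print(handle_request_addon({"item": "book"})["status"])  # prints placed
try:
    handle_request_blocking({"item": "book"})
except TimeoutError:
    print("order blocked")  # the failure reaches the end user
```

For observability, the distinction matters: an unhealthy add-on component is a degradation signal, while the same failure in a blocking component is a business-stopping incident.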
He argued that organizations should not be put in a position where they must decide which applications they can afford to observe. With observability costs reaching “10, 12% of your annual recurring revenue in terms of the bill, or more,” he said, the situation requires a “re-analysis.”

IBM’s proposed solution is to offer a predictable, fixed-price model. “You want to know what the price is going to be before you get into it,” he said. “As opposed to: I really want to take advantage of this new use case or feature, but I’ve got to look at the menu and realize what that budget is, and then I have to open up a spreadsheet to try to do the math formula of what the bill is going to potentially look like.”

Check out the full episode for more of our conversation.
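The “spreadsheet math” he alludes to might look something like this sketch, comparing a usage-based ingestion bill against a flat subscription. All rates and volumes here are invented for illustration; real vendor pricing varies widely.

```python
# Hypothetical usage-based observability bill: ingestion volume times a
# per-GB rate, versus a flat fixed-price subscription.
GB_PER_HOST_PER_DAY = 1.5        # invented telemetry volume
PRICE_PER_GB = 0.30              # invented ingestion rate, USD
FIXED_PRICE_PER_MONTH = 9_000.0  # invented flat fee, USD

def usage_based_monthly(hosts: int, days: int = 30) -> float:
    """Estimated monthly ingestion cost under usage-based pricing."""
    return hosts * GB_PER_HOST_PER_DAY * days * PRICE_PER_GB

for hosts in (200, 500, 1000):
    usage = usage_based_monthly(hosts)
    cheaper = "fixed" if FIXED_PRICE_PER_MONTH < usage else "usage-based"
    print(f"{hosts} hosts: usage-based ~${usage:,.0f}/mo -> {cheaper} wins")
```

The point of a fixed-price model is not that it is always cheaper, but that the answer is known before the data is ingested, so nobody is tempted to stop observing an application to trim the bill.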

Source: This article was originally published on The New Stack
