The season is changing for frontend observability, as we’re seeing great community involvement in improving OpenTelemetry support for web apps and mobile apps. For example, there’s a new Browser Special Interest Group (SIG) in the OpenTelemetry project, and it is working to improve OTel support for the browser runtime. You can learn more about what the group will be working on in this on-demand panel discussion.

The OTel community also has dedicated Android and Swift SIGs for improving the APIs, instrumentation libraries and semantic conventions for collecting telemetry on the two native mobile app platforms. Organizations are taking note: a recent survey conducted by Enterprise Management Associates (EMA) revealed that adoption of OpenTelemetry for mobile data collection is set to triple in the next 12 to 24 months.

I sat down with several key members of the Android and Swift SIGs for a fun, fall-themed panel discussion on the key challenges in mobile telemetry collection and the state of OpenTelemetry support for mobile. Panelists included:

When mobile developers use OpenTelemetry, they must be mindful of the sheer scale of data that mobile apps can generate. Buchanan mentioned that while backend systems run on thousands of clients under tightly controlled conditions, mobile apps can run on millions of clients.

Demarco chimed in, “That also leads to the problem of data volumes, because depending on the app, a mobile application can generate an enormous amount of telemetry. So, unlike backends that you can control sampling centrally, in mobile, the sampling decisions probably should be made on-device with kind of limited visibility into the bigger picture. And then you have the question, if you oversample, you’ll waste a lot of bandwidth or battery.
[…] But if you undersample, you probably miss critical telemetry that is necessary to identify issues or understand behaviors.”

Mobile developers are also hyper-focused on the performance of their apps, which can be affected by the operational cost of capturing telemetry. Plumb mentioned several things developers must keep in mind, including which API calls the app must make to the platform, how long the app spends in those callbacks or event handlers, and the payload size of network requests on the wire.

“Efficiently handling those payloads is also something people, I think, are specifically challenged with on mobile that doesn’t exist in other platforms, and we don’t have the luxury of just …[scaling] horizontally, like, fire up a few more instances,” said Plumb.

The platforms that mobile apps run on are also tightly controlled by Google and Apple. As Bonafonte said, “The privacy that the platform puts you in is something that’s difficult.” Mobile developers need support from the operating system to collect data, so if the system doesn’t allow them to collect certain types of telemetry, they’re limited in how effectively they can observe their applications.

Unlike servers, mobile apps have complex life cycles, which can make it incredibly difficult to understand the conditions that lead to issues.

As Demarco pointed out, “Mobile apps don’t run continuously, so they are suspended, backgrounded, terminated, killed by OS, there’s a crash, … the OS can pre-warm your application, the application could launch because of a push notification, a background fetch or because a human tapped into the icon. So, when do you flush your telemetry? … How do you track session continuity across app restarts? What happens to, I don’t know, in-flight spans whenever there’s a crash, or the OS kills your process? So there’s a bunch of complexity in terms of what do you decide to do in those cases?
And it’s not trivial … just solving one of those questions is not a one-liner thing you’ll solve in your code. It’s something you really have to think through to actually solve that.”

Traditionally, observability has been seen as the purview of backend teams, and as a result, mobile developers frequently don’t understand it. Ho mentioned that mobile developers generally interact with OpenTelemetry because they’re told to, not because it’s a tool they reach for themselves.

“Tracing and … telemetry is not a core competency of mobile developers … because, you know, the challenges that they face are different. … There’s so much to actually teach a team, and the architecture, the mobile app architectures also aren’t super well designed for maintainable instrumentation,” said Ho.

Product managers might want better visibility to explain the performance (or lack thereof) of a new feature, so they ask mobile developers to collect more observability data. But neither the mobile developer nor the product manager knows what to collect. This lack of clarity around instrumenting mobile apps was a common thread in our discussion.

Buchanan mentioned that even something as simple as deciding when to start a span is not trivial on a mobile device. “On a backend, it’s very trivial. It’s like, ‘Oh, when I get a request, that’s when a span starts.’ But for a mobile developer, … should I do it when somebody clicks a button? When a network starts? … There’s no right answer to that, like, how should you instrument that? It really depends on what your app does and what you’re trying to monitor.”

Plumb agreed that OpenTelemetry doesn’t yet offer strong guidance for developers on some of these client-side use cases. “We don’t yet have a really good data model or just a conceptual description of what sessions are.”

He contrasted this challenge with backend observability tooling, where several use cases are well defined at this point.
For example, every vendor that has a tracing solution is going to have a trace waterfall view, and every real user monitoring (RUM) vendor is going to have a way to analyze funnels.

As Ho pointed out, “When you’re a backend service, the goal is to take the request and shoot out the response. You want to log how long that took and if there’s anything interesting that’s happening in the middle. The goal is simple. The goal of a mobile app is to be defined.”

What the Uber Eats team cares about differs from what the Pinterest team cares about, which in turn differs from what matters to a banking app.

“To understand the goals and translating that into what kind of telemetry is a non-trivial leap. It seems trivial, if you haven’t done it, but when you do it, you’re, like, ‘I care about everything.’ Do you really care about everything?” said Ho.

The Android and Swift SIGs are improving the developer experience of using OpenTelemetry. Beyond manually capturing the core OpenTelemetry signals of logs and traces, both SDKs can also capture mobile-specific telemetry:

The Swift SIG also addressed a key challenge that stems from working within Apple’s tightly controlled mobile platform. Apple’s official package manager, Swift Package Manager, requires downloading all dependencies of all libraries in your project, even if you don’t use them in your application.
As a consequence, the OpenTelemetry Swift repository was very large, which meant mobile developers faced large package download sizes to use OTel in their iOS apps.

As Bonafonte shared, “[OpenTelemetry Swift] had to support a protobuf OTLP [OpenTelemetry Protocol] protocol with protobuf, and that means that you have a dependency on Apple on a library from Apple that has a dependency of another library from Apple, and it has a dependency of another library and another and another and another.”

Ari chimed in, “Whenever you have to download it, or compile your application, run tests, run this in CI, build the application and deploy that, all that takes a bunch of time, and obviously, for example, in terms of CI, minutes is money, so … for every single iOS developer, it was going to be a pain. And probably, maybe they just wanted to use the API or just our implementation of the OpenTelemetry SDK.”

As a solution, the Swift SIG split the code into two separate repositories. The official OpenTelemetry Swift repository remains the main repository, and it contains everything needed to work with OTLP. The maintainers created another repository, OpenTelemetry Swift Core, which contains only the OpenTelemetry Swift API and the OpenTelemetry Swift SDK. Those two pieces are the bare minimum needed to get started, create traces and emit logs. iOS developers can now instrument applications, process data and export it without all the overhead of the main repository.

The Android SIG is working on three main improvements. The first is stabilizing the initialization API for the Android agent, which is expected to be completed soon. The second is broadening the instrumentation, including enhanced support for build-time auto-instrumentation.

As Plumb said, “The third category, which is, I think, maybe just as important, are semantic conventions.
… With every bit of instrumentation, with every kind of new feature that we’re adding, we’re trying to mirror that in the semantic conventions, even if the first pass is in development or experimental, at least having that out there and documented, what it means, what the intent is when you see a piece of data marked with this name, what these attributes hang off of it mean.”

The challenge is being inclusive of all the different opinions when it comes to observing mobile apps. One example Ho gave of the complexity of defining a mobile session was the problem of foregro