Product teams love using aviation terms like “instrumentation” or “telemetry,” and not just because it makes us feel clever. Imagine you’re flying a plane at night: how do you know which direction you’re traveling? Do you have enough fuel? Are you going to crash into a mountain?

Now imagine you’re responsible for growing YouTube’s engagement. You have billions of users and a complicated ecosystem. How do you know whether you’re increasing user engagement? Why are people using your product more or less? And what the heck is “engagement,” anyway?

Using Andy Grove’s framing from High Output Management: without instrumentation, telemetry, or indicators—all referred to as metrics below—our product is a “black box.” In this black box, input (code, design, hardware, time) goes in, and output (revenue, people using the product) comes out. Metrics help us quantify output and cut windows into the black box to understand why we’re seeing a particular output. They’re a tool that helps us:

The Types of Metrics Used by Product Teams

Metrics vary across industries, companies, & teams within companies. Instead of attempting to be exhaustive, I’ll focus on the core concepts applicable to most sets of product metrics.

Measure Success, Empathy, and Health

Define True-north, Signpost, and Guardrail Metrics

True-north is the singular metric used to track your product’s success (or output in our black box analogy). There can only be one, but we can supplement it with signpost and guardrail metrics. True-north metrics are usually “late-funnel,” i.e., the culmination of a funnel involving many different steps. As a result, they’re often hard to move. A few examples from products you’ve probably used:

Signpost metrics don’t define our product’s success. But, using our black box analogy, they help us understand why our true-north is moving in a particular direction (or any other “important to keep an eye on” parts of our product). For example, if Spotify’s true-north metric is “aggregate listening time,” three signpost metrics might be the (1) number of unique users, (2) unique songs played per user, and (3) average song length.
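To make the relationship concrete, a true-north like aggregate listening time can be sketched as a product of its signposts. The numbers below are illustrative assumptions, not real Spotify data:

```python
# Hypothetical decomposition of a true-north metric into signpost metrics.
# All figures are made up for illustration.

def aggregate_listening_time(unique_users, songs_per_user, avg_song_minutes):
    """True-north: total minutes listened across all users."""
    return unique_users * songs_per_user * avg_song_minutes

baseline = aggregate_listening_time(1_000_000, 20, 3.5)     # 70M minutes
# If the true-north drops, the signposts tell you *why*:
fewer_songs = aggregate_listening_time(1_000_000, 18, 3.5)  # 63M minutes
# Same users, same song length -> the drop came from songs per user.
```

The value of the decomposition is diagnostic: when the true-north moves, you can check which signpost moved with it.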

Guardrail metrics are used when running A/B experiments. They’re what we watch to ensure a change we’re making doesn’t negatively impact another part of our product. For example, you might run an experiment to increase paid user acquisition, offering a heavy first-month discount. A useful guardrail metric could be month two retention to help you understand whether you’re providing a good enough experience to retain these users over time (before further investing in user acquisition).
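As a sketch of how a guardrail like this might be checked during the experiment (the function name and 5% threshold are my own illustration, not a standard):

```python
# Illustrative guardrail check for the acquisition-discount experiment.
# Threshold and metric names are assumptions.

def guardrail_ok(control_m2_retention, treatment_m2_retention,
                 max_relative_drop=0.05):
    """Fail the guardrail if month-two retention drops more than 5%
    relative to control, even when the primary metric looks good."""
    drop = (control_m2_retention - treatment_m2_retention) / control_m2_retention
    return drop <= max_relative_drop

# Discount boosted signups, but did those users stick around?
guardrail_ok(0.40, 0.39)  # small dip: passes
guardrail_ok(0.40, 0.30)  # 25% relative drop: fails
```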

Identify Facts and Meaning

With a large enough sample size, qualitative data can become quantitative. E.g., recording enough user interactions can generate a heatmap; surveying enough people will reveal keyword trends. But in doing this, you’re at risk of losing the meaning behind the feedback.

Remember that any suite of product metrics needs a way of identifying facts and meaning.

Rules of Thumb When Working with Metrics

Defining, implementing, and using metrics is an imperfect and iterative process. Even so, there are three things (and probably more—I just like “three”) that I find myself repeating.

Metrics Blindly Used as Goals Cease to Be Good Metrics

Moving a metric is not the goal. Achieving a business or product outcome is the goal, and a metric is a way of quantifying whether you’re achieving that goal. Imagine you’re launching a new login experience to get more people into your product. You need to define a true-north metric to measure whether your new experience is better or worse. You have two choices:

  1. Impression-to-session rate: of the people who view the login screen, what percent successfully log in?
  2. Attempt-to-session rate: of the people who interact with the login screen, what percent successfully log in?

Like any public page, a login screen is vulnerable to bot traffic. Bot views (impressions) are numerous and hard to identify, but bots rarely interact (attempt), and they never log in. Given impression-to-session is vulnerable to bot traffic, attempt-to-session is a more precise measure of success, right?

Let’s find out. Imagine your users are two groups, parents and grandparents, who log in successfully at different rates.

You launch an experiment that’s effective at increasing grandparent login attempts. But because only 50% of grandparents’ login attempts are successful, and they now represent a larger portion of attempts, your true-north attempt-to-session rate drops from 65% to 63%.

You shouldn’t ramp this experiment because the true-north metric has dropped, right? Nope. Your goal isn’t to increase the attempt-to-session rate. Your goal is to get more people into the product, and this experiment is doing exactly that: successful logins are up 12%.
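To make the mix shift concrete, here’s a minimal sketch. The segment sizes and per-segment success rates are assumptions chosen to reproduce the 65% → 63% drop alongside the +12% gain in successes; they aren’t from real data:

```python
# Mix shift (Simpson's paradox) in a blended success-rate metric.
# Segment sizes and rates are illustrative assumptions.

def blended_rate(segments):
    """segments: list of (attempts, success_rate) per user segment.
    Returns the blended success rate and the total successes."""
    successes = sum(attempts * rate for attempts, rate in segments)
    attempts = sum(attempts for attempts, _ in segments)
    return successes / attempts, successes

# Before: 750 parent attempts at 70%, 250 grandparent attempts at 50%.
rate_before, wins_before = blended_rate([(750, 0.70), (250, 0.50)])
# After: the experiment grows grandparent attempts from 250 to 406.
rate_after, wins_after = blended_rate([(750, 0.70), (406, 0.50)])

print(f"{rate_before:.0%} -> {rate_after:.0%}")           # 65% -> 63%
print(f"successes: {wins_after / wins_before - 1:+.0%}")  # +12%
```

Neither segment got worse; the blended rate fell only because the lower-converting segment grew. That’s exactly why the rate alone can’t tell you whether to ramp.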

Does this mean that impression-to-session is a better choice? Not necessarily, because impression-to-session is still vulnerable to bot traffic. Metrics are rarely perfect, and mix shift, as shown above, is only one example of this. The point here is to be aware of where metrics can be misleading and set appropriate guardrails. Again—moving a metric isn’t the goal; achieving a business or product outcome is the goal.

Metrics Should Be Easily Understandable

Not only do metrics help measure progress toward a goal, but they also help unite a team and empower members of that team to operate with increased autonomy. To achieve this, a metric needs to be understood by a large and diverse group of people. If they can’t understand it, it isn’t a good metric. There are some exceptions for highly technical teams (e.g., search), but as a general rule, keep it simple—could you quickly explain a metric to your CEO?

The Most Important Decisions Usually Don’t Have Perfect Data

Data and metrics don’t exist in isolation; they live in a broader decision-making ecosystem. In building products, metrics have two siblings: “gut feel” and “empathy.” These siblings are usually less famous but no less important. They’ll occasionally be described as “product sense” or “product gut,” meaning an intuition about what the right product is.

Product teams often stall while waiting for perfect data, and I can empathize. They’re nervous, and they don’t want to make the wrong decision. But there’s a cost to inaction, so always keep two factors in mind when making difficult product decisions:

  1. How reversible is this decision? Is the cost of making the wrong decision higher than the opportunity cost of waiting for perfect data?
  2. If we don’t have great data, do we have deep empathy? You can live without one if the other is strong, but you can’t live without both.

Take-Home: Ten Questions to Ask

While prescriptive guidance on metrics isn’t helpful because they vary so dramatically, here are a few open-ended questions you can ask when defining your own suite of product metrics:

  1. What goal are you trying to achieve—what’s your mission and vision?
  2. How will you know when you’ve achieved that goal?
  3. How will you measure progress toward that goal?
  4. What actions do you want your users to take, and how will you measure them?
  5. How can you measure aggregate user behavior and find context & empathy?
  6. Are your metrics absolute or relative—is your goal to get X users or a certain percent?
  7. How do you know that your product is working reliably for 99.9% of users?
  8. Can you explain your true-north metric in two sentences or less?
  9. What guardrail metrics do you track when running experiments?
  10. How do customers feel about your product?

There’s a lot to chew on here. Product metrics are as broad as they are nuanced. But investing time and resources into identifying and building the right metrics is one of the highest-leverage activities any product team can do.