
Product Analytics from Scratch

Becoming Data-driven

Being a data-driven product team is critical to being competitive in the modern digital product marketplace.[1][2] However, many teams tend to overweight the importance of technology adoption, relative to culture and process change, in efforts to become more data-driven.[3]

Product teams should therefore begin their data-driven transformation by first buying into and applying current best practices in digital product analytics, and only secondarily by adopting modern data tools that facilitate and accelerate that transformation.

Optimizing the User Journey

The vision for any product is to delight users and make money in the process. There are many ways to measure how well teams and their products are achieving this vision, but most of these measures are lagging indicators: downstream measures of success that are difficult to control directly and therefore poor guides for product development.

Product revenue is one example: it is an obvious indicator of success but, by itself, doesn’t tell teams what to do to improve it. More useful are leading indicators: measures that are more directly influenced by product improvements and that ideally correlate well with lagging indicators.

A technique commonly used to develop leading indicators in consumer applications is user journey mapping. The idea is to map out the steps users complete to reach a successful outcome, often depicted as a funnel (since apps should be designed to “funnel” users towards the desired outcome). One popular example of such a mapping follows.

Pirates?!

One popular approach to defining the stages of a user journey is the “AARRR!” framework (also known as the Pirate metrics framework), where each letter in the acronym represents a step in the customer journey: acquisition, activation, retention, referral and revenue.[4]

Using this framework, we now have a clearer, more mechanistic picture of how we can increase revenue over time. If the team adds features that increase user retention, those users will be more likely to refer other users and then eventually more likely to contribute to revenue, which ultimately will lead to positive outcomes for the product and company.

From here, the team would want to develop and operationalize specific measures at each of these stages, which can then be used to gauge how product changes impact performance stage by stage. For this, we lean on another 5-step framework.

Let’s Talk Numbers

One well-known 5-step process for developing successful metrics is as follows: define, measure, analyze, improve and control. Called DMAIC for short, this is a Six Sigma process improvement method that was adopted by Amazon to develop metrics across its various business units.[5]

Importantly, this is an end-to-end process: it covers not only defining and implementing metrics up front but also continuously refining those definitions and implementations until they successfully aid product improvement, which in this case means product improvements correlate with leading indicators, and leading indicators correlate with lagging ones.

Define

The purpose of this step is to define how metrics quantify customer behavior at each user journey stage. This step is basically the mock-up design stage of analytics development.

Stage        Metric           Definition                                           Expected Rate
Acquisition  Visitor          Visits landing page                                  100%
Acquisition  Happy visitor    Views 3+ pages, stays 30+ sec and clicks 3+ buttons  40%
Activation   Registered       Completes user onboarding                            5%
Retention    Repeat visitor   3+ visits in first 30 days                           3%
Retention    Weekly visitor   Completes weekly session 50%+ of weeks               2%
Retention    Daily visitor    Completes daily session 25%+ of days                 1%
Referral     Recommender      Refers 1+ users who visit the site                   1%
Revenue      Paying customer  Monthly or yearly paid subscriber                    <1%

Measure

This step is performed by the data engineering/ product team, where data engineers develop software that accurately and reliably implements these measurements. A recent trend in this space is to leverage technologies now commonly referred to as the “modern data stack”, rather than developing tools in-house.

Side note: The modern data stack, or MDS, is a modular architecture built using cloud-based, open-source, usually managed solutions and is becoming the standard approach to implementing product analytics quickly and reliably, since home-grown systems can be expensive to develop and maintain.[6]

For implementing measurements, the MDS tools that could be used are: Snowplow[7] for tracking user behavior events in applications, BigQuery[8] for storing data in a data warehouse and dbt[9] for data transformations. An example implementation of the “weekly visitor” metric might look like the sketch below.
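In practice this transformation would likely live in a dbt model; as a minimal sketch, the metric can instead be expressed as a BigQuery query run from Python. All project, table and column names here (my_project.analytics.events, user_id, event_name, event_timestamp) are assumptions about how Snowplow events might land in the warehouse.

    from google.cloud import bigquery

    # "Weekly visitor": completes a session in 50%+ of weeks since first visit.
    WEEKLY_VISITOR_SQL = """
        WITH weekly_sessions AS (
            -- One row per user per calendar week with at least one session
            SELECT user_id, DATE_TRUNC(DATE(event_timestamp), WEEK) AS week
            FROM `my_project.analytics.events`
            WHERE event_name = 'session_start'
            GROUP BY user_id, week
        ),
        per_user AS (
            SELECT
                user_id,
                COUNT(*) AS active_weeks,
                -- Calendar weeks elapsed since the user's first session
                DATE_DIFF(CURRENT_DATE(), MIN(week), WEEK) + 1 AS total_weeks
            FROM weekly_sessions
            GROUP BY user_id
        )
        SELECT user_id FROM per_user
        WHERE active_weeks / total_weeks >= 0.5
    """

    client = bigquery.Client()
    weekly_visitors = {row.user_id for row in client.query(WEEKLY_VISITOR_SQL)}
    print(f"Weekly visitors: {len(weekly_visitors)}")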

The nice part about using tools like Snowplow is that they have built-in functionality for dealing with data quality issues, allowing users to define data schemas and validation checks. Records that fail these checks are saved automatically so data analysts can go back, analyze and diagnose why the failures might be happening.

Analyze

This step is all about deeply understanding the factors that influence a metric implementation. To do this, a team will typically implement a dashboard and data visualization layer so they can observe metrics over time and begin to ask questions about them. Technologies commonly used at this layer include Looker[10] and Mode[11]. An example implementation might look like the sketch below.
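Looker and Mode dashboards are configured through their own interfaces rather than in code, but the underlying idea, plotting a metric over time so the team can start asking questions about it, can be sketched in a few lines of Python. The CSV export and its column names are assumptions.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical export of precomputed metric values, one row per date/metric.
    metrics = pd.read_csv("metrics.csv", parse_dates=["date"])
    weekly = metrics[metrics["metric"] == "weekly_visitor"]

    plt.plot(weekly["date"], weekly["value"])
    plt.xlabel("Date")
    plt.ylabel("Users")
    plt.title("Weekly visitors over time")
    plt.show()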

It is common at this stage to encounter bugs or issues with the data, and to initiate a correction of error process (another best practice used at Amazon) to investigate the root cause of the issue and address that cause directly to improve quality.[12]

Another aspect of this step is understanding how metrics differ over time between different cohorts of users (for example, by age or location demographics), which can be very useful for increasing user retention.[13]
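As a sketch of what cohort analysis might look like (the input file and column names are hypothetical): group users into monthly signup cohorts and compare a retention metric across them.

    import pandas as pd

    # Hypothetical per-user table: signup date plus a 30-day retention flag.
    users = pd.read_csv("users.csv", parse_dates=["signup_date"])

    # Monthly signup cohorts; average 30-day retention per cohort.
    users["cohort"] = users["signup_date"].dt.to_period("M")
    retention = users.groupby("cohort")["returned_within_30d"].mean()
    print(retention)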

Improve

In contrast to the prior step, the purpose of this step is to understand relationships between metrics, rather than examining them in isolation. Specifically, the goal is to understand how leading (or upstream) metrics impact lagging (or downstream) metrics. For example, the data product team might implement a dashboard visualizing both weekly and daily visitors to get a sense of whether the two metrics are correlated.

This functionality can be used to understand which leading indicators are the best predictors of revenue, the most important lagging indicator. If no current metrics serve this function, then additional metrics should be developed and tested in subsequent iterations.
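One simple way to test candidate leading indicators, assuming the metric time series can be exported from the warehouse (the file and column names below are made up), is to compute lagged correlations: a good leading indicator should correlate with revenue some weeks later, not just contemporaneously.

    import pandas as pd

    # Hypothetical weekly time series of metric values.
    df = pd.read_csv("metrics_weekly.csv", parse_dates=["week"])

    # Correlate this week's daily-visitor count with revenue `lag` weeks later.
    for lag in range(5):
        corr = df["daily_visitors"].corr(df["revenue"].shift(-lag))
        print(f"lag={lag} weeks: correlation={corr:.2f}")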

When a company and product team reach a certain level of maturity, another important aspect of this stage is experimentation, often called A/B testing. The idea is to understand how updating the product UI/ UX might influence certain metrics. Experimentation is a nuanced process that requires coordination between product, analytics and data science teams. There are modern tools that can simplify the process a bit (e.g. Amplitude Experiment[14]), but in general experimentation requires a large amount of cross-team coordination.
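Tools like Amplitude Experiment handle assignment and analysis end to end, but the core statistical comparison behind a simple A/B test can be sketched with a two-proportion z-test; the counts below are made up.

    from statsmodels.stats.proportion import proportions_ztest

    # Made-up results: users who activated out of those shown each variant.
    activations = [430, 510]        # control, treatment
    exposures = [10_000, 10_000]

    z_stat, p_value = proportions_ztest(activations, exposures)
    print(f"z={z_stat:.2f}, p={p_value:.3f}")
    # A small p-value suggests the activation rates genuinely differ.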

Control

In the final step of the DMAIC process, the goal is to demonstrate that the team can control and manipulate specific lagging indicators by changing or experimenting with certain leading metrics; in short, to operationalize product analytics. One interesting part of this step, now possible with MDS tooling, is to implement so-called “reverse ETL” to export metrics and other data back into upstream product marketing and engagement tools (e.g. MailChimp) for the purposes of increasing user engagement and retention and improving marketing.[15]
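Dedicated reverse ETL tools handle this in production; as a toy sketch of the idea, a scheduled job could pull a user segment out of the warehouse and push it to an engagement tool’s API. The endpoint and payload here are entirely hypothetical; real tools like MailChimp have their own client libraries and authentication.

    import requests
    from google.cloud import bigquery

    client = bigquery.Client()
    # Pull a user segment computed in the warehouse (table name assumed).
    rows = client.query(
        "SELECT user_id, email FROM `my_project.analytics.weekly_visitors`"
    )

    for row in rows:
        # Hypothetical engagement-tool endpoint.
        requests.post(
            "https://api.engagement.example.com/v1/segments/weekly-visitors",
            json={"user_id": row.user_id, "email": row.email},
            timeout=10,
        )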

Part of this step can also involve adding automation to the data system so that data quality and reliability indicators are tracked clearly in dashboards and operationalized into notifications, where applicable.
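As one small example of this kind of automation (the table name and webhook URL are placeholders), a scheduled job could check how fresh the events data is and notify the team via a Slack incoming webhook:

    import requests
    from google.cloud import bigquery

    client = bigquery.Client()
    # Hours since the most recent event landed in the warehouse.
    row = next(iter(client.query(
        "SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(event_timestamp), HOUR)"
        " AS hours_stale FROM `my_project.analytics.events`"
    )))

    if row.hours_stale > 24:
        # Slack incoming webhooks accept a simple JSON text payload.
        requests.post(
            "https://hooks.slack.com/services/PLACEHOLDER",
            json={"text": f"Data quality alert: events are {row.hours_stale}h stale"},
            timeout=10,
        )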

Conclusion

To sum up, the highest-impact analytics practice product teams can adopt is to map/ define, measure and analyze their users’ journeys. With that, the highest-value data product is one that measures and analyzes changes in user journey metrics as they relate to each other, which can ultimately be used to increase retention and revenue. In terms of prioritization, user retention seems most likely to have the highest direct impact on revenue and so should be prioritized for experimentation and control.

In terms of data system architecture and implementation, the industry is quickly coming to a consensus that the “modern data stack” is the best approach for implementing reliable product analytics systems, especially for lean teams. As product teams progress in their effort to become more data-driven, they should adopt these tools opportunistically to facilitate and accelerate their data transformations.

Product Operations versus Development

Note: this post is mostly about software or digital products, but the general principles apply in some way to all types of products.

Product Operations

Product operations, or “product ops”, has recently been used to describe the “operationalization” of the feedback loop between product, engineering and customer success.[1] I actually think this is, in essence, operationalizing the product management function, so it would be better termed “product management ops”. Or, since it relies heavily on data, “product analytics”.

The way I’m using product operations here is: the manual or semi-automated processes the team has to perform to directly serve customers, as part of value delivery. For some products the operational component is larger than for others: say, when the product is more like a service, or when it is a technical product at an earlier/ less mature/ R&D stage (as with early-stage AI or data products).

DevOps, Technical Debt and Development

There is a correlation/ relationship between technical debt and the operations (i.e. manual processes) performed by the engineering team: the more technical debt a product has, the larger the operational component. However, I wouldn’t consider these product operations but rather development operations (or DevOps), since these operations don’t directly support customers.

One specific example: if the development team doesn’t yet have CI/CD set up, the code deployment process is manual and cumbersome. This is technical debt that increases the operational load on the engineering team as part of development, but it shouldn’t impact product operations per se. Product operations are instead the manual processes that have to take place to fulfill a customer request, as in a customer service scenario.

Development work, then, is work done by the engineering team to improve the product: by improving product features, by addressing technical debt or by automating manual operations that support the product’s value delivery. The goal of technical debt work is to reduce the development operations burden, while the goal of product improvement work, sometimes called “on-product” work, is to increase your product’s value for customers (including by automating/ streamlining product operations).

On-Product versus Technical Debt Trade-off

One of the major tradeoffs that product managers/ owners have to make is deciding between addressing technical debt and adding new features (“on-product” work).

The best approach I’ve come across for thinking about this tradeoff is to agree up-front with the team on an “on-product index”, that is, the proportion of tickets (or development time) that will be dedicated to improving the product. For example, an on-product index of 0.8 means roughly 8 of every 10 tickets go to product improvements and 2 to technical debt. This index should vary based on your company’s stage of development.

For example, if your product is at an early stage of development (i.e. pre-product-market fit), you want a high on-product index because you should be focused entirely on learning what your customers actually want, as opposed to making the product more scalable (i.e. reducing technical debt). Once the product is at a later stage, the on-product index should probably be reduced as you focus on making your product more scalable, sustainable and easily maintainable.

Product = Revenue Independent of Time

For products with a larger product operational component, an additional tradeoff has to be made: how do we prioritize between product operations and development (i.e. operating versus improving the product)? This is very similar to the technical debt tradeoff, except here operations directly support customers and potential revenue (if the company is at that stage).

The ideal product actually has no product operational component (i.e. it is completely automated), since what you want is for a product’s revenue to be independent of time. This is what it means for a product to be truly scalable.

In this framing, the highest priority of the development team should be firstly to automate product operations, so that product revenue can become independent of time, and secondly to engage in the technical debt tradeoff (which is actually a slightly easier tradeoff to make, since it doesn’t directly involve revenue).

If product operations are especially high-volume or time-consuming, the team should probably hire a dedicated resource for product operations (who doesn’t necessarily have to be an engineer, although that would be the ideal scenario if possible), treating that as a product operating cost. The goal then should be to redirect those operating costs towards development and product improvements as the level of automation increases.

Building AI Products

The AI Cold Start Problem

Here’s the scenario: you want to be fancy and build a product that leverages some of the latest and greatest in AI to satisfy and delight your (future) users.

Given all the recent developments (like GPT-3), you want to use a pre-trained model to do the leg work but realize, after some light experimentation, that you’ll need more data to further optimize the model for your use case.

You then realize that the users you want to serve (and sell to) are exactly those that have the data you’ll need to do those optimizations…

After some self-reflection (and a hint of self-loathing), you remember a recent podcast you listened to and realize this is a variant of the cold start problem. You search on Google and find that the top Wikipedia article on the cold start problem basically describes your exact issue. Oof.

You realize the Wikipedia article could have articulated the problem more generally, not limiting itself to recommender systems. You write out a more general definition to make yourself feel a little better, hoping that articulating the problem more clearly would somehow make it less true. Quite the contrary.

The AI Cold Start Problem: The ideal customers for most (if not all) ML/ AI-based products are exactly those that own the data needed to develop said products.

By now you realize you have to build value and start solving your users’ problems up front, so you can entice them to your app in return for their data. You realize you’ll need to build something like a “marketplace”, a “platform” or even a regular ol’ app.

Apps Before AI

Bottom line: many might think that AI can be leveraged as a core, and potentially killer, feature for a product that satisfies and delights its users. In other words, AI -> App.

But really those opportunities are likely very rare, and you can only get so far with data acquisition strategies that don’t provide your users value up front. In actuality, the ordering must be more like App -> AI, and in particular, App -> Scale -> Data -> AI.

So next time you think about making AI the core feature of your killer app, remember that your users likely have the data you’ll need, and you’ll need to provide them value first (just like everyone else) in return for that data, to earn the right to develop that same product you believe they want/ need.

Agile Product Management from Scratch

Product Objectives

This should be done at the beginning of a product management/ development lifecycle:

  • Describe 3-5 high-level objectives that your business wants to achieve with its product(s)
  • For each of those objectives, list 3-5 quantifiable key results that define the criteria for determining whether that objective was achieved. These key results should be metrics that can be tracked throughout the product development lifecycle.
    • These key results should also be designed such that they complement each other. For example, if there is one key result tracking a quantity (e.g. increasing # of active users) there should be another key result that tracks quality (e.g. decreasing churn rate).
  • These objectives and key results (OKRs for short) should be refined and revisited each quarter. The most important objectives will not change very frequently while the key results should be updated quarterly.

This is a great reference for how to write good OKRs: https://www.whatmatters.com/faqs/okr-examples-and-how-to-write-them/.

Product Team

  • Composition:
    • Product Manager
    • Design Team: 0-1 Designer (depending on if product has a user interface)
    • Engineering Team: 2-8 Engineers
  • Duration: Ideally a sustained, durable team dedicated to developing a single product throughout its lifecycle.

Product Development Process

  • This is a four-step, cyclic, iterative process that the whole team participates in. The third step in the cycle is commonly referred to in Scrum methodology as a “sprint”[1], while the first two steps are design-focused and commonly referred to as a “design sprint”[2].
  • The first two steps in the process can be done in parallel with the third step, on the same rhythm; however, there should be an offset, where the design team/ sprint develops designs for future/ upstream features or user stories while the engineering team/ sprint focuses on implementing features already designed in prior design sprints. The last step of the process, “validation”, should ideally be done continuously and is mostly the responsibility of the PM.[3]
  1. Discovery (Frame and Plan)
    • Interact with users/ customers to understand their biggest pain points or most desired features.
    • Decide what product features to prioritize that address those needs/ wants, taking into account the business objectives defined above and what value propositions can be offered.
    • This phase should end with a prioritized list of user stories that can be used to design/ build and test prototypes for usability, business viability, feasibility, etc.
    • Participants: PM, designer and at least one engineer
  2. Design (Prototype and Test)
    • Take the prioritized list of user stories and produce prototypes to be validated with customers.
    • Based on those results, the most promising user stories should be added and prioritized on the product backlog.
    • Participants: PM, designer and at least one engineer
  3. Development (Build and Launch/ Deploy)
    • This phase should start with a prioritized product backlog.
    • This phase lasts 2-4 weeks, starts with a sprint planning meeting to discuss the goal of the sprint (ideally to complete a set of related user stories, or epic) and ends with a sprint “demo” where the team showcases what they’ve built to stakeholders. The sprint should also end with an internal retrospective to discuss what could have been improved during the sprint.
    • Throughout the sprint, the team should meet for 15 minutes at the start of each day to check in on progress towards the sprint goal and remove any obstacles. This meeting is commonly referred to as “standup”.
    • Participants: PM, engineering team
  4. Validation (Test)
    • This phase requires the PM to validate that what was built and released to users had a positive impact on the product objectives.
    • This phase will naturally lead back into the discovery phase if a feature didn’t have the desired outcome, or if it did and there are now new objectives the team needs to prioritize and focus on accomplishing.
    • Participants: PM

Product Backlog

The product backlog is important enough that it warrants its own section.

  • The product backlog is a list, ideally prioritized, of user stories (essentially features) grouped into related sets called epics (essentially high-level user stories that describe a larger component with related features) that define the vision for the end product.
  • The backlog is owned and mainly built by the PM, especially for adding in stories gleaned from discussions with customers and stakeholders. However, the engineering team also adds stories and/ or tasks related to technical debt that need to be prioritized and addressed.
  • The product backlog should live in a system where everyone can view it and understand where the team is in terms of progress towards sprint goals. One of the most popular systems is Jira, but there is also Trello and others.

Resume CI/CD with Google Docs and Github Pages

Ever get frustrated trying to keep your resume up-to-date on your Github Pages website? Here’s the process/ workflow I came up with to make that easier using Google Drive and some light automation.

  1. Create a Google Cloud Platform account (if you don’t have one already), and enable the Google Drive API.

  2. Configure the OAuth consent screen for a Desktop application. Also make sure that all of the required Google Drive scopes are included for this application.

  3. Create an OAuth client ID credential and download the JSON locally to credentials.json.

  4. Create a conda environment with the necessary dependencies by doing

    $ conda create -n website-ci python
    $ conda install -n website-ci -c conda-forge \
         google-api-python-client \
         google-auth-httplib2 google-auth-oauthlib
    

    You should also consider exporting this environment as a yaml (conda env export --name website-ci > website-ci.conda.yml), so you can easily recreate it later (conda env create --file website-ci.conda.yml).

    (Note: if you don’t have conda, you can install it by following the instructions in the conda documentation.)

  5. Add the following script to scripts/download_document.py:

    from __future__ import print_function
    import os.path
       
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
       
    # If modifying these scopes, delete the file token.json.
    SCOPES = ['https://www.googleapis.com/auth/drive']
       
    # The ID of a document 
    DOCUMENT_ID = '195j9eDD3ccgjQRttHhJPymLJUCOUjs-jmwTrekvdjFE' 
       
    def main():
        """Downloads document as .pdf"""
        creds = None
        # token.json stores the user's access and refresh tokens from prior runs.
        if os.path.exists('token.json'):
            creds = Credentials.from_authorized_user_file('token.json', SCOPES)
        # If there are no (valid) credentials available, let the user log in.
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(
                    'credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
            # Save the credentials for the next run
            with open('token.json', 'w') as token:
                token.write(creds.to_json())
       
        service = build('drive', 'v3', credentials=creds)
       
        request = service.files().export_media(fileId=DOCUMENT_ID,
                                               mimeType='application/pdf')
        with open('document.pdf', 'wb') as fh:
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while done is False:
                status, done = downloader.next_chunk()
    
    
    if __name__ == '__main__':
        main()
    

    Where DOCUMENT_ID should point to your resume on Google Drive. (Note: you can find the document ID of any Google Docs document in its URL: https://docs.google.com/document/d/<document ID>/.) And document.pdf should be the local path to your resume within your Github Pages site.

  6. Then add a make command like this:

    DATETIME=$(shell date)
    
    update:
        python3 scripts/download_document.py
        git add document.pdf
        git commit -m "updated document: $(DATETIME)"
        git push
    

Now whenever you want to update your resume after making changes, just run make update and the changes will automatically be reflected on your website resume.