Measuring Non-Linear User Journeys: Rethinking Funnel Metrics in A/B Testing
In this post from our tech team (originally published on Medium), the team describes how inDrive uncovered hidden reorder patterns and redesigned its funnel metrics around aggregated orders to make A/B test results easier to interpret in complex user journeys.
In a mature product, it is often difficult to achieve a statistically significant impact on key business metrics such as revenue per user or the number of orders. Most changes target point improvements in the funnel or individual stages of the user journey, and their impact on business metrics is usually lost in the noise. Product teams therefore often choose the conversion at the affected stage as the target metric and design experiments so that they reach the required statistical power.
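To make that trade-off concrete, here is a minimal sketch of the sample-size arithmetic in Python, using statsmodels’ two-proportion power utilities; the baselines and lifts are illustrative numbers, not figures from our experiments.

```python
# Sample size per arm needed to detect a lift over a baseline conversion.
# Illustrative numbers only; assumes statsmodels is installed.
from math import ceil

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def users_per_arm(baseline: float, lift: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per experiment arm to detect `lift` over `baseline`."""
    effect = proportion_effectsize(baseline + lift, baseline)  # Cohen's h
    n = NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                     power=power, alternative="two-sided")
    return ceil(n)

# A 2 pp lift in a funnel-stage conversion is cheap to detect...
print(users_per_arm(0.60, 0.02))    # ≈ 5,100 users per arm
# ...while a 0.1 pp lift in a top-level metric needs ~250x more traffic.
print(users_per_arm(0.20, 0.001))   # ≈ 1.3 million users per arm
```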
However, funnel metrics don’t always move in line with top-level indicators, and in some tests conversions at stages preceding the implemented change shift in a statistically significant way. This makes such experiments difficult to interpret and raises the risk of making wrong decisions.
For example, consider a service where a user creates an order, gets offers from different performers, chooses one, and waits for the task to be completed. Suppose we’ve developed a new feature that highlights the best offer and is expected to increase the share of orders where a customer and performer are matched.
During the experiment, we may observe that:
- the share of successful orders decreases;
- the total number of orders and completed orders increases;
- the share of orders that received at least one offer decreases.
Such a pattern may occur if the user can return to previous stages and, for example, re-post the order.
We discovered similar patterns in our own experiments. With inDrive, passengers can propose their own price, receive offers from drivers, and choose one. Many passengers use the bargaining features: they may change the order conditions and recreate the order to try to get a better price. This leads to a series of orders before a trip actually takes place.
Our passenger fulfillment team oversees the user journey from the initial order to the completion of the trip. Here’s how we investigated these behavioral patterns and then used them to introduce new metrics to make test results more interpretable.
This information may be useful for product analysts and product managers who work with products that have a complex, non-linear user journey, where metric interpretation requires taking behavioral patterns and repeated user actions into account.
How Do Key Metrics and Funnel Metrics Behave?
In our product, the funnel looks roughly like this: a passenger creates an order, gets bids from drivers, selects one, waits for the driver to arrive, and then starts and completes the trip.
Imagine that we launch a small UI change: we show the user a progress bar while searching for a driver, to reduce uncertainty. We expect that users will then wait for driver offers more often and, as a result, take more trips.
It is logical to choose the conversion from order creation to receiving a bid as the target metric for such a test.
As a result of the test, we see:
- Rides count: ↑ (increase, not statistically significant)
- Orders count: ↑↑ (statistically significant increase)
- CR from order to bid: ↓↓ (statistically significant decrease)
- Done rate: ↓↓ (statistically significant decrease)
We see a slight increase in the number of rides and a statistically significant increase in the number of orders, but also a drop in conversion from order creation to receiving a bid and a decrease in the share of successful trips.
The user only interacts with the feature after creating an order, so at first glance it seems we could not have influenced how many orders were created. If the test group happened to include users who tend to create orders more often, the increase in the number of orders could distort the funnel indicators and explain the positive dynamics in rides.
However, deeper analysis showed that this was not a randomization issue. After the progress bar appeared, some users who tended to wait a long time for driver offers began to cancel the order earlier and make another attempt to take a trip.
As a result, the number of reorders increased the most (statistically significant growth).
How Do Reorders Affect Key and Funnel Metrics?
After creating an order, a user can drop off at different stages: if they did not receive offers from drivers, if the offer price was not suitable, or later if the driver took too long to arrive. In such cases, some users do not stop trying, but create a new order to eventually get a ride. We call such repeated attempts reorders.
Instead of the expected linear user flow, we observe repeating cycles — users try to go through the same scenario several times.
When analyzing the efficiency of repeat attempts, we noticed that their success rate is often significantly lower than that of first attempts. If users start reordering more often, this affects all stages of the funnel — including those that precede the actual change. At the same time, in a number of scenarios (for example, when we encourage users to try again instead of leaving), we may observe a positive effect on top-level business metrics.
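To see the mechanics on toy numbers: suppose 1,000 users each create one order and 800 of those orders receive a bid, so the order → bid conversion is 80%. If a change prompts the 200 unsuccessful users to reorder once and 100 of those retries get a bid, the per-order conversion falls to 900/1,200 = 75%, even though the share of users who received at least one bid rose from 80% to 90%. The per-order funnel reports a decline at a stage nothing changed for, while the per-user picture improved.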
Collapsing Reorders
Our goal is to understand whether users’ intentions (not individual attempts) have started to end in trips more often. To do this, we needed to give a stricter definition of a “trip intention” that would allow us to collapse multiple reorders of one user.
After discussions with the teams, we concluded that two orders must satisfy the following conditions to be considered one intention to take a trip:
- The pickup and drop-off points of both orders should not differ significantly.
- The orders should be created close in time (within a short interval).
- The previous order must not have ended in a completed trip.
The remaining task was to define threshold values — what should be considered “close in time” and a “small route change.” Initially, these thresholds were defined based on business needs, so the first thing we decided to do was to re-check how well these values correspond to real user behavior.
We found that:
- in the case of reordering, users rarely change the destination point (point B);
- the pickup point (point A) shifts more often, but in most cases insignificantly — by about 50 meters from the original position;
- most reorders happen within the first 10–20 minutes.
We then fixed the tolerance for points A and B at 500 meters and checked what share of reorders are made no later than X minutes after the previous order.
The initial cutoffs suited us well: they cover more than 90% of reorders, and further increasing the thresholds hardly affects the coverage share.
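As an illustration, here is a minimal sketch of the pairwise check, assuming each order carries its pickup and drop-off coordinates, creation time, and a completion flag. The 500-meter tolerance matches the figure above; the 20-minute window is a placeholder chosen for the example, since the exact time cutoff is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

MAX_SHIFT_M = 500                 # both points A and B must stay within this radius
MAX_GAP = timedelta(minutes=20)   # assumed time cutoff (placeholder value)

@dataclass
class Order:
    pickup: tuple[float, float]    # (lat, lon) of point A
    dropoff: tuple[float, float]   # (lat, lon) of point B
    created_at: datetime
    completed: bool                # did this order end in a trip?

def haversine_m(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))

def same_intention(prev: Order, nxt: Order) -> bool:
    """True if `nxt` looks like a reorder of `prev` (one trip intention)."""
    return (
        not prev.completed                                        # rule 3
        and nxt.created_at - prev.created_at <= MAX_GAP           # rule 2
        and haversine_m(prev.pickup, nxt.pickup) <= MAX_SHIFT_M   # rule 1
        and haversine_m(prev.dropoff, nxt.dropoff) <= MAX_SHIFT_M
    )
```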
In cases where a user creates three or more orders in a row, collapsing is performed sequentially: first the first and second orders are checked and merged, then the second and third, and so on — as long as the conditions of time and location proximity are met.
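A sketch of that sequential pass, reusing the `same_intention` check from the previous snippet: each order in a user’s chronologically sorted history either joins the previous order’s chain or starts a new aggregated order.

```python
from typing import Callable, Sequence

def assign_aggregated_ids(orders: Sequence,
                          same_intention: Callable[..., bool]) -> list[int]:
    """Walk a user's orders in creation order and chain consecutive pairs:
    order i joins order i-1's aggregated order only if the pairwise
    conditions hold; otherwise a new chain starts."""
    ids: list[int] = []
    current = 0
    for i, order in enumerate(orders):
        if i > 0 and not same_intention(orders[i - 1], order):
            current += 1  # conditions broken: start a new aggregated order
        ids.append(current)
    return ids
```

Because each check is against the immediately preceding order, a long run of reorders collapses into one aggregated order as long as every consecutive pair stays within the thresholds.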
Alternatives
As an alternative approach, we considered using a mobile session identifier to group orders within a single intention.
However, this option turned out to be less reliable, for two reasons:
- A session can be interrupted or can “stick” around too long: for example, a user places an order, takes a trip, and soon creates and completes a new one within the same session. In such cases, session boundaries do not match real behavior.
- Mobile analytics data is less accurate than backend data: event times and their order can be recorded with delays or lost.
As a result, we decided not to use the session identifier as the basis for defining a trip intention.
New Metrics
As a result, we created a new entity with its own rule for forming a unique identifier, and settled on the name “aggregated order.”
Based on this entity, we built several derived metrics:
- Aggregated funnel — allows us to evaluate conversions without distortions related to reorders and makes test results more interpretable.
- Funnels of the first, second, and subsequent attempts — help us understand which actions stimulate users to make a repeat attempt and increase the probability of success.
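For illustration, here is a hedged sketch of computing the aggregated funnel with pandas, assuming an orders DataFrame with per-order outcome flags and the `agg_order_id` produced by the collapsing step; the column names are illustrative rather than our production schema.

```python
import pandas as pd

def aggregated_funnel(orders: pd.DataFrame) -> pd.Series:
    """Collapse per-order flags to per-intention flags and compute
    the aggregated conversions."""
    # An aggregated order "got a bid" / "was done" if ANY attempt did,
    # so per-intention flags are the max of the per-order flags.
    agg = orders.groupby("agg_order_id").agg(
        got_bid=("got_bid", "max"),
        done=("done", "max"),
    ).astype(bool)
    return pd.Series({
        "aggregated order -> bid": agg["got_bid"].mean(),
        "aggregated bid -> done": agg.loc[agg["got_bid"], "done"].mean(),
        "aggregated done rate": agg["done"].mean(),
    })

# Toy example: the first two orders form one intention that eventually
# succeeded, the third is a separate intention that got a bid but no trip.
orders = pd.DataFrame({
    "agg_order_id": [1, 1, 2],
    "got_bid":      [False, True, True],
    "done":         [False, True, False],
})
print(aggregated_funnel(orders))
```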
Now let’s return to the test we discussed earlier and compare the values obtained with the two approaches.
Within an intention, users began to receive bids less often (the effect is close to statistical significance), while the aggregated done rate grew. To explain why the aggregated done rate grows while the “order → bid” conversion falls, we looked at how exactly users perform reorders.
It turned out that behavior split into two patterns:
- Some users began to stop searching sooner, without waiting for a bid.
- In contrast, another group began to raise the price more often when reordering, and such orders were less often canceled after acceptance.
Additional observations:
- CR to price increase after reorder: ↑↑ (statistically significant growth)
- Aggregated bid → done: ↑↑ (statistically significant growth)
Conclusion
Sometimes user interaction with a product can’t be fully described by classic funnel metrics, and the observed results may seem contradictory. In such cases, it is important to use metrics that reflect customers’ behavioral patterns or, as in our case, to create new entities that describe reality more accurately.
Author: inDrive.Tech