An event stream is a series of data points that flow into or out of a system continuously, rather than in batches. Event stream processing (ESP) is the task of processing event streams to identify meaningful patterns within them.
Use Event Streams as Telemetry for Experimentation
In the context of an experimentation platform, an event stream is the incoming telemetry of user and system behavioral data. Event stream processing is the work of deduplicating, attributing, and statistically analyzing those events to produce the metrics on which subsequent conclusions are based.
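To make the first of those steps concrete, here is a minimal deduplication sketch. It assumes each event carries a unique `event_id` that is reused when a client retries or double-sends the same event; that field, the `Event` type, and `deduplicate` are all hypothetical names for illustration, not any particular platform's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    event_id: str     # assumed unique per event; a resent copy reuses the same id
    user_id: str
    timestamp: float  # seconds since the Unix epoch
    name: str


def deduplicate(events: Iterable[Event]) -> Iterator[Event]:
    """Yield each event the first time its event_id is seen and drop later copies."""
    seen = set()  # event_ids already passed downstream
    for event in events:
        if event.event_id in seen:
            continue  # retry or double delivery of an event already counted
        seen.add(event.event_id)
        yield event


# Usage: the second "e1" is a duplicate delivery and is dropped.
stream = [
    Event("e1", "u1", 1_700_000_000.0, "upgrade"),
    Event("e1", "u1", 1_700_000_000.0, "upgrade"),
    Event("e2", "u2", 1_700_000_005.0, "click"),
]
unique_ids = [e.event_id for e in deduplicate(stream)]  # ['e1', 'e2']
```

The `seen` set grows without bound here; a long-running pipeline would more likely keep ids only for a limited time window.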
Examples of events as telemetry consumed by an experimentation platform might include:
- At time “T” user “U” clicked the “show more info” button on the property listing page
- At time “T” system returned 6 rows to user “U” from a search query, taking 2.4 seconds
- At time “T” user “U” upgraded from “basic” to “pro” tier
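One way to picture these events as records is sketched below. The field names (`timestamp`, `user_id`, `event_type`, `properties`), the event type strings, and the placeholder time are assumptions chosen for illustration; real platforms define their own schemas.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class TelemetryEvent:
    timestamp: datetime  # when it happened ("T")
    user_id: str         # who it happened to ("U")
    event_type: str      # what happened
    properties: Dict[str, Any] = field(default_factory=dict)  # event-specific details


T = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)  # placeholder for time "T"

events = [
    TelemetryEvent(T, "U", "button_click",
                   {"button": "show more info", "page": "property listing"}),
    TelemetryEvent(T, "U", "search_results",
                   {"rows_returned": 6, "duration_seconds": 2.4}),
    TelemetryEvent(T, "U", "tier_change", {"from_tier": "basic", "to_tier": "pro"}),
]
```

Keeping the timestamp and user id as first-class fields, with everything else in `properties`, means every event type can be attributed the same way regardless of its payload.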
Note that all events above have both a timestamp and an association with a specific user. These two attributes are essential for associating each behavior with a particular cohort and for knowing which experiment state was active when it happened. Consider this example:
- A series of experiments is being run, at two-week intervals, to determine the optimal configuration parameters to pass to a recommendation engine in order to best meet the needs of your site’s user population.
- Users are randomly split into three cohorts, with each cohort receiving a different set of recommendation engine parameters.
- User behavior (e.g., purchases, upgrades, unsubscribes) and system performance (e.g., response time, errors) are observed for two weeks.
- Based on the results of the first experiment, parameters are changed and another two-week experiment is run.
If we didn’t know which user an event was associated with, or exactly when it occurred, we would not be able to allocate behaviors to the right cohort or know which parameter set was in effect when they occurred. This is one reason why data that has already been aggregated over a time period (e.g., monthly gross sales) isn’t useful as an event stream for experimentation.
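A minimal attribution sketch for the example above: the event's timestamp selects the two-week iteration, and the user id selects the cohort within it. The schedule, the `(iteration, user_id) -> cohort` assignment table, and the function names are hypothetical; the sketch also assumes users may be re-randomized each iteration, which the example does not specify.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical schedule: three back-to-back two-week experiment iterations.
ITERATION_LENGTH = timedelta(weeks=2)
FIRST_START = datetime(2024, 1, 1)
ITERATION_STARTS = [FIRST_START + i * ITERATION_LENGTH for i in range(3)]


def iteration_at(ts: datetime) -> Optional[int]:
    """Index of the iteration running at ts, or None if ts falls outside every iteration."""
    i = bisect_right(ITERATION_STARTS, ts) - 1
    if i < 0 or ts >= ITERATION_STARTS[i] + ITERATION_LENGTH:
        return None
    return i


def attribute(user_id: str, ts: datetime,
              assignments: dict[tuple[int, str], str]) -> Optional[tuple[int, str]]:
    """Return (iteration, cohort) for an event, or None if it cannot be attributed."""
    iteration = iteration_at(ts)
    if iteration is None:
        return None  # event happened while no experiment iteration was running
    cohort = assignments.get((iteration, user_id))
    if cohort is None:
        return None  # user was not enrolled in that iteration
    return iteration, cohort


# Hypothetical assignments: u1 landed in a different cohort in each iteration.
assignments = {(0, "u1"): "B", (1, "u1"): "C"}
```

Without either the timestamp or the user id, `attribute` has nothing to look up, which is exactly why pre-aggregated data cannot be attributed.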
Prioritize Event Stream Selection
“If we have data, let’s look at the data. If all we have are opinions, let’s go with mine.”
Jim Barksdale (CEO of Netscape Communications from 1995 to 1999)
Experimentation is about using data, rather than opinions alone, to inform decisions. If you’ve read this far, you probably agree. That said, you don’t want to take the idea to extremes. Rather than trying to create an event stream from every possible data point in your environment before you begin experimenting, consider working backward from the metrics you most need to inform your decisions.
For example, “Bookings Per Platinum Member” and “Average Booking Price Per Platinum Member” are metrics calculated from a stream of booking events that contain a timestamp, a user identifier, the user’s membership type, and the booking amount. That stream doesn’t need any data about clicks, scrolls, or page counts. If “Ratio of Booking to Room Selection” is a metric you wish to track, you’ll need to add an event stream of room selection events. Working backward from your key metrics ensures that you source the most important event streams first, clearing the way for your highest-value experiments early on.
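The sketch below shows how little data these metrics need. It makes two interpretive assumptions: the denominator of “Bookings Per Platinum Member” is the number of platinum members in the cohort, taken from assignment data so that members with no bookings still count, and “Average Booking Price Per Platinum Member” is read as the mean amount across platinum bookings. The `Booking` type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Booking:
    timestamp: datetime
    user_id: str
    membership_type: str  # e.g. "platinum"
    amount: float         # booking price


def bookings_per_platinum_member(bookings: list[Booking], platinum_members: int) -> float:
    """Platinum bookings divided by the number of platinum members in the cohort.

    platinum_members is assumed to come from cohort assignment data, so members
    who made no bookings still count in the denominator.
    """
    platinum_bookings = sum(1 for b in bookings if b.membership_type == "platinum")
    return platinum_bookings / platinum_members


def average_booking_price_platinum(bookings: list[Booking]) -> float:
    """Mean booking amount across all bookings made by platinum members."""
    amounts = [b.amount for b in bookings if b.membership_type == "platinum"]
    return sum(amounts) / len(amounts) if amounts else float("nan")


def booking_to_room_selection_ratio(bookings: list[Booking], room_selections: int) -> float:
    """Needs a second stream: the count of room selection events for the same cohort."""
    return len(bookings) / room_selections


t = datetime(2024, 3, 1, 12, 0)
bookings = [
    Booking(t, "u1", "platinum", 200.0),
    Booking(t, "u1", "platinum", 100.0),
    Booking(t, "u2", "gold", 80.0),
]
bookings_per_platinum_member(bookings, platinum_members=4)     # 0.5
average_booking_price_platinum(bookings)                       # 150.0
booking_to_room_selection_ratio(bookings, room_selections=12)  # 0.25
```

Only the last function needs anything beyond the booking stream, which mirrors the point above: each new metric tells you which additional stream to source.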
For more tips on choosing metrics, have a look at this blog post on “How to Choose the Right Metrics for Your Experiments.”
Source the Needed Event Streams
The ideal event stream for establishing or expanding an experimentation practice is a stream that already exists and can be routed to your experimentation platform without custom development work. Customer data platforms (CDPs) have simplified the process of discovering and integrating these existing streams, even to the point where a non-technical user can configure and manage event stream flows. If you have access to a CDP, by all means start here.
In the absence of a CDP, you’ll either need to build an integration that extracts, transforms, and streams existing data to your experimentation platform, or you’ll need to add new instrumentation to create streams where the data isn’t yet being captured. Most experimentation platforms offer several options for this, including SDK calls you make from inside your code, REST API endpoints you call per event or to bulk-load events, and integrations that simplify creating event streams from other platforms such as Google Analytics.
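As an illustration of the bulk-load option, the sketch below splits events into batches and builds one JSON `POST` request per batch using only the standard library. The endpoint URL, the `{"events": [...]}` payload shape, the bearer-token header, and the 500-event batch limit are all hypothetical; check your platform’s API reference for the real values.

```python
import json
import urllib.request
from typing import Any, Iterator

BULK_ENDPOINT = "https://experiments.example.com/v1/events/bulk"  # hypothetical URL
MAX_BATCH = 500  # hypothetical per-request limit


def batches(events: list[dict[str, Any]], size: int = MAX_BATCH) -> Iterator[list[dict[str, Any]]]:
    """Split the events into consecutive chunks of at most `size` items."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def build_bulk_request(batch: list[dict[str, Any]], api_key: str) -> urllib.request.Request:
    """Build (but do not send) one POST request carrying a batch of events as JSON."""
    body = json.dumps({"events": batch}).encode("utf-8")
    return urllib.request.Request(
        BULK_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


events = [
    {"user_id": f"u{i}", "event_type": "booking", "timestamp": "2024-03-01T12:00:00+00:00"}
    for i in range(1200)
]
requests_to_send = [build_bulk_request(b, api_key="YOUR_API_KEY") for b in batches(events)]
# for req in requests_to_send:
#     urllib.request.urlopen(req)  # network call; not executed in this sketch
```

Batching trades latency for fewer requests; the per-event endpoint is the better fit when events must reach the platform quickly.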
Source Event Streams from Batch Data
It’s worth noting that an event “stream” can also be created periodically from batched data (i.e., data that is only available after a nightly or weekly processing cycle). Sure, the “stream” may only flow now and then, but as long as the original timestamps within each batch are preserved, attribution and impact can still be calculated accurately.
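A minimal sketch of replaying a nightly export as a stream: each row keeps the time the event actually happened rather than the time the batch was loaded, and events are emitted in timestamp order. The CSV column names (`timestamp`, `user_id`, `event_type`) and the ISO 8601 timestamp format are assumptions about what such an export might contain.

```python
import csv
import io
from datetime import datetime
from typing import Iterator, TextIO


def events_from_nightly_export(export: TextIO) -> Iterator[dict]:
    """Turn one batch export into time-ordered events, keeping each row's original timestamp."""
    events = [
        {
            "user_id": row["user_id"],
            "event_type": row["event_type"],
            # The time the event happened, not the time the batch was loaded.
            "timestamp": datetime.fromisoformat(row["timestamp"]),
        }
        for row in csv.DictReader(export)
    ]
    events.sort(key=lambda e: e["timestamp"])
    yield from events


nightly_export = io.StringIO(
    "timestamp,user_id,event_type\n"
    "2024-03-01T18:05:00+00:00,u2,room_selection\n"
    "2024-03-01T09:15:00+00:00,u1,booking\n"
)
replayed = list(events_from_nightly_export(nightly_export))
```

Stamping these events with the load time instead would push every one of them to the moment the batch ran, which can move them into the wrong experiment iteration.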
Look Outside the Box
Event data may need to come from a source outside the application you are focused on. Consider the case where an e-commerce team is experimenting with a free shipping offer for customers who buy three or more items in a single online session. If the same company’s brick-and-mortar stores have a return policy allowing in-store returns of online purchases, then it would be nice to know if the “buy three, get shipping free” cohort returns more products than the norm, right? For that reason, sourcing the event stream of in-store returns is critical for determining the business value of the experiment results. Bottom line? Don’t limit your thinking to your application’s data model when considering data stream candidates.
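A sketch of combining the two sources for the free-shipping example: online orders carry the cohort, and the in-store returns system is assumed to record the original online order id, which is what makes the join possible. The data shapes, cohort names, and `return_rate_by_cohort` are hypothetical, and the sketch only computes the rates; deciding whether a difference is meaningful still calls for your platform’s usual statistical treatment.

```python
from collections import defaultdict
from typing import Iterable


def return_rate_by_cohort(orders: Iterable[tuple[str, str]],
                          store_returns: Iterable[str]) -> dict[str, float]:
    """Share of online orders in each cohort that were later returned in a store.

    orders: (order_id, cohort) pairs from the online experiment.
    store_returns: order ids recorded by the in-store returns system.
    """
    returned = set(store_returns)
    totals = defaultdict(int)   # orders per cohort
    returns = defaultdict(int)  # returned orders per cohort
    for order_id, cohort in orders:
        totals[cohort] += 1
        if order_id in returned:
            returns[cohort] += 1
    return {cohort: returns[cohort] / totals[cohort] for cohort in totals}


orders = [("o1", "free_shipping"), ("o2", "free_shipping"), ("o3", "free_shipping"),
          ("o4", "free_shipping"), ("o5", "control"), ("o6", "control"),
          ("o7", "control"), ("o8", "control")]
store_returns = ["o1", "o2", "o5"]
rates = return_rate_by_cohort(orders, store_returns)  # {'free_shipping': 0.5, 'control': 0.25}
```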