Introduction
As a long-time fan of Starbucks' rewards system (it gets me free breakfast every now & then), I was curious about how it worked under the hood. Specifically, there are countless possible campaigns (combinations of different prices, products, durations, content, and more) from which Starbucks' data scientists would have to identify the best-performing ones.
The purpose of this analysis is to better understand the relationships between user demographics and campaigns in order to maximize revenue growth.
Objectives
The objective of the analysis is to identify the key relationships between offers and demographic segments. There are a total of 10 offers, as shown below:
In order to narrow down which promotions are relevant to each segment, we perform a heuristic and regression-based analysis of the relationships between the 10 promotions and our identified segments.
Specifically, we want to:
- Determine which promotion(s) each segment is most responsive to.
- Determine which promotion(s) to send to each segment based on expected revenues.
Part 0 – Pre-processing & Exploration
The first step to any data analysis is to check the quality of the dataset and pre-process to address issues that may muddle the analysis.
Based on the data exploration, the biggest oddity was the unusually high volume of users with an indicated age of "118". Coincidentally, these were also the users with no indicated income. My guess is that these are users for whom age & income data are simply missing. Thus, they would only be helpful when considering the effects of gender on purchase behavior.
Since our segmentation will consider age & income (in addition to gender), I excluded these users from this analysis. After observing the size & distribution of the dataset, the most appropriate way to segment seems to be by age group and gender. Though we could technically segment further (e.g. by income or by join date), this should be enough to get a decent understanding of the relationships.
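The filtering and segmentation described above can be sketched as follows. This is a minimal illustration on toy data; the column names (`age`, `income`, `gender`) and the generation cut-offs are assumptions, not the exact values used in the analysis.

```python
import numpy as np
import pandas as pd

# Toy profile data mimicking the dataset's structure (column names assumed).
profile = pd.DataFrame({
    "age": [25, 118, 40, 118, 70],
    "income": [52_000.0, np.nan, 76_000.0, np.nan, 64_000.0],
    "gender": ["F", None, "M", None, "F"],
})

# Users with age 118 also lack income values, so treat both as markers of
# missing data and drop those rows for the demographic analysis.
clean = profile[(profile["age"] != 118) & profile["income"].notna()].copy()

# Segment by generation-style age group and gender (illustrative bins).
bins = [17, 40, 56, 75, 110]
labels = ["Millennial", "Gen X", "Baby Boomer", "Silent Generation"]
clean["age_group"] = pd.cut(clean["age"], bins=bins, labels=labels)
clean["segment"] = clean["age_group"].astype(str) + " / " + clean["gender"]
```

Crossing four age groups with three genders yields the twelve segments referenced later in the analysis.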
In addition, there were also several data pre-processing steps to ensure the dataset was ready for analysis, including:
- Convert categorical variables → dummy variables
- Date formatting
- Feature extraction (from object data formats)
- Feature engineering (to determine segments)
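The first three pre-processing steps can be sketched like this, again on toy data. The column names (`offer_type`, `channels`, `became_member_on`) and the integer `yyyymmdd` date format are assumptions about the dataset's layout.

```python
import pandas as pd

# Toy frames mimicking the offer portfolio and user profile (columns assumed).
portfolio = pd.DataFrame({
    "offer_type": ["bogo", "discount", "informational"],
    "channels": [["web", "email", "social"], ["web", "mobile"], ["email"]],
})
profile = pd.DataFrame({"became_member_on": [20170512, 20180923]})

# Categorical variables -> dummy variables
dummies = pd.get_dummies(portfolio["offer_type"], prefix="type")

# Date formatting: integer yyyymmdd -> proper datetime
profile["joined"] = pd.to_datetime(profile["became_member_on"], format="%Y%m%d")

# Feature extraction from an object (list) column: one binary flag per channel
for ch in ["web", "email", "mobile", "social"]:
    portfolio[ch] = portfolio["channels"].apply(lambda c: int(ch in c))
```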
Part 1 – Promotion Responsiveness
In order to determine each segment's responsiveness to each of the 10 offers, we need to count the offer completions associated with viewed offers. Specifically, this excludes completions of offers the user never saw (i.e. they were going to spend that much anyway).
For promotional offers (i.e. nothing to "complete"), I instead distinguished users who "viewed" the offer, in order to observe the impact of the promotion.
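The "true completion" logic above can be sketched on a toy event log. The event names (`offer received` / `offer viewed` / `offer completed`) and the `person`/`offer_id`/`time` columns are assumptions about the transcript's structure.

```python
import pandas as pd

# Toy transcript mimicking the event log (event and column names assumed).
transcript = pd.DataFrame({
    "person":   ["a", "a", "a", "b", "b"],
    "offer_id": ["o1", "o1", "o1", "o1", "o1"],
    "event":    ["offer received", "offer viewed", "offer completed",
                 "offer received", "offer completed"],
    "time":     [0, 6, 24, 0, 30],
})

def true_completion_rate(df, offer):
    """Share of users who received the offer and completed it AFTER viewing it."""
    d = df[df["offer_id"] == offer]
    received = d.loc[d["event"] == "offer received", "person"].nunique()
    completed_after_view = 0
    for _, grp in d.groupby("person"):
        views = grp.loc[grp["event"] == "offer viewed", "time"]
        comps = grp.loc[grp["event"] == "offer completed", "time"]
        # Count only completions preceded by a view; an unseen completion
        # means the user would have spent that much anyway.
        if len(views) and len(comps) and comps.max() >= views.min():
            completed_after_view += 1
    return completed_after_view / received

rate = true_completion_rate(transcript, "o1")  # person "b" never viewed it
```

Here only one of the two recipients counts as a genuine response, so the rate is 0.5 rather than the naive 1.0.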
From the above, there are a few key observations:
General
- Across all segments, offers 5–8 seem to have the best response rates. Coincidentally, these are the offers delivered through social channels (as well as web, email, and mobile, like the rest). It may be worthwhile to test this hypothesis by observing the differences in response rates by channel.
- Offer 7 has particularly high response rates across the board, as it's simply a promotional offer (i.e. "viewed" counts as a response). However, when compared to the other promotional offer (#2), there are stark differences in response rates – further reinforcing the effectiveness of the social channel.
- Offer 4, the discount offer with the highest difficulty ($20 minimum spend within 10 days) and highest reward ($5 discount), had the lowest response rates by far across the board. It's very possible this may be due to its delivery via web & mobile only.
Segment-Specific
- Female segments (2, 5, 8, 11) seem to have better average response rates across the board than male segments, particularly for offers delivered via mobile & social channels.
- Segment 1 (Millennials, Male) seems to have particularly low response rates (~40%, whereas most others are >50%).
- The "Other" gender segments (3, 6, 9, 12) show a fairly different distribution of response rates, which most likely has to do with their very small sample sizes.
Essentially, the highest response rates seem to be associated with offers delivered through social channels, particularly to female segments.
Part 2 – Which Promotions?
As promotion responsiveness is only part of the overall story, we need to better understand the financial implications for each segment. In order to assess the financial metrics, a regression analysis was required.
From the candidate models considered, the most appropriate one seems to be a 2nd-degree polynomial, due to its relatively high training & testing scores:
Using the 2nd-degree polynomial regression, I determined the offers with the highest expected spend for each segment:
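The model selection above can be sketched with scikit-learn. This runs on synthetic data (a quadratic spend-vs-income relationship I made up for illustration); the feature and target names are placeholders, not the actual regressors used in the analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in: spend as a quadratic function of income plus noise.
rng = np.random.default_rng(0)
income = rng.uniform(30_000, 120_000, size=200)
spend = 1e-9 * income**2 + 2e-4 * income + rng.normal(0, 1, size=200)
X = income.reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, spend, random_state=0)

# 2nd-degree polynomial regression: expand features, then fit linearly.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_tr, y_tr)

train_r2 = model.score(X_tr, y_tr)  # R^2 on the training split
test_r2 = model.score(X_te, y_te)   # R^2 on the held-out split
```

Comparing training and testing R² like this is what flags the 2nd-degree polynomial as a good fit without overfitting; predicted spend per (segment, offer) pair then ranks the offers.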
Furthermore, I homed in on the top offers for each segment:
There are a few key takeaways from these results:
- Offer #4 ($5 discount, for $20 spend within 10 days) seems to be the most effective campaign for most of our segments.
- Generally, offers 0, 1, and 9 seem to be strong performers as well (not far behind #4).
- The top 10 solutions for all segments consist of "discounts" and "bogo" offers (near-even split).
- Income generally seems to have a direct, positive relationship with spend amount (which ties out with intuition).
- "Silent Generation" segments seem to have the highest expected returns, followed by "Baby Boomers" and "Millennials".
Collectively, this has revealed a high-potential offer (#4) that may be underperforming due to its delivery channels, as well as the offers & segments to focus on (for A/B testing, etc.).
Conclusion
This analysis has revealed a handful of key insights that will be very helpful in narrowing down future analyses and overall promotion strategy.
With the current offer mix, the best promotion strategy seems to be offers 0 & 1 for females with high income (especially "Millennials" and "Silent Generation"), as they have higher true offer completion rates while also having relatively high expected spending.
However, there are a few outstanding experiments & analyses to be done before settling on a strong promotion strategy. These are:
- Verify the relationship between completion rates & channels. Specifically, what happens if you deliver Offer #4 (highest return, lowest completion) via social channels?
- For each segment, which is the best combination of offers to send?
- Explore the value of "surprise completions" (not viewed, but completed offer) – can they lead to changes in future spending behavior?
- Perform categorization analyses (e.g. K-Means Clustering) to validate/reinforce the identified segments.
- Are there ways to improve completion rates for males (especially as they have high expected spending)?
- Perform analysis for the "Other" gender segments (3, 6, 9, 12) with larger sample usage data.
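The K-Means validation proposed above could look roughly like this: cluster users on the same demographic features used for the manual segments and check whether the clusters line up. The data here is synthetic and the feature set (age, income) is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two synthetic demographic groups (younger/lower-income vs older/higher-income).
rng = np.random.default_rng(1)
age = np.concatenate([rng.normal(30, 4, 50), rng.normal(65, 4, 50)])
income = np.concatenate([rng.normal(50_000, 5_000, 50),
                         rng.normal(90_000, 5_000, 50)])

# Standardize so age and income contribute on comparable scales.
X = StandardScaler().fit_transform(np.column_stack([age, income]))
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# If the discovered clusters align with the hand-built age/income segments,
# that reinforces the segmentation; if not, the segments may need revisiting.
labels = km.labels_
```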
You can find the GitHub repo here.