CDMP Fundamentals • 100 Questions • 90 Minutes

EcomNext's Recommendation Engine and Data Science Platform

Big Data and Data Science • Medium

💼 Scenario

EcomNext is an e-commerce platform with 20 million active users. It generates 500 million clickstream events and 2 million transactions per day, and it has 50 TB of product image data. The company wants to build a personalized recommendation engine to increase average order value by 15% and reduce customer churn by 20%. The current analytics infrastructure is a traditional data warehouse that can only produce batch reports with 24-hour latency. The eight-person data science team has built prototype recommendation models in Jupyter notebooks on their laptops using small data samples, but none of these models has been deployed to production. The prototypes show promising results (a 12% improvement in click-through rate on test data), but there is no infrastructure for model deployment, monitoring, or retraining. Additional challenges include: clickstream data arrives in real time, but the current infrastructure can only process it in batch; the product catalog changes daily, with 10,000 new items and 5,000 removals; new users with no purchase history pose a cold-start problem; and recommendations must be served with less than 200 milliseconds of latency during peak shopping periods.
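
As a rough sizing aid, the daily volumes in the scenario can be converted into average per-second rates. The sketch below does only that arithmetic; the function and variable names are illustrative and not part of the case study, and it assumes traffic is spread evenly over 24 hours, which understates load during peak shopping periods.

```python
# Back-of-envelope sizing from the figures stated in the scenario.
# Assumption: events are spread evenly across the day (real peaks are higher).

SECONDS_PER_DAY = 24 * 60 * 60

ACTIVE_USERS = 20_000_000
CLICKSTREAM_EVENTS_PER_DAY = 500_000_000
TRANSACTIONS_PER_DAY = 2_000_000
CATALOG_ADDITIONS_PER_DAY = 10_000
CATALOG_REMOVALS_PER_DAY = 5_000


def average_rate_per_second(events_per_day: int) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


clickstream_rate = average_rate_per_second(CLICKSTREAM_EVENTS_PER_DAY)
transaction_rate = average_rate_per_second(TRANSACTIONS_PER_DAY)
events_per_user = CLICKSTREAM_EVENTS_PER_DAY / ACTIVE_USERS
net_catalog_growth = CATALOG_ADDITIONS_PER_DAY - CATALOG_REMOVALS_PER_DAY

print(f"Average clickstream rate: {clickstream_rate:,.0f} events/s")
print(f"Average transaction rate: {transaction_rate:,.1f} transactions/s")
print(f"Clickstream events per active user per day: {events_per_user:.0f}")
print(f"Net catalog growth: {net_catalog_growth:,} items/day")
```

On average this works out to roughly 5,800 clickstream events and 23 transactions per second, about 25 events per active user per day, and a net catalog growth of 5,000 items per day.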

Question 1: What data architecture should EcomNext implement to support both real-time recommendations and batch model training?

Question 2: What MLOps practice is MOST critical for EcomNext to move from prototype notebooks to production recommendation models?

Question 3: How should EcomNext address the cold-start problem for new users with no purchase history?