CDMP Fundamentals • 100 Questions • 90 Minutes

ManufacturePro's Industrial Data Lake Operations

Data Storage and Operations • Hard

💼 Scenario

ManufacturePro is a global manufacturer operating 35 factories with 10,000 industrial IoT sensors that generate 2 TB of telemetry data daily. The company built a data lake two years ago to support predictive maintenance, quality analytics, and supply chain optimization, but it has since become what the team calls a 'data swamp' with serious operational issues. Storage has grown to 800 TB with no lifecycle management, so all data is kept indefinitely on high-performance storage. 40% of stored data has no metadata and cannot be identified or attributed to a source. Query performance has degraded by 70% because of the small-file problem and a lack of partitioning. There is no disaster recovery plan for the data lake, and the monthly cloud storage bill has reached $120,000. The VP of Manufacturing needs the data lake to reliably support real-time equipment failure prediction, which is currently 85% accurate but must reach 95%, and the CFO demands a 50% reduction in storage costs. The data engineering team has only 4 members, which limits the scope of remediation efforts.
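The scenario's cost figures can be made concrete with a little arithmetic. The sketch below derives the implied price per TB, the CFO's target bill, and the volume of unidentified data directly from the numbers above. It then estimates how much data would have to move to a cheaper tier to reach the 50% target. The cold-tier price ratio (`COLD_PRICE_RATIO`) is a hypothetical assumption for illustration only; the scenario does not state any tier pricing, and the estimate ignores retrieval and request fees.

```python
# Figures taken directly from the scenario.
STORED_TB = 800
MONTHLY_BILL_USD = 120_000
UNIDENTIFIED_SHARE = 0.40
TARGET_REDUCTION = 0.50

# Hypothetical assumption (not in the scenario): a colder storage tier
# priced at 20% of the current high-performance rate.
COLD_PRICE_RATIO = 0.20


def cost_per_tb(bill_usd: float, stored_tb: float) -> float:
    """Average monthly cost of one stored TB."""
    return bill_usd / stored_tb


def cold_fraction_needed(target_reduction: float, cold_price_ratio: float) -> float:
    """Share of data that must move to the cold tier to hit the target.

    Blended cost = hot * (1 - f) + hot * ratio * f, so the saving is
    f * (1 - ratio) and therefore f = target_reduction / (1 - ratio).
    """
    if not 0 <= cold_price_ratio < 1:
        raise ValueError("cold_price_ratio must be in [0, 1)")
    fraction = target_reduction / (1 - cold_price_ratio)
    if fraction > 1:
        raise ValueError("target cannot be reached by tiering alone")
    return fraction


rate = cost_per_tb(MONTHLY_BILL_USD, STORED_TB)
target_bill = MONTHLY_BILL_USD * (1 - TARGET_REDUCTION)
unidentified_tb = STORED_TB * UNIDENTIFIED_SHARE
cold_share = cold_fraction_needed(TARGET_REDUCTION, COLD_PRICE_RATIO)
cold_tb = STORED_TB * cold_share

print(f"Implied rate: ${rate:,.2f} per TB per month")
print(f"Target monthly bill: ${target_bill:,.2f}")
print(f"Data without metadata: {unidentified_tb:,.0f} TB")
print(f"Cold-tier share needed (assumed ratio): {cold_share:.1%} ({cold_tb:,.0f} TB)")
```

Under the assumed ratio, well over half of the stored data would need to leave high-performance storage, which is why lifecycle management and the unidentified 40% are closely linked in this scenario.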

Question 1: What is the MOST impactful first step to address the data swamp and cost issues simultaneously?

Question 2: How should ManufacturePro address the query performance degradation caused by small files and lack of partitioning?

Question 3: What disaster recovery strategy is MOST appropriate for ManufacturePro's data lake given the real-time predictive maintenance requirement?
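As background for Question 2, the small-file problem and partitioning can be illustrated with a planning sketch. The code below groups files by a partition key and packs the files in each partition into batches that would each be rewritten as one larger file. Everything here is hypothetical and chosen only for illustration: the `DataFile` type, the `factory_id` and `event_date` partition columns, the example paths, and the 512 MB target size. It plans batches only and does not read or write any data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical description of one file in the lake."""
    path: str
    factory_id: str
    event_date: str  # ISO date, e.g. "2024-01-01"
    size_mb: float


def partition_key(f: DataFile) -> str:
    """Directory-style partition path (assumed layout)."""
    return f"factory_id={f.factory_id}/event_date={f.event_date}"


def plan_compaction(files: list[DataFile], target_mb: float = 512.0) -> dict[str, list[list[str]]]:
    """Group files by partition, then pack each group into batches.

    Each batch lists the paths that would be merged into one output file
    of at most roughly target_mb (a single oversized file gets its own batch).
    """
    by_partition: dict[str, list[DataFile]] = defaultdict(list)
    for f in files:
        by_partition[partition_key(f)].append(f)

    plan: dict[str, list[list[str]]] = {}
    for key, group in by_partition.items():
        batches: list[list[str]] = []
        current: list[str] = []
        current_size = 0.0
        # Largest first, so big files open batches and small ones fill them.
        for f in sorted(group, key=lambda x: x.size_mb, reverse=True):
            if current and current_size + f.size_mb > target_mb:
                batches.append(current)
                current, current_size = [], 0.0
            current.append(f.path)
            current_size += f.size_mb
        if current:
            batches.append(current)
        plan[key] = batches
    return plan


# Example: ten 100 MB files for one factory-day, one file for another.
files = [
    DataFile(f"raw/f01/part-{i:04d}.parquet", "F01", "2024-01-01", 100.0)
    for i in range(10)
]
files.append(DataFile("raw/f02/part-0000.parquet", "F02", "2024-01-01", 100.0))

plan = plan_compaction(files)
for key, batches in plan.items():
    print(key, [len(b) for b in batches])
```

In practice, table formats and engines provide their own compaction and partitioning features; this sketch only shows the grouping logic behind them.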