It is well known that sensor data from storage systems can reveal abnormal symptoms that lead to failures. By combining abnormal sensor data with machine learning techniques, we can predict a storage component failure ahead of time and proactively remove the component before it impacts the rest of the system or interrupts customers' operations. A successful predictive maintenance model must balance three trade-offs: detection rate, false positive rate, and lead time. Through a machine learning feature selection process, we can train and build failure prediction models that alert us to critical component failures before they happen. Our approach has successfully predicted failures and enabled the proactive removal of many electronic components, such as hard drives, solid state drives, and voltage regulators. The worldwide supply chain shortage during the COVID reopening created a new challenge for these predictive maintenance models: even if you know which component will fail soon, long supply chain lead times mean you may not be able to get replacement parts before the failure occurs. To address this supply chain management issue, we developed a separate prediction model to assist our supply chain management team. Its lead time is stretched to months, or more than a quarter. The key difference between the two models is that the predictive maintenance models catch the anomalies of a dying component, whereas the supply chain model looks for stress and activity that can accelerate a failure. Our current models achieve a detection rate of 70-80% while maintaining a very low false positive rate. In real-world deployment, our quarterly quantity predictions are 90-95% accurate.
Storage Device Quality Control and Supply Chain Management Using Dual Machine Learning Models
Wed Sep 18 | 8:30am
Location: Cypress
Abstract
Learning Objectives
Upon completion, participants will be able to understand the three key trade-offs in machine learning predictive maintenance: detection rate, false positive rate, and lead time.
Upon completion, participants will be able to understand how to build imminent-failure prediction models for proactive component removal.
Upon completion, participants will be able to understand how to build long-term supply chain models that predict the demand quantity for failed components.
Upon completion, participants will be able to see real-world deployment examples of the dual predictive model approach to storage device quality and supply chain management.
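
The trade-off among detection rate, false positive rate, and lead time can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the presenters' method: the `Device` record, the anomaly scores, the day numbers, and the `evaluate` helper are all hypothetical. It only shows how moving an alert threshold shifts the three quantities against each other.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Device:
    """One monitored component (hypothetical record layout)."""
    score: float                # model's anomaly score in [0, 1] (assumed)
    alert_day: int              # day on which the score was produced
    failure_day: Optional[int]  # day of actual failure, None if it stayed healthy


def evaluate(devices: List[Device], threshold: float) -> Tuple[float, float, float]:
    """Return (detection rate, false positive rate, mean lead time in days)."""
    failed = [d for d in devices if d.failure_day is not None]
    healthy = [d for d in devices if d.failure_day is None]
    flagged = [d for d in devices if d.score >= threshold]

    true_pos = [d for d in flagged if d.failure_day is not None]
    false_pos = [d for d in flagged if d.failure_day is None]

    detection_rate = len(true_pos) / len(failed) if failed else 0.0
    false_positive_rate = len(false_pos) / len(healthy) if healthy else 0.0

    # Lead time: how long before the failure the alert was raised.
    lead_times = [d.failure_day - d.alert_day for d in true_pos]
    mean_lead = sum(lead_times) / len(lead_times) if lead_times else 0.0
    return detection_rate, false_positive_rate, mean_lead


# Made-up fleet: four devices that eventually fail, four that stay healthy.
devices = [
    Device(0.95, 10, 12),
    Device(0.80, 10, 20),
    Device(0.60, 10, 40),
    Device(0.30, 10, 15),    # fails, but the model scored it low
    Device(0.70, 10, None),  # healthy, but the model scored it high
    Device(0.20, 10, None),
    Device(0.10, 10, None),
    Device(0.05, 10, None),
]

for threshold in (0.9, 0.65, 0.5):
    dr, fpr, lead = evaluate(devices, threshold)
    print(f"threshold={threshold:.2f} detection={dr:.2f} "
          f"false_positive={fpr:.2f} mean_lead_days={lead:.1f}")
```

On this toy data, lowering the threshold raises the detection rate and the mean lead time, at the cost of flagging a healthy device. A real deployment would choose the operating point from measured field data rather than from a hand-built list.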
---
Yongjin Choi
HPE
Mourad Larbi
HPE