Text-Mining Maintenance Records to Automate the Identification and Grouping of Failure Modes

Paul Christopher Pereira
DOI: https://doi.org/10.4043/30737-ms
2020-05-04
Abstract: The investigation of failure data typically involves manual interpretation of free-text maintenance system records. Even if failure classification codes exist within an organization's system, they often lack the detail or accuracy needed to group failures or identify trends. The objective of this study is to develop an automated method to identify functional failures from maintenance record data. The result is a reduction in the workload of manual analysis, as well as improved identification of failures and trends. The study's methodology arranges multiple text-mining techniques in sequence: term frequency-inverse document frequency (TF-IDF), clustering, association rules, term matrix creation, and lexicon development for pre-processing text. In isolation, these techniques have been shown to be effective in non-industrial pursuits such as marketing and retail sales; this study applies them to the domain of equipment reliability. They are iteratively implemented and refined on maintenance system records, including work orders (which may or may not represent failures) and failure report records. The ability to identify failure modes, failed components, and trends is then evaluated. The techniques were successfully implemented, and the effectiveness of each was evaluated when applied to the science of equipment reliability. Text mining was shown to be partially effective in identifying failure modes from maintenance record free text. Certain sub-techniques were shown to be quite effective, in particular the clustering technique's ability to group failed components and failure modes. Hierarchical clustering is a promising technique for technical and industrial free text. It was also shown that clustering outputs can yield different and valuable insights depending on the types of text records used and the types of pre-processing available to the organization.
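To make the chaining of lexicon pre-processing, term matrix creation, TF-IDF, and hierarchical clustering more concrete, here is a minimal sketch using only the Python standard library. It is not the study's implementation. The lexicon, stopword list, sample records, smoothed IDF formula, cosine distance, and average linkage are all assumptions chosen for illustration, and every name in the code (`LEXICON`, `preprocess`, `tfidf_matrix`, `agglomerative`) is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon mapping maintenance shorthand to canonical terms.
LEXICON = {"brg": "bearing", "vib": "vibration", "lkg": "leak", "leaking": "leak"}
# Hypothetical stopword list; a real one would be tuned to the organization's records.
STOPWORDS = {"a", "an", "the", "at", "on", "of", "to", "and", "is", "was"}


def preprocess(text):
    """Lowercase, tokenize, normalize via the lexicon, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [LEXICON.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_matrix(docs):
    """Build a document-term matrix weighted by raw term frequency times a
    smoothed IDF, ln((1 + N) / (1 + df)) + 1, so that a term appearing in
    every record still receives a positive weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors, n_clusters):
    """Bottom-up hierarchical clustering with average linkage: repeatedly merge
    the two clusters whose mean pairwise cosine distance is smallest."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical free-text work order records.
records = [
    "Pump brg noisy, high vib",
    "Replaced bearing, vibration on pump",
    "Seal lkg at pump",
    "Mechanical seal leaking",
]
docs = [preprocess(r) for r in records]
vocab, matrix = tfidf_matrix(docs)
groups = agglomerative(matrix, n_clusters=2)
for group in groups:
    print([records[i] for i in group])
```

With these sample records, the lexicon maps "brg" and "lkg" onto the same terms as the spelled-out records, so the bearing/vibration work orders and the seal/leak work orders end up in separate groups. Stopping at a fixed `n_clusters` is one simple way to cut the hierarchy; cutting at a distance threshold is a common alternative.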
The association rules method was somewhat effective relative to clustering, as it was able to identify certain failure modes; however, it still requires a degree of manual intervention and interpretation. The overall results are promising, and there is great opportunity for continued study along multiple fronts, including additional techniques such as sentiment analysis and topic modelling.
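As a rough illustration of what association rules produce on maintenance text, the sketch below mines single-term rules X -> Y with support and confidence thresholds. It is a simplified pairwise variant in the spirit of Apriori (no level-wise candidate pruning, itemsets of size two only), not the method used in the study; the function `pair_rules`, the sample term sets, and the threshold values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Mine single-term rules X -> Y from sets of terms (simplified,
    pairwise association-rule mining limited to itemsets of size two)."""
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in set(t))
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of records containing both terms
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # P(Y present | X present)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical term sets, as if already produced by a pre-processing step.
failure_terms = [
    {"seal", "leak", "pump"},
    {"seal", "leak"},
    {"bearing", "vibration", "pump"},
    {"bearing", "vibration"},
    {"bearing", "noise"},
]
rules = pair_rules(failure_terms, min_support=0.3, min_confidence=0.8)
for x, y, support, confidence in rules:
    print(f"{x} -> {y}  support={support:.2f}  confidence={confidence:.2f}")
```

Here seal -> leak, leak -> seal, and vibration -> bearing pass the thresholds, while bearing -> vibration (confidence 2/3) does not. Deciding whether such a rule actually describes a failure mode is still left to an analyst, which mirrors the manual interpretation noted above.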
What problem does this paper attempt to address?