Abstract: In this chapter, we review the knowledge discovery process, including data handling, data mining methods and software, and current research activities. The introduction defines data mining and knowledge discovery in databases and provides general background on both. In particular, it discusses the potential for data mining to improve manufacturing processes in industry. The second part of the chapter outlines the entire process of knowledge discovery in databases. The third part presents data handling issues, including databases and preparation of the data for analysis. Although modelers generally consider these issues uninteresting, the largest portion of the knowledge discovery process is spent handling data. Data handling is also of great importance, since the resulting models can only be as good as the data on which they are based. The fourth part is the core of the chapter and describes popular data mining methods, divided into supervised and unsupervised learning. In supervised learning, the training data set includes observed output values (“correct answers”) for the given set of inputs. If the outputs are continuous/quantitative, then we have a regression problem. If the outputs are categorical/qualitative, then we have a classification problem. Supervised learning methods are described in the context of both regression and classification (as appropriate), beginning with the simplest case of linear models, then presenting more complex modeling with trees, neural networks, and support vector machines, and concluding with some methods, such as nearest neighbor, that are only for classification. In unsupervised learning, the training data set does not contain output values. Unsupervised learning methods are described under two categories: association rules and clustering.
Association rules are appropriate for business applications where precise numerical data may not be available, while clustering methods are more technically similar to the supervised learning methods presented in this chapter. Finally, this section closes with a review of various software options. The fifth part presents current research projects, involving both industrial and business applications. In the first project, data is collected from monitoring systems, and the objective is to detect unusual activity that may require action. For example, credit card companies monitor customers' credit card usage to detect possible fraud. While methods from statistical process control were developed for similar purposes, the difference lies in the quantity of data. The second project describes data mining tools developed by Genichi Taguchi, who is well known for his industrial work on robust design. The third project tackles quality and productivity improvement in manufacturing industries. Although some detail is given, considerable research is still needed to develop a practical tool for today's complex manufacturing processes. Finally, the last part provides a brief discussion of remaining problems and future trends.
