What Is the Distance Between Objects in a Data Set?: A Brief Review of Distance and Similarity Measures for Data Analysis
Karim Abou-Moustafa
DOI: https://doi.org/10.1109/MPUL.2015.2513727
IF: 0.622
IEEE Pulse
Abstract: Digitally recorded data have become another critical natural resource in our current research environment. This reality is one of the tremendous victories for decades of research in computer engineering, computer science, electronics, and communications. While this scenario will continue to be the case in the future, our current era has also marked the beginning of another unstoppable activity that is intimately related to digitally stored data: extracting knowledge and information from such data. Digital data are recorded in different forms and at unprecedented scales. Examples include database tables in every business entity; Tweets; e-mails and text documents; audio and speech signals; seismic data (recorded as temporal multidimensional tensors); video data (possibly stored with other modalities such as audio and captions); graphs representing relations and interactions among different entities (links among web documents, users on social networks, devices on computer networks, or gene interaction networks); and images, including functional magnetic resonance imaging (fMRI) and more. Because of their different forms and structures, throughout this article, each datum in a data set will be referred to as an object.