Abstract: Digitally recorded data have become another critical natural resource in our current research environment. This reality is one of the tremendous victories for decades of research in computer engineering, computer science, electronics, and communications. While this scenario will continue to be the case in the future, our current era has also marked the beginning of another unstoppable activity that is intimately related to digitally stored data: extracting knowledge and information from such data. Digital data are recorded in different forms and at unprecedented scales. Examples include database tables in every business entity; Tweets; e-mails and text documents; audio and speech signals; seismic data (recorded as temporal multidimensional tensors); video data (possibly stored with other modalities such as audio and captions); graphs representing relations and interactions among different entities (links among web documents, users on social networks, devices on computer networks, or gene interaction networks); and images, including functional magnetic resonance imaging (fMRI) and more. Because of their different forms and structures, throughout this article, each datum in a data set will be referred to as an object.

Normalized Web Distance and Word Similarity

Web Similarity in Sets of Search Terms using Database Queries

An adaptive method for text domain similarity calculation

Computing Semantic Relatedness Using Structured Information of Wikipedia

Similarity of Objects and the Meaning of Words

A New Hybrid Improved Method for Measuring Concept Semantic Similarity in WordNet

A Semantic Similarity Measure Between Web Services Based on Google Distance

Minimum Normalized Google Distance for Unsupervised Multilingual Chinese-English Word Sense Disambiguation

A Novel Comprehensive Approach for Estimating Concept Semantic Similarity in WordNet

Normalized Compression Distance of Multisets with Applications

Uniform definition of comparable and searchable information on the web

Normalized Dependency Distance: Proposing a New Measure

Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings

Word Embedding based Edit Distance

Just an Update on PMING Distance for Web-based Semantic Similarity in Artificial Intelligence and Data Mining

Alignment-Aware Word Distance

The Earth Mover's Distance as a Semantic Measure for Document Similarity

What Is the Distance Between Objects in a Data Set?: A Brief Review of Distance and Similarity Measures for Data Analysis

A Hypothesis on Word Similarity and Its Application

On Normalized Compression Distance and Large Malware

Measures of lexical distance between languages