Abstract: Multisets are sets that allow repetition of elements. As such, they open up a number of interesting theoretical and applied possibilities. In the present work, after reviewing the main aspects of traditional sets, we introduce some of the main concepts and characteristics of multisets, followed by their generalization to vectors and matrices. An approach is also proposed in which real, negative multiplicities are allowed, so that the multiset universe becomes finite and well-defined, corresponding to the multiset with null multiplicities. The complement operation on multisets is then defined, which allows properties involving the complement, including the De Morgan theorem, to be recovered in multisets. In addition, it becomes possible to extend multisets to functions (which become multifunctions, or mfunctions), scalar fields and other continuous mathematical structures, thereby achieving an enhanced space endowed with all algebraic operations plus set-theoretical operations including union, intersection, and complementation. A set operation between mfunctions, namely the common product, which is analogous to the traditional inner product, is also proposed, paving the way to the respective mfunction transformations, and it is argued that the Walsh functions provide an orthogonal basis for the mfunction space under the common product. This result also allows integrated signal processing operations to be performed on multiset mfunctions, including filtering and enhanced template matching. Relationships between the cosine similarity index and the Jaccard index are also identified, and an intersection-based variation of the cosine index is presented. The potential of multisets in pattern recognition and deep learning is also briefly characterized and illustrated.
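As a rough illustration of the basic multiset operations mentioned in the abstract, the following minimal Python sketch uses the standard-library `collections.Counter` to represent multisets with non-negative integer multiplicities. It assumes the common conventions that union takes the maximum multiplicity, intersection takes the minimum, and the multiset Jaccard index is the ratio of their total multiplicities. The function names are hypothetical, the value returned for two empty multisets is an arbitrary convention, and the real, negative multiplicities proposed in the work are not covered here.

```python
from collections import Counter


def mset_union(a: Counter, b: Counter) -> Counter:
    """Union of two multisets: maximum multiplicity per element."""
    return a | b


def mset_intersection(a: Counter, b: Counter) -> Counter:
    """Intersection of two multisets: minimum multiplicity per element."""
    return a & b


def mset_jaccard(a: Counter, b: Counter) -> float:
    """Jaccard index for multisets: |A intersect B| / |A union B|,
    where |.| is the sum of multiplicities."""
    union_size = sum(mset_union(a, b).values())
    if union_size == 0:
        # Assumed convention: two empty multisets are treated as identical.
        return 1.0
    return sum(mset_intersection(a, b).values()) / union_size


# Example multisets built from repeated characters.
A = Counter("aabbbc")  # a:2, b:3, c:1
B = Counter("abbd")    # a:1, b:2, d:1

union_AB = mset_union(A, B)
inter_AB = mset_intersection(A, B)
jaccard_AB = mset_jaccard(A, B)
```

Note that `Counter`'s `|` and `&` operators discard elements whose resulting count is not positive, so this representation is only suitable for the traditional case of non-negative multiplicities.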

Compressing Sets and Multisets of Sequences

Integer Set Compression and Statistical Modeling

Disk compression of k-mer sets

Efficient Compression Technique for Sparse Sets

Lossy Compression of Individual Sequences Revisited: Fundamental Limits of Finite-State Encoders

The Minimal Compression Rate for Similarity Identification

Entropy Coding of Unordered Data Structures

Empirical Lossless Compression Bound of a Data Sequence

Elliptic Curve Multiset Hash

Normalized Compression Distance of Multisets with Applications

Overcoming the compression limit of the individual sequence (zero order empirical entropy) using the Set Shaping Theory

An Introduction to Multisets

An Efficient Biological Sequence Compression Technique Using LUT And Repeat In The Sequence

A Family of LZ78-based Universal Sequential Probability Assignments

Compressed Hashing

Metannot: A succinct data structure for compression of colors in dynamic de Bruijn graphs

Universal Graph Compression: Stochastic Block Models

Random Permutation Codes: Lossless Source Coding of Non-Sequential Data

Compressing genomic sequence fragments using SlimGene

Multi-Tier Preservation of Discrete Morse Smale Complexes in Error-Bounded Lossy Compression