Multi-view Sequence-data Representation and Non-metric Distance-function Learning

Yi Wu, Gang Wu, Edward Y. Chang
2005-01-01
Abstract: Sequence-data analysis plays a key role in many scientific studies and real-world applications, such as bioinformatics, data streams, and sensor networks, where sequence data are processed and their semantics interpreted. In this paper we address two relevant issues: sequence-data representation, and representation-to-semantics mapping. For representation, since the best choice depends on the application and even on the types of queries, we propose representing sequence data in multiple views. For each representation, we propose methods to construct a valid distance metric to compare sequences of variable lengths. For mapping, we propose a super-kernel fusion scheme to achieve the best combination of the individual distance metrics, which measure sequence similarity of different views, to depict the target semantics. Through theoretical analysis and empirical studies on UCI and real-world datasets, we show our approach of multi-view representation and fusion to be mathematically valid and very effective for practical purposes.

*Note to the editor. The preliminary version of this paper appeared at the CIKM 2004 conference [1]. In this journal version, we have made enhancements in three aspects. First, we performed a comprehensive survey of all nonmetric-to-metric conversion techniques. Second, we conducted thorough empirical studies on these and our proposed conversion techniques. Third, we added a comparison of our fusion technique with the state-of-the-art fusion method [2] to the empirical study section.

Index Terms: Sequence-data mining, distance metric, super-kernel function-fusion