ECNU: Leveraging on Ensemble of Heterogeneous Features and Information Enrichment for Cross Level Semantic Similarity Estimation
Tiantian Zhu, Man Lan
DOI: https://doi.org/10.3115/v1/s14-2043
2014-01-01
Abstract: This paper reports our submissions to the Cross Level Semantic Similarity (CLSS) task in SemEval 2014. We submitted one Random Forest regression system for each type of cross-level text pair, i.e., Paragraph to Sentence (P-S), Sentence to Phrase (S-Ph), Phrase to Word (Ph-W) and Word to Sense (W-Se). For text pairs at the P-S and S-Ph levels, we treat both texts as sentences and extract heterogeneous types of similarity features, e.g., string features, knowledge-based features, corpus-based features, syntactic features, machine-translation-based features and multi-level text features. For text pairs at the Ph-W and W-Se levels, most of these features are not applicable or available because the texts carry too little information. To overcome this problem, we propose several information enrichment methods using WordNet synonyms and definitions. Our systems rank 2nd out of 18 teams on both Pearson correlation (the official ranking) and Spearman rank correlation. Specifically, in terms of Pearson correlation, our systems take second place at the P-S, S-Ph and Ph-W levels and 4th place at the W-Se level.
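To illustrate why information enrichment helps at the Ph-W and W-Se levels, the sketch below expands each word with its synonyms and definition words before computing a simple word-overlap (Jaccard) feature. It is a minimal sketch under stated assumptions, not the authors' implementation: `TOY_LEXICON` is a hypothetical hand-written stand-in for WordNet, and the stopword list, `tokenize`, `enrich` and `jaccard` are illustrative names chosen here.

```python
# Hypothetical toy lexicon standing in for WordNet: word -> (synonyms, definition).
# These entries are illustrative only; they are not copied from WordNet.
TOY_LEXICON = {
    "car": (["auto", "automobile", "machine"],
            "a motor vehicle with four wheels usually propelled by an engine"),
    "automobile": (["car", "auto", "motorcar"],
                   "a motor vehicle with four wheels usually propelled by an engine"),
}

# Assumed minimal stopword list so function words in definitions add no overlap.
STOPWORDS = {"a", "an", "the", "with", "by", "of", "usually"}


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization with stopword removal."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def enrich(text: str) -> list[str]:
    """Expand each token with its synonyms and definition words, if known."""
    enriched = []
    for tok in tokenize(text):
        enriched.append(tok)
        synonyms, definition = TOY_LEXICON.get(tok, ([], ""))
        enriched.extend(s.lower() for s in synonyms)
        enriched.extend(tokenize(definition))
    return enriched


def jaccard(a: list[str], b: list[str]) -> float:
    """Word-overlap (Jaccard) similarity between two token lists."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Two bare words share no tokens, so a plain string feature gives 0.0.
    print(jaccard(tokenize("car"), tokenize("automobile")))
    # After enrichment, shared synonyms and definition words create overlap.
    print(round(jaccard(enrich("car"), enrich("automobile")), 3))
```

Run as a script, this prints 0.0 for the raw words and 0.818 (9 shared tokens out of 11) after enrichment. The design point is that enrichment turns a word into a sentence-like bag of tokens, so sentence-level features that would otherwise be zero or undefined become informative again.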