Model-Based Clustering of Sequential Data

Suzannah Bridget Molloy, David William Albrecht, David Leonard Dowe, Kai Ming Ting
2006-01-01
Abstract: Discrete, sequential data consists of multiple sequences of states, possibly containing some underlying structure or pattern. We develop two clustering approaches based on the following information-theoretic criteria: Akaike's Information Criterion (AIC) and Minimum Message Length (MML), as a means of searching for any underlying structure. We compare the performance of our approaches with the method described in Cadez et al. (2000, 2001) by varying sequence length, number of states, and number of true classes within the data. The criteria are also compared using data describing navigation paths of web site users. It was observed that a penalty term is necessary to prevent overfitting of the data, and in the case of the AIC adaptation, it was also necessary to incorporate prior information into the parameter estimates to ensure the criterion could handle previously unseen cases. The number of clusters inferred …