Efficient Shapelet Discovery for Time Series Classification
Guozhong Li, Byron Choi, Jianliang Xu, Sourav S Bhowmick, Kwok-Pan Chun, Grace Lai-Hung Wong
DOI: https://doi.org/10.1109/tkde.2020.2995870
IF: 9.235
2022-03-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Time-series shapelets are discriminative subsequences that have recently been found effective for time series classification (TSC). The quality of shapelets is evidently crucial to the accuracy of TSC. However, most research has focused on building accurate models from some shapelet candidates. The methods that existing studies use to determine such candidates are surprisingly simple, e.g., enumerating subsequences of some fixed lengths, or randomly selecting some subsequences as shapelet candidates. The bulk of the computation is then spent on building the model from the candidates. In this paper, we propose a novel efficient shapelet discovery method, called BSPCOVER, to discover a set of high-quality shapelet candidates for model building. Specifically, BSPCOVER generates abundant candidates via Symbolic Aggregate approXimation with a sliding window, then prunes identical and highly similar candidates via Bloom filters and similarity matching, respectively. We next propose a p-Cover algorithm to efficiently determine discriminative shapelet candidates that maximally represent each time-series class. Finally, any existing shapelet learning method can be adopted to build a classification model. We have conducted extensive experiments with well-known time-series datasets and representative state-of-the-art methods. Results show that BSPCOVER speeds up the state-of-the-art methods by more than 70 times, and the accuracy is often comparable to or higher than existing works.
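The candidate-generation step described in the abstract (Symbolic Aggregate approXimation over a sliding window, followed by Bloom-filter pruning of identical candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sax_word`, `BloomFilter`, and `candidate_words`, the alphabet size of 4, and the filter size are all assumptions chosen for illustration, and the similarity-matching and p-Cover steps are not covered.

```python
import hashlib
import numpy as np

# Breakpoints splitting a standard normal into 4 equiprobable regions,
# the usual SAX choice for an alphabet of size 4 (size is an assumption).
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"


def sax_word(window, n_segments):
    """Z-normalise a window, reduce it with PAA, and map segments to letters."""
    window = np.asarray(window, dtype=float)
    std = window.std()
    z = (window - window.mean()) / std if std > 1e-8 else np.zeros_like(window)
    paa = np.array([seg.mean() for seg in np.array_split(z, n_segments)])
    return "".join(ALPHABET[i] for i in np.searchsorted(BREAKPOINTS, paa))


class BloomFilter:
    """Tiny Bloom filter (hypothetical helper; sizes chosen arbitrarily)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = np.zeros(n_bits, dtype=bool)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


def candidate_words(series, window_len, n_segments):
    """Slide a window over the series and keep each SAX word once.

    A Bloom false positive would drop a genuinely new word; that is the
    price paid for constant-memory duplicate detection.
    """
    seen = BloomFilter()
    kept = []
    for start in range(len(series) - window_len + 1):
        word = sax_word(series[start:start + window_len], n_segments)
        if word not in seen:
            seen.add(word)
            kept.append((start, word))
    return kept


series = [0, 1, 2, 3] * 3
candidates = candidate_words(series, window_len=4, n_segments=4)
```

On the periodic toy series above, the nine sliding windows repeat with period four, so most of them map to a SAX word that has already been recorded and are pruned as identical candidates.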
computer science, information systems, artificial intelligence, engineering, electrical & electronic