Method and system for cross-mode-based video time location, and storage medium

Liu Meng, Nie Liqiang, Wang Xiang, Song Xuemeng, Gan Tian, Chen Baoquan
2018-01-01
Abstract: The invention discloses a method and a system for cross-modal video time localization, and a storage medium. The method and system address the problem of locating a specific time segment within a video. The method comprises the following steps: establishing a language temporal model that extracts the text information useful for time localization and encodes it as features; fusing the text and visual features with a multimodal fusion model to generate enhanced temporal representation features; using a multilayer perceptron model to predict the degree of match between a time segment and the text description, as well as the start time of the time segment; and training the whole model end to end on the training data. On text-query-based time localization, the method and system achieve higher accuracy than existing models.
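The pipeline in the abstract can be illustrated with a minimal sketch in pure Python. It is not the patented implementation, and every name in it (`encode_query`, `fuse`, `dense`, `predict`, `init_params`) is hypothetical. Several assumptions are made: the language temporal model is replaced by mean-pooling of word vectors; the fusion step combines the element-wise product, the element-wise sum, and the raw text and visual vectors, a scheme common in cross-modal localization work but not confirmed by the abstract; and the multilayer perceptron head outputs a matching score together with start and end offsets, although the abstract names only the start time.

```python
import math
import random


def encode_query(word_vectors):
    """Stand-in for the language temporal model (assumption): mean-pool the word vectors."""
    dim = len(word_vectors[0])
    return [sum(w[i] for w in word_vectors) / len(word_vectors) for i in range(dim)]


def fuse(visual, text):
    """Hypothetical text-visual fusion: product, sum, then both raw inputs, concatenated."""
    if len(visual) != len(text):
        raise ValueError("visual and text features must have the same dimension")
    prod = [v * t for v, t in zip(visual, text)]
    total = [v + t for v, t in zip(visual, text)]
    return prod + total + list(visual) + list(text)


def dense(x, weights, bias, relu):
    """One fully connected layer, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def predict(visual, word_vectors, params):
    """Return (matching score in (0, 1), start offset, end offset) for one candidate segment."""
    fused = fuse(visual, encode_query(word_vectors))
    hidden = dense(fused, params["w1"], params["b1"], relu=True)
    logit, start_off, end_off = dense(hidden, params["w2"], params["b2"], relu=False)
    return 1.0 / (1.0 + math.exp(-logit)), start_off, end_off


def init_params(dim, hidden, seed=0):
    """Small random weights; the fused vector has 4 * dim entries."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0, 0.1) for _ in range(4 * dim)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(3)],
        "b2": [0.0] * 3,
    }


if __name__ == "__main__":
    params = init_params(dim=4, hidden=8)
    segment_feature = [0.1, 0.2, 0.3, 0.4]          # hypothetical visual feature of one clip
    query_words = [[1, 0, 0, 0], [0, 1, 0, 0]]      # hypothetical word embeddings
    print(predict(segment_feature, query_words, params))
```

In end-to-end training, the score would be fitted to whether a candidate segment matches the query and the offsets to the ground-truth boundaries, with gradients flowing back through the fusion and language layers. That training loop is omitted here.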