Spatial-Temporal Consistency Constraints for Chinese Sign Language Synthesis.

Liqing Gao, Peidong Liu, Liang Wan, Wei Feng
DOI: https://doi.org/10.1007/978-981-99-9666-7_11
2024-01-01
Abstract: Video-splicing-based sign language synthesis splices word- or sentence-level sign language videos to produce new sign language videos. However, directly splicing or combining video clips can produce abrupt jumps between clips. To address this, the paper proposes a novel spatial-temporal consistency constraints (STCC) approach for sign language synthesis, which improves the authenticity and acceptability of the synthesized video by generating intermediate transition frames. First, a cubic Bézier curve models the motion trajectories and generates the human pose key points of the transition frames. Then, a hierarchical attention generative adversarial network generates smooth transition frames from the generated pose and the source image. Finally, the effectiveness of the proposed STCC framework is validated on two public Chinese sign language datasets. Visual comparison with existing transition frame generation methods shows that the STCC approach yields realistic textures, smooth motion, and high comprehensibility in the synthesized video.
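The cubic Bézier step in the abstract can be illustrated with a minimal sketch. The abstract does not say how the control points are chosen, so this sketch assumes they come from the per-frame key point velocities at the end of the first clip and the start of the second. This is an illustrative choice, not necessarily the authors' method. The names `cubic_bezier` and `transition_poses` are hypothetical, and poses are assumed to be lists of 2D `(x, y)` key points.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pose = List[Point]  # one (x, y) pair per key point


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (
        b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
        b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
    )


def transition_poses(
    prev_pose: Pose,   # second-to-last pose of the first clip
    end_pose: Pose,    # last pose of the first clip
    start_pose: Pose,  # first pose of the second clip
    next_pose: Pose,   # second pose of the second clip
    n_frames: int,     # number of transition frames to generate
) -> List[Pose]:
    """Generate key points for n_frames transition poses between two clips."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    span = n_frames + 1  # frame intervals between end_pose and start_pose
    poses: List[Pose] = []
    for k in range(1, n_frames + 1):
        t = k / span
        pose: Pose = []
        for a_prev, a_end, b_start, b_next in zip(
            prev_pose, end_pose, start_pose, next_pose
        ):
            # Assumption: inner control points are placed so that the curve's
            # per-frame velocity matches each clip at the splice boundary.
            p1 = (
                a_end[0] + (a_end[0] - a_prev[0]) * span / 3.0,
                a_end[1] + (a_end[1] - a_prev[1]) * span / 3.0,
            )
            p2 = (
                b_start[0] - (b_next[0] - b_start[0]) * span / 3.0,
                b_start[1] - (b_next[1] - b_start[1]) * span / 3.0,
            )
            pose.append(cubic_bezier(a_end, p1, p2, b_start, t))
        poses.append(pose)
    return poses


if __name__ == "__main__":
    # Toy example: a single key point moving right, then continuing upward.
    frames = transition_poses(
        [(0.0, 0.0)], [(1.0, 0.0)], [(4.0, 3.0)], [(4.0, 4.0)], n_frames=3
    )
    for frame in frames:
        print(frame)
```

When both clips are stationary at the boundary, the inner control points coincide with the endpoints and the key point eases from one pose to the other along a straight line. In a full pipeline, these poses would then condition the image generator that renders the transition frames.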