Animal speech and singing synthesis model based on So-VITS-SVC

Yutang Gong
DOI: https://doi.org/10.54254/2755-2721/68/20241430
2024-06-06
Abstract: Building on the significant progress researchers have made in deep learning and neural network technology, the author makes a bold attempt to apply the principles of AI-based speech and singing synthesis to animal vocalizations, using the So-VITS-SVC 4.0 framework, which was originally designed for human voice synthesis. Taking dogs as an example species and training the model on datasets of their sounds, the author aims to capture their acoustic characteristics and vocalization patterns and to generate synthetic sounds that closely resemble the originals. This research may not only contribute to a deeper understanding of how animals communicate, but also open up new possibilities for animal sound art and music creation. As the technology continues to improve, AI-synthesized animal speech and singing may play an increasingly important role in zoological research and entertainment, bringing new perspectives on communication between humans and animals.