ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

Yanir Marmor, Kinneret Misgav, Yair Lifshitz
2023-07-17
Abstract: We introduce ivrit.ai, a comprehensive Hebrew speech dataset that addresses the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew. With over 3,300 speech hours and over a thousand diverse speakers, ivrit.ai offers a substantial compilation of Hebrew speech across various contexts. It is delivered in three forms to suit varying research needs: raw unprocessed audio, data processed with Voice Activity Detection (VAD), and partially transcribed data. The dataset stands out for its legal accessibility: it may be used at no cost, making it a crucial resource for researchers, developers, and commercial entities. ivrit.ai opens up numerous applications and offers vast potential to enhance AI capabilities in Hebrew. Future efforts aim to expand ivrit.ai further, thereby advancing Hebrew's standing in AI research and technology.
Audio and Speech Processing, Computation and Language, Sound