Reliability and Performance of the Online Literature Database CAMbase after Changing from a Semantic Search to a Score Ranking Algorithm

Sebastian Unger, Christa K. Raak, Thomas Ostermann
DOI: https://doi.org/10.1007/s42979-023-02146-9
2023-09-10
SN Computer Science
Abstract: Despite the increase in scientific publications in the field of integrative medicine over the past decades, obtaining a valid overview of the published evidence remains challenging. The online literature database CAMbase (available at https://cambase.de) is one of the established databases designed to provide such an overview. In 2020, the database was migrated from a 32-bit to a 64-bit operating system, which resulted in unexpected technical issues and forced the replacement of the semantic search algorithm with Solr, an open-source platform that uses a score ranking algorithm. Although semantic search was replaced, the goal was to create a literature database that is essentially no different from the legacy system. Therefore, a before-after analysis was conducted to compare first the number of retrieved documents and then their titles, with the titles compared semantically using two Sentence-Bidirectional Encoder Representations from Transformers (SBERT) models. Analysis with a paired t-test revealed no significant overall difference between the legacy system and the final system in the number of documents (t = −1.41, df = 35, p = 0.17), but an increase in performance (t = 4.13, df = 35, p < 0.01). Analysis of the model values with a t-test for independent samples also revealed a high degree of consistency between the retrieved documents. The results show that an equivalent search can be provided by using Solr while improving performance, making this technical report a viable blueprint for projects with similar contexts.
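The before-after comparison reported in the abstract rests on a paired t-test, where each search query contributes one value from the legacy system and one from the Solr-based system. The following is a minimal sketch of that statistic using only the Python standard library. It is not the authors' code: the function name `paired_t` and the document counts are hypothetical and serve only to illustrate the calculation. With df = 35, as reported, the test would be based on 36 paired queries.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(before) != len(after):
        raise ValueError("samples must have the same length")
    if len(before) < 2:
        raise ValueError("at least two pairs are required")

    # The test operates on the per-pair differences, not on the raw values.
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical numbers of retrieved documents for five queries
# (illustrative only, not data from the paper).
legacy = [12, 30, 7, 45, 18]
solr = [14, 29, 9, 47, 18]

t, df = paired_t(legacy, solr)
print(f"t = {t:.2f}, df = {df}")
```

The p-value then follows from the t distribution with `df` degrees of freedom, for which a statistics package is normally used.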