Orthography as a Fundamental Impediment to Online Information Retrieval

T. Brooks
DOI: https://doi.org/10.1002/(SICI)1097-4571(199806)49:8%3C731::AID-ASI7%3E3.0.CO;2-O
1998-06-01
Journal of the American Society for Information Science
Abstract: Orthography is the linguistic study of written language: elements of text such as letters, punctuation marks, and spelling. Information retrieval systems operate in the orthographic realm, matching some text strings (i.e., index entries) from documents with other text strings (i.e., query terms) from patrons. During the early history of information retrieval, it was convenient to assume the rationality and uniformity of orthography in order to concentrate effort on building information retrieval systems. Fundamental orthographic problems have persisted into modern information retrieval systems, however, where white-space normalization and the arbitrary treatment of punctuation have exacerbated the orthographic impediment to information retrieval.
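The abstract's central claim is that string matching depends on how white space and punctuation are normalized. The sketch below illustrates that point. It is not taken from the paper. Both normalization rules (`split_on_punct`, `delete_punct`), the `matches` helper, and the example terms "on-line" and "C++" are hypothetical choices made only for illustration. Under each plausible rule, some spelling variants of "online" still fail to match each other, and "C++" becomes indistinguishable from "C".

```python
import re
from typing import Callable, List

Normalizer = Callable[[str], List[str]]


def split_on_punct(text: str) -> List[str]:
    """Hypothetical rule A: lowercase, turn each run of non-alphanumerics into a space, then split."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def delete_punct(text: str) -> List[str]:
    """Hypothetical rule B: lowercase, delete punctuation outright (keeping white space), then split."""
    return re.sub(r"[^a-z0-9\s]+", "", text.lower()).split()


def matches(index_entry: str, query: str, normalize: Normalizer) -> bool:
    """Exact comparison of normalized token sequences, as a simple string-matching system might do."""
    return normalize(index_entry) == normalize(query)


if __name__ == "__main__":
    variants = ["on-line", "online", "on line"]
    for rule in (split_on_punct, delete_punct):
        # Neither rule maps all three spellings to the same token sequence.
        print(rule.__name__, [rule(v) for v in variants])
    # Under both rules, the string "C++" collapses onto the single letter "C".
    print(matches("C++", "C", split_on_punct), matches("C++", "C", delete_punct))
```

Rule A unifies "on-line" with "on line" but not with "online". Rule B does the reverse. Whichever convention an index adopts, a patron who spells the term differently gets no match, while genuinely distinct terms can still collide.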
What problem does this paper attempt to address?