"Term Clumping" for Technical Intelligence: A Case Study on Dye-Sensitized Solar Cells

Yi Zhang, Alan L. Porter, Zhengyin Hu, Ying Guo, Nils C. Newman
DOI: https://doi.org/10.1016/j.techfore.2013.12.019
2014-01-01
Abstract: Tech Mining seeks to extract intelligence from Science, Technology & Innovation information record sets on a subject of interest. A key set of Tech Mining interests concerns which R&D activities are addressed in the publication and patent abstract records under study. This paper presents six “term clumping” steps that can clean and consolidate topical content in such text sources. It examines how each step changes the content, with the end goal of facilitating extraction of usable intelligence. We illustrate the approach for an emerging technology, dye-sensitized solar cells (DSSCs). In this case, the clumping steps reduced some 90,980 terms and phrases to more user-friendly sets, which we take as one indicator of success. The resulting phrases are better suited to contributing usable technical intelligence than the original results. We engaged seven persons knowledgeable about DSSCs to assess the resulting content. These empirical results advanced the development of a semi-automated term clumping process that can enable extraction of topical content intelligence.