Hybrid Clustering for Validation and Improvement of Subject-Classification Schemes
Frizo Janssens, Lin Zhang, Wolfgang Glänzel
DOI: https://doi.org/10.5772/6443
2009-01-01
Source: Data Mining and Knowledge Discovery in Real Life Applications

Abstract: Earlier, a completely different approach was introduced by Callon et al. (1983) and Callon, Law and Rip (1986). Their mapping and visualisation tool Leximappe was based on a lexical approach, in particular co-word analysis. The notion of the lexical approach, which was originally based on extracting keywords from records in indexing databases, was later deepened and extended by applying advanced text-mining techniques to full texts (cf. Kostoff et al., 2001, 2005; Glenisson et al., 2005a,b).

Whatever method is used to study the structure of science, cluster algorithms have beyond doubt become the most popular technique in science mapping. The sudden and strong interest that these techniques have attracted in the community contrasts with objections and criticism from the viewpoint of information use in the framework of research evaluation (e.g., Noyons, 2001; Jarneving, 2005). For instance, clustering based on co-citation and bibliographic coupling has to cope with several severe methodological problems. This has been reported by, among others, Hicks (1987) in the context of co-citation analysis and by Janssens et al. (2008) with regard to bibliographic coupling. One promising solution is to combine these techniques with other methods such as text mining (e.g., combined co-citation and word analysis: Braam et al., 1991; combination of coupling and co-word analysis: Small, 1998; hybrid coupling-lexical approach: Janssens et al., 2007b, 2008).

Most applications were designed to map and visualise the cognitive structure of science and its change over time and, from a policy-relevant perspective, to detect new, emerging disciplines. Improvement of subject-classification schemes was in most cases not intended. Jarneving (2005) proposed a combination of bibliometric structure-analytical techniques with statistical methods to generate and visualise subject-coherent and meaningful clusters. His conclusions drawn from the comparison with 'intellectual' classification were rather sceptical. Despite several limitations, which will be discussed further in the course of the present study, cognitive maps have proved to be useful tools for visualising the structure of science and can be used to adjust existing subject classification schemes even on a large scale, as we will demonstrate in the following.

The main objective of this study is to compare (hybrid) cluster techniques for cognitive mapping with traditional 'intellectual' subject-classification schemes. The most popular subject classification schemes created by Thomson Scientific (Philadelphia, PA, USA) are based on journal assignment. Journal cross-citation analysis therefore suggests itself as the underlying method, and we will cluster the document space using journals as predefined units of aggregation. In contrast to the method applied by Leydesdorff (2006), who uses the Journal Citation Reports (JCR), we calculate citations on a paper-by-paper basis and then assign individual papers indexed in the Web of Science (WoS) database to the journals in which they have been published. The use of the JCR would confine us to the data available in the JCR and prevent us from combining cross-citation analysis with a textual approach. What is more, proceeding from the document level allows us to control for document types and citation windows, and to combine bibliometrics-based techniques with other methods such as text mining. This results in higher precision, since irrelevant document types and 'low-weight journals' can be excluded. In this way we can present the results of a hybrid (i.e., combined/integrated) citation-textual cluster analysis and compare them with the structure of an existing 'intellectual' subject classification scheme created and used by Thomson Scientific. The aim of this comparison is to explore the possibility of using the results of the cluster analysis to improve the subject classification scheme in question.

1.1 Cognitive mapping vs. subject classification

The objective of the present study is two-fold. The first task is not merely to visualise the field structure of science by presenting yet another map based on an alternative approach, but to validate and improve existing subject classifications used for research evaluation. In particular, the question arises of how far the observed 'migration' of journals among science fields can be used to improve classification. The second issue is a methodological one, namely to evaluate improved methods of hybrid clustering. The 22-field subject classification scheme of the Essential Science Indicators (ESI) of Thomson Scientific, which forms a partition of the Web of Science universe with practically unique subject assignment, is used as the "control structure". In particular, we propose the following seven-step approach to integrating cluster analysis and cognitive mapping into subject classification.

1. Evaluation of existing subject-classification schemes and visualisation of their cross-citation graph
2. Labelling subject fields using cognitive characteristics
3. Studying the cognitive structure based on hybrid cluster analysis and visualisation of the cross-citation graph
4. Evaluation of science areas resulting from cluster analysis
5. Labelling clusters using cognitive characteristics and representative journals suggested by the PageRank algorithm
6. Comparison of subject fields and cluster structure
7. Migration of journals among subject fields
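
The paper-by-paper citation counting with aggregation to journals, and the PageRank-based choice of representative journals in step 5, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper records, the set of "relevant" document types, the way the citation window is applied (to the citing year), the exclusion of journal self-citations from the ranking, and the damping factor of 0.85 are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical paper records: id -> (journal, document type, publication year).
PAPERS = {
    "p1": ("J. Chem. A", "article", 2002),
    "p2": ("J. Chem. A", "editorial", 2003),
    "p3": ("J. Chem. B", "article", 2003),
    "p4": ("Phys. Lett. X", "article", 2004),
    "p5": ("J. Chem. B", "review", 2006),
}
# Hypothetical paper-to-paper citations as (citing, cited) pairs.
CITATIONS = [("p3", "p1"), ("p4", "p1"), ("p4", "p3"), ("p5", "p3"),
             ("p5", "p1"), ("p2", "p1"), ("p5", "p4")]

# Assumed filter: which document types count as relevant.
RELEVANT_TYPES = {"article", "review", "letter", "note"}


def journal_cross_citations(papers, citations, window=(2002, 2006)):
    """Aggregate paper-level citations into journal-to-journal counts."""
    counts = defaultdict(int)
    for citing, cited in citations:
        j_from, type_from, year_from = papers[citing]
        j_to, type_to, _ = papers[cited]
        # Document-type control: skip links touching irrelevant types.
        if type_from not in RELEVANT_TYPES or type_to not in RELEVANT_TYPES:
            continue
        # Citation-window control (assumed to apply to the citing year).
        if not window[0] <= year_from <= window[1]:
            continue
        counts[(j_from, j_to)] += 1
    return dict(counts)


def pagerank(counts, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; journal self-citations ignored."""
    journals = sorted({j for pair in counts for j in pair})
    n = len(journals)
    out_weight = {j: 0.0 for j in journals}
    for (src, dst), w in counts.items():
        if src != dst:
            out_weight[src] += w
    rank = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        # Rank held by journals without outgoing links is spread uniformly.
        dangling = sum(rank[j] for j in journals if out_weight[j] == 0)
        new = {j: (1 - damping) / n + damping * dangling / n for j in journals}
        for (src, dst), w in counts.items():
            if src != dst:
                new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank


cross = journal_cross_citations(PAPERS, CITATIONS)
ranks = pagerank(cross)
for journal in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{journal}: {ranks[journal]:.3f}")
```

Working from paper-level records rather than pre-aggregated journal counts is what makes the two filters possible: the editorial `p2` is dropped before aggregation, which a journal-level table would not allow. Within a cluster, the journals with the highest rank would be the candidates for labelling it.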