A Scientometric Review of Research on Traffic Forecasting in Transportation
Jin Liu,Naiqi Wu,Yan Qiao,Zhiwu Li
DOI: https://doi.org/10.1049/itr2.12024
IF: 2.7
2020-01-01
IET Intelligent Transport Systems
IET Intelligent Transport Systems, Volume 15, Issue 1, p. 1-16. Review. Open Access.
Jin Liu (orcid.org/0000-0003-3781-5927): Institute of Systems Engineering and Collaborative Laboratory for Intelligent Science and Systems, Macau University of Science and Technology, Macao, 999078, China
Naiqi Wu (corresponding author, nqwu@must.edu.mo): Institute of Systems Engineering and Collaborative Laboratory for Intelligent Science and Systems, Macau University of Science and Technology, Macao, 999078, China; State Key Laboratory of Precision Electronic Manufacturing Technology and Equipment, Guangdong University of Technology, Guangzhou, 510006, China
Yan Qiao: Institute of Systems Engineering and Collaborative Laboratory for Intelligent Science and Systems, Macau University of Science and Technology, Macao, 999078, China
Zhiwu Li: Institute of Systems Engineering and Collaborative Laboratory for Intelligent Science and Systems, Macau University of Science and Technology, Macao, 999078, China
First published: 28 December 2020. Citations: 5.
Funding information: The Science and Technology Development Fund (FDCT), Macau SAR, File Numbers 0017/2019/A1 and 0002/2020/AKP.
Abstract: Research on traffic forecasting in transportation has attracted worldwide attention over the past three decades. While there are comprehensive review studies on traffic forecasting, few of them explore the research advancement in this field from a visual perspective. With the help of CiteSpace and VOSviewer, this study uses a scientometric review to identify the evolution and emerging trends of research in the field. In total, 1536 bibliographic records with references are extracted from Web of Science and used as the dataset to form the author network, institutional network, keyword network, and co-citation network. The visualization of the results characterizes the research progress in the field. It is found that Eleni I. Vlahogianni receives the highest citation frequency, and that China and the United States contribute most of the journal articles. Some influential institutions and articles are also identified. With the author keyword network, the terms "recurrent neural network", "convolutional neural network", "spatio-temporal correlation", "traffic pattern", and "feature selection" are identified as emerging trends. In addition, the document citation bursts reveal that the application of combined models and the study of traffic flow forecasting in atypical situations are becoming emerging trends. This study provides a valuable reference for the research community in this field.
1 INTRODUCTION
The rapid growth in the number of motor vehicles has sharply widened the imbalance between infrastructure service capacity and traffic requirements in transportation, especially in urban transportation. Taking Hong Kong as an example, the number of private cars has increased by more than 47.50% over the past 10 years, while the road length grew by less than 5% during the same period [1]. If not managed efficiently, this imbalance leads to serious problems in urban transportation, such as severe traffic jams, long travel times, frequent traffic accidents, and serious air and noise pollution. All these issues further reduce the efficiency of urban operations and people's satisfaction with their quality of life. Fortunately, intelligent transportation systems (ITS) that are capable of improving operational efficiency and system integration have been put forward and adopted worldwide [2, 3]. The provision of accurate and reliable real-time information and predictions of traffic parameters has become one of the core aspects of ITS success [4]. Such information is also one of the primary needs of the transportation community for understanding future traffic conditions [5].
Generally speaking, accurate traffic forecasting helps to explain what future travel demand might be and furnishes benchmarks for proper planning and highly efficient operations, so as to reduce traffic congestion and improve mobility and air quality [6, 7]. In other words, if the traffic conditions on a certain road section during the next time interval can be predicted accurately, actions can be taken in advance to control traffic flows effectively under a new traffic control scheme. The work in [8] describes traffic forecasting as a way to directly estimate the anticipated traffic conditions, which are reflected primarily by traffic parameters such as traffic flow, road occupancy, travel time, and traffic speed; this provides a considerably transparent explanation of traffic forecasting. In terms of the forecasting horizon, traffic forecasting falls into two categories: strategic traffic forecasting and short-term traffic forecasting. The former aims to predict traffic conditions months or years into the future, while the latter focuses on predictions for the next few seconds to a few hours [7, 9]. Comparatively, more attention has been drawn to the latter due to its capability of adaptive implementation.
Several articles have reviewed the literature on traffic forecasting. The first [10] presents neural network applications in civil engineering, with traffic forecasting mentioned as one part of the article. Later, [8] gave a systematic review of algorithm developments for short-term traffic forecasting based on publications up to 2003, analysing the determination of scope, conceptual output specification, and modelling. Ten years later, in 2014, [9] highlighted ten existing challenges to short-term traffic forecasting with a top-down approach and outlined directions for future research efforts.
Based on the review work done in [9], [11] systematically surveys the progress on data-driven traffic forecasting models over the period from 2014 to 2016 and reveals the latest technical challenges faced by traffic forecasting. As pointed out by [9], Artificial Intelligence (AI) is a key technology and can be treated as an excellent alternative for data mining in transportation applications. Recently, Deep Learning (DL) has drawn more and more attention, and its applications in traffic forecasting are reviewed in [12, 13]. These reviews indicate that DL is the most effective candidate, owing to its powerful capacity for processing non-linear data.
Traffic forecasting is a highly complicated problem in the transportation domain, involving various data sources, model selections, and so on. Although a few articles review this topic comprehensively, some gaps still need to be bridged from a new perspective. On the one hand, previous literature reviews rely significantly on the authors' experience and subjective judgment in this field; in other words, they are conducted mainly with qualitative methods instead of quantitative ones. On the other hand, previous review articles lack a visualized analysis of all the studies on traffic forecasting, so they cannot reveal the co-citation relationships and keyword evolution well. This results mainly from the huge number of research reports on this topic published over more than four decades. It is almost impossible to review all of these studies manually, which motivates us to conduct this study in a different way.
Based on the above discussion, this study attempts to present a comprehensive review of the research on traffic forecasting from a different perspective, namely, a scientometric review, to bring a more direct comprehension of all the aspects concerned.
The goal of this study is to comprehensively review all the relevant articles retrieved from the Web of Science platform (WoS), ranging from the first article on traffic forecasting indexed by WoS to the latest one indexed in 2019. With the employment of scientometric analysis, several tasks could be achieved automatically, including (i) constructing an author network in the field of traffic forecasting, (ii) developing an institution network in the field, (iii) ranking the journals that publish articles on this topic, (iv) establishing an article co-citation network, (v) presenting a keyword network analysis and burst detection, and (vi) identifying the emerging trends and technological evolution of traffic forecasting research. The main contributions of this study can be listed as follows: (1) for the first time, we introduce a scientometric approach to the field of traffic forecasting research to quantify the research progress in this field; (2) the technological evolution in this field is visualized and emerging trends are identified and analyzed; and (3) influential institutions, journals, scholars, and documents are identified to provide a reference for the research community, while popular forecasting techniques are listed to provide benchmarks of comparisons. The remainder of this work is structured as follows. Section 2 provides a detailed specification of the review methodology, namely, scientometrics and data acquisition. Section 3 discusses the co-authorship network and institutional network followed by keyword co-occurring network in Section 4. Journal co-citation network, author co-citation network, and document co-citation network are analysed in Section 5. Based on the scientometric analysis, Section 6 lists several challenges in the field of traffic forecasting. Section 7 summarizes the entire research effort and suggests some directions for future inquiry. 
2 METHODOLOGY AND DATA COLLECTION
2.1 Methodology
Scientometric review, regarded as a quantitative study of science, is an important method to comprehensively evaluate and examine the development of a research field [14-16]. Unlike the traditional literature review method, it covers a wider range of articles: not only the articles themselves, but also their citing articles can be reviewed simultaneously [16]. Additionally, with the help of scientometric analysis and its supporting software, researchers can carry out the review work easily and repeatedly rather than relying on domain experts, so that the emerging trends can always be kept up to date. In recent years, scientometric analysis has been applied by researchers across different domains, including recommendation systems [16], building information modelling [14], sustainability and sustainable development [17], genome-wide association [18], and so on.
In the present study, we employ CiteSpace and VOSviewer to conduct the scientometric analysis for the domain of traffic forecasting. Both are very useful knowledge visualization tools for scientists. CiteSpace is a powerful and popular visualization application developed by Dr. Chen to explore and visualize hot topics, emerging trends, and fundamental changes in a field over time [19]. Apart from generating clusters of authors, institutions, and co-citations, CiteSpace can be used to construct a co-occurrence network that links different keywords appearing in the same article. More importantly, burst detection, one of its most critical functions, is used to mine emerging trends with a special algorithm that detects sharp changes in the frequency of occurrences. This freely available application can be downloaded from its official website together with the related manual and books.
In addition, VOSviewer is used for scientometric analysis based on the same dataset. This visualization tool, developed by van Eck and Waltman, can analyze authors, citations, keywords, etc., and is distinctive in its clustering and visualization techniques [20]. Different from CiteSpace, VOSviewer mainly illustrates the clustering relationship among nodes in terms of distance and density, and the combination of the two helps to expose the nature of research topics.
2.2 Data collection
Web of Science (WoS) is one of the literature search engines most frequently used by researchers in different fields of science, and it provides comprehensive citation indexing. Specifically, it provides the basic information that users need about publications, ranging from titles, authors, journals, organizations, keywords, and abstracts to citation records. The core collection of WoS covers over 18,000 high impact journals and more than 148 million records from all over the world, tracing back to the early 20th century [21]. Here, we retrieve the records of the selected documents from WoS. When conducting the search, the WoS database from which records are extracted is limited to "Web of Science Core Collection" to ensure the validity of the data source for this study. The search rules for retrieving the related articles are as follows.
(i) Advanced search is carried out with the input being "(TS = (traffic AND forecast*) OR TS = (traffic AND predict*)) AND (TI = forecast* OR TI = predict* OR TI = estimat*)", where * is a wildcard that matches any word ending, and TI and TS denote the title and topic fields of an article, respectively;
(ii) The language is restricted to English and the document type is restricted to article, mainly because journal articles are usually subject to rigorous peer review;
(iii) The timespan is set to "from 1975 to 2019"; and
(iv) The citation indices include Science Citation Index Expanded (SCI-EXPANDED) and Social Sciences Citation Index (SSCI).
In total, 4776 bibliographic records are retrieved, and a significant number of them are not related to traffic forecasting in transportation. In particular, the given search rules also retrieve articles on traffic forecasting in the domain of communication networks. Those records are excluded manually to guarantee the relevance of the selected records to the topic under study. After selection, 1536 bibliographic records are downloaded in January 2020, with the cited references included in order to conduct the co-citation analysis in Sections 3–5. CiteSpace and VOSviewer are employed to perform the scientometric analysis of the acquired bibliographies. CiteSpace is used to display simple networks, time zone diagrams, etc., while VOSviewer is used for complex network analysis, mainly because some nodes in CiteSpace need to be positioned manually when a network is complex.
Figure 1 shows the number of articles on traffic forecasting in transportation published annually during the last 45 years. The overall evolution follows a clear upward trend and has been growing rapidly in recent years, confirming that research on traffic forecasting in transportation is gaining more and more attention. The first article in this domain was published in 1976 and used a non-linear parametric model for traffic forecasting [22]. Since then, the number of published articles in this domain has increased year by year.
Specifically, in 2019, the number of publications peaked at 330, accounting for 21.48% of all the articles published during this period. A simple reason is that, along with developments in the transportation domain, new methods and their variants, especially machine learning and DL models, have been introduced into traffic forecasting in recent years.
FIGURE 1. Annual distribution of the published articles on traffic forecasting in transportation
3 AUTHOR NETWORK ANALYSIS AND DISCUSSION
In this section, we first use the collected records to generate a co-authorship network and use the statistical results in CiteSpace to identify the most productive authors. Then, considering the authors' institutions and countries, we build the network of institutions and nations to identify the top institutions and nations in this field.
3.1 Co-authorship network
An author's productivity level can, to some extent, represent the researcher's efforts in the corresponding field [23]. The co-authorship network not only identifies the most productive authors in traffic forecasting research, but also clearly and visually demonstrates the co-authorship relationships among those authors. CiteSpace is used to visualize the co-authorship network on traffic forecasting in transportation, which is depicted in Figure 2. There are 804 nodes and 698 links in this figure, and the network density is 0.0022. The number of researchers in this field is relatively large, and the co-authorship network basically shows a "core-edge" structure. Overall, the collaboration among the authors is relatively fragmented: only a small number of academic teams have formed, and in part of the network author collaboration is not significant.
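As a rough illustration of how a co-authorship network of this kind can be assembled, and of how the reported density follows from the node and link counts, consider the sketch below. It rests on assumptions: the input format (one list of author names per article), the toy records, and the function names are hypothetical and are not taken from CiteSpace, and the density formula is the standard one for an undirected graph, 2L / (N(N - 1)), which the text does not state explicitly.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_network(papers):
    """Return (publications per author, co-authored papers per author pair).

    `papers` is a hypothetical input: one list of author names per article.
    Node size corresponds to the publication count, link weight to the
    number of articles two authors wrote together.
    """
    publications = Counter()
    links = Counter()
    for authors in papers:
        unique = sorted(set(authors))          # ignore duplicate names in one record
        publications.update(unique)
        links.update(combinations(unique, 2))  # one undirected link per author pair
    return publications, links


def density(n_nodes, n_links):
    """Standard density of an undirected graph without self-loops."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))


# Toy records (invented for illustration only).
toy_papers = [["A", "B", "C"], ["A", "B"], ["D"]]
toy_pubs, toy_links = build_coauthor_network(toy_papers)
toy_density = density(len(toy_pubs), len(toy_links))

# Node and link counts reported for Figure 2.
reported_density = round(density(804, 698), 4)
```

With 804 nodes and 698 links, the formula gives roughly 0.00216, which rounds to the 0.0022 quoted for Figure 2. A value this low is consistent with the fragmented collaboration described in the text.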
FIGURE 2. The co-authorship network on traffic forecasting in transportation
In Figure 2, each node represents an author, and a link between two nodes indicates that the two authors have co-authored a publication. The size of each node denotes the number of publications, while the thickness of a link signifies the strength of the authors' relationship. Besides, different link colours represent different time spans from 1975 to 2019. In terms of productivity, Yinhai Wang, a professor with the Department of Civil and Environmental Engineering at the University of Washington, is the most productive author in the domain of traffic forecasting. The other authors among the top seven, each with at least 10 publications, are Bin Ran, Li Li, Lei Zhang, J. W. C. van Lint, Constantinos Antoniou, and Jianhua Guo. The remaining top authors, each with at least eight publications, are listed with their details in Table 1, where "Year" indicates the year in which the corresponding author published the first article in this field. Furthermore, when the cooperative relationship is taken into consideration, several research communities exist in the collaborative network, with productive authors generally located at the centre of the communities to which they belong. The first primary community is the research circle with Yinhai Wang, Jinjun Tang, and Yunpeng Wang as the central authors; it also includes Dongfang Ma, Yajie Zou, Weibin Zhang, Jing Qin, and Fang Zong. Another large community is the research circle in which Wei Huang, Jianhua Guo, Yun Wei, Jinde Cao, and Bin Ran can be regarded as the central authors.
TABLE 1. Top 16 productive authors on traffic forecasting
| Author | Institution | Frequency | Year |
|---|---|---|---|
| Yinhai Wang | University of Washington | 15 | 2014 |
| Bin Ran | University of Wisconsin-Madison | 13 | 2000 |
| Li Li | Tsinghua University | 10 | 2014 |
| Lei Zhang | University of Maryland | 10 | 2013 |
| J. W. C. van Lint | Delft University of Technology | 10 | 2002 |
| Constantinos Antoniou | Technical University of Munich | 10 | 2005 |
| Jianhua Guo | Southeast University | 10 | 2007 |
| Hani S. Mahmassani | Northwestern University Transportation Center | 9 | 2004 |
| Wei Huang | Southeast University | 9 | 2018 |
| Ali Haghani | University of Maryland | 8 | 2013 |
| Lelitha Vanajakshi | Indian Institute of Technology Madras | 8 | 2017 |
| Yang Liu | Southeast University | 8 | 2018 |
| Eleni I. Vlahogianni | National Technical University of Athens | 8 | 2004 |
| Jinjun Tang | Central South University | 8 | 2016 |
| Serge P. Hoogendoorn | Delft University of Technology | 8 | 2002 |
| Yunpeng Wang | Beihang University | 8 | 2015 |
3.2 Network of institutions and nations
Here, a network of the institutions and nations that the authors come from is constructed with CiteSpace to explore the geographical distribution of these articles. Figure 3 displays the network, in which the 16 nations/regions that contribute most to traffic forecasting research are depicted with nodes of different sizes. They are China (623 articles, 40.56%), USA (455 articles, 29.62%), UK (68 articles, 4.43%), Australia (67 articles, 4.36%), South Korea (54 articles, 3.52%), Canada (51 articles, 3.32%), Taiwan (47 articles, 3.06%), India (44 articles, 2.86%), Greece (35 articles, 2.28%), Germany (32 articles, 2.08%), Iran (31 articles, 2.02%), Spain (30 articles, 1.95%), Japan (29 articles, 1.89%), Italy (28 articles, 1.82%), and Sweden (27 articles, 1.76%). China and USA provide the majority of the articles, together accounting for 70.18%. In terms of the publication year, Sweden, UK, and USA were the first countries to conduct traffic forecasting studies, owing to the strong industrial base of these developed countries. It is also worth noting that the links among these countries are considerably dense, indicating close cooperation between them on traffic forecasting research.
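The percentage shares quoted in this section can be re-derived from the article counts and the 1536 records in the dataset. The short check below is only a sanity calculation under that assumption; the helper name `share` is ours and the counts are copied from the text.

```python
TOTAL_RECORDS = 1536  # bibliographic records kept after manual screening


def share(count, total=TOTAL_RECORDS):
    """Percentage of the dataset, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


china = share(623)             # articles from China
usa = share(455)               # articles from USA
combined = share(623 + 455)    # China and USA together
peak_2019 = share(330)         # articles published in 2019
```

Under these inputs the values come out as 40.56, 29.62, 70.18, and 21.48, matching the figures reported for China, USA, their combined share, and the 2019 publication peak.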
Traffic congestion is a common problem in many cities and needs to be alleviated through the development of ITS. However, the effective operation of ITS relies first on advances in traffic forecasting, and close cooperation among these nations can help improve people's mobility. Another index of the network, called centrality, leads to a similar conclusion. Nodes with a high centrality score are those that connect two or more large groups of nodes [14, 17], so a high centrality also implies that the corresponding countries are critical in this research field. The countries with high centrality include USA (0.95), China (0.52), UK (0.27), and Australia (0.22), indicating that they occupy the core of the network and play a significant role in the research domain.
FIGURE 3. The network of institutions and nations
Moreover, as far as institutions are concerned, the research on traffic forecasting is not concentrated in a small number of nodes; on the contrary, the institution nodes in the network are numerous, implying that a large number of institutions conduct research on traffic forecasting. The institutions with more than 20 articles include Southwestern University (69 articles, China); Beijing Jiaotong University (57 articles, China); Tsinghua University (45 articles, China); Delft University of Technology (37 articles, Netherlands); University of Maryland (31 articles, USA); Tongji University (31 articles, China); Changan University (27 articles, China); Beihang University (23 articles, China); Hong Kong Polytechnic University (22 articles, China); National Technical University of Athens (22 articles, Greece); Chinese Academy of Sciences (22 articles, China); and Texas A&M University (21 articles, USA). Each of these institutions has considerable influence in this research field, and their research is largely representative of the level of development in the field.
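The centrality index described in this section, which singles out nodes that connect otherwise separate groups, matches the usual notion of betweenness centrality. Assuming that this is the measure meant, the sketch below computes normalized betweenness for a small undirected, unweighted graph using Brandes' algorithm. The toy graph and all names are hypothetical; this is not CiteSpace's implementation and it does not reproduce the country scores reported in the text.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness centrality (Brandes) for an undirected,
    unweighted graph given as {node: set_of_neighbours}."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                               # nodes in order of discovery
        preds = {v: [] for v in adj}             # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}            # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                             # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):                # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(adj)
    # Each unordered pair is counted twice, so divide by (n - 1)(n - 2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}


# Toy graph: two triangles (A, B, C) and (E, F, G) joined through the bridge D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

centrality = betweenness(graph)
bridge = max(centrality, key=centrality.get)
```

In this toy graph the bridging node D lies on every shortest path between the two triangles and therefore receives the highest score, which mirrors how a country that links several collaboration groups ends up with high centrality.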
4 KEYWORD NETWORK ANALYSIS AND DISCUSSION
Keywords represent the core content of articles and demonstrate the development of a research topic over time [14]. In this section, we analyse author keywords by constructing keyword networks to explore the hotspots and emerging trends in traffic forecasting research. There are two types of keywords in the WoS bibliography: author keywords, given by the authors, and keywords plus, provided by journals. Author keywords are used for the analysis because they usually express the core content of an article more accurately. It is worth noting that some author keywords are synonymous or differ only in expression, such as "traffic flow forecasting" and "traffic flow prediction", or "Gaussian process" and "Gaussian processes", and most scientometric software tools cannot identify them automatically. Hence, the author keywords need to be merged first to enable an accurate analysis in the next step. We clean up the author keywords programmatically, mainly by replacing synonyms.
4.1 Network of co-occurring keywords
VOSviewer is used to analyze and display the network of co-occurring keywords, which can reveal research hotspots. Compared with other scientometric software tools, VOSviewer is more suitable for the graphical representation of scientometric maps, and it is particularly useful for presenting large maps in an easily interpretable way [20]. From all the selected articles, 3713 keywords are extracted, of which only 605 meet a threshold of two, that is, a minimum of two occurrences per keyword. Here, we set the threshold to four and obtain 159 keywords to generate the network of co-occurring keywords, which is depicted in Figure 4.
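The keyword cleaning and thresholding steps described in this section can be sketched as follows. This is a minimal illustration under assumptions rather than the authors' program or VOSviewer's internals: the synonym table contains only the two examples named in the text, and the record format, sample records, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym table: variant -> preferred form (lower case).
SYNONYMS = {
    "traffic flow prediction": "traffic flow forecasting",
    "gaussian processes": "gaussian process",
}


def normalise(keyword):
    """Lower-case, collapse whitespace, then map known variants to one form."""
    key = " ".join(keyword.lower().split())
    return SYNONYMS.get(key, key)


def cooccurrence(records, min_occurrences):
    """Count keyword occurrences and pairwise co-occurrences.

    `records` holds one list of author keywords per article. Only keywords
    that occur in at least `min_occurrences` articles are kept as nodes.
    """
    cleaned = [sorted({normalise(k) for k in rec}) for rec in records]
    counts = Counter(k for rec in cleaned for k in rec)
    kept = {k for k, c in counts.items() if c >= min_occurrences}
    pairs = Counter()
    for rec in cleaned:
        pairs.update(combinations([k for k in rec if k in kept], 2))
    return counts, kept, pairs


# Invented sample records for illustration only.
sample = [
    ["Traffic flow prediction", "Gaussian processes"],
    ["traffic flow forecasting", "Gaussian process", "deep learning"],
    ["Traffic Flow Forecasting", "deep learning"],
]
counts, kept, pairs = cooccurrence(sample, min_occurrences=2)
```

Raising `min_occurrences` shrinks the node set in the same way that moving the threshold from two to four reduces the 605 keywords to 159 in the text; the pair counts then serve as link weights between the remaining keywords.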
FIGURE 4. The network of co-occurring keywords with VOSviewer
As in CiteSpace, each node in Figure 4 represents a keyword. The size of a node or its label expresses the occurrence frequency of the corresponding keyword: the larger the node or label, the more important the keyword. The distance between two nodes indicates the strength of the relationship between the two keywords, so the closer two nodes are to each other, the more frequently the two keywords appear together [24]. In addition, the colour of a node indicates the cluster to which it belongs. The largest node in Figure 4 belongs to "traffic flow", with 136 occurrences (see Table 2), followed in descending order by "neural network" (103); "travel time" (78); "ITS" (77); "deep learning" (51); "short-term forecasting" (46); "time series" (41); "support vector machine" (39); "Kalman filtering" (33); and "genetic algorithm" (27). These keywords represent the research hotspots in the field of traffic forecasting. Moreover, several of these nodes lie very close to one another, including "traffic flow", "neural networks", "travel time", "intelligent transportation systems", "deep learning", "support vector machines", and "short-term forecasting". This means that the research domain primarily focuses on using neural networks, DL, and support vector machines to study traffic flow prediction and travel time prediction in ITS, especially for short-term forecasting.
TABLE 2. Top 29 keywords with high occurrences
| Keywords | Occurrences | Keywords | Occurrences | Keywords | Occurrences |
|---|---|---|---|---|---|
| Traffic flow | 136 | Road traffic | 33 | Big data | |
| Neural network | 103 | Genetic algorithm | 27 | Learning (artificial intelligence) | 18 |
| Travel time | 78 | Traffic speed | 26 | Model predictive control | 18 |
| Intelligent Transportation Systems | 77 | Short-term traffic flow forecasting | 25 | Traffic engineering computing | 18 |
| Traffic forecasting | 74 | Machine learning | 24 | Convolutional neural network | 17 |
| Deep learning | 51 | Traffic state | 21 | Recurrent neural network | 17 |
| Short-term forecasting | 46 | Artificial neural networks | 20 | Bayesian network | 16 |
| Time series | 41 | Data mining | 20 | Freeway | 15 |
| Support vector machine | 39 | Support vector regression | 20 | Particle swarm optimization | 15 |
| Kalman filter | 33 | Traffic volume | 19 | | |
Note that the temporal aspect cannot be reflected in Figure 4. Furthermore, we need to explore and discover which keywords are the emerging hotspots, which are slowly fading away, and which are the historical resea