Abstract: As the amount of data stored on the Internet grows, searching for information becomes more complicated. Traditional collection methods are cumbersome and time-consuming, and they do not achieve satisfactory results. This paper uses natural language processing and web crawler technology to collect and analyze data about student evaluation, with the aim of identifying the key factors that affect teachers' comprehensive evaluation results and proposing methods to address the problems found. Because traditional web crawlers lack intelligence and initiative, the structure of a best-priority crawler framework is improved and optimized. An improved PageRank value, a user-demand correlation degree, and NDC-algorithm denoising are added. Together these can effectively address problems such as long retrieval time, overlapping information, and incomplete information, and improve the accuracy of information collection.

Introduction

The student evaluation system is proposed to address the current situation according to the specific needs of students and the teaching requirements of teachers. As an integrated data processing technology, data fusion is applied in many traditional disciplines and emerging fields, where it can improve the accuracy and reliability of target rule mining and prediction.
In [1], crawler technology is combined with the acquisition and analysis of multi-source spatial data to better assist urban planning work. In [2], an Internet public opinion analysis system is designed on the principle of data fusion, and the fusion analysis is realized by combining crawler technology with natural language processing. In [3], a personal credit scoring system based on multi-source data fusion is proposed, which incorporates a logistic regression model to improve the estimation accuracy of the model. In [4], data are collected with an adaptive weighted fusion method based on the data fusion principle, and the Grubbs criterion is used to eliminate invalid data, in order to measure the parameters of the inlet section of the test piece in the afterburner of an aero-engine.

Overall System Structure Design

This paper uses multi-source data fusion technology to search web pages for keyword-group information about student evaluation. The first chapter introduces the overall arrangement of this article. The second chapter introduces the proposed system architecture and optimization scheme. The third chapter applies the system to the comprehensive analysis of students' evaluations. The fourth chapter gives a summary and suggestions. The overall flow is shown in Figure 1.

Figure 1. Overall architecture flow chart. Stages: Start, Optimization scheme, Web page classification, Data processing, Word frequency statistics, Similar word substitution, Keyword list (NLP), End.

International Conference on Modeling, Analysis, Simulation Technologies and Applications (MASTA 2019). Copyright © 2019, the Authors. Published by Atlantis Press. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/). Advances in Intelligent Systems Research, volume 168.
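The last three stages named in Figure 1 (word frequency statistics, similar word substitution, keyword list) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes English text split by a simple regular expression, and the synonym table `SIMILAR_WORDS`, the stop-word set `STOP_WORDS`, the function names, and the sample pages are all hypothetical. In this sketch the substitution is applied while counting, so that variants of one term are pooled into a single keyword.

```python
import re
from collections import Counter

# Hypothetical similar-word table: maps each variant to one canonical keyword.
SIMILAR_WORDS = {
    "lecturer": "teacher",
    "instructor": "teacher",
    "lectures": "lecture",
    "classes": "class",
}

# Small illustrative stop-word set (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "but"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_list(documents, top_n=5):
    """Count word frequencies after similar-word substitution.

    Returns the top_n (keyword, count) pairs, most frequent first.
    """
    counts = Counter()
    for doc in documents:
        for token in tokenize(doc):
            if token in STOP_WORDS:
                continue
            # Replace a variant with its canonical form before counting.
            counts[SIMILAR_WORDS.get(token, token)] += 1
    return counts.most_common(top_n)


# Hypothetical page texts standing in for crawled evaluation comments.
pages = [
    "The teacher explains clearly and the lectures are engaging.",
    "The instructor is patient, but the classes are too fast.",
    "A lecturer who explains the lecture slowly.",
]
keywords = keyword_list(pages, top_n=3)
print(keywords)
```

For text in a language without whitespace word boundaries, `tokenize` would need to be replaced by a proper word segmenter; the counting and substitution steps would stay the same.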

A novel combining method of dynamic and static web crawler with parallel computing

An Efficient Valid Page Crawling Approach for Websites with Dynamic Scripts

Research and Implementation of Algorithm Based on Data Fusion Technology

A novel agent-based parallel ETL system for massive data

A cognitive crawler using structure pattern for incremental crawling and content extraction

Parallelization in Extracting Fresh Information from Online Social Network

A Novel Web Scraping Approach Using the Additional Information Obtained From Web Pages

Architectural Design and Evaluation of an Efficient Web-Crawling System

The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine

Effective performance of information retrieval on web by using web crawling

A Parallel Pages Mining Approach: Combining URL Patterns and HTML Structures

A Dynamic Reconfiguration Model for a Distributed Web Crawling System

Optimized Focused Web Crawler with Natural Language Processing Based Relevance Measure in Bioinformatics Web Sources

Design and Research of Web Crawler Based on Distributed Architecture

A Semantic and Optimized Focused Crawler Based on Semantic Graph and Genetic Algorithm

Parallel Approach and Platform for Large-Scale WEB Data Extraction

A novel multi-threaded web crawling model

Unlocking New Insights for Electrocatalyst Design: A Unique Data Science Workflow Leveraging Internet-Sourced Big Data

Implementing dynamic high-performance computing supported workflows on Scanning Transmission Electron Microscope

Analysis of a Statistical Hypothesis Based Learning Mechanism for Faster crawling

Smart Bilingual Focused Crawling of Parallel Documents