Abstract

Purpose: Recent years have witnessed the rapid development of massive open online courses (MOOCs). As more and more courses are produced by instructors and taken by learners all over the world, an unprecedented volume of educational resources is being aggregated. On the teaching side, these resources include videos, subtitles, lecture notes, quizzes, etc.; on the learning side, they include forum content, wikis, learning-behavior logs, homework logs, etc. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from these resources is important. This paper aims to adapt state-of-the-art keyword extraction techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also attempts, for the first time, to answer two questions: to what extent can MOOC resources support keyword extraction models, and how much human effort is required to make the models work well?

Design/methodology/approach: Depending on which side generates the data, i.e. instructors or learners, the data are classified as teaching resources or learning resources, respectively. The approach used on teaching resources is based on supervised machine learning models trained with labels, while the approach used on learning resources is based on a graph model that requires no labels.

Findings: On the teaching resources, the authors' methods can accurately extract keywords with only 10 per cent of the data labeled. The authors also find that resources of different forms, e.g. subtitles and PPTs, should be considered separately because they differ in their modeling ability. On the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they reflect the topics that are actively discussed in the forums, which gives instructors useful feedback.
The authors implement two applications with the extracted keywords: generating concept maps and generating learning paths. The visual demos show that these applications have the potential to improve learning efficiency when integrated into a real MOOC platform.

Research limitations/implications: Conducting keyword extraction on MOOC resources is difficult because teaching resources are hard to obtain due to copyright restrictions. Getting labeled data is also tough because it usually requires expertise in the corresponding domain.

Practical implications: The experimental results indicate that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.

Originality/value: This paper presents a pioneering study of keyword extraction on MOOC resources and reports several new findings.
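
The abstract does not specify the graph model used on learning resources. As an illustration only, the sketch below uses a TextRank-style ranking, a well-known label-free graph method swapped in here as an assumption and not necessarily the authors' model: words become nodes, words co-occurring within a small window are linked, and PageRank scores rank candidate keywords. The stopword list, window size, damping factor, function names, and sample forum posts are all hypothetical choices.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "in", "is", "it",
    "my", "of", "on", "or", "the", "this", "that", "to", "too", "when", "with",
}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_cooccurrence_graph(tokens, window=2):
    """Link distinct words that fall inside a sliding window of `window` consecutive tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, max_iter=100, tol=1e-6):
    """Unweighted PageRank on an undirected graph (the ranking step of TextRank)."""
    nodes = list(graph)
    if not nodes:
        return {}
    scores = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        new_scores = {
            n: (1 - damping) + damping * sum(scores[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
        delta = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


def extract_keywords(text, top_k=5, window=2):
    """Return the top_k words by PageRank score; ties are broken alphabetically."""
    scores = pagerank(build_cooccurrence_graph(tokenize(text), window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    # Hypothetical forum posts; joining them links words across post boundaries,
    # which is a simplification of this sketch.
    posts = [
        "Gradient descent converges slowly when the learning rate is too small.",
        "Is the learning rate schedule covered in the gradient descent lecture?",
        "My gradient descent code diverges with a large learning rate.",
    ]
    print(extract_keywords(" ".join(posts), top_k=3))
```

Because this kind of ranking needs no annotated keywords, it fits the label-free setting described for forum content; a supervised model, as used on teaching resources, would instead need a labeled training set.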