Abstract: The recent discoveries of new forms of quantum statistics require a close look at the underlying Fock space structure. This exercise becomes all the more important in order to provide a general classification scheme for various forms of statistics and to establish interconnections among them wherever possible. We formulate a theory of generalized Fock spaces, which has a three-tiered structure consisting of Fock space, statistics and algebra. This general formalism unifies various forms of statistics and algebras that were earlier considered to describe different systems. In addition, the formalism allows us to construct many new kinds of quantum statistics and the associated algebras of creation and destruction operators. Some of these are: orthostatistics, null statistics or statistics of frozen order, quantum group based statistics and its many avatars, and `doubly-infinite' statistics. The emergence of new forms of quantum statistics for particles interacting with singular potentials is also highlighted.
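
For orientation, the familiar algebras of creation and destruction operators that a unifying scheme of this kind would have to accommodate can be written down explicitly. The abstract gives no formulas, so the relations below are standard textbook special cases and not the paper's generalized algebra; the symbols $a_i$, $a_j^\dagger$ and the deformation parameter $q$ are notation assumed here.

$$
\begin{aligned}
\text{Bose:}\quad & a_i a_j^\dagger - a_j^\dagger a_i = \delta_{ij},\\
\text{Fermi:}\quad & a_i a_j^\dagger + a_j^\dagger a_i = \delta_{ij},\\
\text{infinite statistics:}\quad & a_i a_j^\dagger = \delta_{ij},\\
q\text{-deformed:}\quad & a_i a_j^\dagger - q\, a_j^\dagger a_i = \delta_{ij}, \qquad -1 \le q \le 1 .
\end{aligned}
$$

The $q$-deformed relation reduces to the Bose, Fermi and infinite-statistics cases at $q = 1$, $q = -1$ and $q = 0$ respectively.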


### What problem does this paper attempt to solve?

This paper aims to solve the problem of the lack of available annotated data in the field of natural language processing (NLP), especially in new languages or new domains. Specifically, the author explores two different methods to develop a base noun phrase chunker:

1. **Manual rule writing**: Build the system by writing rules by hand.
2. **Annotation-based learning**: Train the system through active learning and real-time human annotation.

#### Research background

- **High cost of data acquisition**: Collecting and annotating data is both time-consuming and expensive. For example, the construction of the Penn Treebank has significantly improved the performance of English systems, but for new languages, similar investments in time and money are often not feasible.
- **Rule writing vs. data annotation**: In the face of the cost of data acquisition, rationalists may think that writing manual rules is more cost-effective than annotating data. Therefore, the question the author tries to answer is: under the given cost assumptions, which method is the most effective?

#### Main research questions

- **Compare the cost-effectiveness of different methods**: The paper presents a comprehensive empirical comparison, evaluating the efficiency and success rate of the two methods under the same manpower input.
- **Introduce new active learning algorithms**: Explore several novel active learning variants and propose a comparative cost model for cross-modal machine learning.
- **Experimental verification**: Compare the manual rule-writing and annotation-based learning methods through experiments, and analyze their performance under different conditions.

#### Experimental setup

- **Base noun phrase chunking task**: All experiments are carried out on the base noun phrase chunking task.
- **Initial corpus**: Use part of the data from the Wall Street Journal Treebank as the initial corpus.
- **Active learning framework**: Adopt the query by committee method, combined with a batch selection strategy.
- **Rule-writing experiment**: Have computer science students write rules and record each student's rule-modification process.

#### Key findings

- **Advantages of active learning**: The results show that the annotation method based on active learning is more efficient and more successful than manual rule-writing on multiple indicators.
- **Cost-model analysis**: Propose a comprehensive cost model for evaluating the time and monetary costs of different methods.

### Summary

The core problem of this paper is to explore how to build an efficient base noun phrase chunker at the lowest cost under limited human resources. Through empirical research, the author proves that the annotation-based active learning method is superior to manual rule-writing in most cases, especially in the application on large-scale data sets.
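
The query by committee method with batch selection mentioned in the experimental setup can be illustrated with a minimal sketch. The summary does not specify the disagreement measure, the committee members, or the batch size, so the vote-entropy score, the toy classifiers, and the names `vote_entropy` and `select_batch` are assumptions made purely for illustration, not the paper's actual setup.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement among committee votes for one example (0 means unanimous)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_batch(unlabeled, committee, batch_size):
    """Return the batch_size examples on which the committee disagrees most."""
    scored = [
        (vote_entropy([member(x) for member in committee]), i)
        for i, x in enumerate(unlabeled)
    ]
    # Highest disagreement first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [unlabeled[i] for _, i in scored[:batch_size]]


# Hypothetical committee: three toy classifiers that tag a token as
# inside ("I") or outside ("O") a base noun phrase.
committee = [
    lambda tok: "I" if tok[0].isupper() else "O",
    lambda tok: "I" if len(tok) > 3 else "O",
    lambda tok: "I" if tok.endswith("s") else "O",
]

pool = ["the", "Markets", "rose", "stocks", "IBM"]
batch = select_batch(pool, committee, batch_size=2)
print(batch)
```

Only the selected batch would be sent to a human annotator; the committee is then retrained on the enlarged labeled set and the selection step repeats.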

Generalized Fock Spaces and New Forms of Quantum Statistics
