Abstract: We introduce MNISQ, the first large-scale dataset for both the quantum and the classical machine learning communities in the Noisy Intermediate-Scale Quantum (NISQ) era. MNISQ consists of 4,950,000 data points organized into 9 subdatasets. Built from the quantum encoding of classical information (e.g., the MNIST dataset), the dataset is delivered in a dual form: in quantum form, as circuits, and in classical form, as quantum circuit descriptions written in a quantum programming language (QASM). Machine learning research related to quantum computers faces a similarly dual challenge: enhancing machine learning by exploiting the power of quantum computers, while also leveraging state-of-the-art classical machine learning methods to advance quantum computing. We therefore perform circuit classification on our dataset with both quantum and classical models. On the quantum side, we test our circuit dataset with quantum kernel methods and obtain accuracies of up to $97\%$. On the classical side, the quantum mechanical structure underlying the quantum circuit data is not trivial to learn. Nevertheless, we test our dataset on three classical models: the Structured State Space sequence model (S4), Transformer, and LSTM. In particular, the S4 model applied to the tokenized QASM sequences reaches $77\%$ accuracy. These findings suggest that quantum circuit-related datasets are likely to be quantum advantageous, but also that state-of-the-art machine learning methods can competently classify and recognize quantum circuits. Finally, we entrust the quantum and classical machine learning communities with the fundamental challenge of building more quantum-classical datasets like ours and of building future benchmarks from our experiments. The dataset is accessible on GitHub, and its circuits are easily run in qulacs or qiskit.
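
The classical experiments operate on tokenized QASM sequences. The abstract does not specify the tokenization scheme, so the following is only a minimal sketch of one plausible approach, assuming OpenQASM 2.0 text as input. The names `tokenize_qasm`, `build_vocab`, `encode`, and the example circuit are hypothetical and are not taken from the MNISQ code base.

```python
import re

# Hypothetical token pattern (an assumption, not the authors' scheme).
TOKEN_PATTERN = re.compile(
    r"[A-Za-z_][A-Za-z_0-9]*"   # identifiers and keywords (OPENQASM, qreg, cx, q)
    r"|\d+\.\d+|\d+"            # decimal and integer literals
    r'|"[^"]*"'                 # quoted strings such as "qelib1.inc"
    r"|[\[\](){};,+\-*/]"       # punctuation and arithmetic operators
)


def tokenize_qasm(source):
    """Split OpenQASM 2.0 text into a flat list of string tokens."""
    # Drop // line comments before matching tokens.
    stripped = re.sub(r"//[^\n]*", "", source)
    return TOKEN_PATTERN.findall(stripped)


def build_vocab(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map a token list to the integer ids a sequence model would consume."""
    return [vocab[tok] for tok in tokens]


# Illustrative two-qubit circuit; not a sample from the dataset.
EXAMPLE = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
h q[0];
cx q[0],q[1];
"""

tokens = tokenize_qasm(EXAMPLE)
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
```

The resulting integer sequence is the kind of input a sequence classifier such as S4, a Transformer, or an LSTM could take after an embedding layer. A real pipeline would also need to decide how to treat continuous gate parameters, which this sketch simply splits into literal tokens.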

QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing

Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code

QCircuitNet: A Large-Scale Hierarchical Dataset for Quantum Algorithm Design

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models

A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B

Quantum Many-Body Physics Calculations with Large Language Models

MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era

Exploring LLM-Driven Explanations for Quantum Algorithms

Augmenting Math Word Problems via Iterative Question Composing

Application of Large Language Models to Quantum State Simulation

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

Developing Large Language Models for Quantum Chemistry Simulation Input Generation

Quantum Curriculum Learning

Open Source Variational Quantum Eigensolver Extension of the Quantum Learning Machine (QLM) for Quantum Chemistry

AI-Assisted Generation of Difficult Math Questions

LLM-Generated Tips Rival Expert-Created Tips in Helping Students Answer Quantum-Computing Questions

CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions

Unleashing the Potential of LLMs for Quantum Computing: A Study in Quantum Architecture Design

LongIns: A Challenging Long-context Instruction-based Exam for LLMs