TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang

DOI: https://doi.org/10.48550/arXiv.2003.07000

2020-03-16

Abstract: Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, bidirectional Long Short-Term Memory (BLSTM) has been the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture denoted as Transformer with BLSTM (TRANS-BLSTM) which has a BLSTM layer integrated to each transformer block, leading to a joint modeling framework for transformer and BLSTM. We show that TRANS-BLSTM models consistently lead to improvements in accuracy compared to BERT baselines in GLUE and SQuAD 1.1 experiments. Our TRANS-BLSTM model obtains an F1 score of 94.01% on the SQuAD 1.1 development dataset, which is comparable to the state-of-the-art result.

Computation and Language, Machine Learning, Sound, Audio and Speech Processing

What problem does this paper attempt to address?

This paper asks how the Transformer and the bidirectional Long Short-Term Memory network (BLSTM) can be combined into a more powerful model architecture for natural language processing tasks. The authors propose Transformer with BLSTM (TRANS-BLSTM), which integrates a BLSTM layer into each Transformer block to form a joint modeling framework. The aim is to draw on the respective strengths of the Transformer and the BLSTM to improve performance on tasks such as sentence classification, machine translation, and question answering.

The motivation is that Bidirectional Encoder Representations from Transformers (BERT) has achieved state-of-the-art performance on many natural language processing tasks, while before the transformer era the BLSTM was the dominant architecture for neural machine translation and question answering. Since both models have performed well on benchmarks, the authors ask whether combining them can outperform either architecture on its own. To answer this, they evaluate TRANS-BLSTM on GLUE and SQuAD 1.1, where it consistently improves accuracy over BERT baselines. On the SQuAD 1.1 development set, TRANS-BLSTM obtains an F1 score of 94.01%, which is comparable to the state-of-the-art result at the time.
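One way to read "a BLSTM layer in each Transformer block" is sketched below. This is an illustrative assumption and not a formulation confirmed by the abstract: the symbols X, A, B, H, the projection W_p, and the choice to sum the BLSTM branch with the feed-forward branch before layer normalization are all hypothetical.

```latex
% Hypothetical sketch: where the BLSTM branch attaches is an assumption.
% X \in \mathbb{R}^{n \times d}: input to one block (n tokens, hidden size d).
% h: LSTM hidden size per direction, so the BLSTM output has width 2h.
\begin{align}
A &= \mathrm{LayerNorm}\bigl(X + \mathrm{MultiHeadAttn}(X)\bigr) \\
B &= \mathrm{BLSTM}(X)\, W_p, \qquad W_p \in \mathbb{R}^{2h \times d} \\
H &= \mathrm{LayerNorm}\bigl(A + \mathrm{FFN}(A) + B\bigr)
\end{align}
```

Because a BLSTM concatenates its forward and backward hidden states, its output has width 2h, and the assumed projection W_p maps it back to the block width d so the branches can be summed. Setting B = 0 recovers the standard post-layer-norm Transformer block used in BERT.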

Chinese Text Classification Using BERT and Flat-Lattice Transformer.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Segatron: Segment-Aware Transformer for Language Modeling and Understanding

Enhancement of Question Answering System Accuracy Via Transfer Learning and BERT

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Language modeling and bidirectional coders representations: an overview of key technologies

Using Bidirectional Transformer-CRF for Spoken Language Understanding.

Transformer with Bidirectional Decoder for Speech Recognition

Unified BERT for Few-shot Natural Language Understanding

Non-autoregressive Transformer-based End-to-end ASR using BERT

Overview of the Transformer-based Models for NLP Tasks

Cross-Domain Sentiment Classification With Bidirectional Contextualized Transformer Language Models

Bidirectional Transformer Reranker for Grammatical Error Correction

Enhancing Chinese-Braille Translation: A Two-Part Approach with Token Prediction and Segmentation Labeling

AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification

Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

Predictive Attention Transformer: Improving Transformer with Attention Map Prediction

Syntax-informed Question Answering with Heterogeneous Graph Transformer

XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding

Block Skim Transformer for Efficient Question Answering