Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint

Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Yang Qiu, YuanKai Zhang, Jie Han, Yixiong Zou
DOI: https://doi.org/10.1145/3580305.3599299
2023-06-24
Abstract: A self-explaining rationalization model is generally constructed by a cooperative game where a generator selects the most human-intelligible pieces from the input text as rationales, followed by a predictor that makes predictions based on the selected rationales. However, such a cooperative game may incur the degeneration problem where the predictor overfits to the uninformative pieces generated by a not yet well-trained generator and in turn, leads the generator to converge to a sub-optimal model that tends to select senseless pieces. In this paper, we theoretically bridge degeneration with the predictor's Lipschitz continuity. Then, we empirically propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor, to address the problem of degeneration. The main idea of DR is to decouple the generator and predictor to allocate them with asymmetric learning rates. A series of experiments conducted on two widely used benchmarks have verified the effectiveness of the proposed method. Codes: https://github.com/jugechengzi/Rationalization-DR.
Machine Learning, Computation and Language
### What problem does this paper attempt to solve?

This paper addresses the degeneration problem that arises when the generator and the predictor of a rationalization framework are trained as a cooperative game. Specifically:

1. **The degeneration problem**:
   - In the standard rationalization framework RNP (Rationalizing Neural Predictions), the generator selects the most human-intelligible pieces of the input text as rationales, and the predictor then makes predictions based on these rationales.
   - This cooperative game can degenerate: the predictor may overfit to uninformative pieces produced by a not-yet-well-trained generator, which in turn leads the generator to converge to a sub-optimal model that tends to select meaningless pieces as rationales.
2. **The relationship between Lipschitz continuity and degeneration**:
   - Through theoretical analysis, the paper links degeneration to the Lipschitz continuity of the predictor. The authors argue that a smaller Lipschitz constant makes the predictor more robust and less affected by uninformative candidate pieces, which reduces the occurrence of degeneration.
3. **The proposed method, Decoupled Rationalization (DR)**:
   - To address degeneration, the authors propose a simple but effective method called Decoupled Rationalization (DR). It decouples the generator and the predictor by giving them asymmetric learning rates.
   - Specifically, DR sets the learning rate of the predictor lower than that of the generator, which flexibly restrains the Lipschitz constant of the predictor with respect to the selected rationales without requiring a manually chosen truncation value.
4. **Experimental verification**:
   - The authors ran experiments on two widely used benchmarks. The results show that DR significantly improves the performance of the standard rationalization framework RNP without changing RNP's structure, and that it outperforms several recent state-of-the-art methods.

### Summary

This paper targets the degeneration problem in the rationalization framework. It connects degeneration to the Lipschitz continuity of the predictor, proposes Decoupled Rationalization (DR) as a remedy, and supports the method's effectiveness with experiments.
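
The asymmetric learning-rate idea can be illustrated with a deliberately tiny sketch. It is not the authors' implementation and does not use their model: the scalar "generator" parameter `a`, the scalar "predictor" parameter `w`, the squared loss, and every numeric value below are assumptions made only for illustration. The toy relies on one fact: for a one-dimensional linear predictor f(z) = w * z, the Lipschitz constant with respect to z is |w|. Both players minimise the same loss, and the only difference between the two runs is the predictor's learning rate.

```python
# Toy cooperative game (hypothetical, for illustration only):
#   "generator":  z = a * x      (a acts like a soft selection weight)
#   "predictor":  y_hat = w * z  (its Lipschitz constant w.r.t. z is |w|)
# Both parameters are updated by plain gradient descent on the same loss,
# but each player has its own learning rate.

def train(lr_generator, lr_predictor, steps=2000, x=1.0, y=2.0):
    """Run gradient descent on 0.5 * (w * a * x - y)^2 with per-player rates."""
    a = 0.5  # toy generator parameter (assumed initial value)
    w = 0.1  # toy predictor parameter (assumed initial value)
    for _ in range(steps):
        z = a * x                # "rationale" passed to the predictor
        err = w * z - y          # prediction error
        grad_w = err * z         # d loss / d w
        grad_a = err * w * x     # d loss / d a
        w -= lr_predictor * grad_w
        a -= lr_generator * grad_a
    return a, w


# Symmetric baseline: both players share one learning rate.
a_sym, w_sym = train(lr_generator=0.05, lr_predictor=0.05)

# Asymmetric setting: the predictor learns ten times more slowly.
a_asym, w_asym = train(lr_generator=0.05, lr_predictor=0.005)

print("symmetric : a = %.3f, |w| = %.3f" % (a_sym, abs(w_sym)))
print("asymmetric: a = %.3f, |w| = %.3f" % (a_asym, abs(w_asym)))
```

In this toy both runs fit the target, but the run with the slower predictor ends with a smaller |w|, meaning more of the fit is carried by the generator's parameter. This only mirrors the intuition behind DR on a two-parameter problem; it says nothing about how large the effect is on real rationalization models.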