Doctoral Thesis

Towards Better Language Models: Algorithms, Architectures, and Applications

Wu, Qingyang

This thesis explores the advancement of language models from three perspectives: Algorithms, Architectures, and Applications. We aim to improve the performance, efficiency, and practical usability of these models. Specifically, we study reinforcement learning for language models, recurrent memory-augmented Transformers, and practical applications in text generation and dialogue systems.

First, we address the limitations of maximum likelihood estimation (MLE), the traditional training algorithm for language models. We propose TextGAIL, a generative adversarial imitation learning framework that combines large pre-trained language models with adversarial training to improve the quality and diversity of generated text. We then explore a modern reinforcement learning from human feedback (RLHF) pipeline to align language model outputs more effectively with human preferences.
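
To make the adversarial training signal concrete, the sketch below illustrates the core idea in PyTorch: a discriminator learns to tell human-written text from model samples, and its detached score serves as the generator's reward. The toy embeddings, dimensions, and module names are illustrative assumptions, not the thesis's actual implementation, which builds on large pre-trained models and policy-gradient optimization.

    import torch
    import torch.nn as nn

    class ToyDiscriminator(nn.Module):
        """Scores a fixed-size sequence embedding; higher = more human-like."""
        def __init__(self, dim: int = 64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, seq_emb: torch.Tensor) -> torch.Tensor:
            return self.net(seq_emb).squeeze(-1)

    disc = ToyDiscriminator()
    human = torch.randn(8, 64)    # stand-ins for embeddings of human-written text
    samples = torch.randn(8, 64)  # stand-ins for embeddings of model samples

    # Discriminator objective: label human text as real (1) and samples as fake (0).
    bce = nn.BCEWithLogitsLoss()
    d_loss = bce(disc(human), torch.ones(8)) + bce(disc(samples), torch.zeros(8))

    # Imitation-learning reward: the discriminator's belief that a sample is
    # human-written, detached so the policy update treats it as a fixed reward.
    reward = torch.sigmoid(disc(samples)).detach()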

Next, we investigate architectural improvements through recurrent memory-augmented Transformers. We first introduce Memformer, an autoregressive model that uses an external dynamic memory for efficient long-sequence processing. Building on Memformer, we propose MemBART, a stateful memory-augmented Transformer encoder-decoder. These recurrent memory-augmented Transformers handle long contexts with better performance and efficiency than standard Transformer architectures.
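
A minimal sketch of this recurrence follows, under toy assumptions (small dimensions and a single attention layer for each of the read and write steps; the real models use full Transformer blocks and learned forgetting). The model processes a long sequence segment by segment, with each segment reading from and writing to a fixed-size memory, so per-step cost stays constant instead of growing with total context length.

    import torch
    import torch.nn as nn

    class RecurrentMemoryBlock(nn.Module):
        def __init__(self, dim: int = 64, slots: int = 8, heads: int = 4):
            super().__init__()
            self.memory_init = nn.Parameter(torch.randn(slots, dim))  # learned initial state
            self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, segment, memory):
            # Read: segment tokens attend to the memory for long-range context.
            ctx, _ = self.read(segment, memory, memory)
            # Write: memory slots attend to the segment to form the next state.
            new_memory, _ = self.write(memory, segment, segment)
            return segment + ctx, new_memory

    block = RecurrentMemoryBlock()
    memory = block.memory_init.unsqueeze(0).repeat(2, 1, 1)  # batch of 2
    for segment in torch.randn(5, 2, 16, 64):                # 5 segments of 16 tokens
        out, memory = block(segment, memory)                 # cost per step: O(segment x memory)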

Finally, we make several contributions to applying language models effectively to dialogue systems in practice. We design task-oriented dialogue systems that leverage pre-trained language models to significantly reduce the need for human annotations. We also introduce DiactTOD, a novel approach that improves the out-of-distribution generalization of dialogue act-controlled generation in task-oriented systems. We further expand the scope of traditional task-oriented dialogue systems with a novel paradigm that uses external knowledge tools to provide more accurate knowledge. Our penultimate application tackles the data-scarcity problem common in real-world dialogue systems, for which we propose an automatic data augmentation technique that improves training efficacy. Lastly, we improve end-user experiences with FaceChat, a multimodal dialogue framework that enables emotionally sensitive, face-to-face interactions and demonstrates the potential of multimodal language models across applications.
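
As one concrete illustration, dialogue act-controlled generation can be sketched by prepending act labels to the dialogue context, so that a single model realizes controllably different system behaviors. The sketch below is a hedged approximation using the Hugging Face Transformers API: the t5-small checkpoint, the act strings, and the input format are assumptions for illustration, not DiactTOD's exact interface.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    history = "user: I need a cheap hotel in the city centre."
    acts = "[request price] [inform area]"  # dialogue acts steering the reply

    # Conditioning on the acts lets one model produce different system behaviors.
    inputs = tokenizer(f"acts: {acts} context: {history}", return_tensors="pt")
    reply_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))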

Our work highlights the significance of building better language models and demonstrates how these improvements can positively impact a wide range of downstream tasks and applications. This thesis contributes valuable insights and methodologies for developing more powerful and efficient models.

Files

  • Wu_columbia_0054D_18787.pdf (application/pdf, 3.13 MB)

More About This Work

Academic Units
Computer Science
Thesis Advisors
Yu, Zhou
Degree
Ph.D., Columbia University
Published Here
September 25, 2024