View publication

Voice assistants increasingly use on-device Automatic Speech Recognition (ASR) to ensure speed and privacy. However, due to resource constraints on the device, queries pertaining to complex information domains often require further processing by a search engine. For such applications, we propose a novel Transformer based model capable of rescoring and rewriting, by exploring full context of the N-best hypotheses in parallel. We also propose a new discriminative sequence training objective that can work well for both rescore and rewrite tasks. We show that our Rescore+Rewrite model outperforms the Rescore-only baseline, and achieves up to an average 8.6% relative Word Error Rate (WER) reduction over the ASR system by itself.

Related readings and updates.

Accurate Knowledge Distillation via N-best Reranking

We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016) where we extract pseudo-labels for student model’s training data from top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions or architectures, including some publicly-available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal…
See paper details

Leveraging Large Language Models for Exploiting ASR Uncertainty

With the help of creative prompt engineering and in-context learning, large language models (LLMs) are known to generalize well on a variety of text-based natural language processing (NLP) tasks. However, for performing well on spoken language understanding (SLU) tasks, LLMs either need to be equipped with in-built speech modality or they need to rely on speech-to-text conversion from an off-the-shelf automation speech recognition (ASR) system…
See paper details