Poster
Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models
Huan Zhang · Hai Zhao
Great Hall BC #83
Keywords: [ sequence to sequence ] [ training criteria ]
Sequence-to-sequence (seq2seq) models have become a popular framework for neural sequence prediction. While traditional seq2seq models are trained by Maximum Likelihood Estimation (MLE), much recent work attempts to optimize evaluation scores directly, addressing the mismatch between training and evaluation: model predictions are usually judged by a task-specific metric such as BLEU or ROUGE rather than by perplexity. This paper groups the existing work into two categories, a) minimum divergence and b) maximum margin. Building on an analysis of this work, we introduce a new training criterion and empirically compare models from the two categories. Our experimental results show that the new criterion usually works better than existing methods on both machine translation and sentence summarization.
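The contrast between the two categories can be made concrete with a small sketch. The code below is not the authors' implementation; it is a minimal illustration assuming we already have, for each training example, the log-probability of the gold sequence and the log-probabilities and evaluation scores (e.g. sentence-level BLEU) of a handful of sampled candidate sequences, all as PyTorch tensors.

```python
# Minimal sketch (illustrative only) of the three training criteria mentioned
# in the abstract: MLE, a minimum-divergence (minimum-risk style) objective,
# and a maximum-margin objective. Inputs are assumed to be precomputed.
import torch

def mle_loss(gold_logprob):
    # Maximum Likelihood Estimation: maximize log-probability of the gold sequence.
    return -gold_logprob.mean()

def min_divergence_loss(cand_logprobs, cand_scores):
    # Minimum-divergence style: renormalize the model's probabilities over the
    # sampled candidates and minimize the expected "risk" (1 - evaluation score),
    # pushing probability mass toward high-scoring candidates.
    q = torch.softmax(cand_logprobs, dim=-1)   # renormalized model distribution
    risk = 1.0 - cand_scores                   # lower score -> higher risk
    return (q * risk).sum(dim=-1).mean()

def max_margin_loss(gold_logprob, cand_logprobs, cand_scores, gold_score=1.0):
    # Maximum-margin style: the gold sequence should outscore the most violating
    # candidate by a margin proportional to the evaluation-score gap.
    margin = gold_score - cand_scores
    violation = cand_logprobs + margin - gold_logprob.unsqueeze(-1)
    return violation.max(dim=-1).values.clamp(min=0.0).mean()

# Toy usage: a batch of 2 examples with 3 sampled candidates each.
gold_lp = torch.tensor([-2.0, -1.5])
cand_lp = torch.tensor([[-2.5, -3.0, -1.8], [-2.0, -1.0, -2.2]])
scores  = torch.tensor([[0.4, 0.2, 0.7], [0.5, 0.9, 0.3]])  # e.g. sentence BLEU
print(mle_loss(gold_lp), min_divergence_loss(cand_lp, scores),
      max_margin_loss(gold_lp, cand_lp, scores))
```

The specific risk definition and margin scaling here are assumptions for illustration; the paper's actual criteria may differ in how candidates are sampled and how scores enter the loss.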