In-Person Poster presentation / top 25% paper

Hidden Markov Transformer for Simultaneous Machine Translation

Shaolei Zhang · Yang Feng

MH1-2-3-4 #47

Keywords: [ Applications ] [ Transformer ] [ Machine Translation ] [ Natural Language Processing ] [ Simultaneous Machine Translation ]

Mon 1 May 7:30 a.m. PDT — 9:30 a.m. PDT
Oral presentation: Oral 2 Track 6: Applications & Social Aspects of Machine Learning
Mon 1 May 6 a.m. PDT — 7:30 a.m. PDT


Simultaneous machine translation (SiMT) outputs the target sequence while still receiving the source sequence, so learning when to start translating each target token is the core challenge of the SiMT task. However, learning the optimal moment among the many possible moments of starting translating is non-trivial, as these moments are hidden inside the model and can only be supervised indirectly through the observed target sequence. In this paper, we propose the Hidden Markov Transformer (HMT), which treats the moments of starting translating as hidden events and the target sequence as the corresponding observed events, thereby organizing them as a hidden Markov model. HMT explicitly models multiple moments of starting translating as candidate hidden events and then selects one to generate each target token. During training, by maximizing the marginal likelihood of the target sequence over multiple moments of starting translating, HMT learns to start translating at the moments at which the target tokens can be generated more accurately. Experiments on multiple SiMT benchmarks show that HMT outperforms strong baselines and achieves state-of-the-art performance.
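The training objective described above can be illustrated with a toy forward-algorithm computation. The sketch below is a minimal, assumption-laden illustration, not the paper's implementation: it assumes the hidden state for target position i is the index of the source-prefix "moment" at which translation of that token starts, that moments are non-decreasing across target positions (the source is read monotonically), and that the emission and transition tables are given by hand. In HMT itself these distributions are parameterized by a Transformer; the names `marginal_likelihood`, `emission`, `transition`, and `init` are illustrative choices here.

```python
import numpy as np

def marginal_likelihood(emission, transition, init):
    """Forward algorithm: marginalize the target likelihood over all
    monotone sequences of start-translating moments.

    emission[i, k]   = p(y_i | start moment k), shape (T, K)
    transition[j, k] = p(moment k at step i | moment j at step i-1);
                       zero for k < j so moments never move backwards
    init[k]          = p(first moment = k)

    Returns p(y_1..y_T) summed over all hidden moment sequences,
    which is the quantity HMT-style training would maximize.
    """
    T, _ = emission.shape
    alpha = init * emission[0]                    # alpha_1(k)
    for i in range(1, T):
        # alpha_i(k) = sum_j alpha_{i-1}(j) * transition[j, k] * emission[i, k]
        alpha = (alpha @ transition) * emission[i]
    return alpha.sum()
```

Because the sum over hidden sequences factorizes along the Markov chain, this dynamic program costs O(T·K^2) rather than enumerating the exponentially many moment sequences explicitly.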
