Cite this article: HE Wei, LIU Ting. Parse-realize based paraphrasing and SMT corpus enriching[J]. Journal of Harbin Institute of Technology, 2013, 45(5): 45. DOI: 10.11918/j.issn.0367-6234.2013.05.009
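Abstract:
To address the insufficient coverage of reordering phenomena in statistical machine translation (SMT) training corpora, the corpus is expanded by paraphrasing. A paraphrasing method based on dependency parsing and sentence generation is proposed. Each sentence is parsed into a dependency tree, from which multiple natural-language sentences are then generated. Compared with the original sentence, the generated sentences make no lexical changes but may vary in word order. Experiments show that, without introducing any additional resources, the method effectively alleviates the corpus coverage problem and improves machine translation quality.
Keywords: paraphrasing; statistical machine translation; dependency parsing; sentence generation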
DOI: 10.11918/j.issn.0367-6234.2013.05.009
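Funding: General Program of the National Natural Science Foundation of China (6,2); Major Program of the National High Technology Research and Development Program of China (2011AA01A207).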
Parse-realize based paraphrasing and SMT corpus enriching
HE Wei, LIU Ting
(School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China)
Abstract:
To address the low coverage of statistical machine translation (SMT) training corpora, a paraphrasing method based on dependency parsing and sentence realization is proposed. The input sentence is first parsed into a dependency tree, which is then realized as multiple natural-language sentences. The generated sentences keep the same lexical words as the input, but their word order is rearranged. Experiments show that the method can be used to enlarge the bilingual training corpus for SMT and that it effectively relieves the low-coverage problem without any extra resources, thereby improving translation quality.
Key words: paraphrasing; statistical machine translation; dependency parsing; sentence realization
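The abstract describes two steps: parse a sentence into a dependency tree, then realize the tree as several sentences that reuse exactly the same words in different orders. The authors' realizer is not specified here, so the Python sketch below is only an assumed, brute-force illustration of the candidate space: it treats each head word and the subtree of each of its dependents as contiguous blocks (a projectivity assumption) and enumerates every ordering of those blocks. The head-index encoding of the tree and the function names `children_of`, `linearize` and `realize_all` are hypothetical.

```python
from itertools import permutations, product
from typing import Dict, List


def children_of(heads: List[int]) -> Dict[int, List[int]]:
    """Map each word index to its dependents; heads[i] == -1 marks the root."""
    kids: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for dep, head in enumerate(heads):
        if head >= 0:
            kids[head].append(dep)
    return kids


def linearize(node: int, kids: Dict[int, List[int]], words: List[str]) -> List[List[str]]:
    """Return every projective word order of the subtree rooted at `node`.

    The head word and each dependent's subtree are blocks that may appear
    in any order; every block stays contiguous (projectivity assumption).
    """
    # Alternative orderings for each dependent's subtree, computed recursively.
    child_options = [linearize(c, kids, words) for c in kids[node]]
    n = len(child_options)  # blocks 0..n-1 are dependents, block n is the head
    results: List[List[str]] = []
    for order in permutations(range(n + 1)):
        for choice in product(*child_options):
            seq: List[str] = []
            for block in order:
                seq.extend([words[node]] if block == n else choice[block])
            results.append(seq)
    return results


def realize_all(words: List[str], heads: List[int]) -> List[str]:
    """Enumerate candidate word orders for the whole tree (no scoring or filtering)."""
    kids = children_of(heads)
    root = heads.index(-1)
    return [" ".join(seq) for seq in linearize(root, kids, words)]


# Toy tree: "bought" is the root; "I", "book", "yesterday" depend on it; "a" depends on "book".
candidates = realize_all(["I", "bought", "a", "book", "yesterday"], [1, -1, 3, 1, 1])
print(len(candidates))
```

For this toy tree the sketch yields 48 candidates, including "yesterday I bought a book", but also ill-formed orders such as "I bought book a yesterday". A practical realizer would still have to score and filter the candidates (for instance with a language model), and full enumeration grows factorially with the number of dependents per head.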
|