Parse-realize based paraphrasing and SMT corpus enriching

doi:10.11918/j.issn.0367-6234.2013.05.009

Published in: 2013, Vol. 45, No. 5, pp. 45-50.

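Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China
About the authors: He Wei (和为, born 1982), male, Ph.D. candidate; Liu Ting (刘挺, born 1972), male, professor and doctoral supervisor.
Funding: National Natural Science Foundation of China, General Program (6,2); National High Technology Research and Development Program of China, Major Program (2011AA01A207).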


Abstract:

To address the insufficient coverage of reordering phenomena in statistical machine translation (SMT) training corpora, the corpus is enlarged by paraphrasing. A paraphrasing method based on dependency parsing and sentence realization is proposed: the input sentence is first parsed into a dependency tree, and the tree is then realized into multiple natural language sentences. The generated sentences contain exactly the same words as the original, but their word order may differ. Experiments show that, without introducing any extra resources, the method effectively alleviates the low-coverage problem of the training corpus and improves translation quality.
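The parse-then-realize idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Node` class, the hand-built tree, and the exhaustive enumeration of head/dependent orders stand in for the dependency parser and the sentence realizer, whose actual design the abstract does not describe.

```python
from dataclasses import dataclass, field
from itertools import permutations, product
from typing import List


@dataclass
class Node:
    """Hypothetical dependency-tree node: a word plus its dependents."""
    word: str
    children: List["Node"] = field(default_factory=list)


def linearize(node: Node) -> List[List[str]]:
    """Return every projective word order of the subtree rooted at `node`."""
    # Realize each dependent subtree independently.
    child_options = [linearize(child) for child in node.children]
    results = []
    # Pick one realization per dependent ...
    for choice in product(*child_options):
        units = [[node.word], *choice]
        # ... then try every relative order of the head and its dependents.
        for order in permutations(units):
            results.append([w for unit in order for w in unit])
    return results


def paraphrase(root: Node) -> List[str]:
    """Join each word order into a sentence, dropping duplicates."""
    seen, out = set(), []
    for words in linearize(root):
        sentence = " ".join(words)
        if sentence not in seen:
            seen.add(sentence)
            out.append(sentence)
    return out


# Hand-built tree for "I met him yesterday" (assumed parse, head = "met").
tree = Node("met", [Node("I"), Node("him"), Node("yesterday")])
variants = paraphrase(tree)
```

Each variant keeps the original words and changes only their order, which matches the property stated in the abstract. A practical realizer would not keep all candidates; it would presumably score them (for example with a language model) and retain only the fluent ones before pairing them with the original target-side sentence.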

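Cite this article:

He Wei, Liu Ting. Parse-realize based paraphrasing and SMT corpus enriching[J]. Journal of Harbin Institute of Technology, 2013, 45(5): 45-50. DOI:10.11918/j.issn.0367-6234.2013.05.009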

Published online: 2013-05-30