Cite this article: | QIU Xi-peng (邱锡鹏), MIAO You-dong (缪有栋), HUANG Xuan-jing (黄萱菁). Constructing Chinese question classification dataset with active learning[J]. Journal of Harbin Institute of Technology (哈尔滨工业大学学报), 2012, 44(5): 125. DOI:10.11918/j.issn.0367-6234.2012.05.025 |
|
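Abstract: |
Corpora for open-domain question answering are small and cannot meet the training needs of question
classification. To address this, an active learning method is used to construct a Chinese question
classification dataset: question categories are annotated following the active learning procedure, and
an active feature selection method is used to improve performance. Experimental results show that the
active learning method converges quickly to the best accuracy (85%), and that the feature set shrinks
markedly when manually annotated features are used. The active-learning-based annotation method achieves
good classification performance while requiring little manual annotation, and to some extent it can also
noticeably improve the accuracy of question classification. |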
Keywords: active learning; Passive Aggressive algorithm; feature selection; Chinese question classification |
DOI:10.11918/j.issn.0367-6234.2012.05.025 |
CLC number: TP391 |
Funding: National Natural Science Foundation of China (61003091, 61073069) |
|
Constructing Chinese question classification dataset with active learning |
QIU Xi-peng,MIAO You-dong,HUANG Xuan-jing
|
Abstract: |
Current question classification corpora are relatively small and cannot meet the practical needs of
question answering systems. We therefore use active learning to construct a Chinese question
classification dataset and to label questions, and we improve labeling performance with feature
selection. Experimental results show that active learning converges quickly to the best accuracy (85%),
and that using manually tagged features yields a markedly smaller feature set. The active-learning-based
labeling method achieves good classification performance with less manual annotation and can, to some
degree, improve classification accuracy. |
Key words: active learning; passive aggressive; feature selection; Chinese question classification |
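
The abstract names two ingredients, active learning and a Passive Aggressive classifier, without giving details of how they are combined. The following is a minimal sketch of one common combination, not the authors' implementation: pool-based active learning with margin-based uncertainty sampling, driving a multiclass Passive-Aggressive (PA-I) learner. The query strategy, the aggressiveness parameter `C`, the number of retraining epochs, the toy one-hot feature vectors, and all function names (`pa_update`, `margin`, `active_learn`, `oracle`) are assumptions made for illustration.

```python
def scores(weights, x):
    """Return one linear score per class for the feature vector x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def predict(weights, x):
    """Predict the class with the highest linear score."""
    s = scores(weights, x)
    return max(range(len(s)), key=s.__getitem__)


def pa_update(weights, x, y, C=1.0):
    """Apply one multiclass PA-I update in place for the labeled example (x, y)."""
    s = scores(weights, x)
    # Highest-scoring incorrect class (the strongest competitor of y).
    r = max((k for k in range(len(s)) if k != y), key=s.__getitem__)
    loss = max(0.0, 1.0 - (s[y] - s[r]))
    sq_norm = sum(v * v for v in x)
    if loss == 0.0 or sq_norm == 0.0:
        return  # passive step: the margin constraint is already satisfied
    tau = min(C, loss / (2.0 * sq_norm))  # aggressive step, capped by C
    for i, v in enumerate(x):
        weights[y][i] += tau * v
        weights[r][i] -= tau * v


def margin(weights, x):
    """Gap between the two best class scores; a small gap means an uncertain example."""
    s = sorted(scores(weights, x), reverse=True)
    return s[0] - s[1]


def active_learn(pool, oracle, n_classes, budget, epochs=5):
    """Query `budget` labels, always asking about the least certain example."""
    dim = len(pool[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    unlabeled = list(range(len(pool)))
    labeled = []
    for _ in range(min(budget, len(pool))):
        # Uncertainty sampling: smallest margin under the current model.
        idx = min(unlabeled, key=lambda i: margin(weights, pool[i]))
        unlabeled.remove(idx)
        labeled.append((pool[idx], oracle(idx)))  # simulated human annotator
        # Retrain on everything labeled so far.
        for _ in range(epochs):
            for x, y in labeled:
                pa_update(weights, x, y)
    return weights, labeled


# Toy pool (assumed data): 30 examples, 3 classes, one active feature per class.
pool = []
for i in range(30):
    x = [0.0, 0.0, 0.0]
    x[i % 3] = 0.5 + (i % 5) * 0.25
    pool.append(x)


def oracle(i):
    """Stand-in for the human annotator: returns the true class of pool[i]."""
    return i % 3


weights, labeled = active_learn(pool, oracle, n_classes=3, budget=6)
accuracy = sum(predict(weights, x) == oracle(i) for i, x in enumerate(pool)) / len(pool)
print("labels requested:", len(labeled), "pool accuracy:", accuracy)
```

On this toy pool the learner spends its first queries on classes it has not yet seen, because their examples have zero margin. This illustrates how uncertainty sampling can reach good accuracy with few labels. Real question data would need text features (for example, word or character n-grams) in place of the toy vectors.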