Author Name | Affiliation |
PEI Bing-zhen | Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China, peibzgz@163.com; College of Computer Science and Technology, Guizhou University, Guiyang 550025, China |
CHEN Xiao-rong | College of Computer Science and Technology, Guizhou University, Guiyang 550025, China |
HU Yi | Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China, peibzgz@163.com |
LU Ru-zhan | Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China, peibzgz@163.com |
Abstract: |
This article proposes a new, general, and highly efficient algorithm for extracting domain terminology. The algorithm is domain-independent and applies multiple layers of filters, combining statistical and rule-based methods. Exploiting the features of domain terminology and the characteristics unique to Chinese, it first generates multi-word unit (MWU) candidates and then filters those candidates using multiple strategies. Our test results show that the algorithm is feasible and effective. |
Key words: domain terminology; multi-word unit (MWU); automatic extraction; filter |
DOI:10.11916/j.issn.1005-9113.2009.02.029 |
CLC Number: TP391 |
Fund: |
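The two-stage design summarized in the abstract (generate MWU candidates first, then pass them through several filter layers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the statistics or rules used, so the frequency threshold, the stop-word boundary rule, the function names, and the use of pre-segmented English tokens in place of segmented Chinese text are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; the abstract does not give the actual rules.
STOP_WORDS = {"the", "of", "a", "is", "and"}


def generate_candidates(sentences, max_len=3):
    """Count every contiguous n-gram (2 <= n <= max_len) as an MWU candidate.

    `sentences` is a list of token lists; word segmentation is assumed
    to have been done beforehand.
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequency_filter(counts, min_freq=2):
    """Statistical layer (assumed): keep candidates seen at least min_freq times."""
    return {cand: freq for cand, freq in counts.items() if freq >= min_freq}


def boundary_filter(counts):
    """Rule layer (assumed): drop candidates that start or end with a stop word."""
    return {
        cand: freq
        for cand, freq in counts.items()
        if cand[0] not in STOP_WORDS and cand[-1] not in STOP_WORDS
    }


def extract_terms(sentences, max_len=3, min_freq=2):
    """Generate candidates, then apply each filter layer in sequence."""
    candidates = generate_candidates(sentences, max_len)
    candidates = frequency_filter(candidates, min_freq)
    candidates = boundary_filter(candidates)
    return sorted(candidates)


corpus = [
    ["the", "neural", "network", "is", "deep"],
    ["a", "neural", "network", "model"],
]
terms = extract_terms(corpus)
```

Chaining the filters as separate functions keeps each layer independent, so further statistical or rule-based layers could be added or reordered without touching candidate generation.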