Cite this article: QIAN Libing, JI Zhenzhou, WU Hao. An improved model of distributed search engine[J]. Journal of Harbin Institute of Technology, 2014, 46(7): 8. DOI:10.11918/j.issn.0367-6234.2014.07.002

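Abstract:
To address the search performance problems of traditional distributed search engines, the traditional model is improved in its index structure and query algorithm. A non-centralized, highly parallel search model is proposed: the model classifies the index by document topic, uses a bitmap structure for longer inverted record lists, and applies multithreading to implement a parallel search algorithm (multi max score heap, MMSH) on the index nodes. Experimental results show that the index classification method and the bitmap strategy for the inverted list structure make Merge-layer queries more targeted and reduce the CPU and memory overhead of Merge-layer nodes. When the inverted lists cannot be held entirely in memory, the MMSH algorithm achieves highly parallel querying; its query efficiency is higher than that of the classical term-at-a-time algorithm, which shortens the average search time and raises system throughput. Index classification, the bitmap structure, and the parallel query algorithm avoid blind querying and improve the performance of distributed search engines.
Keywords: distributed engine; index classification; inverted structure; parallel search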
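DOI:10.11918/j.issn.0367-6234.2014.07.002
CLC number: TP393
Foundation items: National Natural Science Foundation of China (61173024); Guangdong Province and Ministry Industry-University-Research Cooperation Fund (2011A090200037).
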
An improved model of distributed search engine
QIAN Libing, JI Zhenzhou, WU Hao
(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

Abstract:
To address the search performance problems of traditional distributed search engines, a non-centralized, highly parallel search model is proposed that improves the traditional model in its index structure and search algorithm. In the model, the index is classified according to document theme, a bitmap structure is employed for longer inverted record lists, and a parallel search algorithm (multi max score heap, MMSH) is implemented on the index nodes using multithreading. Experimental results show that index classification and the bitmap strategy for the inverted list structure make searches in the Merge layer more targeted and reduce the CPU and memory cost of Merge-layer nodes. When the inverted list cannot be stored completely in memory, the MMSH algorithm achieves highly parallel search, and its query efficiency is higher than that of the classical term-at-a-time algorithm, which shortens the average search time and improves system throughput. Index classification, the bitmap structure, and the parallel query algorithm avoid blind querying and improve the performance of distributed search engines.
Key words: distributed indexing; index classification; inverted structure; parallel search
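
The abstract states that longer inverted record lists are stored as bitmaps but does not give the layout or the length cutoff. The following is only a minimal sketch of that general idea, not the authors' implementation. The cutoff `BITMAP_THRESHOLD`, the function names `build_index`, `to_bitmap`, and `intersect`, and the use of a Python integer as the bitmap are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Union

# Hypothetical cutoff: lists with at least this many documents become bitmaps.
BITMAP_THRESHOLD = 3

# A posting is either a bitmap (int, bit d set means doc d contains the term)
# or a sorted list of document ids for short lists.
Posting = Union[int, List[int]]


def to_bitmap(posting: Posting) -> int:
    """Return the bitmap form of a posting, converting a short list if needed."""
    if isinstance(posting, int):
        return posting
    bitmap = 0
    for doc_id in posting:
        bitmap |= 1 << doc_id
    return bitmap


def build_index(docs: Dict[int, str],
                threshold: int = BITMAP_THRESHOLD) -> Dict[str, Posting]:
    """Build an inverted index, storing long posting lists as bitmaps."""
    raw: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            raw.setdefault(term, set()).add(doc_id)
    index: Dict[str, Posting] = {}
    for term, ids in raw.items():
        ordered = sorted(ids)
        index[term] = to_bitmap(ordered) if len(ordered) >= threshold else ordered
    return index


def intersect(index: Dict[str, Posting], terms: List[str]) -> List[int]:
    """Return sorted ids of documents containing every query term."""
    result = None
    for term in terms:
        if term not in index:
            return []
        bitmap = to_bitmap(index[term])
        result = bitmap if result is None else result & bitmap
    if result is None:
        return []
    return [d for d in range(result.bit_length()) if (result >> d) & 1]


docs = {
    0: "distributed search engine",
    1: "search index",
    2: "parallel search engine",
    3: "search bitmap",
    4: "engine",
}
index = build_index(docs)
print(intersect(index, ["search", "engine"]))
```

With bitmaps, intersecting two long lists reduces to a single bitwise AND, which is one plausible reason such a layout could lower CPU cost at the merge step; the actual gains reported in the abstract come from the authors' own experiments and are not reproduced by this sketch.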