Journal of Harbin Institute of Technology


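Supervised by: Ministry of Industry and Information Technology of the People's Republic of China; Sponsored by: Harbin Institute of Technology; Editor-in-chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T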


Cite this article: QIAN Libing, JI Zhenzhou, WU Hao. An improved model of distributed search engine[J]. Journal of Harbin Institute of Technology, 2014, 46(7): 8. DOI: 10.11918/j.issn.0367-6234.2014.07.002


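DOI: 10.11918/j.issn.0367-6234.2014.07.002

CLC number: TP393

Fund: National Natural Science Foundation of China (61173024); Guangdong provincial-ministerial industry-university-research cooperation fund (2011A090200037).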

An improved model of distributed search engine

QIAN Libing, JI Zhenzhou, WU Hao

(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

Abstract:

To address the search performance problems of traditional distributed search engines, the traditional model was improved in its index structure and query algorithm, and a decentralized, highly parallel search model was proposed. In this model, the index is classified by document topic, a bitmap structure is used for long inverted (posting) lists, and a parallel search algorithm, multi max score heap (MMSH), is implemented on the index nodes using multithreading. Experimental results show that the index classification method and the bitmap strategy for inverted lists make queries at the Merge layer more targeted and reduce the CPU and memory overhead of Merge-layer nodes. When the inverted lists cannot be held entirely in memory, MMSH achieves highly parallel querying with higher query efficiency than the classical term-at-a-time algorithm, which shortens the average search time and improves system throughput. Together, index classification, the bitmap structure and the parallel query algorithm avoid blind querying and improve the performance of distributed search engines.

Key words: distributed indexing; index classification; inverted structure; parallel search
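The abstract names its techniques without giving their details, so the following is only a minimal Python sketch under stated assumptions, not the authors' MMSH algorithm. It illustrates three ideas in the spirit of the abstract: long posting lists stored as bitmaps while short ones stay as sorted doc-id lists; a term-at-a-time baseline that accumulates partial scores one query term at a time; and a parallel variant in which each thread scans one document range and keeps its own bounded min-heap of top-k candidates, with the per-thread heaps merged at the end. The threshold `BITMAP_THRESHOLD`, the term weights, the range partitioning and all function names are hypothetical.

```python
import bisect
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cut-off: posting lists with at least this many documents are
# stored as bitmaps; shorter ones stay as sorted doc-id lists.
BITMAP_THRESHOLD = 4


def encode(doc_ids):
    """Encode a posting list as an int bitmap (long lists) or a sorted list (short lists)."""
    if len(doc_ids) >= BITMAP_THRESHOLD:
        bits = 0
        for d in doc_ids:
            bits |= 1 << d  # set bit d for document d
        return bits
    return sorted(doc_ids)


def contains(postings, doc_id):
    """Membership test for either encoding: a bit test or a binary search."""
    if isinstance(postings, int):
        return (postings >> doc_id) & 1 == 1
    i = bisect.bisect_left(postings, doc_id)
    return i < len(postings) and postings[i] == doc_id


def decode(postings):
    """Return the doc ids of a posting list in ascending order."""
    if isinstance(postings, int):
        return [d for d in range(postings.bit_length()) if (postings >> d) & 1]
    return list(postings)


def term_at_a_time(index, weights, query, num_docs, k):
    """Baseline: process one query term at a time, accumulating partial scores."""
    acc = {}  # doc_id -> accumulated score
    for t in query:
        for d in decode(index.get(t, [])):
            if d < num_docs:
                acc[d] = acc.get(d, 0.0) + weights[t]
    return heapq.nlargest(k, ((s, d) for d, s in acc.items()))


def top_k_in_range(index, weights, query, lo, hi, k):
    """One worker: score docs lo..hi-1 and keep its best k in a bounded min-heap."""
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept candidate
    for doc in range(lo, hi):
        score = 0.0
        for t in query:
            if t in index and contains(index[t], doc):
                score += weights[t]
        if score == 0.0:
            continue  # doc matches no query term
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))  # evict the current minimum
    return heap


def parallel_search(index, weights, query, num_docs, k, workers=4):
    """Split the doc-id space across threads, then merge the per-thread heaps."""
    step = max(1, -(-num_docs // workers))  # ceiling division, never zero
    ranges = [(lo, min(lo + step, num_docs)) for lo in range(0, num_docs, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        heaps = pool.map(
            lambda r: top_k_in_range(index, weights, query, r[0], r[1], k), ranges)
        return heapq.nlargest(k, (item for h in heaps for item in h))


if __name__ == "__main__":
    raw = {"search": [0, 1, 2, 3, 5, 8], "engine": [1, 3], "bitmap": [3, 7]}
    index = {t: encode(ids) for t, ids in raw.items()}  # "search" becomes a bitmap
    weights = {"search": 1.0, "engine": 2.0, "bitmap": 4.0}  # hypothetical weights
    query = ["search", "engine", "bitmap"]
    # both print [(7.0, 3), (4.0, 7), (3.0, 1)]
    print(term_at_a_time(index, weights, query, 10, 3))
    print(parallel_search(index, weights, query, 10, 3, workers=2))
```

Because of CPython's global interpreter lock, the threads here show the structure of per-worker heaps and a final merge rather than a real CPU speed-up; a production system would use processes or a compiled language. The strict `>` eviction also means that ties at the k-th score may be resolved differently from the term-at-a-time baseline.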
