Cite this article: | 孙大烈,李建中.基于MapReduce的Skyline-join查询算法[J].哈尔滨工业大学学报,2012,44(1):103.DOI:10.11918/j.issn.0367-6234.2012.01.020 |
| SUN Da-lie, LI Jian-zhong. MapReduce-based Skyline-join processing[J]. Journal of Harbin Institute of Technology, 2012, 44(1): 103. DOI:10.11918/j.issn.0367-6234.2012.01.020 |
|
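Abstract: |
Skyline queries are very time-consuming, and Skyline queries that involve multiple tables (Skyline-join queries) place an even heavier load on the database system, which degrades the response time of the whole system. To address this problem, a Skyline-join query processing algorithm is proposed that builds on MapReduce, the parallel processing framework designed by Google; it uses partition-based pruning to reduce complexity and thereby improve query performance. Experiments on Amazon's cloud computing platform (EC2) show that the algorithm effectively reduces redundant operations and network data transfer, is largely insensitive to the number of nodes and the data volume, and scales well. |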
Keywords: Skyline query MapReduce distributed algorithm cloud computing |
DOI:10.11918/j.issn.0367-6234.2012.01.020 |
CLC number: TP311.13 |
Funding: National Natural Science Foundation of China (Grant No. 61033015) |
|
MapReduce-based Skyline-join processing |
SUN Da-lie, LI Jian-zhong
|
School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China
|
Abstract: |
A Skyline query is one of the most expensive operators in a database system. Skyline queries that involve multiple tables, called Skyline-join queries, are even more costly to evaluate. In this paper, we therefore adopt Google’s MapReduce, a parallel processing framework, to handle Skyline-join queries. We propose a novel parallel algorithm that prunes the dataset progressively and thereby reduces the network transfer cost. The algorithm is evaluated on Amazon’s EC2, and the experiments verify its efficiency. |
Key words: Skyline query MapReduce distributed algorithm cloud computing |
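
The abstract does not define a Skyline-join query, so the following minimal Python sketch shows what such a query computes in the plainest possible way: join two tables on a key, then keep only the joined tuples that no other joined tuple dominates. This is a brute-force, single-machine baseline for illustration only, not the authors' MapReduce algorithm. The names `dominates`, `skyline`, and `skyline_join`, the row layout, and the assumption that smaller values are preferred on every attribute are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed row layout: (join key, tuple of numeric attributes).
Row = Tuple[str, Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every attribute and
    strictly better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Quadratic-time skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_join(left: List[Row], right: List[Row]) -> List[Tuple[float, ...]]:
    """Equi-join the two tables on the key, then take the skyline of the result."""
    # Build a hash index on the right table.
    by_key: Dict[str, List[Tuple[float, ...]]] = {}
    for key, attrs in right:
        by_key.setdefault(key, []).append(attrs)
    # Concatenate the attributes of every matching pair of rows.
    joined = [
        l_attrs + r_attrs
        for key, l_attrs in left
        for r_attrs in by_key.get(key, [])
    ]
    return skyline(joined)


if __name__ == "__main__":
    # Hypothetical data: one cost attribute per table, joined on a city code.
    left = [("A", (1.0,)), ("A", (3.0,)), ("B", (2.0,))]
    right = [("A", (5.0,)), ("B", (1.0,)), ("B", (4.0,))]
    print(skyline_join(left, right))
```

The baseline materialises the full join before discarding dominated tuples, which is the kind of redundant work and data movement that, according to the abstract, the proposed pruning approach is meant to reduce.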