WU Jin, LIU Jiaxi, DONG Jian, ZUO Decheng, ZHAO Yao. An accrual failure detector in cloud computing[J]. Journal of Harbin Institute of Technology, 2019, 51(11): 16. DOI: 10.11918/j.issn.0367-6234.201903149
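(1. Fault-tolerant and Mobile Computing Research Center (Harbin Institute of Technology), Harbin 150001, China; 2. Jiangsu Key Laboratory for Broadband Wireless Communication and Internet of Things (Nanjing University of Posts and Telecommunications), Nanjing 210003, China)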
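To better address the impact of network dynamics in cloud computing on failure detection performance, a two-sliding-window accrual failure detector suited to cloud environments (Two Windows Accrual Failure Detector, 2WA-FD) is proposed. First, the probability distribution of heartbeat inter-arrival times under different network environments was analyzed experimentally. The Weibull distribution was found to be a more reasonable model of heartbeat inter-arrival times, and using it as the basis for computing the suspicion level of an accrual failure detector yields higher accuracy. Second, an analysis of the implementation architecture of accrual failure detectors showed that processing heartbeat messages with a two-sliding-window mechanism copes better with sudden changes in the network environment. Finally, the performance of five different accrual failure detector implementations was compared and analyzed using open-source experimental data and an experimental platform built on rented cloud servers. The experimental results show that, under the same detection overhead, the proposed detector achieves better detection speed and detection accuracy than other failure detectors of the same type. Therefore, the Weibull-based two-sliding-window accrual failure detector 2WA-FD proposed in this paper can detect node failures in cloud computing accurately and quickly, and effectively reduces the impact of network dynamics on failure detection performance.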
Key words:  cloud computing  accrual failure detector  quality of service  Weibull distribution
An accrual failure detector in cloud computing
WU Jin¹, LIU Jiaxi², DONG Jian¹, ZUO Decheng¹, ZHAO Yao¹
(1. Fault-tolerant and Mobile Computing Research Center (Harbin Institute of Technology), Harbin 150001, China; 2. Jiangsu Key Laboratory for Broadband Wireless Communication and Internet of Things (Nanjing University of Posts and Telecommunications), Nanjing 210003, China)
To reduce the effect of network dynamics on failure detection performance in cloud computing, a new adaptive accrual failure detector (Two Windows Accrual Failure Detector, 2WA-FD) is proposed. First, two groups of real data collected under two network conditions were analyzed, and the Weibull distribution was found to be a more reasonable distributional assumption for heartbeat inter-arrival times. A suspicion level computed from the Weibull distribution is therefore more accurate. Second, the framework of the accrual failure detector was analyzed and improved so that the suspicion level is calculated using two sliding windows, which makes the framework well suited to changing network conditions. Finally, 2WA-FD and other failure detectors were tested on open-source experimental data and on our own experimental platform. The experimental results show that, at the same detection overhead, 2WA-FD achieves shorter detection time and higher detection accuracy. Thus, 2WA-FD can detect node failures in cloud computing accurately and quickly, and effectively reduces the influence of network dynamics on failure detection performance.
Key words:  cloud computing  accrual failure detector  quality of service  Weibull distribution
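The abstract names two ingredients, a Weibull model of heartbeat inter-arrival times and two sliding windows, without giving formulas. The following is a minimal Python sketch of how such a detector could be put together, not the authors' implementation. It uses the standard accrual definition of the suspicion level, phi = -log10(P(next heartbeat arrives later than the elapsed time)), which for a Weibull distribution with shape k and scale lam reduces to (elapsed / lam)^k / ln 10. Everything else is an assumption made for illustration: the names `fit_weibull`, `weibull_phi`, and `TwoWindowDetector`, the window sizes, the moment-based shape approximation k ≈ (std/mean)^-1.086, and the rule of taking the smaller of the two per-window suspicion values.

```python
import math
import statistics
from collections import deque


def fit_weibull(samples):
    """Estimate Weibull (shape k, scale lam) from positive inter-arrival samples.

    Assumption of this sketch: the empirical moment approximation
    k ~ (std / mean) ** -1.086, not the fitting method used in the paper.
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    cv = max(std / mean, 0.05)  # floor keeps k bounded for very regular heartbeats
    k = cv ** -1.086
    lam = mean / math.gamma(1.0 + 1.0 / k)
    return k, lam


def weibull_phi(elapsed, k, lam):
    """Suspicion level phi = -log10(exp(-(elapsed / lam) ** k))."""
    if elapsed <= 0:
        return 0.0
    log_term = k * math.log(elapsed / lam)
    if log_term > 700:  # math.exp would overflow beyond this point
        return math.inf
    return math.exp(log_term) / math.log(10)


class TwoWindowDetector:
    """Hypothetical two-window accrual detector (window sizes are assumptions)."""

    def __init__(self, short_size=10, long_size=100):
        self.short = deque(maxlen=short_size)  # reacts quickly to network changes
        self.long = deque(maxlen=long_size)    # stable long-term estimate
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            interval = now - self.last_arrival
            self.short.append(interval)
            self.long.append(interval)
        self.last_arrival = now

    def suspicion(self, now):
        """Return the suspicion level at time `now`; 0.0 until enough samples exist."""
        if len(self.short) < 2:
            return 0.0
        elapsed = now - self.last_arrival
        # Illustrative combination rule: use the more conservative (smaller) value.
        return min(
            weibull_phi(elapsed, *fit_weibull(window))
            for window in (self.short, self.long)
        )
```

An application would call `heartbeat(now)` on every received heartbeat and compare `suspicion(now)` against a threshold of its own choosing. Because the output is a continuous suspicion level rather than a binary verdict, different applications can apply different thresholds to the same detector.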