Abstract: Gaussian processes (GPs) are distributions over functions and are commonly used for regression in the machine learning community. For n training samples, their time complexity for training and prediction is O(n^3) and O(n^2), respectively. This high computational cost hinders their application to large datasets. Inspired by the divide-and-conquer strategy, this paper proposes a simple but efficient approximate model, called the Overlapped Local Gaussian Process (OLGP). The method assumes that, conditioned on its nearest neighbors, a random variable is independent of farther ones. The training samples are recursively divided, yielding a ternary tree. Sibling nodes share intersections whose samples act as inducing points that model the dependency between neighboring regions. Each leaf node is associated with a local GP regression model. The evidence and predictive distribution of a parent node are composed from those of its children, which reduces the computational cost significantly. Furthermore, the fitted function is guaranteed to be continuous. A theoretical analysis shows that the time complexity for training and prediction reduces to O(n^t), where t depends on the proportion of overlap at each level and usually lies between 1 and 2. The paper demonstrates the effectiveness of the method through speed-accuracy performance on several real-world datasets.
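Since the abstract only outlines the algorithm, the following is a minimal sketch of the overlapped recursive partitioning idea, assuming 1-D inputs, a squared-exponential kernel, and a plain average of overlapping leaf predictions in place of the paper's composition of evidence and predictive distributions. All function names (`split_overlapped`, `gp_mean`, `predict`) and parameter values are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def rbf(a, b, ls=0.5, var=1.0):
    # Squared-exponential kernel matrix between 1-D input vectors a and b.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def gp_mean(X, y, Xs, noise=1e-2):
    # Exact GP posterior mean on one leaf: O(m^3) for m local samples,
    # which stays cheap because each leaf holds only a small subset.
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xs, X) @ alpha

def split_overlapped(X, y, leaf_size=64, overlap=0.25):
    # Recursively cut sorted 1-D samples into three blocks (mirroring the
    # ternary tree) and pad each block with a fraction of its neighbors'
    # samples, so sibling regions share points in their intersections.
    if len(X) <= leaf_size:
        return [(X, y)]
    thirds = np.array_split(np.arange(len(X)), 3)
    pad = int(overlap * len(thirds[0]))
    leaves = []
    for idx in thirds:
        lo = max(idx[0] - pad, 0)
        hi = min(idx[-1] + pad, len(X) - 1)
        leaves += split_overlapped(X[lo:hi + 1], y[lo:hi + 1],
                                   leaf_size, overlap)
    return leaves

def predict(leaves, Xs):
    # Average the predictions of every leaf whose (overlapping) interval
    # covers a test point; the shared samples blend neighboring local
    # models near region boundaries.
    mu, cnt = np.zeros(len(Xs)), np.zeros(len(Xs))
    for X, y in leaves:
        mask = (Xs >= X.min()) & (Xs <= X.max())
        if mask.any():
            mu[mask] += gp_mean(X, y, Xs[mask])
            cnt[mask] += 1
    return mu / np.maximum(cnt, 1)

# Toy usage: fit a noisy sine with many small local GPs
# instead of one O(n^3) global GP.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 2000))
y = np.sin(X) + 0.1 * rng.standard_normal(len(X))
leaves = split_overlapped(X, y)
Xs = np.linspace(0, 10, 200)
print(predict(leaves, Xs)[:5])
```

Because each recursion level pays only for the duplicated overlap samples, the total work grows subquadratically in n, which is the intuition behind the O(n^t) bound with t between 1 and 2.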