Abstract
Data distribution summaries are commonly used in databases to support query optimization, and histograms are of particular interest. A significant issue in histogram estimation is the large amount of data that must be transmitted. This paper presents a distributed and parallel construction method for equi-width histograms in cloud databases, called DPHCD. Unlike previous methods, DPHCD does not require the transfer of any table details during histogram construction. Only a small amount of bucket information and a few other necessary data items need to be transmitted over the network, so the volume of data transmitted is independent of table size. DPHCD divides the histogram task into small tasks that can be executed simultaneously in a distributed cluster. It uses an innovative tablet-level sampling method to reduce the computing overhead on each cluster node. DPHCD is implemented in the Xugu cloud database management system. Experimental results demonstrate that DPHCD achieves low data transmission and speeds up histogram construction.
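To make the idea of transmitting only bucket information concrete, the following is a minimal Python sketch of one way an equi-width histogram could be assembled from per-node summaries. It is not the DPHCD algorithm itself: the abstract does not give its details, the tablet-level sampling step is omitted, and all function names (`local_min_max`, `local_bucket_counts`, `merge_counts`, `build_histogram`) are hypothetical. The sketch assumes each node holds one partition of a numeric column and that a coordinator exchanges two rounds of small messages with the nodes.

```python
from typing import List, Sequence, Tuple


def local_min_max(values: Sequence[float]) -> Tuple[float, float]:
    """Round 1 (per node): report only the local value range."""
    return min(values), max(values)


def local_bucket_counts(values: Sequence[float], lo: float, hi: float,
                        num_buckets: int) -> List[int]:
    """Round 2 (per node): count local rows per equi-width bucket."""
    counts = [0] * num_buckets
    width = (hi - lo) / num_buckets
    for v in values:
        if width > 0:
            # Clamp so that v == hi falls into the last bucket.
            idx = min(int((v - lo) / width), num_buckets - 1)
        else:
            idx = 0  # all values identical: everything goes to bucket 0
        counts[idx] += 1
    return counts


def merge_counts(partials: Sequence[Sequence[int]]) -> List[int]:
    """Coordinator: sum the per-node count vectors bucket by bucket."""
    return [sum(column) for column in zip(*partials)]


def build_histogram(partitions: Sequence[Sequence[float]],
                    num_buckets: int) -> Tuple[float, float, List[int]]:
    """Simulate the two rounds sequentially; in a real cluster each
    per-partition call would run on its own node in parallel."""
    ranges = [local_min_max(p) for p in partitions if p]
    lo = min(r[0] for r in ranges)
    hi = max(r[1] for r in ranges)
    partials = [local_bucket_counts(p, lo, hi, num_buckets)
                for p in partitions]
    return lo, hi, merge_counts(partials)


if __name__ == "__main__":
    # Three hypothetical partitions standing in for three cluster nodes.
    parts = [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
    print(build_histogram(parts, num_buckets=3))
```

Under these assumptions each node sends two numbers in the first round and `num_buckets` integers in the second, so the traffic depends on the bucket count and the number of nodes rather than on the number of rows, which is the property the abstract claims for DPHCD.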
