Abstract
Today’s large-scale distributed platforms comprise thousands of resources from production, educational, and ad hoc environments, including Clouds, Grids, and P2P systems. However, finding suitable resources in such a large pool to store large amounts of data and run multi-resource, long-running data processing applications (usually with little or no fault tolerance) is hampered by the dynamic availability of distributed resources. In addition to failing, resources may be unavailable because of their owners’ sharing policies or the nature of the domain they belong to (e.g., P2P systems or non-dedicated desktop Grids). As a result, availability-aware selection of distributed resources has become a challenging problem for data management, resource provisioning, and job scheduling services. To this end, we present a novel resource availability characterization and prediction method for dynamic heterogeneous distributed environments. We identify 14 availability attributes that can be used effectively to model resource availability in such environments. Three data mining methods (in particular, a neural network) are proposed to model and predict resource availability using these attributes. The availability of a resource is predicted both for an instant of time and for a time duration. Our experiments on 28 different resources in the Austrian Grid show that the predictions of the proposed approach are on average 18% and 31% more accurate than those of the best previous method (the Naive Bayes classifier) for
