Kerestey, P. (2010):
Automated Fault Recovery Planning in Cloud Computing
This work investigates the applicability of automated planning approaches to fault management in cloud computing deployments at the infrastructure-as-a-service (IaaS) level. A decision support solution for fault management is examined to determine whether fault recovery can be automated in large-scale cloud computing deployments.
Cloud computing is a fairly new topic that attracts growing industrial interest. Cloud computing services are popular because they allocate resources flexibly and use them economically. This avoids under- and over-utilization of computing resources and makes planning and management less cost-intensive. At present, no adequate cloud computing management solution for fault recovery exists, which makes cloud computing services unattractive to many potential users. Because faults occur in every system, a cloud service provider must be able to guarantee that the terms of provisioning will not be breached even when faults happen. This can be achieved by automating error-prone and time-consuming tasks. The fault recovery solution examined in this work therefore aims to minimize the time needed for complete service recovery.
To address this problem, an automated planning approach from the field of artificial intelligence is chosen. The work also draws on operations research studies. The aim is to create a prototype of a decision support solution that reduces the complexity of fault recovery and the cost of fault management as a whole. A system and its services should recover from different kinds of faults through the fast and systematic composition of recovery plans. A scenario is created to demonstrate the usefulness of the solution. The overall goal is a machine-aided improvement of IT service availability.
This work surveys existing automated planning approaches and analyzes their applicability to fault management in cloud computing. One automated planning algorithm is examined in detail, and a prototype is implemented for a scenario to demonstrate that the planning system works as intended.
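To illustrate what composing a recovery plan can look like, the following is a minimal sketch of a generic STRIPS-style forward search. It is not the algorithm or prototype examined in this work. The fault scenario, the fact names and the action names (allocate_spare_host, restore_vm_from_snapshot, restart_service) are hypothetical and chosen only for illustration. Breadth-first search returns a plan with the fewest steps, which stands in for recovery time only under the assumption that every action takes the same time.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    """A STRIPS-style recovery action (all names here are hypothetical)."""
    name: str
    pre: FrozenSet[str]      # facts that must hold before the action
    add: FrozenSet[str]      # facts the action makes true
    delete: FrozenSet[str]   # facts the action makes false


def plan(initial: Iterable[str], goal: Iterable[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first forward search; returns a shortest plan or None."""
    start = frozenset(initial)
    target = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if target <= state:              # every goal fact holds
            return steps
        for action in actions:
            if action.pre <= state:      # action is applicable
                successor = (state - action.delete) | action.add
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, steps + [action.name]))
    return None                          # no recovery plan exists


# Hypothetical IaaS fault: a host failed and took a VM and its service down.
ACTIONS = [
    Action("allocate_spare_host",
           frozenset({"spare_available"}),
           frozenset({"host_ready"}),
           frozenset({"spare_available"})),
    Action("restore_vm_from_snapshot",
           frozenset({"host_ready", "vm_down"}),
           frozenset({"vm_up"}),
           frozenset({"vm_down"})),
    Action("restart_service",
           frozenset({"vm_up", "service_down"}),
           frozenset({"service_up"}),
           frozenset({"service_down"})),
]

recovery = plan({"spare_available", "vm_down", "service_down"},
                {"service_up"}, ACTIONS)
print(recovery)
```

In this sketch the planner returns the three actions in dependency order, and it returns None when no spare host is available, which is the case a decision support tool would hand over to an operator.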