Panagiotis Kritikakos, EPCC, The University of Edinburgh.
The use of multi-core processors opens new horizons in the deployment and use of computer systems, as well as in the design and development of software. Moreover, modern multi-core processors provide hardware virtualisation support by default. This gives a great advantage to virtualisation platforms, as virtual machines gain performance and can be deployed in sectors where, traditionally, physical machines were deployed. Virtualisation uptake has been rapid in standard server applications, but it has not yet been exploited in Grid and High Performance Computing (HPC) applications. We identify a number of areas where virtualisation can be used to enhance a Grid environment, either in academia or industry:
Reliability / Availability / Fault tolerance
The cost of failure in Grid and HPC applications is significant, and restarting the whole machine is not the most effective way to fix the problem. Virtualisation offers extra reliability because a single system can be divided into multiple ones for different groups of users. Applications that need low-level access for privileged operations can compromise the integrity of the system; under virtualisation, an application-induced failure of the operating system affects only the virtual machine the application is running on. In addition, virtualisation can offer improved availability, as intelligent, pro-active, fault-tolerant solutions enable a virtual machine to migrate automatically from an ‘unhealthy’ machine to a ‘healthy’ one. Fault tolerance and live migration increase the probability of completion for applications with long run-times.
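The pro-active migration idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Host class, its healthy flag, the per-host capacity and the evacuate_unhealthy function are hypothetical names invented for illustration, and no real virtualisation platform's API is used. It moves each virtual machine off an ‘unhealthy’ host onto the least-loaded ‘healthy’ host that still has room.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a physical host (not a real platform API)."""
    name: str
    healthy: bool = True
    capacity: int = 4  # assumed maximum number of VMs per host
    vms: list = field(default_factory=list)


def evacuate_unhealthy(hosts):
    """Move VMs off unhealthy hosts; return a list of (vm, source, target)."""
    moves = []
    for src in hosts:
        if src.healthy:
            continue
        for vm in list(src.vms):  # iterate over a copy while mutating
            targets = [h for h in hosts
                       if h.healthy and len(h.vms) < h.capacity]
            if not targets:
                break  # no healthy host has room; leave remaining VMs in place
            dst = min(targets, key=lambda h: len(h.vms))  # least-loaded host
            src.vms.remove(vm)
            dst.vms.append(vm)
            moves.append((vm, src.name, dst.name))
    return moves


if __name__ == "__main__":
    hosts = [
        Host("a", healthy=False, vms=["vm1", "vm2"]),
        Host("b", vms=["vm3"]),
        Host("c"),
    ]
    for vm, src, dst in evacuate_unhealthy(hosts):
        print(f"migrated {vm}: {src} -> {dst}")
```

A real fault-tolerant solution would derive the health flag from hardware monitoring and would perform a live migration rather than a list update; the sketch only shows the placement decision.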
Portability
Virtual machines can be set up to test new systems when upgrades are needed. They allow different operating systems and different user groups to exploit new hardware resources. A specific virtual machine can easily be ported to another physical host, which can even be a desktop or laptop.
Productivity and development
Virtual machines can be used to set up a lab grid environment for testing purposes. The operating system can be configured to meet the requirements of the specific application. Any tools required can be installed without affecting other applications and users who work on different virtual machines. Geoffroy Vallee terms this “adapting systems to applications and not applications to systems”. For instance, a virtual cluster (i.e. networked virtual machines within a single physical host) could demonstrate the scaling capabilities of Grid applications before they are deployed at large scale.
Virtualisation middleware supports all major operating systems: Linux, Solaris, BSD variants and Windows. Applications that are developed on one platform do not necessarily need to be ported to another; a virtual machine with the requested operating system can be configured to host the application. In addition, multiple virtual machines with different operating systems, different versions of operating systems and different system libraries can be made available to developers to experiment with, and to test and debug their code.
Management
Virtual machines and virtual clusters offer ease of management. A virtual machine template can easily be cloned into as many virtual machines as desired. Management tools provide central management of the virtual machines even if they do not run on the same physical host. Tools that monitor virtual machine inventories help to identify faulty systems and to restart or reconfigure them as needed. A crash of the operating system does not require the administrator to be physically present to reboot the machine; it can be done remotely through the management tool.
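Template cloning and inventory monitoring can be sketched in a few lines. The example below is illustrative only: VMTemplate, VM, clone_template and restart_faulty are hypothetical names, the "crashed" and "running" states are assumed, and the code models the idea rather than any particular management tool.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class VMTemplate:
    """Hypothetical description of a virtual machine image."""
    os: str
    memory_mb: int
    packages: list = field(default_factory=list)


@dataclass
class VM:
    name: str
    config: VMTemplate
    state: str = "running"  # assumed states: "running" or "crashed"


def clone_template(template, count, prefix="node"):
    """Create `count` independent VMs from one template."""
    # deepcopy gives every clone its own configuration, so reconfiguring
    # one VM does not affect the template or the other clones.
    return [VM(f"{prefix}{i:02d}", copy.deepcopy(template))
            for i in range(1, count + 1)]


def restart_faulty(inventory):
    """Restart every crashed VM in the inventory; return their names."""
    restarted = []
    for vm in inventory:
        if vm.state == "crashed":
            vm.state = "running"  # stands in for a remote reboot
            restarted.append(vm.name)
    return restarted


if __name__ == "__main__":
    cluster = clone_template(VMTemplate("Linux", 2048, ["mpi"]), 3)
    cluster[1].state = "crashed"
    print("restarted:", restart_faulty(cluster))
```

The deep copy is the design choice worth noting: each clone starts identical to the template but can then be configured independently.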
In summary, virtualisation offers significant benefits to both users and providers. Combined with wide virtualisation support in modern commodity hardware (Intel VT and AMD-V), these benefits give virtualisation great promise of becoming one of the default ICT infrastructure technologies of the future.
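As a practical footnote, the hardware support mentioned above can be checked on a Linux host, where /proc/cpuinfo lists the CPU flag vmx for Intel VT and svm for AMD-V. The following is a minimal sketch assuming a Linux system on an x86 processor; the function names are chosen here for illustration.

```python
def virtualisation_flags(cpuinfo_text):
    """Return the hardware virtualisation flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # vmx = Intel VT, svm = AMD-V
            found |= {"vmx", "svm"} & set(value.split())
    return found


def describe(flags):
    """Map CPU flag names to the vendor technology names."""
    names = {"vmx": "Intel VT", "svm": "AMD-V"}
    return sorted(names[f] for f in flags)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            support = describe(virtualisation_flags(f.read()))
        print(support or "no hardware virtualisation flag found")
    except OSError:
        print("/proc/cpuinfo is not available (not a Linux system?)")
```

An empty result does not necessarily mean the processor lacks the feature: the flag may be missing when the feature is disabled in firmware, or when the check runs inside a virtual machine.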