AMATA: Software Architecture and Implementation of High Availability Support for Beowulf Cluster

Jullawadee Maneesilp, Putchong Uthayopas
Parallel Research Group, CONSYL
Department of Computer Engineering, Faculty of Engineering,
Kasetsart University, Bangkok, Thailand
Phone: (662) 942-8555 Ext. 1416
Email: {g4265082, pu}@ku.ac.th


ABSTRACT -- High-availability (HA) support for Beowulf clusters has become a critical factor in the acceptance of this platform for mission-critical use in enterprise environments. A well-defined and extensible HA software architecture is needed. This paper presents a proposed high-availability software architecture called AMATA. The AMATA architecture clearly defines the software components and their interactions for high-availability support. Many cases, such as system software failure, registered user software error, hardware malfunction, and hardware overload, can be detected and handled in a systematic way. Both discovery and recovery processes can be added to provide intelligent, automatic fault recovery. A prototype implementation has been developed, and the results obtained from it are also presented.

KEYWORDS -- High availability, Beowulf clusters, fault tolerance


National Electronics and Computer Technology Center (NECTEC)
Copyright © 2001 by Information System Service Section. All rights reserved.