Special Guest Lectures | |
Migol: A Fault Tolerant Grid Service Framework for Computational Applications in the Grid | |
Andre Luckow, University of Potsdam, Germany | |
Johnston Hall 338 February 07, 2008 - 03:00 pm |
|
Abstract: A major challenge in a distributed, inherently dynamic Grid is fault tolerance. The more resources and components are involved, the more complicated and error-prone the system becomes. In a Grid with potentially thousands of interconnected machines, the reliability of individual resources cannot be guaranteed. This talk discusses how the fault tolerance of long-running Grid applications can be ensured. Migol is a Grid middleware that supports the fault tolerance of Grid applications. A key feature of Migol is the ability to transparently migrate parallel applications in the Grid. Migol comprises services for resource allocation and selection, and for application and resource monitoring. The framework is based on open standards and is built on top of the Globus Toolkit 4. In addition, this talk will discuss methods to ensure the fault tolerance of critical infrastructure services. For example, Migol replicates critical services, such as the central information service and the monitoring services, using a ring-based replication protocol to achieve data consistency. |
|
Speaker's Bio: Andre Luckow is pursuing a Ph.D. in Computer Science at the University of Potsdam (Germany). He received a bachelor's degree in Information Systems and Management from the University of Applied Sciences Munich (Germany) in 2003 and a master's degree in Computer Science from the University of Potsdam in 2005. His research interests are in Grid and cluster computing, with a focus on fault tolerance. |
|