A Grid Architecture for Distributed Data

This article investigates how a Grid architecture can be used to provide a fault-tolerant, scalable framework for distributed systems. We look at the specific example of FilmGrid and how its architecture can be expanded into a generic and secure Data Grid.

The FilmGrid software supports film post-production by providing a framework for partners to collaborate and share the huge amounts of data produced in a typical film production. FilmGrid uses a radically different architecture to other Digital Asset Management (DAM) systems, choosing a distributed Grid architecture over a centralised solution.

FilmGrid Solution

The figure below shows an example FilmGrid installation. There are multiple users, each of whom connects to the FilmGrid server within their own organization. The servers handle users' search and data access requests. A registry maintains a list of all the servers within the network, allowing new sites to be dynamically added. Servers cache the list internally, so if the registry is unavailable the network can still operate.

Example FilmGrid Deployment
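
The registry fallback described above can be sketched in a few lines. This is a minimal illustration and not FilmGrid's actual code: the class names ServerListCache, FakeRegistry and RegistryUnavailable, and the site names, are all hypothetical.

```python
from typing import Callable, List


class RegistryUnavailable(Exception):
    """Raised by the (hypothetical) registry client when it cannot be reached."""


class ServerListCache:
    """Keeps the last known server list so a site can work without the registry."""

    def __init__(self, fetch_from_registry: Callable[[], List[str]]):
        self._fetch = fetch_from_registry
        self._cached: List[str] = []

    def servers(self) -> List[str]:
        try:
            # Refresh from the registry whenever it is reachable.
            self._cached = list(self._fetch())
        except RegistryUnavailable:
            # Registry is down: fall back to the last list we saw.
            pass
        return list(self._cached)


class FakeRegistry:
    """In-memory stand-in for the registry, used only for this illustration."""

    def __init__(self) -> None:
        self.sites = ["site-a.example", "site-b.example"]
        self.online = True

    def list_servers(self) -> List[str]:
        if not self.online:
            raise RegistryUnavailable("registry offline")
        return list(self.sites)


registry = FakeRegistry()
cache = ServerListCache(registry.list_servers)
print(cache.servers())                    # fetched from the registry
registry.sites.append("site-c.example")   # a new site joins dynamically
print(cache.servers())                    # the new site is picked up
registry.online = False                   # the registry becomes unavailable
print(cache.servers())                    # served from the local cache
```

The point of the sketch is that the registry is consulted opportunistically: a failed lookup never prevents a server from contacting the peers it already knows about.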

 

The architecture itself follows a Data Grid paradigm: it is essentially a decentralized system that is designed to allow distributed partners to share and transfer large amounts of data. There are two major advantages to using this architecture in FilmGrid: it has no central point of failure and unnecessary data transfers are avoided.

Web services, constructed using the Globus Toolkit, are used to invoke operations on the servers and registry. These provide a clean, lightweight, platform-independent interface to FilmGrid. GridFTP is used to transfer files between servers and clients. GridFTP was chosen for its speed and support for authentication and encryption - important considerations in film post-production.

Generic Solution

If we extrapolate from the FilmGrid architecture to a more generic one, we can see how it can be pertinent to other projects and can readily exploit many IT-tude.com components. The diagram below shows a Data Grid of multiple servers which use security components. The servers may be located at separate sites, enforce site-specific security policies and contain data (which may or may not be homogeneous across the sites) that needs to be accessed and aggregated by the client (the interface for the end user).

Generic Data Grid

 

The following sections show how IT-tude.com components can be used to deliver this architecture.

Security

Security is of particular importance in Data Grids. It is vital that no unauthorised access can occur, especially when dealing with confidential financial or film industry data. It is also essential that data isn't deleted or modified by unauthorised parties.

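There are several IT-tude.com components related to security that are of relevance to Data Grids: the Policy Decision Point (PDP), the Policy Enforcement Point (PEP) and the Security Token Service (STS).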

Together, the PDP, PEP and STS form a complete authentication and authorization system. The above diagram shows how the PDP and PEP can be integrated into a Data Grid.

A typical interaction involving the PDP and PEP in our Data Grid architecture may be:

  1. The client requests some data from the server.
  2. The PEP intercepts the client request and asks the PDP to decide whether the request should be authorised.
  3. The PDP returns its decision to the PEP, which then forwards the request to the server if it has been authorised.
  4. The server then deals with the request and returns the data to the client. If the server requires data held by other servers in the network, it will send requests to them which will also be subject to authorisation by a PEP.
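
The interaction above can be sketched as follows. This is an illustrative model only: the class names, the rule table and the "Permit"/"Deny" strings are assumptions made for the example, and a real deployment would evaluate XACML policies rather than a set of tuples.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Request:
    subject: str   # who is asking
    action: str    # e.g. "read" or "delete"
    resource: str  # which data item is wanted


class PolicyDecisionPoint:
    """Decides Permit or Deny from a simple rule table (deny by default)."""

    def __init__(self, permitted: Set[Tuple[str, str, str]]):
        self._permitted = permitted

    def decide(self, request: Request) -> str:
        key = (request.subject, request.action, request.resource)
        return "Permit" if key in self._permitted else "Deny"


class DataServer:
    """Holds the data; it only ever sees requests the PEP has let through."""

    def __init__(self, data: Dict[str, str]):
        self._data = data

    def handle(self, request: Request) -> str:
        return self._data[request.resource]


class PolicyEnforcementPoint:
    """Intercepts every request and forwards it only if the PDP permits it."""

    def __init__(self, pdp: PolicyDecisionPoint, server: DataServer):
        self._pdp = pdp
        self._server = server

    def handle(self, request: Request) -> str:
        # Step 2: ask the PDP for a decision.
        if self._pdp.decide(request) != "Permit":
            raise PermissionError(f"{request.subject} may not {request.action} {request.resource}")
        # Steps 3 and 4: forward the authorised request and return the data.
        return self._server.handle(request)


pdp = PolicyDecisionPoint({("alice", "read", "reel-01")})
pep = PolicyEnforcementPoint(pdp, DataServer({"reel-01": "raw footage"}))
print(pep.handle(Request("alice", "read", "reel-01")))
```

If the server needed data from a peer, it would issue its own request through that peer's PEP in the same way, so every hop is authorised independently.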

The STS can be integrated into this framework to issue and validate security tokens for clients and servers, using the WS-Trust protocol. The STS can greatly simplify the administration, authentication and authorization of users between multiple organizations.
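
The issue-and-validate role of the STS can be illustrated with a deliberately simplified stand-in. The sketch below does not implement WS-Trust, which exchanges XML messages and typically richer token formats; it uses an HMAC-signed JSON payload purely to show the idea. The class name, claim names and shared secret are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


class SecurityTokenService:
    """Toy token service: signs claims with a shared secret (not WS-Trust)."""

    def __init__(self, secret: bytes, lifetime_seconds: int = 300):
        self._secret = secret
        self._lifetime = lifetime_seconds

    def _sign(self, payload: str) -> str:
        return hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, subject: str, organization: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        claims = {"sub": subject, "org": organization, "exp": now + self._lifetime}
        payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
        return payload + "." + self._sign(payload)

    def validate(self, token: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the claims if the token is genuine and unexpired, else None."""
        now = time.time() if now is None else now
        payload, separator, signature = token.rpartition(".")
        if not separator:
            return None
        # Constant-time comparison avoids leaking how much of the signature matched.
        if not hmac.compare_digest(signature, self._sign(payload)):
            return None
        claims = json.loads(base64.urlsafe_b64decode(payload))
        return claims if claims["exp"] >= now else None


sts = SecurityTokenService(secret=b"demo-secret-do-not-reuse")
token = sts.issue("alice", "studio-a")
print(sts.validate(token))
```

Because every server trusts the same token service, a user from one organization can be recognised by another without each site maintaining its own copy of the user database.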

The PDP, PEP and STS use the XACML standard, which is a platform- and application-independent standard for access control. This means that other applications can use the same security framework: if the users of the Data Grid also need access to another distributed or remote service, they can re-use the existing security infrastructure. This saves effort, as only one security infrastructure needs to be developed and maintained, and it reduces the scope for potential security holes.

Data Management

A Data Grid commonly has requirements such as: Fast Transfer of Large Files, Accessing Data from Different Locations, Accessing Heterogeneous Data, Replication of Data for Speed and Robustness, and Federation of a Number of Data Sources.

To address these requirements, the Access to a Remote Data Source and Treat Multiple Data Sources as One capabilities are often implemented. The Treat Multiple Data Sources as One capability means that end users only see a single unified source of data, although there are multiple distributed data resources behind the scenes. In our generic Data Grid this means that the user is likely to be unaware of how many servers exist in the architecture and where they are, although their requests may involve any or all of them. Data Grids can meet the Fast Transfer of Large Files requirement through the use of GridFTP, a high-performance file transfer protocol and framework.
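
A minimal sketch of the Treat Multiple Data Sources as One idea follows. The user calls one search method, and the fan-out to individual servers happens behind it. SiteServer and FederatedCatalogue are hypothetical names, and the in-memory catalogues stand in for real remote servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


class SiteServer:
    """Hypothetical stand-in for one server in the Data Grid."""

    def __init__(self, name: str, catalogue: Dict[str, str]):
        self.name = name
        self._catalogue = catalogue  # item id -> description

    def search(self, term: str) -> List[dict]:
        term = term.lower()
        return [
            {"id": item_id, "description": text, "site": self.name}
            for item_id, text in self._catalogue.items()
            if term in text.lower()
        ]


class FederatedCatalogue:
    """Presents many servers to the user as a single searchable source."""

    def __init__(self, servers: Iterable[SiteServer]):
        self._servers = list(servers)

    def search(self, term: str) -> List[dict]:
        # Query every site in parallel, then merge into one result list.
        with ThreadPoolExecutor(max_workers=max(1, len(self._servers))) as pool:
            per_site = list(pool.map(lambda server: server.search(term), self._servers))
        merged = [hit for hits in per_site for hit in hits]
        return sorted(merged, key=lambda hit: hit["id"])


grid = FederatedCatalogue([
    SiteServer("london", {"clip-002": "Car chase, night", "clip-005": "Interview"}),
    SiteServer("madrid", {"clip-001": "Car chase, day"}),
])
for hit in grid.search("car chase"):
    print(hit["id"], hit["site"])
```

The caller never names a site; the "site" field is included in each hit only so that a later transfer can fetch the file from wherever it lives.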

OGSA-DAI can help with accessing data from different locations, accessing heterogeneous data and federating data sources. A possible use of OGSA-DAI within a Data Grid is shown in the diagram below.

Using OGSA-DAI to Federate Heterogeneous Resources

 

In this example, the OGSA-DAI server provides a single, consistent, access layer to resources in a variety of formats (in this case an XML database, a relational database and some text files). The user does not need to be aware of the location of the underlying data resources, and, depending on how OGSA-DAI is deployed, the user may be unaware of the format used to store the data within the resources.
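
The single access layer can be pictured with a small adapter sketch. It does not use the OGSA-DAI API; the classes and the sample clip data are invented for illustration, with an in-memory SQLite database, an XML string and a few comma-separated lines standing in for the three kinds of resource.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, Optional

Record = Dict[str, Optional[str]]


class RelationalResource:
    """Reads clip records from a relational table."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection

    def records(self) -> Iterator[Record]:
        for title, fmt in self._conn.execute("SELECT title, format FROM clips ORDER BY title"):
            yield {"title": title, "format": fmt}


class XmlResource:
    """Reads clip records from an XML document."""

    def __init__(self, xml_text: str):
        self._root = ET.fromstring(xml_text)

    def records(self) -> Iterator[Record]:
        for clip in self._root.findall("clip"):
            yield {"title": clip.findtext("title"), "format": clip.get("format")}


class TextResource:
    """Reads clip records from 'title,format' text lines."""

    def __init__(self, lines: Iterable[str]):
        self._lines = list(lines)

    def records(self) -> Iterator[Record]:
        for line in self._lines:
            title, fmt = line.strip().split(",")
            yield {"title": title, "format": fmt}


class AccessLayer:
    """Gives callers one record format regardless of the storage behind it."""

    def __init__(self, resources):
        self._resources = list(resources)

    def records(self) -> Iterator[Record]:
        for resource in self._resources:
            yield from resource.records()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clips (title TEXT, format TEXT)")
db.execute("INSERT INTO clips VALUES ('Opening titles', 'dpx')")
layer = AccessLayer([
    RelationalResource(db),
    XmlResource('<clips><clip format="exr"><title>Storm</title></clip></clips>'),
    TextResource(["Credits,mov"]),
])
for record in layer.records():
    print(record)
```

Each adapter hides one storage format, so adding a new kind of resource means writing one more adapter rather than changing the client.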

The OGSA-DAI Trigger component can be used to synchronize other data repositories with the Data Grid. For example, imagine if FilmGrid needed to integrate with a third party application which stores new footage in its own database as it is digitised. By adding a trigger to the database, we could execute an OGSA-DAI workflow which automatically copies the footage into FilmGrid along with any metadata, every time new footage is added.
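
The footage example can be approximated with an ordinary database trigger. The sketch below uses SQLite rather than the OGSA-DAI Trigger component, and it swaps in a simpler technique: the trigger only queues each new row, and a separate function (standing in for the workflow) copies the queued footage and its metadata across. The table names, the sync_new_footage function and the dictionary used as the FilmGrid store are all hypothetical.

```python
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE footage (id INTEGER PRIMARY KEY, path TEXT NOT NULL, scene TEXT);
CREATE TABLE pending_sync (footage_id INTEGER NOT NULL);

-- Fires once for every newly digitised item and queues it for copying.
CREATE TRIGGER footage_added AFTER INSERT ON footage
BEGIN
    INSERT INTO pending_sync (footage_id) VALUES (NEW.id);
END;
""")


def sync_new_footage(connection: sqlite3.Connection, grid_store: Dict[str, dict]) -> int:
    """Copy queued footage and its metadata into the grid store; return the count."""
    rows = connection.execute(
        "SELECT f.path, f.scene FROM pending_sync p "
        "JOIN footage f ON f.id = p.footage_id ORDER BY f.id"
    ).fetchall()
    for path, scene in rows:
        grid_store[path] = {"scene": scene}  # metadata travels with the footage
    connection.execute("DELETE FROM pending_sync")
    connection.commit()
    return len(rows)


grid_store: Dict[str, dict] = {}
conn.execute("INSERT INTO footage (path, scene) VALUES ('/scans/a001.dpx', 'Scene 12')")
conn.execute("INSERT INTO footage (path, scene) VALUES ('/scans/a002.dpx', 'Scene 13')")
print(sync_new_footage(conn, grid_store), "items copied")
print(grid_store)
```

Queueing inside the trigger keeps the third-party application's insert fast and means that footage added while the Data Grid is unreachable is still copied on the next run.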

For any Data Grid which makes use of OGSA-DAI, the OGSA-DAI Data Publisher component can be used to greatly simplify installation and configuration of OGSA-DAI to expose data resources in the Data Grid.

For further information on data management please refer to the article Data Management in Grid Computing.

Other Components

SLA components can be used to control the service levels provided to various organizations within the Data Grid. For example, companies may want guarantees on the availability of data and bandwidth at certain times (such as immediately before a reporting period).

A portal interface could be provided for users to access the Data Grid without requiring them to install bespoke client software. The Data Grid would then be available wherever users have access to a web browser and the network.

Conclusion

This article started by looking at the distinctive architecture employed within the FilmGrid project, which comes from following a Grid paradigm (unusual within the realm of Digital Asset Management). Grid is particularly relevant to FilmGrid, due to the distributed nature of film development and the large amounts of data involved.

We then presented a generic Data Grid architecture applicable to many situations. In particular, projects may want to consider this or a similar architecture if:

  • they have multiple peers which need to collaborate or share data; and
  • they wish to avoid the risk inherent in choosing a centralized architecture.

This article has also shown how several IT-tude.com components and other information are of extensive use and importance to Data Grids, especially with regard to data management and security.

Further Reading

  • The IT-tude.com article on SLA
  • The IT-tude.com article on Portals