OGSA-DAI Query Translator

To use multiple data sources effectively, it is often necessary to homogenise their formats so that data can be queried, updated and joined consistently. One way to do this is to convert all the data in every source to one common format. In cross-organisation collaborations this is not always possible: organisations are understandably keen to maintain control over their own data, and they may have existing applications which depend upon their specific data formats. An alternative is to take a generic query, translate it into queries specific to each data source, and translate the results from each data source back into a common format. The query translator is a component that aims to let you do this within OGSA-DAI.

High Level Architecture

The component will consist of a layer in front of a number of databases, each of which has a different schema. Each database will have its own OGSA-DAI server. In front of these servers will be an additional OGSA-DAI server, which will hold a mapping between an abstract schema and the schema of each database. This server will provide a unified view onto the information in the databases.

To provide a single view onto multiple databases, the front-facing OGSA-DAI server will use a stored set of transforms to translate each incoming query into a specific query tailored to each database. The server will then gather the results from each database and build a single response, which will be returned in a form compatible with the same abstract schema.

The example below demonstrates the system using a variety of name/address schemas. A generic query is submitted and the specific query for each database is generated by applying the stored transforms.

Query Translator Example
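
The translation step can also be sketched in code. The following is a minimal illustration only, not the component's actual design: the class names (SchemaMapping, QueryTranslatorSketch), the database labels (companyA, companyB) and all table and column names are invented for this example. The generic query is represented as a plain list of abstract column names rather than a parsed query, and the sketch does not use any OGSA-DAI API. It is written for a current Java release, not Java 1.4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stored transform: maps the abstract schema onto one database's schema. */
final class SchemaMapping {
    private final String table;
    private final Map<String, String> columns = new LinkedHashMap<>();

    SchemaMapping(String table) {
        this.table = table;
    }

    /** Registers the database-specific name for one abstract column. */
    SchemaMapping map(String abstractColumn, String specificColumn) {
        columns.put(abstractColumn, specificColumn);
        return this;
    }

    /** Builds the database-specific SELECT for the requested abstract columns. */
    String translate(List<String> abstractColumns) {
        List<String> specific = new ArrayList<>();
        for (String column : abstractColumns) {
            String name = columns.get(column);
            if (name == null) {
                throw new IllegalArgumentException("No mapping for column: " + column);
            }
            specific.add(name);
        }
        return "SELECT " + String.join(", ", specific) + " FROM " + table;
    }
}

final class QueryTranslatorSketch {
    /** Invented name/address schemas for two databases (assumptions for illustration). */
    static Map<String, SchemaMapping> exampleMappings() {
        Map<String, SchemaMapping> mappings = new LinkedHashMap<>();
        mappings.put("companyA", new SchemaMapping("customers")
                .map("name", "full_name").map("address", "addr"));
        mappings.put("companyB", new SchemaMapping("contacts")
                .map("name", "contact_name").map("address", "postal_address"));
        return mappings;
    }

    /** Applies every stored mapping to the same generic query. */
    static Map<String, String> translateAll(Map<String, SchemaMapping> mappings,
                                            List<String> abstractColumns) {
        Map<String, String> queries = new LinkedHashMap<>();
        for (Map.Entry<String, SchemaMapping> entry : mappings.entrySet()) {
            queries.put(entry.getKey(), entry.getValue().translate(abstractColumns));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Generic query: the "name" and "address" columns of the abstract schema.
        Map<String, String> queries =
                translateAll(exampleMappings(), Arrays.asList("name", "address"));
        for (Map.Entry<String, String> entry : queries.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Keeping each mapping as data rather than code means that, in a design along these lines, adding a further database would only require a new mapping entry; the generic query itself would not change.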

Usage Scenario

An example use case for this component would be where a number of different companies wish to offer a unified view onto the services they supply. They have a number of existing data sources, each of which has its own schema. To produce a unified view of the services they offer, the companies create a central location for users to go to (a portal, for example) in order to access information on their services. Since these companies are different entities, they have heterogeneous data sources, each of which requires the data in its own specific format. The companies do not want to spend time converting their data and their existing applications, so they all want to keep their data sources as they are. One option would be to make the portal aware of all the different data formats, but this would make the portal complex and difficult to maintain. If the portal could instead use one query language to talk to the data sources of each company, and have a component that maps a generic query to the specific queries required by each data source, the portal could be made much simpler.

Dependencies

The component is built as a set of extensions to OGSA-DAI and so will require OGSA-DAI v3.0 to operate. The component is still in the planning stage, so additional details and dependencies may appear.

Interface

The interface of this component will take the form of new activities added to OGSA-DAI. There will be activities that accept query statements and convert them to the appropriate format for each data source, activities that transform data from each data source into a common format, and activities that combine data in the common format. The component is currently in the planning stage, so more details will become available at a later date.
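
Since the activity interfaces are not yet defined, the following is only a rough sketch of what the last two kinds of activity might do, written as plain Java rather than as OGSA-DAI activities. The class name ResultCombinerSketch, its methods and the column names are all assumptions made for illustration. A row is modelled as a map from column name to value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResultCombinerSketch {
    /** Renames the keys of one source-specific row to the abstract column names. */
    static Map<String, Object> toCommonFormat(Map<String, Object> row,
                                              Map<String, String> specificToAbstract) {
        Map<String, Object> common = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            String abstractName = specificToAbstract.get(field.getKey());
            if (abstractName != null) {
                // Columns with no counterpart in the abstract schema are dropped.
                common.put(abstractName, field.getValue());
            }
        }
        return common;
    }

    /** Concatenates rows from several sources that are already in the common format. */
    static List<Map<String, Object>> combine(List<List<Map<String, Object>>> perSource) {
        List<Map<String, Object>> all = new ArrayList<>();
        for (List<Map<String, Object>> rows : perSource) {
            all.addAll(rows);
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical mapping from one company's column names to the abstract schema.
        Map<String, String> companyA = new LinkedHashMap<>();
        companyA.put("full_name", "name");
        companyA.put("addr", "address");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("full_name", "A. Example");
        row.put("addr", "1 Example Street");

        List<Map<String, Object>> fromA = new ArrayList<>();
        fromA.add(toCommonFormat(row, companyA));

        List<List<Map<String, Object>>> perSource = new ArrayList<>();
        perSource.add(fromA);
        System.out.println(combine(perSource));
    }
}
```

Here the combining step is a simple concatenation; a real activity might also need to remove duplicates or join rows across sources, which this sketch does not attempt.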

Further Details

  • Related Design Patterns: Query Translator Pattern
  • Related Common Capabilities: Homogenise Data Sources, Treat Multiple Data Sources as One
  • Related Technical Requirements: Accessing Heterogeneous Data
  • Release Date: TBC
  • License: Apache 2
  • Contact: c.thomson(at)epcc.ed.ac.uk
  • Development Status: Beta
  • Programming Language(s): Java 1.4
  • Supported Operating Systems: Platform Independent
  • Supported or Required Middleware: OGSA-DAI 3.0
  • Other Dependencies: Globus Toolkit 4.0.5, Jakarta Tomcat 5.0