Towards an interoperable Scientific Cloud for Europe

February 1st, 2010

By Stephanie Parker

To ensure world-class research, energy efficiency and a competitive edge in the global marketplace, Europe needs to evolve its current Distributed Computing Infrastructures (DCIs) by encompassing new, industrial-quality technologies such as virtualization, service orientation and convergence with the digital world. While grid infrastructures have captured the requirements of several specific communities, smaller and ad-hoc groups with significant applications have struggled to get their requirements satisfied with grid technology because of its inherent complexity and long deployment times (with outcomes not always meeting with success). Moreover, industry adoption of grid has not taken off as widely as once expected. By contrast, a business case for cloud computing is increasingly gaining consensus in both the public and private sectors, and several standards development organisations are focusing their efforts on interoperable solutions for clouds through strategic alliances in which Europe is playing a pro-active role. Furthermore, a recent Expert Group Report on the Future of Cloud Computing, produced with the support of the European Commission DG INFSO, recommends that the European open source movement should work strongly with industry to support commercial cloud-based service provisioning.

A cloud-based e-Infrastructure for eScience, currently missing from Europe’s service portfolio, would ensure a leap forward in the European Research Area by integrating flexible and easy-to-use utility services, complementing current computing services like grids and supercomputers at the hands of researchers and scientists. Value-add needs to come from new business models in a shift away from costly and complex “run-by-scientists-for-scientists” approaches on the one hand and the use of pay on demand on the other. Sustainable growth needs to be addressed by a deeper understanding of policy and legal issues, ensuring cost-effective investment at EU level and interoperability while also fostering new public-private partnerships in the longer term. A new culture of cloud research, “scientific cloud”, and a spirit of entrepreneurship cannot be achieved without the involvement in R&D initiatives of pioneering enterprises with a commitment to industry quality standards and interoperability working alongside research organisations.

Recent developments led by experts in industry and research would help to gain efficiencies and make savings by optimising resource utilisation, reliability, energy efficiency and maintenance costs, all key objectives highlighted by EU policy bodies. This new approach focuses on the provisioning, operation and user-testing of an industrial quality, virtualised e-Infrastructure in the form of a cloud computing service platform, open for usage by the research and scientific community and tested by major categories of scientific and industrial communities across disciplines and sectors important to Europe. The aims of these new developments are to broaden inter-disciplinary scientific collaboration in Europe, ensure co-ordinated, strengthened and focused software deployments, improve the usability of DCI platforms targeting the largest possible base across a range of fields in science and engineering, and advance exploitation in the rapidly changing hardware environments through appropriate software developments.

This novel component in the e-Infrastructure ecosystem would help expand existing Distributed Computing Infrastructures (DCIs) serving eScience by ensuring easy access to virtually “infinite” resources and high mobility while hiding the complexity of set-up, maintenance and communication from users and reducing the length and costs of application porting through automation, as well as overcoming the need for in-depth knowledge of ICT technologies. Economies of scale will be achieved by optimising resources, reducing operational costs, especially energy costs, where savings are crucial for sustainability.

An ideal approach could be based on both open source and commercial solutions, combining the best of both worlds. Users would be enabled through access to a commercial multi-layer solution including compute and storage power, a development environment and immediate services, while advances in open source would also be ensured through community contributions to extend the capabilities of current DCIs and support efforts towards interoperability and portability.

Open source initiatives would be leveraged to pave the way for interoperability. A good case in point is the Zend Framework project, which has invited the open source community and software vendors to participate in the formation of a Simple Cloud API. IBM, Microsoft, Rackspace, Nirvanix and GoGrid have already joined the project as contributors. In the coming months, they will work together to define APIs for these cloud application services, enabling a generation of cloud-native applications written in PHP. The Simple Cloud API is an open source project that makes it easier for developers to use cloud application services by abstracting insignificant API differences. One of the design goals of the project is to encourage innovation. To this end, the Simple Cloud API can be used for common operations while users can easily drop down to vendor libraries to access value-add features. One example of this is Microsoft Azure, which now also supports the full Java stack including open source tools such as the Apache web server, working towards interoperability.

But it doesn’t stop here. A cost- and energy-efficient on-demand environment has much potential to support incubators, industrial clusters and scientific parks, which are central to Europe’s economic strength, particularly in terms of high value-added categories like ICT, Biotechnology and Pharmaceuticals and R&D across diverse sectors. What’s more, such a solution would enable SMEs and small research labs by bringing the value-add needed to compete with the larger organisations that currently dominate the pharmaceutical landscape.

Significantly, such an approach meets all four additional recommendations of the EC’s Expert Group Report for the future of cloud computing, that is: the need for large-scale research and experimentation test beds; developing joint programmes encouraging expert collaboration groups with industrial and public stakeholders; supporting the development of cloud interoperation standards and open source reference implementations; and promoting a European leadership position in software through commercially relevant open source approaches. The time has come for Europe to tap into the expertise that will help make this happen, opening up strategic opportunities for a new scientific cloud that brings interoperability and innovation into sharp relief.

Is flexibility in Grid/Cloud business relationships really an illusion?

January 18th, 2010

By Bastian Koller

Flexibility, interoperability and adaptability are just some of the well-known buzzwords that come up when dealing with Grid or Cloud computing technologies. Even though these technologies are driven by different (end) user and provider groups, at a certain point they all face similar problems.

One of these is the limitation of existing implementations with respect to their adaptability and extensibility in terms of the logic and capabilities they provide. Looking at the area of business relationships, a prominent example is the use of Service Level Agreements (SLAs).

Current Cloud Infrastructure Services such as Amazon S3 or EC2 provide SLA capabilities, but in a very simple way. They let their users choose the best-fitting option from a pre-defined set of service descriptions, but they provide no flexibility at all with respect to the (re-)definition of the terms themselves or to how the agreement is reached. When you want a service, you can either take it with the pre-defined terms or leave it. There is no alternative.

Looking at current research activities, the aims are different. With the experience from handling Service Level Agreements within the Grid, it is obvious that this simple negotiation approach is needed, but it is not the “casus sui generis.”

As has already been heavily discussed, there are many potential ways (and thus protocols) to reach an agreement; existing solutions address a variety of them, but never all of them. Additionally, the extensibility of these solutions is often not addressed adequately.

Imagine a Service Provider X which has installed an SLA Negotiation framework realizing negotiation protocol A (e.g. Discrete Offer Protocol). In parallel there is a potential Customer Y which is interested in X’s services, but has installed a framework providing exclusively multi-phase SLA negotiation capabilities. Being restricted to their respective technologies (and inheriting their limitations), the two entities would not be able to enter a business relationship covering the delivery and usage of a service from X.

Generally speaking, this is only one case out of many. Looking at the overall lifecycle of a service, it is not only about SLAs; it’s about Security, Workflow Management and many more aspects that might need this enhanced adaptability.

Let’s try a thought experiment:

A possible solution is to separate the logic from the base functionality of components and to structure the framework as loosely coupled components, following the SOA paradigm. Coming back to SLA negotiation, an SLA Negotiator component providing capability A is in some parts realized similarly to an SLA Negotiator providing B. So why not take out this base functionality (call it a fragment), keep it as a stable basis for all further developments, and enhance it with mechanisms to adapt the component at runtime with a piece of logic?

So in our case, Service Provider X could adapt its negotiation component with logic B and by that enable interoperability with Customer Y. After successful negotiation, the provider could re-configure its component back to A.
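
The idea of a stable base fragment with swappable logic can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `Offer`, `SLANegotiator`, `discrete_offer` and `multi_phase` are hypothetical, an SLA is reduced to a single price, and no real SLA negotiation framework or protocol specification is being reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Offer:
    price: float      # price initially proposed by the customer
    max_price: float  # highest price the customer would accept


# A logic fragment maps (provider's minimum price, offer) to an agreed
# price, or None when no agreement is reached. (Hypothetical signature.)
NegotiationLogic = Callable[[float, Offer], Optional[float]]


def discrete_offer(min_price: float, offer: Offer) -> Optional[float]:
    """Logic A: a single take-it-or-leave-it round."""
    return offer.price if offer.price >= min_price else None


def multi_phase(min_price: float, offer: Offer) -> Optional[float]:
    """Logic B: if the first offer fails, counter with the minimum price."""
    if offer.price >= min_price:
        return offer.price
    return min_price if min_price <= offer.max_price else None


class SLANegotiator:
    """Stable base fragment: validation and state stay the same,
    only the negotiation logic is swapped at runtime."""

    def __init__(self, min_price: float, logic: NegotiationLogic) -> None:
        self.min_price = min_price
        self._logic = logic

    def set_logic(self, logic: NegotiationLogic) -> None:
        self._logic = logic

    def negotiate(self, offer: Offer) -> Optional[float]:
        if offer.price <= 0 or offer.max_price < offer.price:
            raise ValueError("malformed offer")
        return self._logic(self.min_price, offer)


provider_x = SLANegotiator(min_price=100.0, logic=discrete_offer)
offer_from_y = Offer(price=80.0, max_price=120.0)

first_try = provider_x.negotiate(offer_from_y)   # logic A: no agreement
provider_x.set_logic(multi_phase)                # adapt to Customer Y
second_try = provider_x.negotiate(offer_from_y)  # logic B: counter-offer accepted
provider_x.set_logic(discrete_offer)             # re-configure back to A
```

The base class never changes; only the function passed to `set_logic` does, which is the kind of runtime adaptation the thought experiment asks for.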

Looking at this, there is a definite need to perform research on adaptability (and thereby flexibility). If a solution like the one discussed in the thought experiment existed, business relationship handling could be improved and relationships that were not possible before could be established.

It will be a long way, but in the end such a solution needs to be there, or we won’t be able to benefit fully from all the capabilities of the Cloud.

OPENNEBULA: a flexible and interoperable cloud operating system

January 13th, 2010

By Ignacio M. Llorente

Future enterprise data centers will look like private clouds supporting a flexible and agile execution of virtualized services, and combining local with public cloud-based infrastructure to enable highly scalable hosting environments. The key component in these cloud architectures will be the cloud management system, also called cloud operating system (OS), which is responsible for the secure, efficient and scalable management of the cloud resources. Cloud OS are displacing “traditional” OS, which will be part of the application stack.

Flexibility in Cloud Operating Systems

A Cloud OS administers the complexity of a distributed infrastructure in the execution of virtualized service workloads. The Cloud OS manages a number of servers and hardware devices and their infrastructure services which make up a cloud system, giving the user the impression that they are interacting with a single infinite capacity and elastic cloud. In the same way that multi-threaded OS define the thread as the unit of execution and the multi-threaded application as the management entity, supporting communication and synchronization instruments; multi-tier Cloud OS define the VM as the basic execution unit and the multi-tier virtualized service (group of VMs) as the basic management entity, supporting different communication instruments and their auto-configuration at boot time. This concept helps to create scalable applications because you can add VMs as and when needed. Individual multi-tier applications are all isolated from each other, but individual VMs in the same application are not as they all may share a communication network and services as and when needed.

A Cloud OS has a number of functions:

  • Management of the Network, Computing and Storage Capacity: Orchestration of storage, network and virtualization technologies to enable the dynamic placement of the multi-tier services on distributed infrastructures
  • Management of VM Life-cycle: Smooth execution of VMs by allocating the resources required for them to operate and by offering the functionality required to implement VM placement policies
  • Management of Workload Placement: Support for the definition of workload and resource-aware allocation policies such as consolidation for energy efficiency, load balancing, affinity-aware placement, capacity reservation…
  • Management of VM Images: Exposing of general mechanisms to transfer and clone VM images
  • Management of Information and Accounting: Provision of indicators that can be used to diagnose the correct operation of the servers and VMs and to support the implementation of the dynamic VM placement policies
  • Management of Security: Definition of security policy on the users of the system, guaranteeing that the resources are used only by users with the relevant authorizations and isolation between workloads
  • Management of Remote Cloud Capacity: Dynamic extension of local capacity with resources from remote providers
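
To make the workload placement function more concrete, here is a minimal Python sketch of two of the allocation policies named above, consolidation and load balancing. It is an illustration only: `Host`, `consolidate`, `load_balance` and `place` are hypothetical names, capacity is reduced to a single abstract number, and nothing here reflects how any particular cloud OS implements its scheduler.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    name: str
    capacity: int  # abstract capacity units (assumption: one dimension only)
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used


# A policy picks a host for a VM with the given demand, or None if none fits.
Policy = Callable[[List[Host], int], Optional[Host]]


def consolidate(hosts: List[Host], demand: int) -> Optional[Host]:
    """Pack VMs tightly: choose the fitting host with the least free capacity,
    so that lightly used hosts can be drained and powered down."""
    fitting = [h for h in hosts if h.free >= demand]
    return min(fitting, key=lambda h: h.free, default=None)


def load_balance(hosts: List[Host], demand: int) -> Optional[Host]:
    """Spread VMs out: choose the fitting host with the most free capacity."""
    fitting = [h for h in hosts if h.free >= demand]
    return max(fitting, key=lambda h: h.free, default=None)


def place(hosts: List[Host], demand: int, policy: Policy) -> Optional[str]:
    """Apply a policy, record the allocation and return the chosen host name."""
    host = policy(hosts, demand)
    if host is None:
        return None
    host.used += demand
    return host.name


hosts = [Host("node-a", capacity=8, used=6), Host("node-b", capacity=8, used=1)]
print(place(hosts, 2, consolidate))   # fills up the busier host
print(place(hosts, 2, load_balance))  # goes to the emptier host
```

Because the policy is just a function passed to `place`, further policies such as affinity-aware placement could be added without touching the allocation code.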

OpenNebula is an open cloud OS that provides the above functionality on a wide range of technologies. However, in my view, the main differentiation of OpenNebula is not its leading-edge functionality but its open, modular and extensible architecture that enables its seamless integration with any service and component in the ecosystem. The open architecture of OpenNebula provides the flexibility that many enterprise IT shops need for internal cloud adoption. Cloud computing is about integration; one solution does not fit all. Moreover, as pointed out in the CloudScaling “Infrastructure-as-a-Service Builder’s Guide”, the right configuration and components in a Cloud architecture also depend on the execution requirements of the service workload.

Interoperability at the Cloud Management Level

The IEEE defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” and Wikipedia introduces interoperability as “the property referring to the ability of diverse systems and organizations to work together (inter-operate)”. Because the cloud management system is the core component in any cloud solution, interoperability is crucial to its success. We can compare the cloud OS with the kernel in “traditional” operating systems. The cloud OS provides the basic functions in a cloud and requires well-defined communication with underlying devices and interfaces to expose administration and user functionality.

At the cloud management level, interoperability means:

  • Modularity and flexibility to easily interface with any service or technology in the virtualization and cloud ecosystem, and
  • Standardization to avoid vendor lock-in and to create a healthy community around it

In fact, interoperability should be evaluated from three different angles:

  • Infrastructure User Perspective: Users, application developers, integrators and aggregators require a standard interface for the management of virtual machines, network and storage. OCCI is a simple REST API for Infrastructure-as-a-Service-based Clouds that is being defined in the context of OGF. This interface represents the first standard specification for life-cycle management of virtualized resources. OpenNebula has been the first reference implementation of this open cloud interface, and also implements the Amazon EC2 API.
  • Infrastructure Management Perspective: Administrators require a cloud OS to interface with existing infrastructure and management services, and so fit into any data center. OpenNebula provides a flexible back-end that can be integrated with any service for virtualization, storage and networking.
  • Infrastructure Federation Perspective: Administrators require a cloud OS to manage resources from partner and commercial clouds.
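
The management and federation perspectives both come down to hiding different back-ends behind one interface. The following Python sketch shows that pattern in its simplest form. All names (`CloudDriver`, `LocalDriver`, `RemoteDriver`, `CloudManager`) are hypothetical and invented for this illustration; this is not the OpenNebula driver API, nor OCCI, nor the EC2 API.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class CloudDriver(ABC):
    """Common interface every back-end must implement (hypothetical)."""

    @abstractmethod
    def start_vm(self, image: str) -> Optional[str]:
        """Start a VM from an image; return its id, or None if out of capacity."""


class LocalDriver(CloudDriver):
    """Stand-in for the local data center, with a fixed number of VM slots."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.started = 0

    def start_vm(self, image: str) -> Optional[str]:
        if self.started >= self.slots:
            return None
        self.started += 1
        return f"local-{self.started}"


class RemoteDriver(CloudDriver):
    """Stand-in for a partner or commercial cloud; assumed never to run out."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self.started = 0

    def start_vm(self, image: str) -> Optional[str]:
        self.started += 1
        return f"{self.provider}-{self.started}"


class CloudManager:
    """Tries drivers in order, so remote capacity is used only when the
    local infrastructure is full."""

    def __init__(self, drivers: List[CloudDriver]) -> None:
        self.drivers = list(drivers)

    def deploy(self, image: str) -> str:
        for driver in self.drivers:
            vm_id = driver.start_vm(image)
            if vm_id is not None:
                return vm_id
        raise RuntimeError("no capacity available on any driver")


manager = CloudManager([LocalDriver(slots=1), RemoteDriver("partner")])
print(manager.deploy("web-tier"))  # served locally
print(manager.deploy("web-tier"))  # local is full, extended to the partner cloud
```

The manager depends only on the abstract interface, so a new virtualization, storage or remote-provider back-end can be added as one more driver class.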

With high-end computing demands, cloud operating systems will continue to be a very active field of research and development. An open and flexible approach to cloud management ensures uptake and simplifies adaptation to different environments, which is key for interoperability. The existence of an open and standards-based cloud management system like OpenNebula provides the foundation for building a complete cloud ecosystem, ensuring that new components and services in the ecosystem have the widest possible market and user acceptance.

OpenNebula is being enhanced in the context of the RESERVOIR project, flagship of European research initiatives in virtualized infrastructures and cloud computing.

Exploiting Social Networks for Building the Future Internet of Services

November 17th, 2009

Author: Dora Varvarigou

  • Introduction

Social Network (SN) environments are the ideal future service marketplaces. It is well known and documented that the number of social networking users is increasing at a tremendous pace. Web 2.0 and SN Sites (SNSs) have attracted more than 125 million regular users within just 5 years of existence. The number of potential customers is huge, coming from almost every social class, cultural background and age group. The requirements are simply a computer, a browser, network access, and the natural need for socializing. The latter results in a great number of SN service consumers who, through their interactions, create complex social schemes that are valuable to them.

Taking advantage of these social dynamics as well as the vast volumes of content generated every second is a major step towards creating a potentially huge market of services. Application developers can be anyone from an individual home user who plays around with Facebook, to a company exploiting the SN spaces to deliver its services. Providing developers with tools that enable them to manage the dynamically generated content and complex social interactions in a service application platform that allows them to build applications disregarding the underlying SN platform implementation will create an agile and profitable market of services and will bring the Internet of Services concept a step closer to realization.

Of course, the basic question is why this hasn’t happened yet. Why haven’t SNSs turned into a prospective market of services? It is clear that there already is some business action going on in the SN framework. However, there are some characteristics that weaken the current case, which are analysed below.

  • Lack of application interoperability

Many social networking platforms are aiming at providing different types of social networking services, yet their customers overlap to a great extent. This is indicative of the need to provide a wealth of services in a common target group. This scales even more if we consider the potential that is created if we mix and match these services along with the content that they are associated with.

However, the core services provided by SNs that can be used for building larger workflows are limited in number and functionality since they are platform dependent. Google’s OpenSocial and Sun’s Zembly have attempted to deal with this issue by providing a common API for application development and deployment in various SN spaces. Even though this effort has met significant success, it has also encountered the following major issues: an inability to incorporate all the services that the SN provides, leading to almost standard workflows with few variations and platform API-dependent services; poor usability that leads to the exclusion of home users and the creativity that they can bring; and specific social networking platforms not opening their APIs because they do not have the business incentives to do so.

  • Support for Business-oriented Services

Modern SN platforms provide a range of utilities and services ready for consumption by the user with the purpose to entertain, educate, inform, collaborate, etc. Apart from these core functionalities, SN platforms also offer tools to allow the users themselves to become application developers and content providers. Facebook applications are a typical example of an API being provided for developers to freely build SN services given specific rules. The related Wiki* states that over 650,000 developers are using this API for contributing applications; however, a closer reading will reveal that there is a clear distinction between these developers and SN users. An evaluation of SN profile applications will soon lead to disappointment with regard to the quality, functionality and design of the applications. These characteristics are not fitting for business services, which are expected to satisfy functional and non-functional requirements in an efficient way and with quality.

The reasons why this is currently happening have been summarised above: lack of adequate APIs for service composition across SN platforms, lack of tools to manage content and social behaviour, and lack of social capital monetization mechanisms. The notions of QoS, trust and security, reliability, confidentiality and all the other non-functional requirements that are so important when building business services need to be better explored in the context of social networks. Quantification of social capital in SNs implies that these concepts need to be revised and supported by appropriate mechanisms, such as SLA management mechanisms for ensuring QoS levels, reputation systems, and trust platforms, that will make it possible to evaluate and reward a good service.

SN platforms as they currently are do not provide the necessary support for developing and deploying business services. They do offer basic APIs and software libraries to allow for the development of applications but they cannot support sophisticated, high quality and business-oriented SN services since fundamental notions such as QoS, trust and security are not quantified and incorporated as mentioned above.

  • Lack of tools to manage content

Social Networking platforms contain an abundance of content that generates value and hasn’t been modelled yet. 3D Virtual Worlds and microblogs are typical examples of how text, pictures, movies or even scenes can attract the interest of users and create critical mass and trends. This, plus the volume of content generated every second, leads to a challenging management problem that needs to be resolved if we are to take advantage of the content wealth.

Current social networking platforms do not possess adequate tools to manage this content in a combinatorial manner. They only deliver core services for content representation that mostly fail to incorporate the user’s reaction when invoking them. Content in social networking is a raw material that, when used wisely, can create a service market on its own.

Yet the lack of tools to handle content ownership, consistency, confidence and privacy keeps this opportunity far from being seized.

  • Lack of instrumentation of social dynamics models

Most users of social networks aren’t exhibiting commercial behaviour; they are there just to have fun. Still, there is great potential in making money out of SNs and Virtual Communities. This potential has managed to generate a 130% increase in Facebook’s annual growth, accompanied by $300M paid for 2% of its shares in its early days. And this success story is shared by many SNSs of different types. But now that the hype around SNs is starting to wear off and the SN technologies are entering the “Trough of Disillusionment”, it is time to investigate their real-world applicability and benefits and identify the actual sources of revenue. The volume of potential customers, service and application providers, as well as the generated social capital and the wealth of actions and content available in these spaces, are indicative of the technologies’ worth; however, we still lack an understanding of how their combination generates value.

As a first step, we need to build utilities that capture the notions of social network dynamics and human behaviour so as to be able to build business on top of them. Identifying the actions that bring value is of the utmost importance in supporting business models within social networks. What is more interesting is the modelling of the interactions among humans through the social networking core services as well as between humans and the virtual environment. Each action in a social network triggers a series of reactions. The reactions generate events that affect both the environment and the users. This process represents the user’s social networking behaviour and is indicative of how a critical mass of individuals might behave as customers (social network economics).

It is necessary to materialize in a tangible manner the notions of actions, events, behaviour and trends and encompass them in a service provisioning platform operating in a service networking context. Incorporating these utilities and delivering them to users will permit them to build applications that maximize their effects and range by taking into account the social networking context.

  • Incentives for Development of Business Services

By dealing with the above-mentioned weaknesses, we will pave the way for building high-quality, functional and usable business applications. By providing application interoperability; support for SLAs and QoS; tools for content management and tools for materializing social behaviour; and, most importantly, a usable framework to build services in SN spaces using all the above capabilities, we can provide incentives for the development of business applications, while utilizing, in an efficient and intelligent manner, the full potential of the openness that SNs provide as well as the wealth of content available.

The customer’s point of view

November 9th, 2009

Author: Csilla Zsigri

Dear Cloud Expert,

I’m a potential cloud user. Ongoing technical discussions about what cloud computing is do not help me much in understanding what clouds are useful for; rather, they confuse me. I guess I will never fully understand what you really mean when you use all those expressions like on-ramp or bursting or runbook automation. Also, my feeling is that not every company that bills itself as a cloud player has the experience or domain knowledge to back that claim. What I really want to know is why I should use a cloud, how to get started, how the different models work, which vendors I should be considering, what their competitors are doing, if there are deployment experiences and best practices I can look at, and so on.

I would appreciate it if you could shed some light on this.

Posted by
The Customer

Dear Customer,

I’m glad you posted your needs and concerns. I’ll try to walk you through some important business drivers which can help you understand the benefits of the cloud.

I see cloud computing as a way to use technology rather than a technology itself. It’s a logical endpoint for some combination of activities including grid and utility computing, virtualization and automation. With clouds, enterprise IT is organized neither wholly in-house nor entirely outsourced; instead, it is located at some optimal point between the two.

Cost reduction is a key driver in the short term. For instance, you can run simulations in a day and then have more time to write a report. This saves money that would otherwise have been spent on provisioning new hardware. Cloud computing eliminates the need to procure and manage hardware and you pay only for what you consume. You reduce capital spending without negatively impacting operations.

The operational flexibility (scale up/scale down) that clouds offer, the ability to respond more quickly to changing needs and faster time to market are often more important in the long term than lower cost. You can expand on demand and pay per use, scale fast and, if a project doesn’t work out, fail cheaply.

Business agility, flexibility and execution enable you to effectively start over with new projects and reuse components between projects. This drives better utility and community across your organization or organizations. You can also support your developers by addressing their needs to create their own applications or do simulations as cloud computing allows them to quickly develop and deploy applications and services.

Automation is another key driver for using clouds. Provisioning automation is a force multiplier for overworked systems administrators. The benefits are immediate in terms of both time and money.

Faster time to market is a key enabler of innovation that gives you competitive advantage. It provides you with the ability to more quickly respond to changing needs.

Using the cloud, you can make your revenue grow and shrink your operating costs at the same time. You can quickly deploy and scale your services along with reduced administrative overhead and achieve higher total ROI.

Green issues are climbing the corporate agenda and legislation will reward them in IT as elsewhere. Choosing third-party cloud services means that you are able to reduce your own physical IT footprints and power consumption and pass along responsibility for the associated carbon use to a third party. By the same token, the use of internal cloud mechanisms enables organizations to better manage underutilized resources.

I hope you find this useful. In order to answer all your questions, let me invite you to our next event, a half-day program, where industry experts, cloud providers, financial professionals and end-user companies will join together to learn, understand, network and develop strategies for today’s cloud marketplace.

Posted by
The Cloud Expert

The difficult marriage of cloud and data-intensive apps

October 30th, 2009

Author: Pawel Plaszczak
Pawel also blogs regularly at bigdatamatters.com

In certain sectors that were the early adopters of Grids, migration to the Cloud is bound to happen soon. Pharmaceuticals is a good example. As Bob Cohen pointed out in a recent presentation:

• Eli Lilly has already tried using the Amazon EC2 external Cloud,
• GlaxoSmithKline is looking at using internal Clouds

As I remember, years ago Glaxo was among the early Grid users. Like many other pharmas, they used software from UnivaUD to distribute protein docking simulations over a large number of machines. Now UnivaUD also sells Cloud services.

This supposedly common movement from Grid to Cloud is thought provoking. What does this “evolutionary step” really mean? Something conceptually quite simple: The Grid makes it possible to manage processes on many physical machines. The Cloud offers an even greater potential: to manage processes on many virtual machines, or even to manage those virtual machines as if they were processes. This is what VMware vSphere offers or what internally powers Amazon EC2. So:

Cloud = set of virtual machines managed by a scheduler (Grid).

All of the above are great products, if you want an internal Cloud. The thing is: in solving large data challenges, the Cloud is no less limited than its predecessor, the Grid. Chris Dagdigian of gridengine.info said at BioIT that he “solved real problems” on the Cloud. This is not surprising (BioTeam once teamed with Univa to demonstrate Grid Engine on AWS), but such a statement needs an explicit remark: problems solvable on the Cloud are still a small subset of the World’s important data processing challenges.

Virtualized Cloud environments are perfectly isolated from each other. If you have one, pray that you only happen to compute tasks that can be domain-decomposed into millions of perfectly independent pieces.

Protein docking, mentioned earlier, is like that: thousands of simulations, one per set of chemical compounds, that do not need to communicate. But most apps, in most industry sectors (not just bioinformatics), do not share these characteristics: they require intensive database querying and/or data sharing. Genomics is like that. The Cloud will not help here. Clouds may even make it more difficult. I also agree here with another of Chris’s statements: Cloud data ingest is a pain.
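
The “perfectly independent pieces” case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `dock_score` is a hypothetical stand-in for one docking simulation, and the compound names are invented. Because each call depends only on its own input, the work can be handed to any pool of workers in any order, which is exactly the property that data-sharing applications lack.

```python
from concurrent.futures import ThreadPoolExecutor


def dock_score(compound: str) -> int:
    """Hypothetical stand-in for one docking simulation.

    It reads only its own input and touches no shared state,
    so any worker can run it at any time.
    """
    return sum(ord(ch) for ch in compound) % 100


def screen(compounds: list[str], workers: int = 4) -> dict[str, int]:
    """Score every compound independently and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even though tasks run concurrently.
        scores = list(pool.map(dock_score, compounds))
    return dict(zip(compounds, scores))


if __name__ == "__main__":
    library = ["aspirin", "ibuprofen", "caffeine"]  # invented example inputs
    print(screen(library))
```

An application that had to query a shared database between steps could not be split up this way without adding the data provisioning layer discussed in the rest of this post.
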

The answer to large data challenges is a puzzle of three pieces:

1. efficient distributed processing
2. efficient data provisioning
3. efficient storage

Workload management engines (Grids, Clouds, you name it) provide the first piece. Today’s data-intensive apps need the full stack: an efficient integration of (1), (2) and (3), that is, fully scalable data integration. Where does the challenge lie?

Efficient processing is easy. With many great scheduling vendors, this is not rocket science any more. Efficient storage is becoming commonplace too, with interesting examples of federated storage distributions. The trick is in the middle layer: an efficient connection between these two. And that is really difficult. There is certainly no universal solution, but we have recently had some successes here.


Cloud for Academia?

October 5th, 2009

By Adrian Mouat

Grid computing was born in academia and was originally designed to support scientific and research computing. In contrast, Cloud computing has a business background and is designed to enable the delivery of scalable Web applications.

The BEinGRID project has looked into how Grid is appropriate for business use (and has run several successful business experiments proving this proposition), but what about looking at whether the Cloud is useful for academia? Can it be effectively used to run scientific codes, such as those found in climate modelling, fluid dynamics or molecular physics simulations, which have traditionally required the use of supercomputers?

On the face of it, Cloud services offer a compelling, simple and relatively cost-effective HPC proposition – just pay for as many CPUs as you want, when you want them. Of course, the truth isn’t quite as simple as that.

Take a look at the Google Cloud offering, App Engine. Users access App Engine through an API, which places quite a lot of restrictions on the code that can be run, including:

• It must be written either in Python or Java (or use a JVM based interpreter or compiler) – meaning any C or Fortran codes have to be ported
• It can’t start any threads (instead the API is used to start a new task)
• Any single request/task must complete in 30 seconds
• It has to stay within quotas on CPU, bandwidth and storage usage

For full details see the App Engine documentation. Note that there are different quotas for the free and billed service, and that it is possible to negotiate increases to the quotas.

This doesn’t make scientific computing impossible, but it does put in place a lot of barriers. It would be interesting to see what could be achieved computationally, given the above restrictions, by an academic research group that chose to use App Engine. However, the bottom line is that Google App Engine is more suited to creating dynamic web applications, such as photo and document editing tools, than to processing long-lived scientific computations.
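
One way a long computation might be fitted under a per-request limit is to cut it into slices that each save a checkpoint and hand over to the next task. The sketch below is plain Python and does not use the App Engine API; the names `Checkpoint`, `run_slice` and `run_to_completion` are hypothetical, and a fixed step budget stands in for the 30-second limit so that the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    next_i: int         # next index still to be processed
    partial: int        # running total accumulated so far
    done: bool = False  # True once the whole range has been processed


def run_slice(cp: Checkpoint, n: int, max_steps: int) -> Checkpoint:
    """Process at most max_steps items, then return a new checkpoint.

    max_steps is an assumed stand-in for a per-request time limit.
    """
    i, total = cp.next_i, cp.partial
    end = min(n, i + max_steps)
    while i < end:
        total += i * i  # toy workload: sum of squares
        i += 1
    return Checkpoint(next_i=i, partial=total, done=(i >= n))


def run_to_completion(n: int, max_steps: int) -> tuple[int, int]:
    """Chain slices until the job is done; return (result, slices used)."""
    cp = Checkpoint(next_i=0, partial=0)
    slices = 0
    while not cp.done:
        cp = run_slice(cp, n, max_steps)  # on a real platform: a new task
        slices += 1
    return cp.partial, slices


if __name__ == "__main__":
    print(run_to_completion(n=10, max_steps=3))
```

The price of this pattern is that all state must be serialisable between slices, which is one reason tightly coupled scientific codes are awkward to port to such a platform.
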

Amazon’s offering, EC2, is a lot more promising. EC2 gives users much more access to, and control over, the system through the use of virtualisation. Users are free to install whatever software and applications they need on EC2.

Users provide “virtual images”, instances of which can be launched at any time and will normally be running in under 10 minutes. By default, only 20 instances (per region) can normally be launched, but users can apply to increase this limit, potentially allowing thousands of instances to be launched. Amazon also supports Hadoop, Condor and OpenMPI for batch/parallel processing. The Data Wrangling blog has some in-depth information on using Amazon to set up an MPI cluster.

For data storage, the Amazon S3 service can be used to store the enormous amounts of data produced by some scientific applications. Access to the data is controlled via Access Control Lists (ACLs) and data is encrypted during transmission using SSL. Users are encouraged to encrypt any sensitive data being stored in S3. It is important to note that Amazon does not guarantee that data will not be lost or compromised (see point 7.2 of the AWS Customer Agreement).

So, it should be fairly easy to get most scientific computing codes running in parallel on EC2. But what’s the performance like? There has been some research and the results are mixed. Comparing a roughly equivalent amount of CPU resources, supercomputer clusters are typically much faster at processing scientific codes, largely due to having a better interconnect (see this article by Edward Walker). However, if we include the amount of time it takes to get the code running (i.e. to request and boot the images on EC2, and to wait in the queue on a supercomputer cluster), EC2 is likely to be faster in many cases, depending on the size of the job and the scheduling policies (as shown by Ian Foster on his blog). In the future, EC2 may offer an even more competitive service if Amazon upgrades its systems.
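
The trade-off between raw speed and waiting time comes down to simple arithmetic: what matters is start-up delay plus run time. The sketch below makes that explicit. The 10-minute boot time is taken from the figure quoted earlier in this post; every other number is a hypothetical illustration, not a measurement.

```python
def time_to_solution(wait_min: int, run_min: int) -> int:
    """Total minutes from submitting a job to getting its result."""
    return wait_min + run_min


def faster_platform(ec2_boot: int, ec2_run: int,
                    hpc_queue: int, hpc_run: int) -> str:
    """Say which platform returns the result first (all times in minutes)."""
    ec2 = time_to_solution(ec2_boot, ec2_run)
    hpc = time_to_solution(hpc_queue, hpc_run)
    if ec2 < hpc:
        return "EC2"
    if hpc < ec2:
        return "HPC"
    return "tie"


def break_even_queue(ec2_boot: int, ec2_run: int, hpc_run: int) -> int:
    """Queue wait (minutes) at which both platforms finish together."""
    return ec2_boot + ec2_run - hpc_run


if __name__ == "__main__":
    # Hypothetical job: twice as slow on EC2 (480 min) as on the cluster (240 min).
    print(faster_platform(ec2_boot=10, ec2_run=480, hpc_queue=360, hpc_run=240))
    print(faster_platform(ec2_boot=10, ec2_run=480, hpc_queue=60, hpc_run=240))
    print(break_even_queue(ec2_boot=10, ec2_run=480, hpc_run=240))
```

With these assumed numbers, the cluster only wins if its queue wait is shorter than the break-even value, which is why the outcome depends so strongly on job size and scheduling policy.
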

I haven’t taken into account Microsoft Azure, which is still in “Community Technology Preview” at the time of writing, but may be interesting for any .NET based scientific codes. The offering is similar to EC2, the main difference being that users must use a supplied Windows Server 2008 VM.

With all the money and time being invested in Cloud computing, it will be interesting to see the effect it has on HPC resource providers over the next decade. Will the emergence of Cloud lead to greater trust in outsourcing compute resources and a direct boost to HPC resource providers? Will there be a level of symbiosis where Cloud resources can be built on top of or alongside HPC computing resources? Or will they just be direct competition? One thing’s for sure: the rules of the HPC game are being questioned.

For more information on using Cloud platforms for scientific computing, see the HPCCloud Google Group.


I do also read Millennium during my holidays

September 8th, 2009

By Rosa M. Badia

When I was asked for a new post, I was discussing possible topics and a colleague suggested: “Maybe you can summarize or give your thoughts about something you have read during your holidays”. My answer was: “Well, if you want, I can summarize the last book of the Millennium trilogy…”. Although initially I was kidding, on second thought this series of books raises several points that may be of interest to the gridvoices audience.

One of the topics that I would like to mention is a social one: the participation of women in IT. One of the main characters of the books is a girl, Lisbeth Salander, who is extremely intelligent, has a photographic memory, is a very good investigator and has a special aptitude for computer technology. At first sight this is very positive: for once, a main character of a best-seller is a woman with an interest in and skills for technology! This could be one way to attract girls to engineering studies, especially IT ones. However, nobody’s perfect: Lisbeth is a punk, with several piercings and a large dragon tattoo on her skinny body. While I have nothing against the clothes that girls wear, or against piercings or tattoos, the really negative aspects of Lisbeth are that she is a very asocial and violent girl. Although the books end up positioning Lisbeth as a heroine, it is clear that the image given of her is that of a freak.

When I ask myself why girls are not attracted to computer science studies or related topics, I have to admit that I do not understand it. For me, not only computers but any engineering or technology field is highly interesting. It has been like this since I was a child, so I cannot understand the reasons for this global low interest of girls in these fields.

According to some studies¹, men and women view computers very differently. Studies show that women view computers as a tool and in a much broader societal context than men do; they are much more concerned with the effect of technology on other disciplines, and with how it can be used to improve society. Men, on the other hand, have a much narrower focus of interest; they do not require a “larger goal” in connection with their interest.

We also have to fight the extreme social stereotype that computer scientists are “geeks” and “nerds” without social interaction, which is particularly detrimental to women. Girls often dislike the idea that computers “become their life”.

The other topic that I would like to highlight from these books is computer security, especially when considering distributed platforms. Lisbeth is not only good with computers, she is a hacker. She has developed a program called Asphyxia that replaces the browser on the victim’s computer with a new one, which is then used to copy a mirror version of the hard drive to a remote server. The mirror is periodically updated, so that Lisbeth always has access to up-to-date information about her victim. According to the books, Asphyxia is not built all at once; instead, its source code is transferred in small pieces in infected emails. Once all the code has been transferred, it is built and replaces the original browser. Although this is closer to fiction than to science, I found the idea not very far from something that could be feasible.

The weak point I find in the program is the starting point: theoretically, there is an application that intercepts the user’s emails and randomly adds the lines of code that will afterwards be built into the Asphyxia program. Another weak point is how the email reader will strip out the source code lines and assemble them into a whole file, and also how the build process will be started. But there will probably be minds that will think about these issues…

Given that there will always be people willing to find ways of performing similar activities, there has been intensive research activity in the area of security for grid computing in recent years. As in other areas, the existence of standards is of prime importance. The Open Grid Forum (OGF) has a whole area devoted to security.

The Security Area² is concerned with technical and operational security issues in Grid environments, including authentication, authorization, privacy, confidentiality, auditing, firewalls, trust establishment, policy establishment, and dynamics, scalability and management aspects of all of the above.

The area is composed of four groups: Certificate Authority Operations WG (caops-wg), Firewall Issues RG (fi-rg), Levels of Authentication Assurance Research Group (loa-rg) and OGSA Authorization WG (ogsa-authz-wg).

Recently, a new group has been proposed: The Firewall Virtualization for Grid Applications – Working Group will leverage the application requirements from the FI-RG to standardize a set of service definitions for a virtualized control interface into firewalls and other midboxes allowing the grid applications to securely and dynamically request application/workflow-specific services from those devices, for the duration of the service.

Security in Cloud computing is also under research, but there are as yet no security standards or accepted best practices.

¹ Barriers to the advancement of technical women, Anita Borg Institute, http://anitaborg.org/files/womens-tech-careers-lit-reviewfinal_2007.pdf

² http://www.ogf.org/gf/group_info/areasgroups.php?area_id=7


‘Future Internet’ is already happening

August 10th, 2009

by Csilla Zsigri

The other day, as I was reading the press on the plane, a few very interesting articles in the ‘Companies&Markets’ section drew my attention and made me realize that the ‘future internet’ was already happening. In academic and research circles, there is a lot of thinking around the future of the internet (which is, by the way, becoming an over-used term), while market players are already making it happen…

Microsoft is pushing ahead with online versions of some of its core software, among them a cloud operating system designed to extend Windows to the internet. Moving in the opposite direction, Google is trying to extend its internet platform to PCs. Google’s decision to include a computer operating system in its Chrome web browser shows that browsers are breaking out of their traditional role. Browsers will no longer be application icons on a PC desktop. Microsoft has also been working on a prototype browser called Gazelle, which would be tied in with the OS and run web apps more securely. Apple’s Safari 4.0 has been designed so users can flick through previously viewed pages (like iTunes’ cover flow feature). Opera added a server to its browser, and through Opera Unite, they will allow users to host their own websites, share files and stream media. And browsers are just the starting point of the further evolution of the internet. Today’s work on the development of the browser is helping fuel the future internet.

People start their day on the Web, using different tools. The forthcoming Google Wave is a reinvention of e-mail that uses browser technologies to blend in instant messaging and file sharing in a new kind of interface. You can surf the Web over a Wi-Fi or 3G connection using a handset that is a phone (with multi-touch interface), an internet device and an iPod in one. The iPhone brings us all kinds of apps: if you believe Apple’s advertising (“From games to business to health and fitness, there’s an app for that.”), there is an application for almost everything on the iPhone.

NTT DoCoMo has unveiled plans to launch its next-generation 4G mobile phone network next year. With this, they will be one of the world’s first companies to deploy the Long Term Evolution (LTE) standard on a large scale. LTE has been developed to boost download speeds for mobile devices and open the way for Japanese handsets to become compatible with the rest of the world. This year, DoCoMo launched a service called BeeTV, which offers videos tailored to the small screens of the mobile phones. DoCoMo hopes that movies will prompt customers to sign up for unlimited data packages, which can be delivered economically via LTE.

Shouldn’t these ongoing and forthcoming activities and a zillion other activities of both big players and small startups be dubbed future internet? They evidently shape the internet. We are witnessing how networks, infrastructures, applications, services, content and devices converge. They serve us and surprise us. IT services are procured and delivered in new ways for organizations of all sizes, improving end users’ ability to deliver services in a faster, easier and more economic manner.

To make it all happen, these companies are investing huge amounts of money in R&D. In 2008, Apple spent 1.1 billion USD (751.8 million Euros), or 3.4% of its fiscal revenue, on R&D. Microsoft’s R&D spending hit a record high in 2008, at approximately 8.1 billion USD (5535.6 million Euros). Google’s investments were around 2.94 billion USD (2009.2 million Euros) last fiscal year. These investments serve strongly focused business strategies.

An interesting exercise would be to compare these figures with European ICT research and development funding. Roughly 1900 million Euros are dedicated under the umbrella of FP7 ICT to finance R&D projects in 2009 and 2010 together. Within this framework programme, the ‘future internet’ themes belong to Challenge 1, where projects have approximately 557 million Euros of funding available over those 2 years. Depending on the size of the organizations involved in the winning R&D projects, this funding may be topped up with additional private investment of 25-50%.
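
The comparison suggested above can be sketched as a small calculation using only the figures quoted in this post. Two assumptions are made purely for illustration: the two-year FP7 budgets are split evenly across 2009 and 2010, and the companies’ 2008 fiscal-year spending is treated as comparable to one such year.

```python
# Figures quoted in the post, in millions of euros.
COMPANY_RND_2008 = {"Apple": 751.8, "Microsoft": 5535.6, "Google": 2009.2}
FP7_ICT_2009_2010 = 1900.0         # whole FP7 ICT budget for the two years
FP7_CHALLENGE_1_2009_2010 = 557.0  # 'future internet' share of that budget
YEARS = 2                          # assumption: budget split evenly per year


def per_year(total: float, years: int = YEARS) -> float:
    """Average yearly funding, assuming an even split over the period."""
    return total / years


def ratio_to_fp7_ict(company: str) -> float:
    """Company R&D spending as a multiple of one year of FP7 ICT funding."""
    return COMPANY_RND_2008[company] / per_year(FP7_ICT_2009_2010)


if __name__ == "__main__":
    print(f"FP7 ICT per year: {per_year(FP7_ICT_2009_2010):.1f} M EUR")
    print(f"Challenge 1 per year: {per_year(FP7_CHALLENGE_1_2009_2010):.1f} M EUR")
    for name in COMPANY_RND_2008:
        print(f"{name}: {ratio_to_fp7_ict(name):.2f} x one year of FP7 ICT")
```

The private co-investment of 25-50% mentioned above is left out, since the post does not say how it is distributed across projects.
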


Virtualized Infrastructures: What is the next step?

July 27th, 2009

Posted by Theodora A. Varvarigou

Various attempts are being made nowadays to shape the vision of the Future Internet, the Internet of Things and, in general, future flexible and scalable infrastructures that will support and enable the provision of services using different kinds of resources, such as devices and digital equipment (e.g. sensors). Virtualization of resources (including computational nodes, supercomputers, workstation clusters, network elements, data stores, internet networks, etc.) has been identified as a way to provide services with the Quality of Service attributes requested by end-users, while at the same time maximizing resource utilization on the service providers’ side. But what has been achieved so far in this context? Current environments not only aggregate but also virtualize a large number of independent and geographically distributed computational and information resources. Furthermore, the advent of Service Oriented Infrastructures that take advantage of virtualization technologies has made the provision of services feasible while addressing a set of challenges such as live migration, fault tolerance and quality of service. Nevertheless, dynamic virtualized infrastructures also include a number of non-virtualized resources, mainly digital devices. These resources, devices and digital equipment, have not yet been virtualized, even though they are main elements of such environments.

Given that emerging applications are tightly coupled with these digital devices, service offerings are limited (a simple example is guaranteed network links through virtual environments, which cannot be provided to digital devices that are not part of them). Future applications, such as social networking environments, will not only use but require device-enriched environments. This results in an increasing demand for services and, as a domino effect, in greater complexity in their composition and management, since the number of components prone to failure will grow as applications grow in terms of devices, hardware components, software components and geographical scale.

Is virtualization of devices a trivial task? The answer is certainly “No”! In order to take the next step in virtualization, a number of aspects have to be addressed, ranging from data management (since high-quality input will be transmitted from sensors of all kinds) to network management (due to the exponential traffic growth), privacy and data integrity issues, as well as the economic and societal impact of new networked resources (sensors, actuators, communication devices, etc.). Applications in the near future will connect any kind of device and, given that computing has become a commodity, research efforts need to focus on how digital equipment and devices with specific characteristics can be virtualized. Sensors, displays, actuators and, generally, any type of electronic, interactive mechanism will soon be regarded as an integral part of dynamic virtualized infrastructures.
