Between Cluster Computing And Grid Computing Information Technology Essay

Published: November 30, 2015

The most commonly used analogy when describing grid computing is that of the 'electrical power grid', whereby users simply plug their appliances into the power grid to receive electricity without being aware of where or how it was generated. Similarly, grid computing allows resources, storage, and processing power to be shared over a network of computers.

To be part of a grid, users simply register with one or more projects of their choice and install a piece of software on their computers. After that, whenever a user is connected to the grid and has idle resources to offer, jobs are sent to the user's machine for processing. Once processing is complete, the results are sent back to be analysed.

"The Grid is an emerging infrastructure that will fundamentally change the way we think about-and use-computing. The word Grid is used by analogy with the electric power grid, which provides pervasive access to electricity and, like the computer and a small number of other advances, has had a dramatic impact on human capabilities and society. Many believe that by allowing all components of our information technology infrastructure-computational capabilities, databases, sensors, and people-to be shared flexibly as true collaborative tools, the Grid will have a similar transforming effect, allowing new classes of application to emerge. "

(Foster, I. & Kesselman, C., The Grid 2 Blueprint for a New Computing infrastructure 2nd Ed.)

Difference between cluster computing and grid computing

Grid computing is similar to cluster computing in the sense that both use more than one computer to solve a big problem. However, their architecture and implementation are quite different. In cluster computing the connected computers are dedicated to working as a single unit, whereas in grid computing only the unutilised resources are used to perform a task. Another difference is that computers connected in a cluster typically have the same operating system and hardware, while computers connected in a grid need not. Also, in a cluster resources are managed by a centralised resource manager, while in a grid each computer is independent and has its own resource manager.

Types of Grids

Computational

A computational grid, as its name suggests, uses its resources specifically for compute power. In this type of grid most of the machines are high-performance servers.

Scavenging

A scavenging grid is most commonly used with large numbers of desktop machines. Machines are scavenged for available CPU cycles and other resources. Owners of the desktop machines participate in the grid as and when they have idle resources.

Data grid

A data grid is used for housing and providing access to data across multiple organisations. Users are not concerned with where this data is located as long as they have access to the data. For example, you may have two universities doing life science research, each with unique data. A data grid would allow them to share their data, manage the data, and manage security issues such as who has access to what data.

The Grid Architecture

There are two main types of grid architecture:

The intra grid architecture

The intra grid architecture is one in which the computers are interconnected over a Local Area Network, usually within a single department or organisation. The number of computers in an intra grid can be limited according to the requirements of the organisation.

The inter grid architecture

The inter grid architecture is an extension of the intra grid architecture in which the computers are connected over a Wide Area Network. Users anywhere in the world can connect to an inter grid project via the Internet. In large inter grid projects, participants can keep joining at any point during the project.

Figure 1.0: Inter grid and intra grid architecture.

(Source: Bart, J. et al., Enabling Application for Grid Computing with Globus p.39)

Benefits of using Grid Computing

Exploiting under utilised resources

One of the main aims of a grid computing infrastructure is to run existing applications on different machines. The machine on which an application normally runs might be unusually busy because of a peak in activity; its jobs can then be run on idle machines elsewhere on the grid.

Parallel CPU capacity

Another attractive feature of grid computing is parallel processing. Instead of running a huge, processing-intensive job on a single machine or a few machines, it is preferable to run smaller units of the job on several machines, so that their combined CPU power speeds up the processing.
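The idea of splitting one large job into smaller units can be illustrated on a single machine, with threads standing in for grid nodes. The following is only a minimal sketch: the class name ParallelSum, the method sumTo and the choice of summing a range of numbers as the "job" are illustrative assumptions, not part of any grid toolkit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: threads play the role of grid nodes.
final class ParallelSum {

    /** Sums the integers 1..n by giving each worker one contiguous unit of the range. */
    static long sumTo(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long unit = (n + workers - 1) / workers; // ceiling division: size of one unit
            for (long start = 1; start <= n; start += unit) {
                final long from = start;
                final long to = Math.min(n, start + unit - 1);
                Callable<Long> task = () -> {
                    long s = 0;
                    for (long i = from; i <= to; i++) {
                        s += i;
                    }
                    return s;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for each unit and combine the partial results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a unit", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a unit failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ParallelSum.sumTo(100, 4) splits the range into four units of 25 numbers each and adds the four partial sums. In a real grid the units would travel over the network, so the speed-up also depends on transfer time.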

Virtual platform for collaboration

Grid computing provides an environment for collaboration among a wider audience. Resources can easily be shared in a grid; therefore it enables several individuals and organisations to work together on projects.

Easy Scalability

New machines can easily be added to the grid independent of the operating system, hardware or software being used on the machines.

Barriers for Grid Computing

Though there are several advantages of using the grid for solving problems, there are certain factors that may slow its evolution. These may be:

Internet connections

For participants to be able to contribute to an inter grid project, they need to be connected via the Internet. It might be difficult to make use of maximum available resources, as participants will be available on the grid at different points in time.

Security Issues

Access rights will have to be closely monitored, especially in intra grids that are used within an organisation, to protect trade strategies. Security during communication is another issue to be taken into consideration.

The Globus Toolkit

The Globus Toolkit is an open source software toolkit used for building grids. It is being developed by the Globus Alliance and many other companies around the world. Several projects and companies have adopted the solution offered by Globus for different purposes. At the time of writing, the latest release of the Globus Toolkit is 5.0.2. However, the toolkit allows a grid to be set up only in a Linux environment.

Existing Grid Applications

The SETI@home project is a scientific experiment that uses Internet-connected desktop PCs in the Search for Extraterrestrial Intelligence. It uses the idle processing power of connected machines to analyse data collected by radio telescopes.

Figure 1.1

(Source: Vladimir, S., Grid Computing for Developers p.7)

Network for Earthquake Engineering Simulations (NEES)

The NEES project allows earthquake engineering researchers across the United States to share research equipment, data and leading-edge computing resources through grid computing. The purpose of this project is to pursue advanced study of earthquakes and their effects through simulations.

Butterfly.net

Butterfly.net is in the gaming industry and is using grid technology to distribute high-performance 3D games online. With multiple concurrent games running on one grid, publishers can dynamically allocate resources to more popular games, launch new ones more quickly, and offer flexible and innovative subscription plans to monitor revenue growth.

MediGRID

MediGRID is a grid infrastructure for medical and bioinformatics research. It provides access to shared resources and data, and uses the processing power of connected machines to reduce the time needed to process data, such as images that researchers from different organisations work on.

Some Statistics

The following statistics have been obtained from http://www.worldcommunitygrid.org. They give the number of users who have participated in each project, the total processing time used, the number of valid results returned and the points generated for each user. 'Points generated' is a form of acknowledgement: users are given points according to the amount of processing time they contribute and the valid results they submit. Allocating points helps to rank users; it identifies those who have contributed more than others and rewards users for their contribution.

Number of active users around the world.

Figure 1.2

Global statistics for all active projects for the period 28 August to 25 September.

Figure 1.3

Statistics for a particular project: Help defeat cancer

Figure 1.4

Steps to contribute to grid computing projects

The first step in donating one's idle processing power to a grid computing project is to download and install a small program called an 'agent'. The agent is a task manager and scheduler that requests jobs for processing when the CPU is idle and submits the results back. Below are the steps to install the agent used by World Community Grid. This agent is called BOINC (Berkeley Open Infrastructure for Network Computing), which was developed at the University of California, Berkeley, USA.

1. Go to the location where you downloaded the BOINC Windows Installer (it will be named something like: wcg_boinc_#.##.##_windows_intelx86.exe )

2. Double-click to run the installer, and follow the prompts that appear.

Click "Run"

Note: Please notice the Publisher is University of California, Berkeley

Click "Next"

Select "I accept the terms in the license agreement" and click "Next"

Click "Next"

If you would like to install the agent as a service, click the "Advanced" button. It is then recommended to check all 3 boxes.

Click "Install"

Click "Finish"

3. When the installer is finished, the BOINC Manager will run automatically, and should automatically log you in to World Community Grid.

4. If the User Information dialog (below) appears, just enter your Member Name and Password, and click "Next"

5. You should now be attached to the World Community Grid project.

Receiving new tasks for processing

Agent requesting new tasks

Example of a task being processed

Active project running on the CPU.

Tasks from the following projects are being processed.

faah (FightAids@Home Project)

hccp1 (Help Conquer Cancer Project)

Analysis

Gathering Information

In order to know whether this project is feasible and will be of interest to people using compression software, a small survey will be carried out. A sample of 50 people will be asked to fill in a questionnaire to gather information about video compression and their preferences. A sample questionnaire is given below.

University of Mauritius

The following survey is being carried out to collect information for a project based on video compression using grid computing. Video compression refers to reducing the size of digital video data to save storage space. Grid computing refers to using the idle resources of computers, such as processing power, to speed up the processing of tasks. It would be very helpful if you could spare a few minutes to answer the following questions.

Have you ever used any compression software?

Yes  No

If yes, answer the following questions; otherwise skip to question 4.

How frequently do you use the compression software?

 Very often  Once a week  Rarely

How much time does it take on average to compress a video?

 Less than 30 minutes  Less than 1 hour  More than 1 hour

Do you have any preference for a particular video format?

 Yes  No

If yes, please specify.

 3gp  mpeg  wmv  avi  dvix  mp4

Others, please specify: ……………………………………………………………….

Have you heard about grid computing before?

 Yes  No

If a grid computing platform were to be implemented at the University of Mauritius, would you be interested in using it?

 Yes  No

What is your opinion on such a platform?

………………………………………………………………………………………………………………………………………………………………………………………………………………

Thank you for your time!

Analysis of questionnaire

After the questionnaires were filled in and collected, the data was compiled and presented in the form of charts to give a clear overview of the respondents' answers. Pie charts were chosen to show the results of the survey because they summarise information well, allow easy comparison of results as percentages, and suit the questions in the questionnaire, most of which had fixed answers.

Question 1: Have you ever used any compression software?

Response:

From the above pie chart it can be seen that the majority of people do use compression software; only 24% do not. It can therefore be concluded that video compression is in high demand nowadays.

Question 2: How frequently do you use the compression software?

Response:

The largest group, 45% of respondents, uses compression software very often; 39% use it at least once a week and only 16% use it rarely. Video compression is therefore not only in high demand but also used regularly.

Question 3: How much time does it take on average to compress a video?

Response:

Video compression takes considerable time and processing resources. As indicated above, large videos can take more than one hour to compress, and even average-sized videos can take up to thirty minutes. 63% of respondents said that regular video compression takes them around thirty minutes, 29% said it can take up to one hour, and only 8% said that compressing a large video can sometimes take more than one hour.

Question 4: Do you have any preference for a particular video format?

Response:

As shown above, 72% of respondents claim to have a preference for certain video formats. The remaining 28% have no particular preference; they compress videos to any format, as long as the resulting video is smaller in size.

Question 5: If yes, please specify.

Response:

The 72% of respondents who stated a preference for a particular video format were also asked to specify the formats they like most. The above chart shows the percentage of respondents preferring each format.

Question 6: Have you heard of grid computing before?

Response:

Since grid computing emerged fairly recently, less than ten years ago, and because its use is mostly limited to private organisations and large projects, many people are not aware of this technology. This project provides an opportunity for people to become aware of the technology and to experiment with it.

Question 7(i): If grid computing platform were to be implemented at the University of Mauritius, would you be interested in using it?

Response:

80% of respondents showed an interest in using a grid computing platform for video compression. Only 20% were reluctant to use such a platform. The project can therefore go ahead, since the majority of respondents are interested in this new technique for speeding up video compression.

Question 7(ii): What is your opinion on such a platform?

Response:

76% of the people who answered the questionnaire showed a positive interest in the implementation of a grid computing platform for speeding up video compression. 8% were not interested at all and 16% had no particular opinion. Most of the 16% who had no particular opinion said so because the concept of grid computing was not clear to them.

Alternative ways of solving the problem

Existing approaches to setting up a grid for the purpose of video compression:

Modifying existing open source software

Using Globus Toolkit for configuring a grid network

Starting the development from scratch

Proposed solution

Grid computing is quite a new concept and is mostly used by large organisations, mainly in research fields where a great deal of data computation and storage is required. For this reason, no existing open source grid software was found that could be modified to suit our need.

The Globus Toolkit allows a grid to be set up only in a Linux environment. Furthermore, whenever a new node needs to join the grid, the user has to configure the node using complicated command-line instructions. Using the Globus Toolkit is therefore not a good solution for implementing the grid.

Since our only requirement is a grid that allows nodes to share the compression load, starting the project from scratch is a far better solution.

Evaluation of programming languages

C++

Performance-wise, C++ generally outperforms Java and .Net, and it also provides socket programming facilities. However, C++ programs must be compiled separately for each target platform, and socket programming in C++ can be somewhat complicated.

Java

Creating network-based applications is much easier in Java, since it is a network-centric platform, and socket programming is simpler than in C++ or .Net. Java applications are platform independent and portable, so they can run on any machine that has a Java virtual machine. The main drawback of Java is that the source code is first compiled to bytecode, which the virtual machine must then interpret or compile at run time, so execution can be somewhat slower than native code.

.Net

.Net provides a socket programming framework through C#. It also provides runtime diagnostic tools such as event logging, performance counters and tracing, which make debugging much easier. However, like C++, .Net applications are platform dependent; they are primarily targeted at Windows.

Choice of tool

Since Java is a network-centric platform as well as a cross-platform language, it was chosen as the programming language for the development of the grid. Another reason for choosing Java is that it makes threads easy to implement, which should be very useful for handling the different nodes in the grid.
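As an illustration of how threads might be used to handle nodes, the sketch below starts one handler thread per node and waits for all of them to finish. The class NodeThreads, the method serveAll and the node names are hypothetical, and the work done by each handler is only a placeholder for real communication with a node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical example: one handler thread per grid node.
final class NodeThreads {

    /** Starts one thread per node and returns the names of the nodes that were handled. */
    static List<String> serveAll(List<String> nodes) {
        // Thread-safe queue, because several handler threads write to it at once.
        ConcurrentLinkedQueue<String> handled = new ConcurrentLinkedQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (String node : nodes) {
            // Placeholder work: a real handler would exchange data with the node here.
            Thread handler = new Thread(() -> handled.add(node), "handler-" + node);
            threads.add(handler);
            handler.start();
        }
        for (Thread handler : threads) {
            try {
                handler.join(); // wait until this node's handler has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for handlers", e);
            }
        }
        return new ArrayList<>(handled);
    }
}
```

The order of the returned names is not guaranteed, since the handlers run concurrently. For a large number of nodes a thread pool would probably be a better fit than one thread per node.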

Design

Software Design Approach

Performance

The system's performance will mainly be determined by the number of participants available for compressing the file and by the size of the file itself. The larger the file and the fewer the available participants, the more time will be required to compress the file.

Interactivity

To make the system user friendly, the user will interact with it through an interface. That is, the user will be provided with a simple window in which he/she can choose which file to send to the server for compression.

Flexibility

The system will be flexible in terms of the availability of participants for sharing the compression load. Whenever the server requests processing power from the participants, every participant with a CPU usage of less than 70% will be able to share the compression load.
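The eligibility rule can be expressed as a simple filter over the CPU usage figures reported by the participants. This is a minimal sketch under stated assumptions: the class ParticipantFilter, the method eligible and the use of a map from node name to CPU usage percentage are illustrative choices; only the 70% threshold comes from the design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical example of the "CPU usage below 70%" selection rule.
final class ParticipantFilter {

    /** Threshold taken from the design: only nodes strictly below this value take part. */
    static final double MAX_CPU_USAGE_PERCENT = 70.0;

    /** Returns the names of the nodes whose reported CPU usage is below the threshold. */
    static List<String> eligible(Map<String, Double> cpuUsageByNode) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> entry : cpuUsageByNode.entrySet()) {
            if (entry.getValue() < MAX_CPU_USAGE_PERCENT) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A node reporting exactly 70% is excluded here, because the rule is read as "less than 70%". The result keeps the iteration order of the map that is passed in.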

Portability

Since the system will be developed in Java, its source code will be compiled to bytecode that the Java virtual machine executes, allowing the system to run on any platform for which a virtual machine is available.

Quality of Service

The quality of service of the system will depend mainly on the compression function used to compress the video sent by the user.

Error, Exception Handling and Fault Tolerance

Errors such as an incorrect file format being selected for compression will be handled by notifying the user with clear error messages. Exception handling will be used to make sure that the connection between the server and the client is properly established before proceeding further. If the server is not available, the user will be notified immediately to avoid waiting indefinitely.
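One possible way to avoid waiting indefinitely for an unavailable server is to open the connection with a timeout and treat any I/O failure as "server not available". The sketch below assumes a plain TCP connection; the class ServerCheck, the method isReachable and its parameters are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical example: check that the server accepts connections before sending a file.
final class ServerCheck {

    /**
     * Tries to open a TCP connection to host:port, giving up after timeoutMillis.
     * Returns true if the connection was established, false on timeout or refusal.
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, unknown host and SocketTimeoutException.
            return false;
        }
    }
}
```

A client could call this check before sending the video and show an error message when it returns false. The socket used for the check is closed immediately, so the actual transfer would open its own connection.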

Security

As the system will be used only for sending and receiving video files over a LAN, and no confidential data will travel over the network, there is little need for security measures such as encryption or authentication.

Structure of the system and its components

The proposed solution will be implemented as a client-server architecture. The clients are the users who have a video to be compressed; the other clients available in the grid to share the load are regarded as the participants. Below are the modules that will be used by the server:

Broadcast a message to all clients in the grid

Split a video received from a client into chunks, depending on the number of available participants that can share their processing power

Send the chunks to be compressed to the participants

Wait for all the compressed chunks to return

Merge all the chunks in the correct order

Send the compressed file to the client
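The split and merge steps in the list above can be sketched as follows. This is an illustration only: the class ChunkPlanner and its methods are hypothetical, and the data is treated as a plain byte array cut into nearly equal pieces. Real video formats generally cannot be cut at arbitrary byte positions, so an actual implementation would need to split on boundaries that the chosen codec allows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example of the server's split and merge modules.
final class ChunkPlanner {

    /** Splits data into one chunk per participant; chunk sizes differ by at most one byte. */
    static List<byte[]> split(byte[] data, int participants) {
        if (participants < 1) {
            throw new IllegalArgumentException("need at least one participant");
        }
        List<byte[]> chunks = new ArrayList<>();
        int base = data.length / participants;  // minimum size of every chunk
        int extra = data.length % participants; // the first 'extra' chunks get one more byte
        int offset = 0;
        for (int i = 0; i < participants; i++) {
            int size = base + (i < extra ? 1 : 0);
            chunks.add(Arrays.copyOfRange(data, offset, offset + size));
            offset += size;
        }
        return chunks;
    }

    /** Concatenates the chunks in list order, which must match the original order. */
    static byte[] merge(List<byte[]> chunksInOrder) {
        int total = 0;
        for (byte[] chunk : chunksInOrder) {
            total += chunk.length;
        }
        byte[] merged = new byte[total];
        int offset = 0;
        for (byte[] chunk : chunksInOrder) {
            System.arraycopy(chunk, 0, merged, offset, chunk.length);
            offset += chunk.length;
        }
        return merged;
    }
}
```

Because participants may return their chunks in any order, the server would need to remember the index of each chunk it sent so that the list passed to merge is in the original order.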

The client program has fewer tasks than the server. However, since the clients share their processing power, some client components complement the tasks of the server. The following modules will be required by the client program:

Send whole video to be compressed to the server

Reply with CPU usage level when the server sends a broadcast message

Compress the part of the video sent by the server and send it back
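The participant's compression step could look roughly like the sketch below. It is not a video codec: as a stand-in, it uses the general-purpose DEFLATE compression from the standard java.util.zip package, and the class ChunkCompressor with its two methods is a hypothetical name. The decompress method is included only so that the round trip can be checked.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example: DEFLATE stands in for the real video compression function.
final class ChunkCompressor {

    /** Compresses one chunk received from the server. */
    static byte[] compress(byte[] chunk) {
        Deflater deflater = new Deflater();
        deflater.setInput(chunk);
        deflater.finish(); // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    /** Restores a chunk produced by compress; used here to verify the round trip. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int read = inflater.inflate(buffer);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, read);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

How much a chunk shrinks depends entirely on its content; data that is already compressed, as most video files are, will gain little from DEFLATE, which is one reason a real system would use a video-specific encoder instead.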

Architectural Design

The following sequence diagram gives an overview of the functioning of the whole system in terms of interactions between the client and server modules.

Bibliography

Foster, I. & Kesselman, C., 2004. The Grid 2 Blueprint for a New Computing infrastructure 2nd ed. Elsevier Inc. USA.

Bart, J. et al., 2003. Enabling Application for Grid Computing with Globus. IBM Corporation. NY, USA.

Bart, J. et al., 2005. Introduction to Grid Computing. IBM Corporation. NY, USA.

Vladimir, S, 2006. Grid Computing for Developers. Charles River Media. Massachusetts.