Tuesday, March 27, 2012

Grid Computing

A scientist studying proteins logs into a computer and uses an entire network of computers to analyze data. A businessman accesses his company's network through a PDA in order to forecast the future of a particular stock. An Army official accesses and coordinates computer resources on three different military networks to formulate a battle strategy. All of these scenarios have one thing in common: They rely on a concept called grid computing.
At its most basic level, grid computing is a computer network in which each computer's resources are shared with every other computer in the system. Processing power, memory and data storage are all community resources that authorized users can tap into and leverage for specific tasks. A grid computing system can be as simple as a collection of similar computers running on the same operating system or as complex as internetworked systems made up of every computer platform you can think of.
The grid computing concept isn't a new one. It's a special kind of distributed computing. In distributed computing, different computers within the same network share one or more resources. In the ideal grid computing system, every resource is shared, turning a computer network into a powerful supercomputer. With the right user interface, accessing a grid computing system would look no different than accessing a local machine's resources. Every authorized computer would have access to enormous processing power and storage capacity.
Though the concept isn't new, it's also not yet perfected. Computer scientists, programmers and engineers are still working on creating, establishing and implementing standards and protocols. Right now, many existing grid computing systems rely on proprietary software and tools. Once people agree upon a reliable set of standards and protocols, it will be easier and more efficient for organizations to adopt the grid computing model.

Grid Computing Overview

Grid computing systems work on the principle of pooled resources. Let's say you and a couple of friends decide to go on a camping trip. You own a large tent, so you've volunteered to share it with the others. One of your friends offers to bring food and another says he'll drive the whole group up in his SUV. Once on the trip, the three of you share your knowledge and skills to make the trip fun and comfortable. If you had made the trip on your own, you would have needed more time to assemble the resources, and you probably would have had to work a lot harder on the trip itself.
A grid computing system uses that same concept: share the load across multiple computers to complete tasks more efficiently and quickly. Before going too much further, let's take a quick look at a computer's resources:
  • Central processing unit (CPU): A CPU is a microprocessor that performs mathematical operations and directs data to different memory locations. Computers can have more than one CPU.
  • Memory: In general, a computer's memory is a kind of temporary electronic storage. Memory keeps relevant data close at hand for the microprocessor. Without memory, the microprocessor would have to search and retrieve data from a more permanent storage device such as a hard disk drive.
  • Storage: In grid computing terms, storage refers to permanent data storage devices like hard disk drives or databases.
Normally, a computer can only operate within the limitations of its own resources. There's an upper limit to how fast it can complete an operation or how much information it can store. Most computers are upgradeable, which means it's possible to add more power or capacity to a single computer, but that's still just an incremental increase in performance.
Grid computing systems link computer resources together in a way that lets someone use one computer to access and leverage the collected power of all the computers in the system. To the individual user, it's as if the user's computer has transformed into a supercomputer.
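To make that concrete, here's a minimal Python sketch of the same idea on a single machine: one large job is carved into chunks, and the chunks are computed in parallel. The standard library's ProcessPoolExecutor stands in for grid middleware, which would ship each chunk to a different computer rather than a different CPU core; the workload itself is just a placeholder.

    from concurrent.futures import ProcessPoolExecutor

    def partial_sum(bounds):
        """Add up one slice of a large range -- a stand-in for any divisible workload."""
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        # Carve one big job into four chunks and hand each chunk to a separate CPU.
        # On a grid, middleware would send each chunk to a different machine instead.
        chunks = [(i, i + 2_500_000) for i in range(0, 10_000_000, 2_500_000)]
        with ProcessPoolExecutor() as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)  # matches sum(i * i for i in range(10_000_000)), computed in parallel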

Grid Computing Lexicon

Reading about grid computing can get very confusing if you don't know the lingo. Here's a quick rundown on some of the terms you might encounter when discussing grid computing:
  • Cluster: A group of networked computers sharing the same set of resources.
  • Extensible Markup Language (XML): A markup language that describes data in a format both humans and computers can read. Control nodes (a node is any device connected to a network that can transmit, receive and reroute data) rely on XML-based languages like the Web Services Description Language (WSDL). The information in these languages tells the control node how to handle data and applications.
  • Hub: A point within a network where various devices connect to one another.
  • Integrated Development Environment (IDE): The tools and facilities computer programmers need to create applications for a platform. A related term, sandbox, refers to an isolated environment where an application can be tested safely.
  • Interoperability: The ability of software to operate across completely different environments. For example, a computer network might include both PCs and Macintosh computers. Without interoperable software, these computers wouldn't be able to work together because of their different operating systems and architectures.
  • Open standards: The practice of creating publicly available standards. Unlike proprietary standards, which can belong exclusively to a single entity, anyone can adopt and use an open standard. Applications based on the same open standards are easier to integrate than ones built on different proprietary standards.
  • Parallel processing: Using multiple CPUs to solve a single computational problem. This is closely related to shared computing, which leverages untapped resources on a network to achieve a task.
  • Platform: The foundation upon which developers can create applications. A platform can be an operating system, a computer's architecture, a computer language or even a Web site.
  • Server farm: A cluster of servers used to handle workloads too large or too complex for a single server.
  • Server virtualization: A technique in which a software application divides a single physical server into multiple exclusive server platforms (the virtual servers). Each virtual server can run its own operating system independently of the other virtual servers. The operating systems don't have to be the same system -- in other words, a single machine could have a virtual server acting as a Linux server and another one running a Windows platform. It works because most of the time, servers aren't running anywhere close to full capacity. Grid computing systems need lots of servers to handle various tasks and virtual servers help cut down on hardware costs.
  • Service: In grid computing, a service is any software system that allows computers to interact with one another over a network.
  • Simple Object Access Protocol (SOAP): A set of rules for exchanging XML-based messages across a network. The protocol was originally developed at Microsoft. A minimal SOAP request is sketched just after this list.
  • State: In the IT world, a state is any kind of persistent data. It's information that continues to exist in some form even after being used in an application. For example, when you select books to go into an Amazon.com shopping cart, the information is stateful -- Amazon keeps track of your selection as you browse other areas of the Web site. Stateful services make it possible to create applications that have multiple steps but rely on the same core data.
  • Transience: The ability to activate or deactivate a service across a network without affecting other operations.                           
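As a rough illustration of the SOAP and XML entries above, here is a small Python sketch that builds a SOAP 1.1 envelope and prepares to post it over HTTP. The endpoint URL, namespace and SubmitTask operation are invented for this example, not taken from any real grid service; a real service would publish its operations in a WSDL document.

    import urllib.request

    # A minimal SOAP 1.1 request with a made-up operation and endpoint.
    envelope = """<?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <SubmitTask xmlns="http://example.org/grid">
          <taskId>42</taskId>
          <payload>analyze-chunk-7</payload>
        </SubmitTask>
      </soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        "http://example.org/grid/control-node",   # hypothetical control node
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "http://example.org/grid/SubmitTask"})
    # urllib.request.urlopen(request)             # not called here: example.org is a placeholder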

Sharing Resources

Several companies and organizations are working together to create a standardized set of rules called protocols to make it easier to set up grid computing environments. It's possible to create a grid computing system right now and several already exist. But what's missing is an agreed-upon approach. That means that two different grid computing systems may not be compatible with one another, because each is working with a unique set of protocols and tools.
In general, a grid computing system requires:
  • At least one computer, usually a server, which handles all the administrative duties for the system. Many people refer to this kind of computer as a control node. Other application and Web servers (both physical and virtual) provide specific services to the system.
  • A network of computers running special grid computing network software. These computers act both as a point of interface for the user and as the resources the system will tap into for different applications. Grid computing systems can either include several computers of the same make running on the same operating system (called a homogeneous system) or a hodgepodge of different computers running on every operating system imaginable (a heterogeneous system). The network can be anything from a hardwired system where every computer connects to the system with physical wires to an open system where computers connect with each other over the Internet.
  • A collection of computer software called middleware. The purpose of middleware is to allow different computers to run a process or application across the entire network of machines. Middleware is the workhorse of the grid computing system. Without it, communication across the system would be impossible. Like software in general, there's no single format for middleware.
If middleware is the workhorse of the grid computing system, the control node is the dispatcher. The control node must prioritize and schedule tasks across the network. It's the control node's job to determine what resources each task will be able to access. The control node must also monitor the system to make sure that it doesn't become overloaded. It's also important that each user connected to the network doesn't experience a drop in his or her computer's performance. A grid computing system should tap into unused computer resources without impacting everything else.
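As a toy illustration of that dispatcher role, the Python sketch below keeps a priority queue of tasks and hands each one to the least-busy node, refusing new work once every node hits a load cap. The node names, priorities and cap are invented; a real control node's scheduling policy is far more elaborate.

    import heapq

    class ControlNode:
        """A toy dispatcher: queue tasks by priority, send each to the least-busy node."""

        def __init__(self, node_names, max_load=4):
            self.max_load = max_load                       # cap so no node is overwhelmed
            self.load = {name: 0 for name in node_names}   # running tasks per node
            self.queue = []                                 # (priority, task) heap

        def submit(self, task, priority):
            heapq.heappush(self.queue, (priority, task))

        def dispatch(self):
            """Assign queued tasks to whichever nodes still have spare capacity."""
            assignments = []
            while self.queue:
                node = min(self.load, key=self.load.get)
                if self.load[node] >= self.max_load:
                    break                                   # every node is busy; wait
                priority, task = heapq.heappop(self.queue)
                self.load[node] += 1
                assignments.append((task, node))
            return assignments

    grid = ControlNode(["node-a", "node-b"])
    grid.submit("fold protein segment 7", priority=1)
    grid.submit("index telescope data", priority=2)
    print(grid.dispatch())   # the higher-priority (lower number) task is handed out first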
The potential for grid computing applications is limitless, provided everyone agrees on standardized protocols and tools. That's because without a standard format, third-party developers -- independent programmers who want to create applications on the grid computing platform -- often lack the ability to create applications that work on different systems. While it's possible to make different versions of the same application for different systems, it's time-consuming, and many developers don't want to do the same work twice. A standardized set of protocols means that developers could concentrate on one format while creating applications.

Concerns about Grid Computing

Whenever you link two or more computers together, you have to prepare yourself for certain questions. How do you keep personal information private? How do you protect the system from malicious hackers? How do you control who can access the system and use its resources? How do you make sure the user doesn't tie up all the system's resources?
The short answer to these questions is middleware. There's nothing inherent in a grid computing system that can answer them on its own. The emerging protocols for grid computing systems are designed to make it easier for developers to create applications and to facilitate communication between computers.
The most prevalent technique computer engineers use to protect data is encryption. To encrypt data is to encode it so that only someone possessing the appropriate key can decode the data and access it. Ironically, a hacker could conceivably create a grid computing system for the purpose of cracking encrypted information. Because encryption techniques use complicated algorithms to encode data, it would take a normal computer several years to crack a code (which usually involves finding the two prime factors of an incredibly large number). With a powerful enough grid computing system, a hacker might find a way to reduce the time it takes to decipher encrypted data.
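As a toy illustration of that risk, the sketch below splits the search for the two prime factors of a small composite number into ranges and checks the ranges in parallel; a grid would scale the same divide-and-conquer idea across thousands of machines. The number here is tiny by design, and real cryptographic keys are astronomically larger.

    from concurrent.futures import ProcessPoolExecutor
    import math

    N = 1_000_003 * 1_000_033           # a toy "key": the product of two primes

    def search(bounds):
        """Look for a divisor of N within one slice of the search space."""
        lo, hi = bounds
        for d in range(lo | 1, hi, 2):  # odd candidates only
            if N % d == 0:
                return d
        return None

    if __name__ == "__main__":
        limit = math.isqrt(N) + 1
        step = 50_000
        slices = [(lo, min(lo + step, limit)) for lo in range(3, limit, step)]
        with ProcessPoolExecutor() as pool:
            hits = [d for d in pool.map(search, slices) if d]
        p = hits[0]
        print(p, N // p)                # recovers both prime factors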
It's hard to protect a system from hackers, particularly if the system relies on open standards. Every computer in a grid computing system has to have specific software to be able to connect and interact with the system as a whole -- computers don't know how to do it on their own. If the computer system's software is proprietary, it might be harder (but not impossible) for a hacker to access the system.
In most grid computing systems, only certain users are authorized to access the full capabilities of the network. Otherwise, the control node would be flooded with processing requests and nothing useful would get done. It's also important to limit access for security purposes. For that reason, most systems have authorization and authentication protocols. These protocols limit network access to a select number of users. Other users are still able to access their own machines, but they can't leverage the entire network.
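What such a check might look like is sketched below, assuming a simple shared-secret scheme in which the control node verifies an HMAC signature on each request. The user names and key store are invented for illustration; production grids typically rely on certificate-based authentication instead.

    import hashlib
    import hmac
    import secrets

    # Hypothetical key store: a secret issued to each authorized user at sign-up.
    AUTHORIZED_KEYS = {"alice": secrets.token_bytes(32)}

    def verify_request(user, message, signature):
        """Accept a work request only if it is signed with the user's issued key."""
        key = AUTHORIZED_KEYS.get(user)
        if key is None:
            return False                                     # unknown user: no grid access
        expected = hmac.new(key, message, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)

    message = b"request: 10 CPU-hours"
    good_signature = hmac.new(AUTHORIZED_KEYS["alice"], message, hashlib.sha256).hexdigest()
    print(verify_request("alice", message, good_signature))  # True  -- request gets scheduled
    print(verify_request("mallory", message, "forged"))      # False -- request is rejected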
The middleware and control node of a grid computing system are responsible for keeping the system running smoothly. Together, they control how much access each computer has to the network's resources and vice versa. While it's important not to let any one computer dominate the network, it's just as important not to let network applications take up all the resources of any one computer. If the system robs users of computing resources, it's not an efficient system.

Grid Computing Applications

There are several grid computing systems, though most of them only fit part of the definition of a true grid computing system. Academic and research organization projects account for many of the systems currently in operation. These systems take advantage of unused computer processing power. The most accurate term for such a network is a shared computing system.
The Search for Extraterrestrial Intelligence (SETI) project is one of the earliest grid computing systems to gain popular attention. The mission of the SETI project is to analyze data gathered by radio telescopes in search of evidence for intelligent alien communications. There's far too much information for a single computer to analyze effectively, so the SETI project created a program called SETI@home, which networks volunteers' computers together to form a virtual supercomputer.
A similar program is the Folding@home project administered by the Pande Group, a research group in Stanford University's chemistry department. The Pande Group is studying proteins. The research includes the way proteins take certain shapes, called folds, and how that relates to what proteins do. Scientists believe that protein "misfolding" could be the cause of diseases like Parkinson's or Alzheimer's. It's possible that by studying proteins, the Pande Group may discover new ways to treat or even cure these diseases.
There are dozens of similar active grid computing projects. Many of these projects aren't persistent, which means that once the respective project's goals are met, the system will dissolve. In some cases, a new, related project could take the place of the completed one.
While each of these projects has its own unique features, in general, the process of participation is the same. A user interested in participating downloads an application from the respective project's Web site. After installation, the application contacts the respective project's control node. The control node sends a chunk of data to the user's computer for analysis. The software analyzes the data, powered by untapped CPU resources. The project's software has a very low resource priority -- if the user needs to activate a program that requires a lot of processing power, the project software shuts down temporarily. Once CPU usage returns to normal, the software begins analyzing data again.
Eventually, the user's computer will complete the requested data analysis. At that time, the project software sends the data back to the control node, which relays it to the proper database. Then the control node sends a new chunk of data to the user's computer, and the cycle repeats itself. If the project attracts enough users, it can complete ambitious goals in a relatively short time span.
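Roughly sketched in Python, a volunteer client's main loop might look like the code below. The project URL, the work and results endpoints, and the analysis step are all placeholders; each real project ships its own client with its own science code and a proper check of how busy the machine is.

    import json
    import time
    import urllib.request

    PROJECT_URL = "http://example.org/project"   # hypothetical control node address

    def fetch_work_unit():
        """Ask the control node for the next chunk of data to analyze."""
        with urllib.request.urlopen(PROJECT_URL + "/work") as response:
            return json.load(response)

    def analyze(work_unit):
        """Stand-in for the real science code (signal search, protein folding, ...)."""
        return {"id": work_unit["id"], "result": sum(work_unit["samples"])}

    def machine_is_idle():
        """A real client checks CPU load and user activity; here we just pretend."""
        return True

    def run_client():
        while True:
            if not machine_is_idle():
                time.sleep(60)                   # back off while the owner needs the machine
                continue
            work_unit = fetch_work_unit()
            report = json.dumps(analyze(work_unit)).encode("utf-8")
            request = urllib.request.Request(PROJECT_URL + "/results", data=report,
                                             headers={"Content-Type": "application/json"})
            urllib.request.urlopen(request)      # return the finished analysis, then repeat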
As grid computing systems grow more sophisticated, we'll see more organizations and corporations create versatile networks. There may even come a day when corporations routinely internetwork their grids with those of other companies. In that environment, computational problems that seem impossible now may be reduced to a project that lasts a few hours. We'll have to wait and see.
