Using Google has become an everyday activity for people across the world. Although there are many other search engines, none of them can compare to the ever-expanding Google. However, Google is not just a search engine. It provides the world with a wide range of other free products, including Google Earth, Chrome, Picasa, Gmail, Blogger, YouTube, Adsense, Analytics, and Android, to name just a few. While over 300 million people use these Google products every day, few know how it all works. Google relies on multiple data centers, custom software, and web tools to run its multibillion-dollar business. Without these technologies and the engineers behind them, Google and the rest of the world would not be where they are today.
Data Centers
A data center is a large building that can house thousands of servers to store data. The data is transmitted at high speeds over fiber optic cables using state-of-the-art technology. Not only do these data centers store and process data, they also provide redundancy to ensure that no data is lost. Google stores all of its information in its enormous, secretive data centers throughout the world.
Google's Data Center Locations
Rich Miller (2008) stated that Google stores its information in an estimated 35 different data centers, in locations that include California, Germany, the Netherlands, Italy, Hong Kong, and Russia. It is assumed that many of the international locations are used to store country-specific versions of Google's search engine. In recent years, Google has been focusing on expanding overseas as much as possible. While expanding the company is important, planning and making smart investments for the future are a necessity. To do this, Google applies a number of criteria when choosing the location of a future data center. Rich Miller (2008) states that some of these criteria include:
The availability of large volumes of cheap electricity.
The ability to support carbon neutrality.
The presence of large amounts of water for cooling.
Distance to other Google data centers.
Tax incentives from the state.
Large-Scale Transmission
Some of these data centers are over 300,000 square feet and have been estimated to cost about $600 million each to build. With so many servers, these facilities also need a large amount of energy. Rich Miller (2008) explains that each of Google's data centers is supplied with between 50 and 103 megawatts of electric power. Erick Schonfeld (2008) stated that in 2007, these data centers were processing about 20,000 terabytes a day, which gives Google an advantage over competitors that lack such large-scale capabilities.
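To put these figures in perspective, the short Python sketch below converts the daily data volume into an average per-second rate and the quoted power range into energy per day. It is only a back-of-the-envelope calculation: it assumes decimal units (1 TB = 10^12 bytes) and a data center drawing its full quoted power around the clock, and the function names are illustrative.

```python
# Back-of-the-envelope figures from the numbers quoted above.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes)
# and, for energy, a constant draw at the quoted power level all day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
HOURS_PER_DAY = 24


def average_throughput_gb_per_s(terabytes_per_day: float) -> float:
    """Average processing rate in GB/s for a daily volume given in TB."""
    bytes_per_day = terabytes_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**9


def daily_energy_mwh(megawatts: float) -> float:
    """Energy used in one day (MWh) at a constant power draw (MW)."""
    return megawatts * HOURS_PER_DAY


if __name__ == "__main__":
    print(f"{average_throughput_gb_per_s(20_000):.1f} GB/s on average")
    for mw in (50, 103):
        print(f"{mw} MW -> {daily_energy_mwh(mw):,.0f} MWh per day")
```

Under those assumptions, 20,000 terabytes a day works out to an average of roughly 231 gigabytes every second, and a supply of 50 to 103 megawatts corresponds to about 1,200 to 2,472 megawatt-hours per day.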
Google's Custom Software
In order to provide the quality services that people expect, Google needed to develop its own software designed specifically for its needs. Through its own research group, Google Labs, the company has been able to build the software it needed. Two of the main systems used behind the scenes at Google are the Google File System (GFS) and BigTable.
The Google File System
"GFS is a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients" (Google Labs 2003). The file system runs on Linux and is built to manage huge files, which gives Google the scalability it needs. GFS was designed as an autonomic computing system: "A concept in which computers are able to diagnose problems and solve them in real time without the need for human intervention" (Jonathan Strickland). It also works across a huge network. Jonathan Strickland explains that files used by GFS are typically multiple gigabytes in size, which would normally consume a large share of a network's bandwidth. To address this problem, GFS breaks files into chunks of 64 megabytes.
Google takes advantage of this file system by organizing its servers into clusters. Each cluster can contain thousands of machines, which act as clients, master servers, or chunkservers. Clients make file requests, master servers keep track of logs and of the chunkservers' file inventories, and chunkservers store the 64-megabyte chunks, with each chunk replicated on three different chunkservers for redundancy.
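The chunking and replication described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Google's implementation: it uses a single toy master, treats 64 MB as 64 × 1024 × 1024 bytes, and places each chunk on the three least-loaded chunkservers, a hypothetical policy chosen for clarity. The names `Master`, `chunk_count`, `split_read`, and `lookup` are invented for this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB (assumed to mean binary megabytes)
REPLICATION_FACTOR = 3         # each chunk is stored on three chunkservers


def chunk_count(file_size: int) -> int:
    """Number of 64 MB chunks needed to hold a file (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def split_read(offset: int, length: int) -> List[Tuple[int, int, int]]:
    """Break a byte-range read into (chunk_index, offset_in_chunk, num_bytes)."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        num_bytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, num_bytes))
        offset += num_bytes
    return pieces


@dataclass
class Master:
    """Toy master that records which chunkservers hold each chunk."""
    load: Dict[str, int]  # chunkserver name -> number of chunks it stores
    locations: Dict[Tuple[str, int], List[str]] = field(default_factory=dict)

    def add_file(self, name: str, file_size: int) -> None:
        """Assign every chunk of a new file to three chunkservers."""
        for index in range(chunk_count(file_size)):
            self.locations[(name, index)] = self._pick_servers()

    def _pick_servers(self) -> List[str]:
        # Hypothetical policy: the three least-loaded chunkservers,
        # with ties broken by name so the result is deterministic.
        if len(self.load) < REPLICATION_FACTOR:
            raise ValueError("need at least three chunkservers")
        chosen = heapq.nsmallest(
            REPLICATION_FACTOR, self.load, key=lambda s: (self.load[s], s)
        )
        for server in chosen:
            self.load[server] += 1
        return chosen

    def lookup(self, name: str, offset: int) -> List[str]:
        """Chunkservers a client could read the byte at `offset` from."""
        return self.locations[(name, offset // CHUNK_SIZE)]
```

For instance, a 5 GB file would occupy 80 chunks, and a read that starts 10 bytes before a chunk boundary is split into two pieces served from two different chunks. Because every chunk has three replicas, a client can still read it if one chunkserver fails.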
BigTable
Because of Google's vast data network, the company needs to process data and store it in easily accessed databases. However, standard third-party databases were not suited to Google's needs. To meet those needs, Google designed an in-house database named BigTable in 2004. Google (2004) states that BigTable was built on top of the Google File System and other internal services, which saves the company from buying a license for every machine. Andrew (2005) states that each cell in the database is stamped with a time version so that changes can be compared later on. To manage huge tables, each table is split into pieces called tablets of about 100-200 MB each, and each machine stores about 100 of these tablets using GFS. This enables fast access through other machines if one is busy, and fast rebuilding if a machine goes down.
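The two BigTable ideas mentioned above, time-versioned cells and splitting a large table into smaller pieces (which BigTable calls tablets), can be illustrated with the toy Python sketch below. It keeps everything in memory and is not how BigTable stores data on GFS; the class and function names, the row key `com.example.www`, and the byte sizes are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class VersionedTable:
    """Toy in-memory table in which every cell keeps timestamped versions."""

    def __init__(self) -> None:
        # (row, column) -> list of (timestamp, value), kept sorted by timestamp
        self._cells: DefaultDict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        """Store a new version of a cell without overwriting older ones."""
        bisect.insort(self._cells[(row, column)], (timestamp, value))

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
        """Return the newest value, or the newest one at or before `as_of`."""
        versions = self._cells.get((row, column), [])
        if as_of is not None:
            timestamps = [ts for ts, _ in versions]
            versions = versions[:bisect.bisect_right(timestamps, as_of)]
        return versions[-1][1] if versions else None

    def history(self, row: str, column: str) -> List[Tuple[int, str]]:
        """All stored versions of a cell, oldest first."""
        return list(self._cells.get((row, column), []))


def split_into_tablets(row_sizes: List[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Group sorted row keys into contiguous tablets.

    A new tablet is started whenever adding the next row would push the
    current tablet past `max_bytes`.
    """
    tablets: List[List[str]] = []
    current: List[str] = []
    size = 0
    for row, nbytes in sorted(row_sizes):
        if current and size + nbytes > max_bytes:
            tablets.append(current)
            current, size = [], 0
        current.append(row)
        size += nbytes
    if current:
        tablets.append(current)
    return tablets
```

Because old versions are never overwritten, `get` can answer both "what is the value now?" and "what was it at time 2?". In a setting closer to the one described above, `max_bytes` would be set to roughly 200 MB to match the 100-200 MB pieces.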
Google's Web Tools
Google has designed some of the most effective web tools for deciding which web pages are relevant to a search, advertising in a subtle but effective manner, and ranking web pages based on their content. The tools developed for these jobs include PageRank, Adsense, and Adwords. Many of Google's competitors use similar tools, so Google must constantly update its own to stay ahead of the competition.
PageRank
"PageRank is one of the most important algorithms ever developed for the web" (Vitaly Friedman, 2007). Google's founders developed PageRank in 1998 while at Stanford University, which is why the patent is assigned to Stanford. PageRank is one of many methods Google uses to order the search results people see. Vitaly Friedman (2007) states that PageRank works by examining the links that point to a web page and judging their relevance and quality according to the site each link comes from. The higher the resulting number, the better the site's rank. To compute this number, Google runs an extensive PageRank (PR) algorithm that changes over time. The basic part of the PR algorithm is the following:
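PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))
Here t1 through tn are the pages that link to page A, PR(t) is the PageRank of page t, C(t) is the number of links going out of page t, and d is a damping factor between 0 and 1, usually set to 0.85.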
Because learning how Google's PR system works can be a daunting task, many companies now specialize in Search Engine Optimization (SEO), helping customers build web pages that earn a high page rank. Even so, web designers still need to consider how their pages are coded, which keywords they use, and what content they provide in order to obtain a high spot in search results.
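The PR formula can be computed by starting every page at the same value and applying the formula repeatedly until the numbers stop changing. The Python sketch below does this for a small, hypothetical three-page web. It follows only the basic formula (the algorithm Google actually runs changes over time and is not public), uses d = 0.85 as the default damping factor, and ignores self-links as an assumption.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(t) / C(t)) until values settle.

    `links` maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    # C(t): distinct outgoing links of each page (self-links ignored).
    out_links = {p: set(links.get(p, [])) - {p} for p in pages}
    # For each page A, the pages t1..tn that link to it.
    inbound: Dict[str, List[str]] = {p: [] for p in pages}
    for source, targets in out_links.items():
        for target in targets:
            inbound[target].append(source)

    pr = {p: 1.0 for p in pages}  # start every page at 1
    for _ in range(max_iter):
        new_pr = {
            a: (1 - d) + d * sum(pr[t] / len(out_links[t]) for t in inbound[a])
            for a in pages
        }
        if max(abs(new_pr[p] - pr[p]) for p in pages) < tol:
            return new_pr
        pr = new_pr
    return pr


if __name__ == "__main__":
    ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
    print({page: round(rank, 2) for page, rank in sorted(ranks.items())})
```

For the example links A to B, A to C, B to C, and C to A, the values settle at roughly A ≈ 1.16, B ≈ 0.64, and C ≈ 1.19. Page C ranks highest because both other pages link to it, while B, which receives only half of A's vote, ranks lowest.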
Advertising
Millions of people throughout the world use Google's free services every day and probably wonder where Google gets its "$6.67 billion dollars a year in profit" (Alexei Oreskovic, 2010). Manoj Jasra (2009) states that the answer is advertising, which accounts for 97% of Google's revenue. Google provides multiple tools for managing advertisements, both for website owners who want to show other companies' ads and for advertisers who pay to display their own ads across the internet. Two of the main tools are Adwords and Adsense, both of which provide an efficient way of managing advertisements.
Google's Adwords Tool. Adwords is the main tool website owners use to display their ads on Google. Mark Sweeney (2009) states that this service allows companies to buy specific keywords that trigger an ad to appear when a user enters that keyword in a Google search. "Adwords works on an auction based system that assigns particular words to whichever company bids the most for them" (Mark Sweeney, 2009). Google is also able to monitor the popularity of each ad to determine its relevance to the web page it appears on.
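As a rough illustration of the auction described above, the Python sketch below awards each keyword in a search query to the advertiser with the highest bid. It follows only the simplified "whoever bids the most" description quoted from Mark Sweeney (2009); the real Adwords system weighs more than the bid amount. The function names, advertiser names, keywords, and bid values are made up.

```python
from typing import Dict, List, Optional, Tuple


def auction_winner(bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (advertiser, bid) for the highest bid, or None if nobody bid.

    Ties go to the alphabetically first advertiser so the result is stable.
    """
    if not bids:
        return None
    winner = min(bids, key=lambda name: (-bids[name], name))
    return winner, bids[winner]


def ads_for_query(query: str,
                  keyword_bids: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """For each distinct query word that has bids, return (keyword, advertiser, bid)."""
    results = []
    for word in dict.fromkeys(query.lower().split()):  # dedupe, keep order
        winner = auction_winner(keyword_bids.get(word, {}))
        if winner is not None:
            results.append((word, winner[0], winner[1]))
    return results


if __name__ == "__main__":
    keyword_bids = {  # hypothetical bids in dollars per click
        "shoes": {"ShoeCo": 1.50, "RunFast": 2.10},
        "cheap": {"Bargains": 0.40},
    }
    print(ads_for_query("Cheap running shoes", keyword_bids))
```

Here the query "Cheap running shoes" returns Bargains for "cheap" and RunFast for "shoes", while "running" shows no ad because nobody bid on it.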
Google's Adsense Tool. Google created Adsense so that website owners who want to display advertisements, called publishers, can make money. HubPages (2008) explains that Google provides the publisher with ads to display and keeps about 50% of the earnings. Google spiders (automatically crawls) each web page that displays ads to decide which ads to show based on content relevance. Unlike many other search engines and ad providers, which rely on simple keywords, Google determines a website's relevance by reading significant text in the content of each web page. This produces ads that match the website's actual content rather than popular keywords entered by the web designer.
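The difference between keyword-based and content-based matching can be sketched in Python: instead of trusting keywords entered by the web designer, the code below counts the most frequent meaningful words in the page text and ranks ads by how strongly their keywords overlap with those words. This is a hypothetical, heavily simplified stand-in for whatever Google actually uses; the stop-word list, the scoring rule, the function names, and the example ads are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Tiny illustrative stop-word list; a real system would need far more.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"}


def significant_terms(page_text: str, top_n: int = 20) -> Counter:
    """Count the most frequent non-stop-words (longer than 2 letters) in a page."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return Counter(dict(counts.most_common(top_n)))


def rank_ads(page_text: str, ads: Dict[str, Set[str]]) -> List[str]:
    """Return the ads whose keywords appear in the page, best match first."""
    terms = significant_terms(page_text)
    # Score = total occurrences of the ad's keywords among the page's terms.
    scores = {name: sum(terms[kw] for kw in keywords) for name, keywords in ads.items()}
    relevant = [name for name, score in scores.items() if score > 0]
    return sorted(relevant, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    page = ("Hiking boots for the mountain trail. Our trail guide reviews "
            "boots and tents for every hiking trip.")
    ads = {  # hypothetical advertisers and their keywords
        "TrailGear": {"boots", "hiking"},
        "BeachFun": {"surf", "sand"},
        "CampCo": {"tents"},
    }
    print(rank_ads(page, ads))
```

On this page about hiking boots, the ad targeting "boots" and "hiking" scores highest, the tent ad still qualifies, and the beach ad is dropped because none of its keywords appear in the text.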