The World Wide Web is growing exponentially, and many people use web search engines to find and retrieve information from it. The Web comprises thousands of electronic collections that often contain high-quality information. However, its unstructured nature often makes useful resources difficult to locate.
Important aspects of web search, such as query-based information retrieval, ranking and filtering, remain active research topics. User-facing issues such as correlating, presenting and integrating related information have become important because of the growing number of HTML information sources. The basic aim is to select the best collection of information for a particular information need.
The rapid development of the Web has produced millions of innovations, yet much research still deals with a limited set of web pages when extracting information from pages in different domains. Research shows that a Web that relies on traditional information retrieval and storage methods lacks efficiency in resolving keyword-based queries. Current search engines are built mostly on keyword occurrence and frequency for query matching, which copes poorly with the cases where one word has multiple meanings and multiple words share the same meaning.
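To illustrate why matching on keyword occurrence and frequency struggles with synonyms, the following minimal Python sketch scores a document by counting query words. The function name `keyword_score` and the sample documents are hypothetical, and the sketch is not how any particular search engine scores pages.

```python
from collections import Counter


def keyword_score(query, document):
    """Score a document by how often the query's words occur in it.

    Illustrative assumption: words are split on whitespace and compared
    case-insensitively, with no stemming or synonym handling.
    """
    counts = Counter(document.lower().split())
    return sum(counts[word] for word in query.lower().split())


# A document that repeats the query word scores well.
print(keyword_score("car", "used car prices for every car listed"))
# A relevant document that uses a synonym is missed entirely.
print(keyword_score("car", "second-hand automobile listings"))
```

The second document is about the same subject but never contains the literal query word, so a purely keyword-based matcher gives it no credit.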
Most web users are aware of popular search engines such as Google, MSN and Yahoo. Because the number of search results keeps increasing, these general-purpose search engines can no longer satisfy the needs of most users searching for specific information on a given topic. Search engine technology needs to scale up dramatically to keep up with the growing amount of information available on the Web. The research indicates that the web search industry has to consider specialization widely, in different search tools and techniques as well as in geographical search engines and personalized search services. (Asadi & Jamali, 2004)
As the amount of information available on websites increases enormously, it has become necessary to give users the ability to search that information. By the 1960s, a series of technological developments, such as the invention of the telegraph, telephone, radio and computer, had set the stage for the communications revolution that paved the way for the creation of the Internet.
In 1957, the Soviet Union launched a robotic spacecraft called "Sputnik" (Russian for "satellite") into orbit around the Earth. In response to the Sputnik launch, the United States created the Advanced Research Projects Agency (ARPA) as part of the Department of Defense in order to foster US technology. ARPA was developed primarily to maintain control over US military missiles and bombers after a nuclear attack. Through technological developments at ARPA and American universities, ARPANET was brought online in 1969 and became the predecessor of the modern Internet. (Lee, 2004)
In 1960, Theodor Holm Nelson founded Project Xanadu with the objective of creating a computer network with a simple user interface that would be easily accessible to ordinary people and able to solve social problems such as attribution. In 1963, Nelson introduced the term "hypertext" for text connected by hyperlinks, which are known simply as links today.
After the creation of the Internet, the concept behind the World Wide Web emerged in 1980 with Tim Berners-Lee. He proposed a project based on Nelson's hypertext concept, with the purpose of allowing people to exchange and update information. The prototype system was named "Enquire". With help from Robert Cailliau, Berners-Lee wrote the first World Wide Web server by joining hypertext with the Internet, using ideas from the Enquire system. The first web server was called "Hypertext Transfer Protocol Daemon" (httpd), and the first website went online in August 1991. (Aaron, 2009)
To keep up with the constant expansion of the World Wide Web, the storage capacity and processing speed of search engines have kept developing. Alan Emtage, a computer science student at McGill University in Montreal, created the first search engine in 1990. This tool for searching the Internet was called "Archie".
Before Archie was created, files were shared through FTP (File Transfer Protocol) servers, and people could learn of their existence only by word of mouth or through emails telling them where to retrieve information. Archie was the Internet's first indexer: it combined a script-based data gatherer with a regular expression matcher and retrieved file names that matched a user query.
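The idea of matching a user query against gathered file names with a regular expression can be sketched as follows. This is only an illustration in Python, not Archie's actual implementation; the listing `FILE_INDEX`, the host names and the function `search_file_names` are all hypothetical.

```python
import re

# Hypothetical listing that a data gatherer might have collected from
# FTP servers: (host, path) pairs. The hosts and paths are made up.
FILE_INDEX = [
    ("ftp.example.edu", "/pub/tools/gzip-1.2.tar"),
    ("ftp.example.org", "/pub/docs/readme.txt"),
    ("ftp.example.net", "/mirror/gzip/README"),
]


def search_file_names(pattern, index=FILE_INDEX):
    """Return the entries whose file name matches the regular expression.

    Only the last path component (the file name) is matched, and the
    comparison ignores case.
    """
    matcher = re.compile(pattern, re.IGNORECASE)
    hits = []
    for host, path in index:
        file_name = path.rsplit("/", 1)[-1]
        if matcher.search(file_name):
            hits.append((host, path))
    return hits


for host, path in search_file_names(r"^readme"):
    print(host, path)
```

Because only file names are indexed, a query can find a file by what it is called but not by what it contains, which is the limitation that later full-text engines addressed.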
Following the development of Archie, Mark McCahill of the University of Minnesota created "Gopher" in 1991, which indexed plain text documents. Soon after Gopher, two other programs, "Veronica" and "Jughead", appeared and served the same purpose as Archie. Both programs worked with files served via Gopher. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided keyword search of most Gopher menu titles across the entire Gopher listing, whereas Jughead (Jonzy's Universal Gopher Hierarchy Excavation and Display) was a tool for obtaining menu information from various Gopher servers.
In 1993, MIT student Matthew Gray introduced the World Wide Web Wanderer, which is considered the Web's first robot. The Wanderer was initially used to count active web servers in order to measure the growth of the Web. The bot was later upgraded to capture actual URLs, which formed the first database of websites, known as "Wandex". (Lee, 2004)
According to Martijn Koster, "A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document and recursively retrieving all documents that are referenced. Web robots are also referred to as web wanderers, web crawlers or spiders. A robot simply visits sites by requesting documents from them." (Martijn Koster, 1995)
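The traversal described in this definition can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration: it walks a small in-memory link table (`LINKS`, with made-up URLs) instead of fetching real pages, and it uses a queue rather than literal recursion so that each page is visited exactly once.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its page references.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def crawl(start_url, fetch_links=LINKS.get):
    """Visit start_url and every page reachable from it, each page once.

    fetch_links stands in for "retrieve the document and extract its
    references"; a real robot would issue HTTP requests here.
    """
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url) or []:
            if link not in seen:  # avoid revisiting pages and looping forever
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("http://a.example/"))
```

The `seen` set matters because hypertext contains cycles: pages a and b link to each other here, and without it the robot would never terminate.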
In October 1993, Martijn Koster launched "ALIWEB" (Archie-Like Indexing of the Web). ALIWEB crawled meta information and allowed users to submit their own pages to be indexed. (Aaron, 2009)
Research shows that the Web was shaped heavily by the search engine industry. "Excite", a popular search engine, was introduced in 1993 by six Stanford University students: Joe Kraus, Ryan McIntyre, Graham Spencer, Mark Van Haren, Ben Lutch and Martin Reinfried. The idea behind this search engine was to make searching the large amount of information on the Internet more efficient by using statistical analysis of word relationships. Today, Excite is part of the Ask Jeeves company, which was created by Garrett Gruener and David Warthen.
The "EINet Galaxy" was launched in January 1994 as part of the MCC Research Consortium at the University of Texas at Austin. It served a purpose similar to that of a web directory and also contained Gopher and Telnet search features. (Aaron, 2009)
In April 1994, David Filo and Jerry Yang of Stanford University created the "Yahoo Directory". It started as a collection of their favorite web pages. Because the number of links grew rapidly from day to day, they decided to create a better, searchable directory to aid data retrieval. Yahoo entries were entered and categorized manually, so Yahoo is generally considered a searchable directory rather than a search engine.
Later in 1994, Brian Pinkerton introduced the "WebCrawler" project. The project's main aim was to create a copy of every visited page for later processing by a search engine, which would index the downloaded pages to provide fast searches. WebCrawler's most important contribution was providing the first full-text search engine on the Internet.
In the following year, Dr. Michael Loren Mauldin of Carnegie Mellon University introduced "Lycos". Lycos was a large search engine that provided prefix matching and word proximity. "Infoseek" was started in the same year by Steve Kirsch. Infoseek featured a very complex system of search modifiers, including Boolean modifiers, parentheses and quotes. Its most popular feature allowed webmasters to submit a page to the search index in real time. (Aaron, 2009)
In December 1995, "AltaVista" was created by two researchers: Louis Monier, who wrote the crawler, and Michael Burrows, who wrote the indexer. AltaVista was the first search engine to allow natural language queries and advanced searching techniques. (Aaron, 2009)
In May 1996, UC Berkeley professor Eric Brewer and graduate student Paul Gauthier introduced "Inktomi". This search engine was powered by "concept induction" technology, meaning that it determined which sites were most popular and productive by analyzing human behavior and applying those habits to a computerized analysis of links, usage and other patterns.
"Ask Jeeves", founded in 1996 by Garrett Gruener and David Warthen, was launched as a natural language search engine. To match search queries accurately, Ask Jeeves used human editors. The idea behind Jeeves was to offer a question-answering service rather than to create a search engine or directory.
"HotBot", owned by Terra/Lycos and launched in 1996, claimed to be one of the search engines that offered the ability to search within search results. Unlike most search engines, HotBot permitted searchers to select among a few of the popular search engines on the Web, pulling results from only one search engine at a time.
In October 1996, Evan Thornley and his wife Tracey Ellery created the "LookSmart" search directory. It was a unique, Java-powered directory for use on personal computers.
In November 1996, Aaron Flin developed "Dogpile". This search engine fetches search results from several popular search engines, and a newly improved version loads the home page and results pages faster.
"Google" was launched in 1997 by Larry Page and Sergey Brin as part of their research project at Stanford University. Google itself is a crawler-based search engine rather than a human-edited directory; the Google Directory that it later offered drew on the Open Directory Project, the largest and most comprehensive human-edited directory of the Web. A vast, global community of volunteer editors maintains that directory and seeks to make it the best directory of the Web. (Aaron, 2009)
In February 1998, Overture became the first company to offer a pay-for-placement search service, which allowed site owners to bid for placement. Those who paid more could appear at the top of the results for a specific search. (Danny, 1998)
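The pay-for-placement idea, in which the highest bidder for a search term is listed first, can be sketched as a simple sort. The data layout, the site names and the function `rank_by_bid` below are hypothetical and serve only to illustrate the principle; they do not describe Overture's actual system.

```python
def rank_by_bid(bids, keyword):
    """Order the sites bidding on a keyword from highest to lowest bid.

    bids maps each keyword to a {site: bid amount} dictionary
    (an illustrative layout, not a real service's data model).
    """
    offers = bids.get(keyword, {})
    return sorted(offers, key=lambda site: offers[site], reverse=True)


# Made-up bids per click for the search term "flowers".
bids = {
    "flowers": {
        "site-a.example": 0.10,
        "site-b.example": 0.25,
        "site-c.example": 0.05,
    }
}

print(rank_by_bid(bids, "flowers"))
```

In this model relevance plays no role at all: position is determined purely by payment, which is what distinguished pay-for-placement from the crawler-based ranking of other engines.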
MSN (The Microsoft Network) search launched in the third quarter of 1998, using search results from Inktomi. MSN is a collection of Internet sites and services provided by Microsoft as an online service and Internet service provider. The search engine was later updated and renamed Windows Live Search, until the brand was reborn as Bing.
In May 1999, "AllTheWeb" was launched by FAST (Fast Search & Transfer), which claimed that it had the largest index of web pages. It introduced several advantages, such as more advanced search features, a fresher database, better search clustering and a completely customizable look.
In April 2001, computer scientists Professor Apostolos Gerasoulis and Professor Tao Yang founded "Teoma". Teoma is a search engine that looks at the Web in terms of subject-specific communities. It is distinguished by its link popularity algorithm, which analyzes links in context to rank a web page's importance within its specific subject.
"WiseNut", founded by Yeogirl Yun, was launched in September 2001. WiseNut was a crawler-based search engine that clustered search results automatically. (Chris, 2003)
Between 2000 and 2009, Google launched various services to enhance its search engine. "Google AdWords", launched in 2000, allows businesses to place advertisements in boxes next to search results. "Google AdSense", launched in 2003, allows site developers to place Google-controlled advertisements on their sites. (Alan, 2009) "Google Sitemaps" was introduced in 2005 and allows developers to submit a sitemap that outlines an entire site. In the following year, Google acquired the user-generated video sharing network "YouTube", which was originally founded by Jawed Karim, Chad Hurley and Steve Chen. YouTube uses Adobe Flash Video technology to display a wide variety of video content, including music videos and video blogs. Since then, research indicates that Google has indexed billions of web pages and processes over 70% of web searches.
Despite the significant changes in search engines, all the major search engines have their own limited editorial review processes. Various algorithms and approaches have since been added to improve their results. Each search company has its own business objectives and technologies that define its view of quality. Over time, the structure of web pages has proved to be a good resource with which search engines can improve their results.
Webmail gives users the ability to access their email through a web browser. The history of email is much older than that of the Internet, and email evolved from very simple beginnings. Early email worked like placing a message in another user's directory, in a spot where the user could easily see it after logging in.
With the development of time-sharing computers in the early 1960s, many research organizations wrote programs to exchange text messages among users at different terminals. These new technologies influenced many people and extended the human medium of communication. However, these early systems were limited to a group of people using one computer, since computers in the 1960s were very expensive, bulky and thus extremely rare.
The Compatible Time-Sharing System (CTSS), developed at the Massachusetts Institute of Technology (MIT) in 1961, was introduced as the first computer system to offer an email service. The first email system of this type was called "MAILBOX". CTSS allowed multiple users to log into the IBM 7094 mainframe computer from remote dial-up terminals and to store files online on disk. (Tom, 2008)
Later, in the early 1970s, email was quickly extended into network email, which allowed users to pass messages between different computers. "ARPANET" (Advanced Research Projects Agency Network) emerged as the first large computer network, which led to the development of the Internet. As the network developed, identifying a specific person to receive an email message became much more complicated.
A computer engineer named Ray Tomlinson invented Internet-based email. He was part of a small team of programmers who helped develop a time-sharing system known as "TENEX". The TENEX operating system included local email programs called "SNDMSG" and "READMAIL", which ran on most machines on the ARPANET. SNDMSG was used for sending messages, whereas READMAIL was used for receiving them. (Dave, 2000)
SNDMSG was a program created for time-sharing systems that could deliver messages to another person on the same computer; it allowed a user to compose, address and send a message to another user's mailbox. Its disadvantage was that it could not transmit a message from one machine to another. (Tom, 2008)
Sometime earlier, Ray Tomlinson had worked as an ARPANET contractor for BBN (Bolt Beranek and Newman) Technologies, which developed an experimental file transfer program called "CPYNET". CPYNET could send and receive files on remote computers over an ARPANET connection but did not allow users to append information to files.
The idea of combining SNDMSG and CPYNET occurred to Tomlinson, and he decided to write a minor hack combining the two programs so that messages could be delivered from one machine to another. Incorporating the CPYNET code into SNDMSG came with a way to distinguish local mail from network mail: Tomlinson chose the "commercial at" (@) symbol, combined with the user and host names in the form "user@host", as a logical way to indicate which computer the user was "at". The very first network email message was sent between two computers; Tomlinson recalls that it was most likely the first row of letters on the keyboard, "QWERTYUIOP". (Katie & Matthew, 1996)
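How the "user@host" form lets a mail program tell local mail from network mail can be sketched as follows. This Python fragment is only an illustration of the idea, not a reconstruction of SNDMSG; the function name `route_address` and the host names are hypothetical.

```python
def route_address(address, local_host):
    """Split an address and report whether delivery is local or networked.

    Returns a (kind, user, host) tuple, where kind is "local" or "network".
    An address without "@" is assumed to name a user on the local machine.
    """
    user, separator, host = address.partition("@")
    if not separator:
        # No "@" present: the message stays on this computer.
        return ("local", user, local_host)
    kind = "local" if host.lower() == local_host.lower() else "network"
    return (kind, user, host)


# Made-up host names for illustration.
print(route_address("alice", "host-a"))
print(route_address("bob@host-b", "host-a"))
```

The design choice is that the part after "@" carries all the routing information, so the same mailbox name can exist on many computers without ambiguity.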
Later, a number of more general mail protocols were developed. Larry Roberts invented features for labeling and sorting emails into folders. In 1975, John Vittal developed software to organize email.
By the early 1980s, standard protocols had developed as email features progressed. "SMTP" (Simple Mail Transfer Protocol) was created, which enabled a single message to be sent to a domain with more than one address. The objective of SMTP is to transfer mail reliably and efficiently. "POP" (Post Office Protocol) is another standard protocol, which allowed different mail systems to work with each other. Both SMTP and POP became important standards for email. (Ian, 2004)
The emergence of networked electronic mail (email) has been a major technological innovation in modern development since the Internet first became popular. Most people have used email since the launch of services from companies such as Yahoo, Google and Hotmail, making email a necessity in the modern world.
The efficiency of online email lets users access their mail from any computer in any location. Most companies aim to offer the best webmail service free of charge, together with unlimited storage. Of the many companies that provide such services, the following webmail providers were briefly reviewed: "Google Mail", "Windows Live Hotmail", "Yahoo Mail", "Mail.com" and "Care2".
2.5.1 GMAIL
Google launched Gmail in April 2004 to show the world how robust and functional web email could be. The Gmail project was started by a computer programmer named Paul Buchheit. It is an online system that does not require a full page reload whenever the user clicks through to a new page. Buchheit was also the developer of Google AdSense in 2003.
Gmail was initially created for employees of Google and their friends and family, and it quickly grew to about 1,000 users with Gmail accounts. Since then, the number of invitations has increased: with a maximum of 50 invitations per user, more and more users have been invited to Gmail. Like the Google search engine, the free web-based email service has a clean and fast layout. (John, 2009)
Gmail is Google's approach to chat, email and social networking, which it is in the course of testing with a handful of users. Gmail offers great ease of use, speed, understated text advertisements and free online storage of approximately 7469 MB (more than 7 GB). Users can also rent additional storage, from 20 GB up to 16 TB, each tier with a different annual fee. In addition, Gmail allows a larger file attachment size than other existing services, approximately 25 MB per email. (Elvin, 2009)
Gmail's fast and rich web interface provides strong email organization features: custom filters, with which specific email addresses or subject headings can be kept out of the user's inbox; black/white lists, which work in the same way as custom filters; custom folders; sorting, which lets users redirect incoming emails into specific folders; address book import; an auto-reply function; and a conversation view, which keeps track of the user's conversations with various people by grouping the messages into one line of the inbox together with the number of messages in that conversation.
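The custom filters described above, which match incoming mail on sender or subject and then sort it, can be sketched as a small rule engine. This is a minimal Python illustration under stated assumptions: the `Message` and `Filter` classes, their fields and `apply_filters` are all hypothetical and do not reflect Gmail's internal design or API.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    subject: str
    labels: list = field(default_factory=list)
    archived: bool = False


@dataclass
class Filter:
    # Criteria: an empty string means "match any value".
    sender_contains: str = ""
    subject_contains: str = ""
    # Actions to take on a matching message.
    add_label: str = ""
    archive: bool = False

    def matches(self, msg):
        """True when every non-empty criterion occurs in the message."""
        return (self.sender_contains.lower() in msg.sender.lower()
                and self.subject_contains.lower() in msg.subject.lower())


def apply_filters(msg, filters):
    """Run each filter in order and apply the actions of those that match."""
    for rule in filters:
        if rule.matches(msg):
            if rule.add_label and rule.add_label not in msg.labels:
                msg.labels.append(rule.add_label)
            if rule.archive:
                msg.archived = True
    return msg


rules = [Filter(sender_contains="shop.example", add_label="Promotions", archive=True)]
msg = apply_filters(Message("news@shop.example", "Weekly offers"), rules)
print(msg.labels, msg.archived)
```

Separating the matching criteria from the actions keeps each rule declarative, so users can combine conditions and outcomes without writing any logic themselves.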
Figure 2.1 Gmail Web/Email Search Engine
Figure 2.1 depicts the Gmail search option, which searches either within email storage or on the Web. The strength of Gmail is not limited to specific features; its search function has also been praised as excellent for an email service. What makes Gmail unique among webmail systems is its search-oriented design. Gmail users can search an entire email archive by keyword, sender or topic, and the results are grouped together clearly.
Figure 2.2 Gmail Advanced Search Option
Figure 2.2 shows that Gmail allows users to construct advanced searches. The search options include phrases, message sender, location, date and messages with attachments.
Figure 2.3 Gmail Filter Option
Figure 2.3 shows that, by using filters, incoming messages can be automatically labeled, archived, deleted, starred, forwarded or kept out of spam.
2.5.2 WINDOWS LIVE HOTMAIL
"MSN Hotmail" is a free, customizable web-based email service. Formerly known as "Hotmail", it was founded by two colleagues, Jack Smith and Sabeer Bhatia, in mid-1996. Because of its popularity, Hotmail was acquired by Microsoft Corporation in late 1997 and became MSN Hotmail. (Prashant, 2009) Microsoft later decided to enhance the email system around three main concepts, being faster, simpler and safer, and rebranded "MSN Hotmail" as "Windows Live Hotmail".
Windows Live Hotmail was meant to replace older programs such as Microsoft Outlook Express and Windows Mail in Windows Vista. It has been updated periodically with a focus on greater speed, security, increased storage space, a better user experience and more usable advanced features. Windows Live Hotmail also offers an easily customizable display, not only in its colors but also in the placement of the reading pane.
Windows Live Hotmail includes many extras, such as Windows Live Messenger, online data storage with SkyDrive and various other recreational services. SkyDrive provides free online storage of up to 25 GB. Users can also upgrade to a paid version, "Windows Live Hotmail Plus", which features stronger spam protection, larger attachment sizes, faster performance and additional storage of up to 10 GB for $20 per year. Windows Live Hotmail also places a 10 MB limit on file attachment size. (Rebecca, 2010)
Windows Live Hotmail is simple and easy to use, and it offers fast, universal message search together with junk email filters that can sort incoming mail into custom folders. All incoming email is scanned for spam and phishing markers. Windows Live also includes optimized versions of the Windows Live suite that provide access from various browsers and mobile devices. On the mobile version, users can check email, get updates and compose documents just as in the original Windows Live.
Figure 2.4 Windows Live Hotmail Web/Email Search Engine
Figure 2.4 depicts the Windows Live Hotmail web and email search options. The email search option is placed conveniently near the top of the email list, allowing users to search easily for certain types of email message, whereas the web search option is located in the last row of the query menu.
Figure 2.5 Windows Live Hotmail Advanced Search Option
Figure 2.5 shows that Windows Live Hotmail lets users perform advanced searches. Users can choose a specific type of action to retrieve more accurate results.
Figure 2.6 Windows Live Hotmail Filter and Reporting Option
Figure 2.6 shows how incoming email is protected and junk email is reduced. Windows Live Hotmail uses SmartScreen technology, which screens email to identify and separate junk email from legitimate email. (Jonathan, 2009) SmartScreen learns from known spam, phishing threats and user feedback to distinguish the characteristics of legitimate email from those of junk email.
2.5.3 YAHOO MAIL
After David Filo and Jerry Yang launched the "Yahoo Directory" in 1994, its growing popularity led to the creation of "Yahoo Mail", released in 1997. Yahoo Mail is available in two versions, "Mail Classic" and the "All-New Mail", which differ in functionality. However, basic features such as unlimited storage are included in both versions.
Mail Classic is a secondary user interface for Yahoo Mail. This traditional full-page interface remains available as a simpler option for people with older computers and slower Internet connections. The newer version of Yahoo Mail is far more advanced than Classic: it has a revised interface built mostly with "Ajax" (Asynchronous JavaScript and XML) that behaves much like "Outlook". (Wikipedia: Yahoo Mail, 2010) Both versions are integrated with an instant messenger, allowing users to chat while browsing their email.
Yahoo Mail is a user-friendly and easy-to-use email program. Integrated instant messaging, SMS texting and unlimited storage space make Yahoo Mail one of the most useful services compared with other free email services. Other notable features include drag-and-drop functionality, the ability to download all attachments, viewing image attachments as a slide show, and customizable sizes for the various tabs and panes.
Yahoo Mail is simple, easy and comprehensive for novice computer users, and it provides free unlimited online storage and a maximum attachment size of 25 MB per email. Users can also upgrade to "Yahoo Mail Plus" for $20 per year, which adds 200 additional filters, no graphical advertisements, disposable email addresses (AddressGuard), no promotional taglines in messages and offline POP3 access. (Richard, 2004)
Figure 2.7 Yahoo Mail Web/Email Search Engine
Figure 2.7 depicts the Yahoo Mail web and email search options. Web search covers the websites registered with the Yahoo search engine and serves users efficiently, whereas email search lets users search their recent messages.
Figure 2.8 Yahoo Mail Filter Option
Figure 2.8 shows that all incoming email can be filtered according to the user's settings, keeping the mailbox free of known spam. Unwanted spam email is automatically sent to the spam folder. Additionally, users can still mark an address as not spam or report an email as spam to Yahoo.
2.5.4 MAIL.COM
"Mail.com" is a free email service founded by Gerald Gorman and Gary Millin in 1995. The company not only offers a free email service but has also bought a collection of domain names from other companies.
Because hundreds of domain names are available in Mail.com, users can choose their own email address domain for free. Mail.com also includes applications such as online chat, and its web-based interface makes it a practical and easy-to-use email service. (Heinz, 2010) Many convenient features, such as a phishing filter, a spam filter and secure email delivery, are also provided.
The basic Mail.com service provides users with only 3 GB of inbox storage space and a maximum attachment size of 16 MB per email. Because of its simple email addresses, it is also used by many people as a forwarding account to a more robust service such as Gmail.
Figure 2.9 Mail.com Web/Email Search Engine
Figure 2.9 depicts the Mail.com web and email search options. The search functions are the same as in any other popular search engine: users can search by subject or keyword within a specific category.
Figure 2.10 Mail.com Filter Option
Figure 2.10 shows that users can create filters by choosing any action they need, and that mail from known spammers, including phishers, is prevented from reaching the user's inbox. Users can also report any spam that slips through the filters.
2.5.5 CARE2
"Care2" is a global community awareness network that provides free email services. Care2 was founded in 1998 by Randy Paynter with the intention of connecting individuals and organizations to do something positive for the environment. The company's profits rely mainly on advertisement sales and sponsorships, and 10 percent of its earnings are donated to various nonprofit organizations.
Care2 provides a network for people who enjoy being involved in social awareness issues such as the environment, civil rights, animal rights, human rights and more. Care2 also provides services such as e-cards, blogs, polls and petitions. Care2 members can become involved with any groups and actions. (Elizabeth, 2007)
Care2 provides users with a simple email service that includes an HTML and plain text composer with a choice of simple stationery, new mail notification via ICQ, custom filters and a choice of domain names. Other features include network messaging and a searchable directory of interest groups. (Heinz, 2010)
Care2 provides users with only 5000 MB of inbox storage space and a maximum attachment size of 10 MB per email. However, this email service is reliable enough for users who are seeking to join a group with a cause and make a difference.
Figure 2.11 Care2 Web/Care2 Directory Search Engine
Figure 2.11 depicts the search option, which searches either the Web or the Care2 directory. Care2 has a searchable index in which users can look up any Care2 member by name or by the area where they live.
Figure 2.12 Care2 Email Search Engine
Figure 2.12 depicts the Care2 email search option. Like other popular email services, Care2 email search allows users to search by subject or keyword within a specific folder.
Figure 2.13 Care2 Advanced Search Option
Figure 2.13 shows that Care2 allows users to perform advanced searches. The search options include keywords, subject, message date, time, type and the folders to search.
Figure 2.14 Care2 Filter Option
Figure 2.14 shows that all incoming email is placed according to the user's choice. Users may also create filters based on any condition to help protect against spam email.
Secure Sign-in
Spam Filter
Report Spam
Virus Scanning
Phishing Filter
Create Folders
Black/White Lists
Custom Filters
Address Book
Search Mail
Auto Reply
Plain Text Composer
Spell Check
Personal Signature
Multiple Language
Instant Messaging
Domain Choices
Email Support
Forum/Message Boards
Online Help