In their textbook Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto state that information retrieval (IR) deals with the representation, storage, organisation of and access to information items. The representation and organisation of information should provide the user with easy access to the information in which he is interested. In other words, IR is the technique and process of searching, retrieving and interpreting information from sources of stored data.
Historically, information was mainly stored in libraries in the form of books or documents. From 1961, libraries started cataloguing and indexing their contents and, with the arrival of computers, started building databases containing the bibliographic details of documents. Today, however, IR has evolved from text retrieval systems to multimedia IR that also includes audio, image and video information (Chowdhury, 2004).
Technological developments changed everything. The increase in productivity and efficiency means that far more information is produced. IR services have advanced to mobile search, allowing users to search for information while travelling. Nowadays, all organisations store and retrieve data as part of their practice. Academic institutions store an increasing number of electronic resources and give their students access to them as part of their education; chemists draw upon previous research and generate new patents (Schofield, 2000); and the US Department of Defense uses DAMIR as a web-based interface through which to present information about the missions or projects it manages.
IR is now part of our everyday lives. It has had an impact on our society and affected both our private and our professional lives. It has changed the way we handle and assimilate information and integrate it into our lives and our jobs. New professions have been created, new products are being developed and technology is advancing exponentially. All of this stems from the evolution of IR.
Overview of the major issues
One of the most noticeable effects of IR is the improvement in technology. Innovative hardware and high-performance systems are being implemented and deployed to cope with the increasing demand for computing power needed to run more efficient systems. An article in Market Wire in 2009 reported that "Mitrionics Inc announced a computer platform that achieved a twenty-times increase in performance and ninety percent power saving over traditional processors when running standard application algorithms for Information Retrieval and document filtering". Furthermore, PR Newswire reported in a recent article that the National Science Foundation awarded millions to fourteen universities for cloud computing search. These universities are using cloud computing to investigate and develop new ideas that require vast processing power and that will have an impact not only on the information retrieval community but on industry as well.
Communications is another sector that IR has affected. The need for the latest news has spawned new methods of global communication through the media and the Internet. People communicate in public spaces, away from commercialism, and share thoughts and experiences freely, while broadcasting companies transmit news across the globe, aiding global trade (Day, 2004). Owing to the increase in global trade and to businesses' need for relevant information and for lower costs when translating foreign-language documents, the translation industry is estimated to grow by at least 26% by 2014 (Information Technology Newsweekly, 2010). New integrated solutions are therefore being provided to meet foreign-language discovery needs in business and so keep operations running smoothly. In addition, the combination of the WWW and IR has led in recent years to a boom in e-commerce, which Reedy and Schullo define in their book Electronic Marketing (2004) as "a system of online shopping and information retrieval accessed through networks of personal computers".
The positive outcomes of IR are reflected in the health sector as well. Clinicians can consult data from previous cases when handling complex or unusual clinical cases; researchers can draw on tools in molecular biology and genetics; health materials can be identified to support patient education; and health care professionals can even support legal actions involving health care institutions by discovering relevant biomedical evidence (JMED, 2005). Geer (2006) reports that databases and retrieval and analysis tools are growing exponentially, as is the frequency of their use, based on the increase in the number of papers cited. As a result, these new domains and tools are used as sources for new biological discoveries, for example through ontology-based mining systems.
Competitive Intelligence (CI) is the discipline of researching and assessing competing firms and finding ways to surpass them. By monitoring the external environment of a firm, collected information is transformed into exploitable intelligence that influences the decision-making process (Murphy, 2005). Applications can be found in many fields, for example the military, business and academia. In his book Business at the Speed of Thought (2002), Bill Gates outlines the advantages that CI gives over competing firms through various real-life examples and suggests that using digital information to innovate your product can drastically improve your competitive position in the market. Another example is an ontology-based mining system for competitive intelligence in neuroscience that automatically crawls documents about neuroscience, classifies them into different categories, annotates them with controlled vocabularies, indexes them and then maps them onto the designed ontology. The significance of the system lies in the integration of ontologies into the CI system, which leads to an insightful understanding of the information need (Jiao et al., 2007).
Efficient access to electronic databases that index thousands of scientific journals has had a big impact on the scientific productivity of individuals in educational institutions. The number of scientific publications has increased in recent years. In particular, between 2001 and 2003 publications increased by 53.4 percent for Turkey, 34 percent for China and 26.87 percent for South Korea (Kirlidog et al., 2007). The availability of resources has transformed research in educational institutions, making it easier to acquire and process information on a global scale. As a result, research has escalated into a chain reaction, producing more results that become the basis for further research, and so on.
The sharp increase in music downloading relative to CD sales has shifted the music industry away from physical media formats towards online products and services. Distribution has become easier, which has sparked an increase in the amount of new music available. This has established music as one of the most popular types of online information, with large collections available on the WWW through streaming and download services. Traditional ways of discovering music, such as visiting record stores or listening to radio broadcasts, have given way to personalised services such as online communities, social networking sites and personalised radio stations. By interacting with other users of similar tastes, users can identify interesting tracks that have already been indexed by their social circle. The problem of users not knowing exactly which piece of music they are looking for has also been addressed. If the user hears a song in a public place and has no idea of the track information, he can now identify the particular recording by taking a sample with his mobile phone using the Shazam system. The case where the user has the melody in his head but no other information has been addressed by the online music service Nayio, which attempts to recognise a track from a query sung by the user. Both services allow the user to purchase the retrieved recordings (Casey et al., 2007).
On the other hand, IR has had a negative effect in some cases. The amount of information that is available and received is overwhelming and sometimes hinders rather than helps in satisfying the user's information need. With databases available online, the efficient and effective discovery of resources and knowledge has become a pressing need, but the increasing volume of information makes it harder to obtain useful information from the WWW. New and different media are constantly introduced to the Internet, disrupting structure and creating inconsistencies. As the volume of unstructured data increases, retrieval becomes more complicated because the data cannot be stored and managed properly (Montebello, 1998). For example, consider a medical image database that may contain millions of images classified using certain facets, where the health care practitioner has to sift through the classification to obtain any significant images (Binshan, 2000). Another example is managers who have to make informed decisions about software development projects but require more information than is currently available through the standard information media. They are subsequently overwhelmed with information and, in order to deal with the overload, hasten their decision-making process, thus decreasing decision quality (Robin et al., 2007).
Summary and Conclusions
The fundamental role of IR in the industrial and scientific fields brings both opportunities and challenges. The supply of information resources provides effective ways to link new ideas to older ones and offers many opportunities for research and product design. In a press release, Outsell reports that libraries are shifting their spending from content to new technologies and investing in digitising collections and developing retrieval systems.
At the same time, the accelerated development of new resources seems to outpace the user's ability to keep up in fast-moving fields. Sometimes there is too much information to process, so the user falls back on what he already knows, retrieves whatever he thinks he needs and proceeds, unaware of any advanced techniques or resources that could serve his information need.
A more recent issue in IR is the personalisation of web pages and web searches in order to provide more relevant results. Google's constant need to monitor and record the user's web history and content conflicts with privacy concerns. The same applies to Facebook, which at its recent F8 conference announced new functionality that promises to help users connect to "higher quality content". By agreeing to connect your Facebook profile to the webpage you are currently browsing, you get a feed of what your friends or other people liked on that webpage. Of course this is meant to provide a service to the user; however, it primarily affects his privacy.
To make optimal use of IR, more efficient methods should be implemented to store and index data. More intelligent search engines could retrieve more relevant data. Finally, consistency in the query format across the different search engines on the WWW could improve usability.
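As a small illustration of what storing and indexing data for retrieval can look like in practice, the following Python sketch builds an inverted index, a common structure that maps each term to the documents containing it, and answers simple keyword queries. It is a minimal sketch and not a method taken from any of the sources cited in this essay: the function names (tokenize, build_inverted_index, search) and the three sample documents are hypothetical and chosen only for this example.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Look up the posting set of each term without adding new keys.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical three-document collection, for illustration only.
documents = {
    1: "Libraries catalogue and index books and documents",
    2: "Search engines index web documents",
    3: "Music services identify audio recordings",
}
index = build_inverted_index(documents)

print(search(index, "index documents"))
print(search(index, "music"))
```

Because the index is built once and each query only intersects a few small sets, lookups stay fast as the collection grows, which is one reason this structure is widely used. The sketch ignores ranking, stemming and stop words, all of which a more intelligent search engine would need.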