New knowledge is produced at a continuously increasing speed, and the list of papers, databases and other knowledge sources that a researcher in the life sciences needs to cope with is actually turning into a problem rather than an asset.
Pharmaceutical companies want to minimize costly late-stage attrition by identifying and eliminating drugs that do not have desirable safety profiles or sufficient efficacy as early on as possible. The need for effective data integration has become stronger as the cost of drug discovery and development has soared to over $1 billion [5]. Data commonly originates in different departments in which varying terminologies are used. Further, the data itself is very heterogeneous in nature, and consists of data types that include electronic patient records, chemical structures, biological sequences, images, biological pathways, and scientific papers.
Understanding the spectrum of biology influenced by any given therapeutic target is key to successful drug discovery and repositioning. Drug repurposing (therapeutic switching) has been gaining importance in the last few years as an increasing number of drug development and pharmaceutical companies see their drug pipelines drying up and realize that many previously promising New Chemical Entities (NCEs) have failed to deliver [7]. It can provide a fast and low-risk return on investment, so long as it meets a number of criteria, such as addressing unmet medical needs [6], and it is an option that any pharmaceutical company can consider, regardless of its size. The drug repurposing approach has the potential to identify new first-in-class mechanisms to treat disease while avoiding some of the challenges associated with the development of a new chemical entity. One interesting facet of the approach is that the discovery of new targets can be parlayed directly into the creation of NCEs that further enhance the new mechanism or target activity; the benefit here is starting with a drug-like template that is known to interact with a new disease-relevant target. Today's drug repurposing companies are fueled by different drivers, including the increasing supply and virtual availability of novel targets and the emergence of Linked Data within Semantic Web technologies. A key potential benefit of these technologies is that the emphasis is on the data.
There is much interesting information about drugs available on the Web. The sources of data range from the impact of drugs on gene expression through to the results of clinical trials. Figure 1 shows part of the data sets that have been published so far, and the links between them, within the Linked Data cloud.
Fig: 1 - Linking Open Data cloud, which has grown to 4.2 billion RDF triples interlinked by around 142 million RDF links
What is most needed is a simple interface that allows anyone to pose a set of practical queries and explore pharmaceutical data from the Linked Data cloud through an interactive report.
Semantic Web technology provides a means of storing and interpreting any type of data, and it can be expanded and modified as more data become available or as the underlying science changes, allowing us to respond more rapidly to changing needs [8]. The aim of the present work is to demonstrate how Semantic Web technologies can be used to reveal the relationships among drugs, genes, protein targets and receptor types from manually annotated facts in published sources, to find links semantically, and to facilitate the forming of hypotheses for potential drug repurposing. Further, an attempt has been made to develop a few prototype demonstrations showing how semantic exploration of relevant datasets can suggest, or help bring about, a potential drug repurposing.
Objectives:
To build a prototype demonstration with a simple SPARQL query interface that allows researchers to pose a set of specific queries for exploring drug-related information on the Linked Open Drug Data cloud through an interactive report.
To perform virtual screens for identifying, verifying and optimizing previously unrecognized connections and interactions among drugs, targets, and diseases.
To identify target interactions based on sequence similarities for a broad range of new and unexpected activities of existing drugs (both on- and off-target effects).
To elucidate new biological information about a specific therapeutic area for identifying therapeutic targets.
To perform targeted, non-hypothesis-based queries and to navigate the relationships through an integrated view of all the drug data.
Investigation undertaken:
Linked Open Drug Data (LODD) (Fig-2), developed by the World Wide Web Consortium (W3C), USA (http://esw.w3.org/HCLSIG/LODD/Data), was used in this study.
Fig: 2 - Linked Open Drug Data
The following databases from the Linked Open Drug Data were used for drug repurposing with Semantic Web technologies:
DrugBank - provides drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure, and pathway) information. Size: 766,920 triples; 4,800 drugs; 2,500 protein sequences.
SIDER - contains information on marketed drugs and their adverse effects. Size: 192,515 triples; 1,737 genes.
DailyMed - provides information about approved prescription drugs, including FDA-approved labels. Size: 164,276 triples; 4,039 drugs.
Diseasome - describes characteristics of disorders and disease genes linked by known disorder-gene associations. Size: 91,182 triples; 2,600 genes.
LinkedCT - Linked Data source of trials from ClinicalTrials.gov. Size: 7 million triples; 62,000 trials.
DBpedia - covers drugs, diseases and proteins, with RDF data about 2.49 million things extracted from Wikipedia. Size: 218 million RDF triples; 2,300 drugs; 2,200 proteins.
First, data were retrieved from the selected databases using the SPARQL query language. The retrieved data were then subjected to virtual discovery screening to identify possible candidates for drug repurposing (Fig-3). Next, agnostic screening and end-point virtual screening were carried out to discover new uses arising from both on- and off-target effects of existing drugs. By focusing on a validated end point of interest, we can identify existing drugs that produce an unanticipated, yet desired, phenotypic result. By casting the broadest possible net, the agnostic screening approach has the potential to generate novel scientific discoveries.
Fig: 3 - Semantic Web based search engine that executes SPARQL queries over Linked Life Data, developed in the Department of Bioinformatics, Bharathiar University.
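The SPARQL retrieval step described above can be illustrated with a minimal Python sketch. It is not the search engine used in this study: the endpoint URL, the vocabulary prefix and the property names (`genericName`, `target`) are hypothetical placeholders and must be replaced with those of the dataset actually queried. The sketch only builds the query, prepares a SPARQL Protocol GET request, and flattens a result document in the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and vocabulary (assumptions for illustration only).
ENDPOINT = "http://example.org/sparql"
VOCAB = "http://example.org/drugbank/vocab#"


def build_target_query(drug_name, limit=50):
    """Return a SPARQL SELECT query listing the targets of a named drug."""
    # Escape backslashes and double quotes so the name is a valid literal.
    safe = drug_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX v: <{VOCAB}>\n"
        "SELECT ?drug ?target WHERE {\n"
        f'  ?drug v:genericName "{safe}" .\n'
        "  ?drug v:target ?target .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(endpoint, query):
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]
```

Against a live endpoint, the prepared request could be passed to `urllib.request.urlopen` and the response body handed to `parse_bindings`; the resulting rows are then available for the screening steps.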
Then, specific biological insight about a target's role in a new indication, derived from semantic analysis of the Linked Data, was applied, followed by phenotypic screening to identify and develop compounds with the desired properties. Finally, drug repurposing was carried out with Semantic Web technology by employing the "known compound - new target" approach and the "known target - new indication" approach.
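The two approaches named above can be read as short traversals over linked triples. The following Python sketch shows one possible reading, under stated assumptions: the predicate names (`hasTarget`, `treats`, `associatedWith`, `similarTo`) and all identifiers are invented placeholders, not facts from the datasets, and the "new target" step assumes that candidate targets are found through a similarity link between targets.

```python
from collections import defaultdict

# Placeholder triples for illustration only; these are not real drug facts.
TRIPLES = [
    ("drug:A", "hasTarget", "target:T1"),
    ("drug:A", "treats", "disease:D1"),
    ("target:T1", "associatedWith", "disease:D1"),
    ("target:T1", "associatedWith", "disease:D2"),
    ("target:T1", "similarTo", "target:T2"),
    ("target:T2", "associatedWith", "disease:D3"),
]


def index(triples):
    """Map each (subject, predicate) pair to the set of its objects."""
    idx = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def known_target_new_indication(idx, drug):
    """Diseases linked to the drug's known targets but not yet treated by it."""
    known = idx.get((drug, "treats"), set())
    found = set()
    for target in idx.get((drug, "hasTarget"), set()):
        found |= idx.get((target, "associatedWith"), set())
    return sorted(found - known)


def known_compound_new_target(idx, drug):
    """Targets similar to the drug's known targets but not yet linked to it."""
    known = idx.get((drug, "hasTarget"), set())
    found = set()
    for target in known:
        found |= idx.get((target, "similarTo"), set())
    return sorted(found - known)
```

With the placeholder data, `known_target_new_indication(index(TRIPLES), "drug:A")` yields `["disease:D2"]` and `known_compound_new_target(index(TRIPLES), "drug:A")` yields `["target:T2"]`. Such outputs are hypotheses to be checked by screening, not conclusions.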
Conclusion:
This study supports the widespread view that current models of drug discovery and development need revamping and reinvention in order to make pharmaceutical R&D more predictable, more reliable, and less costly. The present research envisions a novel approach to this challenge, one in which Semantic Web technologies profoundly change the way post-marketing surveillance data are exploited for drug discovery and development, including drug repurposing. This approach capitalizes on recent advances in molecular medicine, human genomics, and information technology, as well as on an increasingly sophisticated public eager for solutions to its unmet medical needs.