The Semantic Web adds machine-readable descriptions to the information already on the Web in order to improve search and data usage on a global scale. It can be viewed as an efficient way of representing data on the World Wide Web, or as a globally linked, distributed database built on the graph-based RDF (Resource Description Framework) model. RDF is the standard for encoding metadata and other knowledge on the Semantic Web. Applications in the Semantic Web use structured information that is spread throughout the current Web in a distributed and decentralized manner. SPARQL, an RDF query language, defines a standard query language and data access protocol for use with the RDF data model, and it works with any data source that can be mapped to RDF. Because RDF is central to the Semantic Web architecture and RDF data is retrieved with SPARQL, SPARQL also plays an important role in the Semantic Web. This paper focuses on SPARQL: its role and usage in the Semantic Web with examples, its comparison with SQL, and the tools used to execute SPARQL queries.
Key Words: Semantic web, RDF, SPARQL, TWINKLE, Jena ARQ.
Introduction
Semantic Web
The Semantic Web is a concept introduced by Sir Tim Berners-Lee. It consists of several components or technologies, such as URI, XML, RDF, RDF Schema, SPARQL, OWL, Unifying Logic, Proof, Trust, and User Interface and Applications, together with various tools that support them. These components are organized in layers and are interlinked.
The Semantic Web is an initiative that aims to enable data from different sources to be combined in a consistent way. It is particularly useful when data sets must be merged whose schemas and terminologies differ between organizations or change over time. Semantic Web technologies have been successfully applied to data integration. Resource Description Framework (RDF) is a simple graph-based data model for representing information on the Web, and SPARQL is the proposed standard for querying RDF; both are components of the layered Semantic Web architecture.[1]
RDF
RDF is used to describe the resources available on the Web and to identify the relationships between them. It is a general-purpose language for representing Web metadata. Its main purpose is to represent the semantics (meaning) of Web metadata and to support reasoning about it.
RDF provides common structures that can be used for interoperable XML data exchange.[3] RDF follows the design principles set out by the W3C: interoperability, extensibility, evolution and decentralization. RDF was designed to have a simple data model with formal semantics and provable inference, and an extensible URI-based vocabulary.[9] This model allows anyone to enquire about any resource. In the RDF data model, data is stored in a universal format: anything that can have a Uniform Resource Identifier (URI) can be described in RDF. RDF data consists of a set of triples of the form (s, p, o), where s is called the subject, p the predicate and o the object of the triple.[5]
Figure: RDF Triple (Subject, Predicate, Object)
Figure: Example RDF data graph [4] (node labels: Rupal, Harsh, 27, 28; edge labels: :frnd, :age)
In the above example, the RDF data is represented as (subject, predicate, object) triples: "Rupal" (subject) has :age (predicate) 27 (object). Similarly, "Rupal" has :frnd (friend) "Harsh", where "Rupal" is the subject, :frnd is the predicate and "Harsh" is the object.
The RDF data for the person "Rupal" in the above example is:
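The triple view of the example graph can be sketched in a few lines of Python. This is only an illustration, not part of any RDF toolkit: the names TRIPLES and describe are hypothetical, and the two triples about "Harsh" are assumptions read off the figure labels rather than statements made in the text.

```python
from collections import defaultdict

# The example graph as a set of (subject, predicate, object) triples.
TRIPLES = {
    ("Rupal", ":age", 27),
    ("Rupal", ":frnd", "Harsh"),
    ("Harsh", ":age", 28),        # assumed from the figure labels
    ("Harsh", ":frnd", "Rupal"),  # assumed from the figure labels
}


def describe(subject, triples):
    """Group the predicate/object pairs attached to one subject."""
    properties = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            properties[p].append(o)
    return dict(properties)


RUPAL = describe("Rupal", TRIPLES)
```

Here RUPAL maps each predicate of "Rupal" to the list of its objects, which mirrors reading the outgoing edges of one node in the graph.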
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
<foaf:Person>
<foaf:name>Rupal</foaf:name>
<foaf:age>27</foaf:age>
<foaf:frnd>Harsh</foaf:frnd>
</foaf:Person>
</rdf:RDF>
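To see how such a document maps back to triples, the following sketch reads the same RDF/XML with Python's standard XML parser. It is not a full RDF/XML parser: it assumes the flat layout used above (typed foaf:Person nodes with literal-valued child elements), and the blank-node labels such as _:person0 are made up for the illustration.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"

# The document from the example above (unused rdfs namespace omitted).
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Rupal</foaf:name>
    <foaf:age>27</foaf:age>
    <foaf:frnd>Harsh</foaf:frnd>
  </foaf:Person>
</rdf:RDF>"""


def person_triples(xml_text):
    """Turn each child element of a foaf:Person node into one triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for index, person in enumerate(root.findall("{%s}Person" % FOAF)):
        subject = "_:person%d" % index  # hypothetical blank-node label
        for prop in person:
            # ElementTree expands prefixes to {namespace}local; shorten again.
            predicate = prop.tag.replace("{%s}" % FOAF, "foaf:")
            triples.append((subject, predicate, prop.text))
    return triples


TRIPLES = person_triples(RDF_XML)
```

Each child element of the person node becomes one triple whose predicate is the element name and whose object is the element text.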
Accessing RDF data with SPARQL: RDF is the data model of the Semantic Web, so a language is required for retrieving meaningful data from RDF. One such language is the SPARQL query language, recommended by the W3C, which fetches data from RDF for applications such as annotation. SPARQL has constructs very similar to those of SQL.[5] The main part of a SPARQL query is the triple pattern, which is based on the graph model.[10] Relational algebra also plays an important role in representing SPARQL queries [6] as well as in optimizing them.
Example of SPARQL
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
        ?x foaf:mbox ?mbox . }
The first line defines a namespace prefix, and the last two lines use that prefix to express the RDF graph pattern to be matched. Identifiers beginning with '?' are variables. The query looks for resources '?x' that take part in triples with the predicates foaf:name and foaf:mbox, and returns the objects of those triples (the name and the mailbox).
The FILTER construct can be used to add constraints on values, in addition to stating the graph pattern to be matched. An example of a string restriction is FILTER regex(?mbox, "company"), which specifies a regular-expression match. Filters can also be applied to integer values; for example, FILTER (?price < 20) requires that '?price' be less than 20.[2]
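How a shared variable ties two triple patterns together can be sketched with a very small matcher in Python. This is a toy illustration of basic graph pattern matching under simplifying assumptions (plain strings as terms, no optional or union patterns), not the algorithm used by any particular SPARQL engine; the data and the names solve and match_pattern are hypothetical.

```python
def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def solve(patterns, triples):
    """Return every binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data: only the first resource has both a name and a mailbox.
DATA = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:name", "Bob"),
]
QUERY = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]
ROWS = [(b["?name"], b["?mbox"]) for b in solve(QUERY, DATA)]
```

Because both patterns use the same variable ?x, a resource contributes a row only when it has both properties, so "Bob" is not returned.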
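The effect of the two filters can be imitated on a list of variable bindings in Python. This is a rough sketch: the bindings are invented sample data, and sparql_regex is a hypothetical helper that only approximates the SPARQL regex function with Python's re.search.

```python
import re


def sparql_regex(value, pattern):
    """Rough analogue of regex(): true if the pattern occurs anywhere."""
    return re.search(pattern, str(value)) is not None


# Hypothetical solutions produced by the graph pattern of a query.
BINDINGS = [
    {"?mbox": "mailto:ann@company.example", "?price": 15},
    {"?mbox": "mailto:bob@home.example", "?price": 25},
    {"?mbox": "mailto:cy@company.example", "?price": 30},
]

# FILTER regex(?mbox, "company")
BY_MBOX = [b for b in BINDINGS if sparql_regex(b["?mbox"], "company")]

# FILTER (?price < 20)
BY_PRICE = [b for b in BINDINGS if b["?price"] < 20]
```

In both cases the filter does not add new matches; it only discards solutions for which the condition is false.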
Construct similarity between SQL and SPARQL: The constructs of SPARQL are modeled on SQL, the standard query language of the relational model, and resemble SQL in both processing and syntax.
Query syntax similarity of SQL and SPARQL:
SQL:
SELECT column_list
FROM relations_list
WHERE conditions
ORDER BY column_name

SPARQL:
SELECT some_variable_list
FROM <some_RDF_source_URL>
WHERE {
  some_triple_pattern .
  another_triple_pattern .
  FILTER (condition)
}
ORDER BY variable_name
| Sno | SQL | SPARQL |
| --- | --- | --- |
| 1 | SQL works on the relational database model. | SPARQL works on the graph-based RDF model. |
| 2 | The relational data model stores data in structured form. | RDF data is stored in semi-structured form, as a graph of triples. |
| 3 | Accesses data from tables. | Accesses data from RDF data files, often serialized as XML, in the form of RDF triples (subject, predicate, object). |
| 4 | A convenient and powerful language, widely used from within other programming languages to fetch data. | Recommended by the W3C for querying RDF data and also for querying Web ontologies, so it is very helpful in various Semantic Web applications. |

Illustrations and Comparison: SPARQL is a query language that looks much like SQL. In the illustrations below, each query is written first in SQL and then in SPARQL, which helps in understanding and comparing the general semantics of SPARQL with those of SQL.
1. SQL example:
SELECT salary
FROM employees
WHERE emp_id = 'E1234'
SPARQL example:
SELECT ?sal
WHERE { emps:E1234 HR:salary ?sal . }

2. SQL example:
SELECT emp_id, salary
FROM employees
SPARQL example:
SELECT ?id ?sal
WHERE { ?id HR:salary ?sal }

3. SQL example:
SELECT hire_date
FROM employees
WHERE salary >= 2000
SPARQL example:
SELECT ?hdate
WHERE { ?id HR:salary ?sal .
        ?id HR:hire_date ?hdate .
        FILTER (?sal >= 2000) }

4. SQL example:
SELECT v.hire_date
FROM emp_vars AS v, emp_consts AS c
WHERE v.salary >= 12345
AND v.emp_id = c.emp_id
SPARQL example:
SELECT ?hdate
WHERE { ?id HR:salary ?sal .
        ?id HR:hire_date ?hdate .
        FILTER (?sal >= 12345) }

(Here the two prefixes, emps: and HR:, stand for the URIs of the different RDF data sources; their PREFIX declarations are not shown.)
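The correspondence in example 3 can be checked on a small data set with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the three employee rows are invented, the triple side is a hand-written imitation of the SPARQL pattern rather than a real SPARQL engine, and an ordering is added on both sides only to make the two results directly comparable.

```python
import sqlite3

# Hypothetical employee rows: (emp_id, salary, hire_date).
ROWS = [
    ("E1", 1500, "2001-03-01"),
    ("E2", 2500, "2004-07-15"),
    ("E3", 2000, "2009-01-10"),
]


def hire_dates_sql(rows, threshold):
    """Relational side: SELECT hire_date FROM employees WHERE salary >= ?"""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE employees (emp_id TEXT, salary INTEGER, hire_date TEXT)"
    )
    con.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    cursor = con.execute(
        "SELECT hire_date FROM employees WHERE salary >= ? ORDER BY hire_date",
        (threshold,),
    )
    result = [row[0] for row in cursor]
    con.close()
    return result


def to_triples(rows):
    """Each table row becomes one triple per non-key column."""
    triples = []
    for emp_id, salary, hire_date in rows:
        triples.append((emp_id, "HR:salary", salary))
        triples.append((emp_id, "HR:hire_date", hire_date))
    return triples


def hire_dates_triples(triples, threshold):
    """Graph side: ?id HR:salary ?sal . ?id HR:hire_date ?hdate . FILTER"""
    # ?id HR:salary ?sal with FILTER (?sal >= threshold)
    ids = {s for s, p, o in triples if p == "HR:salary" and o >= threshold}
    # ?id HR:hire_date ?hdate, joined on the shared variable ?id
    return sorted(o for s, p, o in triples if p == "HR:hire_date" and s in ids)


SQL_RESULT = hire_dates_sql(ROWS, 2000)
TRIPLE_RESULT = hire_dates_triples(to_triples(ROWS), 2000)
```

A single-table WHERE clause in SQL turns into a join over the shared variable ?id on the graph side, because every column value is stored as a separate triple.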
SPARQL Query Execution Tools:
Various tools are available for executing SPARQL queries on RDF data in the Semantic Web. TWINKLE, the ARQ processor for JENA, and REDLAND are some of the tools available as open source. In this paper we analyze the "TWINKLE" and "JENA with ARQ" tools by executing SPARQL queries on them.
A preliminary requirement for executing queries with these tools is an RDF data file. JENA with the ARQ processor is simply a command-line interface, whereas TWINKLE is a GUI that wraps the ARQ SPARQL query engine.
Query Execution through Jena with ARQ processor:
Download jena2.6.4 and unzip the archive.
Step1- To execute SPARQL through JENA, you need jdk1.5 or a higher version of Java.
Step2- Set an environment variable named JENAROOT to the path of the jena2.6.4 folder.
Step3- Open the command prompt and change to the bat folder inside the Jena-2.6.4 folder.
Example :- d:\Jena-2.6.4\bat>
Step4- Execute the SPARQL query with the sparql.bat file, using the following syntax:
d:\Jena-2.6.4\bat>sparql.bat --data <RDF file path> --query <SPARQL query file path>
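When the query has to be launched from a script rather than typed at the prompt, the command line from Step4 can be assembled as follows. This is a minimal sketch that only builds the argument list using the two options shown above; the helper name build_arq_command and the file name query.rq are hypothetical, and the installation path is the one from the example.

```python
import ntpath  # Windows path rules, matching the d:\ paths used above


def build_arq_command(env, data_file, query_file):
    """Build the sparql.bat call from Step4 as an argument list."""
    jena_root = env["JENAROOT"]  # environment variable set in Step2
    script = ntpath.join(jena_root, "bat", "sparql.bat")
    return [script, "--data", data_file, "--query", query_file]


COMMAND = build_arq_command(
    {"JENAROOT": r"d:\Jena-2.6.4"},  # path taken from the example
    "foaf.rdf",
    "query.rq",  # hypothetical query file name
)
```

On a machine where Jena is installed as described, os.environ can be passed as the first argument and the resulting list handed to subprocess.run.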
Query Execution through Twinkle tool:
Download Twinkle2.0 and unzip the archive.
Step1- To start the Twinkle tool, you need jdk1.5 or a higher version of Java.
Step2- Open the command prompt and change to the twinkle folder.
Example :- d:\sparql\twinkle-2.0-bin>
Step3- Execute the following command:
d:\sparql\twinkle-2.0-bin>java -jar twinkle.jar
A GUI window titled "Twinkle:SPARQL Tools" will open, in which SPARQL queries can be executed.
Sample SPARQL query on the foaf.rdf data
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?mbox_sha1sum
WHERE {
?x rdf:type foaf:Person .
?x foaf:mbox_sha1sum ?mbox_sha1sum .
?x foaf:name ?name
}
order by ?name
Output of the query :-
Another SPARQL example:
The SPARQL query below selects the people who have a "gmail" account and the character "R" somewhere in their name.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox_sha1sum
WHERE {
?x rdf:type foaf:Person .
?x foaf:mbox_sha1sum ?mbox_sha1sum .
?x foaf:name ?name .
FILTER regex( ?mbox_sha1sum, "@gmail")
FILTER regex( ?name,"R")
}
order by ?name
output:-
Using ASK:-
ASK gives a Boolean answer to the question "Do you have any matches?". For example, an ASK query for resources whose name is "S.K.Malik" returns yes if there is any match with that name, and no otherwise.
Analysis:- Executing SPARQL queries with the Twinkle tool is more convenient than with the Jena ARQ processor. The Twinkle tool has the following features, which are either absent from Jena ARQ or, where they exist, complicated to use.
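The behaviour of ASK can be illustrated with a short Python sketch over a list of triples. The data and the helper name ask are hypothetical, and the function handles only a single pattern with a fixed predicate and object, which is enough for the name lookup described above.

```python
def ask(triples, predicate, obj):
    """True if at least one triple has the given predicate and object."""
    return any(p == predicate and o == obj for _, p, o in triples)


# Hypothetical FOAF-style data.
PEOPLE = [
    ("_:a", "foaf:name", "S.K.Malik"),
    ("_:b", "foaf:name", "Rupal"),
]

FOUND = ask(PEOPLE, "foaf:name", "S.K.Malik")
NOT_FOUND = ask(PEOPLE, "foaf:name", "Nobody")
```

Unlike SELECT, nothing about the matching resources is returned; the result only states whether a match exists.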
Features
A graphical user interface that is very user friendly.
SPARQL queries can easily be loaded, edited and saved.
PREFIX statements can easily be added to queries.
Custom namespaces can be configured so that they can be inserted quickly into queries.
Long-running queries can be cancelled.
Results can be saved to a file for further reference.
Results can be displayed both as text and as a table.
Both local files and remote RDF documents can be queried.
RDF data held in relational databases can be queried.
Online SPARQL endpoints such as govtrack, dbpedia and revyu.com can be queried.
Queries can use standard SPARQL or the ARQ extended syntax, which supports COUNT etc.
ARQ property and extension functions can be used.[7]
Inferencing can be applied, for example with Jena rules, RDF Schema or OWL ontologies.
Commonly used data sources can be configured for quick access.
Conclusion and Future Work:- This paper presented a brief comparison of SPARQL with SQL, along with various tools for executing SPARQL. It also showed the prerequisites and steps for executing SPARQL queries with these tools. It should be helpful for beginners in the Semantic Web who want to study SPARQL, which can be learned easily through a comparative study with SQL constructs. The paper also described the advantages of the Twinkle tool over other tools, along with the complete steps for executing SPARQL. Because SPARQL accesses RDF data, which is the data representation of the Web, the data files involved can be very large. Future work could examine methodologies for optimizing SPARQL queries. As almost everything is now easily accessed through the Web, the volume of data on the Web is heavy, and optimization is a basic requirement for efficiency.