Abstract. In this paper, a novel strategy for eliminating redundancy in the data dissemination process is proposed. A cluster based infrastructure is considered where the entire geographical area of interest is divided into grid based clusters having a representative member in each of the cluster. The output from the sensors is made to pass through a context aware system to ensure the validity of the sensor data. Redundancy in the above valid data can be of two types - Intra cluster and Inter Cluster. Based on the type of redundancy, redundancy elimination is performed at header nodes or at intermediate nodes thus facilitating efficient transmission of data from source node to access node.
1 Introduction
In wireless sensor networks, sensors are employed in the area of interest to gather data or to monitor the environment. Sensors can be deployed in a random fashion or can be placed in strategic locations. The data collected by each node is forwarded to the sink (destination) through a process known as "Data Dissemination". However the sensors in wireless sensor networks are battery powered due to which it has limited lifetime. So the data dissemination policy has to be framed in such a way that the energy consumption is optimal.
2 Related Work
The entire geographical area is divided into grid based clusters[1]. Each cluster has a header node which acts as a representative node of that cluster (Fig 1). The header node is responsible for the location update of the mobile sink groups. The header node in each cluster is elected using backward timer. The access node (sink) queries the source node for data. The data dissemination process from source node to access node (sink) is exploited using header to header forwarding[2].
Whenever an event occurs in the monitoring environment, the sensor node detects it and forwards the relevant data to the sink node for processing and to take the necessary actions[3][4]. Eq.,1 below shows the relationship between sensor lifetime and battery power as, lifetime is directly proportional to the battery power remaining.
Fig 1. Cluster based virtual infrastructure
Lifetime of a α Battery power of the
sensor node sensor node
Eq.,1
Battery power is consumed when data is transmitted. Hence by Eq.2, more the amount of data transmitted, more the power consumed. More the power consumed, less the life time of the sensor nodes.
Amount of data α Amount of power transmitted consumed
Eq.,2
So the data dissemination process has to be efficient in order to reduce the amount of data transmitted and increase the lifetime of the sensor nodes.
3 Proposed Work
In the above virtual infrastructure when a member sink queries two or more source node for data there is a high probability of data redundancy in data dissemination process[8]. Hence if redundant data are to be transmitted again and again, the amount of data transmitted increases which ultimately results in high consumption of power and bandwidth.
Types of redundancy in the above infrastructure
Intra-Cluster redundancy: When the source nodes lie in the same cluster.
Inter-Cluster redundancy: When the source nodes lie in adjacent or neighboring clusters.
Due to the above condition the lifetime of the sensor node decreases reducing the reliability of the network.
Redundancy elimination
The main objective of the project is to increase the lifetime of the sensor by framing an efficient data dissemination policy which involves identifying redundant data and eliminating the same before the data reaches the sink node.
3.2.1 Eliminating Intra-Cluster redundancy
When the source nodes lie in the same cluster the data to be transmitted is compressed at the source nodes. The compressed data is then forwarded to the header node where redundancy elimination takes place (Fig 2). Then the data dissemination is brought about by header to header forwarding until the data reaches the access node (Sink).
Fig 2. Intra- Cluster Redundancy Elimination
Eliminating Inter-Cluster Redundancy
When the source nodes lie in adjacent clusters the shortest path from header node to access node for each source node is found. These paths tend to converge at a header node present in some other cluster. Elimination of redundant data is performed at this header node and compression at their respective source header (Fig 3).
Fig 3. Inter-Cluster Redundancy Elimination
3.3. Module description
The proposed architecture of Redundancy Eliminated Data Dissemination consists of the following modules,
Cluster Formation
Dynamic Topology Management
Header Node Election
Data Compression
Context Aware System
Finding Shortest Node Converging Path
Redundancy Elimination
3.4. Architecture of REDD
Fig 4. System Architecture
Each node is deployed in the area of interest to observe the environment around and must belong to a cluster defined by its geographic axes. The nodes that fall within a particular cluster form a group under a representative node called Header Node. This is done by Cluster Formation and Header Election modules. Since the sensor nodes are mobile, the position of each sensor node keeps on changing. So the sensor node associates itself with a particular cluster from time to time with the help of Dynamic Topology Management module.
The job of detecting events nearby is done by the sensors at the physical level. These events are then combined together and stored locally. The sink node queries for data of interest from the source nodes through the Querying module. The query is then forwarded to the access nodes by Header to Header forwarding [9]. For this forwarding, the Shortest Path and the Shortest Converging Path are found by the sink. Based on the destination of the sinks, the corresponding path is used. While replying, data is compressed at the cluster head. The compressed output from the sensor is made to pass through a Context Aware System to ensure reliability of the sensor data. Redundancy in the valid data is eliminated either at the header node or at some intermediate junction node which is mentioned in the query by the sink node. The redundancy eliminated reply from the access nodes is then sent to the sink by Header to Header forwarding in the same way as the query.
On receiving the data, the Data Processing module of the sink node processes the data and performs the necessary action. Finally, the performance of the REDD protocol is analyzed graphically and the results are tabulated
4 Implementation Issues
4.1 Algorithm: Cluster Formation
Form_Cluster(Node temp)
xp  getxPosition();
yp  getyPosition();
RFIDnode  getRFID(xp , yp);
xc  ( xp / Gsize ) * Gsize + Gsize /2 ;
yc  ( yp / Gsize ) * Gsize + Gsize /2 ;
RFIDheader  getRFID(xc , yc);
if (RFIDheader == NULL)
setHeader(RFIDnode);
else
sendMsg(RFIDheader);
wait();
if(receiveMsg())
setHeader(RFIDheader);
joinMember(RFIDnode);
endif
endif
end
Note:
1. When a node is created it contains the following information
RFID - Unique identity of the node
Location information (X,Y co-ordinates)
2. To find the grid to which the node belongs following formula is used
xc  ( xp / Gsize ) * Gsize + Gsize /2 ;
yc  ( yp / Gsize ) * Gsize + Gsize /2 ;
where xc, yc are the co-ordinates of header and Gsize is the grid size
The above algorithm gets the position of the node from the GPS device, calculates the grid center and joins the grid header if any. If there is no header yet in the grid, it becomes the header node.
Algorithm: Location Update (Dynamic Topology Management)
Update_Location(Node node)
xold  getxPosition();
yold  getyPosition();
RFIDnode  getRFID(xold , yold);
xoc  ( xold / Gsize ) * Gsize + Gsize /2 ;
yoc  ( yold / Gsize ) * Gsize + Gsize /2 ;
RFIDcurrentheader  getRFID(xoc , yoc);
xnew  setxPosition();
ynew  setyPosition();
xnc  ( xnew / Gsize ) * Gsize + Gsize /2 ;
10. ync  ( ynew / Gsize ) * Gsize + Gsize /2 ;
11. RFIDnewheader  getRFID(xnc , ync);
12. if (RFIDnewheader == RFIDcurrentheader )
13. return;
14. else if (RFIDnewheader != RFIDcurrentheader)
15. sendMsg(RFIDnewheader)
16. wait();
if(receiveMsg())
setHeader(RFIDnewheader);
joinMember(RFIDnode);
endif
21. endif
end
Note:
1. If a node moves within in its grid there is no change in topology.
2. If a member node of one grid moves to another it registers itself with the header of the new grid.
Whenever a node moves, it calculates its new grid center and checks with the old center. If they are not the same, it joins under the new header node and unregisters from the old header. If it is a header itself it initiates election process in the old grid.
Algorithm: Header Node Election
Elect_Header()
Initialize node[i]battery  100 for all I;
if (i%5==0)
GRID(i/5)/Gsize,(i/5)%Gsize)headernode[i/5*5];
end if
if (onBatteryChange())
if(Cur_HeaderBattery<100&&Cur_HeaderBattery >50)
continue;
else
Old_Header  Cur_Header;
Cur_Header  (Cur_Header.Members[] ïƒ next);
Cur_Header.Members[].Join  Old_Header;
end if
end if
end
If the current header has moved out of the grid or if its battery power has reduced below the threshold, it starts the election process and notifies the first node among its members. That node, if it has enough power, becomes the new header and advertises its election as the new header to other member nodes.
4.4 Algorithm: Sensor Data Forwarding
Forward(Graph, Source)
for each vertex v in Graph:
dist[v] := infinity ;
previous[v] := undefined ;
end for ;
dist[source] := 0 ;
Q := the set of all nodes in Graph ;
while Q is not empty:
u := vertex in Q with smallest dist[] ;
if dist[u] = infinity:
break ;
endif ;
remove u from Q ;
for each neighbor v of u:
alt := dist[u] + dist_between(u, v) ;
if alt < dist[v]:
dist[v] := alt ;
previous[v] := u ;
endif ;
end for ;
end while ;
return dist[] ;
end
When a node wants to send data to some other node, it first calculates the shortest path to the destination graphically and then forwards the data to the next header node in the path along with information regarding the path by Header to Header forwarding [9]. The next header node on receiving the data forwards it to the next down lower in the order and so on.
Thus, all the above algorithms are used in the implementation of the REDD architecture.
Context Aware System
In wireless sensor networks, the reliability is ensured by making use of redundant data. But this is facilitated at the cost of excessive power consumption. Since we are going to remove the redundancy, we have to ensure that the unique data has to be a reliable one [11][13]. To ensure that we use a context aware system as shown in Fig 5. The context aware architecture consists of the following modules
Sensor Output - Contains raw data about temperature and pressure in byte stream.
Context Interpreter - Extracts the context (temperature | Pressure ) from the raw sensor data.
Context Retriever - Retrieves the context that has to be compared with a set of rules.
Rule Engine - Provides the set of rules as described in Table 1.
Fig 5. System Architecture
S.No
Condition
Action
1
Temperature < UTmin
Discard
2
Temperature > UTmax
Discard
3
UTmin < Temperature < UTmax
Allow
4
Pressure < UPmin
Discard
5
Pressure > UPmax
Discard
6
UPmin < Pressure < UPmax
Allow
Table 1. Initial Set of Rules
UTmin & UTmax - Minimum and Maximum allowable temperatures
UPmin & UPmax - Minimum and Maximum allowable Pressure
Java context awareness framework (JCAF) [10] is used to implement the context aware system. The compressed output from the sensor data is given as input to this module. The temperature and pressure context are extracted from the compressed output and then it is checked for validity against the rules defined in the rule engine. If the context is valid the data is forwarded to redundancy elimination module else it is suppressed.
Redundancy Elimination
One technique used to decrease the number of redundant messages transmitted and thus pro-long the network lifetime is data aggregation. REDD protocol's redundancy elimination algorithm is based on a single value called the correlation coefficient to represent the whole set of readings recorded by all the nodes in the sensor field.
The value of the correlation coefficient (H) ranges from 1 to 10. H = 1 is for strong correlation of data and as the correlation coefficient increases, the degree of correlation between data decreases. The following flowchart(Fig 5) describes the above procedure
Fig 6. Redundancy Elimination
5 Performance Analysis
The initial network deployment consists of 3x3 grids with 5 nodes in each grid respectively. The battery level of each sensor is initially set to maximum (100%). The transmission cost is equal to one unit per bit. The following table shows the specifications of simulation scenario
No of transmissions
No of valid transmissions
No of invalid transmissions
No of redundant transmissions
No of unique transmissions
100
95
5
2
93
200
193
7
6
187
300
287
13
14
273
400
375
25
27
348
500
468
32
36
432
Table 2. Simulation Scenario
The following graph (Fig 7) depicts the relationship between the number of total transmissions and number of valid transmissions. The number of invalid transmissions i.e those that doesn't satisfy the rules specified by the rule engine.
Fig 7. Total Transmissions vs Valid Transmissions
It can be seen that the valid transmissions follow a linear pattern as the number of total transmissions increase.
We can also define the number of redundant transmissions in valid transmissions. This is shown in the following graph (Fig 8).
Fig 8. Valid Transmissions vs Redundant Transmissions
We can also figure out that the number of redundant transmissions also follow a linear path as the number of valid transmissions increase. This shows that the amount of power wasted due to redundant transmissions increase as the number of valid transmissions increase.
Now considering valid transmissions vs unique transmissions we can see that the amount of power saved by eliminating redundant transmissions increases as the number of the valid transmission increases. This can be figured out from the following graph (Fig 9)
Fig 9. Valid Transmissions vs Unique Transmissions
6 Conclusion and Future Work
Thus, the REDD helps in optimizing the data dissemination policy by compressing and eliminating the redundant sensor data there by reducing the amount of data transmitted and amount of power consumed. This results in increased lifetime of the sensor nodes and improves the reliability of sensor network.