Diagnostic Agent Based Inter-Process Communication Aware Monitoring System for Wireless Sensor Networks

Process failures are caused by underlying errors and faults in the various layers of the WSN (Wireless Sensor Network) communication protocol stack. Efficient and effective monitoring systems for fault detection and diagnosis are therefore imperative for fault tolerance and robust operation of WSNs, so that critical application requirements for reliability and throughput are met. Existing detection-diagnosis regimens are either centralized or distributed, and network monitoring is performed either passively or actively. This work presents a diagnostic agent based inter-process communication aware monitoring system for WSNs. The diagnostic agent actively performs probe-based process execution tracking and examines the effects of errors, omissions and channel misbehavior on process execution at the node, link and network levels to implement failure detection and fault diagnosis. Diagnosis is performed by inferring the inter-process communication of stacked and peer layer processes on the sender and receiver sides. The monitoring system has been implemented in the Castalia simulator for WSNs. A local diagnostic agent is implemented on sensor nodes for self-monitoring, and network-wide fault diagnosis is performed by a global diagnostic agent on the cluster head. Simulation results show that the system performs robust root cause analysis of critical process failures due to errors in stacked and peer layer processes. The decentralized distribution of diagnostic load across sensor nodes and the cluster head produces less communication overhead and is energy efficient.


INTRODUCTION
develop protocols and techniques for robust and reliable operation. Autonomous deployment of WSNs in unattended and hostile environments results in a high frequency of failures due to underlying errors and faults [1]. A fault is an erroneous state of a hardware or software component. Such faults manifest as errors. An error characterizes an incorrect system state that may lead to failure, causing deviations from normal system behavior. Generally, WSN monitoring is carried out either actively or passively [2]. In active monitoring, a debugging agent on each sensor node periodically collects node status updates and transmits them to the sink for fault detection and diagnosis [3]. Moreover, the impact of vertical and horizontal propagation of a process failure on interrelated processes has not been examined in detail. This work presents a diagnostic agent based inter-process communication aware monitoring system for WSNs that actively performs probe-based analysis of process execution. The rest of the paper is organized as follows.
In Section 2, we review existing network monitoring schemes for fault diagnosis in WSNs. Section 3 presents the proposed system architecture and its operation in detail. In Section 4, the simulation details and the results obtained in the Castalia simulator [5] are discussed. Section 5 concludes with a discussion of the research findings.

MONITORING SYSTEM
The monitoring system performs probe-based investigation of anomalous behavior in WSN communication protocol stack processes. The system identifies typical processes that run on the protocol stack.

For practical considerations, AODV (Ad hoc On-Demand Distance Vector Routing) [17] and IEEE 802.15.4 MAC and PHY layer processes [18] have been selected. The process flows of these protocols serve as the foundation for the system architecture.

System Architecture
The system defines LDA (Local Diagnostic Agent) for node self-diagnosis as shown in Fig

Failure Detector
The LDA periodically sends a marker probe to collect markers by traversing the corresponding PESS, as shown in Fig. 2.
The failure detector module parses the probe results to decode error markers that may represent process failures. It accumulates the error count and generates PECs
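The error-marker accumulation described above can be illustrated with a minimal sketch. The marker format (a pair of process name and status), the process names, and the function names are assumptions made for illustration only; they are not taken from the implementation in Castalia.

```python
from collections import Counter

# Hypothetical marker format: (process_name, status), status is "ok" or "error".
def count_error_markers(probe_result):
    """Return per-process error counts for a single probe traversal."""
    counts = Counter()
    for process, status in probe_result:
        if status == "error":
            counts[process] += 1
    return counts

def accumulate(totals, probe_result):
    """Add the error counts of one probe result to the running totals."""
    totals.update(count_error_markers(probe_result))
    return totals

# Two illustrative probe results (process names are hypothetical).
probe_1 = [("rreq_processing", "error"), ("mac_tx", "ok")]
probe_2 = [("rreq_processing", "error"), ("mac_tx", "error")]

totals = Counter()
for probe in (probe_1, probe_2):
    accumulate(totals, probe)
```

A counter keyed by process name keeps the accumulated error count available per process, which is the kind of input a later correlation step would need.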

Stacked Process Correlation Model
Inter-process correlations for stacked processes are based  If critical process failure is successfully diagnosed locally, a fault report containing primary root causes is generated.
Otherwise, the failure cause may be external. External causes are unobservable on this node; accordingly, the situation is treated as a peer layer process failure. In this scenario, the partial diagnosis results are stored in the fault report and sent to the CH for in-depth investigation. For each peer routing process failure, similar reports are generated and transmitted to the CH.
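The local-versus-external decision described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `FaultReport` fields, the dictionary-based correlation table, and the process names are hypothetical, and the actual stacked process correlation model may differ.

```python
from dataclasses import dataclass, field

@dataclass
class FaultReport:
    node_id: int
    failed_process: str
    root_causes: list = field(default_factory=list)
    partial: bool = False  # True when no local root cause was found

def diagnose_locally(node_id, failed_process, local_errors, correlations):
    """Look for root causes among the processes the failed process depends on.

    correlations: hypothetical map from a process to its stacked dependencies.
    local_errors: accumulated error counts observed on this node.
    """
    causes = [p for p in correlations.get(failed_process, [])
              if local_errors.get(p, 0) > 0]
    return FaultReport(node_id, failed_process, causes, partial=not causes)

def route_report(report, send_to_ch):
    """Send the report to the CH only if the cause appears to be external."""
    if report.partial:
        send_to_ch(report)
        return "sent_to_ch"
    return "resolved_locally"

# Illustrative usage with hypothetical process names.
correlations = {"route_discovery": ["rreq_processing", "mac_tx"]}
sent = []
local = diagnose_locally(1, "route_discovery", {"rreq_processing": 2}, correlations)
external = diagnose_locally(1, "route_discovery", {"rreq_processing": 0}, correlations)
outcomes = [route_report(local, sent.append), route_report(external, sent.append)]
```

Only the report without a locally observable cause reaches the CH, which mirrors the on-demand reporting behavior described in the text.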

Global Diagnostic Agent
The global diagnostic agent on the CH contains the Report Parser and Fault Diagnosis modules. The Report Parser module collects incoming fault reports after each probe interval.
The Fault Diagnosis module performs root cause analysis of critical process failures according to the peer process correlation model.
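One possible way to organize the per-interval collection of fault reports is sketched below. The class layout, the tuple-based report format, and the grouping by failed process are assumptions for illustration; the text does not specify how the Report Parser hands reports to the Fault Diagnosis module.

```python
from collections import defaultdict

class ReportParser:
    """Buffers fault reports and releases them once per probe interval."""

    def __init__(self):
        self._pending = []

    def receive(self, node_id, failed_process, partial_causes):
        """Store one incoming fault report until the interval ends."""
        self._pending.append((node_id, failed_process, tuple(partial_causes)))

    def flush(self):
        """Group the buffered reports by failed process and clear the buffer."""
        grouped = defaultdict(list)
        for node_id, failed_process, causes in self._pending:
            grouped[failed_process].append((node_id, causes))
        self._pending = []
        return dict(grouped)

# Illustrative usage with hypothetical node IDs and process names.
parser = ReportParser()
parser.receive(3, "route_discovery", [])
parser.receive(7, "route_discovery", ["rreq_processing"])
parser.receive(7, "data_forwarding", [])
batch = parser.flush()  # called at the end of a probe interval
```

Grouping by failed process lets a diagnosis step see which nodes reported the same peer layer failure within one interval.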

Peer Process Correlation Model
The inter-process communication of peer layer processes is represented in the form of to/from calls. Inter-process correlations are therefore extracted from these calls, and a peer process correlation model is proposed as shown in

Performance Evaluation of LDA
The probe interval has been varied for performance evaluation of LDA. The outputs of failure detector and priority tests modules are examined. The impact of stacked and peer layer processes on critical failure is investigated to evaluate inter-process communication aware fault diagnosis.

Impact of Probe Interval
The     Therefore, on unidirectional links, RREQ processing fails due to a blacklisted RREQ source error. Subsequently, the RREQ processing error is also inferred as a potential root cause of routing failure (Fig. 11).

Discussion and Comparison
The additional diagnosis traffic overhead produced by the proposed monitoring system is compared with that of Sympathy [6], which is designed to collect all necessary node metrics at the sink for root cause analysis. In the proposed monitoring system, however, diagnosis communication with the CH takes place only if the LDA deduces an external failure source. Because wireless communication consumes more energy than computation, this decentralized distribution of the diagnostic workload is energy efficient and generates less overhead. In comparison, Sympathy [6] produces 30% overall diagnosis overhead due to periodic transmission of node metrics even when no network exception occurs. ECDFs (empirical cumulative distribution functions) are used to compare overhead. In Fig. 13, the ratio of diagnosis traffic to overall network traffic is shown on the x-axis and the ECDF on the y-axis. The proposed system, with varying probe intervals, is compared against different metric periods of Sympathy. As shown in Fig. 13, the proposed system significantly outperforms Sympathy, because the LDA transmits fault reports to the CH only when needed, which reduces diagnosis communication overhead.
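For reference, the ECDF of the overhead ratio can be computed as in the following sketch: for a set of n samples, the ECDF at x is the fraction of samples less than or equal to x. The function names and the sample values are illustrative assumptions and are not simulation results from this work.

```python
from bisect import bisect_right

def overhead_ratio(diagnosis_bytes, total_bytes):
    """Ratio of diagnosis traffic to overall network traffic."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return diagnosis_bytes / total_bytes

def ecdf(samples):
    """Return (x, F(x)) pairs, where F(x) is the fraction of samples <= x."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]

# Illustrative per-node ratios (made-up values, not results from the paper).
ratios = [0.1, 0.3, 0.1, 0.2]
points = ecdf(ratios)
```

A curve that rises towards 1 at smaller x-values indicates that most nodes spend a smaller share of their traffic on diagnosis.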

CONCLUSION
This work presents a diagnostic agent based inter-process