Integrated Traceability Approach for an Effective Impact Analysis

Change is inevitable: software undergoes continuous change during its life cycle. A small change can trigger extensive evolution because of its ripple effect, which is identified during impact analysis. Impact analysis, however, depends on traceability information, that is, the connections between software development artifacts. Current traceability techniques lack the breadth and depth needed to carry out informative impact analysis. We performed a detailed literature survey of traceability techniques published from 2008 to 2018 and evaluated them against the criteria for effective impact analysis present in the literature. The results show that no single technique fulfills the criteria alone, but that techniques can be combined to achieve promising results. After careful evaluation, we present a hybrid approach that combines four traceability techniques to satisfy the entire criteria for effective impact analysis: Information Retrieval, Pre-Requirement Specification Traceability, Value-Based Requirements Traceability and Goal-Centric Traceability. The proposed hybrid approach is empirically validated via a field experiment, and the results are analyzed for the time and effort spent in maintaining and retrieving traceability information. The results are promising, as the hybrid approach achieves effective impact analysis with less time and effort than a requirements traceability matrix. We plan to extend the validation to real-world impact analysis situations via a case study.


INTRODUCTION
The success of software depends on meeting the needs of the stakeholders. These needs manifest in the form of software requirements. The Requirement Engineering (RE) process identifies and manages all the stakeholders' requirements. A proper RE process helps remove the risks of incomplete requirements and lack of stakeholder involvement during the maintenance phase. Software undergoes continuous change due to unclear user requirements, changing technology, changing laws and so on, but incorporating changes without observing their effects on other parts of the system and its artifacts can lead to poor estimates, rework, delay and even project failure [3,4]. A change may seem small, yet it may impact critical areas of the system. Therefore, to support better resource estimates and maintenance leading to a quality product, a mechanism called change impact analysis has been introduced to understand modifications and their influence [5]. It is part of the maintenance procedures of software development and is a core activity of requirements change management [6]. It is used to determine the ripple effect of continuously changing requirements during the development and maintenance phases. It determines and estimates the units of the software affected by a proposed change. It also helps to identify dependencies among critical functionality, initiate regression testing and estimate the cost of change.
An effective impact analysis requires careful evaluation of the impact of the proposed change to determine its complexities and consequences. This can only be achieved when traceability information is maintained during software development [6]. Traceability is defined as "the ability to describe and follow the life of a requirement in both forward and backward direction (i.e., from its origins, through its development and specification, to its subsequent deployment and use, and through periods of on-going refinement and iteration in any of these phases)".
The traceability information required for an effective impact analysis is discussed in detail below.
Traceability information is categorized into pre and post traceability, vertical and horizontal traceability, and traceability of functional and non-functional requirements [7,8]. A traceability approach that provides all these kinds of traceability can be used for effective impact analysis. Other important dimensions of traceability are syntactic or semantic traceability and the ability to trace manually, automatically or semi-automatically [9]. Semantic traceability is attained by giving meaningful labels to links, while structural traceability is achieved by arranging links into a hierarchical structure [10,11]. Manual, automatic and semi-automatic traceability represent different levels of tool support.
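The categories above can be illustrated with a small data model. The following is a minimal sketch, assuming that trace links are stored as labelled directed edges; the class `TraceLink`, the function `impacted` and all artifact names are hypothetical and serve only to show forward and backward tracing over labelled links.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceLink:
    source: str   # artifact the link starts from, e.g. a stakeholder need
    target: str   # artifact the link points to, e.g. a requirement
    label: str    # meaningful label attached to the link


def impacted(links, start, direction="forward"):
    """Return every artifact reachable from `start` by following links.

    direction="forward" follows source -> target (towards design and code);
    direction="backward" follows target -> source (towards the origin).
    """
    graph = defaultdict(set)
    for link in links:
        if direction == "forward":
            graph[link.source].add(link.target)
        else:
            graph[link.target].add(link.source)

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)   # the changed artifact itself is not reported
    return seen


# Hypothetical links covering pre-RS and post-RS artifacts.
links = [
    TraceLink("interview-notes", "FR-1", "originates"),
    TraceLink("FR-1", "design-login", "refined-by"),
    TraceLink("design-login", "login.py", "implemented-by"),
    TraceLink("NFR-1", "design-login", "constrains"),
]

print(sorted(impacted(links, "FR-1", "forward")))
print(sorted(impacted(links, "login.py", "backward")))
```

Following the links forward from a requirement gives the artifacts that a change to it may ripple into; following them backward from a code unit gives the requirements and sources that justify it.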
Many requirement traceability techniques and tools are present in literature [12][13][14][15][16][17]. The practical benefits gained from the individual use of these techniques are limited. Industry performs a very limited form of traceability, which is usually supported by requirement management tools. An evaluation of the usefulness of these techniques for effective impact analysis has been performed for the literature up to 2008 [6,8]. We have extended the evaluation to include literature from 2008-2018 and have proposed a hybrid approach to traceability based on the evaluation.

RELATED WORK
A detailed literature survey of hybrid or integrated traceability techniques used for impact analysis was performed. The review highlights that the selection of the traceability techniques is not based on any specific criteria. None of the hybrid approaches provides complete traceability, and none is evaluated for the time and effort required to maintain traceability. The focus of this research is on an integrated approach; therefore, Table 1 presents techniques that combine two or more traceability techniques. PROMESIR is a hybrid approach that combines Latent Semantic Indexing (LSI) and Scenario-Based Probabilistic Ranking (SPR) to locate features in source code via static and dynamic analyses respectively [32]. The reported results indicate that PROMESIR performs better than either technique applied individually. However, the approach only supports functional requirements and can only be applied to source code. The rationale for choosing latent semantic indexing and scenario-based probabilistic ranking is not given. PROMESIR gives fast and accurate results with the help of a computation-intensive technology. PROMESIR has no tool support.
TraCS [33] [35]. IR is a technique used to determine the similarities between texts while JRipples is a tool to determine incremental changes in source code only. This approach is only applied to enhance post requirements traceability of functional requirements. It does not support pre-RS traceability and non-functional requirements.
Three well-known approaches, namely Regular Expression (RE), Key Phrases (KP) and Clustering (CL), are integrated with the Vector Space Model (VSM) and IR to achieve traceability [36]. Used in combination, these approaches have shown positive results and removed some of the limitations of VSM. However, the combination does not provide any support for NFR and pre-RS traceability. A framework (TLFRT) [8] using Pre-RS, Value Based Requirement Traceability (VBRT) and Goal Centric Traceability (GCT) has been proposed for performing pre and post requirements traceability of functional and non-functional requirements. A Requirement Traceability Matrix (RTM) is used for maintaining traceability links, though it suffers from a scalability problem. Techniques were selected on the basis of empirical evidence found in literature. Table 1 presents the pros and cons of the hybrid approaches found in literature. Table 1 is motivation enough for a new hybrid approach.

Table 1. Pros and cons of hybrid approaches found in literature

Trustrace [34]
Pros: Traces functional and non-functional requirements.
Cons: It does not perform pre-requirements traceability. Traceability techniques have been chosen without evaluation.

Traceability Link Graph (TLG) [35]
Pros: This approach was only applied to enhance post-requirements traceability of functional requirements.
Cons: It can be applied to source code only, not documentation. It does not support pre-requirements traceability and non-functional requirements.

VSM+RE+KP+CL [36]
Pros: It removed some of the limitations of VSM compared with applying VSM alone. It provides support for functional requirements.
Cons: It does not provide any support for NFR and pre-requirements traceability.

Three Level Framework for Requirements Traceability (TLFRT) [8]
Pros: It supports functional and non-functional requirements. It performs pre and post requirements traceability.
Cons: RTM is used for maintaining links, which has a scalability problem, and the techniques are not properly evaluated for effective impact analysis.

TECHNIQUES
The traceability techniques have been evaluated individually according to the criteria of impact analysis from multiple perspectives [6]. Each technique has its pros and cons, presented in Table 2. The evaluation helped us in selecting techniques for a hybrid approach. Some of the traceability techniques have already been evaluated according to the criteria for effective impact analysis [6]. We have applied the same criteria to the set of traceability techniques surveyed from 2008-2018. The evaluation highlights the gaps, that is, the attributes which are required for effective impact analysis but missing in the surveyed techniques. We have merged different aspects of these traceability techniques to arrive at a hybrid solution that provides complete impact analysis. The approach is also empirically validated for the time and effort required for saving traceability information. The proposed hybrid approach is found to take less time and effort than each individual technique. It is composed of Pre-RS, IR, VBRT and GCT. The hybrid approach was selected after proper evaluation of all the techniques available in literature for effective impact analysis. The evaluation criteria have been selected carefully to fulfill the criteria for multiple-perspective impact analysis [6]. The criteria were chosen based on their significance for the impact analysis activity during software development. The importance of each parameter of the evaluation criteria is explained in detail in the original study [6]. A detailed evaluation of all the traceability techniques present in literature from 1997-2018 is presented in Table 2. The symbols are to be interpreted as follows: X indicates no support, √ indicates full support and O indicates partial support, whereas the remaining symbol means that support is not specified in literature.
The hybrid approach proposed in this research combines Pre-RS for pre-requirements traceability, IR and VBRT for valuable functional requirements, and GCT for non-functional requirements. Full traceability is therefore achieved by combining different techniques in a manner that saves only valuable traceability links. Identifying the valuable links helps in reducing the time and cost required to save and maintain the traceability information.

HYBRID APPROACH
Requirements traceability is one of the most important prerequisites for complete and effective impact analysis, and complete requirements traceability is one of the emerging trends of the requirements engineering process. The software industry tends to avoid implementing traceability due to many factors, e.g. limited time, high cost, the extra effort required to save traceability information, lack of knowledge, lack of tool support and lack of appropriate training [7][37][38][39]. Many traceability techniques have been introduced in literature, but none of them provides complete requirement traceability for effective impact analysis. We aim to combine existing traceability techniques, instead of creating a new one, to achieve full traceability and effective impact analysis. We have integrated the techniques in a manner that reduces the effort and time required to save and maintain the traceability information. Fig. 1 presents the detailed design of the sequence of traceability techniques in the hybrid approach, which covers end-to-end traceability step by step. The hybrid approach is a semi-automatic approach in which we integrate traceability techniques to fulfill the multiple-perspective impact analysis criteria [6]. It combines different traceability techniques to cover the impact of change on different requirements with less time, cost and effort. Each technique has its pros and cons, so we combine the positive aspects of each technique to fulfill the whole criteria of impact analysis. The hybrid approach is divided into two segments. In the first segment, the functional and non-functional requirements are traced between their source and the SRS document by pre-RS tracing. In the second segment, functional requirements are traced from the SRS using the information retrieval technique, and traceability information of non-functional requirements is extracted with the help of Goal Centric traceability.
The valuable post-traceability links of functional and non-functional requirements are extracted using VBRT and GCT respectively. Finally, we obtain the complete set of traceability links of functional and non-functional requirements.
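The two segments described above can be sketched as a small merging step. This is an illustrative sketch only, assuming that each technique delivers its links as pairs and that stakeholder value is a number per requirement; the function `hybrid_links`, its parameters, the threshold and all sample data are hypothetical and are not taken from the experiment.

```python
def hybrid_links(pre_rs, ir_candidates, gct_links, value, min_value):
    """Merge the link sets of the two segments into one end-to-end set.

    pre_rs        : (origin, requirement) pairs recorded manually (segment 1)
    ir_candidates : (requirement, artifact) pairs suggested by IR (segment 2)
    gct_links     : (nfr, goal) pairs taken from the goal model (segment 2)
    value         : stakeholder-assigned value per functional requirement
    min_value     : links of requirements valued below this are not stored
    """
    # Value-based step: keep only links of sufficiently valuable requirements.
    valuable = [(req, art) for req, art in ir_candidates
                if value.get(req, 0) >= min_value]
    return {
        "pre": list(pre_rs),
        "post_functional": valuable,
        "post_non_functional": list(gct_links),
    }


# Hypothetical sample data.
pre_rs = [("customer interview", "FR-1"),
          ("customer interview", "FR-2"),
          ("security policy", "NFR-1")]
ir_candidates = [("FR-1", "login.py"), ("FR-2", "banner.py")]
gct_links = [("NFR-1", "goal: confidentiality")]
value = {"FR-1": 9, "FR-2": 2}

result = hybrid_links(pre_rs, ir_candidates, gct_links, value, min_value=5)
print(result["post_functional"])
```

In this sketch the low-valued requirement FR-2 keeps its pre-RS link to its origin, but its post-RS link is not stored, which is where the saving in maintenance effort would come from.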

TOOL SUPPORT
Pre-RS is the only technique which supports tracing requirements back to their source. The functional and non-functional requirements were determined manually and stored in an XML file. The documentation was saved in a Microsoft Word file as the SRS (Software Requirement Specification) document. The Pre-RS technique is not supported by any tool.
The documentation of pre-requirements traceability is followed by IR and VBRT for performing post-requirements traceability. IR automatically generates traceability links among different artifacts and reduces effort [8]. Automated generation of traceability links via tool support helps in reducing time compared with manual tracing [40]. VBRT is used to prioritize traceability links, reducing cost, time and effort by tracing valuable links only [12]. For this purpose, a requirements management tool called ReqSimile, which supports traceability, is used. ReqSimile is a general-purpose tool that supports impact analysis and change management during software evolution and also supports the identification of traceability links. The stakeholders manually assign priorities to the requirements and store only the valuable requirements, which reduces time and effort [12]. ReqSimile directly extracts requirements from the SRS document in Microsoft Word format and stores links in a Microsoft Access database file.
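The idea of generating candidate links by textual similarity can be sketched with a plain vector space model. The sketch below assumes TF-IDF weighting with a smoothed inverse document frequency and cosine similarity; it is not the algorithm of ReqSimile, and the function names, the threshold and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} vector."""
    counts = {name: Counter(tokens(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(requirements, artifacts, threshold):
    """Return (requirement, artifact, score) for pairs at or above threshold.

    Requirement and artifact names are assumed to be distinct.
    """
    vectors = tfidf_vectors({**requirements, **artifacts})
    found = []
    for r in requirements:
        for a in artifacts:
            score = cosine(vectors[r], vectors[a])
            if score >= threshold:
                found.append((r, a, score))
    return sorted(found, key=lambda item: -item[2])


# Hypothetical requirement and artifact descriptions.
requirements = {"FR-1": "The user shall log in with a password"}
artifacts = {
    "login.py": "login module verifies user password",
    "report.py": "report module exports pdf report",
}

for req, art, score in candidate_links(requirements, artifacts, threshold=0.1):
    print(req, "->", art, round(score, 2))
```

Pairs above the threshold are only candidates; in a semi-automatic setting an analyst would still confirm or reject them before they are stored.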
Finally, GCT is used for modeling the non-functional requirements identified in the SRS document. The experimental results show that GCT supports and manages the effects of change on NFRs (Non-functional Requirements). GCT is supported by many software tools. StarUML is used for sketching the non-functional requirements, after they have been extracted using IR. Users had to convert the NFRs into goals and further divide them into sub-goals using StarUML.
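The decomposition of NFRs into goals and sub-goals can be pictured as a small hierarchy that is walked upward when an artifact changes. The following is a minimal sketch under that assumption; the goal names, the artifact names and the function `impacted_goals` are hypothetical, and the sketch does not reproduce the StarUML models used in the experiment.

```python
# Hypothetical soft-goal hierarchy: sub-goal -> parent goal.
parent = {
    "encrypt passwords": "confidentiality",
    "lock account after failed logins": "confidentiality",
    "confidentiality": "security",
    "respond within 2 s": "performance",
}

# Hypothetical links from leaf goals to the artifacts that realise them.
realised_by = {
    "encrypt passwords": {"auth.py"},
    "lock account after failed logins": {"auth.py", "session.py"},
    "respond within 2 s": {"cache.py"},
}


def impacted_goals(changed_artifact):
    """Return the goals to re-assess when `changed_artifact` changes."""
    impacted = []
    for goal, artifacts in realised_by.items():
        if changed_artifact in artifacts:
            node = goal
            # Walk up to the root, stopping at goals already collected.
            while node is not None and node not in impacted:
                impacted.append(node)
                node = parent.get(node)
    return impacted


print(impacted_goals("auth.py"))
print(impacted_goals("cache.py"))
```

A change to one code unit thus surfaces not only the directly linked sub-goal but also the higher-level quality goals that depend on it.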

EXPERIMENT
Science and engineering offer different tools, approaches and techniques for process validation. Surveys, simulations, field experiments, controlled experiments and case studies are the most commonly used techniques for validating a process [41]. Field experiments and controlled experiments are commonly used to evaluate processes or techniques. We have chosen a field experiment to validate this research by checking the claims of the hybrid traceability approach. We are also interested in identifying the impact of the proposed hybrid approach in a real-life setting in terms of time and effort. The most common way of maintaining traceability in industry is the requirement traceability matrix (RTM) [8]; therefore, we compare the time and effort taken by the hybrid approach with those of an RTM approach to evaluate its effectiveness.
The experimental process performed to validate the hybrid approach is described in Fig. 2. A brief summary, in the form of a sequence of experimental tasks, is shown in Table 3.

Fig. 2. Overview of Experimental Design
We chose RTM for comparison with our proposed approach because it is commonly used in industry. It is cheap, does not require much training, and can easily be implemented with Microsoft Excel sheets. The alternate hypotheses (H) and the null hypotheses (NH) for the hybrid approach (HA) are given below.

H1: The proposed HA decreases the time used for saving and maintaining traceability compared with RTM while providing complete traceability.
NH1: The proposed HA has no effect on the time used for saving and maintaining traceability compared with RTM while providing complete traceability.
H2: The proposed HA decreases the effort used for saving and maintaining traceability compared with RTM while providing complete traceability.
NH2: The proposed HA has no effect on the effort used for saving and maintaining traceability compared with RTM while providing complete traceability.

Table 3. Experimental tasks

Each group traced non-functional requirements from the SRS document.
6. Save links of non-functional requirements: Non-functional requirements were diagrammatically represented by both groups.
7. Save calculated time: Total time for completing all tasks is measured.
8. Monitor and control change: If the stakeholder prioritization is modified, the sub-task "prioritize functional requirements" is performed again; if the pre-requirements are modified, the whole set of tasks is performed again.
The experimental group consisted of four members who were selected for achieving the experimental goals. The roles of requirement engineer, system developer, project manager and customer were assigned to the four members. Initially, the group was assigned the tasks of achieving the impact analysis criteria using the proposed hybrid approach. The tasks were performed and experimental data collected. Afterwards, the same group performed the experimental tasks for achieving the impact analysis criteria without the hybrid approach, using RTM for saving traceability links of functional and non-functional requirements. Two projects of the same nature and size were chosen and given to the group one by one. Both projects were of medium size, i.e. M1 (a medium-sized project has at least 100 and fewer than 300 function points according to the International Software Benchmarking Standards Group (ISBSG)) [42]. This was done to make sure that the results do not vary because of the size and nature of the project. Data collected during experiment execution was checked qualitatively and quantitatively. To evaluate the impact of the hybrid approach, the experiment measured its effect in a real-life situation, i.e. the difference in time and effort between working with and without the proposed hybrid approach. The experimental tasks performed are presented in sequence in Table 3.

Independent Variable:
The hybrid approach is the independent variable.

Dependent Variables:
The time and effort used to save and maintain traceability information are the dependent variables.

QUANTITATIVE AND QUALITATIVE ANALYSIS
The experimental results were obtained from a group performing the tasks with and without the hybrid approach. Qualitative data was collected through interviews and group discussions on every recorded task. The quantitative analysis of the reported results was based on the differences between the two conditions, using a t-test and cross checking.

T-Test:
The t-test is a statistical test used to determine whether the difference between results obtained under two conditions is real or merely due to fluctuations in the measurements [43].
Two types of t-test are commonly used: the dependent-means and the independent-means t-test. We used the dependent-means t-test (paired t-test) for our data analysis, because the same group performed the same tasks first with the hybrid approach and then without it, using RTM.
The t-test was performed to compare the time and effort needed for executing the tasks for effective impact analysis under the two approaches.
The mean is calculated by taking the average of the values, while the standard deviation measures the dispersion of a set of values [44]. The results of the t-test are given in Table 4. The acceptance or rejection of a hypothesis depends upon the p-value, which is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. α (alpha) represents the significance level, i.e. the accepted probability of rejecting the null hypothesis when it is true. If the p-value is greater than alpha, the null hypothesis is accepted; if the p-value is less than alpha, the null hypothesis is rejected [45]. To check whether the null hypotheses are accepted or rejected, we examine the p-value. For both time and effort the p-value is less than alpha, so our alternate hypotheses (H) are accepted and the null hypotheses (NH) are rejected.
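The paired t-test described above can be sketched with the Python standard library. The sample times below are hypothetical and are NOT the data of the experiment. Since the standard library has no t-distribution, the sketch compares the statistic with a tabulated two-tailed critical value instead of computing the p-value; the two decisions are equivalent at the chosen significance level.

```python
import math
import statistics


def paired_t(with_ha, with_rtm):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [rtm - ha for ha, rtm in zip(with_ha, with_rtm)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)   # sample standard deviation of differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical task times in minutes for eight tasks (not the measured data).
time_ha = [30, 45, 20, 25, 40, 15, 10, 35]
time_rtm = [55, 70, 50, 45, 75, 40, 30, 60]

t, df = paired_t(time_ha, time_rtm)

# Tabulated two-tailed critical value of t for alpha = 0.05 and df = 7.
T_CRITICAL = 2.365
reject_null = abs(t) > T_CRITICAL

print(f"t = {t:.2f}, df = {df}, reject null hypothesis: {reject_null}")
```

Because the hypotheses in this study are directional, a one-tailed test would also be defensible; the two-tailed critical value used here is the more conservative assumption.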

H1: The proposed HA decreases the time used for saving and maintaining traceability compared with RTM while providing complete traceability.
H2: The proposed HA decreases the effort used for saving and maintaining traceability compared with RTM while providing complete traceability.
Figs. 3-4 show a clear distinction between the data recorded from the group performing the tasks with and without HA. The results are summarized as: "There was a major difference in the time to complete tasks by the group using the hybrid approach as compared to the group without using the hybrid approach."

Cross Checking:
The cross checking method was used to quantitatively analyze the retrieved data.

In Figs. 5-6, the results were calculated by comparing the difference in time and effort for completing each experimental task with and without the hybrid approach. These results show that the hybrid traceability approach is effective in covering the criteria for effective impact analysis. Table 5 supports the hypothesis that the hybrid approach saves effort and time while achieving the whole criteria of impact analysis. The hybrid approach saves 3 hours and 15 minutes in covering the criteria of impact analysis, and 13 man-hours of effort. This means that if four members were added to the group without HA, it would perform in the same time and with the same effort as the group with HA. The selected group completed the experimental tasks once using the hybrid approach and once without it. Group discussions were allowed among the members for generating reports after every task. The researchers critically reviewed those reports to compare the results with and without the hybrid approach. The participants found the hybrid approach effective, but recommended the use of a single tool for performing impact analysis, since they had to use different tools to complete the tasks during experiment execution. By contrast, a lot of time is required to manually save and maintain traceability links with RTM; therefore, it is not a good option for impact analysis.
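The two reported savings are consistent with each other under one simple assumption, namely that effort is the elapsed time multiplied by the number of group members. The short calculation below makes that assumption explicit; the variable names are illustrative.

```python
from datetime import timedelta

time_saved = timedelta(hours=3, minutes=15)   # elapsed time saved, as reported
team_size = 4                                 # members in the experimental group

# Assumption: effort = elapsed time x number of people working in parallel.
effort_saved_hours = time_saved.total_seconds() / 3600 * team_size

print(effort_saved_hours)   # 13.0
```

Under this reading, 3.25 hours saved by each of four members corresponds to the 13 man-hours of effort reported.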

CONCLUSION
The hybrid approach enables software development organizations to manage end-to-end traceability with less time and effort. By combining the strengths of the Pre-RS, IR, VBRT and GCT techniques, we obtained better results and fulfilled the impact analysis criteria with less time and effort. The literature study also showed that IR reduces the scalability problem of RTM; however, RTM is inexpensive and feasible for projects in which scalability is not a constraint. The IR technique helped reduce time and effort compared with the other techniques. Time and effort were further optimized with the help of VBRT by discarding unimportant traces and keeping valuable ones. GCT covers non-functional requirements and represents them in the form of soft goals and operationalizations. The process used in the hybrid approach does not require any special expertise for managing traceability information. Different tools were used for evaluating the hybrid approach in a real-life setting. The hybrid approach was analyzed by conducting a field experiment with a group of four individuals.

FUTURE WORK
The hybrid approach has been analyzed and validated by an experiment. It can be statistically validated in more software companies to evaluate its usefulness in real scenarios. Future researchers can implement this approach as a tool that performs fully automatic traceability, since only semi-automatic tool support was used here. The tools used were general purpose, and customized tool support would improve the approach. The approach can also be implemented as a simulator, in which its functions are represented as simulations using MATLAB. This would help companies to understand and implement the approach effectively.

ACKNOWLEDGEMENT
The authors are thankful to International Islamic University, Islamabad, Pakistan, for providing support to conduct this research. The authors also express their gratitude to the participants of the experiment for their continued support of this research.