Tuning COCOMO-II for Software Process Improvement: A Tool Based Approach

To compete in the international software development market, software organizations have to adopt internationally accepted software practices, i.e. standards such as ISO (International Organization for Standardization) or CMMI (Capability Maturity Model Integration), in spite of having scarce resources and tools. The aim of this study is to develop a tool that presents software development companies with an actual picture of the benefits of Software Process Improvement. Although a few tools are available to assist in making predictions, they are too expensive and do not cover datasets that reflect the cultural behavior of software development organizations in developing countries. Extending our previous research, reported elsewhere, which quantified the benefits of SDPI (Software Development Process Improvement) for Pakistani software development organizations, this research uses sixty-two datasets from three different software development organizations, collected against the set of metrics used in COCOMO-II (Constructive Cost Model 2000). It derives a verifiable equation for calculating the ISF (Ideal Scale Factor) and tunes the COCOMO-II model to provide prediction capability for the SDPI benefit measurement classes ESCP (Effort, Schedule, Cost, and Productivity). This research contributes to the software industry a reliable and low-cost mechanism for generating prediction models with high prediction accuracy. We hope that software organizations will use this tool not only to predict ESCP but also to predict the impact of SDPI.

There exists a strong relation between a process and its outcome, i.e. the product. It has been observed that the majority of organizations do not follow a systematic approach because resources are scarce and they are required to deliver quality products to the market within short time spans. Due to the lack of process infrastructure, [...] studies have also been conducted to confirm the link between higher CMM levels and higher product quality. Thus, improving software product quality and software productivity requires an understanding of the capability of the software development process, i.e. the maturity of the company's software development process [3], which is described by different models. These models include CMMI and ISO/IEC 15504 (a standard for software process assessment agreed by ISO and the International Electrotechnical Commission). ISO/IEC 15504 is an international standard for determining process capability, and CMMI is the most popular model that is compliant with the ISO/IEC 15504 standard [4]. The benefits of these approaches are not always self-evident and need more industry-based studies [5]. Recently, the authors conducted research [6] in this regard, in which benefits were empirically identified for the Pakistani working environment.
One major problem that the software industry currently faces in SDPI adoption is the shortage of proper tools [7][8][9]. Some models and tools are available that could be used, directly or indirectly, for benefit measurement and prediction, such as COSTAR, SEER, SLIM, CostXpert, KnowledgePlan, and Software Risk Master [10]. However, none of these tools is cost effective or reflects the cultural or environmental behavior of developing countries. The developing countries are involved in outsourcing with the world's technologically strong countries, but they lack sufficient evidence to gain client trust, and the effects of their adopted SDPI are debatable.
Therefore, this research addresses the need for a persuasive cost and benefit justification for SDPI adoption in the form of a tool. The main contribution of this study to the body of knowledge is a prediction model for the SDPI benefit measurement classes ESCP. This tool could be used by multiple organizations to predict the impact of different SDPI levels on ESCP values. The model falls under the category of PPM (Process Performance Model) [11] and could also be used for measuring and predicting an organization's business performance [12]. Fig. 1 presents the overall research flow and shows that ESCP-related data were collected from organizations that have been appraised at different SDPI levels. A questionnaire was designed, and sixty-two data sets were collected from three different CMMI-appraised organizations against the set of metrics used in COCOMO-II [13]. The COCOMO-II Post Architecture model is tuned to provide prediction capability for the SDPI "benefit measurement classes" Effort, Schedule, Cost, and Productivity using ISF (Ideal Scale Factor) analysis. Model prediction accuracy was checked using the MMRE (Mean Magnitude of Relative Error) and PRED (30) [6].
The paper is organized as follows: Section 1 covers the introduction; Section 2 discusses the literature review, including problems in model selection; Section 3 discusses the research methodology, i.e. the hypotheses statement, the research questions and their investigation, the data collection procedure, and the questionnaire analysis. Data analysis (ISF analysis) is given in Section 4. Section 5 covers threats to validity, whereas Section 6 covers the discussion of results (model accuracy with ISF results). Sections 7 and 8 cover the conclusion and future work.

LITERATURE REVIEW
Different SDPI approaches are in use by the software industry to improve software quality, productivity and software development capability using process assessment-based approaches [14]. Using different assessment approaches, the process capability and maturity of an

Prediction Model Selection
Different models are used by the industry for cost-benefit measurement and prediction. Through the literature review, similarities were identified between different software development prediction models [16][17] and SDPI benefit measurement classes such as ESCP, so that these similarities could be leveraged in SDPI benefit prediction models.
A survey-based study [18] on selecting a suitable model for research purposes highlights that, although plenty of commercial tools were available, the COCOMO-II model has been the primary attraction because its internal equations and parameter values are fully available. Furthermore, another published study has concluded that the COCOMO-II Post Architecture model is the most accurate among the different COCOMO models, with estimates falling within 30% of the actual values 70% of the time, i.e. PRED (30%) = 70% [19].

Selected Prediction Model and CMMI's Benefit Prediction
During the literature review, an industrial study [20] where $E = 0.91 + 0.01 \times \sum_{j=1}^{5} SF_j$. The values of the SF (Scale Factors) in the equation are adjusted, which causes an exponential variation in a project's effort or productivity. The EM (Effort Multipliers) are the project's controllable knobs, which represent high-payoff areas to emphasize in a software productivity improvement activity.
The rating scales for both SF and EM range over discrete levels from Very Low to Extra High.
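As a minimal sketch of how the exponent and the multipliers combine, the following Python snippet computes effort under the assumption that the standard COCOMO II.2000 Post Architecture form PM = A × Size^E × ∏EM applies, with A = 2.94 taken from the published COCOMO II.2000 calibration (the text above only states the constant 0.91). The function name `cocomo2_effort` and the 10 KSLOC example project are illustrative and are not taken from this study's data.

```python
import math

# Assumed COCOMO II.2000 calibration constants; only B = 0.91 appears in the text.
A = 2.94
B = 0.91


def cocomo2_effort(ksloc, scale_factors, effort_multipliers):
    """Estimate effort in person-months: PM = A * Size^E * product(EM)."""
    exponent = B + 0.01 * sum(scale_factors)  # E = 0.91 + 0.01 * sum of SF
    return A * ksloc ** exponent * math.prod(effort_multipliers)


# Published COCOMO II.2000 Nominal values for PREC, FLEX, RESL, TEAM, PMAT.
nominal_sf = [3.72, 3.04, 4.24, 3.29, 4.68]
nominal_em = [1.0] * 17  # all 17 effort multipliers at Nominal

# Hypothetical 10 KSLOC project, once with Nominal PMAT and once with
# Very Low PMAT (7.80); every other input is held constant.
effort_nominal = cocomo2_effort(10.0, nominal_sf, nominal_em)
effort_low_pmat = cocomo2_effort(10.0, nominal_sf[:4] + [7.80], nominal_em)

print(f"Nominal PMAT:  {effort_nominal:.1f} PM")
print(f"Very Low PMAT: {effort_low_pmat:.1f} PM")
```

Because the scale factors sit in the exponent, the same change in PMAT has a proportionally larger effect on larger projects, whereas an effort multiplier scales the estimate by the same ratio regardless of size.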

Prediction Model Tuning
Numerous studies exist on the calibration of COCOMO-II [21][22][23][24]. One study [22] describes the calibration of the COCOMO-II Post Architecture model using a Bayesian approach and claims significantly better results than the multiple regression approach. Another reference [25] discusses the IEM (Ideal Effort Multiplier) method. It notes that the simplest way to analyze the correlation between the effect of the COCOMO effort multipliers and the actual data from software projects is to plot project productivity against the cost driver on a graph, but that the results may not be very clear because the effects of other cost drivers get mixed in with the effort multiplier under study. For example, some of the projects in the COCOMO database with very low required reliability had relatively low productivity because they were performed with very low-rated analysts and programmers and with very low use of modern programming practices. Therefore, to get a clearer picture of a cost driver's impact on development productivity, the confounding effects of the other cost driver attributes need to be eliminated as much as possible. According to [25], the best way to normalize out these other effects is to compute a quantity called the IEM for each project and cost driver combination.
Using it, we can get a clearer assessment of the cost driver's effect on the project and compare that effect with the COCOMO multiplier for the cost driver.
The results of this analysis might show a strong correlation between the COCOMO effort multipliers (the white circles) and the projects' IEM for the selected attribute, as evidenced by the median values of the project data (the arrows) for each attribute rating. The correlation might not be perfect, but it gives reasonable confidence that the COCOMO effort multipliers are of approximately the right magnitude and move in the right direction as a function of the cost driver attribute rating. The results for any cost driver attribute could look something like those shown in Fig. 2.
That is, all the data points (black circles) would lie within the white circles.
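The comparison described above, median IEM per rating level against the model's multiplier, can be sketched in a few lines of Python. The (rating, IEM) pairs below are invented purely for illustration and are not data from this study or from [25]; the reference multipliers are the published COCOMO II.2000 values for the RELY cost driver, and the helper name `median_iem_by_rating` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Published COCOMO II.2000 multipliers for the RELY cost driver.
rely_multipliers = {"Low": 0.92, "Nominal": 1.00, "High": 1.10}

# Hypothetical (rating, IEM) pairs, invented for illustration only.
project_iems = [
    ("Low", 0.95), ("Low", 0.88), ("Low", 0.99),
    ("Nominal", 1.02), ("Nominal", 0.97), ("Nominal", 1.05),
    ("High", 1.12), ("High", 1.08),
]


def median_iem_by_rating(pairs):
    """Group IEM values by rating level and return the median of each group."""
    grouped = defaultdict(list)
    for rating, iem in pairs:
        grouped[rating].append(iem)
    return {rating: median(values) for rating, values in grouped.items()}


medians = median_iem_by_rating(project_iems)
for rating, value in medians.items():
    gap = value - rely_multipliers[rating]
    print(f"{rating:8s} median IEM = {value:.2f}, model EM = "
          f"{rely_multipliers[rating]:.2f}, gap = {gap:+.2f}")
```

Small gaps with a consistent direction across rating levels would correspond to the situation sketched in Fig. 2; large or erratic gaps would suggest that the multiplier needs recalibration for the local data.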
Few years back, a team of researchers in a study [26]

Research Question and Hypotheses
The primary RQ (Research Questions) of this research are as follows: (

Questionnaire and Data Collection
For data collection, a questionnaire was designed to collect primary research data from multiple organizations. The "phone interview" was used as the method of collecting data.
It is considered high in terms of respondent motivation and low in interviewer bias, as discussed in reference [27]. Each copy of the questionnaire was applicable to an individual organization, which reported multiple data sets. The data were collected over a 6-month period in early 2011. Survey participation requests were sent by email. A total of 24 questionnaires were delivered to respondents by email, of which 4 were returned, yielding a response rate of 16%. One questionnaire was eliminated due to missing data. Three were analyzed, which made the response rate (80%). The sample consisted of 62 data sets collected from 3 organizations, which is comparable to the data sets used in studies [16][28][29][30] and is reasonably large compared with study [26].
[Table residue: definition of a metric labelled "Total Actual", described as the sum of the "Working Hours" of the persons who worked on their assigned software engineering activities during the project life cycle, e.g. 1. Requirements Analyst, 2. Software Developer.]

COCOMO-II Cost Drivers
The prediction variables usually used in estimation models represent the benefit measurement classes ESCP through which SDPI benefits are measured. Furthermore, 22 cost drivers were recorded, consisting of five scaling factors and 17 effort multipliers. These qualitative cost drivers have a direct impact on software development projects; therefore, it was necessary to record them for generating process improvement benefit predictions.

Interpretation of SDPI Levels or PMAT Scale Factor
The increasing levels of software development process maturity are represented by the qualitative variable PMAT (Process Maturity) rating, or SDPI level. Table 4 indicates the adopted interpretations of the different SDPI measurement levels.

Demographics
The demographic data, i.e. the size, structure, and distribution of the respondent population, show that the available experienced respondents belonged to the engineering and management disciplines (Table 5), which indicates that they have a good understanding of development practices and were capable of providing a qualified assessment.
The respondents were Quality Managers with approximately 10 years of experience in the process improvement field, which strengthens the reliability of the collected data.

Evaluation of Prediction Accuracy
As discussed in research studies [19][21], the major prediction accuracy measures used in this research include the following:
• The PRED measure gives the percentage of predictions whose MREi lies within a variance level of L%: if K of the N predictions have an MREi within L%, then PRED (L%) = (K/N)*100.
• It involves breaking the data sets into separate organization-wise groups such that the overall prediction accuracy within a group improves compared to the mixed data set.
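The accuracy measures named in this paper, MMRE and PRED (L%), can be sketched as follows. The snippet assumes the conventional definition of the magnitude of relative error, MRE = |actual − estimated| / actual, which is not spelled out in the text above; the effort values are made up for illustration and the function names are hypothetical.

```python
def mre(actual, estimated):
    """Magnitude of relative error for one project (conventional definition)."""
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, level=0.30):
    """PRED(L%) = (K / N) * 100, where K counts projects with MRE <= L."""
    k = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return 100.0 * k / len(actuals)


# Hypothetical actual and estimated efforts in person-months (illustration only).
actual_pm = [100.0, 50.0, 80.0, 40.0]
estimated_pm = [110.0, 70.0, 60.0, 41.0]

print(f"MMRE     = {mmre(actual_pm, estimated_pm):.3f}")
print(f"PRED(30) = {pred(actual_pm, estimated_pm):.0f}%")
```

A lower MMRE and a higher PRED (30) both indicate better prediction accuracy; computing them separately per organization is one way to apply the grouping described in the second bullet.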

RESULTS AND DATA ANALYSIS
An Excel spreadsheet was used for the major calculations and data analysis in this research: (a) For each data set, ESCP was calculated using the equations given in Section 4.1 (

Basic Calculations
While performing the data analysis, the following calculations were made in sequence. Tables 6-7 show the data collected against the COCOMO Scaling Factors and Effort Multipliers.

ISF Analysis
IEM is defined as follows: for a project P, calculate the estimated effort while excluding the effort multiplier of the cost driver (CD) being analyzed; the IEM is the actual effort divided by this estimate:

$$IEM(P, CD) = \frac{PM_{actual}(P)}{PM_{estimated}(P, \text{excluding } CD)} \qquad (2)$$

Following the verification rule given in reference [25], i.e. "If an Estimation model is perfect, the IEM for each project (P) would be equal to the corresponding COCOMO effort multiplier", an IEM formula is needed such that, if the COCOMO-generated estimate is taken as the actual effort value, the generated IEM value is exactly the corresponding COCOMO effort multiplier value. To this end, we first rearranged Equation 1 to solve for any single Effort Multiplier. This rearrangement results in exactly the formula given as Equation (2).
Before going any further, we verified it by taking an available project's estimated effort from the COCOMO-II Post Architecture model as the actual effort and dividing it by the estimate calculated while excluding the RELY Effort Multiplier, for a project with a "Very Low" RELY rating. The outcome was 0.82, which is the same as the COCOMO Effort Multiplier value.
Equation (2) was then checked for calculating the ideal values of Scaling Factors. For this verification, the project's estimated effort from the COCOMO-II Post Architecture model was taken as the actual effort and divided by the estimate calculated while excluding the PMAT Scaling Factor, for a project with a Very Low PMAT rating. For CMMI level 0 projects, the formula should give exactly the PMAT scaling factor's beta coefficient value, i.e. 7.80; instead, it gave a value of 1.27. Table 9 shows this verification and the value originally calculated using Equation (2). Because these results were unverifiable, we solved Equation 1 for the scaling factor value to obtain a formula that gives verifiable results; see Equation (3) given below.
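The verification logic can be illustrated with a short Python sketch. It assumes that Equation 1 is the standard COCOMO II.2000 Post Architecture effort equation, PM = A × Size^(0.91 + 0.01 × ΣSF) × ∏EM with A = 2.94. Under that assumption, dividing the actual effort by the estimate that excludes one scale factor leaves Size^(0.01 × SF), so solving for the scale factor gives ISF = 100 × ln(PM_actual / PM_estimated-without-SF) / ln(Size). This is a derivation from the standard equation and not necessarily the exact form of the authors' Equation (3); the 25 KSLOC project and all function names are hypothetical.

```python
import math

A, B = 2.94, 0.91  # assumed COCOMO II.2000 calibration constants


def effort(ksloc, scale_factors, effort_multipliers):
    """PM = A * Size^(B + 0.01 * sum(SF)) * product(EM)."""
    exponent = B + 0.01 * sum(scale_factors.values())
    return A * ksloc ** exponent * math.prod(effort_multipliers.values())


def without(mapping, key):
    """Copy of a driver dictionary with one driver removed."""
    return {k: v for k, v in mapping.items() if k != key}


def ideal_effort_multiplier(actual_pm, ksloc, sfs, ems, driver):
    """Actual effort divided by the estimate that excludes one multiplier."""
    return actual_pm / effort(ksloc, sfs, without(ems, driver))


def ideal_scale_factor(actual_pm, ksloc, sfs, ems, factor):
    """Solve the effort equation for one scale factor (ksloc must not be 1)."""
    ratio = actual_pm / effort(ksloc, without(sfs, factor), ems)
    return 100.0 * math.log(ratio) / math.log(ksloc)


# Hypothetical project: Very Low PMAT (7.80) and Very Low RELY (0.82);
# the other 16 effort multipliers are Nominal (1.0) and therefore omitted.
sfs = {"PREC": 3.72, "FLEX": 3.04, "RESL": 4.24, "TEAM": 3.29, "PMAT": 7.80}
ems = {"RELY": 0.82}
size = 25.0

# Treat the model's own estimate as the "actual" effort, as in the text.
estimate = effort(size, sfs, ems)

iem_rely = ideal_effort_multiplier(estimate, size, sfs, ems, "RELY")
plain_ratio = estimate / effort(size, without(sfs, "PMAT"), ems)
isf_pmat = ideal_scale_factor(estimate, size, sfs, ems, "PMAT")

print(f"IEM for RELY:         {iem_rely:.2f}")
print(f"Plain ratio for PMAT: {plain_ratio:.2f}")
print(f"ISF for PMAT:         {isf_pmat:.2f}")
```

The plain ratio works for effort multipliers because they enter the equation as factors, but it cannot recover a scale factor, which enters through the exponent and therefore depends on project size; taking logarithms removes that dependence.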

Model Prediction Accuracy
Among the model prediction accuracy checking methods discussed in research [19], Fig. 5 represents the inter-model comparisons graphically.
In Fig. 5 the horizontal x-axis represents 5 projects of CMMI level organizations, and the vertical y-axis represents the Person Month measure as well as the KLOC value scale. There are five bars for each project: starting from the left, the 1st bar represents the KLOC size of the project, the 2nd bar the actual reported PM, the 3rd bar the PM estimated by the COCOMO PA model, the 4th bar the PM estimated from the ISF-PMAT values derived in the Malaysian study [26], and the 5th bar the PM estimated from the ISF-PMAT value calculated in this research. The graphs show that the effort estimated from the ISF-PMAT value of this research is closer to the actual effort reported in the collected data set of this research. All tabular and graphical data in Section 4 support acceptance of the research question raised in the earlier section. Table 18 shows better prediction accuracy for this research's model than for the original COCOMO model. It answers research question RQ1 and shows that this research's model gives more accurate predictions of ESCP values than the original COCOMO model, which makes it suitable for use in the Pakistani software industry. If we focus on Effort, which is the major factor in calculating the other factors (schedule, cost, and productivity), Table 19 helps in answering RQ2 and shows that the prediction accuracy of the COCOMO model tuned using the IEM method is not better than that of the COCOMO model tuned using the ISF method of this research.

THREATS TO RESEARCH VALIDITY
This section discusses the major threats to the validity of this study, as recommended in the research notes by [34]. First of all, the majority of organizations in the software industry have not maintained a very detailed historical data repository; therefore, some of the metrics for which we collected data are expressed as percentages of some quantitatively selected data. Expert opinion was used for data collection from senior QA (Quality Assurance) personnel who were involved at the time the software was developed. It was observed in the targeted companies that, once they go through the whole process of SPI, their software development and process-related metric data is maintained and managed by the SQA (Software Quality Assurance) department, although they have Quality Managers. Their SQA managers, however, are more responsible for data-related issues. On the recommendation of the QA Managers, the SQA Managers were therefore contacted to report project-level details.
Furthermore, the data collected in the questionnaire was not verified against its original sources, as it was collected from the historical data repository. The data was collected from the 3 most efficient organizations of Pakistan, which can limit the generalization of this research, but these companies are representative of those in the Pakistani software industry that have successfully achieved higher SDPI levels. Although the number of companies taking part in the case studies is low, the number is perhaps sufficient to draw general conclusions. Another threat is that researchers cannot draw a general conclusion based solely on the results of this study, because the data ranges of the variables are limited and the projects were only related to MIS and business applications.
This restricts the generalization of this research to other application domains. As it was not possible for the collected sample to cover the whole range of data values, it is not realistic to assume that the results will always generalize outside the settings in which the study was conducted. To include other project types and sizes, this study has to be replicated for more projects of different sizes implemented at different SDPI levels.

RESULTS AND DISCUSSION
Section 4 presented the interpretation of the benefit results in the three participating organizations, with data from sixty-two projects. Here we discuss the meaning and impact of the results, including the comments of the senior quality assurance personnel at the data collection sites on unusual results.
Project data was collected for past projects that were developed during the validity period of their company's specific SDPI certification. Therefore, the SDPI level was taken as the maturity level of the company and not of the project itself. It was not useful to study projects that had already participated in appraisals. Projects that are part of an appraisal have to show aggregate benefits, especially when appraising ML 4 and 5. Experts have observed that, to achieve appraisal ratings, many companies produce fabricated records at the higher levels.
In this research we have devised a mechanism for producing a prediction model that is not only proven reliable among the research and professional communities but also has very good prediction accuracy. The COCOMO-II Post Architecture model was tuned with the ISF (Ideal Scale Factor) method to reflect the impact of CMMI-based Process Maturity (PMAT) on software development project data, and the tuned model is then taken as a benefit prediction model for CMMI. One major reason for adopting this method was the non-normal behavior of the collected dataset; otherwise, we could also have built a new regression-based model on the collected data.
The results of this study show that the mean-variance obtained from this research's "New model", with the updated coefficient value for the CMMI-based Process Maturity variable PMAT, is not the same as the mean-variance obtained with the earlier values of Barry Boehm's Constructive Cost Model (COCOMO-II) for CMM-based process maturity, and that the new model yields low variance. It was found that with increasing levels of SDPI, not only does Effort (PM) decrease, but the schedule and productivity values also decrease, whereas the project unit cost increases. This study also shows that, in the case of "Effort", during the early years of SDPI adoption the adaptation of processes increases with every increasing SDPI level, which consumes more time, effort, and manpower. Another study [35] reports that, in many cases of software project appraisals, creating evidence for the appraisal in order to justify a specific goal takes much effort and adds no value. The discussion with a senior QA expert at a data collection site indicated that this increase reduces within a few years of implementing SDPI. This observation is purely the perception of the QA manager and has not been empirically validated.
Fig. 5 shows that this research's model gives closer estimates of the actual effort than the other two models. This estimation accuracy holds for all projects from CMMI level 0 through level 5. However, the overall effort estimation trend differs from that of the COCOMO and Malaysian models: this research shows low effort consumption in the early stages of CMMI organizations, which gradually increases until maturity level 3 and then drops until it reaches ML-5, whereas the other two models produce high effort estimates at the early levels of CMMI-appraised organizations and low effort at the highest maturity levels.
One major reason for this different behavior of effort consumption is the unavailability of proper and low-cost process improvement tools to Pakistani software development companies [7][8][9]. The TOOL effort multiplier settings (see Table 2 for details) for projects at the various maturity levels also show Low to Nominal ratings for all ML projects. Only one organization could afford Very High to Extra High TOOL ratings at maturity level 3. A close look at the Process Areas of CMMI versus those of the old CMM framework makes two aspects quite visible: (1) CMMI level 2 has only one additional process area, whereas ML-3 has around 7 additional process areas to follow, and levels 4 and 5 have more or less the same process areas; (2) these process areas require added tool support and, as discussed above, the organizations hardly go beyond the use of basic front-end, back-end, and CASE tools with only minor integration. These two aspects explain the different effort consumption until ML 3 and the large difference in magnitude between the original COCOMO model's PMAT value, which is for the old CMM framework used in the international market, and this research's ISF-PMAT value, which is for the Pakistani working environment. Our previous study [6] has also shown that, although the rework value decreases by about seven percent with increasing SDPI levels, which should have decreased the overall effort, the effort shows no decrease, because more and more time is consumed in gold plating QA documents without the help of genuine process management tools.

CONCLUSIONS
A project data of sixty-two dataset was used to enable

FUTURE WORK
For future work in the area of SDPI prediction model development, more data need to be collected from CMMI organizations, and local data sets need to be applied to tune or derive new PMAT rating levels for better prediction accuracy.