Utility of CK Metrics in Predicting Size of Board-Based Software Games

Software size is one of the most important inputs of many software cost and effort estimation models. Early estimation of software size plays an important role at the time of project inception. An accurate estimate of software size is, therefore, crucial for planning, managing, and controlling software development projects, including those that develop software games. However, software size is unavailable during the early phases of software development. This research determines the utility of CK (Chidamber and Kemerer) metrics, a well-known suite of object-oriented metrics, in estimating the size of software applications using the information from their UML (Unified Modeling Language) class diagrams. This work focuses on a small subset dealing with board-based software games. Almost sixty games written in object-oriented programming languages were downloaded from open source repositories, analyzed, and used to calibrate a regression-based size estimation model. Forward stepwise MLR (Multiple Linear Regression) is used for model fitting. The model thus obtained is assessed using a variety of accuracy measures such as MMRE (Mean Magnitude of Relative Error), PRED(x) (Prediction at level x), and MdMRE (Median Magnitude of Relative Error), and validated using K-fold cross validation. The accuracy of this model is also compared with that of an existing model tailored for size estimation of board games. Based on a small subset of desktop games developed in various object-oriented languages, we obtained a model using CK metrics and forward stepwise multiple linear regression with reasonable estimation accuracy as indicated by the value of the coefficient of determination (R² = 0.756). Comparison results indicate that the existing size estimation model outperforms the model derived using CK metrics in terms of accuracy of prediction.


INTRODUCTION
A number of attributes can be used to describe and judge various aspects of software. Some of these attributes (e.g. quality) are external while others (e.g. complexity) are internal. Software size is classified as an internal attribute [1]. Once the software has been developed, its size can be easily measured using any of a number of applications called code counters [2]. The real challenge, however, is estimating the size of software before its development.
Unfortunately, software size is unavailable during the early phases of software development. We therefore need a size estimation model that accurately predicts the size of desktop-based software games.
Estimation of software size is challenging, but the rewards for accurate prediction of size are extremely high. This is primarily because software size is one of the key inputs of many effort and cost estimation models [3]. Some of these models, such as the widely used COCOMO-II (Constructive Cost Model) [4], use both direct size measures (e.g. SLOC (Source Lines of Code) [5]) and indirect functional size measures such as function points [6]. An accurate estimate of either SLOC or the programming-language independent function points is, therefore, crucial in planning and managing software development projects.
Software development projects using the object-oriented paradigm [7] use class diagrams [8] for modeling purposes.
Analysis class diagrams are used to model the problem space while design class diagrams model the solution space [9]. Apart from being used as a tool for improving understanding of the problem and solution spaces, these class diagrams can also be used as a tool for estimating the size of the final software application [10].
Application design information, whether obtained from the class diagrams or extracted by reverse engineering [11] code written in an object-oriented programming language, is a good candidate for being used as an input to a size estimation model. This is primarily because design is the last phase before implementation. Therefore, a wealth of information is available at the end of the design phase.
This research attempts to take advantage of this fact.
In this work, we describe the derivation and validation of a regression-based model developed to predict the size of board-based software games. This model takes as input the design information provided by CK metrics [12], a suite of well-validated and reliable metrics used to evaluate object-oriented design [13], and generates an estimate of software size as output. A dataset comprising around 60 open source board-based software games is used to calibrate this model. Once calibrated, this model is assessed using different accuracy measures and is validated using K-fold cross validation [14]. The accuracy of this model is also compared to that of an existing size estimation model. The next section discusses relevant past work in this area. Section-III presents an overview of the CK metrics and section-IV illustrates their calculation via a small case study. Our research methodology is described in detail in section-V, which also summarizes the results of model assessment and validation and presents a comparison with an existing size estimation model. Threats to validity of this research are discussed in section-VI. Section-VII highlights our major contributions and finally section-VIII proposes some directions for further work in this area.

LITERATURE REVIEW
Researchers have explored different ways of estimating software size using information contained in the class diagrams of object-oriented systems. One of the earliest works in this area was done by Mišic and Tešic [15].

OVERVIEW OF CK METRICS SUITE
Chidamber et al. [22] proposed a set of metrics in 1991 specifically for object-oriented software programs. It was improved in 1994 [12] and is now widely used. The suite includes the following metrics:
Depth of Inheritance Tree: DIT refers to the "depth of inheritance of the class". In case of multiple inheritance, DIT is "the maximum length from the node to the root of the tree". As a class gets deeper in the hierarchy, it inherits more methods and variables.
Number of Children: NOC refers to the "number of immediate subclasses subordinated to a class in a class hierarchy" [18]. Unlike DIT, it measures the breadth of the class hierarchy.
Coupling between Object Classes: CBO for a class is defined as "a count of the number of other classes to which it is coupled". Two classes are said to be coupled "when methods declared in one class use methods or instance variables defined by the other class".
Response for a Class: RFC refers to "a set of methods that can potentially be executed in response to a message received by an object of that class".
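The four definitions above can be made concrete with a minimal sketch. The class model below (Player, HumanPlayer, ComputerPlayer, Board, Game) is hypothetical and is not taken from the games in our dataset. The sketch assumes single inheritance, treats every used member as a method, counts coupling in both directions (a class is coupled to the classes it uses and to the classes that use it), and does not count inheritance by itself as coupling.

```python
# Hypothetical design: class -> parent and, per method, the (class, member)
# pairs that the method uses. All names are illustrative assumptions.
DESIGN = {
    "Player": {"parent": None, "methods": {"choose_move": set()}},
    "HumanPlayer": {"parent": "Player",
                    "methods": {"choose_move": {("Board", "is_empty")}}},
    "ComputerPlayer": {"parent": "Player",
                       "methods": {"choose_move": {("Board", "is_empty"),
                                                   ("Board", "has_winner")}}},
    "Board": {"parent": None,
              "methods": {"is_empty": set(), "place": set(), "has_winner": set()}},
    "Game": {"parent": None,
             "methods": {"play": {("Player", "choose_move"),
                                  ("Board", "place"),
                                  ("Board", "has_winner")}}},
}


def dit(design, name):
    """Depth of Inheritance Tree: number of ancestors up to the root."""
    depth = 0
    while design[name]["parent"] is not None:
        name = design[name]["parent"]
        depth += 1
    return depth


def noc(design, name):
    """Number of Children: count of immediate subclasses."""
    return sum(1 for info in design.values() if info["parent"] == name)


def uses(design, name):
    """Other classes whose members are used by the methods of `name`."""
    return {cls for used in design[name]["methods"].values()
            for cls, _ in used if cls != name}


def cbo(design, name):
    """Coupling between Object Classes, counted in both directions."""
    coupled = set(uses(design, name))
    coupled |= {other for other in design
                if other != name and name in uses(design, other)}
    return len(coupled)


def rfc(design, name):
    """Response for a Class: own methods plus the methods they call."""
    own = {(name, method) for method in design[name]["methods"]}
    called = {pair for used in design[name]["methods"].values() for pair in used}
    return len(own | called)


for cls in DESIGN:
    print(cls, dit(DESIGN, cls), noc(DESIGN, cls), cbo(DESIGN, cls), rfc(DESIGN, cls))
```

In this toy design, Board has the highest CBO because three other classes call its methods, while Player is the only class with children.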

CASE STUDY
Consider a board game called "X and O". The objective of this two-player game is to win by getting three X's or O's in a row. This game is played on a three by three game board. The players take turns in placing X's and O's on the board till one of them wins or the board gets filled.
The rules of the game are listed below:
Game Start:
• Any player can start the game.
Moves:
• The players choose their signs as X or O and then move accordingly.
• A player can place X or O at any empty place of the grid on his turn.
Draw:
• The game is said to be drawn if neither player succeeds in getting three in a row and the board gets filled.
Win:
• The game is won by the player who gets three in a row first.
A software version of this game was developed using Java (row 57 of Table 2

RESEARCH METHODOLOGY
This research extends our earlier work on the size estimation of board-based software games [25,26]. In our earlier work, we defined eight potential predictors to estimate the size of desktop-based board games viz. NRUL

Extraction of CK Metrics
The same dataset (comprising 67 board-based games) that was used for calibrating our previous model [25] was used as a starting point. However, games that were not developed using an object-oriented programming language were pruned from the original dataset. As shown in

Analysis of CK Metrics
First, SLR (Simple Linear Regression) [30] was used to determine the predictive strength of each CK metric individually. Later, forward stepwise MLR [31] was used to obtain our size estimation model, which uses FP as the response variable and the six CK metrics as the predictor variables. Regression analyses were done using the IBM SPSS tool [32].
[Table: dataset excerpt with columns Game, Language, SLOC, GF, FP, LCOM, DIT, CBO, NOC, RFC, and WMC. The recoverable rows are 3D TicTacToe, TicTacToe, and Chess Bin, all written in C#; the cell values are not recoverable.]
Table 4 summarizes the results of applying forward stepwise MLR on the data set. Each row in this table represents an MLR model. A check mark (✓) represents inclusion while a dash (-) indicates exclusion of the corresponding predictor variable. Using different combinations of predictors, a total of six models were obtained in which all predictors were statistically significant at an α-value of 0.05. It is important to note that all six of these models had only two predictor variables. This is because none of the MLR models obtained using more than two predictors had all of its predictors statistically significant (at an α-value of 0.05). The best model (highlighted in bold in Table 4) has an R² value of 0.756. The two predictors it uses are CBO and WMC. The following Equation (1), in which EST_FP (Estimated Function Points) denotes the estimated value of the response variable (FP), formalizes this model:

Model Assessment and Validation
The accuracy of an estimation model must be formally assessed using different accuracy measures. In this research, we have used the two de facto standards, i.e. MMRE [35] and PRED(x) [35][36]. MMRE is defined as the average of all MREs calculated as a result of regression.
On the other hand, PRED(x) is defined as "the percentage of relative error deviation that lies within x". The most commonly used values of x in the literature are 25 and 30, with little difference in results [37]. We use PRED(25) in our work (i.e. x = 25), which is a more stringent metric than PRED(30). According to Conte et al. [37], a model has reasonably good estimation accuracy when MMRE < 0.25 and PRED(25) > 0.75.
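As a minimal sketch of these two measures, the code below computes MMRE and PRED(x) for a handful of made-up actual and predicted sizes; the numbers are illustrative assumptions and are not drawn from our dataset. MRE is taken as |actual - predicted| / actual, and PRED(x) is expressed as a fraction so that it can be compared directly with the 0.75 threshold.

```python
def mre(actual, predicted):
    """Magnitude of Relative Error for a single project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean of the MREs over all projects."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, x=25):
    """Fraction of projects whose MRE is at most x percent."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= x / 100)
    return hits / len(actuals)


def meets_conte_criterion(actuals, predictions):
    """Threshold check as stated in the text: MMRE < 0.25 and PRED(25) > 0.75."""
    return mmre(actuals, predictions) < 0.25 and pred(actuals, predictions, 25) > 0.75


# Made-up sizes in function points, for illustration only.
actual = [100, 200, 50, 80]
predicted = [110, 140, 60, 80]

print(mmre(actual, predicted))
print(pred(actual, predicted, 25))
print(meets_conte_criterion(actual, predicted))
```

With these values, three of the four estimates fall within 25% of the actual size, so the MMRE threshold is met but PRED(25) is not strictly greater than 0.75 and the combined criterion fails.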
Despite the fact that MMRE and PRED(x) are the de facto standards of estimation accuracy, they have been criticized for their limitations [38]. To overcome these limitations, we have also used additional accuracy metrics, i.e. MdMRE, MMER (Mean Magnitude of Error Relative to the estimate), MBRE (Mean of Balanced Relative Error), and MIBRE (Mean of Inverted Balanced Relative Error). The terms Y_i and Ŷ_i in Table 5 denote the actual and the predicted values, respectively, obtained after regression.
K-fold cross validation [14] was used to validate this model. The data set comprising 55 games was divided randomly into 5 subsets (i.e. K = 5) of 11 games each. In each iteration, 4 folds were used to train the model and the remaining fold was used to validate it. Table 6 summarizes the results of this K-fold cross validation exercise. The last row of Table 6 shows the average values of the accuracy metrics. As indicated by these average values, the accuracy of the model is not satisfactory.
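The validation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS analysis used in this work: the data are synthetic, the fitted model is a single-predictor least-squares line instead of the two-predictor model, and the additional metrics follow their commonly used definitions (MMER divides the absolute error by the estimate, MBRE by the smaller of actual and estimate, MIBRE by the larger), which we assume match Table 5.

```python
import random
from statistics import mean, median


def accuracy_metrics(actuals, predictions):
    """MMRE, MdMRE, MMER, MBRE and MIBRE for paired actual/predicted values."""
    pairs = list(zip(actuals, predictions))
    mre = [abs(y - y_hat) / y for y, y_hat in pairs]
    return {
        "MMRE": mean(mre),
        "MdMRE": median(mre),
        "MMER": mean(abs(y - y_hat) / y_hat for y, y_hat in pairs),
        "MBRE": mean(abs(y - y_hat) / min(y, y_hat) for y, y_hat in pairs),
        "MIBRE": mean(abs(y - y_hat) / max(y, y_hat) for y, y_hat in pairs),
    }


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (one predictor, for illustration)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def cross_validate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and repeat for every fold."""
    folds = k_fold_indices(len(xs), k, seed)
    results = []
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [a + b * xs[i] for i in held_out]
        results.append(accuracy_metrics([ys[i] for i in held_out], predictions))
    return results


# Synthetic data: 55 made-up "games" with one predictor and a noisy size.
rng = random.Random(42)
xs = [rng.uniform(5, 50) for _ in range(55)]
ys = [20 + 3 * x + rng.gauss(0, 5) for x in xs]

per_fold = cross_validate(xs, ys, k=5)
averages = {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}
print(averages)
```

With 55 observations and K = 5, each fold holds 11 observations, mirroring the split described above; the averaged dictionary plays the role of the last row of a table of per-fold results.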

Comparison of Results
In this subsection, the accuracy of the design-based size estimation model obtained using CK metrics is compared with that of an early size estimation model proposed in one of our earlier works [25]. The early size estimation model uses four inputs (i.e. number of game rules, number of players, animation, and miscellaneous game options) and produces an estimate of FP as output. The model built using CK metrics has much lower accuracy than the early size estimation model. The early size estimation model has higher values of R² and PRED(25) and lower values of MMRE and the other error-related means.
A comparison of the results of K-fold cross validation shown in Table 8 corroborates this conclusion. The early size estimation model clearly outperforms the design-based size estimation model built using CK metrics. This may appear counterintuitive since more information is available at the time of design. It must be kept in mind, however, that these two models use different predictors. It is quite possible that the predictors of the early size estimation model (for instance, number of rules) capture crucial information about the requirements that is missed by the CK metrics.

THREATS TO VALIDITY
A couple of factors may have an impact on the validity of our results. The first of these is related to the comparison of the size estimation model obtained using CK metrics and the size estimation model proposed earlier. The earlier size estimation model was calibrated using 65 open source board-based games. Since not all of these games were developed using object-oriented languages, the exact same dataset could not be used for building a size estimation model based on the CK metrics. A slightly smaller subset comprising only games developed using object-oriented programming languages was used instead.

CONCLUSION
This paper has presented the results of an investigation into the utility of CK metrics for predicting the size of board-based software games. The prediction accuracy of the model derived from these metrics was also compared to that of an existing size estimation model developed specifically for estimating the size of board-based games.
The existing model relies on information available at the time of inception while CK metrics, on the other hand, are available only after the completion of detailed design.
Therefore, even though one may expect the model using CK metrics to be more accurate, comparison results indicate that the existing model outperforms the model based on CK metrics with respect to prediction accuracy.
This unexpected behavior is, perhaps, the result of not capturing crucial game requirements such as the number of rules of the game.

FUTURE WORK
This work can be extended in a variety of ways. So far, we have focused on a small subset dealing with just board-based desktop games. Other types of games (e.g. card-based) may also be considered. Similarly, games made for other platforms (e.g. mobile) may be explored. Thirdly, the utility of other suites of object-oriented metrics may be investigated. Last but not least, a hybrid model using a combination of object-oriented design-based software metrics and requirements-based metrics may be built to get the best of both worlds.