Testing Procedure for Item Response Probabilities of a 2-Class Latent Model

This paper presents Hotelling's T² as a procedure for testing the significance of the difference between the item response probabilities (ω_ij) of the classes in a Latent Class Model (LCM). A parametric bootstrap technique is used to generate samples for the ω_ij. These samples are based on the estimated parameters of a 2-class latent model. In either situation, the parameters are estimated by the maximum likelihood method via the Expectation Maximization (EM) algorithm. The hypothesis under consideration is whether the response probabilities (ω_ij) are equal for each item in both classes: H0: ω_i1 = ω_i2 against H1: ω_i1 ≠ ω_i2. If the test exhibits a significant difference between the response probabilities of the two classes, it is a clear indication of the presence of a latent variable. We consider both training and testing data sets to develop the test. Before applying the Hotelling T² test, the basic assumptions of normality and homogeneity of variance are checked: a chi-square goodness-of-fit test is used to assess whether a normal distribution fits the hypothesized (bootstrap) samples based on the 2-class latent model parameters for each data set, and Bartlett's test is used to check the heterogeneity of the variances of the ω_ij. Moreover, our procedure produces smaller standard errors of the estimates than those obtained through the package in the R.Gui environment.


INTRODUCTION
Latent variables are hidden factors or underlying concepts that cannot be observed directly; therefore, other variables, called manifest variables, are needed. These manifest (observable) variables then serve as indicators for the latent variable. A well-known method for predicting a latent variable is the LCM. It identifies subgroups with the help of the information provided by the manifest variables. These subgroups are formed under the assumption of local independence, often called conditional independence.
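As a concrete illustration of local independence, the probability of a full response pattern under the model is a prevalence-weighted product of per-item Bernoulli probabilities. The following Python sketch computes such a cell probability (the paper's own code is in R, and the parameter values here are purely hypothetical):

```python
import numpy as np

def lcm_cell_prob(y, theta, omega):
    """Probability of a binary response pattern y under a latent class model.

    theta: (J,) class prevalences; omega: (I, J) item response probabilities.
    Local independence: items are independent given class membership.
    """
    y = np.asarray(y)
    # Within each class, multiply the Bernoulli item probabilities.
    per_class = np.prod(omega**y[:, None] * (1 - omega)**(1 - y[:, None]), axis=0)
    # Mix over classes with the class prevalences.
    return float(np.dot(theta, per_class))

# Hypothetical 2-class, 4-item parameters (for illustration only).
theta = np.array([0.6, 0.4])
omega = np.array([[0.90, 0.10],
                  [0.80, 0.20],
                  [0.85, 0.15],
                  [0.70, 0.30]])
p = lcm_cell_prob([1, 1, 1, 1], theta, omega)
```

Summing this quantity over all 2^4 response patterns recovers 1, which is a quick sanity check that the mixture is a proper probability model.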
Estimation of the model parameters involves an iterative algorithm (i.e., the EM algorithm). We have used the Maximum Likelihood Estimation (MLE) method. The usual model summary statistics in LCM are the log-likelihood ratio test statistic (−2logλ), χ², G², and information criteria (AIC, BIC, etc.). The use of these statistics in deciding how many classes the model should have is subjective. Nylund et al. [9] showed that, in comparison with other information criteria, the accuracy of AIC decreases as the sample size increases, and in many cases it overestimates the number of latent classes.
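For reference, the information criteria mentioned above are simple functions of the maximized log-likelihood. A minimal Python sketch (the log-likelihood and counts below are illustrative, not from the paper's data):

```python
import numpy as np

def lcm_ic(loglik, n_params, n_obs):
    """AIC and BIC for a fitted latent class model.

    For a J-class model with I binary items, the free parameters are
    (J - 1) class prevalences plus I * J item response probabilities.
    """
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    return aic, bic

# A 2-class, 4-item model has (2 - 1) + 4 * 2 = 9 free parameters.
aic, bic = lcm_ic(loglik=-100.0, n_params=9, n_obs=142)
```

Note how the BIC penalty grows with the sample size while the AIC penalty does not, which is one reason their class-number recommendations diverge as n grows.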
The log-likelihood ratio test statistic (−2logλ) is also inappropriate for making such a decision, because the regularity conditions are violated. The χ² and G² statistics are simply a check of how close the expected frequencies are to the observed frequencies, where the expected frequencies are obtained by amalgamating the relative expected frequencies of all the classes present in the model. In addition, the basic assumption of expected frequency ≥ 5 may also be violated if the manifest variables are many and the number of observations is small. This motivates the development of an alternative test through which we can check the adequacy of the model. Anderlucci and Hennig [10] compared an alternative "distance-based clustering" method, suggested by Hennig and Liao [11] for finding the true ("correct") number of classes, to latent class analysis.
In this paper we have used the Hotelling T² test for testing the differences between the item response probabilities (ω_ij) in a 2-class latent model, where the ω_ij are the subsets of parameters within each group/class obtained by dividing a single population (a mixture of two classes) on the basis of the assumption of local independence. There is a basic difference of purpose between the "usual Hotelling T²" statistic and the "special approach to Hotelling T²" we adopted. In the usual Hotelling T², the purpose is to check whether the mean vector of one multivariate normal population is equal to that of another multivariate normal population. Our special approach to Hotelling T², by contrast, involves bootstrapping the estimated parameters of a 2-class latent model to create a large number of surrogate samples. These bootstrap samples serve the purpose of testing whether the conditional item response probabilities of the two classes of a latent variable are significantly different from each other. In other words, our aim is to test whether the Item Response Probabilities (IRPs) in class 1 are significantly different from those of the same item in class 2 of a single latent variable (i.e., ω_i1 ≠ ω_i2). Significantly different IRPs indicate the existence of a latent variable under the assumption of local independence. Thus, the IRPs can be made "recognizable" via this testing procedure; that is, the sets of parameters can be clearly identified for the respective conditional posterior distribution in the log-likelihood function. The statistic is referred to the F distribution, ((n1 + n2 − q − 1)/((n1 + n2 − 2)q)) T² ~ F(q, n1 + n2 − q − 1), compared at level α. Since T² assumes that (ȳ1 − ȳ2) follows a multivariate normal distribution, the data we used were investigated for normality and homogeneity of variance. We restrict our attention to the 2-class model; further investigation of models with more than two classes is beyond the scope of this paper and can be considered as future work.
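The two-sample statistic relied on above can be sketched as follows. This is a generic pooled-covariance implementation in Python, not the authors' R code, with the F conversion stated in the text:

```python
import numpy as np
from scipy import stats

def hotelling_t2(x1, x2):
    """Two-sample Hotelling T^2 with a pooled covariance matrix.

    x1: (n1, q) and x2: (n2, q) samples; returns (T^2, p-value).
    """
    n1, q = x1.shape
    n2, _ = x2.shape
    d = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled sample covariance matrix.
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False) +
                (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(s_pooled, d)
    # Scale T^2 to an F statistic: F(q, n1 + n2 - q - 1).
    f_stat = t2 * (n1 + n2 - q - 1) / ((n1 + n2 - 2) * q)
    p_value = stats.f.sf(f_stat, q, n1 + n2 - q - 1)
    return t2, p_value
```

In the paper's setting, x1 and x2 would be the B bootstrapped vectors of class-1 and class-2 item response probability estimates.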
We expect that the distribution of T² might follow a noncentral F distribution, and that multivariate analysis of variance (MANOVA) may be used as a test of homogeneity of variance, in the case of a model with more than two classes.

Hotelling T² Test Approach for the IRPs (ω_ij) (HAIRP):
We need repeated (estimated) values of the parameters (ω_ij) on which to carry out the test described in this section. For this purpose, we initialized the parameters (we call them "Initial Estimated Parameters (IEPs)") obtained by applying the 2-class latent model to the given data. These IEPs are then used to simulate a hypothesized population, to which the 2-class latent model is fitted. The process is repeated several times (say B times), and the estimated model parameters of each hypothetical population are stored separately.
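The bootstrap-and-refit cycle described above can be sketched as follows. This Python version (the authors' actual code is in R and available on request) uses hypothetical IEP values, a deliberately simplified fixed-iteration EM, and a small B for speed:

```python
import numpy as np

def simulate_lcm(n, theta, omega, rng):
    """Draw n binary response vectors from a latent class model."""
    z = rng.choice(len(theta), size=n, p=theta)            # latent class labels
    return (rng.random((n, omega.shape[0])) < omega[:, z].T).astype(int)

def fit_lcm_em(y, theta0, omega0, n_iter=200):
    """EM for a binary latent class model (minimal sketch, fixed iterations)."""
    theta, omega = theta0.copy(), omega0.copy()
    for _ in range(n_iter):
        # E-step: posterior class membership for each response vector.
        logp = (y @ np.log(omega) + (1 - y) @ np.log(1 - omega)
                + np.log(theta))                           # shape (n, J)
        post = np.exp(logp - logp.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update prevalences and item response probabilities.
        theta = post.mean(axis=0)
        omega = (y.T @ post) / post.sum(axis=0)
        omega = np.clip(omega, 1e-5, 1 - 1e-5)             # boundary guard
    return theta, omega

# Parametric bootstrap: refit the model on B samples drawn from the IEPs.
rng = np.random.default_rng(1)
theta0 = np.array([0.6, 0.4])                              # hypothetical IEPs
omega0 = np.array([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15], [0.70, 0.30]])
boot = []
for _ in range(50):                                        # B = 50 for speed
    y = simulate_lcm(500, theta0, omega0, rng)
    _, om = fit_lcm_em(y, theta0, omega0)
    boot.append(om)
boot = np.array(boot)                                      # (B, I, J) IRP estimates
```

The stacked `boot` array is what the test in the previous section operates on, one (B, I) slice per class.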
The null hypothesis is stated as "no class model" against the alternative that "there exist at least two latent classes"; in other words:

H0: The response probabilities of the items in class 1 are equal to those of the same items in the other class.

H1: The two sets of item response probability vectors in the respective classes are significantly different from each other.

We have created our own program code in R.Gui for bootstrapping and estimating the LCM (the code is available from the authors upon request). Although we also wrote code for the calculation of the Hotelling T² test statistic, we used the R package "Hotelling" version 1.0, developed by James Curran and available on the CRAN repository [13]. It includes functions to calculate a Hotelling T² test statistic and its p-value for the difference between two sets of (multivariate) normally distributed means. The "hotelling.test()" function has an option of whether to use Schäfer and Strimmer's James-Stein shrinkage estimator to estimate large-scale sample covariance matrices or to use a simple pooled sample covariance matrix. By default the function uses the simple pooled covariance matrix, unless shrinkage is set to TRUE for shrinkage estimates of the sample covariance matrix. The package also includes Aitchison's additive log-ratio and centered log-ratio transformations for compositional data [13].

BOOTSTRAPPING RESPONSE PROBABILITIES OF MANIFEST ITEM IN A SPECIFIC CLASS
We used two kinds of data sets. "Classical/training" data have been used several times for applying LCM by different authors and scientists; since the model has already been fitted to these data, IEPs are available. They include Coleman's panel data [14] and the Mastery data [15]. The other kind is "real/test" data, which have been collected or obtained and for which the IEPs must be obtained by applying the LCM. These IEPs are then used as parameters to simulate the hypothesized population a repeated number of times. To each hypothesized population the LCM is applied again to obtain estimates of the ω_ij. The values of these estimates are then stored in a matrix for testing the differences between ω̂_i1 and ω̂_i2, i = 1, 2, 3, 4. Efron and Tibshirani [16] suggested that 50 to 100 replications may be sufficient for standard error and bias estimation, whereas for highly accurate assessments or the calculation of a p-value, 350 replications may be considered on the high side [17,18]. We simulated bootstrap samples of sizes 30, 200, 500, 1000, 1500 and 2000. In the program code the initial values are set to zero, so the first storage space is not filled during the first iteration; the program starts storing the estimates from the second iteration onward. The resultant matrices, each with one value fewer, are therefore of sizes 29, 199, 499, 999, 1499 and 1999.
The estimates of the likelihood function should be based on the global maximum. Bartholomew suggested using different starting points to search for the global maximum, or a maximum close to the global one [19]. Linzer and Lewis, in their paper on "poLCA" (a package in the R software), recommended running the model-fitting command several times to find the global maximum [20]. We took care to use estimates based on the global maximum. The reason is that a solution from one of the local maxima may create certain problems in achieving the true "situation". First, in bootstrapping the samples the algorithm might produce "NaNs"; in other words, some of the bootstrapped samples may become redundant and cannot be included in further analysis. The NaNs were due to the boundary problem, where the solution degenerates when the item probabilities are very close to 0 or 1. The problem can be solved by adding a constraint: if an IRP is very close to 0 or 1, we add or subtract 0.00001, respectively. Since the value added is so small, it does not affect the results, yet it gives control over the boundary issue. Secondly, there is also a possibility (which we have experienced) of outliers occurring in the bootstrap samples; these can be identified in a chi-square quantile plot and in a separate normal probability plot of each ω_ij (not shown here). The presence of outliers may also create a problem in the goodness-of-fit test, although it is observed that outliers have no influence on the results of HAIRP (our special approach to the Hotelling T² statistic).
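The boundary adjustment described above amounts to clipping the estimates away from 0 and 1. A one-line Python sketch (the 10⁻⁵ constant is the value the paper suggests; the function name is ours):

```python
import numpy as np

def guard_boundary(omega, eps=1e-5):
    """Keep item response probabilities off the 0/1 boundary.

    Estimates below eps are raised to eps; estimates above 1 - eps are
    lowered to 1 - eps, matching the paper's add/subtract-0.00001 fix.
    """
    return np.clip(omega, eps, 1.0 - eps)
```

Applying this after every M-step keeps the log-likelihood finite and prevents the NaNs mentioned above.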

Problem in Handling Item Response Probabilities
Initially, LCA starts with an unconditional contingency table, assigns a response profile to each class/category without any guidance, and thus completes the table. As a result, the class prevalences (θ_j) and item response probabilities are obtained for each latent class.
In LCA, during estimation, the recognition of the "Class Identification Number (CIN)" is flexible. That is, because there is no clear guidance for identifying the CIN, the set of estimates (class proportion and item response probabilities) that the algorithm assigns to class 1 in one iteration may be assigned to class 2 in the next. If there are B iterations, about half (B/2) of the CINs are assigned to one specific class (say class 1) and the other half to class 2, in a random fashion.
Due to this improper recognition of the CIN, the sampling distributions of the IRPs and the class prevalences appeared bimodal. Thompson used the term "fractured" for such bimodal sampling distributions. He obtained bootstrap samples for the class response probabilities and class proportions through the EM algorithm using a log-linear model [21]. Linzer and Lewis also discussed the issue of the CIN and suggested a solution while using the poLCA package [22].

Solution to the Problem of Handling Item Response Probabilities
In order to resolve this bimodal sampling distribution, that is, the inability of the EM algorithm to consistently recognize the latent classes, we propose the following method.
After generating bootstrap samples B times through the EM algorithm using the latent class model, in the case of a 2-class model (j = 1, 2) with 4 items (manifest variables; i = 1, 2, 3, 4), we used the following steps for the recognition of the proper CIN (the "fracturing" problem).
Step 1: For any one item, calculate the difference in response probabilities between classes 1 and 2. For example, we consider the difference between the response probabilities of the two classes for item 1, i.e., D = ω_11 − ω_12.
Step 2: Create a coding variable C on the basis of D, where C takes the value 1 for every positive difference and 2 for a negative difference (if D > 0, C = 1, else C = 2).
Step 3: Create another vector of the absolute differences, i.e., create a new matrix T of the ω̂ estimates of dimension B×4.
Step 5: Stack C and C2 and name the result G.
Step 6: Unstack T on the basis of the grouping variable G.
The strategy is applied to each vector of estimates. After such adjustments, the bootstrap solutions are consistent with the initial estimates; that is, the solution consistently classifies the item response probabilities into the group to which they belong, and the bimodal sampling distribution resolves into a unimodal one with the corresponding means and variances. Thompson solved the fracturing of estimates by identifying the most informative indicator, the one for which the conditional probabilities given latent class membership differ most [21]. The command lines for such direction of the indicators are also provided in his paper.
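The steps above can be condensed in code: for each bootstrap replication, the sign of the item-1 difference decides the class labels, and replications with a negative difference have their class columns swapped. A Python sketch (the paper's implementation is in R; the array layout here is our assumption):

```python
import numpy as np

def align_classes(boot):
    """Resolve label switching across bootstrap replications.

    boot: (B, I, 2) array of bootstrapped item response probabilities.
    Uses the sign of the item-1 difference between the two classes to
    decide which column is "class 1" and swaps where needed.
    """
    aligned = boot.copy()
    d = boot[:, 0, 0] - boot[:, 0, 1]            # Step 1: item-1 difference
    flip = d < 0                                 # Step 2: coding by sign of D
    aligned[flip] = aligned[flip][:, :, ::-1]    # swap the two class columns
    return aligned
```

After this alignment, each ω̂_ij column has a unimodal sampling distribution, as described above.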

DATASETS
The first data set used is the "Mastery data" [15]. Bartholomew discussed it in his book and suggested a 2-class latent model; the two classes were named "Master" and "Non-master" [19]. Originally, 142 individuals were asked to take a test based on four problems (items) randomly selected from the area of multiplying a two-digit number by a 3- or 4-digit number, which involves the use of carry operations. The responses were marked '1' and '0' for correct and incorrect solutions, respectively. The estimated 2-class latent model parameters (considered as IEPs) are presented in Table 1. The classes are marked "Master" and "Non-Master" on the basis of the item response probabilities in each class.
The second classical data set is Coleman's panel data [14]. These data were also used by Goodman [23,24] and Bartholomew [19]. The data were collected from 3398 boys at two different points in time. Each time, the individuals were asked two questions: (1) "Whether or not they considered themselves to be in the leading crowd" (1 for a positive response, 0 for a negative one); (2) "Whether they thought that such membership can sometimes go against their principles" (1 for a negative response, 0 for a positive one). A restricted model assuming that there exist two latent variables, which altogether form four latent classes, is the solution for these data [19]. On the basis of the response probabilities obtained by estimating the 2-class model, we named the classes the "optimistic" and "pessimistic" attitude groups. The LCM parameters (considered as IEPs) are given in Table 1.
The real data considered here are from the election of the Karachi University Teachers Society (KUTS) in the year 1993-94. A total of 434 teachers participated in the 1993-94 KUTS election. We considered only the top four positions (President, Vice President, Secretary and Treasurer) in the analysis. The responses were coded as 1 = "Rightist" panel and 2 = "Mix" panel. A complete discussion of the LCM on the KUTS data is available in [25]. Table 1 shows the 2-class latent model parameters for these data.

DISCUSSION AND RESULTS
In the Mastery data (shown in Table 1), it should be noted that the two class sizes (estimated through LCM) are in a ratio of 60:40, and that the probabilities of correctly answering an item in class 2 (the non-master class) are very close to the lower boundary of the parameter space (i.e., 0). The scenario is similar for the KUTS panel data, where the estimated LC model parameters are also very close to the boundaries (with a maximum probability of 0.96 and a minimum of 0.03).
In the case of Coleman's panel data, the estimated model parameters are not too close to the boundary of the parameter space. Although ω_12 and ω_42 seem close to zero, they are no lower than 0.09 (shown in Table 1).
For each data set considered and for each size, the Hotelling T² test reveals that there exist significant differences between the parameters of the two classes in the 2-class model, as it rejected the null hypothesis (i.e., H0: ω_i1 = ω_i2 for i = 1, 2, …, 4) with a p-value of "0.000" (the test statistic values are very high, resulting in p-values very close to 0). Therefore, it can be concluded on the basis of the Hotelling T² test that a latent variable exists in the Mastery and Coleman's panel data (shown in Table 2).
Bartlett's test (Tables 2, 3) shows that the variances are heterogeneous in all bootstrap samples of different sizes for all the data sets considered. It is observed that, where the ω_ij are not too close to the boundary of the parameter space, the chance of the normal distribution fitting well is quite high (shown in Table 4). It is also observed, while bootstrapping the ω_ij, that the normal distribution fits well in cases where the size of the bootstrap sample is below about 1000 (for 0 < ω_ij < 1, and particularly when the ω_ij are close to the boundary).
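The two assumption checks used above, Bartlett's test for homogeneity of variances and a chi-square goodness-of-fit test for normality, can be sketched as follows. The synthetic draws stand in for the bootstrap samples of one ω_ij per class; the bin count and parameter choices are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical bootstrap draws of one IRP in each class (illustration only).
w1 = rng.normal(0.85, 0.02, size=500)
w2 = rng.normal(0.15, 0.05, size=500)

# Bartlett's test for homogeneity of variances between the two classes.
bart_stat, bart_p = stats.bartlett(w1, w2)

# Chi-square goodness of fit of a normal distribution to one sample:
# bin the data and compare observed with expected normal bin counts.
obs, edges = np.histogram(w1, bins=10)
cdf = stats.norm.cdf(edges, loc=w1.mean(), scale=w1.std(ddof=1))
exp = len(w1) * np.diff(cdf)
exp *= obs.sum() / exp.sum()                  # rescale so the totals match
# ddof=2 because the mean and standard deviation were estimated.
chi_stat, chi_p = stats.chisquare(obs, exp, ddof=2)
```

A small Bartlett p-value flags heterogeneous variances (as the paper reports), while a large chi-square p-value indicates an acceptable normal fit.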
In the case of the KUTS panel data, we consider a response positive when the voters are in favor of the Rightist group. The results show a significant difference between the ω_ij of the "Rightist" and "Mix" groups. HAIRP reveals that there exist differences between the parameters of the two classes, since the test rejected the null hypothesis (i.e., H0: ω_i1 = ω_i2) with a p-value of "0.000", regardless of whether the covariance matrices are homogeneous or heterogeneous (shown in Table 3). Therefore, it can be concluded on the basis of HAIRP that a latent variable may exist in the KUTS panel data.

Comparison of the Standard Errors of Estimates through BayesLCA and HAIRP:
"BayesLCA" is a package in the R.Gui environment, available on the CRAN repository since 2012. The program was developed by Arthur White and Thomas Brendan Murphy. Along with model fitting, it also provides bootstrap sampling for the IRPs and calculates standard errors. We have used "BayesLCA" to compare the shapes of the densities of the ω_ij with those obtained through HAIRP (a total of 6 sizes (B) × 3 data sets = 18 cases). Both approaches showed approximately the same distributional behavior for each ω_ij (i = 1, 2, 3, 4 and j = 1, 2). However, the estimates obtained through HAIRP give more accurate results than "BayesLCA". A comparison between HAIRP and "BayesLCA" is presented in Table 5 for all three data sets. For each size B, the estimates of the ω_ij are presented. The average estimates (means) of each ω_ij obtained through HAIRP are much closer to the population parameters (IEPs) than the MAP estimates obtained through "BayesLCA" (for each data set).
In most cases, the model parameters estimated through "BayesLCA" are under- or over-estimated.
Moreover, the standard errors of the ω_ij for all sizes B are much smaller than those obtained through the "BayesLCA" package. "BayesLCA" provides comparatively smaller standard errors of the estimates for large samples (B ≥ 500) than for smaller ones (B < 500).

(Table 4: a summary table of the chi-square test for assessing whether the normal distribution fits the hypothesized bootstrap samples.)

… existence indicates a significant difference between the classes under the local independence assumption. Moreover, our adopted bootstrapping procedure provides better estimates, with smaller standard errors, than those obtained through the "BayesLCA" package.

ACKNOWLEDGMENT:
The authors are indebted to the two referees for their valuable comments and suggestions.