A Headway to QoS on Traffic Prediction over VANETs using RRSCM Statistical Classifier

In this paper, a novel throughput forecasting model is proposed for VANETs. The model is based on a statistical technique adopted and deployed over high-speed IP network traffic. Network traffic frequently experiences QoS (Quality of Service) issues such as jitter, delay, packet loss, and degradation due to very low bit-rate codification. Despite these issues, the traffic throughput is to be predicted with utmost accuracy using the proposed multivariate analysis scheme, RRSCM (Refined Regression Statistical Classifier Model), which optimizes parting parameters. The focus is therefore on a measurement methodology that estimates the traffic parameters needed to predict the traffic accurately and improve QoS for the end users. Finally, the results of the proposed RRSCM classification model are compared with those of an ANN (Artificial Neural Network) classification model to demonstrate the superior performance of the proposed model.

and ANN Model. Some investigators [3] revealed that non-parametric techniques largely perform better owing to their resilient ability to capture the non-deterministic and multifaceted non-linearity of traffic time series. The authors of [4] correlated different network configurations under BPN (Back Propagation Networks) with conventional econometric models on inflation-rate data. The BPN variants considered were: plain BPN, BPN with ARIMA, and BPN with the VAR model.
The primary drawback of ANN is its lack of explanatory ability and the poor choices often made in constructing the architecture of the network.
At present, the ANN modeling process is essentially empirical. Test results showed that hybrid BPNs were similar to or better than their equivalent econometric models in dynamic forecasting [5]. Researchers [6] made a comparative study of ANNs and the ARIMA model using eight years of sales data collected from a medium-sized enterprise in Brazil, which revealed that ANN performs better than ARIMA. The authors of [7] compared a neural network with a regression model; the OLS (Ordinary Least Squares) method was used to estimate the parameters of the regression model on financial stock data.

THE PROPOSED METHOD
The work emphasizes the scheme, the practical appraisal, and the scrutiny of the conduct that will train the proposed model classifier to predict the throughput, forecasting the arriving input rate in Megabits per second. This henceforward helps to justify that the classifier yields better prediction results, as the TP rate is noted to be high.
The proposed RRSCM involves the following activities; the pictorial representation is shown in Fig. 5. In the OLS model, Equation (1) represents Y as the dependent variable, PVi as the independent predictor variable for the i-th observation (i is the observation index), and ε as the error term.
• The discriminators/parameters/attributes, referred to as IVs, have corresponding influences represented as β1, β2, β3 … βn for the traffic that is to be predicted as Y (the dependent variable, referred to as the DV).
• A change in any of the independent variables' influences β1, β2, β3 … βn is reflected in the computation of Y (the DV), the predicted traffic.
• In case all the attributes at some point contribute no influence over the dependent variable Y (DV), there exists an assumed PV, x0, which is always 1.
• Thereafter, x0 carries an influence α, otherwise stated as the intercept coefficient, on the dependent variable Y, as shown in Equation (2).
Here n is the sample size and p is the reported standard-error proportion. Checking that this standard error equates to zero guarantees linearity; hence the expected standard error is assumed to be zero, as shown in Equation (3). If this condition is violated, the corresponding independent variable (IV) is transformed by the proposed RRSCM classifier model.
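As a concrete sketch of the formulation in Equations (1)-(3), the snippet below (Python with numpy and synthetic data; an illustration, not the paper's implementation) fits an OLS model with the assumed intercept column x0 = 1 and confirms that the expected standard error of the residuals is effectively zero:

```python
import numpy as np

# Synthetic traffic data: two predictor variables (PVs) and a dependent
# variable Y generated from a known linear relationship (no noise).
rng = np.random.default_rng(0)
pv = rng.uniform(0, 10, size=(50, 2))           # 50 observations, 2 IVs
y = 1.5 + 2.0 * pv[:, 0] - 0.5 * pv[:, 1]       # alpha=1.5, beta1=2.0, beta2=-0.5

# Design matrix with the assumed PV x0 = 1 carrying the intercept alpha.
X = np.column_stack([np.ones(len(pv)), pv])

# Ordinary least squares: minimizes the sum of squared error terms.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print(np.round(beta, 3))                        # recovers [1.5, 2.0, -0.5]
print(round(float(residuals.mean()), 6))        # expected error ~ 0, as in Equation (3)
```

Because the synthetic data are noise-free, the fit recovers the generating coefficients exactly and the residual mean is numerically zero.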

Postulate-2:
For a particular value of x (IV), several values of y are possible; when plotted, these reveal variability (variance, σy). It is observed that the variability of y (σy) should be equal, as shown in Equation (4). The squared variance of the individual y components (σ²y) with respect to every independent variable x should be the same and is finally equated to a constant term.
This also ensures constant variability of the standard-error terms, as their variance is squared. As shown in Equation (5), the variance of Y remains the same across the X (IVs). If this is violated, the attributes are transformed to avoid heteroskedasticity, which arises when the variance of Y differs across the corresponding IVs.
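Postulate-2 can be checked numerically. The sketch below (numpy, synthetic data; a Goldfeld-Quandt style low/high split chosen for illustration, not the paper's own test) compares residual variance across low and high values of an IV:

```python
import numpy as np

# Fit OLS, then compare residual variance over the low half vs. the high
# half of the IV: a ratio near 1 indicates constant variance (Equation 4),
# a large ratio signals heteroskedasticity.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])

def variance_ratio(y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    order = np.argsort(x)
    low, high = resid[order[:100]], resid[order[100:]]
    return float(high.var() / low.var())

y_const = 3.0 + 2.0 * x + rng.normal(0, 1.0, 200)   # constant error variance
y_grow = 3.0 + 2.0 * x + rng.normal(0, 0.3 * x)     # variance grows with x

print(round(variance_ratio(y_const), 2))   # close to 1: homoskedastic
print(round(variance_ratio(y_grow), 2))    # well above 1: heteroskedastic
```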

Postulate-3:
The standard-error terms for observations i and k yield error terms εi and εk respectively. The COV (covariance) between the error terms εi and εk is expected to be zero, guaranteeing that no correlated errors exist, as shown in Equation (6).
As there are n errors, their covariances form a diagonal matrix: the off-diagonal covariances are zero, while the diagonal carries the error variances, and the errors follow a normal distribution as stated in Postulate-4.
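A minimal illustration of the zero-covariance condition in Equation (6), using synthetic uncorrelated errors (numpy assumed; pairing each error with its lag-1 neighbour is an illustrative stand-in for observations i and k):

```python
import numpy as np

# For independent error terms, the sample covariance between eps_i and
# eps_k (here, lag-1 neighbours) should be near zero, as Equation (6)
# expects for the off-diagonal entries of the error covariance matrix.
rng = np.random.default_rng(2)
eps = rng.normal(0, 1.0, 2000)                   # uncorrelated errors

lag1_cov = float(np.cov(eps[:-1], eps[1:])[0, 1])  # COV(eps_i, eps_{i+1})
print(round(lag1_cov, 3))                          # near 0
```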

Postulate-4:
As there are n observations, there exist n error terms (ε1 … εi … εn), which are said to follow normality, i.e., a normal distribution (N). The error terms account for the variability issue, help acquire good generalization, and achieve better prediction results. This holds if and only if normality is achieved, as specified in Equation (7).
The standard error εi follows a normal distribution with mean zero and variance σ²y. When these postulates are violated, discrepancies may result during the forecast. As shown in Equation (8), the estimate βo has a mean and a variance; if the expectation of βo equals β, it is an unbiased estimate, otherwise it is dictated to be biased.
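A quick moment-based normality check on synthetic error terms, in the spirit of Equation (7) (numpy assumed; a sketch, since the paper's exact normality test is not specified here). Skewness and excess kurtosis of a normal sample are both near zero:

```python
import numpy as np

# Standardize the error terms, then inspect the third and fourth moments:
# for N(0, sigma^2) the skewness is ~0 and the excess kurtosis is ~0.
rng = np.random.default_rng(3)
eps = rng.normal(0, 2.0, 5000)                  # synthetic error terms

z = (eps - eps.mean()) / eps.std()
skew = float((z ** 3).mean())                   # ~0 for a normal sample
kurt = float((z ** 4).mean()) - 3.0             # excess kurtosis, ~0
print(round(skew, 2), round(kurt, 2))
```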
Algorithm-1 guides the selection of attributes via the proper pathway. The multicollinearity issue is measured with the formula given in line number 3 of the algorithm. It can be measured by tolerance and VIF (Variance Inflation Factor). Tolerance is the percentage of variance unaccounted for by the other IVs (independent variables). In a multiple-regression analysis, each IV is regressed on the other IVs, from which an R² value is obtained; the leftover, unaccounted variance 1 − R² is the tolerance. Tolerance values of 0.10 or less are sighted as problematic, and 0.20 is the lowest suggested tolerance value before instability sets in. VIF is the reciprocal of tolerance; it indicates the degree of inflation of the standard errors due to collinearity or multicollinearity. VIF values of 10 or above are sighted as problematic: the standard errors are inflated by a factor of 10 or more relative to the case where the independent variables are uncorrelated.
Stability in the selection of predictor variables is achieved by avoiding multicollinearity: removing redundant variables, aggregating similar independent variables/attributes, and increasing the sample size.
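The tolerance and VIF computations described above can be sketched as follows (numpy, synthetic IVs; a hypothetical helper illustrating the definitions, not the paper's Algorithm-1 itself):

```python
import numpy as np

def tolerance_and_vif(X):
    """Regress each IV on the remaining IVs: tolerance = 1 - R^2, VIF = 1/tolerance."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        tol = 1.0 - r2
        out.append((tol, 1.0 / tol))
    return out

# Synthetic IVs: pv3 is a near-duplicate of pv1, so both should be
# flagged as multicollinear (tolerance <= 0.10, VIF >= 10).
rng = np.random.default_rng(4)
pv1 = rng.normal(size=300)
pv2 = rng.normal(size=300)
pv3 = pv1 + rng.normal(0, 0.05, 300)            # near-duplicate of pv1
X = np.column_stack([pv1, pv2, pv3])

results = tolerance_and_vif(X)
for name, (tol, vif) in zip(["pv1", "pv2", "pv3"], results):
    print(name, round(tol, 3), round(vif, 1))
```

Removing or aggregating the redundant pv3, as the text suggests, restores stable coefficient estimates.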
Algorithm-2 concentrates on identifying the PVs that may mislead the estimation by producing inflated coefficients. A standard regression model is mathematically represented as in Equation (9), in which Yj is the j-th observed dependent-variable value, PVij is the j-th observed predictor-variable value for the i-th variable, and CEi is the regression coefficient to be determined for the dependent variable. M is the number of data points and N signifies the number of terms in the regression equation. Therefore, the squared difference between the observed and forecast values is multiplied by the weights (WF), computed as formulated in line number 2, to obtain a more accurate estimation.
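Since the weight formula WF from line 2 of Algorithm-2 is not reproduced here, the sketch below uses illustrative inverse-variance weights to show the weighted squared-error idea (numpy, synthetic data; not the paper's exact algorithm):

```python
import numpy as np

# Weighted least squares: each squared difference between observed and
# forecast value is scaled by a per-observation weight WF. Here noisy
# observations (noise std grows with x) are down-weighted.
rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 200)
y = 4.0 + 1.5 * x + rng.normal(0, x)            # true coefficients: [4.0, 1.5]

X = np.column_stack([np.ones_like(x), x])
w = 1.0 / x ** 2                                # illustrative inverse-variance WF
W = np.sqrt(w)

# Minimize sum_i w_i * (y_i - X_i beta)^2 via the rescaled system.
beta, *_ = np.linalg.lstsq(X * W[:, None], y * W, rcond=None)
print(np.round(beta, 2))
```

With these weights the fit recovers the generating coefficients far more reliably than an unweighted fit would on the same heteroskedastic data.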
Algorithm-4 verifies the auto-correlation issue: the error correlation across observations is noted, and the offending PVs are transformed further to yield accurate prediction results. These boosting techniques are introduced into traffic prediction by exploiting all possible regression equations to obtain the best traffic forecast using the proposed RRSCM model. The general form used to predict the traffic is shown in Equation (11). The variables pv1, pv2, pv3 … pvn are the independent variables that help predict the dependent variable. The dependent-variable dataset relies on the dissected traffic on the network, based on the protocols, which leads to the generation of the prediction equation. Table 2 projects the issues that have to be overcome to obtain an unbiased prediction estimate. The formula for the correlation coefficient (r or R), which dictates the degree of association (as shown in Table 3) between the dependent and independent variables (PVs), is given in Equation (12), where p and q are the independent and dependent variables respectively.
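One standard way to quantify the error correlation that Algorithm-4 checks for is the Durbin-Watson statistic; the sketch below (numpy, synthetic errors; the statistic is a stand-in, as the paper's exact test is not reproduced) contrasts uncorrelated and AR(1)-style errors:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 means no lag-1 autocorrelation;
    values toward 0 or 4 signal positive or negative autocorrelation."""
    d = np.diff(resid)
    return float((d ** 2).sum() / (resid ** 2).sum())

rng = np.random.default_rng(6)
clean = rng.normal(size=500)                     # uncorrelated errors

# AR(1)-style errors: each error drags 0.9 of its predecessor along.
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

print(round(durbin_watson(clean), 2))            # near 2.0
print(round(durbin_watson(ar), 2))               # far below 2.0
```

Observations whose errors fail such a check are the ones Algorithm-4 would mark for transformation.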

DISCRIMINATOR AND CLASSIFICATION MODELS - SPSS
A traffic trace of a shorter period is taken into account to evaluate the prediction using the MLP (Multilayer Perceptron) and RRSCM classification models in SPSS.
There are 38 discriminators that measure the dependent variable, referred to as PV1 … PV38. The summary of the prediction using MLP is shown in Table 4 and Table 5. The MLP architecture chosen in the ANN model to predict the traffic is shown in Table 6 and Table 7.

RMSE, MAPE and Correlation Coefficient Metrics
The RMSE (Root Mean Square Error) is one of the universally deployed metric [17].    1  4  1  s  c  _  k  c  a  _  e  t  a  c  i  l  p  u  d  _  e  l  p  i  r  t  .  r  e  v  r  e  S  o  t  t  n  e  i  l  C  m  o  r  f  d  e  r  i  u  q  c  a  s  e  n  i  l  y  b  e  t  a  c  i  l  p  u  d  d  l  o  f  e  e  r  h  t  f  o  t  n  u  o  m  a  e  h  T   4  5  1  s  c  _  a  t  a  d  _  e  l  i  t  r  a  u  q  _  t  s  r  i  f  t  e  k  c  a  p  )  t  e  n  r  e  h  t  E  (  n  i  s  e  t  y  b  f  o  e  l  i  t  r  a  u  q  t  s  r  i  F   5  5  1  s  c  _  a  t  a  d  _  n  a  i  d  e  m  t  e  k  c  a  p  )  t  e  n  r  e  h  t  E  (  n  i  s  e  t  y  b  e  g  a  r  e  v  A   1  6  1  s  c  _  p  i  _  a  t  a  d  _  e  l  i  t  r  a  u  q  _  t  s  r  i  f  t  e  k  c  a  p  P  I  f  o  e  l  i  t  r  a  u  q  t  s  r  i  f  n  i  s  e  t  y  B   1  7   Wherever D t denotes the factual significance, F t dictates the estimated significance and n notifies the observations. Since the errors are squared before they are averaged, the RMSE contribute reasonable high weightage to more errors, but this distress is somewhat meticulously controlled by hosting the square root at the end. The superior the value of RMSE signifies an inferior estimation on ANN classification model as shown in Table 10. Mean Absolute Percentage Error (MAPE) is the most shared and widespread error gauging metric for forecasting. MAPE evaluates the mean of absolute percentage error which is stress-free to comprehend and estimate. The MAPE is epitomized by the Equation (14).
Here Dt is the real value and Ft is the anticipated value. To conclude, the MAPE for the classification models discussed above is computed and compared accordingly, as shown in Table 11. The actual traffic is graphically represented in blue and the predicted traffic in orange; the prediction by the ANN classifier in Weka does not correlate well with the actual traffic, as shown in Fig. 7. The variations are also showcased using the standard-error corrections existing over the classifiers, as shown in Fig. 8.
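The RMSE, MAPE, and correlation-coefficient metrics of Equations (12)-(14) can be computed directly; the snippet below (numpy, with toy actual/predicted throughput values chosen purely for illustration) is a minimal sketch:

```python
import numpy as np

def rmse(d, f):
    """Root Mean Square Error between actual D_t and forecast F_t (Equation 13)."""
    return float(np.sqrt(np.mean((d - f) ** 2)))

def mape(d, f):
    """Mean Absolute Percentage Error (Equation 14); d must be non-zero."""
    return float(np.mean(np.abs((d - f) / d)) * 100.0)

def corr(p, q):
    """Pearson correlation coefficient r between p and q (Equation 12)."""
    return float(np.corrcoef(p, q)[0, 1])

# Toy actual vs. predicted throughput values (illustrative numbers only).
actual = np.array([10.0, 12.0, 9.0, 11.0])
predicted = np.array([9.0, 12.5, 9.5, 10.0])

print(round(rmse(actual, predicted), 3))
print(round(mape(actual, predicted), 2))
print(round(corr(actual, predicted), 3))
```

Lower RMSE and MAPE, together with a correlation coefficient closer to 1, indicate the better-performing classifier in the comparisons above.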

CONCLUSIONS
The investigated outcomes exhibit that the proposed RRSCM classifier yields better prediction results than the ANN model. The work can also be extended to assimilate the selected predictor classification model into a network-management system and assess it in real time.