A Residual Analysis for the Removal of Biological Oxygen Demand through a Rotating Biological Contactor

Regression is a statistical method generally used for forecasting and prediction. It estimates the relationship between a dependent variable and one or more independent variables, and it is one of the most widely used techniques for fitting a model that best approximates the individual data points. It has found numerous successful applications in engineering, science, business and other fields. To determine the average removal % of Biological Oxygen Demand (BOD5) from greywater through a Rotating Biological Contactor (RBC), an experiment was conducted in the Sindh University hostels using different Hydraulic Retention Times (HRT), i.e. 2 hours (0.42 L/min), 2.5 hours (0.33 L/min) and 3 hours (0.28 L/min), and different numbers of discs, i.e. 40, 42, 44, 46, 48, 50 and 52. The results reveal that the linear estimates of HRT and number of discs are significant, and that the linear and quadratic estimates of number of discs are highly significant, which evidences the importance of both time and discs. However, the quadratic estimate of HRT is not significant, as its p-value is greater than 0.05. Using the coefficients in the table, the regression equation is Removal = -79.995 + 6.88 time + 2.90 disc, with a sample standard deviation of 7.151, a coefficient of correlation of 0.86 and a coefficient of determination of 0.742. The distribution of errors is approximately normal, as the normal probability plot of the residuals is approximately linear. Residual analysis shows that the residuals, plotted against each predictor variable and against the predicted y-values, fall approximately in a horizontal band that is symmetric and centered about the horizontal axis. Moreover, the residual plots show a constant standard deviation, and the linearity assumption appears to be met.


INTRODUCTION
Resources of fresh water are under immense stress, and the need for fresh water cannot be denied, especially in populous countries. Falkenmark and Lindh [1] state that a country is said to be water stressed when its water supply drops below 1000 cubic meters per person per year. Gleick [2] mentions that over the last 20 years, shortages have been observed in all regions of the world due to a decline in per-capita water availability. As stated by Nghiem et al. [3], rapid urbanization, industrialization, climate change and an ever-increasing population have significantly threatened freshwater resources, which are depleting day by day. In urbanized countries, the planning and implementation of water resources are the most important considerations for the reuse of treated wastewater. According to Ali et al. [4], the reuse of treated effluents has come to be considered a reliable source of water in recent years. In this regard, a number of countries such as Saudi Arabia, Singapore and Jordan have made significant progress in reclaiming treated effluents and have made this part of their national policies.
RBC is an attached-growth aerobic treatment method that requires the presence of molecular oxygen for the metabolic activity of microorganisms. In this process, colloidal particles are removed first by physicochemical adsorption and secondly by the entrapment of suspended particulate matter on the biological flocs [5]. The soluble organic fraction, such as BOD5, is removed by microbial bio-assimilation, which is the principle on which the RBC works: microbes transform these pollutants into simpler end products, i.e. water and carbon dioxide, and into their own cell mass.
Cohen et al. [6] state that regression analysis is a statistical procedure for estimating the relationships among variables. It comprises techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Doudpoto et al. [7] state that regression analysis helps to determine how the value of a dependent variable changes when one or more independent variables are altered; the change may be linear or nonlinear. Hayes [8] concludes that regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile or another location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the goal is to estimate a function of the independent variables, called the regression function. In regression analysis it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. Regression analysis thus predicts the value of a variable y given known values of a variable x. Jamal [9] categorizes regression analysis as simple or multiple: regression on a single independent variable is called simple regression, and when two or more independent variables are involved it is called multiple regression. Li-Kun [10] affirms that the coefficient of correlation is generally used to measure the strength of the linear relationship between two variables; its value ranges between -1 and 1.
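As an illustration of the correlation coefficient described above, the following is a minimal sketch of Pearson's r computed from its definition. The disc counts match the design levels used in this study, but the removal values are hypothetical numbers chosen only to exercise the function. The last lines also show that the reported summary statistics are mutually consistent: for a least-squares fit, the coefficient of correlation is the square root of the coefficient of determination, and √0.742 ≈ 0.861.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only; these are NOT the study's measurements.
discs = [40, 42, 44, 46, 48, 50, 52]
removal = [55.0, 58.5, 60.0, 64.5, 66.0, 70.5, 71.0]

r = pearson_r(discs, removal)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")

# Consistency check on the reported summary: R = sqrt(R^2),
# so R^2 = 0.742 corresponds to R of about 0.861.
print(f"sqrt(0.742) = {math.sqrt(0.742):.3f}")
```

In simple regression, r² equals the coefficient of determination; in multiple regression the analogous quantity is the multiple correlation R, whose square is R².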

NORMALITY ASSUMPTIONS
To decide whether it is reasonable to assume that the assumptions for regression inferences, including normality, are met, a so-called residual analysis is carried out, based on the differences between the observed and predicted values. For multiple regression, these assumptions are checked through three conditions. Weiss [11] suggests that (i) a plot of the residuals against each predictor variable should fall approximately in a horizontal band that is symmetric and centered about the horizontal axis, (ii) a plot of the residuals against the predicted y-values should likewise fall approximately in such a band, and (iii) the normal probability plot of the residuals should be approximately linear. Montgomery [12] notes that violations of model adequacy and of the basic assumptions can be investigated easily by examining the residuals. Failure of any of these three conditions casts doubt on the validity of one or more of the assumptions for multiple regression inferences [13]. A histogram of the residuals can also be used to check the normality assumption: if the NID(0, σ²) assumption on the errors is satisfied, the histogram should look like a sample from a normal distribution centered at zero. Unfortunately, the shape of a histogram often fluctuates with small samples, and a moderate departure from normality does not necessarily imply a serious violation of the assumptions.
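The residual checks above start from the residuals themselves. The sketch below computes residuals e_i = y_i − ŷ_i, the residual standard error s = √(SSE/(n − p)), and standardized residuals d_i = e_i/s, following the textbook definition of standardized residuals; some packages (Minitab, for example) instead report residuals additionally scaled by leverage, so values may differ slightly. The helper `histogram_counts` and all numbers in the example are hypothetical illustrations, not the study's data.

```python
import math

def residual_summary(observed, predicted, n_params):
    """Return residuals, standardized residuals (e / s) and s.

    s = sqrt(SSE / (n - p)), where p = n_params counts every estimated
    coefficient, including the intercept.
    """
    n = len(observed)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need equal-length inputs with more observations than parameters")
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - n_params))
    if s == 0:
        raise ValueError("perfect fit: residual standard error is zero")
    standardized = [e / s for e in residuals]
    return residuals, standardized, s

def histogram_counts(values, edges):
    """Count values in half-open bins [edges[k], edges[k+1]); values outside are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(edges) - 1):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts

# Hypothetical observed and fitted values, for illustration only.
obs = [52.1, 58.4, 61.0, 66.3, 69.8, 72.5]
fit = [53.0, 57.2, 62.1, 65.4, 70.9, 71.5]
res, std_res, s = residual_summary(obs, fit, n_params=3)
print("mean residual:", round(sum(res) / len(res), 3))
print("histogram of standardized residuals:",
      histogram_counts(std_res, [-3, -2, -1, 0, 1, 2, 3]))
```

Under the NID(0, σ²) assumption, most standardized residuals should fall between −2 and 2 and the counts should look roughly bell-shaped around zero.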

MATERIALS AND METHODS
Primary data on BOD5 concentrations and related greywater characteristics were collected from the Sindh University hostels after treatment in the RBC, so that different pollutant levels could be recorded. Regression analysis was applied to the RBC data; such analysis is commonly used in modeling wastewater treatment when removal is influenced by one or more factors. Two factors, number of discs and HRT, were chosen for the present study in order to fit a quadratic response and analyze their effects. Different procedures were applied to optimize the removal response. The system was operated at three HRTs, i.e. 2 hours (0.42 L/min), 2.5 hours (0.33 L/min) and 3 hours (0.28 L/min), and with different numbers of discs, i.e. 40, 42, 44, 46, 48, 50 and 52, for greywater treatment through the RBC. The Statistical Package for the Social Sciences (SPSS 20), Minitab 17, MS Excel and OriginPro 7 were used for the analysis and to draw residual plots and 3D plots. The developed statistical model was validated against previous experimental work by the researchers.
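The three HRT/flow-rate pairs quoted above can be checked for internal consistency. Assuming the usual definition HRT = V/Q (working volume over flow rate), each pair implies a working volume V = HRT × Q; the paper does not state the tank volume, so this is only an arithmetic cross-check, and the name `implied_volume_litres` is hypothetical.

```python
# Hydraulic retention time, assumed here to be HRT = V / Q
# (reactor working volume over flow rate). Rearranged as V = HRT * Q to back out
# the volume implied by each HRT / flow-rate pair quoted in the text.
settings = [  # (HRT in hours, flow rate in litres per minute)
    (2.0, 0.42),
    (2.5, 0.33),
    (3.0, 0.28),
]

def implied_volume_litres(hrt_hours, flow_l_per_min):
    """Volume in litres = HRT converted to minutes * Q in L/min."""
    return hrt_hours * 60.0 * flow_l_per_min

for hrt, q in settings:
    print(f"HRT {hrt} h at {q} L/min -> V = {implied_volume_litres(hrt, q):.1f} L")
```

All three pairs imply roughly 50 L (50.4, 49.5 and 50.4 L), which suggests the flow rates were set for a fixed working volume; the small spread comes from rounding the flow rates to two decimals.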
A simple linear regression model may be written as $y = b_0 + b_1 x$, where the number $b_0$ is called the y-intercept and the number $b_1$ is called the slope of the line. Weiss [11] notes that $b_0$ and $b_1$ in the straight-line equation can also be symbolized by $a$ and $b$.
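For the straight-line model y = b0 + b1·x, the least-squares estimates have a closed form: b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. The sketch below implements those formulas; the function name and the example points are hypothetical.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the fitted line passes through (mean_x, mean_y)
    return b0, b1

# Hypothetical points lying exactly on y = 1 + 2x.
b0, b1 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```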
Chaudhry and Kamal [14] state that for multiple regression with two predictor variables $x_1$ and $x_2$, the regression equation has the form $y = b_0 + b_1 x_1 + b_2 x_2$, and, in general, for multiple regression with $k$ predictor variables $x_1, x_2, \ldots, x_k$, it has the form $y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$, where $b_0$ is the unbiased estimate of the regression intercept and $b_1, b_2, \ldots, b_k$ are the estimated regression coefficients [15]. Fig. 1 shows the average removal % of BOD5 at the different HRT levels, i.e. 2 hours, 2.5 hours and 3 hours, and at the different numbers of discs, with the corresponding increment in disc area. It is observed that as the HRT and the number of discs increase, the removal % increases as well. A dot plot is given in Fig. 2.
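The coefficients of the multiple regression equation above are obtained by ordinary least squares, i.e. by solving the normal equations (XᵀX)b = Xᵀy, where X has a leading column of ones for the intercept. The following is a minimal, dependency-free sketch of that procedure; the helper names are hypothetical, and the responses in the example are generated from a known equation (not the study's data) so the fit can be checked. In practice an SVD-based solver such as `numpy.linalg.lstsq` is numerically safer than forming XᵀX explicitly. The last lines simply evaluate the equation reported in the abstract at one design point.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_linear(rows, y):
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 -> intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coefs, x):
    """Evaluate b0 + b1*x1 + ... + bk*xk at one point x = (x1, ..., xk)."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))

# Hypothetical grid mirroring the design (3 HRTs x 7 disc counts), with responses
# generated exactly from y = -80 + 7*time + 3*disc so the fitted values can be checked.
grid = [(t, d) for t in (2.0, 2.5, 3.0) for d in range(40, 53, 2)]
y = [-80 + 7 * t + 3 * d for t, d in grid]
coefs = fit_multiple_linear(grid, y)
print([round(c, 6) for c in coefs])

# Evaluating the equation reported in the abstract at time = 3 h and 52 discs.
reported = [-79.995, 6.88, 2.90]
print(round(predict(reported, (3.0, 52)), 3))
```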

RESULTS AND DISCUSSION
Key information about the removal % of BOD5 through the RBC was obtained from the Sindh University hostels under different settings of the number of discs and the HRT. Table 1 presents the predictor variables and their coefficients, together with the corresponding t and p values, for the response surface model. [Table 2] The factors are validated by calculating t-values and the corresponding p-values. In the literature, a factor with a p-value less than 0.05 is considered significant; otherwise it is considered not significant. The coefficients in Table 1, with their corresponding t and p values, reveal that the linear estimates of HRT and number of discs are significant, as their p-values are less than 0.05; moreover, p-values below 0.01 indicate that the linear and quadratic estimates of number of discs are highly significant. The quadratic estimate of HRT, however, is not significant, as its p-value is greater than 0.05. The interaction effect of HRT and number of discs is also highly significant, as its p-value is less than 0.01. Table 3 was constructed to validate these results. It indicates that, at the 0.05 level of significance, HRT and number of discs have a highly significant effect on the removal % of BOD5, since the corresponding p-values are less than even 0.01.
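The linear, quadratic and interaction estimates discussed above can be read against the standard second-order (quadratic) response surface model. The sketch below assumes $x_1$ denotes HRT and $x_2$ the number of discs, and uses generic coefficient symbols because the full fitted model is not printed in this section. The t statistic and two-sided p-value are the usual ones for testing each coefficient against zero, with $\nu = n - 6$ residual degrees of freedom for the six coefficients.

```latex
\begin{aligned}
\hat{y} &= b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2,\\[4pt]
t_j &= \frac{b_j}{\operatorname{SE}(b_j)}, \qquad
p_j = 2\,\Pr\!\left(T_{\nu} \ge \lvert t_j \rvert\right), \qquad \nu = n - 6,\\[4pt]
&\text{term } j \text{ is significant if } p_j < 0.05 \text{ and highly significant if } p_j < 0.01 .
\end{aligned}
```

In this notation, $b_1$ and $b_2$ are the linear estimates, $b_{11}$ and $b_{22}$ the quadratic estimates, and $b_{12}$ the interaction effect of HRT and number of discs.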
The normal probability plot shows that the distribution of errors is approximately normal. The tendency of the normal probability plot to bend slightly upward on the right side and slightly downward on the left side implies that the tails of the error distribution are somewhat thinner than would be expected in a normal distribution. Overall, the normal probability plot of the residuals is approximately linear.
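A normal probability plot pairs each ordered residual with the score expected for that rank under a standard normal distribution; if the errors are normal, the points lie close to a straight line. The sketch below builds those pairs using Blom's plotting positions (i − 3/8)/(n + 1/4), an assumption since software packages differ slightly in the positions they use, and summarizes linearity by the correlation between the two coordinates. The function names and residual values are hypothetical illustrations.

```python
import math
from statistics import NormalDist

def normal_probability_points(residuals):
    """Pair each ordered residual with its expected standard-normal score.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4).
    """
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(scores, ordered))

def plot_linearity(points):
    """Correlation between normal scores and ordered residuals.

    Values close to 1 indicate an approximately linear, normal-looking plot.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two residuals")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("need at least two distinct residuals")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical residuals, for illustration only.
residuals = [0.8, -1.4, 0.3, 2.1, -0.6, -0.2, 1.0, -1.9, 0.5, -0.4]
pts = normal_probability_points(residuals)
print(f"plot correlation = {plot_linearity(pts):.3f}")
```

Curvature at the ends of the plot, such as the slight bending described above, lowers this correlation only modestly, which is consistent with treating the plot as approximately linear.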
Residual analysis examines the differences between the observed and predicted values. Fig. 4 shows that the residuals, plotted against each predictor variable and against the predicted y-values, fall approximately in a horizontal band that is symmetric and centered about the horizontal axis. The residual plots show a constant standard deviation, and the linearity assumption appears to be met. The histogram of residuals supports the normality assumption, as it looks like a sample from a normal distribution centered at zero, and the NID(0, σ²) assumption on the errors is satisfied, since the standardized residuals are approximately normal with mean zero and unit variance.

CONCLUSION
The output indicates that retention time and number of discs have a linear positive effect on the removal % of BOD5. The coefficients for retention time and number of discs are 53.1 and 32.2, respectively, and the quadratic effect of retention time is 0.73, indicating that the removal of Biological Oxygen Demand increases as time increases. The model further reveals that the number of discs has a positive effect on the removal of BOD5; however, beyond a certain number of discs the removal tends to decline, as the quadratic coefficient of number of discs is -0.290.
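Why a negative quadratic coefficient implies an eventual decline can be seen by differentiating a second-order response surface model with respect to the number of discs $x_2$ while holding HRT $x_1$ fixed. This is a sketch under the assumption that the fitted model has the generic form shown in the first line, where $b_{12}$ is the interaction coefficient (not reported in this section). Because the coding of the factors is not stated, the reported coefficients are not substituted here.

```latex
\begin{aligned}
\hat{y} &= b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2,\\[4pt]
\frac{\partial \hat{y}}{\partial x_2} &= b_2 + 2\,b_{22}\,x_2 + b_{12}\,x_1 = 0
\quad\Longrightarrow\quad
x_2^{*} = -\,\frac{b_2 + b_{12}\,x_1}{2\,b_{22}},\\[4pt]
\frac{\partial^2 \hat{y}}{\partial x_2^2} &= 2\,b_{22} < 0 \quad \text{whenever } b_{22} < 0 .
\end{aligned}
```

With $b_{22} < 0$, $x_2^{*}$ is a maximum along the disc axis: predicted removal rises up to $x_2^{*}$ and declines beyond it. Whether $x_2^{*}$ falls inside the studied range of 40 to 52 discs depends on the actual coefficients and factor coding.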
The distribution of errors (histogram) is approximately normal, and the normal probability plot of the residuals is approximately linear. From the normal probability plots and residual plots, it was observed that the models fit best when the outliers are removed.