Statistical hypotheses about the coefficients of correlation and regression

In a number of cases, the researcher carrying out an experiment needs to evaluate how the correlation and regression coefficients he has calculated relate to certain theoretical values, for example, to a zero value reflecting the absence of any connection between the variables. In such cases, a simple point estimate of the correlation and regression coefficients is not enough. After all, the methods of assessing statistical relationships considered so far are only ways of estimating theoretical parameters hidden from direct observation, and we already know well that a statistic and a parameter do not always coincide.

Therefore, formulating and testing statistical hypotheses becomes necessary. This procedure presupposes a structural statistical model, called the fixed linear model. Let us consider this model first.

Fixed Linear Model

The simple linear regression method, like the various analysis-of-variance designs considered earlier, belongs to the class of general linear models. Thus, the structural assumptions of regression analysis with one independent and one dependent variable are close to those we already know from one-way analysis of variance. It is assumed that any value of the criterion, the dependent variable, can be expressed as an additive sum of three components:

• the population constant μ;

• the effect of the independent variable τ;

• the effect of the experimental error ε.

Formally, this can be expressed by the following relationship:

Y = μ + τ + ε.
Thus, the first assumption of the fixed linear model is that the relationship between the independent and the dependent variable is linear. Hence the first part of the model's name: linear.

Another assumption of the model under consideration is that the independent variable, the predictor, takes in the experiment all the values of interest, fixed in advance, and is not a random sample of observations from the general population. Such variables, as we recall, are called fixed variables. Hence the second part of the model's name: fixed.

In fact, in an experiment this may not be the case. However, these two assumptions do not pose serious problems for the experimenter: a non-linear dependence can in many cases be transformed fairly easily into a linear form, and the fixation of values matters only for interpreting the results of the statistical analysis. The third assumption of the structural model is much more serious. It holds that the effect of the experimental error does not depend on the effects of the predictor. In other words, it is assumed that the variance of the experimental error is constant, and that the experimental error itself is normally distributed in the population with parameters 0 and σ²_ε. At the same time, no assumptions are made about the distribution of either the independent or the dependent variable.
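These assumptions can be illustrated with a short simulation. The sketch below is purely hypothetical: the predictor levels, constant, effect size, and error variance are invented for demonstration, but the structure mirrors the model described above, with fixed predictor levels, a normally distributed error with mean 0, and a constant error variance at every level:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical illustration: predictor levels fixed by the experimenter,
# not randomly sampled from a population
x = np.repeat([1.0, 2.0, 3.0, 4.0], 25)
mu, b = 10.0, 1.5          # assumed population constant and effect size
sigma_eps = 2.0            # constant error SD (homoscedasticity assumption)

# Y = constant + effect of X + normally distributed error with mean 0
y = mu + b * x + rng.normal(0.0, sigma_eps, size=x.size)

# The error variance should not depend on the predictor level:
for level in np.unique(x):
    print(level, round(y[x == level].var(ddof=1), 2))
```

Each printed sample variance should fluctuate around σ²_ε = 4; systematic growth of the variance with the level of X would violate the third assumption.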

If the assumptions of the structural model of regression analysis hold, in particular the last, third assumption, then the entire variance of the dependent variable Y can be decomposed into two additive parts: 1) the variance σ²_Ŷ associated with the effect of the independent variable (predictor) X, and 2) the variance σ²_ε resulting from experimental error. Since μ is a constant, it contributes nothing to the variance of the dependent variable. Formally, this statement can be expressed by the following relationship:

σ²_Y = σ²_Ŷ + σ²_ε. (7.12)
Since the variance σ²_Ŷ is that part of the variance of the independent variable X which is shared with the variance of Y, i.e. r²σ²_Y, and the variance of the experimental error is that part of the variance of Y which does not depend on the variance of X, i.e. (1 - r²)σ²_Y, equation (7.12) can be rewritten as follows:

σ²_Y = r²σ²_Y + (1 - r²)σ²_Y.
Since none of the terms of this relationship is ever known exactly (we are, after all, dealing with theoretical parameters), it makes sense to move from the original units of measurement to standardized ones by carrying out a z-transformation. We thus obtain the following relationship:

1 = r² + (1 - r²).
From here we find:

σ²_ε / σ²_Y = 1 - r².
The value 1 - r² determines the share of the residual variance. It shows how much of the variance of the dependent variable is not related to the action of the independent variable. In contrast, r² indicates which part of the variance of the dependent variable is associated with the independent variable and is determined by it (see Fig. 7.2). In other words, knowing the value of r², one can judge how strong the relationship between the independent and dependent variables is.

Fig. 7.2. Ratio of the variances of the correlated variables X and Y

For example, if the correlation coefficient between the variables Y and X turns out to be 0.70, this means that only 49% of the variance of Y is associated with, and determined by, the variance of X, while the remaining 51% of the variance of the dependent variable comes from other sources not studied in the experiment, i.e. from experimental error.

Guided by these considerations, we can return to the original units of measurement:

σ_ε = σ_Y √(1 - r²).

The last relationship describes the so-called standard error of the regression. The value √(1 - r²) is called the coefficient of alienation. It shows the standard deviation of the dependent variable Y when that part of it which is shared with X has been removed. It may also be thought of as a coefficient of non-correlation, since the correlation coefficient itself obviously represents that part of the standard deviation of Y which is associated with X.

The standard error of the regression can be estimated from the results of the experiment as follows:

s_YX = √( Σ(Yᵢ - Ŷᵢ)² / (n - 2) ).
Interval estimation of parameters

Suppose we performed a series of experiments, each time evaluating the statistical relationship between the two variables and fitting a linear regression equation to the newly obtained data. Clearly, each time we would obtain somewhat different correlation and regression coefficients.

It is known, however, that the distribution of the regression coefficients is approximately normal. The standard error of the slope, regression coefficient B, is estimated as follows:

SE_B = s_YX / √( Σ(Xᵢ - X̄)² ).
For the constant, regression coefficient A, the standard error is estimated somewhat differently:

SE_A = s_YX √( 1/n + X̄² / Σ(Xᵢ - X̄)² ).
Thus, the distribution of the regression coefficients can be described by the t-distribution with n - 2 degrees of freedom. Since, as we know, Student's distribution is symmetric regardless of the number of degrees of freedom, the boundaries of the confidence intervals can be found from the obtained regression coefficients and the corresponding standard errors: B ± t·SE_B or A ± t·SE_A, where t is the critical value of Student's distribution for the chosen confidence level.
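A confidence interval for the slope can be sketched as follows. The data are hypothetical, and the critical value 2.447 is the tabled t(0.975) for 6 degrees of freedom; with a different n a different critical value would be needed:

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])

n = x.size
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
s_yx = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

# Standard error of the slope
se_b = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))

# 95% confidence interval: B ± t * SE_B, t taken from tables
t_crit = 2.447  # t(0.975) for n - 2 = 6 degrees of freedom
ci = (b - t_crit * se_b, b + t_crit * se_b)
print(round(b, 3), [round(v, 3) for v in ci])
```

If the interval does not cover zero, the hypothesis of no linear relationship between X and Y can be rejected at the corresponding significance level.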

Similarly, one can build a confidence interval for the values predicted by the regression model. The standard error of the predicted mean value at a point X is defined as follows:

SE_Ŷ = s_YX √( 1/n + (X - X̄)² / Σ(Xᵢ - X̄)² ).