BIVARIATE CORRELATION MODELS

As a result of studying this chapter, the student should:

know

• the concepts of correlation, covariance, and regression;

• methods for calculating correlation, covariance, and regression coefficients for two variables;

• the main assumptions of the fixed linear model of correlation analysis and its significance for testing hypotheses about correlation and regression coefficients;

• how statistical hypotheses about the equality of a correlation coefficient to an arbitrary given value can be tested;

• how two correlation coefficients can be compared;

be able to

• estimate correlation coefficients and construct a simple linear regression equation both analytically and with statistical packages;

• formulate and test statistical hypotheses about correlation and regression coefficients;

• correctly interpret the results of correlation and regression analysis for two variables;

master

• the basic conceptual apparatus of correlation and regression analysis for two variables;

• the skills of manual and computer estimation of correlation and regression coefficients.

In Ch. 3 we considered the simplest one-factor experimental design, in which each value of the dependent variable is obtained independently of the others. Such a design, however, is uneconomical. Very often an experiment requires reducing the number of subjects so that each of them provides not one but several values of the dependent variable. In this case, a design with related (dependent) samples is used. The features of such a design, also called a repeated-measures design, were considered in Ch. 4. It was found there that the total variance includes not two but three sources: the effect of the experimental treatment, the effect of the subject, and the experimental error. To separate the effect of the subject in the analysis, one needs to assess how strongly the repeated measurements depend on one another. The same requirement arises in more complex factorial designs if measurements are repeated over one or more independent variables.

As noted, this requirement can be met by estimating the correlation and covariance of the related values or variables. In this chapter, we examine these concepts in more detail.

Statistical relationship between variables

Correlation

Let us begin our examination of the new theoretical concepts with a concrete example (J. Cohen et al. [21]). Suppose that in an experiment the development of verbal and arithmetic abilities in children was assessed using specially developed psychodiagnostic procedures. Possible results of such an experiment are presented in Table 7.1.

The leftmost column of Table 7.1 lists the subjects who took part in the experiment. This list could contain the subjects' actual names, but since the names are of no interest to the experimenter, they are usually replaced by numbers or other identifiers. The other two columns of Table 7.1 show each subject's test scores. As you can see, the results of the arithmetic test are related in a certain way to the results of the verbal test. The lowest result on both tests belongs to subject No. 1: 5 points on the verbal test and 12 points on the arithmetic test. The highest result on both tests belongs to subject No. 9: 10 and 20 points, respectively. Subjects with average results on the verbal test, as a rule, show average results on the arithmetic test.

The results given in Table 7.1 can also be presented graphically. To do this, we plot the values of the variable X on the horizontal axis and the values of the variable Y on the vertical axis. The result of each subject is then a point on the coordinate plane, and the data for all subjects form a field of points, usually called the correlation field. The outcome is shown in Fig. 7.1. This graphical representation of data is commonly called a scatter diagram. It can be seen that as the score on the verbal test increases, the score on the arithmetic test changes in the same direction. In such cases, the relationship between the two variables is said to be positive. When an increase in one variable is accompanied by a decrease in the other, the relationship is said to be negative.

Table 7.1

Scores on the verbal and arithmetic tests for 15 preschool-age subjects (adapted from J. Cohen et al. [21])

Subject No.    Verbal test score (X)    Arithmetic test score (Y)
1              5                        12
2              8                        15
3              7                        14
4              9                        18
5              10                       19
6              8                        18
7              6                        14
8              6                        17
9              10                       20
10             9                        17
11             7                        15
12             7                        16
13             9                        16
14             6                        13
15             9                        16

Fig. 7.1. Scatter diagram

In addition, you can see that the relationship revealed in this experiment is linear. This relationship, however, is not perfect. The data tend to line up along a single straight line, but on the whole they form only a cloud elongated along this imaginary line. Obviously, the more widely this cloud is spread around the line, the weaker the relationship between the variables, and vice versa.

In theory, the relationship between two variables may turn out not to be linear. In that case it can be monotonic or non-monotonic. A logarithmic relationship, for example, is monotonic but nonlinear. A non-monotonic relationship is described, for example, by a parabola. In what follows, when we speak of the statistical relationship between variables, we will have the linear relationship in mind. This does not mean that a nonlinear relationship cannot be estimated and expressed by statistical procedures; it is simply not considered in this textbook.

Having assessed the relationship between verbal and arithmetic abilities by eye, let us try to express it quantitatively, as a number. It is not hard to see that the main problem in quantifying a statistical relationship is finding units of measurement that make the two variables comparable.

As Table 7.1 shows, the arithmetic test score ranges from 12 to 20, while the verbal test score ranges from 5 to 10. It is not clear how comparable these data are in their original units. Even greater difficulties arise, for example, when comparing height expressed in centimeters or inches with weight expressed in kilograms or pounds, although it is quite clear that these variables are related too: an increase in height should be accompanied by at least some increase in weight.

This does not mean, however, that such heterogeneous quantities cannot be compared at all. In Ch. 1, as well as in several other chapters, we repeatedly mentioned a statistic that depends on the original units of measurement but still allows one to assess the relationship between two variables. This measure, as we recall, is called covariance. It is essentially a variance, but it characterizes the joint variation of two variables rather than one, and it can be estimated using the following formula:

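$$\mathrm{cov}_{xy} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \qquad (7.1)$$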

where n is the sample size (the number of subjects in the group).

Like the variance, the covariance estimate can be biased or unbiased. Formula (7.1) gives the biased estimate. If an unbiased covariance estimate is needed, a slightly corrected formula can be used:

$$\mathrm{cov}_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$

Covariance as a way of estimating a statistical relationship has two significant drawbacks. First, if there is no statistical relationship between two variables, their covariance should be zero, but when a relationship does exist, the covariance can take any value whatsoever, so its magnitude alone does not tell us how strong the relationship is. Second, the value of the covariance depends on the chosen measurement scales. For example, when assessing the covariance of height and weight, we obtain different values depending on whether height is measured in centimeters or inches and weight in kilograms or pounds.
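The two covariance estimates and the scale dependence described above can be illustrated with a minimal Python sketch. It uses the scores from Table 7.1; the function name `covariance`, its `unbiased` flag, and the tenfold rescaling of the verbal score are illustrative assumptions, not part of the original text.

```python
# Scores from Table 7.1: verbal (X) and arithmetic (Y) tests, 15 subjects.
verbal = [5, 8, 7, 9, 10, 8, 6, 6, 10, 9, 7, 7, 9, 6, 9]
arithmetic = [12, 15, 14, 18, 19, 18, 14, 17, 20, 17, 15, 16, 16, 13, 16]


def covariance(xs, ys, unbiased=False):
    """Covariance of two equally long samples (hypothetical helper).

    unbiased=False divides the sum of cross products by n (biased estimate);
    unbiased=True divides it by n - 1 (unbiased estimate).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if unbiased else n)


cov_biased = covariance(verbal, arithmetic)
cov_unbiased = covariance(verbal, arithmetic, unbiased=True)

# Assumed rescaling: express the verbal score on a scale ten times finer.
# The relationship itself is unchanged, yet the covariance grows tenfold.
verbal_rescaled = [10 * x for x in verbal]
cov_rescaled = covariance(verbal_rescaled, arithmetic)

print(f"biased estimate:      {cov_biased:.3f}")
print(f"unbiased estimate:    {cov_unbiased:.3f}")
print(f"after rescaling of X: {cov_rescaled:.3f}")
```

The rescaled result differs from the original one only by the scale factor, which is exactly why covariance in raw units is hard to interpret as a measure of strength.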

That is why the data still have to be converted into comparable quantities, for example by expressing them in units of standard deviation. The resulting measure of association then does not depend on the original measurement scales. This linear transformation of the original scale is known as data standardization (normalization).

Let us denote the result of the first subject on the verbal test by X_1 and the result of the same subject on the arithmetic test by Y_1. Similarly, the results of the tenth subject on these tests are denoted X_10 and Y_10. In general, we denote the result of the i-th subject on one test by X_i and his or her result on the other test by Y_i. To relate the values of X and Y to each other, we need to convert them into comparable quantities, for example to express X and Y in standard deviation units. To do this, we apply the linear transformation of X and Y into z-values: the sample mean is subtracted from each original value of X or Y, and the difference is divided by the standard deviation. Formally:

$$z_{x_i} = \frac{X_i - \bar{X}}{s_x}, \qquad z_{y_i} = \frac{Y_i - \bar{Y}}{s_y},$$

where \bar{X} and \bar{Y} are the sample means and s_x and s_y are the standard deviations of X and Y.

The result of the i-th subject on the verbal test is now denoted z_xi, and his or her result on the arithmetic test z_yi. These values show by how many standard deviations the i-th subject's results on these tests differ from the corresponding sample means.
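The transformation into z-values can be sketched in a few lines of Python. This is only an illustration on the Table 7.1 scores: the helper name `z_scores` is hypothetical, and the standard deviation is assumed here to be the divide-by-n (biased) version.

```python
import math

# Scores from Table 7.1: verbal and arithmetic tests, 15 subjects.
verbal = [5, 8, 7, 9, 10, 8, 6, 6, 10, 9, 7, 7, 9, 6, 9]
arithmetic = [12, 15, 14, 18, 19, 18, 14, 17, 20, 17, 15, 16, 16, 13, 16]


def z_scores(values):
    """Convert raw scores to z-values (hypothetical helper).

    Assumption: the standard deviation is computed with n in the
    denominator, i.e. the biased estimate.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("all values are identical; z-values are undefined")
    return [(v - mean) / sd for v in values]


z_verbal = z_scores(verbal)
z_arithmetic = z_scores(arithmetic)

# Each row: subject number, z-value on the verbal test, z-value on the
# arithmetic test.
for number, (zv, za) in enumerate(zip(z_verbal, z_arithmetic), start=1):
    print(f"{number:2d}  {zv:+.2f}  {za:+.2f}")
```

After the transformation each set of z-values has a mean of zero and a standard deviation of one, so the two tests are expressed on the same scale.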

As you can easily see, the linear transformation of the original values has not changed the structure of the data. But now the data for X and for Y can easily be related to each other. Clearly, if the experiment had revealed a perfect positive linear relationship between the two variables, any change in z_x, whether an increase or a decrease, would be accompanied by an identical change in z_y. In other words, for each subject the values z_x and z_y would be the same, and every difference between z_x and z_y would therefore be zero. On the other hand, the larger the differences between the z-values of X and Y, the weaker the relationship between the original variables. The plain sum of these differences is not informative, however, because both sets of z-values have a mean of zero and the signed differences always cancel out. It is therefore better to evaluate not the total difference between the z-values but the sum of squares of such differences. Then, to assess the statistical relationship between two variables, we can use the following formula:

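$$r_{xy} = 1 - \frac{\sum_{i=1}^{n}\left(z_{x_i} - z_{y_i}\right)^2}{2n} \qquad (7.2)$$

Here the z-values are computed with the standard deviations that have n in the denominator.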

If the statistical relationship between two variables in the general population has to be estimated, the formula takes the following form:

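$$r_{xy} = 1 - \frac{\sum_{i=1}^{n}\left(z_{x_i} - z_{y_i}\right)^2}{2(n-1)} \qquad (7.3)$$

Here the z-values are computed with the unbiased estimates of the standard deviations, which have n - 1 in the denominator.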

The quantity r in equations (7.2) and (7.3) is the correlation coefficient. In honor of the English mathematician Karl Pearson, who first proposed this formula at the end of the nineteenth century, it is called the Pearson correlation coefficient.
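The idea of measuring the relationship through the squared differences between z-values can be checked numerically. The sketch below is an illustration under stated assumptions: the z-values use the divide-by-n standard deviation, the sum of squared differences is divided by 2n, and the function names are hypothetical. For comparison it also computes the coefficient from the sums of cross products and squares, a standard equivalent form.

```python
import math

# Scores from Table 7.1: verbal and arithmetic tests, 15 subjects.
verbal = [5, 8, 7, 9, 10, 8, 6, 6, 10, 9, 7, 7, 9, 6, 9]
arithmetic = [12, 15, 14, 18, 19, 18, 14, 17, 20, 17, 15, 16, 16, 13, 16]


def z_scores(values):
    """z-values based on the divide-by-n standard deviation (assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_from_z_differences(xs, ys):
    """r = 1 - sum((z_x - z_y)^2) / (2n)."""
    zx, zy = z_scores(xs), z_scores(ys)
    n = len(xs)
    return 1 - sum((a - b) ** 2 for a, b in zip(zx, zy)) / (2 * n)


def pearson_from_products(xs, ys):
    """r = sum of cross products / sqrt(product of the sums of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


r_diff = pearson_from_z_differences(verbal, arithmetic)
r_prod = pearson_from_products(verbal, arithmetic)

print(f"r from squared z-differences: {r_diff:.4f}")
print(f"r from cross products:        {r_prod:.4f}")
```

Both forms agree up to floating-point rounding, because with divide-by-n z-values the sum of squared differences equals 2n minus twice the sum of the products of the z-values.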

The correlation coefficient is a quantitative measure of the strength of the statistical relationship between two variables. It can vary only within fixed limits, from -1 to +1. If the correlation coefficient is zero, there is no linear relationship between the two variables. In this case, the points on the scatter diagram are spread randomly over the whole plane of the diagram. The more the correlation coefficient differs from zero in either direction, the stronger the relationship between the variables: the correlation field becomes more and more elongated, approaching a straight line.

The sign of the correlation indicates the direction of the relationship. A positive correlation coefficient means that as X increases, Y increases as well; a negative one means that as X increases, Y decreases. If the correlation coefficient equals +1, the relationship between the variables is perfect, so that in standardized units one variable duplicates the other. The relationship is equally perfect when the correlation coefficient equals -1. In both cases, all the points of the correlation field lie on a single straight line.
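The limiting values of the coefficient can be illustrated on made-up data. In the sketch below the numbers are hypothetical and chosen only so that Y is an exact linear function of X; the function name `pearson` is likewise an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation from sums of cross products and squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: Y is an exact linear function of X in both cases.
x = [1, 2, 3, 4, 5]
y_increasing = [2 * v + 3 for v in x]   # rises with X
y_decreasing = [10 - v for v in x]      # falls as X rises

r_up = pearson(x, y_increasing)
r_down = pearson(x, y_decreasing)

print(f"exact increasing line: r = {r_up:+.2f}")
print(f"exact decreasing line: r = {r_down:+.2f}")
```

The slope and intercept of the line do not affect the magnitude of the coefficient; only the direction of the line determines its sign.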

Thus, the sign of the correlation indicates not the strength of the relationship but only its direction. The more the correlation coefficient differs from zero, the stronger the relationship. For example, a correlation coefficient of -0.72 indicates a stronger relationship than a correlation coefficient of +0.36.

If we calculate the Pearson correlation coefficient for our example, we find that it equals 0.82. However, let us not rush into the calculations.
