# Statistical control and the problem of measurement reliability


The general principles of analysis of variance and of correlation/regression analysis considered above are important not only for implementing statistical control in correlational and quasi-experimental studies. They also largely determine key aspects of the applied work of the practicing psychodiagnostician, namely the development of standardized psychodiagnostic instruments: tests and questionnaires. The basic requirements imposed on such instruments are the psychometric requirements of reliability and validity. In this section we consider the extent to which the previously discussed procedures of correlation analysis and analysis of variance help in solving these psychometric problems.

## Reliability of psychometric procedures

The reliability of a psychodiagnostic measurement can be defined as its consistency with itself. It is assumed that the variance of the measurements obtained can arise from two sources. One source is the measured property itself; the other, as in experimental procedures, is statistical error, which reflects the influence of uncontrolled conditions accompanying the measurement procedure.

Thus, it can be assumed that any measurement result for an arbitrarily chosen subject i includes at least two additive parts: the measured individual property, denoted by $\pi_i$, and the measurement error $\eta_i$. Formally, this can be expressed as follows:

$$X_i = \pi_i + \eta_i.$$

The value of $\pi$ is assumed to remain unchanged when the same or equivalent measuring instruments are used, while the measurement error $\eta$ varies, determining the variance of the measurement result itself.

If we perform k different measurements of the same individual property for one subject, we obtain k values $X_{i1}, X_{i2}, \dots, X_{ik}$, in which the true component $\pi_i$ is the same throughout. Thus, the differences between the k results are determined solely by differences in the measurement error. At the same time, by carrying out the same measurements for n subjects, we obtain differences in X that reflect both differences in the investigated individual property and differences caused by measurement error.
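The additive model can be illustrated with a short simulation (a sketch with invented parameters: `pi_sd` and `eta_sd` are assumed spreads of the true component and of the error):

```python
import random
import statistics

random.seed(0)
n, k = 500, 4                      # subjects and repeated measurements
pi_sd, eta_sd = 15.0, 5.0          # illustrative spreads of pi and eta

# X_ij = pi_i + eta_ij: a stable true component plus a fresh error each time
pi = [random.gauss(100, pi_sd) for _ in range(n)]
X = [[p + random.gauss(0, eta_sd) for _ in range(k)] for p in pi]

# Within one subject only eta varies; between subjects both sources act
within_var = statistics.mean(statistics.variance(row) for row in X)
between_var = statistics.variance([statistics.mean(row) for row in X])
```

The within-subject variance stays near the error variance alone, while the variance of the subject means is dominated by the much larger true-score variance, reproducing the two sources of variability described above.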

Table 8.3

Results of k measurements of the same individual property for n subjects

| Subject | 1 | … | j | … | k | Sum | Mean |
|---|---|---|---|---|---|---|---|
| 1 | $X_{11}$ | … | $X_{1j}$ | … | $X_{1k}$ | $\sum_j X_{1j}$ | $\bar{X}_1$ |
| … | … | … | … | … | … | … | … |
| i | $X_{i1}$ | … | $X_{ij}$ | … | $X_{ik}$ | $\sum_j X_{ij}$ | $\bar{X}_i$ |
| … | … | … | … | … | … | … | … |
| n | $X_{n1}$ | … | $X_{nj}$ | … | $X_{nk}$ | $\sum_j X_{nj}$ | $\bar{X}_n$ |
| Sum | $\sum_i X_{i1}$ | … | $\sum_i X_{ij}$ | … | $\sum_i X_{ik}$ | | |
| Mean | $\bar{X}_{\cdot 1}$ | … | $\bar{X}_{\cdot j}$ | … | $\bar{X}_{\cdot k}$ | | |

Suppose we have k measurements of the same individual characteristic for n subjects. These results are presented in Table 8.3. As you can see, the structure of these data is almost identical to that already familiar from the statistical analysis of single-factor experimental designs with repeated measures (see also Table 4.1). We know that in such designs there are two main sources of variance: 1) differences between subjects and 2) differences within subjects. Table 8.4 shows the theoretically expected mean-square values for these sources of variance, expressed in terms of the structural model under consideration.

Table 8.4

Measurement variance for k measurements of n subjects

| Source of variance | Expected value of the mean square |
|---|---|
| Between subjects | $k\sigma_\pi^2 + \sigma_\eta^2$ |
| Within subjects | $\sigma_\eta^2$ |

The mean square expressing the variability of the data across subjects is estimated as

$$MS_b = \frac{k\sum_{i=1}^{n}(\bar{X}_i - \bar{X})^2}{n-1}.$$

The variance of the mean scores obtained by the subjects can obviously be estimated as follows:

$$s_{\bar{X}}^2 = \frac{\sum_{i=1}^{n}(\bar{X}_i - \bar{X})^2}{n-1}.$$

Thus, we have

$$MS_b = k\,s_{\bar{X}}^2. \qquad (8.10)$$

The expected value of the variance of the mean scores obtained by n subjects in k measurements will include the variance of the mean values of the measurement error and the variance of the measured individual property:

$$E(s_{\bar{X}}^2) = \sigma_\pi^2 + \frac{\sigma_\eta^2}{k}.$$

Then, using relation (8.10) between the mean square between subjects and the variance of the mean values, one can find the expected values of the mean squares: $E(MS_b) = k\sigma_\pi^2 + \sigma_\eta^2$. They are shown in Table 8.4.

Let us express the reliability of the mean score over k different measurements for the i-th subject as:

$$r_{kk} = \frac{\sigma_\pi^2}{\sigma_\pi^2 + \sigma_\eta^2/k}. \qquad (8.11)$$

In other words, the reliability of a measurement can be defined as the ratio of the variance of the true value of the measured characteristic to the sum of the variance of the true value and the error variance, which obviously represents the variance of the measurement result X. In general form, this definition of reliability can be expressed as:

$$r_{tt} = \frac{\sigma_\pi^2}{\sigma_\pi^2 + \sigma_\eta^2} = \frac{\sigma_\pi^2}{\sigma_X^2}. \qquad (8.12)$$

Formula (8.12), however, can also be expressed in another way:

$$r_{tt} = 1 - \frac{\sigma_\eta^2}{\sigma_X^2}. \qquad (8.13)$$

Formula (8.13), otherwise known as the Rulon formula, shows that the reliability of a test is determined by the variance of the measurement error: the larger the error, the greater the uncertainty of the true value of the measured property. Clearly, in this situation the empirical score obtained by a subject on the test remains uninformative until this measurement error has been evaluated with a certain degree of accuracy.

To estimate the variance of the measured characteristic, one must subtract the mean square for differences within subjects from the mean square for differences between subjects and divide the resulting difference by the number of measurements. To estimate the variance of the mean measurement error, the mean square for differences within subjects is divided by the number of measurements (see Table 8.4). Substituting these variance estimates into (8.11) and multiplying the numerator and denominator by the number of measurements, we have

$$r_{kk} = \frac{MS_b - MS_w}{MS_b} \qquad (8.14)$$

or, which is the same,

$$r_{kk} = 1 - \frac{MS_w}{MS_b}, \qquad (8.15)$$

where $MS_b$ and $MS_w$ are the mean squares between and within subjects, respectively.

The reliability of each individual measurement can be defined as:

$$r_{11} = \frac{\sigma_\pi^2}{\sigma_\pi^2 + \sigma_\eta^2}.$$

Then its reliability estimate can be expressed as follows:

$$r_{11} = \frac{MS_b - MS_w}{MS_b + (k-1)MS_w}. \qquad (8.16)$$

Taking into account the assumptions of the structural model of one-way analysis of variance with repeated measures,

$$X_{ij} = \mu + \pi_i + \eta_{ij},$$

which we considered in detail in Ch. 4 (see Subsection 4.1.1), equation (8.16) can be expressed through the covariance of two parallel measurements divided by the product of their standard deviations. Dividing the covariance of two quantities by the product of their standard deviations, as we have found out, yields their correlation. Hence, determining the reliability of a measurement may involve the use of correlation methods. The specific procedures depend on what we want to evaluate: the reliability of the test as a whole, $r_{kk}$, or only that of its individual items, $r_{11}$.
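The ANOVA-based reliability estimates (8.14) and (8.16) can be computed directly from a small data layout like Table 8.3 (the scores below are invented for illustration):

```python
import statistics

# Invented scores: n = 4 subjects, each measured k = 3 times (layout as in Table 8.3)
X = [
    [8, 9, 8],
    [5, 6, 4],
    [9, 9, 10],
    [3, 4, 4],
]
n, k = len(X), len(X[0])

grand_mean = statistics.mean(x for row in X for x in row)
row_means = [statistics.mean(row) for row in X]

# Mean square between subjects: k times the variance of the subject means
ms_b = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
# Mean square within subjects: pooled squared deviations from each subject's mean
ms_w = sum((x - m) ** 2 for row, m in zip(X, row_means) for x in row) / (n * (k - 1))

r_kk = (ms_b - ms_w) / ms_b                       # (8.14): reliability of the mean of k measurements
r_11 = (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)    # (8.16): reliability of a single measurement
```

As expected, the mean of k measurements is more reliable than any single one (`r_kk > r_11`).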

As for the reliability of the test as a whole, simple algebraic transformations of equation (8.14) show that

$$r_{kk} = \frac{k\,r_{11}}{1 + (k-1)\,r_{11}}. \qquad (8.17)$$

Formula (8.17) is known in psychometrics as the Spearman-Brown formula. It is fundamental for the assessment of measurement reliability.
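Formula (8.17) is easy to express as a small helper (a sketch; the function name is ours):

```python
def spearman_brown(r1: float, k: float) -> float:
    """Reliability of a test lengthened k-fold, given unit reliability r1 (formula 8.17)."""
    return k * r1 / (1 + (k - 1) * r1)

# Doubling a test whose single-unit reliability is 0.6
r_doubled = spearman_brown(0.6, 2)      # 1.2 / 1.6 = 0.75
# Quadrupling a test whose single-unit reliability is 0.5
r_quadrupled = spearman_brown(0.5, 4)   # 2.0 / 2.5 = 0.80
```

Lengthening a test of homogeneous items always pushes reliability upward toward 1, which is why (8.17) is so often used to predict the effect of adding items.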

If we talk about the specific methodological procedures for assessing test reliability adopted in psychometrics, two aspects of determining reliability are distinguished: 1) retest reliability and 2) one-time (single-administration) reliability.

Retest reliability is determined by repeating the measurement on the same sample of subjects, say at an interval of one or two weeks. If the measured property is sufficiently stable, as in the measurement of intellectual abilities, the test result should be reproduced to a sufficiently high degree. To evaluate the reproducibility of the result, the already familiar Pearson correlation coefficient can be used.

However, the question arises: how can the statistical reliability of this result be assessed? Clearly, it would be pointless in this case to advance the null hypothesis that the theoretical value of the correlation coefficient equals zero. After all, we are interested not so much in the variance of the measured characteristic itself as in the magnitude of the residual variance, which determines the accuracy of the measurement.

We know that the proportion of residual variance can be estimated as $1 - r^2$. This means that even if the correlation coefficient between two measurements is 0.9, the proportion of residual variance, and hence the share of measurement error, can reach about 20%. In other words, in this case roughly one fifth of the spread of test scores turns out to be determined by statistical error. The problem, however, is that even such a correlation, as a rule, cannot be achieved when validating psychodiagnostic techniques: the correlation coefficient between two measurements of the same characteristic is very rarely higher than 0.7-0.8.
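The arithmetic behind this estimate is trivial to check (values as in the text):

```python
r = 0.9                        # correlation between two measurements
residual_share = 1 - r ** 2    # share of variance left unexplained: 0.19
error_percent = residual_share * 100   # roughly one fifth of the spread
```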

One solution to this problem is to use a special corrective procedure for calculating the true score X'. Such a score can be obtained for the i-th subject as follows:

$$X_i' = \bar{X} + r_{tt}(X_i - \bar{X}),$$

where $X_i'$ is the corrected test score; $X_i$ is the empirical score of the i-th subject; $\bar{X}$ is the test mean over all subjects; $r_{tt}$ is the correlation coefficient between the two testings.
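The corrective procedure can be sketched as follows (the function name is ours; the numbers are invented for illustration):

```python
def corrected_score(x_i: float, mean: float, r_tt: float) -> float:
    """Regression-based corrected ('true') score: X' = mean + r_tt * (X_i - mean)."""
    return mean + r_tt * (x_i - mean)

# With test-retest reliability 0.8, an observed score of 120 on a test
# with mean 100 is pulled back toward the mean: 100 + 0.8 * 20 = 116
x_corrected = corrected_score(120, 100, 0.8)
```

Note the design of the correction: the lower the reliability, the more strongly an extreme observed score is shrunk toward the group mean.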

Another way to determine the reliability of a test is to evaluate its one-time reliability. This is the internal consistency of different versions of the test, commonly referred to as parallel forms. Such parallel forms need not exist as finished instruments: most often they are obtained by randomly splitting a composite test into two halves. The easiest way is to use the even- and odd-numbered test items as such random halves of the whole test. Testing thus occurs only once, which is especially important when the measured property is highly variable, such as the level of situational anxiety. A separate score is then computed for the even and for the odd half of the test, and the correlation coefficient between these halves is calculated. To assess the reliability of the whole test, the Spearman-Brown formula can be used; for two halves it looks like this:

$$r_{tt} = \frac{2r_{12}}{1 + r_{12}}.$$

It is clear that this reliability coefficient depends on which particular parts of the test the correlation is calculated between. A more dependable method is to calculate the synchronous reliability of the test, which takes into account the different possible ways of partitioning it.

In the simplest case, when the result for each test item is dichotomous (for example, the answers "yes" and "no"), an estimate of synchronous reliability can be obtained from the Kuder-Richardson formula. The result of the calculation by this formula is the coefficient

$$r_{KR} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{s_X^2}\right), \qquad (8.18)$$

where k is the number of items (tasks) in the test; $s_X^2$ is the variance estimate for the whole test; $p_j$ and $q_j$ are the frequencies of each of the two possible responses to the j-th test item.

In the general case, the Cronbach coefficient $\alpha$ is used to estimate synchronous reliability:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} s_j^2}{s_X^2}\right), \qquad (8.19)$$

where $s_j^2$ is the variance estimate for the j-th test item.
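Coefficient (8.19) can be computed the same way for items on any scale (the data are invented; sample variances are used throughout):

```python
import statistics

# Item scores (rows = subjects, columns = items); any scale, not only 0/1
scores = [
    [3, 4, 3, 5],
    [2, 2, 1, 2],
    [4, 5, 5, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 4],
]
k = len(scores[0])

# Variance of each item and of the total score
item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
total_var = statistics.variance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)   # formula (8.19)
```

For dichotomous items, $s_j^2$ reduces to $p_j q_j$ (up to the sample-size correction), so (8.18) is a special case of (8.19).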

Since in formulas (8.18) and (8.19) we are dealing with an estimate of the variance of the measurements performed, whose distribution, as is well known, follows the χ² law, the statistical significance of the coefficient α can be assessed by constructing a χ² statistic with n − 1 degrees of freedom, where n is the number of subjects.

Assuming homogeneity of the variance-covariance matrix of the individual test items, expression (8.19) can be represented by the Spearman-Brown formula for k measurements.

Now let us consider the reliability of individual test items. Such an assessment is similar to assessing the reliability of the test as a whole: it can likewise be retest or one-time.

When calculating the retest reliability of an item, the correlation coefficient is computed between the two results obtained on the same task by the same subjects. If the test uses dichotomous items ("solved" vs. "did not solve", "yes" vs. "no"), then for hand computation it is convenient to use the formula for the φ-coefficient (7.8) (see Subsection 7.1.2). The obtained value of the correlation coefficient can be evaluated against zero in accordance with the fixed linear model considered in Ch. 7. In practice, however, this is not sufficient, since a statistically significant correlation between the results of two testings may still leave an unacceptably large residual variance, which, as we know, determines the accuracy of measurement. Therefore, in the development and standardization of test procedures it is customary to discard items whose correlation between the two testings does not exceed 0.71, i.e., items for which the residual variance exceeds the 50% level.

The one-time, synchronous, reliability of items is estimated by calculating the correlation coefficient between a given item and the total score over all test items. Thus, the procedure for increasing the reliability of individual test items essentially reduces to increasing their internal consistency. However, an excessive increase in the internal consistency of individual test items can lead to a pronounced, but clearly undesirable, negative kurtosis in the distribution of test scores.

[...]
