Empirical test and examination of the test, Improvement of the...

Empirical test and examination of the test

The test, together with its attached specification table, is submitted for expert review. Subject teachers, methodologists, and educational psychologists can act as experts.

In its simplest form, the review asks experts to rate each task on a three- or five-point scale for:

- conformity with the testing purposes (conforms - partially conforms - does not conform);

- unambiguity of the wording (unambiguous - not entirely unambiguous - ambiguous);

- suitability of the answer options (suitable - partially suitable - unsuitable).

In the most general form, test tasks should:

- match the content of the training material;

- be written in accordance with the relevant rules.

Expert review of how well the test tasks meet these requirements should be carried out in two main sections. Both the individual tasks and the test as a whole are evaluated. Experts are given a special assessment sheet, the main content of which is given below (each item receives a positive or negative rating).

I. The rationale and operationalization of the test construct.

1. Does the test generally meet the goals set for the educational program?

2. Does the test cover the necessary didactic units of the discipline under study?

3. Are the behavioral indicators of the tested content clearly defined or unambiguously identified?

4. Is it possible to answer the test question by applying a different form of mental activity (ability, skill, mental action)?

5. Is the degree of difficulty of the test as a whole, its individual tasks and subtests substantiated?

II. The validity of test tasks and their evaluation.

1. Is the technological matrix prepared and does it reflect the tested content and types of cognitive activity of the student?

2. Do the form and variety of the test tasks correspond to the content?

3. Do the experts' answers match the test key?

4. Is there a grading scale for open-ended tasks?

5. Are incorrect answers accounted for?

After the tasks have been developed and reviewed by experts, a working version of the test is prepared. To ensure its quality, the test must be tried out empirically on a group of subjects with the same educational characteristics as the group for which the test is intended. In a number of cases, especially when a standardized test is being developed, the difficulty and discriminating ability of the test tasks are analyzed with mathematical methods, which yields information about the tasks that expert methods cannot always reveal.

Difficulty is a characteristic of a test task that reflects the statistical level of its solvability in the sample on which the test is empirically tried out.

In teacher-made tests, it is usually calculated as the ratio of the number of examinees who completed the task correctly to the total number of examinees. This indicator ranges from zero to one, and the easier the task, the higher its value.
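The ratio described above can be illustrated with a minimal sketch. The function name `difficulty_index` and the sample data are hypothetical and serve only as an illustration.

```python
def difficulty_index(responses):
    """Return the share of examinees who completed the task correctly.

    `responses` is a list of booleans (True = correct answer).
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


# Hypothetical data: 8 of 10 examinees solved the task.
task_responses = [True] * 8 + [False] * 2
print(difficulty_index(task_responses))  # 0.8
```

A value close to one indicates an easy task, and a value close to zero a difficult one.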

Empirical trialing of a criterion-referenced test (CORT) consists in selecting the tasks that adequately reflect mastery of the learning task. The quality of the tasks is not determined by whether they are difficult or easy, or by whether the results follow a normal distribution. If it is confirmed that most subjects who have completed a certain stage of training cope with a test task, while most untrained students do not, this can serve as a necessary basis for including the task in the CORT. Necessary, but not sufficient. The researcher must also make sure that the subjects who completed the tasks successfully really applied the skills specified in the criterion, rather than merely showing an ability to memorize the required terms or mechanically reproduce the required algorithms. Therefore, the analysis of a task in a CORT should focus on a thorough check of the task's composition, and not only on its statistical properties.

To calculate the discriminating ability of test items, two series of measurements are required: either one group of students is tested twice, or the test is given to two different groups. The expert selects from the examinees only those who can definitely be said to know the material very well and, conversely, those who do not know it. These form the strong and the weak contrast groups, respectively. It is important that the contrast groups be as homogeneous in composition as possible. This means that students of both sexes, with different socio-cultural characteristics and different academic standing (achievement), should be represented in approximately the same proportions.

The simplest and best-known index of discriminating ability ($D$) is calculated as the difference between the proportion of subjects in the "strong" group who completed the task correctly and the proportion of subjects in the "weak" group who also completed it correctly. It is calculated by the following formula:

$$D = \frac{n_1}{N_1} - \frac{n_2}{N_2}$$

where $D$ is the index of discriminating ability; $N_1$ and $N_2$ are the numbers of subjects in the "strong" and "weak" contrast groups; $n_1$ and $n_2$ are the numbers of subjects in the "strong" and "weak" contrast groups who completed the task correctly.

This index can take values from -1 to +1. If $D$ equals +1, the task has the maximum discriminating ability. If $D$ equals 0, the task does not distinguish between subjects who have mastered the teaching material and those who have not. If $D$ equals -1, which practically never occurs, the task does distinguish the subjects, but in the opposite way: those who have not mastered the subject content answer correctly, and those who have mastered it answer incorrectly.
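The difference of proportions between the contrast groups can be sketched as follows. The function name, parameter names, and sample counts are hypothetical and chosen only for illustration.

```python
def discrimination_index(strong_correct, strong_total, weak_correct, weak_total):
    """Difference between the shares of correct answers in the
    strong and weak contrast groups; the result lies in [-1, +1]."""
    if strong_total <= 0 or weak_total <= 0:
        raise ValueError("both contrast groups must be non-empty")
    if not 0 <= strong_correct <= strong_total:
        raise ValueError("strong_correct must be between 0 and strong_total")
    if not 0 <= weak_correct <= weak_total:
        raise ValueError("weak_correct must be between 0 and weak_total")
    return strong_correct / strong_total - weak_correct / weak_total


# Hypothetical data: 15 of 20 strong and 5 of 20 weak subjects solved the task.
print(discrimination_index(15, 20, 5, 20))  # 0.5
```

A task solved by every strong subject and by no weak subject gives +1, and a task solved equally often in both groups gives 0.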

An important part of the empirical trialing of a test is the evaluation of its reliability and validity. Reliability is usually understood as the consistency of test results obtained from the same sample of subjects under different conditions. Most often test-retest reliability is used: the same group of subjects is tested twice, and the two sets of scores are compared. Particular attention should be paid to choosing the time interval between the two testings correctly. If the interval is too short (less than two months), the subjects will simply remember their previous answers and most likely repeat them. If the interval is too long (more than 8 months), the measured property may change under the influence of various factors, and noticeable differences between the results of the first and second testings will then reflect not low reliability of the test but a change in the measured property.

The validity check is based on studying the learning objectives that students are expected to achieve. The learning objectives should also be the objectives of an achievement test with respect to the measured knowledge, abilities, and skills. This type of validity is called content validity. There are other kinds of test validity, but they are less well suited to this category of tests.
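The comparison of first and second test scores used for test-retest reliability can be sketched as follows. The text says only that the two sets of scores are compared. Using Pearson's correlation coefficient for that comparison is an assumption made here, although it is a common choice. The function name and the score lists are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in second)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each testing")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of the same five subjects on two testings.
first_testing = [12, 15, 9, 18, 14]
second_testing = [13, 14, 10, 17, 15]
print(round(pearson_r(first_testing, second_testing), 2))
```

Under this assumption, a coefficient close to +1 would indicate that the subjects keep roughly the same relative standing on both testings.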

The measure of a test's validity is the degree to which its tasks correspond to the goals of the topics, sections, and subject areas they test. Comparing the tasks of the test with the goals of studying that material, the test developer should determine, first, whether the tasks cover all the goals that are significant for training (so that no goal remains unmeasured), and second, whether the goals corresponding to each task are formulated so as to indicate what actions the student performing the task should carry out.

If, for example, the verb "identify" is used in the formulation of a goal, then the test task that measures achievement of this goal must likewise require the student to indicate the correct answer, as in a multiple-choice task. The verb "describe" would require a free-form answer, as in an essay test, and the verbs "demonstrate" or "build" would require actually performing these actions, at least on paper with a pencil. Valid tasks require the same actions that are specified in the goals they measure.

To be recognized as valid, test tasks must also match the content of the educational goal they measure. The fact, concept, or rule named in the goal should be what the test task measures, and this correspondence should be as exact as possible. If, for example, the goal says "describe a typical representative of the amphibian class", then the tasks should reveal how well knowledge about amphibians has been formed, and not knowledge about representatives of other zoological groups.

The question of how adequately a test task represents a specific content area has a qualitative rather than a quantitative answer. However, expert opinions on the content of the test can be based on some quantitative indicators. The following indicators are usually used:

- the percentage of tasks that meet the learning objectives;

- the percentage of tasks that meet goals rated as highly important;

- congruence (consistency and proportionality) of the tasks and objectives of the test.

The last indicator can be used to assess the degree of content validity of the developed test. The maximum possible congruence between a task and the tested goal, equal to 1.0, is obtained only if all the experts judged the task to be commensurate with the goal. If an individual task relates to different goals, its index will be below 1.0. After this empirical trialing, the test is ready for use.
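The quantitative indicators listed above can be sketched as follows. The text gives no formula for congruence, so this sketch assumes the simplest possible one: the share of experts who judged a task to match its goal, which equals 1.0 only when all experts agree. The function names, the `threshold` parameter, and the ratings are hypothetical.

```python
def task_congruence(expert_votes):
    """Share of experts who judged the task to match its goal (True = matches)."""
    if not expert_votes:
        raise ValueError("at least one expert vote is required")
    return sum(expert_votes) / len(expert_votes)


def percent_matching_tasks(votes_by_task, threshold=1.0):
    """Percentage of tasks whose congruence reaches `threshold`."""
    if not votes_by_task:
        raise ValueError("at least one task is required")
    matching = sum(
        1 for votes in votes_by_task.values()
        if task_congruence(votes) >= threshold
    )
    return 100 * matching / len(votes_by_task)


# Hypothetical ratings from four experts for three tasks.
ratings = {
    "task_1": [True, True, True, True],
    "task_2": [True, True, False, True],
    "task_3": [True, True, True, True],
}
for name, votes in ratings.items():
    print(name, task_congruence(votes))
print(percent_matching_tasks(ratings))
```

Lowering `threshold` would count tasks on which the experts only partly agree, so the value chosen should be stated alongside the result.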

Improving the test and analyzing the results of its application

The experience of testing, like any aspect of psychological practice, calls for analysis of its application. The individual provisions of the Code of Fair Testing Practices in Education, prepared by the Joint Committee on Testing Practices in the USA, show that work on improving the testing procedure and evaluating the diagnostic capabilities of a test continues after the test is put into circulation.

In accordance with the requirements of the Code, developers of such tests must:

- determine what each test measures and what it should be used for;

- accurately represent the characteristics, utility and limitations of the tests with respect to the assessment of what they are intended for;

- explain the measured concepts and skills at the required level of clarity and in detail appropriate to the audience for which they are intended;

- describe the process of developing the tests and explain on what basis the content and the evaluated skills were selected;

- provide clear evidence that the test measures what it is intended for;

- provide suitably qualified users with either samples of the tasks or complete sets of them, together with the required accompanying materials, including answer sheets, descriptions, and score reports;

- identify and publish information about any special skills needed to administer the tests and interpret the results.

When interpreting the indicators, the test developer must:

- provide the resulting scores promptly and in a readily understandable form, describing test performance with the necessary clarity and accuracy;

- describe the population represented by any norms or comparison group;

- provide information that helps users follow the necessary procedures for obtaining the required scores and judging their suitability for use.

It is important to retain the evaluation criteria used in the test until its next use. This will improve the assessment process and make it possible to compare students' achievements at successive stages of learning.
