## Standardized tests and informal performance tests

*Standardized tests* are published tests used to summarize achievement. Since such tests are administered in many schools throughout the country, their content cannot always match the curriculum as closely as teacher-made tests do. They are intended to measure achievement against an idealized national curriculum (often based on a synthesis of material from various textbooks). These tests are written by professional test developers, their items are tried out in advance, and the results are subjected to special analysis; as a result, such instruments are more reliable than tests compiled by teachers. In fact, their reliability is almost always between 0.90 and 0.97. Such tests are therefore accurate and consistent, and their results follow the normal distribution curve.

To obtain norms, the test developers give the published tests to a representative group of test takers, called the normalization group. Members of this group are selected to represent the students whose results will later be assessed by comparison with the results of the normalization group. The members of the normalization group therefore represent all ages, all levels of attainment, all parts of the country (both geographic regions and urban, suburban and rural areas), all ethnic groups and all socio-economic levels of the population, so that the resulting norms have the widest applicability.

The procedure for administering these tests is as standardized as the tests themselves (another reason why they are called standardized). The instructions for administering them are spelled out in detail.

Among the standardized tests of achievements, the following main subtypes can be distinguished:

- *general achievement batteries*, designed to measure the final results of mastering educational programs;

- *tests for specific subjects*, used in schools, colleges and universities to monitor the effectiveness of subject learning;

- *tests for monitoring educational outcomes*, the main purpose of which is to obtain information about learning results in different countries of the world for comparison at the international level.

General achievement batteries are broadly oriented tests used to assess attainment of the main, long-term learning goals. Such batteries were first developed at the beginning of the 20th century and are still used in US educational practice.

Most test batteries cover all levels of schooling and form a coordinated series of tests that allow students' results to be compared at different stages of training. One of the most famous batteries is the Stanford Achievement Test Series.

It is designed to measure the basic levels of reading, math, native language, natural and social sciences, and listening skills for elementary and middle school students in the United States.

The seventh and latest version of the battery is used in different educational groups, from kindergarten to 13th grade. The entire battery takes five half-hour and two 15-minute periods with short breaks between them.

The test battery consists of the following subtests:

*Vocabulary:* the subtest examines vocabulary through the oral presentation of unfinished sentences; the child is asked to choose the most appropriate word from those given.

*Reading Comprehension:* the child is asked to read a passage of prose or a poem and to answer a series of questions about each. To answer correctly, the student should be able to identify the main idea of the passage and the key points of the text, understand its implied meaning, and draw conclusions from what was read.

*Word Study Skills:* the student must pronounce visually presented individual letters and their combinations, and compose words from syllables.

*Mathematics Concepts:* the subtest examines understanding of mathematical terms, systems of notation and operations, e.g. fractions, sets, percentages, etc.

*Mathematics Computation:* the subtest assesses the ability to operate with numbers (no letter symbols are used).

*Mathematics Applications:* the subtest contains typical arithmetic problems, measurement and graph-reading tasks, etc.

*Spelling:* finding incorrectly written words.

*Language:* the subtest establishes the ability to use capital letters, verb and pronoun forms correctly, to build sentences properly, to observe punctuation rules, etc.

*Social Science:* requires completing tasks based on knowledge of history, economics, politics, sociology, etc.

*Science:* the subtest contains tasks that reveal knowledge of some research methods and terms from physics and biology.

*Listening Comprehension:* the subtest contains tasks on retaining and organizing information.

The latest edition of the test battery includes free-response tasks and an expanded set of multiple-choice tasks. This is done in order to measure higher-order thinking skills. Various methods of presenting test results are also used: the profile of scores obtained for individual subtests or for specific areas of study can be compared horizontally and/or vertically. Thus, the relative position of each student is evaluated against the results of a single sample, and the student's progress from grade to grade can be expressed in the units of a single score scale.

Among the test batteries, a special place is occupied by tests offered in graduating classes. This is primarily the SAT-I (*Scholastic Aptitude Test* or *Scholastic Assessment Test*), which is essentially a school final exam.

In 2005, the test was made somewhat more difficult. A written test (essay) was introduced. The new version of the SAT is called the SAT Reasoning Test. The mathematical part was expanded, and the linguistic part was renamed *Critical Reading* (in this case, text analysis).

The mathematical section of the test battery is based on algebra problems (functions, absolute value, radical equations, exponents) and some elements of geometry. There are three groups of test tasks: multiple-choice tasks, comparison tasks and free-response tasks. Multiple-choice questions are standard tasks: you have to solve the problem and choose the right answer from the proposed options. In comparison questions, two quantities are given, and it is necessary to decide how they relate: whether they are equal, or whether one is greater than the other. In terms of content, free-response tasks are similar to multiple-choice tasks: the problem must be solved, but no answer options are given, so the student produces the answer independently.

The linguistic part of the test includes the "Critical Reading" section. It consists of multiple-choice tasks (in every question one correct answer must be chosen from the five suggested). The section includes three groups of tasks. The first of these is "Reading Comprehension". Here students are offered one or two texts, followed by six to thirteen questions. These are questions about the main idea of the text, specific details, the author's attitude to the issue under consideration, the logic and technique of the author's presentation, the conclusions that follow from the discussion, and the meaning of individual words. Then follows the subtest "Analogies". This is a traditional form of test task: a pair of words is given, and the student must first determine the type of relationship between them and then find a similar or parallel relationship in another pair. In the third group of tasks, "Sentence Completion", it is necessary to fill in the gaps in a sentence by choosing the words or phrases that fit best from a grammatical and semantic point of view.

The writing section of the linguistic part of the test is divided into multiple-choice tasks and an essay. The multiple-choice tasks are aimed at checking knowledge of grammar and the ability to select words and phrases, and fall into three typical groups: 1) identifying sentence errors, where it is necessary to identify grammatical or syntactic errors or to establish that there are none; 2) improving a sentence to make it more coherent without changing its meaning; 3) improving paragraphs (six questions similar to the previous task, but with more detailed answers).

For the essay, an opening statement is given, and students are required to develop it, either refuting or supporting it. Of the 60 minutes allocated to this part of the test, the essay takes 25 minutes.

The total number of questions in the test battery is 138: 60 in mathematics and 78 verbal. For each question in a subtest, several answer options are offered, labeled A, B, C, D, E. The question numbers are printed on a separate answer sheet, with these letters next to each number. The test taker marks the letter that, in his or her opinion, corresponds to the correct answer. If no answer is marked, no point is awarded. If the answer is incorrect, 1/4 of a point is deducted. Such a scoring system allows even incomplete knowledge of a question to be taken into account.
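The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not an official scoring procedure: the function names `raw_score` and `expected_guess_value` are hypothetical, and the sketch assumes that an unanswered question simply contributes nothing to the score.

```python
from fractions import Fraction

PENALTY = Fraction(1, 4)  # deduction per wrong answer, as stated in the text


def raw_score(correct: int, wrong: int, omitted: int = 0) -> Fraction:
    """Raw score: +1 per correct answer, -1/4 per wrong answer.

    Assumption: omitted questions neither add nor subtract points.
    """
    if min(correct, wrong, omitted) < 0:
        raise ValueError("counts must be non-negative")
    return correct - PENALTY * wrong


def expected_guess_value(options_left: int) -> Fraction:
    """Expected gain from guessing at random among the remaining options."""
    if options_left < 1:
        raise ValueError("at least one option must remain")
    p_right = Fraction(1, options_left)
    return p_right - (1 - p_right) * PENALTY


print(raw_score(correct=40, wrong=8, omitted=12))
for k in (5, 4, 3, 2):
    print(k, expected_guess_value(k))
```

With all five options open, a blind guess has an expected value of zero, while every option the test taker can rule out makes a guess worth more on average. This is one way to read the remark that the system takes incomplete knowledge into account.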

The final score is read from a special conversion table. The table is constructed so that an applicant who correctly solves 90% of the tasks obtains the maximum number of points for a section (800). It should be noted that all scores can be expressed on a single scale for all levels of the battery. For each half of the test, verbal and mathematical, the predetermined mean (*M*) is 500 and the predetermined standard deviation (*S*) is 100. By taking the total result or both halves of the test together, we get *M* =

An alternative to the SAT Reasoning Test is the American College Testing (ACT) test for applicants. It has been used as a competitor to the SAT-I since 1959.

The standard version of the test consists of four sections: English, mathematics, reading and scientific reasoning. In 2004 an additional section, essay writing, was added; it is completed at the applicant's choice. Some US universities require the results of the extended version of the ACT, so in effect two versions of the test are in use: ACT and ACT Plus Writing.

The English section takes 45 minutes. During this time, you need to read five short texts and answer multiple-choice questions, 15 questions per text. The questions focus on correcting errors in the texts. The mathematics section takes 60 minutes and includes 60 questions on elementary algebra and trigonometry, geometry and arithmetic. Only simple calculators without computer functions may be used.

The reading section takes 40 minutes; in this time you need to read four excerpts from different books or magazines (prose, social sciences, art and science) and answer questions on each of them.

The scientific reasoning section takes 35 minutes; you need to read seven passages and answer 5-7 questions on each of them. The questions are aimed at finding logical connections, taking a critical approach to different points of view, predicting results, and understanding the basic concepts and theories presented in the texts.

The additional section, essay writing, takes 30 minutes. Test takers are given an excerpt presenting a social problem and must write a detailed commentary on it. A standard essay structure is not required.
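The timings quoted above can be added up with a short sketch. The dictionary layout and the function names are illustrative assumptions; the numbers are the ones given in the text, and the question count is left as `None` where the text does not state a single figure.

```python
# Section data as described in the text (minutes, number of questions).
SECTIONS = {
    "English": {"minutes": 45, "questions": 5 * 15},  # five texts, 15 questions each
    "Mathematics": {"minutes": 60, "questions": 60},
    "Reading": {"minutes": 40, "questions": None},     # count not given in the text
    "Scientific reasoning": {"minutes": 35, "questions": None},  # 7 passages, 5-7 each
}
ESSAY_MINUTES = 30  # optional essay section


def total_minutes(with_essay: bool = False) -> int:
    """Total working time, excluding breaks and instructions."""
    total = sum(section["minutes"] for section in SECTIONS.values())
    return total + ESSAY_MINUTES if with_essay else total


def seconds_per_question(name: str) -> float:
    """Average time available per question in a section with a known count."""
    section = SECTIONS[name]
    if section["questions"] is None:
        raise ValueError(f"question count for {name!r} is not given in the text")
    return section["minutes"] * 60 / section["questions"]


print(total_minutes(), total_minutes(with_essay=True))
print(seconds_per_question("English"), seconds_per_question("Mathematics"))
```

On these figures the four standard sections add up to three hours of working time, and the English section leaves noticeably less time per question than the mathematics section.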

The whole ACT exam thus consists of four tests (mathematics, reading, English and science). Each of them is scored on a scale from 1 to 36. As noted above, the mathematics test includes 60 questions and takes 60 minutes to complete.

While general achievement batteries are aimed at measuring basic educational skills, standardized tests for specific subjects measure the level of achievement in the subject areas studied at school and in college. The use of such tests has increased significantly because modern students specialize in the study of specific scientific disciplines.

A special place in the American testing system is occupied by the SAT-II test battery (*Subject Tests*). It was introduced into the practice of testing the knowledge and skills of school leavers in 1994 and includes subject tests in all disciplines of the natural-mathematical and humanities cycles, including foreign languages, both European and Asian. As a rule, taking the SAT-II is voluntary, although a number of colleges and universities require it.

For example, the SAT-II test on World History includes 95 questions on the history of Africa, Asia, North and South America, and Europe. The history of the USA is not included here; it has a separate test. Applicants should be able to analyze excerpts from politicians' speeches and various documents, and are expected to be familiar with the main trends in art and material culture. The answers require knowledge and understanding of both the facts of the past and present-day events, cause-and-effect relationships, and the main trends in the development of world history.

In the US, standardized tests have been designed for almost every subject, from history to physical education. Such narrowly oriented tests are used as final exams for a particular course of study. They also perform an equally important function: identifying strengths and weaknesses in the mastery of subject-specific skills and knowledge. Secondary school students with additional training in certain fields of knowledge can be tested under the advanced placement program adopted by the College Board.

A special place among standardized achievement tests is occupied by *basic skills assessment tests*, used as a means of confirming the educational minimum and as a basis for issuing a secondary school certificate. The need to develop and apply such tests in the US was prompted by the report "A Nation at Risk", presented by the National Commission on Excellence in Education. It argued that the American nation faced an immediate threat of a decline in its overall educational level. In this regard, the authors of the report insisted on the introduction of a minimum test standard, a common test package, on the basis of which the minimum level of school achievement would be established.

Standardized basic skills tests (*Minimum Competency Tests*) are developed for both schoolchildren and adults in connection with the implementation of educational programs in institutions of a particular type (for example, in prisons) and to determine readiness for vocational training programs.

To assess the achievement of standards in mathematics, reading and English, *tests of minimum competence* are used. In 1992 in the US, they were used as mandatory tools for monitoring, at various levels, the learning achievements of individual students and of the class as a whole, and for comparing schools. A typical mathematics test for the 6th, 8th and 10th grades consists of 70 multiple-choice tasks, divided into two parts of 40 minutes (35 tasks in each part). Students' results are reported on a four-level scale: mastered with excellence (level 4), mastered (level 3), not mastered (level 2) and has significant problems (level 1). To reach level 3, students must complete 70% of the tasks. Students who have not mastered the course are usually sent to summer schools, where special classes are organized for those who have fallen behind. Sometimes such students are held back to repeat the year.

Tests of minimum competence are now widely used in US schools. They evaluate not educational achievement in general but the student's competence in the context of a decision (based on the test results) about the student's future path: study at the next stage of school or a transition to professional activity. The expression *minimum competence* means that what is evaluated is not every level of educational preparation, up to the highest possible, but only the level that is necessary and sufficient to move on to the next stage or to start a professional activity.

Specific training on the formation of basic skills tested in the framework of tests of minimum competence can significantly limit the learning process and reduce attention to the formation of other planned learning outcomes. Therefore, the use of full-scale achievement tests that evaluate the entire range of skills at different levels is, according to N. Grönlund, a measure that partially but not completely can solve this problem.

Despite the criticism of the tests of minimum competence, they are the main tool that assesses the achievement of standards developed in the US in different states. These tests are conducted in almost all subjects. In addition to tests of minimal competence, students can perform an advanced level test. In the certificate of graduation, a record is made of the level at which the subject is mastered - at the level of basic skills or an elevated level.

In England, a unified system of subject testing was built in the middle of the 20th century. At present, at the end of the first three key stages, i.e. at the ages of 7, 11 and 14, the level of achievement in core subjects is assessed. To do this, students take the so-called standard assessment tasks (*National Curriculum Tests*). After completing the fourth key stage, i.e. at the age of 16, students take the so-called GCSE exam (*General Certificate of Secondary Education*). Based on its results, a general certificate of secondary education is issued. In addition, the achievement level of children entering school is measured, and this serves as a baseline for their further education.

For example, the tests in English taken as part of the examination for secondary education consist of two parts. The first part takes 90 minutes and the second 75 minutes. The first part of the test checks reading and writing skills. First, having read a text, students should find the relevant information in it, demonstrate understanding of the text, and analyze it in terms of the techniques used by the author, the language or the structure of the text (the maximum score is 17 points). Then the students are offered a situation (for example, to imagine themselves as a museum director), on the basis of which they should prepare a piece of writing (a letter to school principals asking them to visit the museum more often). This assesses their communicative skills: how students express and organize their thoughts in writing, as well as their spelling and punctuation. Students must complete three tasks: the first is mandatory for all (11 points), the second is chosen from two proposed (33 points). The second part of the test checks reading and writing skills on the basis of one of Shakespeare's plays (Julius Caesar, A Midsummer Night's Dream or Romeo and Juliet), and one task is chosen from the two offered. For this part of the test (one task), students can receive a maximum of 22 points.

An additional advanced test (90 minutes) evaluates the same skills, but at a higher level. This test is offered on another day, and only to those students who showed high results on the first two parts. Students must complete two tasks (one mandatory for all, worth 18 points, and a second chosen from two, worth 18 points). All tasks in all booklets in all subjects require a self-written response. All work is marked by a specially trained group of reviewers, and if their opinions diverge, they refrain from assigning a mark.

Standardized tests are also part of monitoring programs aimed at identifying and comparing changes in the level of interdisciplinary general educational skills among students from different countries. The most well-known *international monitoring tests* are PISA (*Programme for International Student Assessment*) and TIMSS (*Trends in International Mathematics and Science Study*).

The PISA monitoring study assesses the general educational skills, interdisciplinary in nature, of schoolchildren who have reached the age of 15. The characteristics of students that determine their ability to learn (motivation, self-esteem, learning strategies, etc.) are also studied.

The tests used in PISA are grouped into the following sections, which make it possible to identify competences that are relevant to modern people.

1. Mathematical literacy is a person's ability to define and understand the role of mathematics in the world in which he lives, to express well-grounded mathematical judgments and to use mathematics in order to satisfy in the present and the future the needs inherent in a creative, interested and thinking citizen.

2. Natural science literacy is the ability to use natural science knowledge, identify problems and draw valid conclusions necessary for understanding the surrounding world and the changes that human activity brings to it, and for making appropriate decisions.

3. Reading literacy is the ability to comprehend written texts and reflect on them, to use their content to achieve their own goals, develop knowledge and opportunities, and actively participate in society. In this case, the assessment is not the technique of reading and the literal understanding of the text, but the understanding and use of the read for various purposes.

4. Competence in solving problems is the ability to use cognitive skills to solve interdisciplinary real problems in which the method of solution at first glance is not clearly defined.

Each student receives a test booklet containing roughly 42 to 62 closed and open tasks, in about equal numbers. The testing time is 120 minutes.

Tasks with free, extended answers are marked by a group of experienced teachers, and then part of the work (every fourth booklet) is rechecked by another group of teachers. After that, a selection of the test booklets is rechecked by international experts. Tasks for which experts in the participating countries give inconsistent marks are excluded from the analysis. To control the quality of marking, a certain part of the students' work is rechecked within the country and then checked again by specialists from other countries.

The reliability of the verification for each country is given in the annual technical reports. Usually the reliability of verification results in international comparative studies is quite high and is 0.8-0.95 for most assignments for all countries.

Performance on the test tasks (and on the individual questions within them) is reported on an international 1000-point scale for each task group (reading, mathematics and science), depending on how successfully each task was performed by all those tested. The international scale has the following characteristics: the mean is 500 points and the standard deviation is 100, which means that about 2/3 of the students in all countries participating in the study have results between 400 and 600 points. With some degree of probability, we can assume that each test taker's score shows which tasks (the most difficult ones) that student can perform. The mean score for each country shows which tasks (the most difficult ones) the average student of that country is most likely to perform. This mean score is defined as the mean of the normal distribution curve of the results of that country's students.
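The "about 2/3" figure follows from the normal distribution and can be checked numerically. The sketch below assumes the results are exactly normally distributed with mean 500 and standard deviation 100; the helper names `normal_cdf` and `share_between` are illustrative.

```python
import math

MEAN = 500.0  # mean of the international scale
SD = 100.0    # standard deviation of the international scale


def normal_cdf(x: float, mean: float = MEAN, sd: float = SD) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_between(low: float, high: float) -> float:
    """Share of test takers expected to score between low and high."""
    return normal_cdf(high) - normal_cdf(low)


print(f"{share_between(400, 600):.1%}")  # within one standard deviation of the mean
print(f"{share_between(300, 700):.1%}")  # within two standard deviations
```

Under this assumption roughly 68% of results fall within one standard deviation of the mean, which is the "about 2/3" quoted in the text.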

For example, the PISA mathematical literacy test measures mathematical competence: the most general mathematical abilities and skills, including mathematical thinking, written and oral mathematical argumentation, the formulation and solution of problems, mathematical modeling, the use of mathematical language, and the use of modern technology (for example, information technology). The test includes 16 tasks (32 questions).

The test takes into account three levels of mathematical competence:

- low level (the first stage of the results, which is estimated by the number of points within 358-420 and below) includes the reproduction of mathematical facts, methods and calculations;

- the average level (the second and third steps of the results, which are estimated by the number of points, respectively in the range 421-482 and 483-544) - is the establishment of links and the integration of material from different mathematical topics needed to solve the task;

- a high level (the fourth, fifth and sixth steps of the results, which are estimated by the number of points, respectively within 545-606, 607-668 and 669 and above) - mathematical reflections that require generalization and intuition.
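The score bands listed above translate directly into a lookup. This is a minimal sketch: the function names `stage` and `competence_level` are hypothetical, and the sketch assumes that any score of 420 or below counts as the first stage, following the "and below" wording in the text.

```python
from bisect import bisect_right

# Lower bounds of stages 2 to 6, taken from the bands in the text.
STAGE_LOWER_BOUNDS = [421, 483, 545, 607, 669]


def stage(score: float) -> int:
    """Return the result stage (1 to 6) for a score.

    Assumption: every score of 420 or below is treated as stage 1.
    """
    return 1 + bisect_right(STAGE_LOWER_BOUNDS, score)


def competence_level(score: float) -> str:
    """Map a score to the low / average / high level of competence."""
    s = stage(score)
    if s == 1:
        return "low"
    if s <= 3:
        return "average"
    return "high"


for points in (400, 421, 500, 545, 669):
    print(points, stage(points), competence_level(points))
```

Storing only the lower bounds keeps the table short and makes it clear that the bands are contiguous, so each score belongs to exactly one stage.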

To test achievement of the first level of competence, mainly traditional textbook tasks are offered, and the second level is checked with simple real-life tasks. For the third level, more complex tasks are developed in which the student must first recast the proposed real-life situation as a problem that can be solved by mathematical means, then develop the corresponding mathematical model and solve the problem using mathematical reasoning and generalization.

*Example*

A task that corresponds to a high level of competence. In one country in 2003, $30 million was allocated from the national budget for defense. The total budget of the country for that year was $500 million. The following year, $35 million was allocated for defense, with a total budget of $605 million. Inflation over these two years was 10%.

1. You are invited to give a lecture in the society of pacifists. You intend to show that the defense budget has declined during this time. Explain how you do it.

2. You are invited to give a lecture at the military academy. You intend to show that the budget for defense has increased over this period. Explain how you do it.
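One possible line of argument for each lecture can be worked out numerically. The sketch below is only an illustration, not an official answer key; it assumes that the 10% inflation applies between the two budget years, and the helper names `share` and `growth` are hypothetical.

```python
DEFENSE = {2003: 30.0, 2004: 35.0}  # defense spending, $ million (from the task)
TOTAL = {2003: 500.0, 2004: 605.0}  # total budget, $ million (from the task)
INFLATION = 0.10                    # assumption: applies between the two years


def share(defense: float, total: float) -> float:
    """Defense spending as a fraction of the total budget."""
    return defense / total


def growth(old: float, new: float) -> float:
    """Relative change from old to new."""
    return new / old - 1.0


# Pacifist argument: defense takes a smaller share of the budget.
share_2003 = share(DEFENSE[2003], TOTAL[2003])
share_2004 = share(DEFENSE[2004], TOTAL[2004])

# Military academy argument: spending rose in absolute terms,
# and still rose after correcting for inflation.
nominal_growth = growth(DEFENSE[2003], DEFENSE[2004])
real_growth = growth(DEFENSE[2003], DEFENSE[2004] / (1.0 + INFLATION))

print(f"share of budget: {share_2003:.2%} -> {share_2004:.2%}")
print(f"nominal growth: {nominal_growth:.1%}, after inflation: {real_growth:.1%}")
```

Both conclusions rest on the same data. The share of the budget falls from 6% to just under 5.8%, while the amount itself grows by about a sixth in nominal terms. The task tests the ability to choose the indicator that supports a given position.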

Accordingly, the PISA material set includes tests of mathematical, reading and natural science literacy, and tests of competence in applying subject knowledge to solve everyday problems. This last component of the PISA program was given the English name CCC (*cross-curricular competencies*).

The test tasks for assessing competence of this kind pose three types of problem: a) finding a solution (for example, choosing from a list of painkillers the one most suitable for a particular patient); b) system analysis and design (for example, buying a rack for compact discs: you need to produce a proposal that clearly describes how such a rack should be arranged so that the required disc can be found quickly and easily); c) troubleshooting, i.e. searching for an error or defect (for example, the operation of an air pump: on the basis of the proposed drawing or sketch, indicate the possible reasons why the pump is functioning incorrectly). In all competency tests, as in the mathematical test, three levels are identified.

To monitor educational results, the international comparative study TIMSS is also conducted; it identifies features of the preparation of primary school graduates (TIMSS-4) and of 8th-grade graduates (TIMSS-8). Not only the elements of subject content that have been mastered are studied, but also the types of educational and cognitive activity.

The monitoring program includes achievement tests (12 variants) and questionnaires for students, for teachers of mathematics and natural science subjects, and for the school administration. For example, the mathematics tasks cover the following topics: "Numbers", "Algebra", "Measurements", "Geometry", "Working with data". The following skills are assessed: knowing facts and methods (reproducing, recognizing, calculating, using tools), using concepts (classifying, representing, formulating the conditions of a task or situation, distinguishing), solving standard (typical) problems (choosing a solution method, creating a model, interpreting, applying, checking), and mathematical reasoning (putting forward hypotheses, making assumptions and forecasts, analyzing, evaluating, generalizing, establishing connections, solving non-standard problems, proving, etc.).

To assess educational achievement in mathematics and science, tasks of various types are used (multiple-choice tasks, open tasks with a short answer or a full extended answer, practical tasks). The tests used in TIMSS make it possible to compare the level of educational achievement of students in primary and basic schools of different countries, and to reveal the changes in the quality of mathematics and science education that occur during the transition from primary to basic school (the same cohort of students is surveyed: in 4 years, students in the final grade of primary school become pupils of the 8th grade).

The variety and dynamism of changes in the daily practice of educational institutions encourage the use of tests as a quick way of measuring the educational goals achieved. Standardized tests are cumbersome and, as a rule, are not adapted to local educational groups, so they cannot meet the teacher's needs.

In English-speaking countries, informal tests are called teacher-made tests, or *teacher tests*. The most important feature that distinguishes them from formalized, standardized tests is that they are created by teachers themselves. The methodological tasks that accompany the construction of these tests are determined by the teacher's professional preparation, the technical means at the teacher's disposal, and the time available. Another important feature that distinguishes informal tests from standardized ones is their orientation toward the pedagogical needs of one class or a few classes.

The costs of creating informal tests are insignificant. Validating them does not require mandatory interregional research, since teacher tests are locally used tools. The variety of programs for studying the same subject in different classes and schools does not always allow standardized tests to be used.

Teachers themselves decide on the function of the test, as well as on the possibilities of its application in the learning process. Comparing the level of mathematical knowledge among students studying under the traditional program, with a group of schoolchildren who are studying an alternative training course, determining the initial level of training for entering universities or establishing difficulties in mastering new sections of the school curriculum - all these tasks require the development of special tests.

Teachers also determine on their own who will be tested. For example, if it is necessary to form a group of students for corrective lessons in the mother tongue, the teacher can separate the strong students, and carry out the necessary tests only with the rest of the class.

The creation of tests is usually preceded by observation and analysis of the ways in which students typically master the teaching material. Difficulties in performing tasks are recorded and classified and, in some cases, can take the form of "diagnostograms". Research work of this kind brings teacher tests closer to psychological criterion-referenced tests.

The spread of teacher tests helps to acquire the skills of more detailed curriculum planning, and many traditional written works and verbal tests will give way to objective methods of measuring academic achievement.

Preparing tests for working in the classroom can be greatly improved if you take advantage of the experience of professional test creators.

