Validity and reliability of assessment in medical education

Assessment has become a hallmark of the quality of any educational system, and with a growing understanding of learning and developments in the field of psychometrics, assessors and test developers are being held accountable for the inferences that are made on the basis of assessment scores. This has led to validation exercises in assessment, and all educational assessors have to consider validity sooner or later in their work.

Determining the validity and reliability of assessments has been the mainstay of these validation exercises. However, with advances in educational psychology and learning theories over the last sixty years, the concept of validity has broadened [1, 2, 3], and the research question for determining validity has shifted from "How valid is the instrument?" to "Is the inference made on the basis of this instrument valid for the group of people for whom it is being made and for the purpose of the assessment results?"

The purpose of this paper is to present the current definition and the sources (aspects) of validity and reliability vis-à-vis the medical education literature published in peer-reviewed journals and textbooks, followed by a detailed discussion of one of the areas of validity evidence, namely predictive validity, and its utility for admission tests and processes in medical education. The paper is organized into the following sections:

Validity and sources of validity evidence

Threats to validity

Reliability and factors influencing reliability

Predictive Validity of admission tests

Discussion and conclusion

Definition of Validity

According to Van der Vleuten and Schuwirth [4], validity refers to whether an instrument actually measures what it is supposed to measure. This means that it is essential for the person developing the assessment instrument to make sure that all items of the instrument are appropriate for the purpose of measurement (assessment). Thus, by and large, the validity of an assessment method depends on the "intrinsic meaning" of the items that make up the instrument, which includes the content and the cognitive process that the assessment is trying to gauge [1, 3, 4, 5].

Downing [6] adds to the understanding of the concept of validity by maintaining that validity is not a yes-or-no decision; rather, it is the degree to which evidence and theory support the interpretations of test scores for the proposed uses of tests. This lends itself to the need for a theoretical basis for interpreting the results of the test and gives due importance to the process of validating against some theory or hypothesis. Thus validity is not a quality of the instrument in itself but refers to the evidence presented to support or refute the meaning or interpretation assigned to assessment results for the specific group of test takers and the purpose of the test.

Sources of validity evidence:

According to the present understanding of the concept of validity, a chain of evidence is required to support the interpretations that are made on the basis of the test score [1, 2]. The evidence helps link the interpretation thus made to a theory, hypothesis and logic, leading to either accepting or rejecting the interpretations. Sources of evidence [1] include i) evidence of the content representativeness of the test materials, ii) the response process, iii) the internal structure of the assessment, including the statistical characteristics of the examination questions, iv) the correlation of assessment scores with scores on another variable (the criterion measure) and v) the consequences of assessment results for students.

The standards [7] recommend using a variety of sources, since strong evidence from one source does not preclude the need to seek evidence from other sources. Some types of assessment demand a stronger emphasis on one or more sources of evidence than on others, and not all sources of evidence are necessary for all assessments. These sources of evidence are briefly discussed below [1, 7].

1. Evidence for content validity is obtained from the test blueprint or test specifications, which ideally describe the categories and sub-classifications of content and specify precisely the proportion of test questions in each category and the cognitive level that those questions are expected to assess. The test specifications reflect the emphasis placed on content according to how essential and/or important it is for the level of student being assessed and the desired level of cognitive ability. Therefore, when checking for validity evidence, the researcher compares the level of cognitive ability presumably assessed by the questions included in the test with the desired level as specified. The number of questions and their technical appropriateness also provide evidence for content-related validity. Hence, validation by subject experts and a quality check by technical experts are both needed to provide evidence of content validity.

2. Evidence about the response process is gathered by showing that all sources of error that might arise from the administration of the test are minimized as far as possible. This includes evidence regarding the accuracy of response keys, quality control mechanisms for data from the assessments, the appropriateness of the methods used to obtain a composite score from scores received from different types of assessments, and the usefulness and accuracy of the score reports provided to examinees.

3. Evidence for internal structure rests on the statistical relationships between and among other measures of the same or different but related constructs or traits. The psychometric characteristics required as evidence under this heading include difficulty and discrimination indices, reliability and/or generalizability coefficients, etc. High reliability coefficients indicate that if the test were repeated over time, examinees would obtain about the same scores on retesting as they received the first time. This aspect is dealt with in more detail in the section on reliability later in the paper.

4. Evidence regarding the relationship of assessment scores to scores on another variable (the criterion measure) requires the test to be 'validated' against an existing, older measure with well-known characteristics; that is, it concerns the extent to which scores obtained on one test relate to performance on a criterion, which is usually another test. The two tests can be administered in the same time period (concurrent validity), or the second may be administered at some future time (predictive validity) [8].

Concurrent validity is determined by establishing a relationship (correlation) between the score on the new test and the score on an old test (whose validity is already established) given in the same time frame. If the correlation coefficient is high, that is, close to +1.0, the test is said to have good concurrent validity. Predictive validity, on the other hand, is the degree to which a test can predict how well a person will perform on a criterion measure in the future. This criterion measure can be a test, for example a standardized licensing examination, or a performance measure such as patient satisfaction ratings during practice [10].

If the scores do not correlate, this indicates that one test is measuring one construct while the other test is measuring another; that is, they are measuring distinct constructs. This lack of correlation provides evidence of discrimination, which is desirable if the two tests claim to measure distinct constructs, whereas scores from two instruments that claim to measure the same construct should correlate with each other, providing convergence evidence to support the validity of interpretations of scores from both instruments [10].

5. Evidence about the impact of assessment on examinees, or evidence of the consequential validity of the instrument, seeks to understand the decisions and outcomes made on the basis of assessment scores and the impact of assessments on teaching and learning. The consequences of assessments for examinees, faculty, patients and society are great, and these consequences can be positive or negative, intended or unintended.
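The criterion-related evidence described under item 4 above is usually summarized as a correlation coefficient between test scores and criterion scores. The following is a minimal sketch of that calculation, assuming a Pearson correlation is the coefficient of interest; the function name and the score lists are hypothetical and are used only for illustration, not taken from any of the studies cited.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: admission test scores and a later criterion measure
# (for example a licensing examination) for the same eight students.
admission = [52, 61, 58, 70, 66, 75, 49, 80]
criterion = [190, 205, 200, 221, 210, 230, 188, 236]

r = pearson_r(admission, criterion)
print(f"validity coefficient r = {r:.2f}")
```

If both sets of scores are collected in the same time frame, the coefficient serves as concurrent validity evidence; if the criterion is collected later, it serves as predictive validity evidence.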

Threats to validity evidence

According to Downing [9], validity faces two major threats: construct under-representation (CU) and construct-irrelevant variance (CIV). CU can be due to under-sampling (few questions, few stations, few observations), biased sampling, a mismatch of the test to the domain, or low reliability of scores and ratings. CIV refers to systematic error introduced by variables unrelated to the construct being measured. It can arise from flawed items; items that are too easy, too hard or non-discriminating; cheating; flawed checklists or rating scales; variability in the performance of standardized patients (due to poor training); systematic rater error; an indefensible passing score; or poorly trained assessors.

Reliability: Definition

According to Classical Test Theory (CTT), reliability is defined as the ratio of true score variance to observed score variance and is represented by reliability coefficients [8]. In CTT the observed score is a composite of the true score and error. Reliability coefficients are thus used to estimate the amount of measurement error in assessments and are generally expressed as a coefficient ranging from 0 (no reliability) to 1 (perfect reliability). Low reliability means that the error component of the assessment is large and hence the results carry little value. Although higher reliability is always preferable, there is no fixed threshold to discriminate "reliable" from "unreliable" scores. Often 0.80 is regarded as the minimal acceptable value, although it may be lower or higher depending on the examination's purpose. Reliability can be negatively affected by many sources of error or bias; however, adequate sampling takes account of the undesirable sources of variance and increases reliability [10].
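The CTT definition above can be written compactly as follows. The symbols are chosen here for illustration (X for the observed score, T for the true score, E for error), and the variance decomposition relies on the usual CTT assumption that true scores and errors are uncorrelated.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2

\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

On this reading, a coefficient of 0.80 means that 80% of the observed score variance is attributed to true score differences and 20% to measurement error.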

A predominant condition that affects the reliability of assessment is domain- or content-specificity, since competence has been shown to be highly dependent on context and content [6, 7]. In the light of these findings, reliable scores can be achieved only if the content of the subject to be tested is broadly sampled. This has led assessment in medical education to move away from open-ended essay questions, long cases and a limited number of short cases towards multiple choice questions, objective structured clinical examinations (OSCEs) and multiple assessments of clinical performance, since all of these provide opportunities to assess students on a larger sample of test items. The amount of time spent on assessment also influences reliability, since larger samples of performance can be gathered. Other factors that affect reliability are the involvement of a larger number of examiners and (standardized or real) patients, which increases the chances of variability from student to student and thereby affects the reliability of such assessments. Examiner training, increasing use of trained standardized patients and sampling across different health problems are measures taken to improve the reliability of scores in the assessment of medical students at both undergraduate and postgraduate levels. Recent studies have confirmed that sampling is the main factor in achieving reliable scores with any instrument [10].
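One way to illustrate how broader sampling raises reliability is the Spearman-Brown prophecy formula from classical test theory. The sketch below assumes that the added items, cases or stations are comparable (parallel) to the existing ones, which real assessments only approximate; the function name and the starting reliability of 0.50 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    reliability   -- reliability of the current test (0 to 1)
    length_factor -- new length divided by current length (e.g. 2 = doubled)
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a short examination with reliability 0.50
for k in (1, 2, 3, 4):
    print(k, round(spearman_brown(0.50, k), 2))
```

Under these assumptions, quadrupling the sample of items raises a reliability of 0.50 to 0.80. The same formula with a length factor of 2 is the one used to step up a split-half correlation to the reliability of the full test.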

Types of reliability

There are many types of reliability estimates, and it is the specific purpose of the assessment that dictates the type of reliability estimate that is of greatest importance. The various types of reliability estimates include [8, 10, 11]:

Test-retest: the same test is administered to the same examinees on two occasions and the two sets of scores are correlated.

Split-half: assuming that a test measures a single construct, if the test is split into two halves, the items in one half should correlate with those in the other half. This gives the reliability of only half of the test, so the Spearman-Brown prophecy formula should be applied to obtain the reliability of the entire test.

Internal consistency: estimates the reliability from all possible ways of splitting the test into two halves. This is Cronbach's alpha coefficient, which can be used with polytomous data (0, 1, 2, 3, 4, ... n) and is the more general form of the KR-20 coefficient, which can be used only with dichotomously scored items (0, 1), as typically found on selected-response tests.

Inter-rater reliability: determined using the kappa statistic, which takes into account the random-chance occurrence of rater agreement and is therefore sometimes used as an inter-rater reliability estimate, especially for individual questions rated by two independent raters.

Generalizability coefficient: generalizability theory (GT) can estimate variance components for all the variables of interest in the design: the persons, the raters and the items.
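Two of the estimates listed above can be sketched in a few lines of code. The following is a minimal illustration, assuming the standard textbook formulas for Cronbach's alpha and for Cohen's kappa with two raters; the function names and all data are hypothetical.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per examinee, one column per item."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 examinees")
    # Variance of each item across examinees, and variance of total scores
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal proportions
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical item scores (0-4) for five examinees on four items
item_scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 2, 3],
]
print(f"alpha = {cronbach_alpha(item_scores):.2f}")

# Hypothetical pass/fail decisions by two independent raters
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")
```

With items scored only 0 or 1, the alpha calculation corresponds to the KR-20 case. Generalizability coefficients require variance component estimation and are not covered by this sketch.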

Issues of validity and reliability with respect to medical education

Schuwirth and Van der Vleuten [5], in a critical review of validity and reliability, are of the view that although the psychometric paradigm of looking at assessment has provided tools such as reliability and validity to ensure and improve the quality of assessment, it is of limited value in the light of current developments in assessment. An important consequence of the shift in perspective on reliability (increased sampling being more important than standardization) is that there is no need to exclude from our assessment methods instruments that are rather more subjective or not perfectly standardized, provided that we use those instruments sensibly and expertly. This has changed the way we think about assessment in medical education. The pursuit of structured and standardized instruments took us away from real-life settings into constructed environments such as the OSCE; we are now moving back to assessment methods that are more traditional though less structured and standardized, provided adequate sampling is performed to ensure the reliability of measurements. This view is now popular with assessors since it lends more credibility to work-based or practice-based assessment tools, which may not be highly standardized but are much more authentic. A summary of important points to be considered when evaluating instruments for validity evidence and reliability estimates is given below [11].

Validity is based on a theory or hypothesis, and all sources of validity evidence contribute to accepting or rejecting the hypothesis.

Validity is a property of scores and score interpretations, not a property of the instrument itself.

A broader variety of validity evidence should be sought, with greater attention to the categories of relationship to other variables, consequences and response process.

Instruments using multiple observers should report inter-rater reliability.

Predictive validity of admissions testing in medical education

The main purpose of conducting selection tests is to choose, from a pool of applicants, those who are most suited to the course of study or to practicing the profession. In medical education this means that the entrants selected for admission to medical school or residency programs show a readiness for medical education programs and have the right kind of attributes, on the assumption that the students selected will stay in the program rather than leave it and, on graduation, will practice medicine with professionalism. Thus national and institutional examining bodies in medical education that are responsible for developing and conducting admission tests have to demonstrate the predictive value of these examinations to society, and therefore require selection criteria that are evidence-based and legally defensible. The variables generally assessed during the admission process include cognitive abilities (knowledge), skills and non-cognitive characteristics (personal attributes). Assessment of knowledge at entry to medical school has been used in many countries for a long time.

In a review of the English-language literature for studies on the validity evidence for selection tests, the largest number of studies available is from North America, especially the USA, which has more than eighty years of history of a centralized admission test in medical education. A few studies are also reported from the United Kingdom and Australia. Three studies were found from South Asia. In both the United States of America (USA) and Canada, admission and licensing examinations have been extensively studied for their ability to predict performance during medical school, in licensing examinations, during residency training and in specialty (Board) examinations.

The components of knowledge and skills tested differ along the continuum of medical education. Undergraduate grade point average (UGPA) and Medical College Admission Test (MCAT) scores are used for selection into medical schools, while scores on the United States Medical Licensing Examination (USMLE), taken by international medical graduates, and the National Board of Medical Examiners (NBME) examinations, taken by graduates of US medical schools, together with medical school GPA and performance on assessments during the medical school years, are used for selection into residency programs. I will review relevant studies under separate headings for medical schools and residency programs.

Studies on medical school admission exams:

Predictive validity of tests of Cognitive ability

Basco [12] examined the contribution of an undergraduate institutional measure to predicting basic science achievement in medical school. The undergraduate institutional measure was computed by averaging the MCAT scores attained by all students of an institution between 1996 and 1999. The researcher found a moderate correlation between undergraduate science GPA (SciGPA) and individual MCAT scores and between SciGPA and USMLE Step 1 scores. The correlation between individual MCAT scores and USMLE Step 1 was higher than that between the institutional MCAT score and USMLE Step 1. Jones et al [13], studying the predictive validity of the MCAT, reported that MCAT scores have significant predictive validity for first- and second-year medical school course grades and NBME Part 1 examination scores. Swanson et al [14] examined the predictive validity of the previous and current MCAT for USMLE Step 1 and did not find much difference between the two forms. Vancouver et al [15] assessed the use of MCAT scores and undergraduate GPA for predictive validity and differential prediction based on ethnic groups, using NBME Part 1 as a measure of medical students' performance. They found that science GPA and composite MCAT scores were equally predictive for the minority and majority groups studied.

Violato and Donnon [16], studying the predictive ability of the MCAT for clinical reasoning skills, reported evidence of predictive validity for performance on Part 1 of the Medical Council of Canada Examination (MCCE). The verbal reasoning subtest of the MCAT was positively correlated with MCCE Part 2. This indicates that items assessing similar constructs have good correlation (convergent validity evidence).

Peskun et al [17] assessed the predictive validity of medical school application components by estimating the association between the components of the admission process and the ranking of students by residency programs. They found that residency rank in internal medicine (IM) was correlated significantly with GPA and non-cognitive assessment, while residency rank in family medicine (FM) was correlated significantly with the admissions interview, and there was a trend towards significance between non-cognitive assessment and FM rank. However, there was no relationship between GPA, MCAT and FM rank. OSCE score was correlated significantly with the non-cognitive assessment among the admission predictor variables. Final grade in medical school was correlated significantly with the GPA, MCAT and non-cognitive assessment admission variables.

Residency rank in IM was correlated significantly with OSCE score, IM clerkship final grade and final grade in medical school. Rank in FM was correlated significantly with OSCE score, IM clerkship ward evaluation, FM clerkship final grade and final grade in medical school.

A number of studies have reported reviews of published data on the MCAT. Mitchell et al [18] reported on studies published from 1980 to 1987 using several predictors, such as total as well as science and non-science undergraduate GPA (uGPA), MCAT scores and institutional quality. They found that uGPA and MCAT scores predict performance in basic science examinations and performance in the early years of medical school. Donnon et al [19], in a meta-analysis of all published data on the predictive validity of the post-1991 version of the MCAT and its subtest domains, determined the validity coefficients for performance during medical school and on medical board licensing examinations. They found that the MCAT total has a medium predictive validity coefficient effect size for basic science/preclinical (r = 0.43) and clerkship/clinical performance. The biological sciences subtest has the highest predictive validity for both the basic science/preclinical and clinical years of medical school performance, while the MCAT total has a large predictive validity coefficient for USMLE Step 1 and a medium validity coefficient for USMLE Step 2. The writing sample subtest had little predictive validity for both medical school performance and the licensing examinations. Hojat et al [20] also studied the relationship between the writing sample subtest and measures of performance during medical school and USMLE Step 1. They did not find any differences among the high, medium and low scorers on the writing sample with respect to MCAT or USMLE scores. However, they reported positive correlations with undergraduate non-science and MCAT verbal reasoning scores of the three groups, as well as with written clerkship examinations, global ratings of clinical competence and ratings of interpersonal skills.
Thus, although the writing sample scores do not correlate with MCQ-type knowledge-based tests, they may be assessing other constructs useful in clinical practice. Andriole et al [21] analyzed independent predictors of USMLE Step 3 performance among a cohort of U.S. medical school graduates. They analyzed Step 3 scores in association with four measures of academic achievement during medical school: first-attempt USMLE Step 1 and Step 2 scores, third-year clinical clerkship grade point average (GPA), and Alpha Omega Alpha (AOA) election. They found that higher third-year clerkship GPA, higher Step 2 scores, and choosing residency training in broad-based specialties were associated with higher Step 3 scores. However, they did not report reliability estimates for the assessment forms used for clerkships in their study, which are necessary to make inferences.

Two studies from the UK investigated the predictive validity of medical school admission criteria. McManus [22], reporting on the use of A-level grades for admission to medical schools, showed that they are predictive of performance in basic medical science and final clinical science, as well as in Part 1 of a postgraduate examination. He reported that intellectual aptitude tests used as predictors of academic performance did not demonstrate any predictive validity. Yates and James [23], in a retrospective study, examined the academic records of students who struggled in medical school. They found that negative comments in the head teacher's reference letters were the only indicator for strugglers.

Two studies reporting on the predictive validity of admission tests were found from Karachi, Pakistan. The study by Baig et al [24] showed that admission test scores had a significant but weak positive correlation with the second (p = 0.009) and third (p = 0.003) professional examination scores of the medical students. When high school scores were combined with admission test scores, the predictive validity increased for the first (p = 0.031), second (p = 0.032) and third (p = 0.011) professional examinations. Another study, from Aga Khan University, reported that performance on the admission test is a better predictor of performance on medical school examinations than interviews [25].

De Silva et al [26] assessed the extent to which the selection criteria used for admission to Sri Lankan medical schools predicted later success and found that being female and having a higher aggregate score were the only independent predictors of success in medical school, while A-level scores, which were used as the sole criterion for admission, had no correlation with performance in medical school.

A study by Coates [27] examined the predictive validity of the Graduate Australian Medical School Admissions Test (GAMSAT), which is used for admission to medical school in Australia and, more recently, in the UK and Ireland. The study found that GAMSAT, interview and GPA showed divergent relationships, while the combination of GAMSAT and GPA scores provided the best means of predicting year 1 performance.

Predictive validity of non cognitive assessment

Non-cognitive (personal) characteristics are predominantly assessed through interviews, personal statements and letters of support from the head of the institution attended. Albanese et al [28], in a review of published literature on means of effectively assessing personal qualities, discussed the difficulties of using interviews to assess personal qualities and proposed recommendations for an approach to assessing them. They provided evidence that interviews yield admission information related to students' performance in the clinical component of medical education. They concluded that interview ratings can discriminate between students who fail to complete medical school and those who complete it, as well as between those who graduate with honors and those who do not.

Eva and colleagues [29] discussed the role of multiple mini-interviews (MMI) in assessing non-cognitive attributes. MMIs were developed to improve on the subjective assessment of traditional interviews and are being studied for their predictive validity.

The skills that are assessed, and whose scores are used for selection, are communication skills and self-directed learning skills for admission to medical school, while for residency selection a more specific set of skills falling within the domain of clinical competence is assessed. In addition to communication skills, these include history taking, physical examination, etc. [28].

Predictive validity of assessment in graduate medical education

Few reports could be retrieved that discuss the predictive validity of selection procedures for graduate medical education. Patterson et al [30] assessed three short-listing methodologies for their effectiveness and efficiency for selection into postgraduate training in general practice in the UK. They reported that clinical problem-solving tests, together with a newly developed situational judgment test that assessed non-cognitive domains, were effective in predicting performance at the selection centre, which used previously validated work-relevant simulations.

Althouse et al [31] reported on the predictive validity of the in-training examination (ITE) of residents for passing the General Pediatrics certification examination. They found that the predictive validity of the ITE increased with each year of training, being lowest in year one and highest in year three.


Discussion and conclusion

Assessment methods in medical education have evolved over recent decades, with increasing understanding of the underlying constructs and the development of sophisticated psychological tests leading to more advanced techniques being used at the entry, in-training and exit levels of medical education.

Medical school admission tests in the USA and Canada have been most extensively researched. The MCAT has undergone four major revisions over time, which have been studied and reported [32]. However, most of the studies conducted to determine the predictive value of admission tests for performance in medical school, during internship and during residency training do not specifically provide information on issues such as content, consequential and construct validity. Scores used for student selection have been used as predictors, and performance in medical school, on licensing examinations or during residency training as outcomes. Two types of design have been used in studying predictive validity: prospective studies, which look at the performance of medical students on medical school examinations, licensing examinations, specialty board examinations or health outcomes, and retrospective studies, which examine the relationship between outcomes and predictor variables.

The conceptual framework used by the Best Evidence Medical Education (BEME) group to study the predictive validity of assessment in medical education helps in critically reviewing this literature. The results of the BEME systematic review indicated that the study of performance after graduation from medical school is complex and cannot be measured by one type of measurement, since clinical competence is a multifaceted entity and the strength of its relationships with medical school performance measures varies depending on the conceptual relevance of the measures taken during and after medical school. This is apparent in the studies described earlier, where preclinical GPA shows more overlap with physicians' medical knowledge than with physicians' interpersonal skills [33].

Consideration of aspects of validity to judge tests for selection of candidates for medical schools

1. Content-related evidence:

McGaghie [32], in a detailed overview of the MCAT from 1926 to date that gives details of the subtest categories and question types of the MCAT over time, states that the definition of aptitude for medicine is what has driven the content of the MCAT. In the early years of its use, from 1928 to 1946, the content was largely dominated by biomedical knowledge and the intellectual qualities assumed to be needed to succeed in medical education at that time. However, the content underwent revisions based on advances in educational measurement and technology and a revised understanding of the aptitude needed for medical education; in the period 1946-1962 these consisted of a reduction in the subtests and the addition of a section on understanding modern society. This was the first time it was felt that medical students also need to have an understanding of what is going on around them. This realization became clearer when the 1962-1977 MCAT introduced a section on general knowledge in place of understanding modern society. The 1977-1991 version of the MCAT discarded general liberal arts knowledge as a separate section, although reading skills and quantitative skills were included. The most recent version of the MCAT, revised in 1991, does not measure liberal arts achievement or numeracy but requires the student to write a free-response essay on a contemporary topic, while the verbal reasoning section presents short comprehension passages from the humanities, social sciences and natural sciences followed by multiple choice questions.

However, the few other selection tests place great emphasis on knowledge of biomedical subjects and quantitative skills. The admission test of AKU has biology, chemistry, physics and mathematics questions [25].

The non-cognitive part of the admission process is now growing, with an improved understanding of the outcomes expected of a medical graduate. Interviews, personal essays, letters of reference and evidence of participation in community work draw on the content of non-cognitive characteristics or traits such as compassion, empathy and altruism [28, 29].

Selection into postgraduate programs is mostly based on licensing examinations and medical school GPA, the content of which is heavily based on the basic and clinical sciences34. Newer methods are being developed to examine areas that are important in practice, and these include ratings given by clerkship supervisors and letters of recommendation from internship supervisors29. With the use of multiple mini interviews (MMI), the content of non-cognitive characteristics/attributes is assessed in a structured manner. The content of the MMI includes critical thinking, ethical decision making, communication skills and knowledge of the health care system29.

Although test blueprints giving details of the weighting of different content areas and the cognitive level being assessed are not available in the public domain, an overall sense of the content of the selection tests can be obtained, and it appears reasonable given the current understanding of medical education and practice requirements.

The constructs assessed by the selection tests, including interviews, are interest in and readiness for medical school for undergraduate programs and readiness to practice for postgraduate programs. These can be broadly classified into cognition related and personality related constructs. The construct of interest in a written test is usually the achievement of the student in the particular test and its subtests. Evidence of achievement is gathered on the basis of knowledge of the subject, problem-solving ability, critical thinking and logical reasoning. Some quantitative skills are also assessed in the knowledge questions32. The constructs assessed by interviews and other methods include personality attributes that are required to practice as a physician and that have been identified as outcomes of medical education programs. The tools used to assess them are still in their developmental stages, and evidence is being collected regarding their appropriateness.

The technical quality of the questions and the procedure used to ensure quality have been recognized as an essential requirement for ascertaining the validity of assessments35, but have not been alluded to in many studies.

2. Response process

Details of the response process are generally not available in the published literature. Only one study that I came across has described the response process at length, which gives insight into the effect it may have on validity36.

3. Internal structure

The only data reported about the internal structure is the reliability of the tests. I could not find studies that discussed other measures of internal structure such as interclass correlations. Studies of MMI have reported on the generalizability of the results29.
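As an illustration of what a reported reliability figure can represent, the following is a minimal sketch of Cronbach's alpha, one commonly used internal consistency coefficient. It is not taken from any of the studies cited here; the function name `cronbach_alpha` and the small score matrix are hypothetical and serve only to show the arithmetic.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per examinee, each row a list of
    item scores (all rows must have the same number of items).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("all examinees must answer the same items")

    # Variance of each item across examinees (columns of the matrix).
    item_variances = [pvariance(column) for column in zip(*scores)]
    # Variance of the total score across examinees.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: 4 examinees, 3 items scored 0/1.
    example = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
    print(round(cronbach_alpha(example), 3))
```

The coefficient compares the sum of the item variances with the variance of the total score: when the items rank examinees consistently the total variance is large relative to the item variances and alpha approaches 1.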

4. Criterion related evidence

This aspect of admission tests has been researched the most, and many reports have investigated the predictive validity of admission tests for both undergraduate and postgraduate programs. Evidence of concurrent validity30 is sparse.
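Predictive validity is commonly summarized as a correlation coefficient between the admission score and a later criterion measure. The sketch below, assuming a simple Pearson correlation, shows how such a coefficient and the proportion of variance it explains could be computed. The function names and the scores are hypothetical and are not data from any of the studies cited here.

```python
from math import sqrt


def pearson_r(predictor, criterion):
    """Pearson correlation between two equally long lists of scores."""
    n = len(predictor)
    if n != len(criterion) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")

    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in criterion)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")

    return sxy / sqrt(sxx * syy)


def variance_explained(predictor, criterion):
    """Proportion of criterion variance accounted for by the predictor (r squared)."""
    return pearson_r(predictor, criterion) ** 2


if __name__ == "__main__":
    # Hypothetical admission test scores and later first-year examination scores.
    admission = [24, 27, 30, 31, 33, 35]
    first_year = [61, 70, 66, 74, 79, 77]
    print(round(pearson_r(admission, first_year), 2))
    print(round(variance_explained(admission, first_year), 2))
```

Squaring the coefficient gives the share of variance in the criterion that the predictor accounts for, which is how figures such as a percentage of variance in performance scores are usually derived.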

The studies have shown that the MCAT is a good predictor of performance in the first two years of medical school but not in the later years, and that it also predicts performance on USMLE Step I and its predecessor NBME Part I12-20. This finding is expected, since these examinations assess the same construct, namely achievement in biomedical sciences, and generally use the same test format. The reliability of the MCAT, NBME and USMLE is reported to be <0.9.

Violato and Donnan16, in their study of the prediction of clinical reasoning skills by the MCAT, have shown that scores on tests of declarative knowledge are good predictors of scores on tests of knowledge but are not good at predicting clinical reasoning.

Studies conducted in the 1960s and 70s did not show any correlation between medical school grades and clinical performance some years later37; these studies were limited by the kinds of measures available at that time. With a greater understanding of clinical competence, more sophisticated methods of assessing this aspect of competence were developed and correlations improved: studies conducted in the 1980s have shown a weak relationship between medical school grades and performance in postgraduate training. However, studies also show that performance during medical school does not differentiate candidates who perform well during residency from those who perform poorly. This implies that the complex competencies a physician needs to perform effectively are poorly measured by academic scores, which are obtained through measurements that examine only a narrow band of the extremely complex total spectrum of skills, abilities and performances of practicing physicians.

The inability to predict performance on the basis of assessments during medical school has been attributed to traditional grading systems38, or to an inherent inability of grades to indicate the transfer of potential into the workplace, the effect of intervening experience between the time of academic training and subsequent career evaluation, and the failure of the selection procedures of traditional medical schools to identify students with the characteristics that are prerequisite for successful performance (changing mindsets: knowledge, skills, behaviors, professionalism and reliability) in the work environment39.

Instruments used to measure the performance of residents and practicing physicians should have an acceptable degree of validity and reliability. Global rating, which forms the primary basis for appraising clinical skills, suffers from several sources of bias involving cognitive, social and environmental factors that affect the rating, not only the instruments (ref). Research has shown that the characteristics of measuring instruments account for only 8% of the variance in performance scores (Williams et al., 2003).

With regard to psychosocial predictors of the academic and clinical performance of medical students, it has been reported that selected psychosocial attributes could significantly increase the validity of predicting performance on objective examinations40, and that a significant link exists between selected psychosocial measures and physician clinical competence41.

5. Outcome related evidence

Not much empirical evidence is available on the consequences of medical school admission testing for the student, school, patient or society. Minimal student attrition is reported as one consequence of the admission tests. Some studies have also reported that students study to the test and that coaching centers train students in the art of giving admission interviews, so that students are able to demonstrate (fake) the desired behaviors at that time28. Longitudinal studies therefore need to be conducted by medical schools and postgraduate programs to determine the consequences of the selection methods.


In conclusion, studies have reported various assessment methods with varying predictive validity. However, these studies have been equivocal and inconclusive for aspects other than academic performance. This is attributable to the multifaceted constructs that are assessed in the health professions, many of which are still not completely understood, while methods to assess them are still being developed and tested for reliability and construct validity. This, combined with the close interaction between the educational and work environment and behavior, makes it more difficult to establish and interpret measurable behaviors.

Written tests have been shown to have better predictive validity for cognitive (knowledge) based tests during medical school or at the licensing examinations, but they are poor predictors of results on assessments of clinical clerkships and on the Objective Structured Clinical Examination. Undergraduate GPA in science subjects has been reported to be predictive of scores on knowledge tests during the early years of medical school but not the later years.

Interviews have shown varying predictive validity, since a number of personal characteristics/qualities are purportedly measured using techniques that range from highly structured to totally unstructured. Personal statements and letters of recommendation have been used as part of the selection protocol but are usually not given enough weight to influence selection decisions in many schools. This is one area that needs to be tapped, since the few studies that have reported on these methods show some value in these assessments. Caveats need to be considered when using these scores for selecting students, since they may be influenced by coaching or may be self-constructed by students.
