Year: 2021 | Volume: 4 | Issue: 2 | Pages: 37-49
Ethnic and gender bias in objective structured clinical examination: A critical review
Iris C. I. Chao1, Efrem Violato2, Brendan Concannon1, Charlotte McCartan2, Sharla King2, Mary Roduta Roberts3
1 Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, Canada
2 Department of Educational Psychology, Faculty of Education, University of Alberta, Edmonton, Canada
3 Department of Occupational Therapy, Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, Canada
Date of Submission: 12-Jan-2021
Date of Acceptance: 27-Apr-2021
Date of Web Publication: 15-Sep-2021
Dr. Iris C. I. Chao
Faculty of Rehabilitation Medicine, University of Alberta, Edmonton T6G 2R3.
Source of Support: None, Conflict of Interest: None
This critical review aimed to synthesize the literature and critique the methodological quality of current evidence regarding examiner bias related to ethnicity and gender in objective structured clinical examinations implemented in health professions education. The Guidelines for Critical Review (GCR) were used to critically appraise the selected studies. Ten studies were retrieved for review. The overall quality of the papers was moderate. Two studies met all the criteria of the GCR, indicating stronger evidence for their outcomes. One of them reported potential ethnic and gender bias, while the other found only one examiner showing consistent ethnic bias. No systematic bias was found across the studies. Nonetheless, the possibility of ethnic or gender bias by some examiners cannot be ignored. To mitigate potential examiner bias, the investigation of implicit bias training, frame of reference training, the use of multiple examiners, and combination assessments is suggested.
Keywords: Bias, ethnicity, examiner bias, gender, objective structured clinical examination, race, sex
How to cite this article:
Chao IC, Violato E, Concannon B, McCartan C, King S, Roberts MR. Ethnic and gender bias in objective structured clinical examination: A critical review. Educ Health Prof 2021;4:37-49
How to cite this URL:
Chao IC, Violato E, Concannon B, McCartan C, King S, Roberts MR. Ethnic and gender bias in objective structured clinical examination: A critical review. Educ Health Prof [serial online] 2021 [cited 2022 Dec 3];4:37-49. Available from: https://www.ehpjournal.com/text.asp?2021/4/2/37/325998
Introduction
Recent events have brought concerns with racism, sexism, equality, and fairness to the forefront of public discourse. In recent decades, sociologists have attempted to explore and explain the nature of disparity in the areas of ethnicity and gender. Ethnic and gender bias is the propensity to judge and evaluate a person based on their ethnicity and gender, rather than examining a person’s actual capacities and experiences, which can lead to prejudicial behaviors. In health professions education, studies found that these biases were present among educators. For example, Woolf et al. reported that clinical educators in a medical setting with more negative views of Asian or male medical students may have those stereotypes reinforced and feel less positive about teaching those students. Kirch advocated that researchers or educators must take action to end ethnic and gender discrimination in medical and health professions education.
In this critical review, we focused on exploring the ethnic and gender bias that occurred in a specific form of assessment, the Objective Structured Clinical Examination (OSCE), among health professions educators or examiners. In North America and Europe, the OSCE is the standard tool to help educators examine various competencies of students in medicine and allied health professions. OSCEs are reliable and valid for evaluating the clinical performance of health professions students and allow examiners to provide more accurate feedback to students. However, an issue for the OSCE is the possible subjectivity of examiners, which may introduce the risk of inequality, especially when examiners are affected by construct-irrelevant characteristics such as ethnicity and/or gender.
Purpose of the current critical review
Understanding the bias occurring in OSCEs can lead to well-designed examiner training or assessment approaches to reduce inequalities and promote fairness. To date, no reviews have synthesized the literature related to ethnic and gender bias in OSCEs nor critiqued their methodological quality. It is uncertain whether the occurrence of bias is supported by strong and trustworthy evidence. Therefore, this critical review aims to synthesize the literature and appraise the methodological quality of the current evidence regarding ethnic and/or gender bias by examiners evaluating health professions students in OSCEs.
Methods
Search strategy and selection criteria
Literature was searched in Scopus, CINAHL, and Medline. No date range was set. The search strategy was developed with the support of a librarian. [appendix – Material 1] shows an example of the search terms in Scopus.
Studies related to the investigation of examiner ethnic and/or gender bias occurring in OSCEs were selected. Both quantitative and qualitative studies in the English language were included. Studies that did not explore examiner bias related to ethnicity and gender in OSCEs conducted in health professions education were excluded.
Critical appraisal tool
The Guidelines for Critical Review (GCR) protocol developed by the McMaster University Occupational Therapy Evidence-based Practice Research Group was utilized for critical appraisal of selected studies. The GCR has quantitative and qualitative review guidelines, though this review applied only the quantitative review guideline, as all studies meeting the inclusion criteria were quantitative. The guideline consists of seven data extraction areas: study purpose, design, sample, outcomes, intervention, results, and conclusions and implications. The GCR protocol is available online (https://srs-mcmaster.ca/research/evidence-based-practice-research-group/).
Data management, collection process, and synthesis
The authors shared the documents of the GCR instruction for assessing quality criteria, the full text of selected studies, and the results of data extraction electronically in a shared drive. Three authors (IC, EV, and BC) reviewed each selected study and independently performed data extraction using the GCR protocol. The first author (IC) then synthesized all the data and uploaded them to the shared drive. After an initial evaluation, there were minor discrepancies between evaluations. Discrepancies were discussed until consensus was achieved. The primary cause for discrepancies was related to the interpretation of the GCR guideline.
Results
A total of 89 studies were retrieved from the databases and reference lists. After eliminating 42 duplicates, screening the titles and abstracts, and applying inclusion criteria to the retrieved articles, 10 studies were selected for review [Figure 1], [appendix – Material 2]. [appendix – Material 3] and [appendix – Material 4] show a descriptive summary of each reviewed study and the results of the methodological critique using the GCR protocol.
Material 4: Results of methodological critique of the reviewed studies using the Guidelines for Critical Review protocol
Methodological quality based on Guidelines for Critical Review criteria
[Table 1] presents a summary of this critical review and shows the extent to which the studies met the GCR quality criteria. The overall quality of the studies was moderate. Denney et al. and McManus et al. met all quality criteria, indicating that the results of these two studies may be more trustworthy. Denney et al. demonstrated a significant interaction between examiners’ and students’ ethnicity and gender and bias that may be introduced by varying subgroups of examiners; however, effect sizes were small. McManus et al. found ethnic-related bias occurring in only one non-White examiner who awarded higher grades to non-White students.
Criteria 1: Purpose and design
All selected studies clearly stated their purposes. Most of the studies reviewed relevant background literature to justify the research. However, Stupart et al. did not provide adequate background information. As the knowledge around a subject grows, study designs should become more rigorous, as most variables affecting the outcome are understood and can be controlled by the researcher. If there is a paucity of information about an issue, a more exploratory method is suitable; for instance, a case study or a cross-sectional design. In this regard, all the studies reviewed had appropriate study designs. Seven reviewed studies published from 2004 to 2017 used a retrospective cross-sectional design. Schleicher et al. and Wass et al. utilized prospective cross-sectional designs. Yeates et al. used a randomized controlled trial (RCT) in their study published in 2017. The temporal distribution of the studies shows a progression from cross-sectional designs in the earliest studies to an RCT design in the most recent study.
Criteria 2: Sampling
All selected studies derived the personal information of examiners and/or students from institutional databases or asked the participants to self-declare ethnicity and/or gender. Most of the selected studies stated the ethnicity and gender of examiners and students clearly; however, Dewhurst et al. did not describe examiners’ gender and ethnicity. Seven retrospective cross-sectional studies recruited participants from the previous 1–4 years. The number of examiners in these seven studies ranged from 48 to 356 and the number of assessments ranged from 1,024 to 52,000. All the studies justified the sample size. Denney et al. collected data from the Membership of the Royal College of General Practitioners, Dewhurst et al. and McManus et al. recruited graduate candidates from the Membership of the Royal Colleges of Physicians, and Richens et al. included participants from the Intercollegiate Specialty Board examinations, in the UK. These four studies were considered representative of the target population as they included all students in an OSCE for the specified range of time. Schleicher et al. conducted convenience sampling at five German medical schools, Stupart et al. at a medical school in South Africa, and Wass et al., Wiskin et al., Woolf et al., and Yeates et al. at medical schools in the UK.
Criteria 3: Outcome measures
Six out of 10 studies used rating scales to award grades for examinees, of which Denney et al., Dewhurst et al., and McManus et al. clearly described the reliability and validity of their outcome measures. Schleicher et al. addressed the weakness of the interrater reliability of their scale. Wass et al., Wiskin et al., and Yeates et al. did not describe the reliability of their rating scales. In addition, Richens et al., Woolf et al., and Stupart et al. did not describe the tools used to measure student performance.
Criteria 4: Implementation
All studies provided training to examiners before the OSCE. Nine out of 10 studies were conducted in a real exam situation, where the scores assigned to students affected students’ course grades or licensure. In these nine studies, the number of OSCE stations ranged from 5 to 22; four studies had one examiner in each station and two studies had two examiners; Richens et al., Woolf et al., and Dewhurst et al. did not mention the number of examiners per station. One study was in a simulated exam setting, where the scores awarded did not affect students’ grades. In this study, 159 examiners were randomly assigned to two student simulation groups.
Eight of the studies described the procedure and content of the OSCE. Dewhurst et al. and Woolf et al. did not provide adequate information for the OSCEs used. For example, the number of stations, the number of examiners per station, the procedure, and the examined competency domains in the OSCEs were not mentioned.
Criteria 5: Results
[Table 1] summarizes the overall results of the selected studies. Six studies demonstrated underperformance in the overall performance of students from ethnic minorities compared to the ethnic majority. Wass et al. found that the underperformance of non-White students was restricted to communication skills. Yeates et al. reported that examiners activated Asian-related stereotypes, but these had no effect on examiner scoring. Three studies reported potential ethnic bias: Denney et al. found a significant interaction between examiners’ and students’ ethnicity, with Black and minority ethnic (BME) students receiving higher grades from BME examiners than from White examiners; McManus et al. reported only one non-White examiner consistently awarding higher scores to non-White students; Dewhurst et al. identified two non-White examiners demonstrating a bias towards non-White students. Although Denney et al. concluded that ethnic bias was present in clinical performance assessment, overall there did not appear to be any systematic ethnic-related examiner bias across the studies. Further, the majority of the studies demonstrated small effect sizes (Cohen’s d < 0.1).
Regarding gender-related differences, six studies found that female students performed significantly better on overall clinical performance than male students. Two studies reported potential gender bias: Denney et al. found male students receiving higher scores from female examiners compared to male examiners; Schleicher et al. found male examiners scoring female students higher than male students. Both studies had small effect sizes (Cohen’s d ranging from < 0.1 to 0.32).
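For readers less familiar with the effect size reported above, the following is a minimal sketch of how Cohen's d can be computed from two groups of scores using the pooled sample standard deviation. The function name `cohens_d` and the score lists are hypothetical and purely illustrative; they are not data or code from any of the reviewed studies.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical station scores for two student groups (illustrative only).
scores_group_a = [70, 72, 74, 76, 78]
scores_group_b = [69, 71, 73, 75, 77]

d = cohens_d(scores_group_a, scores_group_b)
print(round(d, 2))
```

By Cohen's conventional benchmarks, values around 0.2 are considered small, 0.5 medium, and 0.8 large, so a one-point mean difference against this much score spread would still be read as a small effect.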
Criteria 6: Study bias and limitations
There were four major limitations in the selected studies. The first was the representativeness of the sample to a larger population. Six studies used convenience sampling and were not likely representative of the population. The second limitation was the under-reported psychometric properties of the outcome measures. One study did not state the reliability and validity of the assessment tool used in the OSCEs, one did not address the reliability of the tool, and three studies did not describe the assessment tools used in the OSCE. The under-reported psychometric properties lower confidence in the quality of the outcomes of interest. The third limitation was the lack of descriptive details of the implementation, such as the procedure of the OSCE. Three studies did not describe information about the OSCE used and did not mention the number of examiners. The final limitation was the use of only a single examiner per OSCE station. Five studies used only one examiner per station, potentially reducing the reliability of the outcome measures.
Discussion
Based on the results of the reviewed literature at the present time, there is no systematic evidence of ethnic or gender bias occurring during OSCEs. Only two studies met all GCR quality criteria, indicating stronger methodological quality. Although three selected studies identified examiner bias related to ethnicity, the outcomes were not consistent. McManus et al. and Dewhurst et al. found examiner bias in only one and two examiners, respectively. Denney et al. reported BME examiners favoring BME students but with a small effect size. Of these three studies, only Denney et al. met all the GCR quality criteria.
Two selected studies demonstrated examiner bias related to gender, with female examiners awarding higher marks to male students and male examiners awarding higher marks to female students. However, both outcomes had small effect sizes. Of these two studies, only Denney et al. met all the GCR quality criteria.
In addition, all studies described the demographics of students, but only four studies stated this information about examiners. In those studies, the ethnic breakdown showed a larger proportion of non-White students than White students, and a larger proportion of White examiners than non-White examiners. One of the four studies showed an uneven distribution of gender between the student and examiner groups, with more female students but more male examiners. Two studies had a similar proportion of gender for both student and examiner groups.
When searching for appropriate studies for this critical review, we found that most of the research focused on the performance levels between students across different demographic groups. Few of these studies investigated the potential causes of such ethnic and gender bias in OSCEs. Since potential ethnic and gender bias within a high-stakes examination is of major concern, this issue is expected to be increasingly evaluated within health professions education. It is essential for educators and assessors to consider sources of potential examiner bias when developing OSCEs, in order to ensure fair and consistent evaluation of student performance. Since this problem could vary across different assessment programs, a one-solution-fits-all approach may be ineffective. Instead, those involved in the OSCE development and evaluation process may form a committee to address this potential problem. The committee should examine potential sources of assessment bias, whether due to demographic differences between the examiners and students or other potential factors. If bias is detected, ways to mitigate its effect and impact on scores will need to be addressed.
Suggestions for mitigating potential examiner bias
Although significant differences in OSCE scores were found between ethnic and gender groups, overall there is no consistent evidence to support ethnic or gender bias occurring in examiners assessing students in OSCEs based on the studies reviewed. Nonetheless, the potential for bias in individual examiners should not be ignored as it may lead to unfair evaluations for some students. Implicit bias training, frame of reference (FOR) training, using multiple examiners per station, and the application of combination assessments are recommended for OSCEs to minimize bias and promote equality in diverse learning environments.
According to Gatewood et al., implicit bias works at a level in which the individual remains unaware of their bias. Implicit bias training helps individuals identify unconscious prejudices that affect their behavior. Such training comes with a facilitator’s guide to provide an overview and science of implicit bias, the development of an inclusive and safe learning environment, and the mitigation of implicit bias. The training may involve small group discussions, case study examples, self-assessments, and a posttraining implementation plan. While implicit bias training remains underutilized in OSCE assessor training, it can be used to inform OSCE assessors about the possibility of unconscious biases that occur when making decisions.
In contrast to implicit bias training, FOR training is a standardized training approach that is used to maintain consistency when assessors are examining a group of students. FOR training aims to direct examiner scoring with a common standard of performance. Evidence demonstrates that FOR training can increase the accuracy of scoring or giving feedback in OSCEs. FOR training involves creating training groups of examiners, explaining the use of the assessment tool, and iteratively practicing scoring based on standardized examples. Practicing scoring involves explaining and discussing ratings and clarifying any disagreements.
In addition, the use of two or more examiners in an OSCE station can increase the reliability and fairness of the results and favor the objectivity of the assessment. With another examiner, an examiner’s scoring could be monitored and compared, making it easier to identify potential bias. However, using multiple examiners is sometimes not practical due to the cost of human resources. One study suggested utilizing nonmedical lay-examiners, who, after training, demonstrated inter-rater reliability similar to that of trained practitioner-examiners.
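As a rough illustration of how a second examiner makes scoring comparable, the sketch below computes the mean gap between two examiners' paired ratings within each student group. The record layout, the group labels, and the function name `mean_gap_by_group` are assumptions made for this example only, and a gap of this kind would be a prompt for closer review rather than evidence of bias in itself.

```python
from statistics import mean


def mean_gap_by_group(records):
    """Return {group: mean(examiner_1 score - examiner_2 score)} over paired ratings."""
    gaps = {}
    for group, score_1, score_2 in records:
        gaps.setdefault(group, []).append(score_1 - score_2)
    return {group: mean(diffs) for group, diffs in gaps.items()}


# Hypothetical paired ratings: (student group, examiner 1 score, examiner 2 score).
records = [
    ("group_x", 14, 14),
    ("group_x", 16, 15),
    ("group_x", 12, 13),
    ("group_y", 15, 12),
    ("group_y", 17, 15),
    ("group_y", 13, 12),
]

gaps = mean_gap_by_group(records)
for group, gap in gaps.items():
    print(group, gap)
```

In this made-up data the two examiners agree on average for one group but differ by two points for the other, which is the kind of pattern a single-examiner station could not reveal.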
Finally, although OSCEs can predict student performance, a combination of other assessments (e.g., essays and multiple-choice tests) has been shown to have the strongest predictive validity of student performance. It is important to note that OSCEs have no established gold standard; thus, incorporating other assessments with OSCEs can help identify discrepancies that may occur in OSCEs based on ethnicity or gender.
Other potential sources of bias
It is possible that assessment tools (e.g., scales, checklists, rubrics), as opposed to the examiners themselves, are responsible for the differences in scores found between ethnic or gender minorities/majorities. The use of advanced test theory methods can allow for a more objective understanding of item functioning, including performance across subgroups. Item response theory, generalizability theory, and methods for identifying item bias such as differential item functioning can provide a more comprehensive understanding of the psychometric properties of the measures. Using these methods, it will be possible to determine the degree to which sources of bias can be attributed to exam items rather than the examiners. Care must be taken to ensure the assessment items accurately and fairly reflect the assessment objectives.
Chong et al. studied the impact of examiner occupation, specialty, clinical seniority, experience, and gender on scores in two medicine OSCEs. They reported that examiner characteristics such as experience and seniority were associated with bias when assessing communication performance in an OSCE, as junior medical educators awarded consistently higher grades to students than senior clinical educators. Examiner occupation, specialty, and gender did not significantly influence scores. An additional study showed that examiners had different levels of severity (leniency or stringency) when scoring in an OSCE, but this was unlikely to be related to their demographics.
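One common way to screen a pass/fail checklist item for differential item functioning is the Mantel-Haenszel common odds ratio, which compares two groups after matching students on overall ability (for example, total score bands). The sketch below is an illustration under that assumption and is not a method used in the reviewed studies; the function name, the stratum layout, and the counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched ability strata.

    Each stratum is a tuple of counts:
    (reference_pass, reference_fail, focal_pass, focal_fail).
    A value near 1 suggests the item behaves similarly for both groups
    once students are matched on overall ability.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_pass, ref_fail, focal_pass, focal_fail in strata:
        total = ref_pass + ref_fail + focal_pass + focal_fail
        if total == 0:
            continue  # An empty stratum carries no information.
        numerator += ref_pass * focal_fail / total
        denominator += ref_fail * focal_pass / total
    if denominator == 0:
        raise ValueError("Odds ratio is undefined for these counts.")
    return numerator / denominator


# Hypothetical counts for one checklist item in two total-score bands.
balanced_item = [(10, 10, 5, 5), (20, 5, 8, 2)]
flagged_item = [(12, 8, 6, 14), (15, 5, 10, 10)]

print(mantel_haenszel_odds_ratio(balanced_item))
print(mantel_haenszel_odds_ratio(flagged_item))
```

An odds ratio well above or below 1 for an item, as in the second made-up example, would suggest reviewing the wording or scoring of that item before attributing the group difference to examiners.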
Limitations and future recommendations
First, all the selected studies were conducted in medical education; the results may not generalize to education in other health professions. It is important that educators of other health professions are aware of potential examiner bias. Second, as the selected studies were conducted in the UK, Germany, and South Africa, their results may not represent the situation in other countries.
Third, few studies investigating potential examiner bias met the criteria for high-quality studies according to GCR criteria; future studies should strive to report psychometric properties of assessment tools and provide more explicit details of research implementation. In addition, more qualitative studies are needed on this topic as all the selected studies are quantitative. Moreover, we suggest investigating the effects of implicit bias training, FOR training, the use of multiple examiners, and combination assessments in OSCEs to minimize bias. The critical review includes studies published from 2003 to 2017. Due to the constant changes in the social concepts of diversity and equity, future studies on ethnic and gender bias in OSCEs may have different ideas on improving research.
Conclusion
Based on the reviewed literature, there does not appear to be a consistent or systematic bias related to gender or ethnicity within OSCE evaluations. This statement should be qualified by the fact that the overall quality of the studies reviewed was moderate, with only two out of 10 selected studies meeting all GCR criteria for a high-quality study. Across most of the studies, there were methodological issues including sampling, assessment tools used, and description of the OSCEs studied. Continued investigation of potential gender and ethnic bias, using more rigorous methods, is required. The potential problem of assessor bias within OSCEs, due to demographic differences between assessors and students, is assumed to be unique to each assessment program. Assessment programs should monitor for ethnicity and gender bias and specify methods to address and mitigate its effects once identified. We recommend that future studies explore the effects of implicit bias training, FOR training, the use of multiple examiners per OSCE station, and the use of combination assessments to reduce potential examiner bias in health professions education.
We are thankful to Ms. Liz Dennett, a librarian at J.W. Scott Library, University of Alberta, for her dedication in helping with database searches.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Perry J, Watkins M, Gilbert A, Rawlinson J. A systematic review of the evidence on service user involvement in interpersonal skills training of mental health students. J Psychiatr Ment Health Nurs 2013;20:525-40.
Hall WJ, Chapman MV, Lee KM, Merino YM, Thomas TW, Payne BK, et al. Implicit racial/ethnic bias among health care professionals and its influence on health care outcomes: A systematic review. Am J Public Health 2015;105:e60-76.
Cudé G, Winfrey K. The Hidden Barrier: Gender bias: Fact or fiction? Nurs Womens Health 2007;11:254-65.
Siegelman JN, Lall M, Lee L, Moran TP, Wallenstein J, Shah B. Gender bias in simulation-based assessments of emergency medicine residents. J Grad Med Educ 2018;10:411-5.
Woolf K, Cave J, Greenhalgh T, Dacre J. Ethnic stereotypes and the underachievement of UK medical students from ethnic minorities: Qualitative study. BMJ 2008;337:a1220.
Clouten N, Homma M, Shimada R. Clinical education and cultural diversity in physical therapy: Clinical performance of minority student physical therapists and the expectations of clinical instructors. Physiother Theory Pract 2006;22:1-15.
Jacques L, Kaljo K, Treat R, Davis J, Farez R, Lund M. Intersecting gender, evaluations, and examinations: Averting gender bias in an obstetrics and gynecology clerkship in the United States. Educ Health (Abingdon) 2016;29:25-9.
Kirch D. Addressing Racism and Mistreatment in Academic Medicine. AAMC; 2019. Available from: https://www.aamc.org/news-insights/addressing-racism-and-mistreatment-academic-medicine. [Last accessed on 2020 Aug 20].
Edgar S, Mercer A, Hamer P. Admission interview scores are associated with clinical performance in an undergraduate physiotherapy course: An observational study. Physiotherapy 2014;100:331-5.
Guttormsen S, Beyeler C, Bonvin R, Feller S, Schirlo C, Schnabel K, et al. The new licencing examination for human medicine: From concept to implementation. Swiss Med Wkly 2013;143:w13897.
Sakurai H, Kanada Y, Sugiura Y, Motoya I, Wada Y, Yamada M, et al. OSCE-based clinical skill education for physical and occupational therapists. J Phys Ther Sci 2014;26:1387-97.
Aranda JP, Davies ML, Jackevicius CA. Student pharmacists’ performance and perceptions on an evidence-based medicine objective structured clinical examination. Curr Pharm Teach Learn 2019;11:302-8.
Näpänkangas R, Karaharju-Suvanto T, Pyörälä E, Harila V, Ollila P, Lähdesmäki R, et al. Can the results of the OSCE predict the results of clinical assessment in dental education? Eur J Dent Educ 2016;20:3-8.
Muthusami A, Mohsina S, Sureshkumar S, Anandhi A, Elamurugan TP, Srinivasan K, et al. Efficacy and feasibility of objective structured clinical examination in the internal assessment for surgery postgraduates. J Surg Educ 2017;74:398-405.
Plakiotis C. Objective structured clinical examination (OSCE) in psychiatry education: A review of its role in competency-based assessment. In: Vlamos P, editor. GeNeDis 2016. Advances in Experimental Medicine and Biology. Sparta: Springer International Publishing; 2017. p. 159-80.
Roduta Roberts M, Alves CB, Werther K, Bahry LM. Examining the reliability of scores from a performance assessment of practice-based competencies. J Psychoeduc Assess 2019;37:973-88.
CarlLee S, Rowat J, Suneja M. Assessing entrustable professional activities using an orientation OSCE: Identifying the gaps. J Grad Med Educ 2019;11:214-20.
Franzese C. When to cut? Using an objective structured clinical examination to evaluate surgical decision-making. Laryngoscope 2007;117:1938-42.
Lee CB, Madrazo L, Khan U, Thangarasa T, McConnell M, Khamisa K. A student-initiated objective structured clinical examination as a sustainable cost-effective learning experience. Med Educ Online 2018;23:1440111.
Lukas RV, Adesoye T, Smith S, Blood A, Brorson JR. Student assessment by objective structured examination in a neurology clerkship. Neurology 2012;79:681-5.
Schwartz RW, Witzke DB, Donnelly MB, Stratton T, Blue AV, Sloan DA. Assessing residents’ clinical performance: Cumulative results of a four-year study with the Objective Structured Clinical Examination. Surgery 1998;124:307-12.
Sloan DA, Donnelly MB, Schwartz RW, Strodel WE. The objective structured clinical examination. The new gold standard for evaluating postgraduate clinical performance. Ann Surg 1995;222:735-42.
Wright EJ, Khosla RK, Howell L, Lee GK. Rhinoplasty education using a standardized patient encounter. Arch Plast Surg 2016;43:451-6.
Denney ML, Freeman A, Wakeford R. MRCGP CSA: Are the examiners biased, favouring their own by sex, ethnicity, and degree source? Br J Gen Pract 2013;63:e718-25.
McManus IC, Elder AT, Dacre J. Investigating possible ethnicity and sex bias in clinical examiners: An analysis of data from the MRCP(UK) PACES and nPACES examinations. BMC Med Educ 2013;13:103.
Stegers-Jager KM, Steyerberg EW, Cohen-Schotanus J, Themmen AP. Ethnic disparities in undergraduate pre-clinical and clinical performance. Med Educ 2012;46:575-85.
Wass V, Roberts C, Hoogenboom R, Jones R, Van der Vleuten C. Effect of ethnicity on performance in a final objective structured clinical examination: Qualitative and quantitative study. BMJ 2003;326:800-3.
Law M, Stewart D, Letts L, Pollock N, Bosch J, Westmorland M. Guidelines for critical review of qualitative studies. McMaster University Occupational Therapy Evidence-based Practice Research Group. 1998:1-9.
Stupart D, Goldberg P, Krige J, Khan D. Does examiner bias in undergraduate oral and clinical surgery examinations occur? S Afr Med J 2008;98:805-7.
Letts L, Wilkins S, Law M, Stewart D, Bosch J, Westmorland M. Critical review form–qualitative studies (version 2.0). McMaster University; 2007.
Dewhurst NG, McManus C, Mollon J, Dacre JE, Vale AJ. Performance in the MRCP(UK) examination 2003-4: Analysis of pass rates of UK graduates in relation to self-declared ethnicity and gender. BMC Med 2007;5:8.
Richens D, Graham TR, James J, Till H, Turner PG, Featherstone C. Racial and gender influences on pass rates for the UK and Ireland specialty board examinations. J Surg Educ 2016;73:143-50.
Wiskin CM, Allan TF, Skelton JR. Gender as a variable in the assessment of final year degree-level communication skills. Med Educ 2004;38:129-37.
Woolf K, Haq I, McManus IC, Higham J, Dacre J. Exploring the underperformance of male and minority ethnic medical students in first year clinical examinations. Adv Health Sci Educ Theory Pract 2008;13:607-16.
Schleicher I, Leitner K, Juenger J, Moeltner A, Ruesseler M, Bender B, et al. Examiner effect on the objective structured clinical exam - A study at five medical schools. BMC Med Educ 2017;17:71.
Yeates P, Woolf K, Benbow E, Davies B, Boohan M, Eva K. A randomised trial of the influence of racial stereotype bias on examiners’ scores, feedback and recollections in undergraduate clinical exams. BMC Med 2017;15:179.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. Mahwah, NJ: L. Erlbaum Associates; 1988.
Bornstein MH, Jager J, Putnick DL. Sampling in developmental science: Situations, shortcomings, solutions, and standards. Dev Rev 2013;33:357-70.
Setia MS. Methodology series module 5: Sampling strategies. Indian J Dermatol 2016;61:505-9.
Souza AC, Alexandre NM, Guirardello EB. Psychometric properties in instruments evaluation of reliability and validity. Epidemiol Serv Saude 2017;26:649-59.
McManus IC, Thompson M, Mollon J. Assessment of examiner leniency and stringency (‘hawk-dove effect’) in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling. BMC Med Educ 2006;6:42.
Gatewood E, Broholm C, Herman J, Yingling C. Making the invisible visible: Implementing an implicit bias activity in nursing education. J Prof Nurs 2019;35:447-51.
Crawford C. The everyone project unveils implicit bias training guide. Ann Fam Med 2020;18:182.
Newman LR, Brodsky D, Jones RN, Schwartzstein RM, Atkins KM, Roberts DH. Frame-of-reference training: Establishing reliable assessment of teaching effectiveness. J Contin Educ Health Prof 2016;36:206-10.
Schleicher DJ, Day DV, Mayes BT, Riggio RE. A new frame for frame-of-reference training: Enhancing the construct validity of assessment centers. J Appl Psychol 2002;87:735-46.
Gorman CA, Rentsch JR. Evaluating frame-of-reference rater training effectiveness using performance schema accuracy. J Appl Psychol 2009;94:1336-44.
Woehr DJ, Huffcutt AI. Rater training for performance appraisal: A quantitative review. J Occup Organ Psychol 1994;67:189-205.
Bernardin J, Buckley R. Strategies in rater training. Acad Manage Rev 1981;6:205-12.
Gardner AK, Russo MA, Jabbour II, Kosemund M, Scott DJ. Frame-of-reference training for simulation-based intraoperative communication assessment. Am J Surg 2016;212:548-51.e2.
Bagnasco A, Tolotti A, Pagnucci N, Torre G, Timmins F, Aleo G, Sasso L. How to maintain equity and objectivity in assessing the communication skills in a large group of student nurses during a long examination session, using the Objective Structured Clinical Examination (OSCE). Nurse Educ Today 2016;38:54-60.
Brannick MT, Erol-Korkmaz HT, Prewett M. A systematic review of the reliability of objective structured clinical examination scores. Med Educ 2011;45:1181-9.
Dickter DN, Stielstra S, Lineberry M. Interrater reliability of standardized actors versus nonactors in a simulation based assessment of interprofessional collaboration. Simul Healthc 2015;10:249-55.
Harasym PH, Woloschuk W, Cunning L. Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs. Adv Health Sci Educ Theory Pract 2008;13:617-32.
Williams RG, Klamen DA, McGaghie WC. Cognitive, social and environmental sources of bias in clinical performance ratings. Teach Learn Med 2003;15:270-92.
Berger AJ, Gillespie CC, Tewksbury LR, Overstreet IM, Tsai Kal MC, et al. Assessment of medical student clinical reasoning by “lay” vs physician raters: Inter-rater reliability using a scoring guide in a multidisciplinary objective structured clinical examination. Am J Surg 2012;203:81-6.
Colliver JA, Vu NV, Marcy ML, Travis TA, Robbs RS. Effects of examinee gender, standardized-patient gender, and their interaction on standardized patients’ ratings of examinees’ interpersonal and communication skills. Acad Med 1993;68:153-7.
Baig LA, Violato C. Temporal stability of objective structured clinical exams: A longitudinal study employing item response theory. BMC Med Educ 2012;12:121.
Chong L, Taylor S, Haywood M, Adelstein BA, Shulruf B. Examiner seniority and experience are associated with bias when scoring communication, but not examination, skills in objective structured clinical examinations in Australia. J Educ Eval Health Prof 2018;15:17.