METHODOLOGY
Year : 2019  |  Volume : 2  |  Issue : 1  |  Page : 48-50

Survey research methods: A guide for creating post-stratification weights to correct for sample bias


Department of Clinical Sciences, North Carolina State University, Raleigh, North Carolina, USA

Date of Web Publication: 30-May-2019

Correspondence Address:
Dr. Kenneth D Royal
Department of Clinical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina
USA

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/EHP.EHP_8_19

  Abstract 


Nonrepresentative data pose one of the greatest validity threats in survey research. Samples that are underrepresented and/or overrepresented based on demographic subgroups can introduce bias that distorts both the accuracy and the inferences made about the results. This article discusses the concept of poststratification weighting, a post hoc statistical procedure used to correct for sampling bias in survey research studies. Procedural steps for calculating poststratification weights are presented, and an example involving a simulated cohort of students in a medical school is provided for demonstration purposes. SPSS statistical software coding is presented to help researchers get started with their own calculations of poststratification weights.

Keywords: Assessment, bias, evaluation, health surveys, medical education, statistics, survey research, surveys


How to cite this article:
Royal KD. Survey research methods: A guide for creating post-stratification weights to correct for sample bias. Educ Health Prof 2019;2:48-50

How to cite this URL:
Royal KD. Survey research methods: A guide for creating post-stratification weights to correct for sample bias. Educ Health Prof [serial online] 2019 [cited 2023 Mar 27];2:48-50. Available from: https://www.ehpjournal.com/text.asp?2019/2/1/48/259389




  Introduction


In medical and health professions education, most surveys are administered in the context of a census study in which all members of a population (e.g., a student cohort) are surveyed. With the exception of surveys that require participation, it is typical for only some individuals to complete the survey. When surveys fail to achieve a 100% participation rate, response bias becomes a concern. In social research, various subpopulations often respond to survey items differently according to factors such as race, gender, and other demographic characteristics. As a result, underrepresentation or overrepresentation of members of various subpopulation groups can introduce bias into survey results. Statistical software will simply analyze the data it is given, thus giving greater weight to those individuals who were overrepresented and lesser weight to those who were underrepresented. This results in a validity threat, as both the accuracy of the results and the inferences made from them are distorted by the sampling bias.[1] Given this reality, it is critical that survey researchers make every effort to produce accurate, nonbiased estimates that characterize the views, attitudes, beliefs, etc., of both the entire population and its major subpopulation groups.

Typically, survey researchers attempt to minimize response bias by obtaining representative samples. In short, representative samples help ensure that one's findings may be generalizable to the population from which the sample was drawn. A major advantage of census studies in the context of medical and health professions education is that population parameters, such as demographic characteristics and other auxiliary statistics, typically are known. Thus, with the use of a chi-squared test, researchers can determine whether the participants who completed the survey proportionally resemble the population of interest on key characteristics. If chi-squared tests indicate that the sample resembles the population, then the researcher may proceed with the analysis and subsequent reporting of results. However, if chi-squared tests indicate that the sample is disproportionate, then the researcher should take some action to correct for this bias. One option is to obtain additional data from members of subpopulation groups that are underrepresented in the data. However, given the relatively small size of most populations, stratified sampling techniques typically are only marginally helpful in this context. Thus, researchers often are forced to consider other alternatives.
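To make the chi-squared check concrete, the following is a minimal Python sketch of a goodness-of-fit comparison between sample counts and known population counts. It is offered only as an illustration, not as the procedure used in this article: the function name, the counts, and the small table of 5% critical values are all assumptions introduced here.

```python
# Chi-squared goodness-of-fit check of a sample against known population shares.
# All counts below are hypothetical illustration values.

# Upper 5% critical values of the chi-squared distribution, by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_squared_gof(observed, population):
    """Return (statistic, degrees of freedom) for observed sample counts
    tested against the shares implied by the population counts."""
    n = sum(observed.values())
    total = sum(population.values())
    statistic = 0.0
    for group, count in observed.items():
        expected = n * population[group] / total  # count expected under proportionality
        statistic += (count - expected) ** 2 / expected
    return statistic, len(observed) - 1


population = {"male": 50, "female": 50}  # hypothetical population counts
sample = {"male": 21, "female": 36}      # hypothetical respondent counts

statistic, df = chi_squared_gof(sample, population)
disproportionate = statistic > CRITICAL_05[df]
print(f"chi-squared = {statistic:.3f}, df = {df}, disproportionate: {disproportionate}")
```

With these made-up counts, the statistic is roughly 3.95, just above the 3.841 threshold for one degree of freedom, so the sample would be flagged as disproportionate by gender. The chi-squared approximation becomes less reliable when expected counts are small, which is common when several demographic variables are crossed in a small cohort.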

One robust alternative to correct for sample distributions that do not perfectly resemble population distributions is to apply poststratification weights. In short, poststratification weighting involves taking sample data and aligning the representation of various subpopulation groups to match that of the known population. As the name implies, poststratification weights are calculated after all data are collected. When the procedure is performed correctly, extant data are statistically adjusted to reflect population parameters, making results both more accurate and generalizable across the population of interest. Thus, the aim of this article is to provide an overview of poststratification weighting and demonstrate how this procedure can be leveraged to obtain more accurate results in many, if not most, medical and health professions education survey contexts.


  Procedural Steps for Creating Poststratification Weights


First, let us consider the procedural steps necessary for calculating weight values.

  • Step 1: Create a table to assemble your variables
  • Step 2: Populate your values for Population (N) and Sample (n), where appropriate
  • Step 3: Calculate a total count for the Population (N) and Sample (n) columns
  • Step 4: Populate the Proportion of Population column by dividing the value for each Combined Variable in the Population (N) column by its column Total
  • Step 5: Populate the Proportion of Sample column by dividing the value for each Combined Variable in the Sample (n) column by its column Total
  • Step 6: Calculate Weight by dividing the value in each cell of the Proportion of Population column by the value in each cell of the Proportion of Sample column.
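The steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed implementation: the function name, the dictionary layout, and the two-group counts are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
def poststratification_weights(population, sample):
    """Apply Steps 3-6 to two dictionaries of counts keyed by group."""
    pop_total = sum(population.values())    # Step 3: total for Population (N)
    sample_total = sum(sample.values())     # Step 3: total for Sample (n)
    table = {}
    for group in population:
        if sample.get(group, 0) == 0:
            # A group with no respondents cannot be weighted up.
            raise ValueError(f"no respondents in group {group!r}; weight undefined")
        pop_prop = population[group] / pop_total      # Step 4
        sample_prop = sample[group] / sample_total    # Step 5
        table[group] = {
            "pop_prop": pop_prop,
            "sample_prop": sample_prop,
            "weight": pop_prop / sample_prop,         # Step 6
        }
    return table


# Hypothetical two-group illustration (Steps 1-2: assemble and populate the table).
population = {"A": 60, "B": 40}
sample = {"A": 20, "B": 30}

table = poststratification_weights(population, sample)
for group, row in table.items():
    print(group, round(row["pop_prop"], 3), round(row["sample_prop"], 3), round(row["weight"], 3))
```

Group A makes up 60% of the population but only 40% of the sample, so its weight is 1.5; group B is overrepresented and receives a weight of about 0.667. The explicit error for an empty group is a design choice of this sketch: a weight cannot be computed when no one from a group responded.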


Next, let us apply these steps using an illustrative example.

An illustrative example

Suppose a first-year medical school cohort consists of 100 students. Of those 100 students, 50 identified as male and 50 identified as female. With respect to race/ethnicity, 70 students self-reported as White, 20 as Black, and 10 as Other. These data serve as the auxiliary statistics for this exercise.

First, we need to produce a crosstab contingency table [Table 1] to establish counts for each combination of race and gender variables (see Combined Variables column). Known student cohort values are entered into the Population (N) column. For this exercise, let us assume that 57 students completed the survey. Next, we need to identify which 57 of the 100 students in the population completed the survey and similarly provide counts for each combination of race and gender in the Sample (n) column. Let us also assume that our sample participants responded in a disproportionate manner (thus justifying the need for poststratification weights), with females responding in greater numbers than males and Black students responding at a higher rate than White or Other students. Simulated values are provided in the Sample (n) column. Proportional values are then created for both the population parameters (e.g., 35 White males divided by 100 students in the total population is 0.350) and the sample's statistics (e.g., 15 White males divided by 57 students in the sample of participants is 0.263) by dividing each value by its respective total count. Proportion of Population values are then divided by Proportion of Sample values (e.g., 0.350 divided by 0.263, using the unrounded proportion, equals 1.330) to determine the Weight. A visual inspection of the weights provides a quality assurance check: groups that are underrepresented in the sample should have weights greater than 1, and groups that are overrepresented should have weights less than 1. Next, let us identify how to apply weights using IBM SPSS Statistics for Windows, Version 25.0 (IBM Corp., Armonk, NY, USA).
Table 1: A template for producing poststratification weights
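For readers who wish to check the arithmetic, the Python sketch below recomputes the proportions and weights for the six race-by-gender cells. One caution is essential: apart from the White male cell (35 in the population, 15 in the sample), the cell counts are not quoted from Table 1. They are back-calculated from the marginal totals and the six weights reported in the text, and should be treated as an assumption.

```python
# Cell counts are NOT quoted from Table 1: they are back-calculated from the
# marginal totals and the six weights reported in the text (an assumption).
cells = [
    # (race, gender, population N, sample n)
    ("White", "Male", 35, 15),
    ("Black", "Male", 10, 5),
    ("Other", "Male", 5, 1),
    ("White", "Female", 35, 25),
    ("Black", "Female", 10, 8),
    ("Other", "Female", 5, 3),
]

pop_total = sum(N for _, _, N, _ in cells)
sample_total = sum(n for _, _, _, n in cells)

weights = {}
print(f"{'Combined variable':<18}{'N':>5}{'n':>5}{'Prop. pop.':>12}{'Prop. sample':>14}{'Weight':>8}")
for race, gender, N, n in cells:
    pop_prop = N / pop_total
    sample_prop = n / sample_total
    weights[(race, gender)] = pop_prop / sample_prop
    label = f"{race} {gender}"
    print(f"{label:<18}{N:>5}{n:>5}{pop_prop:>12.3f}{sample_prop:>14.3f}{weights[(race, gender)]:>8.3f}")
print(f"{'Total':<18}{pop_total:>5}{sample_total:>5}")

# Quality check: weight times respondents, summed over cells, should return
# the original sample size.
weighted_n = sum(weights[(race, gender)] * n for race, gender, _, n in cells)
print(f"Weighted sample size: {weighted_n:.3f}")
```

Under these assumed counts, the computed weights agree with the values reported in the text to within rounding, and the weighted sample size equals the 57 respondents.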



Applying weights in a statistical software package

After weights are calculated, they need to be applied to the data. This process will vary depending on the statistical software program used, but the essence of the process is generally the same. For convenience, SPSS syntax is provided in this example. If the coding scheme in the dataset for Race is 1 = White, 2 = Black, and 3 = Other and for Gender is 1 = Male and 2 = Female, then the following syntax would create a new variable (named Weight).

* Assign the poststratification weight for each Race by Gender combination.
IF (Race = 1 AND Gender = 1) Weight = 1.330.
IF (Race = 2 AND Gender = 1) Weight = 1.140.
IF (Race = 3 AND Gender = 1) Weight = 2.850.
IF (Race = 1 AND Gender = 2) Weight = 0.798.
IF (Race = 2 AND Gender = 2) Weight = 0.713.
IF (Race = 3 AND Gender = 2) Weight = 0.950.
EXECUTE.
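For readers working outside SPSS, the same assignment can be sketched in Python as a simple lookup keyed on the Race and Gender codes. The respondent records and field names below are hypothetical; only the coding scheme and the six weight values come from the example above.

```python
# Weights from the example, keyed by (Race, Gender) codes:
# Race 1 = White, 2 = Black, 3 = Other; Gender 1 = Male, 2 = Female.
WEIGHTS = {
    (1, 1): 1.330,
    (2, 1): 1.140,
    (3, 1): 2.850,
    (1, 2): 0.798,
    (2, 2): 0.713,
    (3, 2): 0.950,
}

# Hypothetical respondent records; a real dataset would have one per participant.
respondents = [
    {"id": 1, "Race": 1, "Gender": 1},
    {"id": 2, "Race": 2, "Gender": 2},
    {"id": 3, "Race": 3, "Gender": 1},
]

# Create the new Weight variable for every respondent.
for person in respondents:
    person["Weight"] = WEIGHTS[(person["Race"], person["Gender"])]

for person in respondents:
    print(person)
```

A lookup table raises an error for any code combination that has no weight, which makes miscoded records visible rather than leaving them silently unweighted.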

Finally, we would access the weighting function in the software program to ensure that weights are activated and the analyses are conducted using these weights. In SPSS, we would go to Data, select Weight Cases, select Weight Cases By, select the name of the weighting variable (Weight), and then, click OK. This will activate the weights, and all outputs will be weighted accordingly. Once the statistical analysis is performed, the output should be inspected again to ensure that the weighting was successful.
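One way to inspect whether the weighting was successful is to confirm that the weighted sample reproduces the known population split. The Python sketch below does this for gender. The weights are those reported above; the respondent counts per cell are back-calculated from the text rather than quoted from Table 1, and the function name is hypothetical.

```python
# Weights as reported in the text, keyed by (Race, Gender) codes.
WEIGHTS = {(1, 1): 1.330, (2, 1): 1.140, (3, 1): 2.850,
           (1, 2): 0.798, (2, 2): 0.713, (3, 2): 0.950}

# Respondents per cell: back-calculated, not quoted from Table 1 (assumption).
SAMPLE_COUNTS = {(1, 1): 15, (2, 1): 5, (3, 1): 1,
                 (1, 2): 25, (2, 2): 8, (3, 2): 3}


def gender_share(gender_code, use_weights):
    """Share of the sample with the given gender code, weighted or unweighted."""
    total = 0.0
    selected = 0.0
    for (race, gender), n in SAMPLE_COUNTS.items():
        w = WEIGHTS[(race, gender)] if use_weights else 1.0
        total += n * w
        if gender == gender_code:
            selected += n * w
    return selected / total


unweighted_male = gender_share(1, use_weights=False)
weighted_male = gender_share(1, use_weights=True)
print(f"Male share, unweighted: {unweighted_male:.3f}")
print(f"Male share, weighted:   {weighted_male:.3f}")
```

Under these assumptions, males make up about 37% of the unweighted sample but very nearly 50% once the weights are applied, matching the 50/50 split in the cohort. The small remaining discrepancy comes from rounding the weights to three decimal places.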


  Concluding Remarks


Poststratification weights offer an effective approach for correcting bias from overrepresented and underrepresented samples. The technique can also help discern the degree to which bias exists should a researcher choose to compare weighted versus unweighted results. The weighting process is relatively straightforward and can be applied to many survey research studies conducted in the field of medical and health professions education.

As noted previously, poststratification weights cannot be accurately calculated unless auxiliary statistics are available. Ideally, auxiliary statistics will consist of exact population parameters, as inexact estimates of a population will result in some measurement error that will be retained even after the weighting process.

There are a number of ways to produce poststratification weights. The method presented in this article is only one approach and was selected because most medical and health professions education researchers can perform it without having to consult a statistician or psychometrician for assistance. Researchers familiar with other statistical software programs (e.g., SAS, Stata, and R) can similarly perform these functions. In fact, many programs have macros and other special features that can automate the process. Readers who are more comfortable performing statistical analyses with other software programs are encouraged to consult the “Help” function within the software and/or perform an online search for tutorials on how to calculate weights using those programs.

There are some additional considerations that survey researchers should take into account. First, it is good practice to report both weighted and unweighted values as part of the presentation of results. While many consider only the weighted values to be of importance, reporting unweighted values will provide transparency to readers. In addition, it is important to note that calculating weights typically results in an increase in the size of the standard errors associated with the estimates. Therefore, for studies in which statistical precision is paramount, researchers should use a statistical procedure that adjusts standard errors based on the unweighted N, as opposed to the weighted N. Perhaps the biggest problem with poststratification weights is that additional bias may result for subgroups that are not taken into account as part of the weighting process. Therefore, researchers should report weighted data only for those variables that were adjusted and refrain from speculating on how other subpopulations responded. Finally, it should be noted that the example presented in this study is a rather rudimentary example of poststratification weights. Studies involving multivariate data can quickly become complicated; thus, researchers should consult comprehensive texts by Valliant et al.,[2] Bethlehem and Biffignandi,[3] and Biemer and Christ[4] for additional guidance on how to use poststratification and other types of statistical weights in these contexts.
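To give a rough sense of how much precision is lost to weighting, one commonly used approximation is Kish's effective sample size, computed as the squared sum of the weights divided by the sum of the squared weights. This approximation is not part of the method described in this article and is offered only as an illustrative sketch in Python. The weights are those from the example; the respondent counts per cell are back-calculated from the text rather than quoted from Table 1.

```python
# (weight, respondents) per cell. Weights are those reported in the example;
# respondent counts are back-calculated, not quoted from Table 1 (assumption).
cells = {
    "White male": (1.330, 15),
    "Black male": (1.140, 5),
    "Other male": (2.850, 1),
    "White female": (0.798, 25),
    "Black female": (0.713, 8),
    "Other female": (0.950, 3),
}

n = sum(count for _, count in cells.values())
sum_w = sum(weight * count for weight, count in cells.values())
sum_w2 = sum(weight ** 2 * count for weight, count in cells.values())

# Kish's approximation: (sum of weights)^2 / (sum of squared weights).
n_eff = sum_w ** 2 / sum_w2
design_effect = n / n_eff

print(f"Respondents: {n}")
print(f"Effective sample size: {n_eff:.1f}")
print(f"Approximate design effect: {design_effect:.2f}")
```

Under these assumptions, the 57 respondents carry roughly the information of 51 equally weighted respondents, which illustrates why standard errors tend to grow once weights are applied.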

Financial support and sponsorship

Nil.

Conflicts of interest

Dr. Royal is the editor-in-chief of Education in the Health Professions. All peer-review activities relating to this manuscript were independently performed by other members of the editorial board.



 
  References

1. Royal KD. Four tenets of modern validity theory for medical education assessment and evaluation. Adv Med Educ Pract 2017;8:567-70.
2. Valliant R, Dever JA, Kreuter F. Practical Tools for Designing and Weighting Survey Samples. New York: Springer; 2013.
3. Bethlehem J, Biffignandi S. Wiley Handbooks in Survey Methodology: Handbook of Web Surveys. Hoboken, US: Wiley; 2011.
4. Biemer PP, Christ LL. Weighting survey data. In: de Leeuw ED, Hox J, Dillman D, editors. International Handbook of Survey Methodology. New York, NY: Routledge; 2008.
    



 
 





 
