METHODOLOGY

Year: 2019 | Volume: 2 | Issue: 1 | Page: 48-50
Survey research methods: A guide for creating post-stratification weights to correct for sample bias
Kenneth D Royal
Department of Clinical Sciences, North Carolina State University, Raleigh, North Carolina, USA
Date of Web Publication: 30-May-2019
Correspondence Address: Dr. Kenneth D Royal, Department of Clinical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/EHP.EHP_8_19
Nonrepresentative data pose one of the greatest validity threats in survey research. Samples in which demographic subgroups are underrepresented and/or overrepresented can introduce bias that distorts both the accuracy of the results and the inferences drawn from them. This article discusses poststratification weighting, a post hoc statistical procedure used to correct for sampling bias in survey research studies. Procedural steps for calculating poststratification weights are presented, and an example involving a simulated cohort of medical students is provided for demonstration purposes. SPSS syntax is presented to help researchers get started with their own calculations of poststratification weights.
Keywords: Assessment, bias, evaluation, health surveys, medical education, statistics, survey research, surveys
How to cite this article: Royal KD. Survey research methods: A guide for creating post-stratification weights to correct for sample bias. Educ Health Prof 2019;2:48-50
How to cite this URL: Royal KD. Survey research methods: A guide for creating post-stratification weights to correct for sample bias. Educ Health Prof [serial online] 2019 [cited 2023 Mar 27];2:48-50. Available from: https://www.ehpjournal.com/text.asp?2019/2/1/48/259389
Introduction
In medical and health professions education, most surveys are administered in the context of a census study in which all members of a population (e.g., a student cohort) are surveyed. With the exception of surveys that require participation, typically only some individuals complete the survey. When surveys fail to achieve a 100% participation rate, response bias becomes a concern. In social research, subpopulations often respond to survey items differently according to factors such as race, gender, and other demographic characteristics. As a result, underrepresentation or overrepresentation of members of various subpopulation groups can introduce bias into survey results. Statistical software simply analyzes the data it is given, so individuals from overrepresented groups effectively receive greater weight and individuals from underrepresented groups receive lesser weight. This creates a validity threat, as both the accuracy of the results and the inferences drawn from them are distorted by the sampling bias.[1] Given this reality, it is critical that survey researchers make every effort to produce accurate, unbiased estimates that characterize the views, attitudes, beliefs, etc., of both the entire population and its major subpopulation groups.
Typically, survey researchers attempt to minimize response bias by obtaining representative samples. In short, representative samples help ensure that findings can be generalized to the population from which the sample was drawn. A major advantage of census studies in medical and health professions education is that population parameters, such as demographic characteristics and other auxiliary statistics, typically are known. Thus, using a chi-squared goodness-of-fit test, researchers can determine whether the participants who completed the survey proportionally resemble the population of interest on key characteristics. If the chi-squared tests detect no meaningful departure from the population proportions, the researcher may proceed with the analysis and subsequent reporting of results. However, if the tests indicate that the sample is disproportionate, the researcher should take action to correct for this bias. One option is to obtain additional data from members of underrepresented subpopulation groups. However, because most populations in this context are relatively small, stratified sampling techniques typically are only marginally helpful. Thus, researchers often must consider other alternatives.
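As a minimal sketch of the chi-squared check described above, the Python below runs a goodness-of-fit test on one characteristic at a time, comparing observed sample counts with the counts expected if the sample mirrored the known population proportions. The sample counts are assumptions: the text does not list them, so they are back-calculated from the cohort in the illustrative example below and the weights in its SPSS syntax (21 male and 36 female respondents; 40 White, 13 Black, and 4 Other). The p-value uses the closed-form chi-squared tail for integer degrees of freedom so that only the standard library is needed; in practice a routine such as `scipy.stats.chisquare` would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^(i - 1/2) / (2i - 1)!!
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)  # the i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)  # ratio between consecutive series terms
    return p


def goodness_of_fit(observed: dict, population: dict):
    """Chi-squared goodness-of-fit of sample counts against known population counts."""
    n = sum(observed.values())
    N = sum(population.values())
    stat = 0.0
    for group, count in observed.items():
        expected = n * population[group] / N  # count expected under population proportions
        stat += (count - expected) ** 2 / expected
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Population counts from the illustrative cohort; sample counts are inferred (assumption).
gender_pop = {"Male": 50, "Female": 50}
gender_sample = {"Male": 21, "Female": 36}
race_pop = {"White": 70, "Black": 20, "Other": 10}
race_sample = {"White": 40, "Black": 13, "Other": 4}

for label, obs, pop in [("Gender", gender_sample, gender_pop),
                        ("Race", race_sample, race_pop)]:
    stat, df, p = goodness_of_fit(obs, pop)
    print(f"{label}: chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```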
One robust alternative to correct for sample distributions that do not perfectly resemble population distributions is to apply poststratification weights. In short, poststratification weighting involves taking sample data and aligning the representation of various subpopulation groups to match that of the known population. As the name implies, poststratification weights are calculated after all data are collected. When the procedure is performed correctly, extant data are statistically adjusted to reflect population parameters, making results both more accurate and generalizable across the population of interest. Thus, the aim of this article is to provide an overview of poststratification weighting and demonstrate how this procedure can be leveraged to obtain more accurate results in many, if not most, medical and health professions education survey contexts.
Procedural Steps for Creating Poststratification Weights
First, let us consider the procedural steps necessary for calculating weight values.
- Step 1: Create a table with one row for each combination of the weighting variables (the Combined Variables)
- Step 2: Populate your values for Population (N) and Sample (n), where appropriate
- Step 3: Calculate a total count for the Population (N) and Sample (n) columns
- Step 4: Populate the Proportion of Population column by dividing the value for each Combined Variable in the Population (N) column by its column Total
- Step 5: Populate the Proportion of Sample column by dividing the value for each Combined Variable in the Sample (n) column by its column Total
- Step 6: Calculate the Weight for each Combined Variable by dividing its Proportion of Population value by its Proportion of Sample value
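The six steps above reduce to one ratio per cell, w_h = (N_h / N) / (n_h / n), where N_h and n_h are the population and sample counts for cell h and N and n are the column totals. The Python below is a minimal sketch of the steps, not a prescribed implementation. The `Row` class and `poststratification_table` function are hypothetical names, and the sample counts are assumptions inferred from the illustrative example below and its SPSS weights, because the text does not list them.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of the weighting table (one Combined Variable)."""
    cell: str
    population: int            # Population (N) count for this cell
    sample: int                # Sample (n) count for this cell
    prop_population: float = 0.0
    prop_sample: float = 0.0
    weight: float = 0.0


def poststratification_table(counts):
    """Follow Steps 1-6; counts maps each cell to (population N_h, sample n_h)."""
    # Steps 1-2: one row per combination of the weighting variables.
    rows = [Row(cell, N_h, n_h) for cell, (N_h, n_h) in counts.items()]
    # Step 3: column totals.
    total_N = sum(r.population for r in rows)
    total_n = sum(r.sample for r in rows)
    for r in rows:
        if r.sample == 0:
            # A cell with no respondents has no defined weight (division by zero).
            raise ValueError(f"No respondents in cell {r.cell!r}")
        r.prop_population = r.population / total_N    # Step 4
        r.prop_sample = r.sample / total_n            # Step 5
        r.weight = r.prop_population / r.prop_sample  # Step 6
    return rows, total_N, total_n


# Hypothetical counts consistent with the illustrative example (sample counts inferred).
counts = {
    "White male": (35, 15),
    "Black male": (10, 5),
    "Other male": (5, 1),
    "White female": (35, 25),
    "Black female": (10, 8),
    "Other female": (5, 3),
}

rows, total_N, total_n = poststratification_table(counts)
print(f"{'Combined Variables':<18}{'N':>5}{'n':>5}{'Prop. pop.':>12}{'Prop. sample':>14}{'Weight':>8}")
for r in rows:
    print(f"{r.cell:<18}{r.population:>5}{r.sample:>5}"
          f"{r.prop_population:>12.3f}{r.prop_sample:>14.3f}{r.weight:>8.3f}")
print(f"{'Total':<18}{total_N:>5}{total_n:>5}"
      f"{sum(r.prop_population for r in rows):>12.3f}{sum(r.prop_sample for r in rows):>14.3f}")
```

Raising an error on an empty cell is a design choice for the sketch. A weight cannot be computed for a cell with no respondents, and it is better to stop than to silently produce an infinite value.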
Next, let us apply these steps using an illustrative example.
An illustrative example
Suppose a 1st-year medical school cohort consists of 100 students. Of those 100 students, 50 identified as male and 50 identified as female. With respect to race/ethnicity, 70 students self-reported as White, 20 as Black, and 10 as Other. These data serve as the auxiliary statistics for this exercise.
First, we need to produce a crosstab contingency table [Table 1] to establish counts for each combination of the race and gender variables (see Combined Variables column). Known student cohort values are entered into the Population (N) column. For this exercise, let us assume that 57 students completed the survey. Next, we identify which of the 100 students in the population completed the survey and enter counts for each combination of race and gender in the Sample (n) column. Let us also assume that participants responded in a disproportionate manner (thus justifying the need for poststratification weights), with females responding at higher rates than males and Black students responding at higher rates than White or Other students. Simulated values are provided in the Sample (n) column. Proportions are then calculated for both the population parameters (e.g., 35 White males divided by 100 students in the total population is 0.350) and the sample statistics (e.g., 15 White males divided by 57 students in the sample is 0.263) by dividing each value by its respective total count. Proportion of Population values are then divided by Proportion of Sample values to determine the Weight (e.g., 0.350 divided by 0.263 equals 1.330, computed from the unrounded proportions). A visual inspection of the weights provides a quality assurance check: cells that are overrepresented in the sample should receive weights below 1, and cells that are underrepresented should receive weights above 1. Next, let us identify how to apply weights using IBM SPSS Statistics for Windows, Version 25.0 (IBM Corp., Armonk, NY, USA).
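The visual inspection mentioned above can be supplemented with simple numeric checks, sketched below in Python. The weights are the rounded values from the SPSS syntax in the next section. The sample counts are assumptions inferred from those weights. By construction, the weights applied to the sample counts should sum to the sample size, each cell's weighted sample share should match its population proportion, and overrepresented cells should get weights below 1. Because the weights are rounded, agreement is only to within rounding error.

```python
# Rounded weights from the SPSS syntax; sample counts n_h are inferred (assumption).
# cell: (population N_h, sample n_h, weight w_h)
cells = {
    "White male": (35, 15, 1.330),
    "Black male": (10, 5, 1.140),
    "Other male": (5, 1, 2.850),
    "White female": (35, 25, 0.798),
    "Black female": (10, 8, 0.713),
    "Other female": (5, 3, 0.950),
}

N = sum(N_h for N_h, _, _ in cells.values())
n = sum(n_h for _, n_h, _ in cells.values())

# Check 1: applying the weights to the sample counts should give back n.
weighted_n = sum(n_h * w for _, n_h, w in cells.values())
print(f"n = {n}, weighted n = {weighted_n:.3f}")

# Check 2: weighted sample shares should match the population proportions.
shares = {}
for cell, (N_h, n_h, w) in cells.items():
    shares[cell] = (N_h / N, n_h / n, n_h * w / weighted_n)
    pop, unweighted, weighted = shares[cell]
    print(f"{cell:<13} population {pop:.3f}  unweighted {unweighted:.3f}  weighted {weighted:.3f}")

# Check 3: overrepresented cells (sample share > population share) get weights below 1.
for cell, (N_h, n_h, w) in cells.items():
    assert (w < 1) == (n_h / n > N_h / N), cell
```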
Applying weights in a statistical software package
After weights are calculated, they need to be applied to the data. This process will vary depending on the statistical software program used, but the essence of the process is generally the same. For convenience, SPSS syntax is provided in this example. If Race is coded in the dataset as 1 = White, 2 = Black, and 3 = Other, and Gender as 1 = Male and 2 = Female, the following syntax creates a new variable named Weight.
* Create the Weight variable with one post-stratification weight per Race x Gender cell.
If (Race = 1 and Gender = 1) Weight = 1.330.
If (Race = 2 and Gender = 1) Weight = 1.140.
If (Race = 3 and Gender = 1) Weight = 2.850.
If (Race = 1 and Gender = 2) Weight = 0.798.
If (Race = 2 and Gender = 2) Weight = 0.713.
If (Race = 3 and Gender = 2) Weight = 0.950.
Execute.
Finally, we would access the weighting function in the software program to ensure that weights are activated and the analyses are conducted using these weights. In SPSS, we would go to Data, select Weight Cases, select Weight Cases By, select the name of the weighting variable (Weight), and then click OK; the equivalent syntax command is WEIGHT BY Weight. This activates the weights, and all subsequent output will be weighted accordingly. After the statistical analysis is performed, the output should be inspected to confirm that the weighting was applied as intended.
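For readers working outside SPSS, the Python below is a rough, standard-library analogue of the IF statements and of analyzing with weights turned on. The Race/Gender coding schema and the weights are from the SPSS syntax above. The respondent records, the 1-5 "Satisfaction" item, and the helper names `assign_weight` and `weighted_mean` are hypothetical, invented purely for illustration. Comparing the weighted and unweighted means shows the kind of contrast a researcher might report.

```python
# Weights keyed by (Race, Gender) codes, mirroring the SPSS IF statements.
# Race: 1 = White, 2 = Black, 3 = Other; Gender: 1 = Male, 2 = Female.
WEIGHTS = {
    (1, 1): 1.330, (2, 1): 1.140, (3, 1): 2.850,
    (1, 2): 0.798, (2, 2): 0.713, (3, 2): 0.950,
}


def assign_weight(record):
    """Return the weight for one respondent, or None if a code is missing or unknown."""
    return WEIGHTS.get((record["Race"], record["Gender"]))


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents answering a hypothetical 1-5 "Satisfaction" item.
respondents = [
    {"Race": 1, "Gender": 1, "Satisfaction": 3},
    {"Race": 1, "Gender": 2, "Satisfaction": 5},
    {"Race": 2, "Gender": 1, "Satisfaction": 2},
    {"Race": 2, "Gender": 2, "Satisfaction": 4},
    {"Race": 3, "Gender": 1, "Satisfaction": 2},
    {"Race": 1, "Gender": 2, "Satisfaction": 4},
    {"Race": None, "Gender": 2, "Satisfaction": 1},  # race not reported
]

for r in respondents:
    r["Weight"] = assign_weight(r)  # analogous to creating the Weight variable

# Analyze only cases that received a valid weight.
usable = [r for r in respondents if r["Weight"] is not None]
scores = [r["Satisfaction"] for r in usable]
weights = [r["Weight"] for r in usable]
print(f"Unweighted mean: {sum(scores) / len(scores):.3f}")
print(f"Weighted mean:   {weighted_mean(scores, weights):.3f}")
```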
Concluding Remarks
Poststratification weights offer an effective approach for correcting bias arising from the overrepresentation and underrepresentation of subgroups in a sample. The technique can also help gauge the degree to which bias exists, should a researcher choose to compare weighted versus unweighted results. The weighting process is relatively straightforward and can be applied to many survey research studies conducted in medical and health professions education.
As noted previously, poststratification weights cannot be accurately calculated unless auxiliary statistics are available. Ideally, auxiliary statistics will consist of exact population parameters, as inexact estimates of a population will result in some measurement error that will be retained even after the weighting process.
There are a number of ways to produce poststratification weights. The method presented in this article is only one approach. It was selected because most medical and health professions education researchers can perform it without consulting a statistician or psychometrician. Researchers familiar with other statistical software programs (e.g., SAS, Stata, and R) can perform the same calculations there, and many programs offer macros and other special features that automate the process. Readers who prefer other software are encouraged to consult the program's “Help” documentation and/or search online for tutorials on calculating weights in that program.
There are some additional considerations that survey researchers should take into account. First, it is good practice to report both weighted and unweighted values as part of the presentation of results. While many consider only the weighted values to be of importance, reporting unweighted values provides transparency to readers. In addition, applying weights typically increases the standard errors associated with the estimates. Therefore, for studies in which statistical precision is paramount, researchers should use a statistical procedure that adjusts standard errors based on the unweighted N, as opposed to the weighted N. Perhaps the biggest problem with poststratification weights is that additional bias may result for subgroups that are not taken into account as part of the weighting process. Therefore, researchers should report weighted data only for those variables that were adjusted and refrain from speculating on how other subpopulations responded. Finally, the example presented in this article is a rather rudimentary one. Weighting schemes for studies involving multivariate data can quickly become complicated; thus, researchers should consult comprehensive texts by Valliant et al.,[2] Bethlehem and Biffignandi,[3] and Biemer and Christ[4] for additional guidance on how to use poststratification and other types of statistical weights in these contexts.
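The article does not name a specific standard-error adjustment. One common rough diagnostic, swapped in here and not taken from this article, is Kish's approximate design effect due to unequal weighting, deff ≈ n Σw² / (Σw)², with effective sample size n_eff = (Σw)² / Σw². The Python sketch below applies it to the illustrative example's weights, again assuming the inferred cell sample sizes. It only approximates the precision lost to weighting and is not a substitute for design-based variance estimation in dedicated survey software.

```python
# (n_h, w_h): inferred sample count and exact (unrounded) weight per cell (assumption).
cells = [(15, 1.33), (5, 1.14), (1, 2.85), (25, 0.798), (8, 0.7125), (3, 0.95)]

# Expand to one weight per respondent.
weights = [w for n_h, w in cells for _ in range(n_h)]

n = len(weights)
sum_w = sum(weights)
sum_w2 = sum(w * w for w in weights)

deff = n * sum_w2 / sum_w ** 2  # Kish's design effect due to unequal weights
n_eff = sum_w ** 2 / sum_w2     # effective sample size after weighting

print(f"n = {n}, deff = {deff:.3f}, effective n = {n_eff:.1f}")
print(f"Standard errors inflate by roughly sqrt(deff) = {deff ** 0.5:.3f}")
```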
Financial support and sponsorship
Nil.
Conflicts of interest
Dr. Royal is the editor-in-chief of Education in the Health Professions. All peer-review activities relating to this manuscript were independently performed by other members of the editorial board.
References
1. Royal KD. Four tenets of modern validity theory for medical education assessment and evaluation. Adv Med Educ Pract 2017;8:567-70.
2. Valliant R, Dever JA, Kreuter F. Practical Tools for Designing and Weighting Survey Samples. New York: Springer; 2013.
3. Bethlehem J, Biffignandi S. Wiley Handbooks in Survey Methodology: Handbook of Web Surveys. Hoboken, US: Wiley; 2011.
4. Biemer PP, Christ LL. Weighting survey data. In: de Leeuw ED, Hox J, Dillman D, editors. International handbook of survey methodology. New York, NY: Routledge; 2008.
[Table 1]