Weighting in Survey: The Idea and Application

19 Oct

Posted by: Kultar Singh

Category: Research and M&E

Data sets are weighted so that they better represent the population they are intended to measure. Survey weights appear in many datasets. This blog introduces the idea of using weights for survey data.

What are survey weights?

These are statistical adjustments made to survey data after it has been collected to improve the accuracy of the survey estimates.

Why are survey weights used?

Weights are applied to reduce survey bias by making the sample of survey respondents more representative of the target population.

Survey designers try to generate a representative sample using random sampling techniques, and then use survey weights to adjust the sample data so that it reflects the population better. One reason for weighting is that, under common sampling techniques, different population units may have different chances of being chosen to participate. Further, in some cases, such as disproportionate stratification and non-response, it is essential to weight the data before analyzing it. There are several types of weighting techniques, such as:

Weighting to account for the likelihood of selection

In simple random sampling, everyone has an equal chance of being chosen. In practice, however, not everyone may have an equal chance of being chosen. When all eligible household members are listed in a demographic survey and only one of them is interviewed, each household, not each eligible person, has an equal chance of being chosen. As a result, eligible members of households with more eligible members are frequently underrepresented because each of them has a lower chance of being chosen. In these situations, it is important to apply weights before continuing the analysis.

Using these known differences in selection probabilities, design weights can be computed to adjust the sample so that it reflects the population better.
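As a minimal sketch of the idea, assume one eligible person is interviewed per selected household, so a person's selection probability is the household's selection probability divided by the number of eligible members. The design weight is the inverse of that probability. The function name, the 1-in-100 household selection probability, and the household sizes below are hypothetical values chosen only for illustration.

```python
def design_weight(household_selection_prob: float, eligible_members: int) -> float:
    """Return the inverse of a person's overall selection probability.

    Assumes P(person) = P(household selected) / (number of eligible members),
    i.e. exactly one eligible member is interviewed per selected household.
    """
    if household_selection_prob <= 0 or eligible_members < 1:
        raise ValueError("probability must be positive and members >= 1")
    return eligible_members / household_selection_prob


# Hypothetical values: every household has a 1-in-100 chance of selection.
household_selection_prob = 0.01
eligible_counts = [1, 2, 4]

weights = [design_weight(household_selection_prob, n) for n in eligible_counts]

for n, w in zip(eligible_counts, weights):
    print(f"{n} eligible member(s): design weight of about {w:.0f}")
```

Under these assumptions, a respondent from a household with four eligible members receives four times the weight of a respondent who lives alone, which offsets the lower chance of being chosen.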

Non-response weights

These can be used to compensate for survey non-response. Not every sampled person will participate in a survey: some individuals cannot be reached, and others decline to participate. Groups in the population can have systematic differences in response rates, which means that some subgroups may be more or less likely to respond to the survey or to particular survey questions. Non-response weights use information about subgroup response rates to correct the data and reduce potential bias.
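One common way to build such a weight, sketched here under simplifying assumptions, is to take the inverse of each subgroup's response rate, so that respondents stand in for the sampled people in their subgroup who did not respond. The subgroup labels and counts below are hypothetical.

```python
# Hypothetical counts: people sampled and people who actually responded.
sampled = {"female": 100, "male": 100}
responded = {"female": 40, "male": 80}

# Response rate per subgroup, and its inverse as the non-response weight.
response_rate = {g: responded[g] / sampled[g] for g in sampled}
nonresponse_weight = {g: sampled[g] / responded[g] for g in sampled}

# After weighting, each subgroup's respondents add up to the number sampled.
weighted_counts = {g: responded[g] * nonresponse_weight[g] for g in sampled}

for g in sampled:
    print(
        f"{g}: response rate {response_rate[g]:.2f}, "
        f"weight {nonresponse_weight[g]:.2f}, "
        f"weighted count {weighted_counts[g]:.0f}"
    )
```

This sketch assumes that respondents and non-respondents within a subgroup are similar on the outcomes of interest; if they are not, the weight reduces but does not remove the bias.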

For example, female respondents in an HIV behavior surveillance survey may have a relatively high non-response rate due to their reluctance to discuss the subject openly. In these situations, it becomes necessary to give female responses more weight than male responses if the researcher wants to analyze the various behavioral traits among HIV patients by gender. Because the observed distribution doesn’t match the actual population, researchers must use weights to make adjustments.

Let’s consider a scenario where we could interview only 20 females and 80 males in the survey, even though the true split of HIV patients in the population is 50-50 by gender. Researchers can multiply each female response by 4.0, yielding 80 weighted females and 80 males. However, the total weighted sample size would then rise from 100 to 160. Therefore, researchers need to scale the weights back down so that all percentages are calculated on a sample size of 100. This can be done by multiplying the weights of both the males and the females by 5/8 (that is, 100/160), which gives a final weight of 2.5 for each female and 0.625 for each male, or 50 weighted females and 50 weighted males.
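The arithmetic in this scenario can be checked with a short sketch. The counts and the initial weight of 4.0 come from the example above; the variable names are only illustrative.

```python
# Respondent counts from the scenario: 20 females and 80 males.
respondents = {"female": 20, "male": 80}

# Step 1: weight females up so that both groups count equally.
raw_weight = {"female": 4.0, "male": 1.0}
weighted_total = sum(respondents[g] * raw_weight[g] for g in respondents)

# Step 2: rescale so the weighted sample size equals the actual sample size.
actual_total = sum(respondents.values())
scale = actual_total / weighted_total  # 100 / 160 = 5/8

final_weight = {g: raw_weight[g] * scale for g in raw_weight}
weighted_counts = {g: respondents[g] * final_weight[g] for g in respondents}

print("weighted total before rescaling:", weighted_total)
print("scale factor:", scale)
print("final weights:", final_weight)
print("weighted counts:", weighted_counts)
```

The rescaling step multiplies every weight by the same factor, so it changes the weighted sample size but not the 50-50 balance between the two groups.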

Weighting when a stratum is underrepresented

In some socioeconomic surveys, researchers may use weighting when a stratum is underrepresented. We frequently encounter situations where the data indicate that certain strata are underrepresented. This might occur because of disproportionate stratified sampling or non-response. In these situations, underrepresented strata must be weighted up before the analysis can continue.

Weights can be used to adjust the sample so that it accurately reflects important population proportions. These weights, also referred to as “poststratification” or “calibration” weights, are frequently used by statistical organizations to increase the precision and accuracy of population estimates.
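A simple form of poststratification, sketched below, divides each stratum's known population share by its share of the sample. The strata, the population shares, and the sample counts are hypothetical, and real calibration procedures often adjust for several variables at once.

```python
# Hypothetical known population shares for two strata.
population_share = {"urban": 0.3, "rural": 0.7}

# Hypothetical achieved sample, in which rural respondents are underrepresented.
sample_count = {"urban": 60, "rural": 40}
sample_total = sum(sample_count.values())

# Poststratification weight = population share / sample share.
poststrat_weight = {
    s: population_share[s] / (sample_count[s] / sample_total)
    for s in sample_count
}

# Check: the weighted sample shares should match the population shares.
weighted_count = {s: sample_count[s] * poststrat_weight[s] for s in sample_count}
weighted_share = {s: weighted_count[s] / sum(weighted_count.values()) for s in sample_count}

for s in sample_count:
    print(f"{s}: weight {poststrat_weight[s]:.2f}, weighted share {weighted_share[s]:.2f}")
```

Here the underrepresented rural stratum receives a weight above 1 and the overrepresented urban stratum a weight below 1.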

How are weights applied?

Each case has a value in the weighting variable that specifies how the case should be weighted during analysis. Over-represented cases have lower weights, which reduce their impact, while under-represented cases have higher weights, which increase their impact.
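To illustrate how a weighting variable changes an estimate, the sketch below compares an unweighted and a weighted proportion. The responses and weights are hypothetical: 20 cases with a weight of 2.5 and 80 cases with a weight of 0.625, with a yes/no answer coded as 1/0. The helper function is an illustration, not the routine used by any particular statistical package.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (value * weight) divided by the sum of weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: the first 20 cases are under-represented (weight 2.5),
# the remaining 80 are over-represented (weight 0.625). 1 = "yes", 0 = "no".
values = [1] * 10 + [0] * 10 + [1] * 20 + [0] * 60
weights = [2.5] * 20 + [0.625] * 80

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)

print(f"unweighted proportion: {unweighted:.3f}")  # 0.300
print(f"weighted proportion:   {weighted:.3f}")    # 0.375
```

Half of the under-represented group answered "yes" against a quarter of the over-represented group, so giving the first group more weight pulls the estimate upward.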

Using weights involves different procedures depending on the statistical software, but they all typically involve stating the name of the weighting variable before or during the analysis. In statistical software like SPSS, one can simply use the “weight cases” option, enter the weight variable, and run the weighted analysis.

In Stata, users can likewise specify weights as part of the command syntax to produce a weighted analysis. It is important to point out that both SPSS and Stata also provide the option of complex sample analysis, wherein one can define the sampling plan for a specific study and apply the relevant weights at each stratum/unit to produce a weighted analysis for the study.

What makes applying weights so crucial?

Results from weighted and unweighted analyses can differ. In an ideal situation, if your sampling design is self-weighting, you might not require weights, but this rarely happens in field situations. Hence it is important to apply weights so that the analysis gives a true picture of the population. By applying weights, one can reduce biases, which is why weighted results are more representative of the population.

Kultar Singh – Chief Executive Officer, Sambodhi