When observational data are used, assignment to treatment is not random, so causal inference can be difficult. A common way to address this is propensity score weighting. The propensity score is the probability that a person is assigned to the treatment arm given their observable characteristics. It is often estimated with a logistic regression of a binary indicator for whether the individual received treatment on the individual's characteristics. The estimated scores are then commonly used for inverse probability of treatment weighting (IPTW), which yields treatment effect estimates that adjust for known confounding factors.
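As a minimal sketch, here is the propensity model fit in plain Python. It assumes a single covariate `x` and fits the logistic regression of the treatment indicator on `x` by Newton-Raphson. The function `fit_propensity` and the simulated data are hypothetical. In practice one would use a statistics package and the full vector of characteristics.

```python
import math
import random


def fit_propensity(x, z, iters=25):
    """Logistic regression of treatment z (0/1) on a single covariate x,
    fit by Newton-Raphson. Returns (intercept, slope) and fitted scores."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the score vector g and the information matrix H
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            e = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += zi - e
            g1 += (zi - e) * xi
            v = e * (1.0 - e)
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        # Newton step: beta <- beta + H^{-1} g, using the explicit 2x2 inverse
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    scores = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    return (b0, b1), scores


# Hypothetical data: treatment is more likely for people with higher x
random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
z = [1 if random.random() < 1 / (1 + math.exp(0.5 - xi)) else 0 for xi in x]
(b0, b1), e = fit_propensity(x, z)
print(round(b0, 2), round(b1, 2))
```

A useful check on the fit comes from the maximum-likelihood solution of a logistic model with an intercept. At that solution, the fitted propensity scores average exactly to the observed share of treated people.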

A paper by Xu et al. (2010) shows that the IPTW approach can spuriously inflate the sample size, because the weighted pseudo-population is larger than the actual sample. This increases the likelihood of a type I error (i.e., rejecting the null hypothesis when it is actually true). The authors note that robust variance estimators can address this problem but work well only with large samples. Instead, Xu and co-authors suggest using standardized weights in the IPTW (often called stabilized weights) as a simple, easy-to-implement alternative. Here is how this works.

The IPTW approach simply compares the treated and untreated groups after applying the IPTW weights. Let the proportion of the sample that is treated be:

$$p = \frac{n_1}{n}$$

where *n1* is the number of people treated and *n* is the total sample size. Let *z* = 1 if a person in the data is treated and *z* = 0 if the person is not treated. Suppose each person has a vector of characteristics, *X*, that affects the likelihood of receiving treatment. Then one calculates the probability of treatment (the propensity score) as:

$$e(X) = \Pr(z = 1 \mid X)$$
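The weighted comparison and the treated share *p* can be sketched as two small functions. The names `treated_share` and `weighted_mean_difference` are hypothetical, and the weights in the toy example are placeholders rather than real IPTW weights. The weighted difference in means is the same quantity as the coefficient on *z* in a weighted least-squares regression of the outcome on an intercept and *z*.

```python
def treated_share(z):
    """p = n1 / n: share of the sample with z = 1."""
    return sum(z) / len(z)


def weighted_mean_difference(y, z, w):
    """Weighted mean outcome among the treated (z = 1) minus the weighted
    mean outcome among the untreated (z = 0)."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for yi, zi, wi in zip(y, z, w):
        sums[zi][0] += wi * yi
        sums[zi][1] += wi
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


# Toy numbers: two treated and two untreated people
y = [3.0, 5.0, 1.0, 2.0]
z = [1, 1, 0, 0]
print(treated_share(z))                              # 0.5
print(weighted_mean_difference(y, z, [1, 1, 1, 1]))  # 4.0 - 1.5 = 2.5
print(weighted_mean_difference(y, z, [1, 3, 2, 2]))  # 4.5 - 1.5 = 3.0
```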

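Under standard IPTW, with *e(X)* denoting the propensity score, the weights for treated (*z* = 1) and untreated (*z* = 0) people are:

$$w_1 = \frac{1}{e(X)}, \qquad w_0 = \frac{1}{1 - e(X)}$$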
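The following short sketch computes these weights on hypothetical simulated data. It assumes one normal covariate and a known propensity score, and the name `iptw_weights` is illustrative. Because the expected value of *z*/*e(X)* is 1, the treated weights sum to about *n* in expectation, and so do the untreated weights. The weighted pseudo-population is therefore roughly 2*n*, about twice the real sample. This is one way to see the sample-size inflation that Xu et al. describe.

```python
import math
import random


def iptw_weights(z, e):
    """Unstabilized IPTW weights: 1/e(X) if treated, 1/(1 - e(X)) if untreated."""
    return [1 / ei if zi == 1 else 1 / (1 - ei) for zi, ei in zip(z, e)]


random.seed(2)
n = 10_000
# Hypothetical data: the true propensity score depends on one covariate x
x = [random.gauss(0, 1) for _ in range(n)]
e = [1 / (1 + math.exp(-0.8 * xi)) for xi in x]
z = [1 if random.random() < ei else 0 for ei in e]

w = iptw_weights(z, e)
treated_total = sum(wi for wi, zi in zip(w, z) if zi == 1)
untreated_total = sum(wi for wi, zi in zip(w, z) if zi == 0)
# Each arm's weights sum to roughly n, so the total is roughly 2n
print(n, round(treated_total), round(untreated_total), round(treated_total + untreated_total))
```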

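Xu and his colleagues ran simulations showing that the type I error rate under standard IPTW is very high, often 15% to 40%. To correct this, standardized weights (SW) can be used instead. Each IPTW weight is multiplied by the overall probability of the person's observed treatment status (*p* for the treated, 1 − *p* for the untreated):

$$SW_1 = \frac{p}{e(X)}, \qquad SW_0 = \frac{1 - p}{1 - e(X)}$$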

The former is used for the treated population (i.e., *z* = 1) and the latter for the untreated population (*z* = 0). The authors show that with standardized weights the type I error rate is about 5%, as intended. They also show that standardized weighting often outperforms robust variance estimation, including for estimating main effects.
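The following Monte Carlo sketch illustrates the mechanism. Its setup is hypothetical:

- One normal covariate affects both treatment and outcome.
- The true treatment effect is zero.
- The true propensity score is plugged in rather than estimated.
- The test is "naive": it treats the weights as frequency weights, so each arm's sum of weights acts as its sample size.

The function names `naive_z_stat` and `simulate` are illustrative. This is not a reproduction of Xu et al.'s simulation design. Multiplying every weight within an arm by the same constant (*p* or 1 − *p*) leaves the weighted mean difference unchanged. SW therefore changes only the implied sample size and, through it, the standard error.

```python
import math
import random


def naive_z_stat(y, z, w):
    """Weighted mean difference (treated minus untreated) divided by a naive
    standard error that treats the weights as frequency weights, so each
    arm's sum of weights acts as its sample size."""
    parts = []
    for arm in (1, 0):
        ys = [yi for yi, zi in zip(y, z) if zi == arm]
        ws = [wi for wi, zi in zip(w, z) if zi == arm]
        total = sum(ws)
        mean = sum(wi * yi for wi, yi in zip(ws, ys)) / total
        var = sum(wi * (yi - mean) ** 2 for wi, yi in zip(ws, ys)) / (total - 1)
        parts.append((mean, var / total))
    (m1, se1_sq), (m0, se0_sq) = parts
    return (m1 - m0) / math.sqrt(se1_sq + se0_sq)


def simulate(reps=1000, n=400, seed=4):
    """Simulate under the null (no treatment effect) with a hypothetical
    data-generating process; return rejection rates and mean weight sums."""
    rng = random.Random(seed)
    reject = {"IPTW": 0, "SW": 0}
    weight_sum = {"IPTW": 0.0, "SW": 0.0}
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # True propensity score, assumed known here rather than estimated
        e = [1 / (1 + math.exp(-0.5 * xi)) for xi in x]
        z = [1 if rng.random() < ei else 0 for ei in e]
        # x affects both treatment and outcome; the treatment effect is zero
        y = [0.3 * xi + rng.gauss(0, 1) for xi in x]
        p = sum(z) / n
        weights = {
            "IPTW": [1 / ei if zi else 1 / (1 - ei) for zi, ei in zip(z, e)],
            "SW": [p / ei if zi else (1 - p) / (1 - ei) for zi, ei in zip(z, e)],
        }
        for name, w in weights.items():
            weight_sum[name] += sum(w) / reps
            # Two-sided test at the 5% level with the normal critical value
            if abs(naive_z_stat(y, z, w)) > 1.96:
                reject[name] += 1
    rates = {name: count / reps for name, count in reject.items()}
    return rates, weight_sum


rates, weight_sum = simulate()
for name in ("IPTW", "SW"):
    print(f"{name}: rejection rate {rates[name]:.3f}, mean sum of weights {weight_sum[name]:.0f} (n = 400)")
```

With these settings the IPTW weights should sum to roughly 2*n* and the SW weights to roughly *n*. The IPTW rejection rate should land well above 5%, and the SW rate much closer to it. In this toy setup the SW rate need not be exactly 5%, because the weights still vary across people and the true rather than estimated propensity score is used. The numbers are illustrative only.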

For more information, you can read the full article by Xu et al. (2010).