Compute variance inflation factor from uncertainty in responding sample size (with smoothing for continuous $n_h$)

Computes $\zeta_h(n_h, \bar{\phi}_h):=\mathrm{E}(r_h)\mathrm{E}\left(\frac{1}{r_h}\right)$, as defined in our paper (Mendelson & Elliott, in press); see Details for a summary. For any stratum where the allocation may lead to $\mathrm{E}(r_h) < r_h^{LB}$ for the user-specified $r_h^{LB}$ (3.5 by default), $\zeta_h(\cdot)$ is evaluated using $n_h':= \max\left(n_h, \left\lceil \frac{r_h^{LB}}{\bar{\phi}_h}\right\rceil\right)$ in place of $n_h$. Further, for continuous $n_h$, $\zeta_h(\cdot)$ is computed as a weighted average of its evaluations at $\lfloor n_h \rfloor$ and $\lfloor n_h \rfloor + 1$, as in our paper.

Usage

calc_zeta(n_h, phibar_h, rh_min = 3.5, verbose_flag = FALSE)

Arguments

n_h: (vector) strata sample sizes before nonresponse ($n_h$)
phibar_h: (vector) strata response propensities ($\bar{\phi}_h$)
rh_min: (scalar) minimum target respondents per stratum ($r_h^{LB}$); default is 3.5
verbose_flag: (bool) whether to print verbose (diagnostic) output; default is FALSE

Value

Vector of length $H$ containing $\left\{\zeta_h(n_h', \bar{\phi}_h):h=1,2,\ldots,H\right\}$, where $n_h'$ is the larger of $n_h$ and $\left\lceil \frac{r_h^{LB}}{\bar{\phi}_h}\right\rceil$

Details

In Mendelson & Elliott (in press), we assumed that the number of respondents in stratum $h$ follows a binomial distribution with zero removed from its support (i.e., a zero-truncated binomial; see dtruncbinom()), written as $r_h \sim TBinom(n_h, \bar{\phi}_h)$, where $n_h$ is the number of invitees in stratum $h$ and $\bar{\phi}_h$ is the average response propensity within stratum $h$, with unit-level response propensities assumed constant within strata. Our paper defines the function $\zeta_h(n_h, \bar{\phi}_h):=\mathrm{E}(r_h)\mathrm{E}\left(\frac{1}{r_h}\right)$. This quantity is a variance inflation term that captures the effect of variability in the number of respondents for a given allocation when computing the variance of the poststratified estimator of the finite population mean under nonresponse.
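The definition above can be checked by direct summation over the support of the zero-truncated binomial. The following is a minimal sketch in base R, not the package's implementation: `zeta_tbinom()` is a hypothetical helper name, it handles a single integer $n_h \ge 1$ only, and it applies neither the $r_h^{LB}$ adjustment nor the smoothing for continuous $n_h$.

```r
# Hypothetical helper (not calc_zeta() itself): zeta for one integer n >= 1,
# assuming r ~ zero-truncated Binomial(n, phibar).
zeta_tbinom <- function(n, phibar) {
  r <- seq_len(n)  # support of the truncated distribution: 1, ..., n
  # Binomial pmf renormalized by P(r > 0)
  p <- dbinom(r, size = n, prob = phibar) /
    (1 - dbinom(0, size = n, prob = phibar))
  sum(r * p) * sum(p / r)  # E(r) * E(1 / r)
}

zeta_tbinom(1, 0.5)   # r is always 1, so zeta is exactly 1
zeta_tbinom(2, 0.5)   # (4/3) * (5/6) = 10/9
zeta_tbinom(35, 0.1)  # a stratum with roughly 3.5 expected respondents
```

By Jensen's inequality, $\mathrm{E}\left(\frac{1}{r_h}\right) \ge \frac{1}{\mathrm{E}(r_h)}$, so the value is never below 1, which is consistent with its reading as an inflation factor.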

For discrete $n_h$, calc_zeta() computes $\zeta_h(n_h', \bar{\phi}_h)$, where $n_h':= \max\left(n_h, \left\lceil \frac{r_h^{LB}}{\bar{\phi}_h}\right\rceil\right)$ is used to avoid underallocating to strata with too few expected respondents, and $r_h^{LB}$ is a given lower bound on the expected number of respondents. By default, $r_h^{LB} = 3.5$, for two reasons: the truncated binomial distribution may sometimes be a poor approximation to the binomial distribution below this level, and we observed numerically that $\zeta_h(n_h, \bar{\phi}_h)$ is roughly maximized, for various $\bar{\phi}_h$ (fixed at levels between .01 and 1), when $n_h \approx \frac{3.5}{\bar{\phi}_h}$.
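The lower-bound adjustment described above can be illustrated on its own. This is a sketch only: `adjust_n()` is a hypothetical name that mirrors the formula for $n_h'$, and the package may organize this step differently internally.

```r
# Hypothetical helper: n_h' = max(n_h, ceiling(rh_min / phibar_h)), elementwise.
adjust_n <- function(n_h, phibar_h, rh_min = 3.5) {
  pmax(n_h, ceiling(rh_min / phibar_h))
}

# Three strata: the first and third fall short of 3.5 expected respondents
# (5 * 0.5 = 2.5 and 10 * 0.25 = 2.5), so they are raised to the bound.
adjust_n(n_h = c(5, 20, 10), phibar_h = c(0.5, 0.25, 0.25))
# 7 20 14
```

Only the sample size used to evaluate $\zeta_h$ changes here; the middle stratum, which already expects 5 respondents, is left as is.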

For continuous $n_h$, we define a smoothed version, $\zeta_h'(n_h, \bar{\phi}_h)$, as a weighted average of the evaluations of $\zeta_h$ at $\lfloor n_h \rfloor$ and $\lfloor n_h \rfloor + 1$, via $$\zeta_h'(n_h,\bar{\phi}_h) = w_h \cdot \zeta_h(\lfloor n_h \rfloor,\bar{\phi}_h) + \left(1 - w_h\right)\cdot \zeta_h(\lfloor n_h \rfloor + 1,\bar{\phi}_h),$$ where $w_h= \left(\lfloor n_h \rfloor + 1\right) - n_h$. When $n_h$ is an integer, $w_h = 1$ and $\zeta_h'$ coincides with $\zeta_h$.
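The weighted average above amounts to linear interpolation between adjacent integer sample sizes. A minimal sketch follows, assuming a hypothetical integer-$n_h$ helper `zeta_int()` based on the zero-truncated binomial and a hypothetical wrapper `zeta_smooth()`; neither name comes from the package, and the sketch assumes $n_h \ge 1$.

```r
# Hypothetical integer-n evaluation of zeta under a zero-truncated binomial.
zeta_int <- function(n, phibar) {
  r <- seq_len(n)
  p <- dbinom(r, size = n, prob = phibar) /
    (1 - dbinom(0, size = n, prob = phibar))
  sum(r * p) * sum(p / r)
}

# Hypothetical smoothed version for continuous n >= 1.
zeta_smooth <- function(n, phibar) {
  n_lo <- floor(n)
  w <- (n_lo + 1) - n  # weight on the lower integer; equals 1 when n is an integer
  w * zeta_int(n_lo, phibar) + (1 - w) * zeta_int(n_lo + 1, phibar)
}

zeta_smooth(2, 0.5)    # same as zeta_int(2, 0.5)
zeta_smooth(1.5, 0.5)  # halfway between zeta_int(1, 0.5) and zeta_int(2, 0.5)
```

Because the weights are linear in $n_h$, the smoothed function is continuous and piecewise linear, which can be convenient when $n_h$ is treated as continuous during allocation.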

References

Mendelson, J., & Elliott, M. R. (in press). Journal of Survey Statistics and Methodology.

Examples