Compute variance inflation factor from uncertainty in responding sample size (with smoothing for continuous \(n_h\))
Source: R/allocation.r
calc_zeta.Rd
Computes \(\zeta_h(n_h, \bar{\phi}_h):=\mathrm{E}(r_h)\mathrm{E}\left(\frac{1}{r_h}\right)\), as defined in our paper (Mendelson & Elliott, in press); see Details for a summary. If the allocation would lead to \(\mathrm{E}(r_h) < r_h^{LB}\) in any stratum, for a user-specified \(r_h^{LB}\) (3.5 by default), then \(\zeta_h(\cdot)\) is evaluated for that stratum using \(n_h':= \max\left(n_h, \left\lceil \frac{r_h^{LB}}{\bar{\phi}_h}\right\rceil\right)\) in place of \(n_h\). Further, for continuous (non-integer) \(n_h\), \(\zeta_h(\cdot)\) is computed as a weighted average of its evaluations at \(\lfloor n_h \rfloor\) and \(\lfloor n_h \rfloor + 1\), as in our paper.
Arguments
- n_h
(vector) strata sample sizes before nonresponse (\(n_h\))
- phibar_h
(vector) strata response propensities (\(\bar{\phi}_h\))
- rh_min
(scalar) minimum target respondents per stratum (\(r_h^{LB}\)); default is 3.5
- verbose_flag
(bool) if TRUE, produce verbose (noisy) output
Value
vector of length \(H\) containing \(\left\{\zeta_h(n_h', \bar{\phi}_h):h=1,2,...,H\right\}\), where \(n_h'\) is the larger of \(n_h\) and \(\left\lceil\frac{r_h^{LB}}{\bar{\phi}_h}\right\rceil\)
Details
In Mendelson & Elliott (in press), we assumed that the number of respondents in stratum \(h\)
can be modeled as a binomial random variable with zero removed from its support
(i.e., zero-truncated binomial; see dtruncbinom()),
written as \(r_h \sim TBinom(n_h, \bar{\phi}_h)\), where
\(n_h\) is the number of invitees in stratum \(h\),
\(\bar{\phi}_h\) is the average response propensity within stratum \(h\),
and the unit-level response propensities are assumed constant within
strata.
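The zero-truncated binomial model above can be illustrated with a minimal base-R sketch. The helper name `tbinom_pmf` is hypothetical and is not the package's dtruncbinom(), whose signature is not shown here; it simply renormalizes the ordinary binomial pmf after removing the mass at zero.

```r
# Hypothetical helper (an assumption for illustration, not the package's
# dtruncbinom()): pmf of a zero-truncated binomial, P(r = k | r >= 1).
tbinom_pmf <- function(k, n, phi) {
  p0 <- dbinom(0, size = n, prob = phi)  # binomial mass at zero, to be removed
  ifelse(k >= 1, dbinom(k, size = n, prob = phi) / (1 - p0), 0)
}

# The truncated pmf places no mass at zero and sums to 1 over 1, ..., n.
probs <- tbinom_pmf(1:10, n = 10, phi = 0.3)
sum(probs)
```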
Our paper defines the function
\(\zeta_h(n_h, \bar{\phi}_h):=\mathrm{E}(r_h)\mathrm{E}\left(\frac{1}{r_h}\right)\).
This quantity is a variance inflation term that captures the effect of
variability in the number of respondents for a given allocation
(when computing the variance of the poststratified estimator under nonresponse
for the finite population mean).
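As a rough illustration of this definition, the following sketch evaluates \(\mathrm{E}(r_h)\mathrm{E}\left(\frac{1}{r_h}\right)\) by direct summation over the zero-truncated binomial support. The function name `zeta_sketch` is hypothetical and assumes a positive integer sample size; it is not the implementation used by calc_zeta().

```r
# Hypothetical sketch (not the package implementation): evaluate
# zeta = E(r) * E(1/r) for r ~ zero-truncated Binomial(n, phi),
# assuming n is a positive integer and 0 < phi <= 1.
zeta_sketch <- function(n, phi) {
  k  <- seq_len(n)                                      # support 1, ..., n
  pk <- dbinom(k, size = n, prob = phi) /
    (1 - dbinom(0, size = n, prob = phi))               # truncated pmf
  sum(k * pk) * sum(pk / k)                             # E(r) * E(1/r)
}

zeta_sketch(20, 0.3)  # inflation factor for 20 invitees, propensity 0.3
zeta_sketch(5, 1)     # everyone responds, so r is constant
```

By Jensen's inequality the product is never below 1, and it equals 1 when \(r_h\) is constant (for example, when \(\bar{\phi}_h = 1\)).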
For discrete \(n_h\), the current function (calc_zeta())
computes \(\zeta_h(n_h', \bar{\phi}_h)\), where we use
\(n_h':= \max\left(n_h, \left\lceil \frac{r_h^{LB}}{\bar{\phi}_h}\right\rceil\right)\)
to avoid underallocating to strata with too few expected respondents, and
where \(r_h^{LB}\) is some given lower bound on the number
of expected respondents.
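The adjustment from \(n_h\) to \(n_h'\) is an elementwise maximum, which might be sketched as follows. The input values are made up for illustration; the variable names mirror the documented arguments, but this is not the package's internal code.

```r
# Illustrative inputs (assumed values, not from the paper).
n_h      <- c(5, 40, 200)         # stratum sample sizes before nonresponse
phibar_h <- c(0.125, 0.25, 0.5)   # stratum response propensities
rh_min   <- 3.5                   # lower bound on expected respondents

# n_h' = max(n_h, ceiling(rh_min / phibar_h)), applied per stratum.
n_h_prime <- pmax(n_h, ceiling(rh_min / phibar_h))
n_h_prime
```

Here only the first stratum is raised (from 5 to 28 invitees), because \(5 \times 0.125\) falls below the 3.5 expected-respondent bound; the other two strata are left unchanged.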
By default, we set \(r_h^{LB} = 3.5\) for two reasons. First, the truncated binomial
distribution may sometimes be a poor approximation
for the binomial distribution when the expected number of respondents is below this level.
Second, we observed numerically that \(\zeta_h(n_h, \bar{\phi}_h)\)
is roughly maximized, for various \(\bar{\phi}_h\) (fixed at levels
between .01 and 1), when \(n_h \approx \frac{3.5}{\bar{\phi}_h}\).
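One way to explore this numerical observation is to scan \(\zeta_h\) over a grid of integer sample sizes for a fixed propensity and locate the maximizer. The sketch below assumes the zero-truncated binomial model described above; `zeta_sketch`, the grid, and the chosen propensity are all hypothetical choices for illustration.

```r
# Hypothetical helper: zeta = E(r) * E(1/r) under a zero-truncated
# Binomial(n, phi), for positive integer n.
zeta_sketch <- function(n, phi) {
  k  <- seq_len(n)
  pk <- dbinom(k, size = n, prob = phi) /
    (1 - dbinom(0, size = n, prob = phi))
  sum(k * pk) * sum(pk / k)
}

phi       <- 0.1     # assumed propensity for this illustration
n_grid    <- 1:200   # assumed search grid
zeta_vals <- vapply(n_grid, zeta_sketch, numeric(1), phi = phi)

n_star <- n_grid[which.max(zeta_vals)]  # grid point with the largest zeta
n_star * phi                            # compare against the 3.5 default
```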
For continuous \(n_h\), we define a smoothed version, \(\zeta_h'(n_h, \bar{\phi}_h)\), as a weighted average of the evaluations of \(\zeta_h\) at \(\lfloor n_h \rfloor\) and \(\lfloor n_h \rfloor + 1\), via $$\zeta_h'(n_h,\bar{\phi}_h) = w_h \cdot \zeta_h(\lfloor n_h \rfloor,\bar{\phi}_h) + \left(1 - w_h\right)\cdot \zeta_h(\lfloor n_h \rfloor + 1,\bar{\phi}_h),$$ where \(w_h= \left(\lfloor n_h \rfloor + 1\right) - n_h\). When \(n_h\) is an integer, \(w_h = 1\) and \(\zeta_h'\) coincides with \(\zeta_h\).
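The interpolation for continuous \(n_h\) can be sketched in a few lines of base R. Both function names are hypothetical, and the sketch assumes \(\lfloor n_h \rfloor \ge 1\) and the zero-truncated binomial model described above; it is not the package's implementation.

```r
# Hypothetical helper: zeta at integer n under a zero-truncated binomial.
zeta_int <- function(n, phi) {
  k  <- seq_len(n)
  pk <- dbinom(k, size = n, prob = phi) /
    (1 - dbinom(0, size = n, prob = phi))
  sum(k * pk) * sum(pk / k)
}

# Hypothetical helper: linear interpolation between floor(n) and floor(n) + 1,
# with weight w = (floor(n) + 1) - n on the lower evaluation.
zeta_smooth <- function(n, phi) {
  lo <- floor(n)
  w  <- (lo + 1) - n
  w * zeta_int(lo, phi) + (1 - w) * zeta_int(lo + 1, phi)
}

zeta_smooth(20.25, 0.3)  # weight 0.75 on n = 20 and 0.25 on n = 21
zeta_smooth(20, 0.3)     # integer input: reduces to zeta_int(20, 0.3)
```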