Calculate optimal allocation under anticipated nonresponse

Computes the optimal stratified sampling allocation under anticipated nonresponse, as proposed in Mendelson & Elliott (in press), which is $$n_h \propto \frac{N_h S_h \sqrt{\zeta_h(n_h, \bar{\phi}_h)}} {\sqrt{\bar{\phi}_h c_h }},$$ where $N_h$ is the stratum $h$ population size, $S_h$ is the stratum $h$ unit standard deviation, $\bar{\phi}_h$ is the stratum $h$ average response propensity, $\zeta_h(.)$ is a variance inflation term that captures variability in the number of respondents (see calc_zeta()), and $c_h = \textrm{E}(C_h) / n_h = c_{NR_h} \left(\bar{\phi}_h (\tau_h - 1) + 1\right)$ is the expected cost per invitee in stratum $h$. The cost structure assumes that respondents and nonrespondents in stratum $h$ have per-unit costs of $c_{R_h}$ and $c_{NR_h}$, respectively, with a ratio of $\tau_h = c_{R_h}/c_{NR_h}$.
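The cost identity and the shape of the allocation can be checked with a short Lagrange-multiplier argument. This sketch assumes something not stated above: that the variance of the estimated total is, up to terms that do not depend on $n_h$, $\sum_h N_h^2 S_h^2 \zeta_h / (n_h \bar{\phi}_h)$. It also holds $\zeta_h$ fixed while differentiating. Here $C$ denotes the total expected cost (cost_total).

$$
\begin{aligned}
c_h &= \frac{\textrm{E}(C_h)}{n_h} = \bar{\phi}_h c_{R_h} + (1 - \bar{\phi}_h) c_{NR_h} = c_{NR_h}\left(\bar{\phi}_h (\tau_h - 1) + 1\right), \\
\mathcal{L} &= \sum_h \frac{N_h^2 S_h^2 \zeta_h}{n_h \bar{\phi}_h} + \lambda \left( \sum_h c_h n_h - C \right), \\
\frac{\partial \mathcal{L}}{\partial n_h} &= -\frac{N_h^2 S_h^2 \zeta_h}{n_h^2 \bar{\phi}_h} + \lambda c_h = 0
\quad \Longrightarrow \quad
n_h = \frac{1}{\sqrt{\lambda}} \, \frac{N_h S_h \sqrt{\zeta_h}}{\sqrt{\bar{\phi}_h c_h}}.
\end{aligned}
$$

The constant $1/\sqrt{\lambda}$ is fixed by the budget constraint. Because $\zeta_h$ itself depends on $n_h$, the proportionality holds only once $\zeta_h$ is evaluated at the solution. This is why the exact allocation is computed iteratively.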

opt_nh_nonresp() computes the exact allocation in an iterative fashion (see Details section); individual iterations are computed using opt_nh_nonresp_oneiter(), which conditions on some given $\zeta_h(.)$.

Users must specify either the total sample size (n_total) or the total expected cost (cost_total), but not both. Under the cost specification, users must also specify the unit costs of nonrespondents (c_NR_h) and the ratio of unit costs for respondents to that of nonrespondents (tau_h). These arguments are vectors with one element per stratum, but can be supplied as scalars if they are the same across strata.

Optionally, users can also specify the stratum unit standard deviations (S_h); by default, these are assumed constant across strata.
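A minimal sketch of a single allocation step may help illustrate how these inputs interact. It holds $\zeta_h$ fixed, recycles scalar inputs to one value per stratum, and rescales the proportional weights to meet either n_total or cost_total. The function alloc_oneiter_sketch() is hypothetical and is not the package's opt_nh_nonresp_oneiter(). Its defaults of c_NR_h = 1 and tau_h = 1 are also assumptions made for this sketch only.

```r
# Hypothetical sketch, not the package's opt_nh_nonresp_oneiter():
# one allocation step with zeta_h held fixed.
alloc_oneiter_sketch <- function(N_h, phibar_h, S_h = 1, zeta_h = 1,
                                 n_total = NULL, cost_total = NULL,
                                 c_NR_h = 1, tau_h = 1) {
  # Exactly one of n_total and cost_total must be supplied
  if (is.null(n_total) == is.null(cost_total)) {
    stop("Specify exactly one of n_total or cost_total.")
  }
  H <- length(N_h)
  # Recycle scalar inputs to one value per stratum
  S_h    <- rep_len(S_h, H)
  zeta_h <- rep_len(zeta_h, H)
  c_NR_h <- rep_len(c_NR_h, H)   # assumption: defaults to 1 in this sketch
  tau_h  <- rep_len(tau_h, H)    # assumption: defaults to 1 in this sketch
  # Expected cost per invitee: c_h = c_NR_h * (phibar_h * (tau_h - 1) + 1)
  c_h <- c_NR_h * (phibar_h * (tau_h - 1) + 1)
  # Unscaled weights proportional to the optimal n_h
  w_h <- N_h * S_h * sqrt(zeta_h) / sqrt(phibar_h * c_h)
  if (!is.null(n_total)) {
    n_total * w_h / sum(w_h)           # scaled so that sum(n_h) == n_total
  } else {
    cost_total * w_h / sum(c_h * w_h)  # scaled so that sum(c_h * n_h) == cost_total
  }
}

# Usage with made-up inputs for three strata
n_fixed_size <- alloc_oneiter_sketch(N_h = c(1000, 500, 200),
                                     phibar_h = c(0.2, 0.4, 0.5),
                                     n_total = 300)
n_fixed_cost <- alloc_oneiter_sketch(N_h = c(1000, 500, 200),
                                     phibar_h = c(0.2, 0.4, 0.5),
                                     cost_total = 5000, c_NR_h = 10, tau_h = 3)
```

Under the cost specification, strata with higher expected cost per invitee receive proportionally fewer invitations, through the $1/\sqrt{c_h}$ factor.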

opt_nh_nonresp_oneiter() computes one iteration of the proposed allocation for some user-supplied $\zeta_h(.)$.

Usage

opt_nh_nonresp(
  N_h,
  phibar_h,
  ...,
  tol = 1e-08,
  max_iter = 20,
  verbose_flag = FALSE
)

opt_nh_nonresp_oneiter(
  N_h,
  phibar_h,
  S_h = NULL,
  n_total = NULL,
  cost_total = NULL,
  zeta_h = NULL,
  c_NR_h = NULL,
  tau_h = NULL,
  strict_flag = TRUE,
  verbose_flag = FALSE
)

Arguments

N_h

(vector) strata population counts ($N_h$)

phibar_h

(vector) stratum average response propensities ($\bar{\phi}_h$)

...

Arguments passed on to opt_nh_nonresp_oneiter

tol

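(scalar) convergence tolerance; iteration stops once the largest change in any stratum allocation between successive iterations falls below tol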

max_iter

(scalar) maximum number of iterations (>=2)

verbose_flag

(boolean) whether to provide detailed results

S_h

(vector) stratum population standard deviations ($S_h$); assumed constant across strata by default

n_total

(scalar) total sample size to allocate

cost_total

(scalar) total expected costs to allocate

zeta_h

(vector; for use with opt_nh_nonresp_oneiter() only; optional) adjustment factor reflecting the variance inflation caused by randomness in the number of respondents (default = 1)

c_NR_h

(vector; use with cost_total) per-unit costs for nonrespondents in stratum h ($c_{NR_h}$)

tau_h

(vector; use with cost_total) ratio of costs for respondents to costs for nonrespondents in stratum h ($\tau_h$)

strict_flag

(boolean) whether to throw an error (versus a warning) if any $n_h > N_h$
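The error-versus-warning behaviour controlled by strict_flag can be sketched as a small bounds check. The helper check_nh_bounds() is hypothetical. The package's own check may word its messages differently or handle offending strata in another way.

```r
# Hypothetical helper illustrating strict_flag; not the package's code
check_nh_bounds <- function(n_h, N_h, strict_flag = TRUE) {
  over <- which(n_h > N_h)  # strata allocated more units than exist
  if (length(over) > 0) {
    msg <- paste("n_h > N_h in stratum/strata:", paste(over, collapse = ", "))
    if (strict_flag) stop(msg) else warning(msg)
  }
  invisible(over)           # indices of offending strata (empty if none)
}

check_nh_bounds(c(5, 10), c(10, 20))                        # passes silently
check_nh_bounds(c(15, 10), c(10, 20), strict_flag = FALSE)  # warns about stratum 1
```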

Value

opt_nh_nonresp() returns the sample allocation vector n_h with the following attributes:

num_iter (scalar) number of iterations used;
zeta_h_prev (vector) final values of zeta_h used (i.e., from 2nd-to-last iteration); and
max_nh_delta (scalar) largest absolute change in any stratum allocation from the previous iteration.

opt_nh_nonresp_oneiter() returns the sample allocation vector n_h, computed in a single iteration from the user-supplied zeta_h.

Details

opt_nh_nonresp() computes the optimal allocation iteratively, as follows:

Iteration $k=1$ calculates $n_h^1$ under the assumption that $\zeta_h(n_h, \bar{\phi}_h)=1$.
Each subsequent iteration $k$, for $k = 2, 3, \ldots$, does the following:
- Compute $\zeta_h(n_h^{k-1}, \bar{\phi}_h)$ via calc_zeta().
- Compute $n_h^k$ under the assumption that $\zeta_h(n_h, \bar{\phi}_h) = \zeta_h(n_h^{k-1}, \bar{\phi}_h)$.
- Accept $n_h^k$ if $\max_h |n_h^k - n_h^{k-1}|$ is below the tolerance (tol) or if the maximum number of iterations (max_iter) has been reached. Otherwise, continue to iteration $k+1$.
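A minimal sketch of this fixed-point loop follows. It is restricted to the n_total specification with tau_h = 1, so $c_h$ is constant and cancels. The names iterate_alloc_sketch() and toy_zeta() are hypothetical. toy_zeta() is only a stand-in for calc_zeta(), whose formula is not given here. It uses the second-order approximation $1 + (1 - \bar{\phi}_h)/(n_h \bar{\phi}_h)$ to $\textrm{E}(n_h \bar{\phi}_h / r_h)$ for a binomial number of respondents $r_h$, ignoring the chance that $r_h = 0$. The returned attributes mirror those listed under Value.

```r
# Toy stand-in for calc_zeta() (NOT the package's formula): second-order
# approximation to E[n_h * phibar_h / r_h] with r_h ~ Binomial(n_h, phibar_h)
toy_zeta <- function(n_h, phibar_h) 1 + (1 - phibar_h) / (n_h * phibar_h)

# Hypothetical sketch of the iteration; assumes n_total is given and tau_h = 1
iterate_alloc_sketch <- function(N_h, phibar_h, n_total, S_h = 1,
                                 zeta_fn = toy_zeta, tol = 1e-8, max_iter = 20) {
  # One allocation step for fixed zeta_h (constant c_h cancels out)
  one_step <- function(zeta_h) {
    w_h <- N_h * S_h * sqrt(zeta_h) / sqrt(phibar_h)
    n_total * w_h / sum(w_h)
  }
  zeta_h <- rep(1, length(N_h))        # iteration k = 1 assumes zeta_h = 1
  n_h <- one_step(zeta_h)
  k <- 1
  delta <- Inf
  while (k < max_iter) {
    k <- k + 1
    zeta_h <- zeta_fn(n_h, phibar_h)   # zeta_h evaluated at n_h^{k-1}
    n_new <- one_step(zeta_h)
    delta <- max(abs(n_new - n_h))     # largest change in any stratum
    n_h <- n_new
    if (delta < tol) break
  }
  structure(n_h, num_iter = k, zeta_h_prev = zeta_h, max_nh_delta = delta)
}

n_h <- iterate_alloc_sketch(N_h = c(1000, 500, 200),
                            phibar_h = c(0.2, 0.4, 0.5), n_total = 300)
attributes(n_h)
```

With these made-up inputs, the values of toy_zeta() stay close to 1. Each pass therefore changes the allocation only slightly, and the loop typically stops after a handful of iterations.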

Functions

opt_nh_nonresp(): computes the exact (i.e., iterative) version of the proposed allocation.
opt_nh_nonresp_oneiter(): computes a single iteration of the proposed allocation for user-supplied zeta_h (used in place of the $\zeta_h(.)$ term).

References

Mendelson, J., & Elliott, M. R. (in press). Journal of Survey Statistics and Methodology.

Examples

#Compute the exact optimal allocation for the PEVS-ADM 2016 data set with n = 50,000
#tau_h = 1 (equal unit costs for respondents and nonrespondents) is assumed by default
pevs_optE_alloc_50k <- opt_nh_nonresp(N_h = pevs_adm_2016_rrs$Nhat_h,
                                      phibar_h = pevs_adm_2016_rrs$rr_h,
                                      n_total = 50000)

#Add the allocation and the final zeta_h values as columns of the input data frame
(pevs_adm_alloc_50k_merged <- pevs_adm_2016_rrs %>%
  dplyr::mutate(n_h_optE = c(pevs_optE_alloc_50k),
                zeta_h_optE = attr(pevs_optE_alloc_50k,"zeta_h_prev")))
#> # A tibble: 91 × 10
#>    h     service paygrade age   region  sex   Nhat_h   rr_h n_h_optE zeta_h_optE
#>    <chr> <fct>   <fct>    <fct> <fct>   <fct>  <dbl>  <dbl>    <dbl>       <dbl>
#>  1 1     Army    E1-E5    18-24 US      M     1.09e5 0.0189    8267.        1.01
#>  2 2     Army    E1-E5    18-24 US      F     1.89e4 0.0364    1041.        1.03
#>  3 3     Army    E1-E5    18-24 Overse… M     1.85e4 0.0214    1333.        1.04
#>  4 4     Army    E1-E5    18-24 Overse… F     3.31e3 0.0431     179.        1.17
#>  5 5     Army    E1-E5    25-29 US      M     4.83e4 0.0487    2280.        1.01
#>  6 6     Army    E1-E5    25-29 US      F     8.68e3 0.0619     370.        1.05
#>  7 7     Army    E1-E5    25-29 Overse… M/F   1.08e4 0.0566     477.        1.04
#>  8 8     Army    E1-E5    30-34 US      M/F   2.14e4 0.0891     746.        1.01
#>  9 9     Army    E1-E5    30-34 Overse… M/F   3.74e3 0.0904     135.        1.09
#> 10 10    Army    E1-E5    35+   US      M/F   9.06e3 0.188      219.        1.02
#> # ℹ 81 more rows



#For comparison, compute the approximate (single-iteration, zeta_h = 1) version of the proposed allocation
pevs_optA_alloc_50k <- opt_nh_nonresp_oneiter(N_h = pevs_adm_2016_rrs$Nhat_h,
                                              phibar_h = pevs_adm_2016_rrs$rr_h,
                                              n_total = 50000)

#Add results to the previous tibble and move zeta_h_optE after n_h_optA so the two allocations are adjacent
pevs_adm_alloc_50k_merged %>%
  dplyr::mutate(n_h_optA = c(pevs_optA_alloc_50k)) %>%
  dplyr::relocate(zeta_h_optE, .after = "n_h_optA")
#> # A tibble: 91 × 11
#>    h     service paygrade age   region   sex    Nhat_h   rr_h n_h_optE n_h_optA
#>    <chr> <fct>   <fct>    <fct> <fct>    <fct>   <dbl>  <dbl>    <dbl>    <dbl>
#>  1 1     Army    E1-E5    18-24 US       M     109214. 0.0189    8267.    8309.
#>  2 2     Army    E1-E5    18-24 US       F      18921. 0.0364    1041.    1036.
#>  3 3     Army    E1-E5    18-24 Overseas M      18465. 0.0214    1333.    1320.
#>  4 4     Army    E1-E5    18-24 Overseas F       3312. 0.0431     179.     167.
#>  5 5     Army    E1-E5    25-29 US       M      48348. 0.0487    2280.    2289.
#>  6 6     Army    E1-E5    25-29 US       F       8680  0.0619     370.     364.
#>  7 7     Army    E1-E5    25-29 Overseas M/F    10762. 0.0566     477.     473.
#>  8 8     Army    E1-E5    30-34 US       M/F    21352. 0.0891     746.     747.
#>  9 9     Army    E1-E5    30-34 Overseas M/F     3739. 0.0904     135.     130.
#> 10 10    Army    E1-E5    35+   US       M/F     9055. 0.188      219.     218.
#> # ℹ 81 more rows
#> # ℹ 1 more variable: zeta_h_optE <dbl>