
Computes the optimal stratified sampling allocation under anticipated nonresponse, as proposed in Mendelson & Elliott (in press), which is $$n_h \propto \frac{N_h S_h \sqrt{\zeta_h(n_h, \bar{\phi}_h)}} {\sqrt{\bar{\phi}_h c_h }},$$ where \(N_h\) is the stratum \(h\) population size, \(S_h\) is the stratum \(h\) unit standard deviation, \(\bar{\phi}_h\) is the stratum \(h\) average response propensity, \(\zeta_h(.)\) is a variance inflation term that captures variability in the number of respondents (see calc_zeta()), and \(c_h = \textrm{E}(C_h) / n_h = c_{NR_h} \left(\bar{\phi}_h (\tau_h - 1) + 1\right)\) is the expected cost per invitee in stratum \(h\). The cost structure assumes that respondents and nonrespondents in stratum \(h\) have per-unit costs of \(c_{R_h}\) and \(c_{NR_h}\), respectively, with a ratio of \(\tau_h = c_{R_h}/c_{NR_h}\).
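As a rough illustration of the formula above, the following base-R sketch computes the expected cost per invitee and the resulting allocation for a fixed total sample size, holding \(\zeta_h\) at 1. It does not call the package; the three-stratum inputs are made-up values chosen only for illustration.

```r
# Hypothetical three-stratum inputs (illustrative values, not from the package)
N_h      <- c(5000, 3000, 2000)   # stratum population sizes
S_h      <- c(1, 1, 1)            # stratum unit standard deviations
phibar_h <- c(0.10, 0.25, 0.50)   # average response propensities
c_NR_h   <- c(1, 1, 1)            # per-unit cost of a nonrespondent
tau_h    <- c(2, 2, 2)            # respondent-to-nonrespondent cost ratio
zeta_h   <- c(1, 1, 1)            # variance inflation term, held at 1 here

# Expected cost per invitee: c_h = c_NR_h * (phibar_h * (tau_h - 1) + 1)
c_h <- c_NR_h * (phibar_h * (tau_h - 1) + 1)

# Unnormalized weights from the proportionality n_h ~ N_h S_h sqrt(zeta_h) / sqrt(phibar_h c_h)
w_h <- N_h * S_h * sqrt(zeta_h) / sqrt(phibar_h * c_h)

# Scale the weights so that the allocation sums to a fixed total sample size
n_total <- 1000
n_h <- n_total * w_h / sum(w_h)
round(n_h, 1)
```

Strata with lower response propensity receive a larger share of invitations relative to their population size, which is the behavior the \(1/\sqrt{\bar{\phi}_h c_h}\) factor encodes.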

opt_nh_nonresp() computes the exact allocation in an iterative fashion (see Details section); individual iterations are computed using opt_nh_nonresp_oneiter(), which conditions on some given \(\zeta_h(.)\).

Users must specify either the total sample size (n_total) or the total expected cost (cost_total), but not both. Under the cost specification, users must also specify the unit costs of nonrespondents (c_NR_h) and the ratio of unit costs for respondents to that of nonrespondents (tau_h). These arguments are vectors with one element per stratum, but can be supplied as scalars if the value is the same across strata.
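The cost specification can be sketched in base R as follows. This is an illustration rather than the package's implementation: it assumes the budget constraint is \(\sum_h n_h c_h\) = cost_total (which follows from \(c_h = \textrm{E}(C_h) / n_h\)), holds \(\zeta_h\) at 1, and uses made-up inputs. The scalar cost inputs are expanded to one value per stratum with rep_len().

```r
# Hypothetical inputs (illustrative values only)
N_h      <- c(5000, 3000, 2000)
phibar_h <- c(0.10, 0.25, 0.50)
H        <- length(N_h)

# Scalars expanded to one value per stratum
c_NR_h <- rep_len(1, H)   # nonrespondent unit cost
tau_h  <- rep_len(2, H)   # respondent-to-nonrespondent cost ratio
S_h    <- rep_len(1, H)   # constant unit standard deviation

# Expected cost per invitee and unnormalized allocation weights (zeta_h = 1)
c_h <- c_NR_h * (phibar_h * (tau_h - 1) + 1)
w_h <- N_h * S_h / sqrt(phibar_h * c_h)

# Scale so that total expected cost, sum(n_h * c_h), matches the budget
cost_total <- 2000
n_h <- cost_total * w_h / sum(w_h * c_h)
sum(n_h * c_h)   # total expected cost implied by the allocation
```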

Optionally, users can also specify the stratum unit standard deviations (S_h), which are assumed constant across strata by default.

opt_nh_nonresp_oneiter() computes one iteration of the proposed allocation for some user-supplied \(\zeta_h(.)\).

Usage

opt_nh_nonresp(
  N_h,
  phibar_h,
  ...,
  tol = 1e-08,
  max_iter = 20,
  verbose_flag = FALSE
)

opt_nh_nonresp_oneiter(
  N_h,
  phibar_h,
  S_h = NULL,
  n_total = NULL,
  cost_total = NULL,
  zeta_h = NULL,
  c_NR_h = NULL,
  tau_h = NULL,
  strict_flag = TRUE,
  verbose_flag = FALSE
)

Arguments

N_h

(vector) strata population counts (\(N_h\))

phibar_h

(vector) strata response propensities (\(\bar{\phi}_h\))

...

Arguments passed on to opt_nh_nonresp_oneiter

tol

(scalar) convergence tolerance; iteration stops once the largest change in any \(n_h\) between successive iterations falls below tol

max_iter

(scalar) maximum number of iterations (>=2)

verbose_flag

(boolean) whether to provide detailed results

S_h

(vector) strata population standard deviations (\(S_h\)); constant, by default

n_total

(scalar) total sample size to allocate

cost_total

(scalar) total expected costs to allocate

zeta_h

(vector; use with opt_nh_nonresp_oneiter, only; optional) adjustment factor to reflect inflation in variances from randomness in the number of respondents (default = 1)

c_NR_h

(vector; use with cost_total) per-unit costs for nonrespondents in stratum h (\(c_{NR_h}\))

tau_h

(vector; use with cost_total) ratio of costs for respondents to costs for nonrespondents in stratum h (\(\tau_h\))

strict_flag

(boolean) whether to throw error (versus warning) if any \(n_h > N_h\)
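The error-versus-warning behavior controlled by strict_flag can be pictured with a small base-R sketch. The helper check_nh() below is hypothetical and written only for illustration; the package's actual check and message text may differ.

```r
# Hypothetical helper (not part of the package): flags strata where the
# allocation exceeds the stratum population count
check_nh <- function(n_h, N_h, strict_flag = TRUE) {
  over <- which(n_h > N_h)
  if (length(over) > 0) {
    msg <- paste0("n_h exceeds N_h in strata: ", paste(over, collapse = ", "))
    # strict_flag = TRUE raises an error; FALSE downgrades it to a warning
    if (strict_flag) stop(msg) else warning(msg)
  }
  invisible(n_h)
}

# An allocation that fits within every stratum passes silently
check_nh(n_h = c(10, 30), N_h = c(100, 40))
```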

Value

opt_nh_nonresp() returns sample allocation vector n_h with the following attributes:

  • num_iter (scalar) number of iterations used;

  • zeta_h_prev (vector) values of zeta_h used to compute the final allocation (i.e., those derived from the second-to-last iteration); and

  • max_nh_delta (scalar) largest change in any stratum's allocation between the final two iterations.

opt_nh_nonresp_oneiter() returns sample allocation vector n_h, computed from a single iteration given the user-supplied zeta_h.
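Because the iterative result carries attributes, it can help to see how they are read and dropped in base R. The object below is a mock built with structure() using made-up numbers; it only mimics the attribute layout described above and is not actual package output.

```r
# Mock return value (made-up numbers) with the attribute names listed above
n_h_mock <- structure(c(640, 245, 115),
                      num_iter = 5,
                      zeta_h_prev = c(1.014, 1.012, 1.009),
                      max_nh_delta = 2e-9)

# Read individual attributes with attr()
attr(n_h_mock, "num_iter")
attr(n_h_mock, "zeta_h_prev")

# c() drops the attributes, leaving a plain numeric vector that is
# convenient to store as a data frame column
n_h_plain <- c(n_h_mock)
attributes(n_h_plain)
```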

Details

opt_nh_nonresp() computes the optimal allocation iteratively, as follows:

  1. Iteration \(k=1\) calculates \(n_h^1\), under the assumption that \(\zeta_h(n_h, \bar{\phi}_h)=1\).

  2. Each subsequent iteration \(k\), for \(k = 2, 3, \ldots\), does the following:

    • Compute \(\zeta_h(n_h^{k-1}, \bar{\phi}_h)\) via calc_zeta().

    • Compute \(n_h^k\) under the assumption that \(\zeta_h(n_h, \bar{\phi}_h) = \zeta_h(n_h^{k-1}, \bar{\phi}_h)\).

    • Accept the solution if the largest component of \(n_h^k - n_h^{k-1}\) has magnitude below some tolerance (tol) or if the maximum number of iterations (max_iter) has been reached. Otherwise, continue.
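The steps above can be sketched as a fixed-point loop in base R. This is a simplified illustration under stated assumptions, not the package's code: zeta_stub() is a hypothetical stand-in for calc_zeta() (whose definition is not given here), S_h and the per-invitee cost are taken as constant across strata, and the allocation targets a fixed n_total.

```r
# Hypothetical stand-in for calc_zeta(); NOT the package's definition
zeta_stub <- function(n_h, phibar_h) 1 + (1 - phibar_h) / (n_h * phibar_h)

# One iteration: allocate n_total for a given zeta_h
# (constant S_h and constant cost per invitee, so both cancel on normalization)
alloc_once <- function(N_h, phibar_h, n_total, zeta_h) {
  w_h <- N_h * sqrt(zeta_h) / sqrt(phibar_h)
  n_total * w_h / sum(w_h)
}

alloc_iter <- function(N_h, phibar_h, n_total, tol = 1e-8, max_iter = 20) {
  # Iteration k = 1: assume zeta_h = 1
  n_prev <- alloc_once(N_h, phibar_h, n_total, zeta_h = 1)
  for (k in 2:max_iter) {
    # Update zeta_h from the previous allocation, then re-allocate
    zeta_prev <- zeta_stub(n_prev, phibar_h)
    n_new <- alloc_once(N_h, phibar_h, n_total, zeta_prev)
    delta <- max(abs(n_new - n_prev))
    # Stop once the largest change falls below tol (or max_iter is reached)
    if (delta < tol) break
    n_prev <- n_new
  }
  structure(n_new, num_iter = k, zeta_h_prev = zeta_prev, max_nh_delta = delta)
}

# Made-up inputs for illustration
res <- alloc_iter(N_h = c(5000, 3000, 2000),
                  phibar_h = c(0.10, 0.25, 0.50),
                  n_total = 1000)
attr(res, "num_iter")
```

Because \(\zeta_h\) changes only slightly with \(n_h\) when the expected number of respondents per stratum is not small, a loop of this kind typically settles within a few iterations.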

Functions

  • opt_nh_nonresp(): computes the exact (i.e., iterative) version of the proposed allocation.

  • opt_nh_nonresp_oneiter(): computes a single iteration of the proposed allocation for user-supplied zeta_h (used in place of the \(\zeta_h(.)\) term).

References

Mendelson, J., & Elliott, M. R. (in press). Journal of Survey Statistics and Methodology.

Examples

#Compute exact optimum allocation for PEVS-ADM 2016 data set for n = 50k
#Assumes tau = 1 by default
pevs_optE_alloc_50k <- opt_nh_nonresp(N_h = pevs_adm_2016_rrs$Nhat_h,
                                      phibar_h = pevs_adm_2016_rrs$rr_h,
                                      n_total = 50000)

#Merge results into data frame
(pevs_adm_alloc_50k_merged <- pevs_adm_2016_rrs %>%
  dplyr::mutate(n_h_optE = c(pevs_optE_alloc_50k),
                zeta_h_optE = attr(pevs_optE_alloc_50k,"zeta_h_prev")))
#> # A tibble: 91 × 10
#>    h     service paygrade age   region  sex   Nhat_h   rr_h n_h_optE zeta_h_optE
#>    <chr> <fct>   <fct>    <fct> <fct>   <fct>  <dbl>  <dbl>    <dbl>       <dbl>
#>  1 1     Army    E1-E5    18-24 US      M     1.09e5 0.0189    8267.        1.01
#>  2 2     Army    E1-E5    18-24 US      F     1.89e4 0.0364    1041.        1.03
#>  3 3     Army    E1-E5    18-24 Overse… M     1.85e4 0.0214    1333.        1.04
#>  4 4     Army    E1-E5    18-24 Overse… F     3.31e3 0.0431     179.        1.17
#>  5 5     Army    E1-E5    25-29 US      M     4.83e4 0.0487    2280.        1.01
#>  6 6     Army    E1-E5    25-29 US      F     8.68e3 0.0619     370.        1.05
#>  7 7     Army    E1-E5    25-29 Overse… M/F   1.08e4 0.0566     477.        1.04
#>  8 8     Army    E1-E5    30-34 US      M/F   2.14e4 0.0891     746.        1.01
#>  9 9     Army    E1-E5    30-34 Overse… M/F   3.74e3 0.0904     135.        1.09
#> 10 10    Army    E1-E5    35+   US      M/F   9.06e3 0.188      219.        1.02
#> # ℹ 81 more rows



#For comparison purposes, compute approximate version of proposed allocation
pevs_optA_alloc_50k <- opt_nh_nonresp_oneiter(N_h = pevs_adm_2016_rrs$Nhat_h,
                                              phibar_h = pevs_adm_2016_rrs$rr_h,
                                              n_total = 50000)

#Merge results to previous tibble and reorder columns to be adjacent
pevs_adm_alloc_50k_merged %>%
  dplyr::mutate(n_h_optA = c(pevs_optA_alloc_50k)) %>%
  dplyr::relocate(zeta_h_optE, .after = "n_h_optA")
#> # A tibble: 91 × 11
#>    h     service paygrade age   region   sex    Nhat_h   rr_h n_h_optE n_h_optA
#>    <chr> <fct>   <fct>    <fct> <fct>    <fct>   <dbl>  <dbl>    <dbl>    <dbl>
#>  1 1     Army    E1-E5    18-24 US       M     109214. 0.0189    8267.    8309.
#>  2 2     Army    E1-E5    18-24 US       F      18921. 0.0364    1041.    1036.
#>  3 3     Army    E1-E5    18-24 Overseas M      18465. 0.0214    1333.    1320.
#>  4 4     Army    E1-E5    18-24 Overseas F       3312. 0.0431     179.     167.
#>  5 5     Army    E1-E5    25-29 US       M      48348. 0.0487    2280.    2289.
#>  6 6     Army    E1-E5    25-29 US       F       8680  0.0619     370.     364.
#>  7 7     Army    E1-E5    25-29 Overseas M/F    10762. 0.0566     477.     473.
#>  8 8     Army    E1-E5    30-34 US       M/F    21352. 0.0891     746.     747.
#>  9 9     Army    E1-E5    30-34 Overseas M/F     3739. 0.0904     135.     130.
#> 10 10    Army    E1-E5    35+   US       M/F     9055. 0.188      219.     218.
#> # ℹ 81 more rows
#> # ℹ 1 more variable: zeta_h_optE <dbl>