Computes the optimal stratified sampling
allocation under anticipated nonresponse, as proposed in Mendelson & Elliott (in press),
which is
$$n_h \propto \frac{N_h S_h \sqrt{\zeta_h(n_h, \bar{\phi}_h)}}
{\sqrt{\bar{\phi}_h c_h }},$$
where \(N_h\) is the stratum \(h\) population size,
\(S_h\) is the stratum \(h\) unit standard deviation,
\(\bar{\phi}_h\) is the stratum \(h\) average response propensity,
\(\zeta_h(.)\) is a variance inflation term that captures variability
in the number of respondents (see calc_zeta()),
and \(c_h = \textrm{E}(C_h) / n_h = c_{NR_h} \left(\bar{\phi}_h (\tau_h - 1) + 1\right)\)
is the expected cost per invitee in stratum \(h\).
The cost structure assumes that respondents and nonrespondents in stratum \(h\)
have per-unit costs of \(c_{R_h}\) and \(c_{NR_h}\), respectively,
with a ratio of \(\tau_h = c_{R_h}/c_{NR_h}\).
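As a brief check on the expression for \(c_h\), assume each invitee in stratum \(h\) responds with probability \(\bar{\phi}_h\). The expected cost per invitee is then a propensity-weighted mix of the two unit costs:
$$c_h = \bar{\phi}_h c_{R_h} + (1 - \bar{\phi}_h) c_{NR_h}
= c_{NR_h} \left(\bar{\phi}_h \tau_h + 1 - \bar{\phi}_h\right)
= c_{NR_h} \left(\bar{\phi}_h (\tau_h - 1) + 1\right).$$
For illustration only (these values are made up), \(c_{NR_h} = 2\), \(\tau_h = 3\), and \(\bar{\phi}_h = 0.25\) give \(c_h = 2 (0.25 \cdot 2 + 1) = 3\), which matches \(0.25 \cdot 6 + 0.75 \cdot 2 = 3\).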
opt_nh_nonresp() computes the exact allocation iteratively
(see the Details section); individual iterations are computed using
opt_nh_nonresp_oneiter(), which conditions on a
given \(\zeta_h(.)\).
Users must specify either the total sample size (n_total) or the
total expected costs (cost_total), but not both.
Under the cost specification, users must also specify the unit costs
of nonrespondents (c_NR_h) and the ratio of unit costs
for respondents to those for nonrespondents (tau_h).
These variables are vectors with one element per stratum, but can be supplied as
scalars if equal across strata.
Optionally, the user can also specify the unit population standard deviations,
S_h, which are assumed constant across strata by default.
opt_nh_nonresp_oneiter() computes one iteration of the proposed
allocation for some user-supplied \(\zeta_h(.)\).
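The single-iteration calculation described above can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the function name alloc_oneiter_sketch() is hypothetical, and the defaults of 1 for S_h, zeta_h, c_NR_h, and tau_h are assumptions made so the example runs on its own. It does not check whether any \(n_h > N_h\).

```r
# Hypothetical sketch of one allocation step (NOT the package's code).
# Scalars are recycled across strata by R's vector arithmetic.
alloc_oneiter_sketch <- function(N_h, phibar_h, S_h = 1, zeta_h = 1,
                                 n_total = NULL, cost_total = NULL,
                                 c_NR_h = 1, tau_h = 1) {
  # Exactly one of n_total / cost_total must be supplied
  if (is.null(n_total) == is.null(cost_total)) {
    stop("Specify exactly one of n_total or cost_total.")
  }
  # Expected cost per invitee in each stratum
  c_h <- c_NR_h * (phibar_h * (tau_h - 1) + 1)
  # Unnormalized allocation: N_h S_h sqrt(zeta_h) / sqrt(phibar_h c_h)
  w_h <- N_h * S_h * sqrt(zeta_h) / sqrt(phibar_h * c_h)
  if (!is.null(n_total)) {
    # Scale so the allocation sums to the total sample size
    n_total * w_h / sum(w_h)
  } else {
    # Scale so the expected cost sum(n_h * c_h) equals cost_total
    cost_total * w_h / sum(w_h * c_h)
  }
}

# Toy inputs (made up for illustration)
n_by_size <- alloc_oneiter_sketch(N_h = c(1000, 2000),
                                  phibar_h = c(0.1, 0.4),
                                  n_total = 300)
n_by_cost <- alloc_oneiter_sketch(N_h = c(1000, 2000),
                                  phibar_h = c(0.1, 0.4),
                                  cost_total = 1000,
                                  c_NR_h = 2, tau_h = 3)
```

The only difference between the two specifications in this sketch is the normalizing constant: the proportionality weights are the same, and they are rescaled to meet either the sample-size or the expected-cost constraint.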
opt_nh_nonresp(
  N_h,
  phibar_h,
  ...,
  tol = 1e-08,
  max_iter = 20,
  verbose_flag = FALSE
)

opt_nh_nonresp_oneiter(
  N_h,
  phibar_h,
  S_h = NULL,
  n_total = NULL,
  cost_total = NULL,
  zeta_h = NULL,
  c_NR_h = NULL,
  tau_h = NULL,
  strict_flag = TRUE,
  verbose_flag = FALSE
)
- N_h
(vector) strata population counts (\(N_h\))
- phibar_h
(vector) strata response propensities (\(\bar{\phi}_h\))
- ...
Arguments passed on to opt_nh_nonresp_oneiter()
- tol
(scalar) tolerance (for stopping)
- max_iter
(scalar) maximum number of iterations (>=2)
- verbose_flag
(boolean) whether to provide detailed results
- S_h
(vector) strata population standard deviations (\(S_h\)); constant, by default
- n_total
(scalar) total sample to allocate
- cost_total
(scalar) total expected costs to allocate
- zeta_h
(vector; use with opt_nh_nonresp_oneiter() only; optional) adjustment factor to reflect inflation in variances from randomness in the number of respondents (default = 1)
- c_NR_h
(vector; use with cost_total) per-unit costs for nonrespondents in stratum h (\(c_{NR_h}\))
- tau_h
(vector; use with cost_total) ratio of costs for respondents to costs for nonrespondents in stratum h (\(\tau_h\))
- strict_flag
(boolean) whether to throw an error (versus a warning) if any \(n_h > N_h\)
opt_nh_nonresp() returns the sample allocation
vector n_h
with the following attributes:
an attribute giving the number of iterations used (scalar);
zeta_h_prev (vector), the final values of zeta_h
used (i.e., from the second-to-last iteration); and
max_nh_delta (scalar), the largest change in stratum allocation from the previous round.
opt_nh_nonresp_oneiter() returns the sample allocation vector
n_h, computed from a single iteration given
the user-supplied zeta_h.
opt_nh_nonresp() computes the optimal allocation iteratively,
as follows:
- Iteration \(k=1\) calculates \(n_h^1\) under the assumption that \(\zeta_h(n_h, \bar{\phi}_h)=1\).
- Each subsequent iteration \(k\), for \(k = 2, 3, \ldots\), does the following:
  - Compute \(\zeta_h(n_h^{k-1}, \bar{\phi}_h)\) via calc_zeta().
  - Compute \(n_h^k\) under the assumption that \(\zeta_h(n_h, \bar{\phi}_h) = \zeta_h(n_h^{k-1}, \bar{\phi}_h)\).
  - Accept the solution if the largest component of \(n_h^k - n_h^{k-1}\) has magnitude below some tolerance (tol) or if the maximum number of iterations (max_iter) has been reached. Otherwise, continue.
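The iteration above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: zeta_stub() is a hypothetical stand-in for calc_zeta() (a second-order approximation to \(n_h \bar{\phi}_h \textrm{E}(1/r_h)\) for a binomial respondent count \(r_h\)), unit costs are taken as equal across strata, only the n_total specification is handled, and the attribute name "iterations" is made up for the example.

```r
# Hypothetical stand-in for calc_zeta(); NOT the package's formula.
zeta_stub <- function(n_h, phibar_h) 1 + (1 - phibar_h) / (n_h * phibar_h)

# Sketch of the fixed-point iteration (assumes equal unit costs)
alloc_iter_sketch <- function(N_h, phibar_h, n_total, S_h = 1,
                              tol = 1e-8, max_iter = 20) {
  # One allocation step, conditional on a given zeta_h
  step <- function(zeta_h) {
    w_h <- N_h * S_h * sqrt(zeta_h) / sqrt(phibar_h)
    n_total * w_h / sum(w_h)
  }
  # Iteration k = 1: assume zeta_h = 1 in every stratum
  n_prev <- step(rep(1, length(N_h)))
  for (k in 2:max_iter) {
    # Update zeta_h from the previous allocation, then reallocate
    zeta_prev <- zeta_stub(n_prev, phibar_h)
    n_new <- step(zeta_prev)
    delta <- max(abs(n_new - n_prev))
    n_prev <- n_new
    # Stop once the largest change falls below the tolerance
    if (delta < tol) break
  }
  structure(n_new,
            iterations = k,            # hypothetical attribute name
            zeta_h_prev = zeta_prev,
            max_nh_delta = delta)
}

# Toy inputs (made up for illustration)
n_toy <- alloc_iter_sketch(N_h = c(1000, 2000, 500),
                           phibar_h = c(0.1, 0.4, 0.05),
                           n_total = 500)
```

Because zeta_h_prev is the value used to compute the final allocation, it comes from the previous round's n_h, which is consistent with the description of the returned attributes.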
- opt_nh_nonresp(): computes the exact (i.e., iterative) version of the proposed allocation.
- opt_nh_nonresp_oneiter(): computes a single iteration of the proposed allocation for user-supplied zeta_h
(used in place of the \(\zeta_h(.)\) term).
Mendelson, J., & Elliott, M. R. (in press). Journal of Survey Statistics and Methodology.
#Compute exact optimum allocation for PEVS-ADM 2016 data set for n = 50k
#Assumes tau = 1 by default
pevs_optE_alloc_50k <- opt_nh_nonresp(N_h = pevs_adm_2016_rrs$Nhat_h,
phibar_h = pevs_adm_2016_rrs$rr_h,
n_total = 50000)
#Merge results into data frame
(pevs_adm_alloc_50k_merged <- pevs_adm_2016_rrs %>%
dplyr::mutate(n_h_optE = c(pevs_optE_alloc_50k),
zeta_h_optE = attr(pevs_optE_alloc_50k,"zeta_h_prev")))
#> # A tibble: 91 × 10
#> h service paygrade age region sex Nhat_h rr_h n_h_optE zeta_h_optE
#> <chr> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Army E1-E5 18-24 US M 1.09e5 0.0189 8267. 1.01
#> 2 2 Army E1-E5 18-24 US F 1.89e4 0.0364 1041. 1.03
#> 3 3 Army E1-E5 18-24 Overse… M 1.85e4 0.0214 1333. 1.04
#> 4 4 Army E1-E5 18-24 Overse… F 3.31e3 0.0431 179. 1.17
#> 5 5 Army E1-E5 25-29 US M 4.83e4 0.0487 2280. 1.01
#> 6 6 Army E1-E5 25-29 US F 8.68e3 0.0619 370. 1.05
#> 7 7 Army E1-E5 25-29 Overse… M/F 1.08e4 0.0566 477. 1.04
#> 8 8 Army E1-E5 30-34 US M/F 2.14e4 0.0891 746. 1.01
#> 9 9 Army E1-E5 30-34 Overse… M/F 3.74e3 0.0904 135. 1.09
#> 10 10 Army E1-E5 35+ US M/F 9.06e3 0.188 219. 1.02
#> # ℹ 81 more rows
#For comparison purposes, compute approximate version of proposed allocation
pevs_optA_alloc_50k <- opt_nh_nonresp_oneiter(N_h = pevs_adm_2016_rrs$Nhat_h,
phibar_h = pevs_adm_2016_rrs$rr_h,
n_total = 50000)
#Merge results to previous tibble and reorder columns to be adjacent
pevs_adm_alloc_50k_merged %>%
dplyr::mutate(n_h_optA = c(pevs_optA_alloc_50k)) %>%
dplyr::relocate(zeta_h_optE, .after = "n_h_optA")
#> # A tibble: 91 × 11
#> h service paygrade age region sex Nhat_h rr_h n_h_optE n_h_optA
#> <chr> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Army E1-E5 18-24 US M 109214. 0.0189 8267. 8309.
#> 2 2 Army E1-E5 18-24 US F 18921. 0.0364 1041. 1036.
#> 3 3 Army E1-E5 18-24 Overseas M 18465. 0.0214 1333. 1320.
#> 4 4 Army E1-E5 18-24 Overseas F 3312. 0.0431 179. 167.
#> 5 5 Army E1-E5 25-29 US M 48348. 0.0487 2280. 2289.
#> 6 6 Army E1-E5 25-29 US F 8680 0.0619 370. 364.
#> 7 7 Army E1-E5 25-29 Overseas M/F 10762. 0.0566 477. 473.
#> 8 8 Army E1-E5 30-34 US M/F 21352. 0.0891 746. 747.
#> 9 9 Army E1-E5 30-34 Overseas M/F 3739. 0.0904 135. 130.
#> 10 10 Army E1-E5 35+ US M/F 9055. 0.188 219. 218.
#> # ℹ 81 more rows
#> # ℹ 1 more variable: zeta_h_optE <dbl>