Overview
The example below shows how to use the samplingNR
package to allocate a sample of 50,000 invitees, using methods described
in our paper (Mendelson & Elliott, in press) when assuming
equivalent strata variances across strata (i.e., \(S_h = S\) for some constant \(S\)). The example shows how to replicate
our paper’s Table 2, which compares the proposed allocation with common
alternatives in an application to a survey of military personnel.
The stratification, population counts, and response rate information
used in this example are included in our package as the data set
pevs_adm_2016_rrs
, structured as a tibble. These data are based on
secondary analysis of public use data for the 2016 Post-Election Voting
Survey of Active Duty Military (PEVS-ADM; Federal Assistance Voting
Program, 2017a, 2017b), as described in our paper.
Compute allocations
Exact optimal allocation (iterative)
The function opt_nh_nonresp()
calculates the exact
(i.e., iteratively computed) version of the optimal allocation under
nonresponse, as provided in our paper and summarized in the function
documentation. We assume that stratification follows that of
pevs_adm_2016_rrs
, and specify the strata population sizes
N_h
and anticipated response propensities
phibar_h
. We have omitted the optional argument
S_h
, leading to use of the default assumption that \(S_h\) is constant across strata. Note that
by fixing the total number of invitees (n_total
), we are
implicitly assuming use of the constant cost per invitee
scenario from our paper (i.e., per unit costs are assumed constant
across strata and do not vary by response status).
library(samplingNR)
library(magrittr)
library(dplyr)
#Compute exact optimum allocation for PEVS-ADM 2016 data set for n = 50k
#Assumes tau = 1 by default
pevs_optE_alloc_50k <- opt_nh_nonresp(N_h = pevs_adm_2016_rrs$Nhat_h,
phibar_h = pevs_adm_2016_rrs$rr_h,
n_total = 50000)
#Merge results into original data frame
pevs_adm_alloc_50k_merged <- pevs_adm_2016_rrs %>%
dplyr::mutate(n_h_optE = c(pevs_optE_alloc_50k),
zeta_h_optE = attr(pevs_optE_alloc_50k,"zeta_h_prev"))
#View results
pevs_adm_alloc_50k_merged
#> # A tibble: 91 × 10
#> h service paygrade age region sex Nhat_h rr_h n_h_optE zeta_h_optE
#> <chr> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Army E1-E5 18-24 US M 1.09e5 0.0189 8267. 1.01
#> 2 2 Army E1-E5 18-24 US F 1.89e4 0.0364 1041. 1.03
#> 3 3 Army E1-E5 18-24 Overse… M 1.85e4 0.0214 1333. 1.04
#> 4 4 Army E1-E5 18-24 Overse… F 3.31e3 0.0431 179. 1.17
#> 5 5 Army E1-E5 25-29 US M 4.83e4 0.0487 2280. 1.01
#> 6 6 Army E1-E5 25-29 US F 8.68e3 0.0619 370. 1.05
#> 7 7 Army E1-E5 25-29 Overse… M/F 1.08e4 0.0566 477. 1.04
#> 8 8 Army E1-E5 30-34 US M/F 2.14e4 0.0891 746. 1.01
#> 9 9 Army E1-E5 30-34 Overse… M/F 3.74e3 0.0904 135. 1.09
#> 10 10 Army E1-E5 35+ US M/F 9.06e3 0.188 219. 1.02
#> # ℹ 81 more rows
Approximately optimal allocation
The approximate version of this allocation, which assumes that \(\zeta_h(n_h,\bar{\phi}_h) = 1\), can be
computed using the function opt_nh_nonresp_oneiter()
.
pevs_optA_alloc_50k <- opt_nh_nonresp_oneiter(N_h = pevs_adm_2016_rrs$Nhat_h,
phibar_h = pevs_adm_2016_rrs$rr_h,
n_total = 50000)
Neyman-style allocations for comparison
For comparison, we also compute Neyman allocation of invitees (\(n_h^{\textrm{Ninv}}\propto N_h S_h\)), which does not adjust for differential nonresponse, and Neyman allocation of expected respondents (\(n_h^{\textrm{Nresp}}\propto N_h S_h / \bar{\phi}_h\)), which adjusts by the inverse of the anticipated response rates. In each case, we ignore the \(S_h\) terms in our applications of these equations, for purposes of this example, given our earlier assumption of equivalent strata variances.
nh_Ninv_propto <- pevs_adm_2016_rrs$Nhat_h
nh_Ninv <- nh_Ninv_propto / sum(nh_Ninv_propto) * 50000
nh_Nresp_propto <- pevs_adm_2016_rrs$Nhat_h / pevs_adm_2016_rrs$rr_h
nh_Nresp <- nh_Nresp_propto / sum(nh_Nresp_propto) * 50000
#Make sure allocations are valid
all(nh_Ninv >= 2 & nh_Ninv <= pevs_adm_2016_rrs$Nhat_h)
#> [1] TRUE
all(nh_Nresp >= 2 & nh_Nresp <= pevs_adm_2016_rrs$Nhat_h)
#> [1] TRUE
View allocations
The resulting allocations can be seen to match those shown in our paper’s Appendix.
(pevs_adm_allocs_50k <-
pevs_adm_alloc_50k_merged %>%
mutate(n_h_optA = pevs_optA_alloc_50k,
n_h_Ninv = nh_Ninv,
n_h_Nresp = nh_Nresp) %>%
select(c("h", "n_h_Ninv", "n_h_Nresp", "n_h_optA", "n_h_optE", "zeta_h_optE")))
#> # A tibble: 91 × 6
#> h n_h_Ninv n_h_Nresp n_h_optA n_h_optE zeta_h_optE
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 4269. 12975. 8309. 8267. 1.01
#> 2 2 740. 1164. 1036. 1041. 1.03
#> 3 3 722. 1935. 1320. 1333. 1.04
#> 4 4 129. 172. 167. 179. 1.17
#> 5 5 1890. 2225. 2289. 2280. 1.01
#> 6 6 339. 314. 364. 370. 1.05
#> 7 7 421. 426. 473. 477. 1.04
#> 8 8 835. 537. 747. 746. 1.01
#> 9 9 146. 92.6 130. 135. 1.09
#> 10 10 354. 108. 218. 219. 1.02
#> # ℹ 81 more rows
Evaluate performance
Further, we can roughly examine performance by computing the number of respondents, the design effect from weighting (\(deff_w\); Kish, 1992), and the effective number of respondents under each allocation, replicating Table 2 of our paper. The results show that under the given assumptions, the proposed allocation has an effective sample size that is 25% higher than the Neyman-style allocations.
Code
#Calculate Kish's deff_w for poststratified estimator under nonresponse
calculate_deff_w <- function (N_h, r_h) {
stopifnot(length(N_h) == length(r_h))
stopifnot(all(r_h >= 1))
stopifnot(all(r_h <= N_h))
testthat::expect_equal(all(r_h <= N_h), TRUE)
w_h <- N_h/r_h
wbar <- sum(w_h * r_h)/sum(r_h)
deff_w <- 1 + 1/sum(r_h) * sum((w_h - wbar)^2 * r_h)/wbar^2
(deff_w)
}
#Summarize results from allocation under assumption that r_h = n_h * phibar_h
summarize_alloc <- function(N_h, phibar_h, n_h) {
n_total <- sum(n_h)
r_total <- sum(phibar_h * n_h)
rr <- r_total / n_total
deff_w <- calculate_deff_w(N_h = N_h, r_h = n_h * phibar_h)
eff_resp <- r_total / deff_w
c(n_total, r_total, rr, deff_w, eff_resp)
}
#Summarize results of each allocation
pevs_adm_alloc_summary_list <-
lapply(2:5,
function(col) summarize_alloc(N_h = pevs_adm_2016_rrs$Nhat_h,
phibar_h = pevs_adm_2016_rrs$rr_h,
n_h = unname(unlist(pevs_adm_allocs_50k[,col]))))
names(pevs_adm_alloc_summary_list) <- names(pevs_adm_allocs_50k[,2:5])
pevs_adm_alloc_summary_table <- pevs_adm_alloc_summary_list %>%
bind_rows() %>%
t()
colnames(pevs_adm_alloc_summary_table) <- c("n_total", "r_total", "rr",
"deff_w", "eff_resp")
Results
(pevs_adm_alloc_summary_tibble <-
tibble::as_tibble(pevs_adm_alloc_summary_table, rownames = "alloc"))
#> # A tibble: 4 × 6
#> alloc n_total r_total rr deff_w eff_resp
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 n_h_Ninv 50000 6794. 0.136 2.37 2865.
#> 2 n_h_Nresp 50000 2865. 0.0573 1 2865.
#> 3 n_h_optA 50000 4494. 0.0899 1.26 3570.
#> 4 n_h_optE 50000 4496. 0.0899 1.26 3570.
#View the percent increase in effective sample size from optE relative to Ninv
# and Nresp.
eff_resp_vec <- pevs_adm_alloc_summary_tibble$eff_resp
eff_resp_vec[4]/eff_resp_vec[1:2] - 1
#> [1] 0.2461351 0.2461351
References
Federal Voting Assistance Program. (2017a). 2015-2016 Active Duty Military dataset. Federal Voting Assistance Program, U.S. Department of Defense.
Federal Voting Assistance Program (2017b). Post-Election Voting Surveys: Active Duty Military: Technical report 2016. Technical report, Federal Voting Assistance Program, U.S. Department of Defense.
Kish, L. (1992). Weighting for unequal Pi. Journal of Official Statistics, 8(2), 183–200.
Mendelson, J., & Elliott, M. R. (in press). Optimal allocation under anticipated nonresponse. Journal of Survey Statistics and Methodology.