Example 10.8: The upper 95% CI is 3.69

Apologies for the long and unannounced break– the longest since we started blogging, three and a half years ago. I was writing a 2-day course for SAS users to learn R. Contact me if you’re interested. And Nick and I are beginning work on the second edition of our book– look for it in the fall. Please let us know if you have ideas about what we omitted last time or would otherwise like to see added. In the mean time, we’ll keep blogging, though likely at a reduced rate.



Today: what can you say about the probability of an event if the observed number of events is 0? It turns out that the upper 95% CI for the probability is 3.69/N. There’s a sweet little paper with some rationale for this, but it’s in my other office. And I couldn’t recall the precise value– so I used SAS and R to demonstrate it to myself.

R

The R code is remarkably concise. After generating some Ns, we write a little function to perform the test and extract the (exact) upper 95% confidence limit. This is facilitated by the “…” notation, which passes along unused arguments to functions. Then we use
apply()
to call the new function for each N, passing the numerator 0 each time. Note that
apply()
needs a matrix argument, so the simple vector of Ns is converted to a matrix before use. [The
sapply()
function will accept a vector input, but took about 8 times as long to run.] Finally, we plot the upper limit * N against N. showing the asymptote. A log scaled x-axis is useful here, and is achieved with the
log='x'
option. (Section 5.3.12.) the result is shown above.

bin.m = seq(10, 10000, by=5)
mybt = function(...) { binom.test(...)$conf.int[2] }
uci = apply(as.matrix(bin.m), 1, mybt, x=0)
plot(y=bin.m * uci, x=bin.m, ylim=c(0,4), type="l", 
     lwd=5, col="red", cex=5, log='x',  
     ylab="Exact upper CI", xlab="Sample size", 
     main="Upper CI when there are 0 cases observed")
abline(h=3.69)

SAS

In SAS, the data, really just the N and a numerator of 0, are generated in a
data
step. The CI are found using the
binomial
option in the
proc freq tables
statement and saved using the
output
statement. Note that the
weight
statement is used here to avoid having a row for each Bernoulli trial.

data binm;
do n = 10 to 10000 by 5;
  x=0;
  output;
  end;
run;

ods select none;
proc freq data=binm;
by n;
weight n;
tables x / binomial;
output out=bp binomial;
run;
ods select all;

To calculate the upper limit*N, another
data
step is needed– note that in this setting SAS will only produce the lower limit against the probability that all observations share the same value, thus the subtraction from 1 shown below. The log scale x-axis is obtained with the
logbase
option to the
axis
statement. (Section 5.3.12.) The result is shown below.

data uci;
set bp;
limit = (1-xl_bin) * n;
run;

axis1 order = (0 to 4 by 1);
axis2 logbase=10 logstyle=expand;
symbol1 i = j v = none c = red w=5 l=1;
proc gplot data=uci;
plot limit * n / vref=3.69 vaxis=axis1 haxis=axis2;
label n="Sample size" limit="Exact upper CI";
run;
quit;



It’s clear that the upper 95% limit on the number of successes asymptotes to about 3.69. Thus the upper limit on the binomial probability p is 3.69/N.

An unrelated note about aggregators:

稿源:SAS and R (源链) | 关于 | 阅读提示

本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。
酷辣虫 » 综合技术 » Example 10.8: The upper 95% CI is 3.69

喜欢 (0)or分享给?

专业 x 专注 x 聚合 x 分享 CC BY-NC-SA 4.0

使用声明 | 英豪名录