Integer sample size and event counts

Keaven Anderson

Introduction

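The gsDesign package was originally designed to plan continuous rather than integer-based sample sizes. Designs with time-to-event outcomes likewise had non-integer event counts at the times of analysis. This vignette documents the capability to convert to integer sample sizes and event counts. As the examples below show, the conversion has a couple of implications for design characteristics: information fractions at interim analyses shift slightly from their planned values, which changes the bounds slightly, and rounding up the final sample size or event count gives power slightly above the target.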

This document goes through examples to demonstrate the calculations. The toInteger() function, new as of July 2023, operates on group sequential designs to convert to integer-based total sample size and event counts at analyses. We begin with an abbreviated example for a time-to-event endpoint design to demonstrate basic concepts. We follow with a more extended example for a binary endpoint to explain more details.

Time-to-event endpoint example

The initial design for a time-to-event endpoint in a 2-arm trial does not have integer sample size and event counts. See comments in the code and output from the summary() function below to understand inputs.

library(gsDesign)

x <- gsSurv(
  k = 3, # Number of analyses
  test.type = 4, # Asymmetric 2-sided design with non-binding futility bound
  alpha = 0.025, # 1-sided Type I error
  beta = 0.1, # Type II error (1 - power; 90% power)
  timing = c(.25, .7), # Fraction of final planned events at interim analyses
  sfu = sfLDOF, # O'Brien-Fleming-like spending for efficacy
  sfl = sfHSD, # Hwang-Shih-DeCani spending for futility
  sflpar = -2.2, # Futility spending parameter to customize bound
  lambdaC = log(2) / 12, # 12 month median control survival
  hr = 0.75, # Alternate hypothesis hazard ratio
  eta = -log(.98) / 12, # 2% dropout rate per year
  # Enrollment accelerates over 6 months to steady state
  gamma = c(2.5, 5, 7.5, 10), # Relative enrollment rates
  # Duration of relative enrollment rate
  R = c(2, 2, 2, 100),
  # Enrollment duration targeted to T - minfup = 12 months total
  T = 36, # Trial duration
  minfup = 24, # Minimum follow-up duration
  ratio = 1 # Randomization ratio is 1:1
)
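
The rate arguments above are easier to check when converted back to the quantities named in the comments. The following base R sketch assumes exponential event and dropout times, consistent with the constant hazards passed to gsSurv(); the variable names are ours and are for illustration only.

```r
# Constant hazards as specified in the gsSurv() call above
lambda_c <- log(2) / 12 # Control event hazard per month
eta <- -log(0.98) / 12 # Dropout hazard per month
hr <- 0.75 # Hazard ratio under the alternate hypothesis

# Under an exponential model, median survival is log(2) / hazard
median_control <- log(2) / lambda_c # 12 months
median_experimental <- log(2) / (hr * lambda_c) # 16 months

# Probability of dropping out within 12 months, ignoring the competing event
dropout_1yr <- 1 - exp(-eta * 12) # 0.02
```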

We can summarize this textually as:

cat(summary(x))

Asymmetric two-sided group sequential design with non-binding futility bound, 3 analyses, time-to-event outcome with sample size 726 and 540 events required, 90 percent power, 2.5 percent (1-sided) Type I error to detect a hazard ratio of 0.75. Enrollment and total study durations are assumed to be 12 and 36 months, respectively. Efficacy bounds derived using a Lan-DeMets O’Brien-Fleming approximation spending function with none = 1. Futility bounds derived using a Hwang-Shih-DeCani spending function with gamma = -2.2.
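
The relative enrollment rates in gamma only fix the shape of the enrollment curve. As a rough check, we can scale them to the reported sample size. This sketch assumes enrollment stops at month 12, as stated in the summary, and uses the rounded total of 726; the variable names are ours.

```r
rel_rate <- c(2.5, 5, 7.5, 10) # Relative enrollment rates (gamma)
duration <- c(2, 2, 2, 6) # Months at each rate; last period truncated at month 12
n_total <- 726 # Sample size reported by summary(x)

# Scale relative rates so that expected enrollment adds up to n_total
monthly_rate <- rel_rate * n_total / sum(rel_rate * duration)
cumulative_n <- cumsum(monthly_rate * duration)

round(monthly_rate, 2) # Patients per month in each period
cumulative_n # Expected enrollment at months 2, 4, 6 and 12
```

Under these assumptions, steady-state enrollment is roughly 81 patients per month from month 6 onward.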

We now adapt this design to integer targeted event counts at each analysis as well as an integer sample size per arm at the end of the trial. We provide a table summarizing bounds. Due to rounding up of the final event count, trial power is slightly larger than the targeted 90%, as shown in the last row of the efficacy column.

# Adjust design to integer-based event counts at analyses
# and even integer-based final event count
xi <- toInteger(x)
gsBoundSummary(xi) # Summarize design bounds
##     Analysis               Value Efficacy Futility
##    IA 1: 25%                   Z   4.3326  -0.6868
##       N: 690         p (1-sided)   0.0000   0.7539
##  Events: 135        ~HR at bound   0.4744   1.1255
##    Month: 12    P(Cross) if HR=1   0.0000   0.2461
##              P(Cross) if HR=0.75   0.0039   0.0091
##    IA 2: 70%                   Z   2.4381   1.0548
##       N: 726         p (1-sided)   0.0074   0.1458
##  Events: 378        ~HR at bound   0.7782   0.8972
##    Month: 22    P(Cross) if HR=1   0.0074   0.8580
##              P(Cross) if HR=0.75   0.6406   0.0457
##        Final                   Z   1.9999   1.9999
##       N: 726         p (1-sided)   0.0228   0.0228
##  Events: 540        ~HR at bound   0.8419   0.8419
##    Month: 36    P(Cross) if HR=1   0.0233   0.9767
##              P(Cross) if HR=0.75   0.9002   0.0998
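
The "~HR at bound" rows can be approximated from the Z-values and event counts in the table. The sketch below assumes the Schoenfeld approximation for 1:1 randomization, under which the standard error of the log hazard ratio is about 2 / sqrt(events). We present it as a plausible reconstruction of the table entries, not as the package's internal calculation.

```r
# Efficacy bounds and integer event counts copied from the table above
z_efficacy <- c(4.3326, 2.4381, 1.9999)
events <- c(135, 378, 540)

# Schoenfeld approximation with 1:1 randomization:
# log(HR) estimate has standard error of about 2 / sqrt(events)
hr_at_bound <- exp(-z_efficacy * 2 / sqrt(events))
round(hr_at_bound, 4)
```

The results agree with the efficacy column of the table (0.4744, 0.7782, 0.8419) to the precision shown.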

We now summarize sample size and targeted events at analyses.

# Event counts at analyses are integers
xi$n.I
## [1] 135 378 540
# Control planned sample size at analyses
# Final analysis is integer; interim analyses before enrollment completion
# are continuous
xi$eNC
##          [,1]
## [1,] 344.4354
## [2,] 363.0000
## [3,] 363.0000
# Experimental planned sample size at analyses
xi$eNE
##          [,1]
## [1,] 344.4354
## [2,] 363.0000
## [3,] 363.0000

Binomial endpoint designs

Fixed sample size

We present a simple example based on comparing binomial rates with interim analyses after 50% and 75% of observations. We assume a 2:1 experimental:control randomization ratio. We start with the fixed design sample size; note that it is not an integer.

n.fix <- nBinomial(p1 = .2, p2 = .1, alpha = .025, beta = .2, ratio = 2)
n.fix
## [1] 429.8846

If we replace the beta argument above with an integer sample size that is a multiple of 3, so that we get the desired 2:1 integer sample sizes per arm (432 = 144 control + 288 experimental targeted), we get slightly larger than the targeted 80% power:

nBinomial(p1 = .2, p2 = .1, alpha = .025, n = 432, ratio = 2)
## [1] 0.801814
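
The value 432 used above is the fixed sample size rounded up to the next multiple of 3. A minimal base R sketch of that arithmetic follows; the variable names are ours.

```r
n_fix <- 429.8846 # Continuous fixed design sample size from nBinomial()
ratio <- 2 # Experimental:control randomization ratio

# Round up to the next multiple of ratio + 1 so both arms have integer size
n_int <- ceiling(n_fix / (ratio + 1)) * (ratio + 1)
n_control <- n_int / (ratio + 1)
n_experimental <- n_int - n_control

c(total = n_int, control = n_control, experimental = n_experimental)
```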

1-sided design

Now we convert the fixed sample size n.fix from above to a 1-sided group sequential design with interims after 50% and 75% of observations. Again, sample size at each analysis is not an integer. We use the Lan-DeMets spending function approximating an O’Brien-Fleming efficacy bound.

# 1-sided design (efficacy bound only; test.type = 1)
x <- gsDesign(alpha = .025, beta = .2, n.fix = n.fix, test.type = 1, sfu = sfLDOF, timing = c(.5, .75))
# Continuous sample size (non-integer) at planned analyses
x$n.I
## [1] 219.1621 328.7432 438.3243

Next we convert to integer sample sizes at each analysis. Interim sample sizes are rounded to the nearest integer. The default roundUpFinal = TRUE rounds the final sample size up to the nearest integer multiple of 1 + the experimental:control randomization ratio. Thus, the final sample size of 441 below is a multiple of 3.

# Convert to integer sample sizes; the final sample size is rounded up
# to a multiple of ratio + 1, i.e., a multiple of 3 in this case
x_integer <- toInteger(x, ratio = 2)
x_integer$n.I
## [1] 219 329 441
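
The rounding rule can be reproduced with base R arithmetic. This sketch mirrors the rule as described in the text for the default roundUpFinal = TRUE; it is not the internal code of toInteger(), and the variable names are ours.

```r
n_cont <- c(219.1621, 328.7432, 438.3243) # Continuous sample sizes (x$n.I)
ratio <- 2 # Experimental:control randomization ratio
k <- length(n_cont) # Number of analyses

# Interim analyses: round to the nearest integer
n_int <- round(n_cont)
# Final analysis: round up to the next multiple of ratio + 1
n_int[k] <- ceiling(n_cont[k] / (ratio + 1)) * (ratio + 1)
n_int
```

This reproduces the sample sizes 219, 329 and 441 shown above.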

Next we examine the efficacy bound of the 2 designs as they are slightly different.

# Bound for continuous sample size design
x$upper$bound
## [1] 2.962588 2.359018 2.014084
# Bound for integer sample size design
x_integer$upper$bound
## [1] 2.974067 2.366106 2.012987

The differences arise from the slightly different timing of the analyses that results from the different sample sizes noted above:

# Continuous design sample size fractions at analyses
x$timing
## [1] 0.50 0.75 1.00
# Integer design sample size fractions at analyses
x_integer$timing
## [1] 0.4965986 0.7460317 1.0000000
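
The information fractions for the integer design are simply the integer sample sizes divided by the final sample size, as this short check illustrates.

```r
n_int <- c(219, 329, 441) # Integer sample sizes at analyses
timing_int <- n_int / max(n_int) # Information fraction at each analysis
timing_int
```

Because the final sample size was rounded up by more than the interim sample sizes changed, both interim fractions fall slightly below the planned 0.5 and 0.75.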

These timing differences also change the cumulative Type I error associated with each analysis, as shown below.

# Continuous sample size design
cumsum(x$upper$prob[, 1])
## [1] 0.001525323 0.009649325 0.025000000
# Specified spending based on the spending function
x$upper$sf(alpha = x$alpha, t = x$timing, x$upper$param)$spend
## [1] 0.001525323 0.009649325 0.025000000
# Integer sample size design
cumsum(x_integer$upper$prob[, 1])
## [1] 0.001469404 0.009458454 0.025000000
# Specified spending based on the spending function
# Slightly different from continuous design due to slightly different information fraction
x_integer$upper$sf(alpha = x_integer$alpha, t = x_integer$timing, x_integer$upper$param)$spend
## [1] 0.001469404 0.009458454 0.025000000
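
To see where these values come from, the Lan-DeMets O'Brien-Fleming-like spending function can be written directly in base R. The helper name ldof_spend is hypothetical; the formula is the standard one-sided form, which we assume matches sfLDOF with its default parameter.

```r
# Cumulative alpha spent at information fraction t (hypothetical helper)
ldof_spend <- function(alpha, t) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

# Planned timing of the continuous design
ldof_spend(alpha = 0.025, t = c(0.5, 0.75, 1))
# Timing of the integer design
ldof_spend(alpha = 0.025, t = c(219, 329, 441) / 441)
```

Because the integer design reaches each interim analysis at a slightly smaller information fraction, slightly less alpha is spent at the interims, while the full 0.025 is still spent by the final analysis.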

Finally, we look at cumulative boundary crossing probabilities under the alternate hypothesis for each design. Due to rounding up the final sample size, the integer-based design has slightly higher total power than the specified 80% (Type II error beta = 0.2). Interim power is slightly lower for the integer-based design because the interim information fractions are slightly smaller: interim sample sizes are rounded to the nearest integer, whereas the final sample size is rounded up.

# Cumulative upper boundary crossing probability by analysis
# under the alternate hypothesis for continuous sample size
cumsum(x$upper$prob[, 2])
## [1] 0.1679704 0.5399906 0.8000000
# Same for integer sample sizes at each analysis
cumsum(x_integer$upper$prob[, 2])
## [1] 0.1649201 0.5374791 0.8025140

Non-binding design

The default test.type = 4 has a non-binding futility bound. We examine behavior of this design next. The futility bound is moderately aggressive and, thus, there is a compensatory increase in sample size to retain power. The parameter delta1 is the natural parameter denoting the difference in response (or failure) rates of 0.2 vs. 0.1 that was specified in the call to nBinomial() above.

# 2-sided asymmetric design with non-binding futility bound (test.type = 4)
xnb <- gsDesign(
  alpha = .025, beta = .2, n.fix = n.fix, test.type = 4,
  sfu = sfLDOF, sfl = sfHSD, sflpar = -2,
  timing = c(.5, .75), delta1 = .1
)
# Continuous sample size for non-binding design
xnb$n.I
## [1] 231.9610 347.9415 463.9219
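
The compensatory increase in sample size can be quantified as the ratio of the maximum group sequential sample size to the fixed design sample size. The following sketch uses the values printed above; the variable names are ours.

```r
n_fix <- 429.8846 # Fixed design sample size
n_max_1sided <- 438.3243 # Final sample size, 1-sided design (x)
n_max_nb <- 463.9219 # Final sample size, non-binding design (xnb)

# Sample size inflation relative to the fixed design
inflation_1sided <- n_max_1sided / n_fix
inflation_nb <- n_max_nb / n_fix
round(c(inflation_1sided, inflation_nb), 4)
```

Adding the futility bound raises the inflation from about 2% to about 8% of the fixed design sample size.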

As before, we convert to integer sample sizes at each analysis and see the slight deviations from the interim timing of 0.5 and 0.75.

xnbi <- toInteger(xnb, ratio = 2)
# Integer design sample size at each analysis
xnbi$n.I
## [1] 232 348 465
# Information fraction based on integer sample sizes
xnbi$timing
## [1] 0.4989247 0.7483871 1.0000000

These timing differences also change the cumulative Type I error associated with each analysis:

# Type I error, continuous design
cumsum(xnb$upper$prob[, 1])
## [1] 0.001525323 0.009630324 0.023013764
# Type I error, integer design
cumsum(xnbi$upper$prob[, 1])
## [1] 0.001507499 0.009553042 0.022999870

The Type I error just shown does not use the full targeted 0.025 because the calculations assume the trial stops for futility if an interim futility bound is crossed. The non-binding Type I error, which assumes the trial does not stop for futility, is:

# Type I error for integer design ignoring futility bound
cumsum(xnbi$falseposnb)
## [1] 0.001507499 0.009571518 0.025000000

Finally, we look at cumulative lower boundary crossing probabilities under the alternate hypothesis for the integer-based design and compare to the planned \(\beta\)-spending. We note that the final Type II error spending is slightly lower than the targeted 0.2 due to rounding up the final sample size.

# Actual cumulative beta spent at each analysis
cumsum(xnbi$lower$prob[, 2])
## [1] 0.05360549 0.10853733 0.19921266
# Spending function target is the same at interims, but larger at final
xnbi$lower$sf(alpha = xnbi$beta, t = xnbi$n.I / max(xnbi$n.I), param = xnbi$lower$param)$spend
## [1] 0.05360549 0.10853733 0.20000000

The \(\beta\)-spending lower than 0.2 in the first row above is due to the final sample size powering the trial to greater than 0.8 as seen below.

# Total power: probability of crossing the efficacy bound
# at any analysis under the alternate hypothesis
sum(xnbi$upper$prob[, 2])
## [1] 0.8007874
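
As a consistency check, the total Type II error spent and the total power printed above should sum to 1, which holds when the final efficacy and futility bounds coincide so that the trial must cross one of the two. The values below are copied from the output above.

```r
beta_spent <- 0.19921266 # Final cumulative beta spent (lower bound)
power <- 0.8007874 # Total power (upper bound)

# Sums to 1 up to the rounding of the printed values
beta_spent + power
```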

References