This vignette demonstrates how to use the tipse package
to perform tipping point analyses as a sensitivity
analysis for time-to-event (survival) data. Tipping point analysis is an
approach to evaluate how sensitive treatment effect estimates are to
different censoring scenarios. It aims to evaluate the robustness of
trial conclusions by varying certain data and/or model aspects while
imputing missing data. This is particularly useful in clinical trials
where dropout or loss to follow-up may bias efficacy results.
The tipse package provides a flexible framework to
impute censored observations under different approaches and identify the
tipping point where the upper confidence limit (CL) of the hazard ratio
crosses 1. It also offers visualizations and plausibility assessments to
facilitate the interpretation of tipping points.
Two approaches are implemented:
Model-Free Tipping Point Analysis
Implemented via tipping_point_model_free. It does not
assume a parametric survival model. It uses either resampling of observed
event times (random sampling) or deterministic imputation
of a fixed number of censored patients
(deterministic sampling).
Model-Based Tipping Point Analysis
Implemented via tipping_point_model_based. It uses
parametric survival models (e.g., Weibull) to adjust the imputation of
censored observations, allowing hazard inflation or deflation across a
range of parameters.
We extracted data from the published Kaplan-Meier (KM) curve in de Langen et al. [2023] using a KM digitizer. This reconstructed dataset contains:
- TRT01P: treatment arm assignment
- AVAL: observed event time
- EVENT: event indicator (1 = event, 0 = censored)
- CNSRRS: censoring reason
- MAXAVAL: maximum potential survival time (duration from randomization to data cut-off)

| | SUBJID | TRT01P | AVAL | EVENT | CNSRRS | MAXAVAL |
|---|---|---|---|---|---|---|
| 212 | 1 | docetaxel | 1.28 | 1 | NA | 19.35 |
| 340 | 2 | docetaxel | 15.24 | 0 | Lost to follow-up | 19.35 |
| 107 | 3 | sotorasib | 5.72 | 1 | NA | 19.35 |
| 129 | 4 | sotorasib | 9.53 | 1 | NA | 19.35 |
| 128 | 5 | sotorasib | 9.33 | 1 | NA | 19.35 |
There are 21 (12.1%) patients in the docetaxel control arm and 6 (3.5%) patients in the sotorasib treatment arm who dropped out early, leading to potentially informative censoring. This raises concerns about the validity of the trial outcomes and the robustness of the treatment effect.
We first fit a Cox proportional hazards model, which should have the same model specification (including covariates and stratification factors) as the original analysis. This model will be used for pooling hazard ratios after multiple imputations. In this example, we do not include any covariates besides the treatment arm itself.
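As a rough illustration of this step, the sketch below fits a treatment-only Cox model with the survival package on a small simulated data frame. The data frame `toy`, the simulated hazard rates, and the object names `cox_toy`, `hr_toy` and `ci_toy` are assumptions made for illustration; only the column names mirror the dataset described above.

```r
library(survival)

set.seed(2024)
n <- 200
arm <- factor(rep(c("control", "treatment"), each = n / 2),
              levels = c("control", "treatment"))

# Simulated event times with a lower hazard in the treatment arm (made-up rates)
event_time <- rexp(n, rate = ifelse(arm == "treatment", 0.07, 0.10))
# Simulated censoring times
cens_time <- runif(n, min = 5, max = 20)

toy <- data.frame(
  TRT01P = arm,
  AVAL   = pmin(event_time, cens_time),          # observed time
  EVENT  = as.integer(event_time <= cens_time)   # 1 = event, 0 = censored
)

# Treatment arm is the only covariate, as in the example in this vignette
cox_toy <- coxph(Surv(AVAL, EVENT) ~ TRT01P, data = toy)

hr_toy <- unname(exp(coef(cox_toy)))   # hazard ratio, treatment vs control
ci_toy <- exp(confint(cox_toy))        # 95% confidence interval for the HR
```

If the original analysis used covariates or stratification factors, they would be added to the model formula in the same way as in that analysis.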
There are two methods available in
tipping_point_model_free.
For method = "random sampling", the key
idea is to impute all censored observations belonging to the arm
specified in impute by randomly drawing event times from
either the best or worst percentiles of the observed data, where
observations are ranked according to their event and censoring
times.
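The idea behind random sampling can be sketched in a few lines of base R. The helper `impute_random` below is hypothetical and is not the package's implementation: it assumes that the sampling pool is simply the best (longest) or worst (shortest) `pct` percent of all observed times, and it ignores details such as whether an imputed time may be shorter than the original censoring time.

```r
# Hypothetical helper: replace the times of the censored patients in `idx`
# with draws from the best or worst `pct` percent of observed times.
impute_random <- function(dat, idx, pct, best = TRUE) {
  times <- sort(dat$AVAL, decreasing = best)       # rank observed times
  k <- max(1, ceiling(length(times) * pct / 100))  # size of the sampling pool
  pool <- times[seq_len(k)]
  # Index-based sampling avoids the sample() pitfall for length-1 pools
  draws <- pool[sample.int(length(pool), length(idx), replace = TRUE)]
  dat$AVAL[idx] <- draws
  dat$EVENT[idx] <- 1                              # imputed patients have events
  dat
}

toy_rs <- data.frame(
  AVAL  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  EVENT = c(1, 1, 0, 1, 1, 0, 1, 1, 1, 1)
)
set.seed(42)
# Impute the two censored patients from the best 30% of observed times
out_rs <- impute_random(toy_rs, idx = c(3, 6), pct = 30, best = TRUE)
```

Setting `best = FALSE` would draw from the shortest times instead, which corresponds to the treatment-arm scenario.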
For method = "deterministic sampling",
the idea is to modify the censored times directly:
- when imputing the control arm, the censoring times of a subset or all
relevant censored observations are extended to the maximum follow-up
time;
- when imputing the treatment arm, the censoring
time is instead treated as the event time.
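A minimal sketch of the deterministic idea follows. The helper `impute_deterministic` and its arguments are hypothetical and only illustrate the two rules described above; the package may select and modify patients differently.

```r
# Hypothetical helper: modify x of the censored patients in `idx`.
# Control arm: extend censoring to the maximum follow-up (still censored).
# Treatment arm: treat the censoring time as an event time.
impute_deterministic <- function(dat, idx, x,
                                 arm_type = c("control", "treatment")) {
  arm_type <- match.arg(arm_type)
  stopifnot(x >= 0, x <= length(idx))
  chosen <- idx[sample.int(length(idx), x)]   # random subset of size x
  if (arm_type == "control") {
    dat$AVAL[chosen] <- dat$MAXAVAL[chosen]   # event-free until data cut-off
  } else {
    dat$EVENT[chosen] <- 1                    # event at the censoring time
  }
  dat
}

toy_ds <- data.frame(
  AVAL    = c(2.0, 3.5, 1.2, 6.0, 0.8),
  EVENT   = c(0, 0, 0, 1, 0),
  MAXAVAL = rep(19.35, 5)
)
set.seed(7)
ctrl_out <- impute_deterministic(toy_ds, idx = c(1, 2, 3, 5), x = 2,
                                 arm_type = "control")
trt_out  <- impute_deterministic(toy_ds, idx = c(1, 2, 3, 5), x = 4,
                                 arm_type = "treatment")
```

In the control-arm case the selected patients stay censored but contribute follow-up until the data cut-off, which is the most favourable assumption for that arm.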
| Steps | Random Sampling | Deterministic Sampling |
|---|---|---|
| 1. Identify censored patients for imputation | Select all censored observations in the arm specified by the impute argument that correspond to a particular censoring reason (e.g., early dropout). Assume n patients are identified. | Same as left |
| 2. Define tipping parameter range | Specify the percentiles of the observed data from which event times will be sampled to impute censored patients. For the treatment arm, use the worst percentiles (shortest survival times) from the observed data of both arms. For the control arm, use the best percentiles (longest survival times). | For the treatment arm, define the number of patients that will be assumed to have an event at their time of censoring. For the control arm, define the number of patients that will be assumed to be event-free at the data cut-off (DCO). |
| 3. Sampling / Imputation | For each value in tipping_range, impute patients using event times sampled from the observed data up to that percentile. | For each value x in tipping_range, randomly select x patients from the n censored patients to impute as event-free at DCO (control arm) or as having an event at their censoring time (treatment arm). The remaining n - x patients are not imputed. |
| 4. Multiple imputations | Repeat step 3 J times. | Same as left |
| 5. Model fitting and pooling | For each imputed dataset, fit the Cox proportional hazards model specified via the cox_fit argument. Pool the results across the J replicates using Rubin’s rules, producing a combined hazard ratio (HR) and confidence interval for that tipping parameter. | Same as left |
| 6. Iterate across tipping points | Repeat steps 2–5 across all values in the tipping_range. This generates a series of pooled HRs and confidence intervals showing how the treatment effect changes as the imputation becomes increasingly conservative. | Same as left |
| 7. Identify the tipping point | The tipping point is the tipping parameter at which the upper confidence limit of the pooled HR first crosses 1, indicating the loss of the apparent treatment benefit. | Same as left |
| 8. Visualize and assess | Plot pooled Kaplan-Meier curves to inspect visually how much shift the imputation created compared with the original data, and how the HR changes across tipping parameters, using plot. Assess the plausibility of the tipping point using assess_plausibility. | Same as left |
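Steps 5 to 7 can be sketched in base R as follows. The helpers `pool_rubin` and `find_tipping_point` are hypothetical, and the confidence interval uses a normal approximation, which is an assumption: the package may use a t reference distribution or other refinements.

```r
# Pool log hazard ratios from J imputed datasets with Rubin's rules.
# `log_hr` and `se` hold one estimate and one standard error per dataset.
pool_rubin <- function(log_hr, se, level = 0.95) {
  J <- length(log_hr)
  q_bar <- mean(log_hr)                    # pooled log HR
  w <- mean(se^2)                          # within-imputation variance
  b <- if (J > 1) var(log_hr) else 0       # between-imputation variance
  total <- w + (1 + 1 / J) * b             # total variance
  z <- qnorm(1 - (1 - level) / 2)          # normal approximation (assumption)
  c(HR    = exp(q_bar),
    lower = exp(q_bar - z * sqrt(total)),
    upper = exp(q_bar + z * sqrt(total)))
}

# First tipping parameter whose pooled upper confidence limit reaches 1
find_tipping_point <- function(tipping_range, upper) {
  hit <- which(upper >= 1)
  if (length(hit) == 0) NA else tipping_range[hit[1]]
}

# Made-up estimates from three imputed datasets at one tipping parameter
pooled <- pool_rubin(log_hr = c(-0.30, -0.25, -0.35), se = c(0.12, 0.12, 0.13))

# Made-up pooled upper limits across a range of tipping parameters
tp <- find_tipping_point(tipping_range = c(10, 20, 30, 40),
                         upper = c(0.93, 0.97, 1.01, 1.06))
```

The between-imputation term widens the interval when the imputed datasets disagree, so the tipping point reflects both sampling and imputation uncertainty.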
While there is no objective measure to assess the robustness of the result, the average KM curves give a visual representation of how optimistic the assumptions of the tipping point approaches are on the control arm.
Interpretation of the curves:
summary(tp_random)
#> HR CONFINT METHOD ARMIMP TIPPT TIPUNIT
#> 1 0.7744 (0.5925-1.0122) random sampling docetaxel 45 best percentile
#> DESC
#> 1 Tipping point (upper CL ≥ 1) reached when imputing docetaxel arm at 45 best percentile.
summary(tp_deterministic)
#> HR CONFINT METHOD ARMIMP TIPPT
#> 1 0.7781 (0.5983-1.0119) deterministic sampling docetaxel 4
#> TIPUNIT
#> 1 number of subjects extended censoring
#> DESC
#> 1 Tipping point (upper CL ≥ 1) reached when imputing docetaxel arm at 4 number of subjects extended censoring.

The model-free approaches suggest that the results tip if at least 4 of the 21 (19%) early-dropout patients in the control arm are considered event-free at the data cut-off, or if all early dropouts in that arm are imputed by sampling from the best 45% of event times (both arms combined).
assess_plausibility(tp_random)
#> → Clinical plausibility assessment (Oodally et al, 2025):
#>
#> At the tipping point, the median duration of follow-up in imputed set in docetaxel arm was 8.37, compared to 4.24 in the sotorasib arm.
#>
#> Please carefully assess the clinical plausibility in light of the imputation method and study context.
assess_plausibility(tp_deterministic)
#> → Clinical plausibility assessment (Oodally et al, 2025):
#>
#> At the tipping point, the median duration of follow-up in imputed set in docetaxel arm was 0.74, compared to 4.24 in the sotorasib arm.
#>
#> Please carefully assess the clinical plausibility in light of the imputation method and study context.

Model-based tipping point analysis uses parametric survival models to generate event times for censored patients. This approach allows systematic hazard inflation or deflation, known as the delta-adjustment method (Lipkovich et al., 2016).
Compared to the steps described in the table above for model-free tipping point analysis, only steps 2 and 3 differ:
For the treatment arm, define the hazard inflation \(\delta>1\) applied to the patients’ follow-up time \(t>c\) after censoring. For the control arm, define the hazard deflation \(\delta<1\) applied to the patients’ follow-up time \(t>c\) after censoring. \[\hat{h}(t|t > c) = \delta\hat{h}(t)\]
Fitting parametric model
For a Weibull model with shape parameter \(p\) and scale parameter \(\lambda\), the hazard, cumulative hazard, survival and cumulative distribution functions are \[h(t) = \lambda p (\lambda t)^{p-1}, \quad H(t) = (\lambda t)^p, \quad S(t) = \exp(-(\lambda t)^p), \quad F(t)=1-\exp(-(\lambda t)^p)\] A Weibull model is fitted to the treatment arm patients to obtain empirical estimates of \(p\) and \(\lambda\). The goal is to impute event times for a subset of censored patients based on the Weibull model using inverse probability sampling.
Impute event time under a Weibull model
Suppose a patient is censored at time \(c\) and we wish to impute the event time for this patient. We evaluate the cumulative distribution function at the censoring time, \(F(c)\). Using inverse probability sampling, we generate \(u_d\) from a uniform distribution \(U(F(c), 1)\) and solve for \(d\) in the equation \[u_d = F(d) = 1-\exp(-(\lambda d)^p)\] The solution \(d\) is then an imputed event time based on the cumulative distribution function \(F(d)\) from a Weibull model. For each imputation, the Weibull model parameters \(p\) and \(\lambda\) are re-sampled from a multivariate normal distribution with their maximum likelihood estimates and covariance matrix.
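Solving the equation above for \(d\) gives \(d = (-\log(1-u_d))^{1/p} / \lambda\). The base R sketch below applies this closed form. The helper `impute_weibull` and the parameter values are hypothetical, and the re-sampling of \(p\) and \(\lambda\) for each imputation is left out for brevity.

```r
# Hypothetical helper: draw n event times after censoring time c_time from a
# Weibull model with shape p and scale parameter lambda, where
# F(t) = 1 - exp(-(lambda * t)^p).
impute_weibull <- function(c_time, p, lambda, n = 1) {
  F_c <- 1 - exp(-(lambda * c_time)^p)   # CDF at the censoring time
  u <- runif(n, min = F_c, max = 1)      # u_d ~ U(F(c), 1)
  (-log(1 - u))^(1 / p) / lambda         # closed-form solution of u_d = F(d)
}

set.seed(123)
# Made-up parameters: censored at 4.2 months, shape 1.3, lambda 0.08
d_imp <- impute_weibull(c_time = 4.2, p = 1.3, lambda = 0.08, n = 1000)
```

Because \(u_d\) is drawn from \(U(F(c), 1)\) rather than \(U(0, 1)\), every imputed time falls at or after the censoring time.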
Impute event time with inflated hazard
To inflate the hazard after time \(c\), we assume a two-piece Weibull model. In the first piece, the original hazard applies until time \(c\). In the second piece, for \(t\geq c\), the hazard function \(h(t)\) is inflated by the factor \(\delta\) introduced above to become \(h_i(t) = \delta h(t)\). This inflation is equivalent to inflating on the cumulative hazard scale via \[H_i(t) = H(c) + \delta (H(t) - H(c))\] The imputed event time \(d\) is found by solving \(u_d = 1-\exp(-H_i(d))\) for \(d\).
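For the Weibull cumulative hazard \(H(t) = (\lambda t)^p\), this equation also has a closed-form solution: \(H(d) = H(c) + (-\log(1-u_d) - H(c)) / k\), where \(k\) is the inflation factor, and then \(d = H(d)^{1/p} / \lambda\). The sketch below implements this. The helper `impute_weibull_inflated`, its argument names and the parameter values are hypothetical.

```r
# Hypothetical helper: impute an event time after censoring time c_time when
# the post-censoring hazard is multiplied by `inflation` (> 1 inflates,
# < 1 deflates). `u` defaults to one draw from U(F(c), 1).
impute_weibull_inflated <- function(c_time, p, lambda, inflation,
                                    u = runif(1, 1 - exp(-(lambda * c_time)^p), 1)) {
  H_c <- (lambda * c_time)^p              # cumulative hazard at censoring
  target <- -log(1 - u)                   # value that H_i(d) must equal
  H_d <- H_c + (target - H_c) / inflation # invert the two-piece cumulative hazard
  H_d^(1 / p) / lambda                    # invert H(t) = (lambda * t)^p
}

# Same uniform draw under no adjustment, inflation and deflation (made-up values)
d_plain    <- impute_weibull_inflated(4.2, p = 1.3, lambda = 0.08, inflation = 1,   u = 0.6)
d_inflated <- impute_weibull_inflated(4.2, p = 1.3, lambda = 0.08, inflation = 2,   u = 0.6)
d_deflated <- impute_weibull_inflated(4.2, p = 1.3, lambda = 0.08, inflation = 0.4, u = 0.6)
```

With an inflation factor of 1 the result reduces to the unadjusted Weibull imputation. Factors above 1 pull the imputed time towards the censoring time, and factors below 1 push it later.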
tp_model_based <- tipping_point_model_based(
  dat = codebreak200,
  reason = "Early dropout",               # censoring reason to impute
  impute = "docetaxel",                   # arm whose censored patients are imputed
  imputation_model = "weibull",
  J = 100,                                # number of imputations per tipping parameter
  tipping_range = seq(0.1, 1, by = 0.1),  # hazard deflation factors
  cox_fit = cox1,
  seed = 12345
)

The model-based approach tips the results when a hazard deflation of at least 60% is applied in imputing event times for early-dropout patients in the control arm.
assess_plausibility(tp_model_based)
#> → Clinical plausibility assessment (Oodally et al, 2025):
#>
#> At the tipping point, the HR between imputed set in docetaxel arm and sotorasib arm was approximately 0.9.
#>
#> Please carefully assess the clinical plausibility in light of the imputation method and study context.

To assess the clinical plausibility of the model-based tipping point using the trial data, note that a 60% hazard deflation translates to an HR of 0.4 between early dropouts in the control arm and control-arm patients who did not drop out. This in turn translates to an HR of 0.4 / 0.67 ≈ 0.6 between the early-dropout patients in the control arm and the sotorasib arm, which seems unlikely given the limited treatment options these patients have.
The Jump-to-Reference (J2R) method assumes that, upon censoring, subjects in the treatment arm behave as if they had switched to the reference (usually control) arm. In a time-to-event framework, this corresponds to replacing the post-censoring hazard for treatment-arm dropouts with the hazard estimated from the reference arm.
Importantly, J2R is a special case of hazard inflation, where the inflation factor is chosen to exactly match the reference arm hazard. Assuming a Cox model, the hazard inflation needed to convert treatment-arm hazard into reference-arm hazard is:
\[ \text{hazard inflation} = \frac{1}{\widehat{HR}} \]
Thus, J2R can be implemented in the tipping-point framework by setting the tipping range to \(1/\widehat{HR}\), meaning the hazard is inflated until treatment arm dropouts “jump” to the reference hazard.
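As a small worked example of this formula, the snippet below computes the J2R inflation factor for the hazard ratio of 0.7929426 reported for the example later in this section. The variable names are illustrative only.

```r
# Hazard ratio (treatment vs reference) from the original Cox model
hr_original <- 0.7929426

# Inflation factor that maps the treatment-arm hazard onto the reference hazard
j2r_inflation <- 1 / hr_original
round(j2r_inflation, 3)
#> [1] 1.261

# Check: inflating the treatment-arm hazard by this factor gives an HR of 1
hr_after_jump <- hr_original * j2r_inflation
```

A single value of about 1.26 would therefore be used as the tipping range, rather than a sequence of values.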
Below we illustrate this using a Weibull imputation model and the treatment-arm dropout reason “Lost to follow-up”.
| SUBJID | TRT01P | AVAL | EVENT | CNSRRS | MAXAVAL |
|---|---|---|---|---|---|
| 1 | neratinib | 1.29 | 0 | Lost to follow-up | 54.28 |
| 2 | neratinib | 1.29 | 0 | Lost to follow-up | 54.28 |
| 3 | neratinib | 1.29 | 0 | Lost to follow-up | 54.28 |
| 4 | neratinib | 1.29 | 0 | Lost to follow-up | 54.28 |
| 5 | neratinib | 1.29 | 0 | Lost to follow-up | 54.28 |
cox2 <- coxph(Surv(AVAL, EVENT) ~ TRT01P, data = extenet)
summary_cox2 <- summary(cox2)
## Extract HR for treatment vs reference
orig_HR <- summary_cox2$coefficients[, "exp(coef)"]
orig_HR
#> [1] 0.7929426
J2R implemented as hazard inflation = 1 / HR
We set the parameter tipping_range to the J2R inflation factor
to produce the J2R scenario.
The output below shows the HR estimated in the J2R scenario. The hazard ratio changed from 0.793 in the original analysis to 0.807 in the jump-to-reference imputation.