The clinpubr package provides a comprehensive suite for
regression analysis in clinical research, supporting Cox proportional
hazards, logistic, and linear regression with publication-ready
visualizations. This vignette covers the typical analysis workflow:
The clinpubr package supports frequently used regression
models: Cox Proportional Hazards, Logistic
Regression, Linear Regression, and provides
unified interface for them. The model used would be infered based on the
settings of the y and time arguments. If
y is binary and time is given, the model would
be Cox Proportional Hazards; if y is a binary variable, and
time not given, the model would be Logistic Regression; if
y is a continuous variable, the model would be Linear
Regression.
We’ll use the NCCTG Lung Cancer dataset (228 patients with advanced lung cancer):
data(cancer, package = "survival")
cancer$dead <- cancer$status == 2
cancer_clean <- cancer %>%
mutate(
sex = factor(sex, labels = c("Male", "Female"))
) %>%
na.omit()
knitr::kable(data.frame(
Rows = nrow(cancer_clean),
Columns = ncol(cancer_clean),
Description = "Complete cases for analysis"
), caption = "Analysis Dataset Summary")| Rows | Columns | Description |
|---|---|---|
| 167 | 11 | Complete cases for analysis |
| inst | time | status | age | sex | ph.ecog | ph.karno | pat.karno | meal.cal | wt.loss | dead | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 3 | 455 | 2 | 68 | Male | 0 | 90 | 90 | 1225 | 15 | TRUE |
| 4 | 5 | 210 | 2 | 57 | Male | 1 | 90 | 60 | 1150 | 11 | TRUE |
| 6 | 12 | 1022 | 1 | 74 | Male | 1 | 50 | 80 | 513 | 0 | FALSE |
| 7 | 7 | 310 | 2 | 68 | Female | 2 | 70 | 60 | 384 | 10 | TRUE |
| 8 | 11 | 361 | 2 | 71 | Female | 2 | 60 | 80 | 538 | 1 | TRUE |
| 9 | 1 | 218 | 2 | 53 | Male | 1 | 70 | 80 | 825 | 16 | TRUE |
regression_scan() automatically tests multiple variable
transformations for each predictor:
cox_scan <- regression_scan(
data = cancer_clean,
y = "status",
time = "time",
# set predictors = NULL to scan all available variables
# predictors = NULL,
predictors = c("age", "sex", "ph.karno", "pat.karno", "wt.loss", "meal.cal"),
covars = NULL,
save_table = FALSE
)
knitr::kable(cox_scan, caption = "Cox Regression Scan Results")| predictor | nvalid | original.HR | original.pval | original.padj | logarithm.HR | logarithm.pval | logarithm.padj | categorized.HR | categorized.pval | categorized.padj | rcs.overall.pval | rcs.overall.padj | rcs.nonlinear.p | best.var.trans | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | pat.karno | 167 | 0.9812708 | 0.0021995 | 0.0131971 | 0.2933451 | 0.0024286 | 0.0097142 | 0.5831016 | 0.0038615 | 0.0193076 | 0.0210379 | 0.0698049 | 0.9680713 | original |
| 2 | sex | 167 | 0.6192558 | 0.0147907 | 0.0443721 | NA | NA | NA | NA | NA | NA | NA | NA | NA | original |
| 1 | age | 167 | 1.0200874 | 0.0641959 | 0.1029532 | 3.2622315 | 0.0688278 | 0.1275780 | 1.3335186 | 0.1177009 | 0.1471261 | 0.1464013 | 0.1830016 | 0.4447903 | original |
| 3 | ph.karno | 167 | 0.9878729 | 0.0686354 | 0.1029532 | 0.4378243 | 0.0956835 | 0.1275780 | 0.6274233 | 0.0184766 | 0.0461915 | 0.0279220 | 0.0698049 | 0.0803601 | categorized |
| 6 | meal.cal | 167 | 0.9998804 | 0.6209701 | 0.7451641 | 0.8967025 | 0.5548170 | 0.5548170 | 0.8640874 | 0.4291805 | 0.4291805 | 0.7222351 | 0.7222351 | 0.6013624 | categorized |
| 5 | wt.loss | 167 | 1.0001532 | 0.9817372 | 0.9817372 | NA | NA | NA | 1.4330287 | 0.0527764 | 0.0879607 | 0.1053526 | 0.1755877 | 0.0466637 | categorized |
The scan reports the effect size, p-value, adjusted p-value (BH correction by default), and the best transformation.
logistic_scan <- regression_scan(
data = cancer_clean,
y = "dead",
predictors = c("age", "sex", "ph.karno", "pat.karno", "wt.loss"),
covars = NULL,
save_table = FALSE
)
knitr::kable(logistic_scan, caption = "Logistic Regression Scan Results")| predictor | nvalid | original.OR | original.pval | original.padj | logarithm.OR | logarithm.pval | logarithm.padj | categorized.OR | categorized.pval | categorized.padj | rcs.overall.pval | rcs.overall.padj | rcs.nonlinear.p | best.var.trans | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | sex | 167 | 0.3742964 | 0.0053682 | 0.0268408 | NA | NA | NA | NA | NA | NA | NA | NA | NA | original |
| 4 | pat.karno | 167 | 0.9701901 | 0.0184097 | 0.0460243 | 0.1480629 | 0.0465029 | 0.0465029 | 0.4427438 | 0.0373640 | 0.0498187 | 0.0689533 | 0.1457501 | 0.5271516 | original |
| 3 | ph.karno | 167 | 0.9703678 | 0.0396626 | 0.0528173 | 0.0984975 | 0.0461388 | 0.0465029 | 0.3157895 | 0.0161386 | 0.0475296 | 0.0995735 | 0.1457501 | 0.2382474 | categorized |
| 1 | age | 167 | 1.0389903 | 0.0422538 | 0.0528173 | 10.2713218 | 0.0335189 | 0.0465029 | 1.7416268 | 0.1120568 | 0.1120568 | 0.1457501 | 0.1457501 | 0.5010354 | logarithm |
| 5 | wt.loss | 167 | 1.0084347 | 0.5300552 | 0.5300552 | NA | NA | NA | 2.2308546 | 0.0237648 | 0.0475296 | 0.1176075 | 0.1457501 | 0.0679239 | categorized |
In clinical publications, it’s standard to present multiple model specifications (Crude, Partially adjusted, Fully adjusted) to show how confounding affects the estimated association.
cox_results <- regression_basic_results(
data = cancer_clean,
x = "age",
y = "status",
time = "time",
model_covs = list(
Crude = c(),
Model1 = c("sex"),
Model2 = c("sex", "ph.karno", "pat.karno")
),
pers = c(1, 5, 10),
save_output = FALSE
)
knitr::kable(cox_results$table, caption = "Cox Regression Results for Age")| Terms | Count | Crude | Crude | Model1 | Model1 | Model2 | Model2 |
|---|---|---|---|---|---|---|---|
| age (All) | 167 | HR | P | HR | P | HR | P |
| Continuous | NA | 1.02(1.00,1.04) | 0.064 | 1.02(1.00,1.04) | 0.109 | 1.01(0.99,1.03) | 0.352 |
| Continuous, per 1 | NA | 1.02(1.00,1.04) | 0.064 | 1.02(1.00,1.04) | 0.109 | 1.01(0.99,1.03) | 0.352 |
| Continuous, per 5 | NA | 1.10(0.99,1.23) | 0.064 | 1.09(0.98,1.21) | 0.109 | 1.06(0.94,1.18) | 0.352 |
| Continuous, per 10 | NA | 1.22(0.99,1.51) | 0.064 | 1.19(0.96,1.47) | 0.109 | 1.11(0.89,1.40) | 0.352 |
| Continuous, per 1 SD | NA | 1.20(0.99,1.46) | 0.064 | 1.17(0.97,1.43) | 0.109 | 1.10(0.90,1.36) | 0.352 |
| Continuous, logarithm | NA | 3.26(0.91,11.66) | 0.069 | 2.79(0.78,10.06) | 0.116 | 1.89(0.49,7.34) | 0.359 |
| Grouped by Quartiles | NA | NA | NA | NA | NA | NA | NA |
| Q1 | 41 | 1 (Reference) | NA | 1 (Reference) | NA | 1 (Reference) | NA |
| Q2 | 42 | 0.79(0.46,1.34) | 0.373 | 0.72(0.42,1.23) | 0.228 | 0.71(0.42,1.22) | 0.215 |
| Q3 | 40 | 1.03(0.61,1.73) | 0.905 | 0.98(0.58,1.65) | 0.941 | 0.87(0.51,1.49) | 0.613 |
| Q4 | 44 | 1.34(0.81,2.20) | 0.256 | 1.23(0.75,2.04) | 0.414 | 1.08(0.64,1.83) | 0.775 |
| P for trend | NA | NA | 0.144 | NA | 0.208 | NA | 0.561 |
| Grouped by Median Value | NA | NA | NA | NA | NA | NA | NA |
| Low | 83 | 1 (Reference) | NA | 1 (Reference) | NA | 1 (Reference) | NA |
| High | 84 | 1.33(0.93,1.91) | 0.118 | 1.31(0.92,1.88) | 0.138 | 1.16(0.79,1.71) | 0.437 |
logistic_results <- regression_basic_results(
data = cancer_clean,
x = "ph.karno",
y = "dead",
model_covs = list(
Crude = c(),
Adjusted = c("age", "sex")
),
factor_breaks = c(80),
factor_labels = c("<80", ">=80"),
save_output = FALSE
)
knitr::kable(logistic_results$table, caption = "Logistic Regression Results for Performance Status")| Terms | Count | Crude | Crude | Adjusted | Adjusted |
|---|---|---|---|---|---|
| ph.karno (All) | 167 | OR | P | OR | P |
| Continuous | NA | 0.97(0.94,1.00) | 0.040 | 0.97(0.94,1.00) | 0.085 |
| Continuous, per 0.1 | NA | 1.00(0.99,1.00) | 0.040 | 1.00(0.99,1.00) | 0.085 |
| Continuous, per 10 | NA | 0.74(0.55,0.98) | 0.040 | 0.76(0.55,1.03) | 0.085 |
| Continuous, per 100 | NA | 0.05(0.003,0.80) | 0.040 | 0.07(0.003,1.36) | 0.085 |
| Continuous, per 1 SD | NA | 0.68(0.47,0.97) | 0.040 | 0.71(0.47,1.04) | 0.085 |
| Continuous, logarithm | NA | 0.10(0.009,0.87) | 0.046 | 0.12(0.009,1.34) | 0.097 |
| Grouped by Quartiles | NA | NA | NA | NA | NA |
| Q1 | 20 | 1 (Reference) | NA | 1 (Reference) | NA |
| Q2 | 24 | 2.75(0.48,21.64) | 0.275 | 2.58(0.43,20.67) | 0.314 |
| Q3 | 47 | 0.53(0.14, 1.76) | 0.326 | 0.56(0.14, 1.99) | 0.393 |
| Q4 | 76 | 0.48(0.13, 1.47) | 0.229 | 0.50(0.12, 1.69) | 0.291 |
| P for trend | NA | NA | 0.039 | NA | 0.077 |
| Grouped by Median Value | NA | NA | NA | NA | NA |
| Low | 44 | 1 (Reference) | NA | 1 (Reference) | NA |
| High | 123 | 0.32(0.11,0.76) | 0.016 | 0.34(0.12,0.88) | 0.035 |
| Grouped by Clinical Value | NA | NA | NA | NA | NA |
| <80 | 44 | 1 (Reference) | NA | 1 (Reference) | NA |
| >=80 | 123 | 0.32(0.11,0.76) | 0.016 | 0.34(0.12,0.88) | 0.035 |
When clinical cut-points aren’t established, use tertiles for balanced group comparison:
custom_results <- regression_basic_results(
data = cancer_clean,
x = "age",
y = "status",
time = "time",
quantile_breaks = c(0.33, 0.67),
quantile_labels = c("Youngest Third", "Middle Third", "Oldest Third"),
label_with_range = TRUE,
save_output = FALSE
)
knitr::kable(custom_results$table, caption = "Cox Regression Results - Age Tertiles")| Terms | Count | Crude | Crude |
|---|---|---|---|
| age (All) | 167 | HR | P |
| Continuous | NA | 1.02(1.00,1.04) | 0.064 |
| Continuous, per 0.1 | NA | 1.00(1.00,1.00) | 0.064 |
| Continuous, per 10 | NA | 1.22(0.99,1.51) | 0.064 |
| Continuous, per 100 | NA | 7.31(0.89,60.04) | 0.064 |
| Continuous, per 1 SD | NA | 1.20(0.99,1.46) | 0.064 |
| Continuous, logarithm | NA | 3.26(0.91,11.66) | 0.069 |
| Grouped by Quartiles | NA | NA | NA |
| Q1 [39,57) | 41 | 1 (Reference) | NA |
| Q2 [57,64) | 42 | 0.79(0.46,1.34) | 0.373 |
| Q3 [64,70) | 40 | 1.03(0.61,1.73) | 0.905 |
| Q4 [70,82] | 44 | 1.34(0.81,2.20) | 0.256 |
| P for trend | NA | NA | 0.144 |
| Grouped by Median Value | NA | NA | NA |
| Low [39,64) | 83 | 1 (Reference) | NA |
| High [64,82] | 84 | 1.33(0.93,1.91) | 0.118 |
| Values | NA | NA | NA |
| Youngest Third [39,58.8) | 55 | 1 (Reference) | NA |
| Middle Third [58.8,68) | 54 | 0.92(0.58,1.46) | 0.721 |
| Oldest Third [68,82] | 58 | 1.37(0.89,2.13) | 0.154 |
| P for trend | NA | NA | 0.145 |
uni_forest <- regression_forest(
data = cancer_clean,
model_vars = c("age", "ph.karno", "pat.karno", "wt.loss"),
y = "status",
time = "time",
as_univariate = TRUE, # Generate univariate forest plot
save_plot = FALSE
)
uni_forestThe univariate forest plot displays:
In this example, pat.karno shows a significant
protective effect (HR ≈ 0.98, p < 0.01), while age,
ph.karno, and wt.loss do not reach statistical
significance in univariate analysis.
multi_forest <- regression_forest(
data = cancer_clean,
model_vars = list(
Model1 = c("age"),
Model2 = c("age", "ph.karno"),
Model3 = c("age", "ph.karno", "pat.karno")
),
y = "status",
time = "time",
as_univariate = FALSE, # Generate multivariate forest plot
save_plot = FALSE
)
multi_forestThe multivariate forest plot compares different model specifications:
age (crude
model)age and
ph.karno (partially adjusted)age,
ph.karno, and pat.karno (fully adjusted)Key observations: - As more covariates are added, the effect estimate
for age changes slightly - The confidence intervals may
widen or narrow depending on the correlation between predictors - This
visualization helps assess confounding effects and model stability
custom_forest <- regression_forest(
data = cancer_clean,
model_vars = c("age", "ph.karno", "pat.karno", "wt.loss", "meal.cal"),
y = "status",
time = "time",
as_univariate = TRUE,
show_vars = c("age", "ph.karno", "pat.karno"),
est_nsmall = 2,
p_nsmall = 4,
save_plot = FALSE
)
custom_forestThe customized forest plot demonstrates several formatting options:
show_vars: Limits displayed variables
to age, ph.karno, and
pat.karnoest_nsmall = 2: Displays effect
estimates with 2 decimal placesp_nsmall = 4: Displays p-values with 4
decimal placesThis allows focused presentation of key predictors with precise numerical formatting suitable for publication.
A unified interface for both continuous and categorical predictors:
effect_linear <- predictor_effect_plot(
data = cancer_clean,
x = "age",
y = "status",
time = "time",
covars = c("sex"),
method = "linear",
add_hist = TRUE,
save_plot = FALSE
)
effect_linearThe linear predictor effect plot shows:
This plot assumes a linear relationship between the predictor and outcome, useful for confirming linearity assumptions or presenting simple linear effects.
effect_cat <- predictor_effect_plot(
data = cancer_clean,
x = "sex",
y = "status",
time = "time",
covars = c("age", "ph.karno"),
method = "categorical",
save_plot = FALSE
)
effect_catThe categorical predictor effect plot displays:
In this example, females show a significantly lower hazard compared to males (HR ≈ 0.62, 95% CI does not include 1.0), indicating better survival outcomes for female patients after adjusting for age and performance status.
RCS plots visualize non-linear relationships while preserving the
continuous nature of the data, the plot can also be generated using
predictor_effect_plot() with
method = "rcs":
rcs_age <- rcs_plot(
data = cancer_clean,
x = "age",
y = "status",
time = "time",
covars = c("sex", "ph.karno"),
knot = 4,
ref = "x_median",
add_hist = TRUE,
save_plot = FALSE
)
rcs_ageThe RCS (Restricted Cubic Spline) plot visualizes the non-linear relationship between age and survival, the element interpretation is similar to the linear plot. There is one additional element:
Key features: - The spline with 4 knots (knot = 4)
allows flexible modeling of non-linear relationships - Reference point
(ref = "x_median") is set at the median age (HR = 1.0) -
The plot shows how the hazard changes relative to the median age
custom_rcs <- rcs_plot(
data = cancer_clean,
x = "wt.loss",
y = "status",
time = "time",
covars = c("age", "sex"),
knot = 5,
ref = 0,
add_hist = FALSE,
trans = "log10",
y_lim = c(0.5, 5),
save_plot = FALSE
)
custom_rcsThe customized RCS plot demonstrates advanced options:
knot = 5: Uses 5 knots for more
flexible curve fittingref = 0: Sets reference point at
weight loss = 0 (no weight loss)trans = "log10": Applies log10
transformation to the y-axis. To enable log-scale y-axis,
add_hist must be FALSEy_lim = c(0.5, 5): Custom y-axis
limits for better visualizationadd_hist = FALSE: Excludes the
histogram for cleaner presentationThis plot examines the relationship between weight loss and survival, with the reference at zero weight loss. The log-scale y-axis helps visualize effect sizes across a wide range.
Interaction analysis is essential in clinical research to assess whether the effect of a predictor variable on the outcome differs across subgroups defined by another variable (effect modification). In medical papers, this is commonly referred to as subgroup analysis with interaction testing.
The interaction_scan() function systematically evaluates
potential interactions between multiple predictors and grouping
variables:
linear.p.int):
Tests whether the linear effect of a predictor differs across
subgroupsrcs.p.int): Tests for
non-linear interactions using restricted cubic splines (4 knots)nvalid): Reports the
number of complete cases for each interaction testClinical interpretation: A significant interaction p-value (typically < 0.05) indicates that the effect of the predictor on the outcome is modified by the grouping variable, suggesting that treatment effects or risk associations may differ between subgroups.
interaction_scan <- interaction_scan(
data = cancer_clean,
y = "status",
time = "time",
predictors = c("age", "ph.karno", "pat.karno"),
group_vars = c("sex", "ph.karno"), # continuous variables will be split by median into subgroups
save_table = FALSE
)
knitr::kable(interaction_scan, caption = "Interaction Scan Results")| predictor | group.by | nvalid | linear.p.int | rcs.p.int | linear.p.adj | rcs.p.adj | |
|---|---|---|---|---|---|---|---|
| 2 | age | ph.karno | 167 | 0.0972655 | 0.0117594 | 0.3101036 | 0.0235189 |
| 3 | ph.karno | sex | 167 | 0.1240414 | NA | 0.3101036 | NA |
| 5 | pat.karno | ph.karno | 167 | 0.5731909 | NA | 0.7485232 | NA |
| 1 | age | sex | 167 | 0.6276238 | 0.9342510 | 0.7485232 | 0.9342510 |
| 4 | pat.karno | sex | 167 | 0.7485232 | NA | 0.7485232 | NA |
Understanding the output columns:
| Column | Description |
|---|---|
predictor |
The continuous or categorical variable being tested for effect modification |
group.by |
The subgroup variable (e.g., sex) that may modify the predictor’s effect |
nvalid |
Number of complete observations used in the interaction test |
linear.p.int |
Raw p-value for linear interaction effect |
linear.p.adj |
BH-adjusted p-value for linear interaction |
rcs.p.int |
Raw p-value for non-linear (RCS) interaction effect |
rcs.p.adj |
BH-adjusted p-value for RCS interaction |
interaction_age_sex <- interaction_plot(
data = cancer_clean,
predictor = "age",
y = "status",
time = "time",
group_var = "sex",
covars = c("ph.karno"),
save_plot = FALSE
)
interaction_age_sex
#> $lin#>
#> $rcs
The interaction plot displays two panels:
Linear Interaction Plot ($lin): - Shows
the effect of age on survival separately for males and females -
Different colored lines represent different sex groups - Parallel lines
suggest no interaction; diverging/converging lines suggest interaction -
Shaded areas represent 95% confidence intervals
RCS Interaction Plot ($rcs): - Shows
non-linear effects of age within each sex group - Allows detection of
non-linear interactions - Each curve represents the HR relative to the
reference point for that sex
In this example, the lines appear roughly parallel, suggesting no significant interaction between age and sex (confirmed by p-value > 0.05).
subgroup_result <- subgroup_forest(
data = cancer_clean,
x = "age",
y = "status",
time = "time",
subgroup_vars = c("sex", "pat.karno"),
covars = c("ph.karno"),
save_plot = FALSE
)
subgroup_resultThe subgroup forest plot displays the effect of age on survival stratified by groups:
Each row shows: - Sample size for that subgroup - Effect estimate with 95% confidence interval - P-value of interaction
This visualization helps assess: - Whether the effect is consistent across subgroups - Potential effect modification by the stratifying variable
| Function | Purpose |
|---|---|
regression_scan() |
Automated screening of predictors with multiple transformations |
regression_basic_results() |
Comprehensive results with various model specifications |
regression_forest() |
Publication-ready forest plots |
rcs_plot() / predictor_effect_plot() |
Flexible visualization of predictor effects |
interaction_scan() /
interaction_plot() |
Interaction analysis and visualization |
subgroup_forest() |
Subgroup analysis with forest plots |