Conditional Inference Trees in R for Clinical Risk Stratification

Conditional inference trees (ctrees) implement recursive partitioning using permutation‑based hypothesis tests rather than impurity‑based split criteria. At each node, the algorithm tests global independence between the response and each predictor, selects the variable with the smallest p‑value, and then determines the optimal split via a secondary test. This framework removes the variable‑selection bias inherent in CART, yields asymptotically unbiased split decisions, and provides an explicit statistical stopping rule.
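The variable-selection step described above can be illustrated with a small base R sketch. It is a simplified stand-in rather than partykit's implementation: the data are simulated, `perm_pvalue` is a hypothetical helper, and the association measure is a plain correlation evaluated by Monte Carlo permutation. By default, `ctree()` instead relies on the asymptotic conditional distribution of its test statistics with a Bonferroni adjustment.

```r
# Simplified illustration of variable selection at a single node.
# Simulated data; not the brain tumor data set.
set.seed(1)
n  <- 200
x1 <- rnorm(n)            # informative predictor
x2 <- rnorm(n)            # pure-noise predictor
y  <- 2 * x1 + rnorm(n)   # response depends on x1 only

# Monte Carlo permutation p-value for association between x and y:
# shuffle y to break any dependence, then compare |cor| to the observed value.
perm_pvalue <- function(x, y, n_perm = 999) {
  observed <- abs(cor(x, y))
  permuted <- replicate(n_perm, abs(cor(x, sample(y))))
  (1 + sum(permuted >= observed)) / (n_perm + 1)
}

p_values   <- c(x1 = perm_pvalue(x1, y), x2 = perm_pvalue(x2, y))
p_adjusted <- pmin(1, p_values * length(p_values))  # Bonferroni adjustment

selected   <- names(which.min(p_adjusted))  # candidate splitting variable
split_node <- min(p_adjusted) <= 0.05       # stopping rule: split only if significant

print(p_adjusted)
print(selected)
```

If no adjusted p-value falls below the significance level, the node is left unsplit; this is the explicit stopping rule that replaces post hoc pruning.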

Conditional inference trees are well‑suited for clinical decision‑support because they generate explicit subgroup definitions. Each split corresponds to a statistically validated change in outcome distribution, producing partitions that can be interpreted as clinically meaningful risk strata. This is particularly valuable in heterogeneous patient populations, where interactions and nonlinear effects often obscure subgroup‑specific risks in parametric models. By structuring these effects into a transparent sequence of conditional tests, conditional inference trees provide a principled mechanism for identifying clinically relevant subgroups and supporting individualized prognosis.

In this post, I demonstrate how to fit a conditional inference tree in R using survival data and how to interpret the resulting tree plot in the context of clinical decision support for risk subgroup stratification.

Conditional Inference Trees in R

What risk factors impact the survival probability of a patient with primary brain tumor? Do patients’ sex, tumor location, gross tumor volume, tumor type, patient functional status index (Karnofsky index), and tumor treatment method have a significant relationship with survival probability?

To address this research question, a team of doctors and health researchers (Masaryk Memorial Cancer Institute Brno) collected data from 88 primary brain tumor patients on their sex (male, female), gross tumor volume (GTV), tumor diagnosis (meningioma, low-grade [LG] glioma, high-grade [HG] glioma, others), the location of the tumor in the brain (infratentorial or supratentorial), Karnofsky index (a functional status score ranging from 100, excellent health, down to 0, very poor health), and treatment method: SRS (stereotactic radiosurgery) or SRT (stereotactic radiotherapy). The event of interest in this survival analysis was the death of the patient, recorded in the status variable (1 = dead, 0 = censored). Time to event is recorded in the time variable (in months). Table 1 shows the data for five patients in this study.

Table 1: Sample of the primary brain tumor survival data (five patients)
Patient	Sex	Diagnosis	Location	Karnofsky Index	GTV	Treatment Method	Status	Time (m)
Patient 1	Female	Meningioma	Infratentorial	90	6.11	SRS	0	57.64
Patient 2	Male	HG glioma	Supratentorial	90	19.35	SRT	1	8.98
Patient 3	Female	Meningioma	Infratentorial	70	7.95	SRS	0	26.46
Patient 4	Female	LG glioma	Supratentorial	80	7.61	SRT	1	47.8
Patient 5	Male	HG glioma	Supratentorial	90	5.06	SRS	1	6.3
…	…	…	…	…	…	…	…	…

The complete data set for this example can be downloaded from here. The data is also available in the supplemental file of the published paper.

Listing 1 shows the R code to fit a conditional inference tree to the example survival data. The printed model summary (the selected splitting variables, plus the median survival time and sample size of each terminal node) follows the code in Listing 1, and the tree plot is shown in Figure 1.

Listing 1: R code to run conditional inference trees.

# Conditional Inference Tree 

# ------------------------------
# Load libraries
# ------------------------------

library(partykit) # for conditional trees
library(strucchange) # for extracting test statistics and p-values
library(survival)
library(dplyr)

# ------------------------------
# Load data
# ------------------------------

dfBrainTumor <- read.csv("df_brain_tumor.csv")


# Missing values are coded as -1: recode them to NA in every column,
# then drop any patient with an incomplete record

dfBrainTumor <- dfBrainTumor %>% mutate(across(everything(), ~ replace(.x, .x == -1, NA)))
dfBrainTumor <- na.omit(dfBrainTumor)


# Set reference level for categorical variables
refLevel <- c(sex = "Male",
              diagnosis = "Other",
              location = "Supratentorial",
              treatment = "SRT")

for (col in names(refLevel)) {
  dfBrainTumor[[col]] <- relevel(factor(dfBrainTumor[[col]]), ref = refLevel[[col]])
}


# ------------------------------
# Conditional Inference Tree
# ------------------------------
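# Fit a survival tree: the response is a Surv object (time, event indicator)
ctBrain <- ctree(Surv(time, status) ~ sex + diagnosis + location + KI + GTV + treatment,
                 data = dfBrainTumor)

plot(ctBrain)   # tree plot with a Kaplan-Meier curve in each terminal node
print(ctBrain)  # text summary: splits, median survival, and n per terminal node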

Conditional inference trees results.

# Output

Model formula:
Surv(time, status) ~ sex + diagnosis + location + KI + GTV + 
    treatment

Fitted party:
[1] root
|   [2] diagnosis in Other, LG glioma, Meningioma
|   |   [3] KI <= 80: 47.800 (n = 42)
|   |   [4] KI > 80: Inf (n = 23)
|   [5] diagnosis in HG glioma: 11.020 (n = 22)

Number of inner nodes:    2
Number of terminal nodes: 3

The model output in Listing 1 shows which variables the conditional inference tree used to split the data into risk subgroups. Only diagnosis and KI were selected as splitting variables; sex, location, GTV, and treatment were not. Diagnosis is the variable most strongly associated with survival: the initial split separates patients with high‑grade glioma (HG glioma) from all other diagnostic categories. The HG glioma subgroup forms its own terminal node with a markedly reduced median survival of 11.0 months (n = 22), consistent with its aggressive clinical course.

Among patients with more favorable diagnoses (low‑grade glioma, meningioma, or other tumors), the next most informative predictor is the Karnofsky index (KI). Patients with KI ≤ 80 have a median survival of 47.8 months (n = 42). For those with KI > 80 (n = 23), the median survival was not reached during follow‑up (reported as Inf in the output), meaning the estimated survival probability in this subgroup never fell to 50%. Overall, the tree yields three distinct risk strata defined by diagnosis and functional status.
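To see why some predictors are selected and others are not, it can help to inspect the per-variable test results at a node and to vary the significance level of the stopping rule. The sketch below is an assumption-laden illustration: it uses the `veteran` lung cancer data shipped with the survival package as a stand-in so that it runs without the brain tumor file, and the object names `ctVet` and `ctStrict` are hypothetical. With Listing 1 already run, the same calls should apply to `ctBrain`.

```r
library(partykit)     # ctree(), ctree_control(), width()
library(strucchange)  # sctest() generic, with a method for fitted ctrees
library(survival)     # Surv() and the veteran data

# Stand-in survival data (not the brain tumor data)
veteran <- survival::veteran

ctVet <- ctree(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
               data = veteran)

# Test statistic and p-value for each candidate predictor at the root node
rootTests <- sctest(ctVet, node = 1)
print(rootTests)

# Stricter stopping rule: a node is split only if the smallest
# adjusted p-value is at most 0.01 (the default alpha is 0.05)
ctStrict <- ctree(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
                  data = veteran,
                  control = ctree_control(alpha = 0.01))

# Number of terminal nodes under each setting
print(c(default = width(ctVet), strict = width(ctStrict)))
```

Lowering `alpha` can only stop the partitioning earlier, so the stricter tree has at most as many terminal nodes as the default one.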

In addition to the statistical output, conditional inference trees produce tree plots that map the sequential model decision making in a visually intuitive way. Figure 1 shows the tree plot based on the fitted model in Listing 1.

As the tree plot in Figure 1 shows, the first split is on the diagnosis variable, which branches into two paths. On the left, a node groups patients with Other, low‑grade glioma, or meningioma; on the right, a separate branch isolates high‑grade glioma, forming its own terminal node. Within the left branch, a second split appears based on the Karnofsky index (KI). Patients with KI ≤ 80 form one terminal node, while those with KI > 80 form another. The plot within each terminal node shows the Kaplan-Meier survival curve estimated for that subgroup.
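The per-node Kaplan-Meier curves in the plot can also be reproduced numerically by assigning each patient to a terminal node and fitting `survfit()` by node. This is a minimal sketch that again uses the `veteran` data from the survival package as a stand-in (an assumption, not the brain tumor data); `ctVet` and `kmByNode` are hypothetical names.

```r
library(partykit)
library(survival)

# Stand-in survival data (not the brain tumor data)
veteran <- survival::veteran

ctVet <- ctree(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
               data = veteran)

# Terminal node id for every patient in the fitted data
veteran$node <- factor(predict(ctVet, type = "node"))

# Subgroup sizes
print(table(veteran$node))

# Kaplan-Meier estimate per terminal node; the printout lists n, events,
# median survival, and its confidence interval for each subgroup
kmByNode <- survfit(Surv(time, status) ~ node, data = veteran)
print(kmByNode)
```

A median reported as NA in the `survfit()` printout corresponds to the Inf shown by `print()` on the tree: the survival curve for that node never drops to 50%.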

Node 3 (leftmost, KI ≤ 80): This subgroup comprises patients with low‑grade glioma, meningioma, or other non‑HG tumors who also have a reduced Karnofsky index (≤ 80). Clinically, this is an intermediate‑risk subgroup, with a median survival of 47.8 months.

Node 4 (middle, KI > 80): This subgroup comprises patients with the same diagnoses but better functional status (KI > 80). It is the lowest‑risk subgroup; its median survival was not reached.

Node 5 (rightmost, high‑grade glioma): This terminal node isolates patients with high‑grade glioma, regardless of KI. They form the highest‑risk group, with a median survival of 11.0 months, consistent with the aggressive natural history and limited treatment responsiveness of HG gliomas. In this data set, diagnosis alone is sufficient to define this poor‑prognosis stratum.
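For decision support, a fitted tree can assign a new patient to one of these strata. The sketch below shows one possible way to do this with `predict()`; it uses the `veteran` data from the survival package as a stand-in, and the "new patient" is simply an existing row with an altered Karnofsky score, so every name and value here is illustrative rather than taken from the brain tumor study.

```r
library(partykit)
library(survival)

# Stand-in survival data (not the brain tumor data)
veteran <- survival::veteran

ctVet <- ctree(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
               data = veteran)

# Hypothetical new patient: reuse an existing row so that column types
# and factor levels match the training data, then change one predictor
newPatient <- veteran[1, c("trt", "celltype", "karno", "diagtime", "age", "prior")]
newPatient$karno <- 40

# Terminal node (risk stratum) the patient falls into
nodeId <- predict(ctVet, newdata = newPatient, type = "node")

# Predicted median survival time for that stratum
medSurv <- predict(ctVet, newdata = newPatient, type = "response")

print(nodeId)
print(medSurv)
```

The prediction is the Kaplan-Meier summary of the training patients in the same terminal node, so it describes the stratum rather than the individual.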

References

Cheng, F.W., Gao, X., Bao, L., Mitchell, D.C., Wood, C., Sliwinski, M.J., Smiciklas-Wright, H., Still, C.D., Rolston, D.D.K. and Jensen, G.L. (2017), Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis. Obesity, 25: 1263-1269. https://doi.org/10.1002/oby.21861

Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674. doi:10.1198/106186006X133933.