Abstract

The ODT R package implements the Optimal Decision Tree (ODT) algorithm (1), a novel approach designed for the field of personalized medicine. This algorithm employs tree-based methods to recommend the most suitable treatment for each patient by considering their unique genomic and mutational data.

Optimal Decision Trees iteratively refine drug recommendations along each branch until a predefined minimum group size is reached, so that each treatment suggestion is personalized yet still supported by a sufficient number of patients. This approach supports decision-making in therapeutic contexts, allowing healthcare professionals to tailor interventions to individual patient profiles.

Introduction

Unlike other personalized medicine algorithms that use classification or regression trees, ODT works by solving optimization problems. It takes into account how each patient responds to different drugs (sensitivity data) and their genomic or mutational information.

The algorithm selects a splitting variable, which could be a gene or a type of mutation, depending on the data being studied. For each split, ODT determines the best treatments and optimizes the measure of sensitivity for both branches based on these treatments (for example, using IC50 data). In other words, the algorithm assigns the best treatment to each patient by optimizing sensitivity data while creating an optimal decision tree.
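
To make the split search concrete, the following is a minimal sketch in base R, not the package's implementation. It assumes a brute-force search over binary biomarkers and that lower values mean higher sensitivity, as with IC50. The toy matrices and the helper names best_drug and score_split are invented for illustration.

```r
# Illustrative sketch only: brute-force search for a single split.
# Toy data (made up): 6 patients, 2 binary biomarkers, 3 drugs.
biomarkers <- cbind(
  geneA = c(1, 1, 1, 0, 0, 0),
  geneB = c(1, 0, 1, 0, 1, 0)
)
sensitivity <- cbind(
  drugX = c(0.1, 0.2, 0.1, 0.9, 0.8, 0.9),
  drugY = c(0.9, 0.8, 0.9, 0.2, 0.1, 0.2),
  drugZ = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
)
rownames(biomarkers) <- rownames(sensitivity) <- paste0("P", 1:6)

# Best single drug for a group: the column with the lowest total response
# (assumption: lower values mean higher sensitivity, as with IC50).
best_drug <- function(rows) {
  totals <- colSums(sensitivity[rows, , drop = FALSE])
  list(drug = names(which.min(totals)), cost = min(totals))
}

# Score one biomarker by the summed cost of its two branches.
score_split <- function(gene) {
  mutated <- biomarkers[, gene] == 1
  with_mut <- best_drug(mutated)
  without_mut <- best_drug(!mutated)
  data.frame(
    gene = gene,
    drug_mutated = with_mut$drug,
    drug_wildtype = without_mut$drug,
    cost = with_mut$cost + without_mut$cost
  )
}

# Evaluate every candidate biomarker and keep the cheapest split.
splits <- do.call(rbind, lapply(colnames(biomarkers), score_split))
best_split <- splits[which.min(splits$cost), ]
print(best_split)
```

In this toy example the search selects geneA, assigning drugX to patients with the mutation and drugY to those without it, because that pairing gives the lowest total response across both branches.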

The package consists of three main functions:

Model Training (trainTree): This function allows users to train the decision tree using the patients’ genomic or mutational data (biomarker matrix) and the drug responses (sensitivity matrix).
Optimal Treatment Assignment (predictTree): After training the tree, this function predicts the optimal treatment for each patient based on their expression and/or mutational data.
Decision Tree Visualization (niceTree): This function generates a graphical representation of the decision tree splits. Users can also download this plot in various formats to a specified directory.

Figure 1. ODT Model Workflow.

As shown in Figure 1, the ODT model operates using two key inputs: the sensitivity matrix and the biomarker matrix. The biomarker data may be a binary matrix indicating the presence or absence of mutations, or a matrix of gene expression levels. Together with the sensitivity matrix, it is used to train the decision tree.

At each step, the algorithm splits patients into two groups based on a single biomarker: the presence or absence of a mutation or, for expression data, a value above or below a threshold. The split is chosen so that the treatment assigned to each group has the highest sensitivity for that group. The algorithm continues to divide the branches recursively until a predefined minimum group size is reached, at which point no further splits are possible.
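
The recursion and its stopping condition can be sketched as follows. This is a minimal illustration in base R under stated assumptions, not the package's code: the function grow, its min_size argument, and the rule that both branches must keep at least min_size patients are hypothetical, and the toy data are invented. Lower values are again treated as higher sensitivity.

```r
# Illustrative sketch only; not the package's implementation.
# Toy data (made up): 8 patients, 2 binary biomarkers, 2 drugs.
biomarkers <- cbind(
  geneA = c(1, 1, 1, 1, 0, 0, 0, 0),
  geneB = c(1, 1, 0, 0, 1, 1, 0, 0)
)
sensitivity <- cbind(
  drugX = c(0.1, 0.2, 0.3, 0.2, 0.9, 0.8, 0.9, 0.7),
  drugY = c(0.9, 0.8, 0.7, 0.9, 0.2, 0.1, 0.3, 0.2)
)

# Lowest total response over a group, and the drug that achieves it.
group_cost <- function(rows) min(colSums(sensitivity[rows, , drop = FALSE]))
group_drug <- function(rows) {
  names(which.min(colSums(sensitivity[rows, , drop = FALSE])))
}

grow <- function(rows, min_size) {
  best <- NULL
  for (gene in colnames(biomarkers)) {
    absent <- rows[biomarkers[rows, gene] == 0]
    present <- rows[biomarkers[rows, gene] == 1]
    # Hypothetical stopping rule: both branches must keep min_size patients.
    if (length(absent) < min_size || length(present) < min_size) next
    cost <- group_cost(absent) + group_cost(present)
    if (is.null(best) || cost < best$cost) {
      best <- list(gene = gene, absent = absent, present = present, cost = cost)
    }
  }
  # No admissible split: the node becomes a leaf with a single treatment.
  if (is.null(best)) {
    return(list(leaf = TRUE, drug = group_drug(rows), n = length(rows)))
  }
  list(
    leaf = FALSE, gene = best$gene,
    absent = grow(best$absent, min_size),
    present = grow(best$present, min_size)
  )
}

# Number of split levels below a node.
depth <- function(node) {
  if (node$leaf) 0 else 1 + max(depth(node$absent), depth(node$present))
}

tree_large <- grow(seq_len(nrow(biomarkers)), min_size = 4)
tree_small <- grow(seq_len(nrow(biomarkers)), min_size = 2)
print(c(depth_min4 = depth(tree_large), depth_min2 = depth(tree_small)))
```

With a minimum group size of 4 the toy tree stops after one split on geneA. Lowering the minimum to 2 allows a second level of splits on geneB, which illustrates how the group size limit controls the depth of the tree.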

Example Using Mutational Data

In this example, we will use a binary matrix called mut_small, which contains mutation information, along with a drug response matrix named drug_small for selected patients. We will work with a small dataset that has IC50 values.

First, we need to train the decision tree using the selected data. We will use the trainTree function, which requires the following inputs:

PatientData: The binary matrix containing mutation information, where rows correspond to patients/samples and columns correspond to genes/features.
PatientSensitivity: The matrix that provides drug response information, where rows correspond to patients/samples and columns correspond to drugs.
minbucket: A fixed parameter that specifies the minimum number of patients required in a branch of the tree to allow a split.

ODT_MUT <- trainTree(PatientData = mut_small, PatientSensitivity = drug_small, minbucket = 1)
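
Because the two input matrices are matched by row, it can help to confirm that they describe the same patients in the same order before training. The sketch below uses base R only. The helper check_inputs and the toy matrices are hypothetical and are not part of the ODT package, and whether trainTree performs similar checks itself is not stated here.

```r
# Toy matrices (made up): 3 patients, 2 biomarkers, 2 drugs.
mut_toy <- cbind(geneA = c(1, 0, 0), geneB = c(1, 1, 0))
drug_toy <- cbind(drugX = c(0.2, 0.7, 0.4), drugY = c(0.9, 0.1, 0.5))
rownames(mut_toy) <- rownames(drug_toy) <- c("P1", "P2", "P3")

# Hypothetical helper: returns a named logical vector, one entry per check.
check_inputs <- function(biomarkers, sensitivity) {
  c(
    same_patient_count = nrow(biomarkers) == nrow(sensitivity),
    same_patient_order = identical(rownames(biomarkers), rownames(sensitivity)),
    binary_biomarkers = all(biomarkers %in% c(0, 1))
  )
}

checks <- check_inputs(mut_toy, drug_toy)
print(checks)

# Reordering the rows of one matrix is detected by the order check.
shuffled <- check_inputs(mut_toy[c(2, 1, 3), ], drug_toy)
print(shuffled)
```

The binary check applies only to mutation data; for gene expression matrices it would be dropped.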

The output of the trainTree function will be a decision tree that reflects the splits made by the ODT algorithm based on the provided mutational and sensitivity data, along with the treatments assigned at each split. To visualize the optimized tree, we will use the niceTree function. This function displays the mutations selected at each node and the treatment assigned to each branch (both for branches with and without the mutation).

The necessary inputs for the niceTree function are:

tree: The trained decision tree obtained from the trainTree function.
folder: The directory where the output image will be saved.

Additionally, users can customize several fixed parameters related to the plot’s appearance:

colors
fontname
fontstyle
shape
output_format

For more information regarding plot customization options, please refer to the niceTree function documentation.

niceTree(tree = ODT_MUT, folder = NULL)

To determine the treatment selected for each specific patient, we will use the predictTree function. This function identifies the treatment assigned by the algorithm based on the trained decision tree and the provided patient data. The required inputs for this function are:

tree: The trained decision tree obtained from the trainTree function.
PatientData: The binary matrix containing mutation information, where rows correspond to patients/samples and columns correspond to genes/features.
PatientSensitivityTrain: A matrix containing the drug response values of the training dataset, where rows correspond to patients and columns correspond to drugs. It is used only to extract the treatment names and plays no part in the prediction itself.

The following code snippet demonstrates how to use the predictTree function:

# Load the necessary library and datasets
library(ODT)
data("mutations_w34")
data("drug_response_w34")

# Select a subset of the mutation and drug response data
mut_small <- mutations_w34[1:100, 1:50] # Select first 100 patients and 50 genes
drug_small <- drug_response_w34[1:100, 1:15] # Select first 100 patients and 15 drugs

# Train the decision tree using the selected patient data
ODT_MUT <- trainTree(PatientData = mut_small, PatientSensitivity = drug_small, minbucket = 2)

# Visualize the trained decision tree
niceTree(ODT_MUT)

## $Tree
## [1] root
## |   [2] NPM1 <= 1
## |   |   [3] KRAS <= 1
## |   |   |   [4] NRAS <= 1
## |   |   |   |   [5] WT1 <= 1
## |   |   |   |   |   [6] SF3B1 <= 1
## |   |   |   |   |   |   [7] FLT3 <= 1
## |   |   |   |   |   |   |   [8] KIAA0907 <= 1: Dasatinib
## |   |   |   |   |   |   |   [9] KIAA0907 > 1: Sorafenib
## |   |   |   |   |   |   [10] FLT3 > 1
## |   |   |   |   |   |   |   [11] TET2 <= 1: Sorafenib
## |   |   |   |   |   |   |   [12] TET2 > 1: Lapatinib
## |   |   |   |   |   [13] SF3B1 > 1: Ruxolitinib (INCB018424)
## |   |   |   |   [14] WT1 > 1: Crenolanib
## |   |   |   [15] NRAS > 1: Nilotinib
## |   |   [16] KRAS > 1: Pazopanib (GW786034)
## |   [17] NPM1 > 1
## |   |   [18] SRSF2 <= 1
## |   |   |   [19] IDH1 <= 1: Ibrutinib (PCI-32765)
## |   |   |   [20] IDH1 > 1: Crenolanib
## |   |   [21] SRSF2 > 1: Quizartinib (AC220)
## 
## $Plot

# Predict the optimal treatment for each patient
ODT_MUTpred <- predictTree(tree = ODT_MUT, PatientSensitivityTrain = drug_small, PatientData = mut_small)

# Retrieve and display the names of the selected treatments
names_drug <- colnames(drug_small)
selected_treatments <- names_drug[ODT_MUTpred]
selected_treatments[1:3] # Treatment selected for first 3 patients

## [1] "Crenolanib"               "Ruxolitinib (INCB018424)"
## [3] "Dasatinib"

Figure 2. Trained Decision Tree Output from the niceTree Function: This figure illustrates the decision tree generated by the ODT algorithm, showcasing the splits based on mutational data and the corresponding treatments assigned at each node.
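
One way to inspect the recommendations is to compare each patient's response to the assigned drug with the best response available to that patient. The sketch below uses base R and made-up values. It assumes that the predictions are column indices into the sensitivity matrix, as the indexing names_drug[ODT_MUTpred] in the example suggests, and that lower values mean higher sensitivity. The names drug_toy, pred_toy and summary_toy are hypothetical.

```r
# Toy sensitivity matrix (made up): 3 patients, 3 drugs.
drug_toy <- cbind(
  drugX = c(0.2, 0.7, 0.4),
  drugY = c(0.9, 0.1, 0.5),
  drugZ = c(0.6, 0.6, 0.3)
)
rownames(drug_toy) <- c("P1", "P2", "P3")

# Stand-in for the prediction vector: one column index per patient.
pred_toy <- c(1L, 2L, 1L)

# Response of each patient to the assigned drug (matrix indexing by row/column pairs).
assigned <- drug_toy[cbind(seq_len(nrow(drug_toy)), pred_toy)]

# Best response each patient could have had with any single drug.
best_possible <- unname(apply(drug_toy, 1, min))

summary_toy <- data.frame(
  patient = rownames(drug_toy),
  treatment = colnames(drug_toy)[pred_toy],
  assigned = assigned,
  best_possible = best_possible,
  gap = assigned - best_possible
)
print(summary_toy)

# How often each treatment is recommended.
print(table(summary_toy$treatment))
```

A gap of zero means the tree assigned the patient's individually best drug. Larger gaps are the price of grouping patients under shared treatment rules.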

Example Using Gene Expression Data

In this example, we will use a matrix called gene_small, which contains gene expression information, along with a drug response matrix named drug_small for selected patients.

First, we will train the decision tree using the selected data with the trainTree function. The required inputs for this function are:

PatientData: The numeric matrix containing gene expression information, where rows correspond to patients/samples and columns correspond to genes/features.
PatientSensitivity: The matrix that provides drug response information, where rows correspond to patients/samples and columns correspond to drugs.
minbucket: A fixed parameter that specifies the minimum number of patients required in a branch of the tree to allow a split.

ODT_EXP <- trainTree(PatientData = gene_small, PatientSensitivity = drug_small, minbucket = 1)

The output of the trainTree function will be a decision tree that reflects the splits made by the ODT algorithm based on the provided genomic and sensitivity data, along with the treatments assigned at each split. To visualize the optimized tree, we will use the niceTree function. This function displays the biomarker selected at each node and the treatment assigned to each branch.

The necessary inputs for the niceTree function are:

tree: The trained decision tree obtained from the trainTree function.
folder: The directory where the output image will be saved.

Additionally, users can customize several fixed parameters related to the plot’s appearance:

colors
fontname
fontstyle
shape
output_format

For more information regarding plot customization options, please refer to the niceTree function documentation.

niceTree(tree = ODT_EXP, folder = NULL)

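To determine the treatment selected for each specific patient, we will use the predictTree function. This function identifies the treatment assigned by the algorithm based on the trained decision tree and the provided patient data. The required inputs for this function are:

tree: The trained decision tree obtained from the trainTree function.
PatientData: The numeric matrix containing gene expression information, where rows correspond to patients/samples and columns correspond to genes/features.
PatientSensitivityTrain: A matrix containing the drug response values of the training dataset, where rows correspond to patients and columns correspond to drugs. It is used only to extract the treatment names and plays no part in the prediction itself.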

The following code snippet demonstrates how to use the predictTree function:

# Load the necessary library and datasets
library(ODT)

# Load the gene expression and drug response data
data("expression_w34")
data("drug_response_w34")

# Select a subset of the gene expression and drug response data
gene_small <- expression_w34[1:3, 1:3]
drug_small <- drug_response_w34[1:3, 1:3]

# Train the decision tree using the selected patient data
ODT_EXP <- trainTree(PatientData = gene_small, PatientSensitivity = drug_small, minbucket = 1)

# Visualize the trained decision tree
niceTree(ODT_EXP)

## $Tree
## [1] root
## |   [2] TSPAN6 <= -0.86591: Crizotinib (PF-2341066)
## |   [3] TSPAN6 > -0.86591: Axitinib (AG-013736)
## 
## $Plot

# Predict the optimal treatment for each patient
ODT_EXPpred <- predictTree(tree = ODT_EXP, PatientSensitivityTrain = drug_small, PatientData = gene_small)

# Retrieve and display the names of the selected treatments
selected_treatments <- colnames(drug_small)[ODT_EXPpred]
selected_treatments

## [1] "Crizotinib (PF-2341066)" "Axitinib (AG-013736)"   
## [3] "Axitinib (AG-013736)"

Figure 3. Trained Decision Tree Output from the niceTree Function: This figure illustrates the decision tree generated by the ODT algorithm, showcasing the splits based on expression data and the corresponding treatments assigned at each node.
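
The printed tree can be read as a simple threshold rule. As an illustration only, the sketch below applies that rule by hand in base R. The threshold and drug names are copied from the output above, while the expression values and patient labels in expr_toy are made up.

```r
# Toy expression values (made up) for the gene used in the printed tree.
expr_toy <- matrix(
  c(-1.20, -0.50, 0.30),
  ncol = 1,
  dimnames = list(c("N1", "N2", "N3"), "TSPAN6")
)

# Threshold and treatments copied from the printed tree.
threshold <- -0.86591
recommended <- ifelse(
  expr_toy[, "TSPAN6"] <= threshold,
  "Crizotinib (PF-2341066)",
  "Axitinib (AG-013736)"
)
print(recommended)
```

Only the first toy patient falls at or below the threshold and is assigned Crizotinib; the other two follow the right-hand branch.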

Example: Assigning Optimal Treatment to New Patients (Mutational Data)

In this example, we will use a binary matrix containing mutation information along with a drug response matrix from existing patients. We will train a model to later predict the best treatment for a new patient whose sensitivity response to different treatments is unknown.

# Load the necessary library and datasets
library(ODT)
data("mutations_w34")
data("mutations_w12")
data("drug_response_w12")
data("drug_response_w34")

# Define a binary matrix for new patients (using the first patient as an example)
mut_newpatients <- mutations_w34[1, , drop = FALSE]

# Train the decision tree model using known patient data
ODT_MUT <- trainTree(PatientData = mutations_w12, PatientSensitivity = drug_response_w12, minbucket = 10)

# Visualize the trained decision tree
niceTree(ODT_MUT, folder = NULL)

## $Tree
## [1] root
## |   [2] NRAS <= 1
## |   |   [3] KRAS <= 1
## |   |   |   [4] BCOR <= 1
## |   |   |   |   [5] PTPN11 <= 1
## |   |   |   |   |   [6] TP53 <= 1
## |   |   |   |   |   |   [7] CBFB-MYH11 <= 1
## |   |   |   |   |   |   |   [8] CEBPA <= 1: Quizartinib (AC220)
## |   |   |   |   |   |   |   [9] CEBPA > 1: AZD1480
## |   |   |   |   |   |   [10] CBFB-MYH11 > 1: JNJ-28312141
## |   |   |   |   |   [11] TP53 > 1: XAV-939
## |   |   |   |   [12] PTPN11 > 1: Panobinostat
## |   |   |   [13] BCOR > 1: RAF265 (CHIR-265)
## |   |   [14] KRAS > 1: Selumetinib (AZD6244)
## |   [15] NRAS > 1: Trametinib (GSK1120212)
## 
## $Plot

# Predict the optimal treatment for the new patient
ODT_MUTpred <- predictTree(tree = ODT_MUT, PatientSensitivityTrain = drug_response_w12, PatientData = mut_newpatients)

# Retrieve and display the name of the selected treatment
selected_treatment <- colnames(drug_response_w12)[ODT_MUTpred]
selected_treatment

## [1] "Quizartinib (AC220)"

Figure 4. Trained Decision Tree for New Patients Using Mutational Data: This figure illustrates the output of the niceTree function, showcasing the decision tree trained on existing patient data. It highlights the splits based on mutation information and the treatment recommendations for new patients.
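
When the new patients come from a different cohort than the training data, as in the example above, it may be worth checking that both matrices share the same biomarker columns. The sketch below shows one way to do this in base R with made-up matrices. The names train_toy, new_toy and new_aligned are hypothetical, and whether predictTree requires or performs such an alignment is not stated here.

```r
# Toy training matrix (made up): 2 patients, 3 biomarkers.
train_toy <- matrix(
  c(1, 0, 0, 1, 1, 0),
  nrow = 2,
  dimnames = list(c("P1", "P2"), c("geneA", "geneB", "geneC"))
)

# Toy new patient (made up): different column order, one unseen biomarker.
new_toy <- matrix(
  c(0, 1, 1),
  nrow = 1,
  dimnames = list("N1", c("geneC", "geneA", "geneD"))
)

# Biomarkers present in both, in the order used by the training data.
shared <- intersect(colnames(train_toy), colnames(new_toy))

# Training biomarkers that were not measured for the new patient.
missing_in_new <- setdiff(colnames(train_toy), colnames(new_toy))

# Keep the shared columns only, reordered to match the training matrix.
new_aligned <- new_toy[, shared, drop = FALSE]

print(new_aligned)
print(missing_in_new)
```

How to handle biomarkers that are missing for a new patient is a modeling decision; the sketch only reports them.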

Example: Assigning Optimal Treatment to New Patients (Gene Expression Data)

In this example, we will use a matrix containing gene expression information along with a drug response matrix from existing patients. We will train a model to predict the best treatment for a new patient whose sensitivity response to different treatments is unknown.

# Load the necessary library and datasets
library(ODT)

# Load gene expression and drug response data
data("expression_w34")
data("expression_w12")
data("drug_response_w12")
data("drug_response_w34")

# Define a matrix for new patients (using the first patient as an example)
exp_newpatients <- expression_w34[1, , drop = FALSE]

# Train the decision tree model using known patient data
ODT_EXP <- trainTree(PatientData = expression_w12, PatientSensitivity = drug_response_w12, minbucket = 10)

# Visualize the trained decision tree
niceTree(ODT_EXP, folder = NULL)

## $Tree
## [1] root
## |   [2] VCAN <= 6.53
## |   |   [3] VAMP3 <= 7.55
## |   |   |   [4] LUC7L <= 5.44: ABT-737
## |   |   |   [5] LUC7L > 5.44: Venetoclax
## |   |   [6] VAMP3 > 7.55: CHIR-99021
## |   [7] VCAN > 6.53
## |   |   [8] TEAD3 <= -1.41: Trametinib (GSK1120212)
## |   |   [9] TEAD3 > -1.41
## |   |   |   [10] LRP6 <= -1.74: JNJ-28312141
## |   |   |   [11] LRP6 > -1.74: Panobinostat
## 
## $Plot

# Predict the optimal treatment for the new patient
ODT_EXPpred <- predictTree(tree = ODT_EXP, PatientSensitivityTrain = drug_response_w12, PatientData = exp_newpatients)

# Retrieve and display the name of the selected treatment
selected_treatment <- colnames(drug_response_w12)[ODT_EXPpred]
selected_treatment

## [1] "Panobinostat"

Figure 5. Trained Decision Tree for New Patients Using Gene Expression Data: This figure illustrates the output of the niceTree function, showcasing the decision tree trained on existing patient data. It highlights the splits based on gene expression information and the treatment recommendations for new patients.

References

More information can be found at:

1. Gimeno, M., Sada del Real, K., & Rubio, A. (2023). Precision oncology: a review to assess interpretability in several explainable methods. Briefings in Bioinformatics, 24(4), bbad200. https://doi.org/10.1093/bib/bbad200

Session Information

sessionInfo()

## R version 4.4.1 Patched (2024-09-30 r87211)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.0
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Madrid
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ODT_1.0.0         data.tree_1.1.0   partykit_1.2-22   mvtnorm_1.3-1    
## [5] libcoin_1.0-10    matrixStats_1.4.1 rmarkdown_2.28    knitr_1.48       
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-0       jsonlite_1.8.9     dplyr_1.1.4        compiler_4.4.1    
##  [5] tidyselect_1.2.1   rpart_4.1.23       Rcpp_1.0.13        stringr_1.5.1     
##  [9] rsvg_2.6.1         magick_2.8.5       DiagrammeR_1.0.11  jquerylib_0.1.4   
## [13] splines_4.4.1      yaml_2.3.10        fastmap_1.2.0      lattice_0.22-6    
## [17] R6_2.5.1           generics_0.1.3     Formula_1.2-5      htmlwidgets_1.6.4 
## [21] visNetwork_2.1.2   tibble_3.2.1       inum_1.0-5         pillar_1.9.0      
## [25] bslib_0.8.0        RColorBrewer_1.1-3 rlang_1.1.4        utf8_1.2.4        
## [29] cachem_1.1.0       stringi_1.8.4      xfun_0.48          sass_0.4.9        
## [33] cli_3.6.3          withr_3.0.1        magrittr_2.0.3     digest_0.6.37     
## [37] rstudioapi_0.16.0  lifecycle_1.0.4    vctrs_0.6.5        evaluate_1.0.0    
## [41] glue_1.8.0         survival_3.7-0     fansi_1.0.6        purrr_1.0.2       
## [45] pkgconfig_2.0.3    tools_4.4.1        htmltools_0.5.8.1

ODT

Maddi Eceiza, Lucia Ruiz, Angel Rubio, Katyna Sada Del Real

September 2024