Package 'MBMethPred' reference manual

Title:	Medulloblastoma Subgroups Prediction
Description:	Utilizing a combination of machine learning models (Random Forest, Naive Bayes, K-Nearest Neighbor, Support Vector Machines, Extreme Gradient Boosting, and Linear Discriminant Analysis) and a deep Artificial Neural Network model, 'MBMethPred' can predict medulloblastoma subgroups, including wingless (WNT), sonic hedgehog (SHH), Group 3, and Group 4 from DNA methylation beta values. See Sharif Rahmani E, Lawarde A, Lingasamy P, Moreno SV, Salumets A and Modhukur V (2023), MBMethPred: a computational framework for the accurate classification of childhood medulloblastoma subgroups using data integration and AI-based approaches. Front. Genet. 14:1233657. <doi: 10.3389/fgene.2023.1233657> for more details.
Authors:	Edris Sharif Rahmani [aut, ctb, cre] , Ankita Sunil Lawarde [aut, ctb] , Vijayachitra Modhukur [aut, ctb]
Maintainer:	Edris Sharif Rahmani <[email protected]>
License:	GPL
Version:	0.1.4.2
Built:	2025-03-21 04:35:17 UTC
Source:	https://github.com/sharifrahmanie/mbmethpred

Box plot

Description

A function to draw a box plot for the DNA methylation dataset.

Usage

BoxPlot(File, Projname = NULL)
BoxPlot(File, Projname = NULL)

Arguments

`File`	The output of ReadMethylFile function.
`Projname`	A name used to name the plot. The default is null.

Value

A ggplot2 object

Examples

data <- Data2[1:10,]
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
BoxPlot(File = data)
data <- Data2[1:10,]
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
BoxPlot(File = data)

Confusion matrix

Description

A function to calculate the confusion matrix of the machine and deep learning models. It outputs Accuracy, Precision, Sensitivity, F1-Score, Specificity, and AUC_average.

Usage

ConfusionMatrix(y_true, y_pred)
ConfusionMatrix(y_true, y_pred)

Arguments

`y_true`	True labels
`y_pred`	Predicted labels

Value

A data frame

Examples

set.seed(1234)
data <- Data1[1:10,]
data$subgroup <- factor(data$subgroup)
fac <- ncol(data)
split <- caTools::sample.split(data[, fac], SplitRatio = 0.8)
training_set <- subset(data, split == TRUE)
test_set <- subset(data, split == FALSE)
rf <- randomForest::randomForest(x = training_set[-fac],
                                 y = training_set[, fac],
                                 ntree = 10)
y_pred <- predict(rf, newdata = test_set[-fac])
ConfusionMatrix(y_true = test_set[, fac],
                y_pred = y_pred)
set.seed(1234)
data <- Data1[1:10,]
data$subgroup <- factor(data$subgroup)
fac <- ncol(data)
split <- caTools::sample.split(data[, fac], SplitRatio = 0.8)
training_set <- subset(data, split == TRUE)
test_set <- subset(data, split == FALSE)
rf <- randomForest::randomForest(x = training_set[-fac],
                                 y = training_set[, fac],
                                 ntree = 10)
y_pred <- predict(rf, newdata = test_set[-fac])
ConfusionMatrix(y_true = test_set[, fac],
                y_pred = y_pred)

Data1 is a medulloblastoma DNA methylation beta values from a GEO series (GSE85212) and focuses on 399 as the most important probes. This dataset is used to train and test the machine and deep learning models.

Value

A data frame

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85212

Examples

data(Data1)
data(Data1)

Data2

Description

Data2 is a medulloblastoma DNA methylation beta values (GSE85212, 50 samples) including 10000 most variable probes used for similarity network fusion.

Value

A data frame

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85212

References

Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654

Examples

data(Data2)
data(Data2)

Data3

Description

Data3 is an gene expression dataset from primary medulloblastoma samples (GSE85217, 50 samples) used for similarity network fusion.

Value

A data frame

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85217

References

Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654

Examples

data(Data3)
data(Data3)

K nearest neighbor model

Description

A function to train a K nearest neighbor model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`SplitRatio`	Train and test split ratio. A value greater or equal to zero and less than one.
`CV`	The number of folds for cross-validation. It should be greater than one.
`K`	The number of nearest neighbors.
`NCores`	The number of cores for parallel computing.
`NewData`	A methylation beta values input from the ReadMethylFile function.

Value

A list

Examples

set.seed(111)
knn <- KNearestNeighborModel(SplitRatio = 0.8,
                             CV = 3,
                             K = 3,
                             NCores = 1,
                             NewData = NULL)
set.seed(111)
knn <- KNearestNeighborModel(SplitRatio = 0.8,
                             CV = 3,
                             K = 3,
                             NCores = 1,
                             NewData = NULL)

Linear discriminant analysis model

Description

A function to train a linear discriminant analysis model to classify medulloblastoma subgroups using the DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`SplitRatio`	Train and test split ratio. A value greater or equal than zero and less than one.
`CV`	The number of folds for cross validation. It should be greater than one.
`NCores`	The number of cores for parallel computing.
`NewData`	A methylation beta values input from the ReadMethylFile function.

Value

A list

Examples

set.seed(123)
lda <- LinearDiscriminantAnalysisModel(SplitRatio = 0.8,
                                       CV = 2,
                                       NCores = 1,
                                       NewData = NULL)
set.seed(123)
lda <- LinearDiscriminantAnalysisModel(SplitRatio = 0.8,
                                       CV = 2,
                                       NCores = 1,
                                       NewData = NULL)

Model metrics

Description

A function to extract the confusion matrix information.

Usage

ModelMetrics(Model)
ModelMetrics(Model)

Arguments

Model

A trained model.

Value

A list

Examples

xgboost <- XGBoostModel(SplitRatio = 0.2,
                        CV = 2,
                        NCores = 1,
                        NewData = NULL)
ModelMetrics(Model = xgboost)
xgboost <- XGBoostModel(SplitRatio = 0.2,
                        CV = 2,
                        NCores = 1,
                        NewData = NULL)
ModelMetrics(Model = xgboost)

Naive bayes model

Description

A function to train a Naive Bayes model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`SplitRatio`	Train and test split ratio. A value greater or equal to zero and less than one.
`CV`	The number of folds for cross-validation. It should be greater than one.
`Threshold`	The threshold for deciding class probability. A value greater or equal to zero and less than one.
`NCores`	The number of cores for parallel computing.
`NewData`	A methylation beta values input from the ReadMethylFile function.

Value

A list

Examples

set.seed(123)
nb <- NaiveBayesModel(SplitRatio = 0.8,
                      CV = 2,
                      Threshold = 0.8,
                      NCores = 1,
                      NewData = NULL)
set.seed(123)
nb <- NaiveBayesModel(SplitRatio = 0.8,
                      CV = 2,
                      Threshold = 0.8,
                      NCores = 1,
                      NewData = NULL)

Artificial neural network model

Description

A function to train an artificial neural network model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`Epochs`	The number of epochs.
`NewData`	A methylation beta values input from the ReadMethylFile function.
`InstallTensorFlow`	Logical. Running this function for the first time, you need to install TensorFlow library (V 2.10-cpu). Default is TRUE.

Value

A list

Examples

## Not run: 
set.seed(1234)
ann <- NeuralNetworkModel(Epochs = 100,
                          NewData = NULL,
                          InstallTensorFlow = TRUE)

## End(Not run)
## Not run: 
set.seed(1234)
ann <- NeuralNetworkModel(Epochs = 100,
                          NewData = NULL,
                          InstallTensorFlow = TRUE)

## End(Not run)

New data prediction result

Description

A function to output the predicted medulloblastoma subgroups by trained models.

Usage

NewDataPredictionResult(Model)
NewDataPredictionResult(Model)

Arguments

Model

A trained model

Value

A data frame

Examples

set.seed(10)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
xgboost <- XGBoostModel(SplitRatio = 0.2,
                        CV = 2,
                        NCores = 1,
                        NewData = NewData)
NewDataPredictionResult(Model = xgboost)
set.seed(10)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
xgboost <- XGBoostModel(SplitRatio = 0.2,
                        CV = 2,
                        NCores = 1,
                        NewData = NewData)
NewDataPredictionResult(Model = xgboost)

Random forest model

Description

A function to train a random forest model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`SplitRatio`	Train and test split ratio. A value greater or equal to zero and less than one.
`CV`	The number of folds for cross-validation. It should be greater than one.
`NTree`	The number of trees to be grown.
`NCores`	The number of cores for parallel computing.
`NewData`	A methylation beta values input from the ReadMethylFile function.

Value

A list

Examples

set.seed(21)
rf <- RandomForestModel(SplitRatio = 0.8,
                        CV = 3,
                        NTree = 10,
                        NCores = 1,
                        NewData = NULL)
set.seed(21)
rf <- RandomForestModel(SplitRatio = 0.8,
                        CV = 3,
                        NTree = 10,
                        NCores = 1,
                        NewData = NULL)

Input file for prediction

Description

A function to read DNA methylation files. It can be used as the new data for prediction by every model.

Usage

ReadMethylFile(File)
ReadMethylFile(File)

Arguments

File

A data frame with tsv or csv file extension. The first column of the data frame is the CpG methylation probe that starts with cg characters and is followed by a number (e.g., cg100091). Other columns are samples with methylation beta values. All columns in the data frame should have a name.

Value

A data frame

Examples

## Not run: 
methyl <- ReadMethylFile(File = "file.csv")

## End(Not run)
## Not run: 
methyl <- ReadMethylFile(File = "file.csv")

## End(Not run)

Input file for similarity network fusion (SNF)

Description

A function to read user-provided file feeding into the SNF function (from the SNFtools package).

Usage

ReadSNFData(File)
ReadSNFData(File)

Arguments

File

Value

A data frame

Examples

## Not run: 
data <- ReadSNFData(File = "file.csv")

## End(Not run)
## Not run: 
data <- ReadSNFData(File = "file.csv")

## End(Not run)

RLabels

Description

The actual labels from the medulloblastoma DNA methylation beta values (GSE85212, 50 samples) that was used for similarity network fusion.

Value

Factor

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85212

References

Cavalli FMG, Remke M, Rampasek L, Peacock J et al. Intertumoral Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 2017 Jun 12;31(6):737-754.e6. PMID: 28609654

Examples

data(RLabels)
data(RLabels)

Similarity network fusion (SNF)

Description

A function to perform SNF function (from SNFtool package) and output clusters.

Usage

SimilarityNetworkFusion(
  Files = NULL,
  NNeighbors,
  Sigma,
  NClusters,
  CLabels = NULL,
  RLabels = NULL,
  Niterations
)
SimilarityNetworkFusion(
  Files = NULL,
  NNeighbors,
  Sigma,
  NClusters,
  CLabels = NULL,
  RLabels = NULL,
  Niterations
)

Arguments

`Files`	A list of data frames created using the ReadSNFData function or matrices.
`NNeighbors`	The number of nearest neighbors.
`Sigma`	The variance for local model.
`NClusters`	The number of clusters.
`CLabels`	A string vector to name the clusters. Optional.
`RLabels`	The actual label of samples to calculate the Normalized Mutual Information (NMI) score. Optional.
`Niterations`	The number of iterations for the diffusion process.

Value

Factor

Examples

data(RLabels) # Real labels
data(Data2) # Methylation
data(Data3) # Gene expression
snf <- SimilarityNetworkFusion(Files = list(Data2, Data3),
                               NNeighbors  = 13,
                               Sigma = 0.75,
                               NClusters = 4,
                               CLabels = c("Group4", "SHH", "WNT", "Group3"),
                               RLabels = RLabels,
                               Niterations = 10)
snf
data(RLabels) # Real labels
data(Data2) # Methylation
data(Data3) # Gene expression
snf <- SimilarityNetworkFusion(Files = list(Data2, Data3),
                               NNeighbors  = 13,
                               Sigma = 0.75,
                               NClusters = 4,
                               CLabels = c("Group4", "SHH", "WNT", "Group3"),
                               RLabels = RLabels,
                               Niterations = 10)
snf

Support vector machine model

Description

A function to train a support vector machine model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`SplitRatio`	Train and test split ratio. A value greater or equal to zero and less than one.
`CV`	The number of folds for cross-validation. It should be greater than one.
`NCores`	The number of cores for parallel computing.
`NewData`	A methylation beta values input from the ReadMethylFile function.

Value

A list

Examples

set.seed(56)
svm <- SupportVectorMachineModel(SplitRatio = 0.8,
                                 CV = 3,
                                 NCores = 1,
                                 NewData = NULL)
set.seed(56)
svm <- SupportVectorMachineModel(SplitRatio = 0.8,
                                 CV = 3,
                                 NCores = 1,
                                 NewData = NULL)

t-SNE 3D plot

Description

A function to draw a 3D t-SNE plot for DNA methylation beta values using the K-means clustering technique.

Usage

TSNEPlot(File, NCluster = 4)
TSNEPlot(File, NCluster = 4)

Arguments

`File`	The output of ReadMethylFile function.
`NCluster`	The number of cluster.

Value

Objects of rgl

Examples


set.seed(123)
data <- Data2[1:100,]
data <- data.frame(t(data))
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
TSNEPlot(File = data, NCluster = 4)

set.seed(123)
data <- Data2[1:100,]
data <- data.frame(t(data))
data <- cbind(rownames(data), data)
colnames(data)[1] <- "ID"
TSNEPlot(File = data, NCluster = 4)

XGBoost model

Description

A function to train an XGBoost model to classify medulloblastoma subgroups using DNA methylation beta values (Illumina Infinium HumanMethylation450). Prediction is followed by training if new data is provided.

Arguments

`SplitRatio`	Train and test split ratio. A value greater or equal to zero and less than one.
`CV`	The number of folds for cross-validation. It should be greater than one.
`NCores`	The number of cores for parallel computing.
`NewData`	A methylation beta values input from the ReadMethylFile function.

Value

A list

Examples

set.seed(123)
xgboost <- XGBoostModel(SplitRatio = 0.2,
                        CV = 2,
                        NCores = 1,
                        NewData = NULL)
set.seed(123)
xgboost <- XGBoostModel(SplitRatio = 0.2,
                        CV = 2,
                        NCores = 1,
                        NewData = NULL)

Package 'MBMethPred'

Help Index

Box plot

Description

Usage

Arguments

Value

Examples

Confusion matrix

Description

Usage

Arguments

Value

Examples

Training data

Description

Value

Source

Examples

Data2

Description

Value

Source

References

Examples

Data3

Description

Value

Source

References

Examples

K nearest neighbor model

Description

Arguments

Value

Examples

Linear discriminant analysis model

Description

Arguments

Value

Examples

Model metrics

Description

Usage

Arguments

Value

Examples

Naive bayes model

Description

Arguments

Value

Examples

Artificial neural network model

Description

Arguments

Value

Examples

New data prediction result

Description

Usage

Arguments

Value

Examples

Random forest model

Description

Arguments

Value

Examples

Input file for prediction

Description

Usage

Arguments

Value

Examples

Input file for similarity network fusion (SNF)

Description

Usage

Arguments

Value

Examples