MBMethPred is a user-friendly package for accurate prediction of medulloblastoma subgroups from DNA methylation beta values. It incorporates seven machine learning models: Random Forest, K-Nearest Neighbors, Support Vector Machine, Linear Discriminant Analysis, Extreme Gradient Boosting, Naive Bayes, and a neural network model designed specifically for the complexities of medulloblastoma data. The package provides streamlined workflows for data preprocessing, feature selection, model training, cross-validation, and prediction. This vignette explains each function with examples and their resulting output. The MBMethPred package was tested on an Ubuntu machine equipped with an Intel Core i5-6200U processor and 16 GB of RAM.
The ReadMethylFile function reads a file of DNA methylation beta values
so that it can be used as new data for prediction by any of the models.
The input file must be in either CSV or TSV format. Please uncomment the
following lines and run the function.
# set.seed(1234)
# fac <- ncol(Data1)
# NewData <- sample(data.frame(t(Data1[,-fac])),10)
# NewData <- cbind(rownames(NewData), NewData)
# colnames(NewData)[1] <- "ID"
# write.csv(NewData, "NewData.csv", quote = FALSE, row.names = FALSE)
# methyl <- ReadMethylFile(File = "NewData.csv")
This function has a single argument, File. The first column of the file must contain the CpG methylation probe IDs, each of which starts with the characters cg followed by a number (e.g., cg100091). The remaining columns are samples with methylation beta values. Every column in the data frame must have a name.
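To make the expected file layout concrete, here is a minimal base-R sketch that writes a toy CSV and checks it against the description above. The probe IDs, sample names, and beta values are invented for illustration, and the checks are not part of ReadMethylFile itself.

```r
# Hypothetical toy table: probe IDs in the first column, one column per sample.
toy <- data.frame(
  ID      = c("cg100091", "cg100245", "cg100783"),
  Sample1 = c(0.12, 0.85, 0.40),
  Sample2 = c(0.18, 0.79, 0.46)
)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, quote = FALSE, row.names = FALSE)

# Read the file back and check the layout described above.
methyl_check <- read.csv(toy_path, stringsAsFactors = FALSE)

# The first column should hold probe IDs of the form "cg" followed by digits.
probe_ids_ok <- all(grepl("^cg[0-9]+$", methyl_check[[1]]))

# Every other column should hold beta values, which lie between 0 and 1.
beta_range_ok <- all(sapply(methyl_check[-1],
                            function(x) all(x >= 0 & x <= 1)))

print(c(probe_ids_ok = probe_ids_ok, beta_range_ok = beta_range_ok))
```

A file that passes both checks has the shape that the File argument expects; a TSV file would need the same layout with tab separators.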
The BoxPlot function draws a box plot from DNA methylation beta values
or from any other data frame.
The TSNEPlot function draws a 3D t-SNE plot for a DNA methylation
dataset using the K-means clustering technique. This function has two
arguments: File (any matrix) and NCluster (the number of clusters for
K-means clustering).
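As a rough illustration of what the cluster count controls, the sketch below runs only the K-means step with kmeans from base R's stats package on a random toy matrix. It is not the TSNEPlot implementation: the t-SNE embedding and the 3D plot are left out, and the matrix size and variable names are assumptions made for the example.

```r
set.seed(1234)

# Hypothetical toy matrix: 60 samples (rows) by 20 features (columns).
toy_matrix <- matrix(runif(60 * 20), nrow = 60)

# K-means with four clusters; the value 4 plays the role of NCluster.
# nstart = 10 repeats the random initialisation and keeps the best fit.
km <- kmeans(toy_matrix, centers = 4, nstart = 10)

# One cluster label per row of the input matrix.
print(table(km$cluster))
```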
Using the ReadSNFData function, one can read files (any matrix in CSV or
TSV format) and feed them into the similarity network fusion (SNF)
function from the SNFtool package. Please uncomment the following lines
and run the function.
The SimilarityNetworkFusion function runs SNF (from the SNFtool package)
and outputs the resulting clusters.
data(RLabels) # Real labels
data(Data2) # Methylation
data(Data3) # Gene expression
snf <- SimilarityNetworkFusion(Files = list(Data2, Data3),
                               NNeighbors = 13,
                               Sigma = 0.75,
                               NClusters = 4,
                               CLabels = c("Group4", "SHH", "WNT", "Group3"),
                               RLabels = RLabels,
                               Niterations = 60)
snf
#> [1] SHH Group3 Group4 Group4 Group4 SHH SHH Group3 Group4 SHH
#> [11] WNT SHH SHH WNT SHH WNT Group3 Group3 Group3 Group4
#> [21] Group4 Group3 Group3 Group3 Group4 Group4 Group4 Group3 Group3 SHH
#> [31] SHH SHH SHH SHH Group4 Group3 SHH Group4 Group4 Group3
#> [41] Group4 Group4 WNT Group3 Group4 Group4 Group4 Group4 SHH Group4
#> Levels: Group4 SHH WNT Group3
This function has the following arguments:
Files: A list of data frames created using the ReadSNFData function.
NNeighbors: The number of nearest neighbors.
Sigma: The variance for the local model.
NClusters: The number of clusters.
CLabels: A string vector to name the clusters. Optional.
RLabels: The actual labels of the samples, used to calculate the Normalized Mutual Information (NMI) score. Optional.
Niterations: The number of iterations for the diffusion process.
The SupportVectorMachineModel function trains a support vector machine
model to classify medulloblastoma subgroups using DNA methylation beta
values (Illumina Infinium HumanMethylation450). If new data is provided,
training is followed by prediction on that data.
Model metrics, including accuracy, precision, sensitivity, F1-score,
specificity, and AUC_average, can be calculated for the test dataset
using the ModelMetrics function, which averages these metrics over the
results of the ConfusionMatrix function.
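To show how such per-subgroup metrics relate to a confusion matrix, the following base-R sketch derives them from the first SVM fold printed later in this vignette. It uses the usual one-vs-rest definitions, which is an assumption about what the package computes; ModelMetrics additionally averages over folds and reports AUC_average, both of which are omitted here.

```r
# Confusion matrix of the first SVM fold shown in this vignette
# (rows: true subgroup, columns: predicted subgroup).
subgroups <- c("Group3", "SHH", "WNT", "Group4")
cm <- matrix(c(26,  0,  0,  0,
                0, 41,  0,  0,
                1,  0, 37,  0,
                1,  0,  0, 59),
             nrow = 4, byrow = TRUE,
             dimnames = list(y_true = subgroups, y_pred = subgroups))

# One-vs-rest counts for each subgroup.
tp <- diag(cm)               # correctly predicted as this subgroup
fn <- rowSums(cm) - tp       # this subgroup, predicted as another
fp <- colSums(cm) - tp       # another subgroup, predicted as this one
tn <- sum(cm) - tp - fn - fp # everything else

precision   <- tp / (tp + fp)
sensitivity <- tp / (tp + fn)

per_class <- data.frame(
  Accuracy    = (tp + tn) / sum(cm),
  Precision   = precision,
  Sensitivity = sensitivity,
  F1_Score    = 2 * precision * sensitivity / (precision + sensitivity),
  Specificity = tn / (tn + fp)
)
print(round(per_class, 3))
```

Because this covers a single fold, the numbers will be close to, but not identical with, the fold-averaged values that ModelMetrics reports.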
The prediction result on new data can be accessed through the
NewDataPredictionResult function, which returns, for each new sample,
the mode of its predictions across the cross-validation folds.
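Taking the mode across folds amounts to a majority vote per sample. The base-R sketch below illustrates the idea on invented predictions; the sample names, the labels, and the helper majority_label are hypothetical and are not taken from the package.

```r
# Hypothetical predictions: one row per new sample, one column per fold.
fold_predictions <- rbind(
  sample_A = c("WNT",    "WNT",    "SHH",    "WNT",    "WNT"),
  sample_B = c("Group3", "Group4", "Group4", "Group4", "Group3"),
  sample_C = c("SHH",    "SHH",    "SHH",    "SHH",    "SHH")
)

# Most frequent label in a vector. Ties go to the label that sorts first,
# because table() orders the labels and which.max() returns the first maximum.
majority_label <- function(labels) {
  counts <- table(labels)
  names(counts)[which.max(counts)]
}

# Apply the vote row by row to get one consensus subgroup per sample.
consensus <- data.frame(Subgroup = apply(fold_predictions, 1, majority_label))
print(consensus)
```

How the package breaks ties between equally frequent subgroups is not stated in this vignette, so the tie rule above is only one possible choice.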
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
svm <- SupportVectorMachineModel(SplitRatio = 0.8,
                                 CV = 10,
                                 NCores = 1,
                                 NewData = NewData)
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 41 0 0
#> WNT 1 0 37 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 28 0 0 0
#> SHH 0 42 0 0
#> WNT 1 0 36 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 0
#> SHH 0 43 0 0
#> WNT 1 0 35 0
#> Group4 1 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 39 0 0
#> WNT 0 0 38 0
#> Group4 1 0 0 57
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 42 0 0
#> WNT 1 0 37 0
#> Group4 1 0 0 56
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 0
#> SHH 0 40 0 0
#> WNT 1 0 37 0
#> Group4 2 0 0 52
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 24 0 0 0
#> SHH 0 40 0 0
#> WNT 1 0 38 0
#> Group4 1 0 0 55
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 0
#> SHH 0 42 0 0
#> WNT 1 0 34 0
#> Group4 0 0 0 61
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 1
#> SHH 0 43 0 0
#> WNT 1 0 39 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 42 0 0
#> WNT 1 0 38 0
#> Group4 1 0 0 57
ModelMetrics(Model = svm)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         1   0  37      0
#>   Group4      1   0   0     59
#>
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.988     0.932       0.996    0.963       0.986       0.985
#> SHH       1.000     1.000       1.000    1.000       1.000       0.985
#> WNT       0.995     1.000       0.976    0.988       1.000       0.985
#> Group4    0.993     0.998       0.983    0.990       0.999       0.985
NewDataPredictionResult(Model = svm)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
This function has the following arguments:
SplitRatio: Train and test split ratio. A value greater than or equal to zero and less than one.
CV: The number of folds for cross-validation. It should be greater than one.
NCores: The number of cores for parallel computing.
NewData: Methylation beta values input from the ReadMethylFile function.
The KNearestNeighborModel function trains a K-nearest neighbor model to
classify medulloblastoma subgroups using DNA methylation beta values.
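SplitRatio and CV appear in the argument lists of the classical model functions in this vignette. As a generic picture of what a split ratio and a fold count mean, the base-R sketch below holds out a test set and assigns the remaining samples to folds. This is not the package's internal procedure, which the vignette does not describe; the sample count and all variable names are assumptions.

```r
set.seed(1234)

n_samples   <- 100  # hypothetical number of labelled samples
split_ratio <- 0.8  # plays the role of SplitRatio
n_folds     <- 10   # plays the role of CV

# Hold-out split: a fraction split_ratio of the samples is used for training.
train_idx <- sample(n_samples, size = floor(split_ratio * n_samples))
test_idx  <- setdiff(seq_len(n_samples), train_idx)

# Assign every training sample to one of n_folds folds of near-equal size.
fold_id <- sample(rep(seq_len(n_folds), length.out = length(train_idx)))

print(c(train = length(train_idx), test = length(test_idx)))
print(table(fold_id))
```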
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
knn <- KNearestNeighborModel(SplitRatio = 0.8,
                             CV = 10,
                             K = 3,
                             NCores = 1,
                             NewData = NewData)
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 1
#> SHH 0 41 0 0
#> WNT 0 0 38 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 1
#> SHH 0 42 0 0
#> WNT 0 0 37 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 1
#> SHH 0 43 0 0
#> WNT 0 0 36 0
#> Group4 0 0 0 61
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 38 1 0
#> WNT 0 0 38 0
#> Group4 0 0 0 58
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 1
#> SHH 0 42 0 0
#> WNT 0 0 38 0
#> Group4 0 0 0 57
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 0
#> SHH 0 40 0 0
#> WNT 0 0 38 0
#> Group4 1 0 0 53
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 23 0 0 1
#> SHH 0 40 0 0
#> WNT 0 0 39 0
#> Group4 1 0 0 55
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 24 0 0 1
#> SHH 0 42 0 0
#> WNT 0 0 35 0
#> Group4 1 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 43 0 0
#> WNT 0 0 40 0
#> Group4 0 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 1
#> SHH 0 42 0 0
#> WNT 0 0 39 0
#> Group4 0 0 0 58
ModelMetrics(Model = knn)
#> $ConfusionMatrix
#>         knnclass_pred
#>          Group3 SHH WNT Group4
#>   Group3     25   0   0      1
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      1   0   0     59
#>
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.993     0.981       0.973    0.977       0.996       0.985
#> SHH       0.999     1.000       0.997    0.999       1.000       0.985
#> WNT       0.999     0.997       1.000    0.999       0.999       0.985
#> Group4    0.993     0.988       0.991    0.990       0.993       0.985
NewDataPredictionResult(Model = knn)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
This function has the following arguments:
SplitRatio: Train and test split ratio. A value greater than or equal to zero and less than one.
CV: The number of folds for cross-validation. It should be greater than one.
K: The number of nearest neighbors.
NCores: The number of cores for parallel computing.
NewData: Methylation beta values input from the ReadMethylFile function.
The RandomForestModel function trains a random forest model to classify
medulloblastoma subgroups using DNA methylation beta values.
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
rf <- RandomForestModel(SplitRatio = 0.8,
                        CV = 10,
                        NTree = 100,
                        NCores = 1,
                        NewData = NewData)
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 41 0 0
#> WNT 0 0 38 0
#> Group4 0 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 28 0 0 0
#> SHH 0 42 0 0
#> WNT 0 0 37 0
#> Group4 0 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 0
#> SHH 0 43 0 0
#> WNT 0 0 36 0
#> Group4 0 0 0 61
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 39 0 0
#> WNT 0 0 38 0
#> Group4 0 0 0 58
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 42 0 0
#> WNT 0 0 38 0
#> Group4 0 0 0 57
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 1
#> SHH 0 40 0 0
#> WNT 0 0 38 0
#> Group4 0 0 0 54
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 24 0 0 0
#> SHH 0 40 0 0
#> WNT 0 0 39 0
#> Group4 0 0 0 56
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 0
#> SHH 0 42 0 0
#> WNT 0 0 35 0
#> Group4 0 0 0 61
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 43 0 0
#> WNT 0 0 40 0
#> Group4 0 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 42 0 0
#> WNT 0 0 39 0
#> Group4 0 0 0 58
ModelMetrics(Model = rf)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      0   0   0     60
#>
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.999     1.000       0.996    0.998       1.000       0.998
#> SHH       1.000     1.000       1.000    1.000       1.000       0.998
#> WNT       1.000     1.000       1.000    1.000       1.000       0.998
#> Group4    0.999     0.998       1.000    0.999       0.999       0.998
NewDataPredictionResult(Model = rf)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
This function has the following arguments:
SplitRatio: Train and test split ratio. A value greater than or equal to zero and less than one.
CV: The number of folds for cross-validation. It should be greater than one.
NTree: The number of trees to be grown.
NCores: The number of cores for parallel computing.
NewData: Methylation beta values input from the ReadMethylFile function.
The XGBoostModel function trains an XGBoost model to classify
medulloblastoma subgroups using DNA methylation beta values.
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
xgboost <- XGBoostModel(SplitRatio = 0.8,
                        CV = 10,
                        NCores = 1,
                        NewData = NewData)
#> [1] train-mlogloss:0.390594
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177861
#> [3] train-mlogloss:0.087035
#> [4] train-mlogloss:0.043112
#> [5] train-mlogloss:0.022536
#> [6] train-mlogloss:0.012486
#> [7] train-mlogloss:0.007278
#> [8] train-mlogloss:0.004395
#> [9] train-mlogloss:0.002879
#> [10] train-mlogloss:0.002457
#> y_pred
#> y_true 0 1 2 3
#> 0 24 1 0 1
#> 1 2 58 0 0
#> 2 0 0 41 0
#> 3 3 0 0 35
#>
#> [1] train-mlogloss:0.388419
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177664
#> [3] train-mlogloss:0.085746
#> [4] train-mlogloss:0.043333
#> [5] train-mlogloss:0.022637
#> [6] train-mlogloss:0.012444
#> [7] train-mlogloss:0.007140
#> [8] train-mlogloss:0.004413
#> [9] train-mlogloss:0.002823
#> [10] train-mlogloss:0.002431
#> y_pred
#> y_true 0 1 2 3
#> 0 28 0 0 0
#> 1 0 60 0 0
#> 2 0 1 41 0
#> 3 3 0 0 34
#>
#> [1] train-mlogloss:0.388072
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.176992
#> [3] train-mlogloss:0.085394
#> [4] train-mlogloss:0.043119
#> [5] train-mlogloss:0.022323
#> [6] train-mlogloss:0.012245
#> [7] train-mlogloss:0.006953
#> [8] train-mlogloss:0.004304
#> [9] train-mlogloss:0.002808
#> [10] train-mlogloss:0.002544
#> y_pred
#> y_true 0 1 2 3
#> 0 27 0 0 0
#> 1 0 61 0 0
#> 2 0 1 42 0
#> 3 3 0 0 33
#>
#> [1] train-mlogloss:0.386945
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.175823
#> [3] train-mlogloss:0.085418
#> [4] train-mlogloss:0.042969
#> [5] train-mlogloss:0.022146
#> [6] train-mlogloss:0.012049
#> [7] train-mlogloss:0.006975
#> [8] train-mlogloss:0.004246
#> [9] train-mlogloss:0.002766
#> [10] train-mlogloss:0.002319
#> y_pred
#> y_true 0 1 2 3
#> 0 25 1 0 0
#> 1 0 58 0 0
#> 2 0 0 39 0
#> 3 1 0 0 37
#>
#> [1] train-mlogloss:0.387957
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177210
#> [3] train-mlogloss:0.085601
#> [4] train-mlogloss:0.043317
#> [5] train-mlogloss:0.022903
#> [6] train-mlogloss:0.012530
#> [7] train-mlogloss:0.007282
#> [8] train-mlogloss:0.004478
#> [9] train-mlogloss:0.002934
#> [10] train-mlogloss:0.002514
#> y_pred
#> y_true 0 1 2 3
#> 0 26 0 0 0
#> 1 2 55 0 0
#> 2 0 0 42 0
#> 3 2 0 0 36
#>
#> [1] train-mlogloss:0.390082
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.177320
#> [3] train-mlogloss:0.085780
#> [4] train-mlogloss:0.043300
#> [5] train-mlogloss:0.022592
#> [6] train-mlogloss:0.012513
#> [7] train-mlogloss:0.007264
#> [8] train-mlogloss:0.004434
#> [9] train-mlogloss:0.002923
#> [10] train-mlogloss:0.002552
#> y_pred
#> y_true 0 1 2 3
#> 0 27 0 0 0
#> 1 1 53 0 0
#> 2 0 0 39 1
#> 3 3 0 0 35
#>
#> [1] train-mlogloss:0.391327
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.178573
#> [3] train-mlogloss:0.086585
#> [4] train-mlogloss:0.043456
#> [5] train-mlogloss:0.022623
#> [6] train-mlogloss:0.012235
#> [7] train-mlogloss:0.007101
#> [8] train-mlogloss:0.004310
#> [9] train-mlogloss:0.002876
#> [10] train-mlogloss:0.002484
#> y_pred
#> y_true 0 1 2 3
#> 0 24 0 0 0
#> 1 0 56 0 0
#> 2 0 0 40 0
#> 3 3 0 0 36
#>
#> [1] train-mlogloss:0.387270
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.176343
#> [3] train-mlogloss:0.085810
#> [4] train-mlogloss:0.042979
#> [5] train-mlogloss:0.022061
#> [6] train-mlogloss:0.011726
#> [7] train-mlogloss:0.006691
#> [8] train-mlogloss:0.004024
#> [9] train-mlogloss:0.002633
#> [10] train-mlogloss:0.002421
#> y_pred
#> y_true 0 1 2 3
#> 0 25 0 0 0
#> 1 0 61 0 0
#> 2 0 0 41 1
#> 3 2 0 0 33
#>
#> [1] train-mlogloss:0.385785
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.175053
#> [3] train-mlogloss:0.084670
#> [4] train-mlogloss:0.042764
#> [5] train-mlogloss:0.022229
#> [6] train-mlogloss:0.011936
#> [7] train-mlogloss:0.006921
#> [8] train-mlogloss:0.004281
#> [9] train-mlogloss:0.002846
#> [10] train-mlogloss:0.002425
#> y_pred
#> y_true 0 1 2 3
#> 0 25 1 0 0
#> 1 0 60 0 0
#> 2 0 0 43 0
#> 3 2 0 0 38
#>
#> [1] train-mlogloss:0.388743
#> Will train until train_mlogloss hasn't improved in 10 rounds.
#>
#> [2] train-mlogloss:0.176686
#> [3] train-mlogloss:0.086097
#> [4] train-mlogloss:0.043017
#> [5] train-mlogloss:0.022624
#> [6] train-mlogloss:0.012143
#> [7] train-mlogloss:0.007023
#> [8] train-mlogloss:0.004200
#> [9] train-mlogloss:0.002729
#> [10] train-mlogloss:0.002506
#> y_pred
#> y_true 0 1 2 3
#> 0 25 1 0 0
#> 1 0 58 0 0
#> 2 0 0 42 0
#> 3 3 0 0 36
ModelMetrics(Model = xgboost)
#> $ConfusionMatrix
#>         y_pred
#> y_truth  Group3 Group4 SHH WNT
#>   Group3     24      1   0   1
#>   Group4      2     58   0   0
#>   SHH         0      0  41   0
#>   WNT         3      0   0  35
#>
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.979     0.896       0.981    0.936       0.978       0.968
#> Group4    0.993     0.990       0.991    0.991       0.994       0.968
#> SHH       0.998     1.000       0.990    0.995       1.000       0.968
#> WNT       0.983     0.992       0.934    0.962       0.998       0.968
NewDataPredictionResult(Model = xgboost)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
This function has the following arguments:
SplitRatio: Train and test split ratio. A value greater than or equal to zero and less than one.
CV: The number of folds for cross-validation. It should be greater than one.
NCores: The number of cores for parallel computing.
NewData: Methylation beta values input from the ReadMethylFile function.
The LinearDiscriminantAnalysisModel function trains a linear
discriminant analysis model to classify medulloblastoma subgroups using
DNA methylation beta values.
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
lda <- LinearDiscriminantAnalysisModel(SplitRatio = 0.8,
                                       CV = 10,
                                       NCores = 1,
                                       NewData = NewData)
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 22 1 0 3
#> SHH 0 41 0 0
#> WNT 0 0 38 0
#> Group4 4 1 0 55
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 2
#> SHH 0 42 0 0
#> WNT 1 0 36 0
#> Group4 6 0 0 54
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 22 0 1 4
#> SHH 0 43 0 0
#> WNT 1 0 35 0
#> Group4 7 0 1 53
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 24 0 1 1
#> SHH 0 38 0 1
#> WNT 0 0 38 0
#> Group4 9 0 0 49
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 1
#> SHH 0 42 0 0
#> WNT 0 0 37 1
#> Group4 2 0 0 55
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 24 0 0 3
#> SHH 1 38 0 1
#> WNT 0 0 38 0
#> Group4 3 0 0 51
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 22 0 0 2
#> SHH 1 39 0 0
#> WNT 1 0 38 0
#> Group4 6 0 0 50
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 23 0 0 2
#> SHH 0 41 0 1
#> WNT 1 0 34 0
#> Group4 11 0 1 49
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 22 1 0 3
#> SHH 0 42 0 1
#> WNT 1 0 39 0
#> Group4 6 0 0 54
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 22 1 1 2
#> SHH 0 42 0 0
#> WNT 1 0 38 0
#> Group4 6 0 0 52
ModelMetrics(Model = lda)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     22   1   0      3
#>   SHH         0  41   0      0
#>   WNT         0   0  38      0
#>   Group4      4   1   0     55
#>
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.941     0.778       0.889    0.828       0.951        0.91
#> SHH       0.994     0.991       0.985    0.988       0.997        0.91
#> WNT       0.993     0.986       0.981    0.984       0.996        0.91
#> Group4    0.945     0.949       0.893    0.920       0.973        0.91
NewDataPredictionResult(Model = lda)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
This function has the following arguments:
SplitRatio: Train and test split ratio. A value greater than or equal to zero and less than one.
CV: The number of folds for cross-validation. It should be greater than one.
NCores: The number of cores for parallel computing.
NewData: Methylation beta values input from the ReadMethylFile function.
The NaiveBayesModel function trains a Naive Bayes model to classify
medulloblastoma subgroups using DNA methylation beta values.
set.seed(1234)
fac <- ncol(Data1)
NewData <- sample(data.frame(t(Data1[,-fac])),10)
NewData <- cbind(rownames(NewData), NewData)
colnames(NewData)[1] <- "ID"
nb <- NaiveBayesModel(SplitRatio = 0.8,
                      CV = 10,
                      Threshold = 0.8,
                      NCores = 1,
                      NewData = NewData)
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 41 0 0
#> WNT 3 0 35 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 28 0 0 0
#> SHH 0 42 0 0
#> WNT 3 0 34 0
#> Group4 1 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 0
#> SHH 0 43 0 0
#> WNT 3 0 33 0
#> Group4 2 0 0 59
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 39 0 0
#> WNT 1 0 37 0
#> Group4 2 0 0 56
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 42 0 0
#> WNT 2 0 36 0
#> Group4 2 0 0 55
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 27 0 0 0
#> SHH 0 40 0 0
#> WNT 3 0 35 0
#> Group4 2 0 0 52
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 24 0 0 0
#> SHH 0 40 0 0
#> WNT 3 0 36 0
#> Group4 2 0 0 54
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 25 0 0 0
#> SHH 0 42 0 0
#> WNT 3 0 32 0
#> Group4 1 0 0 60
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 43 0 0
#> WNT 3 0 37 0
#> Group4 2 0 0 58
#>
#> y_pred
#> y_true Group3 SHH WNT Group4
#> Group3 26 0 0 0
#> SHH 0 42 0 0
#> WNT 3 0 36 0
#> Group4 1 0 0 57
ModelMetrics(Model = nb)
#> $ConfusionMatrix
#>         y_pred
#>          Group3 SHH WNT Group4
#>   Group3     26   0   0      0
#>   SHH         0  41   0      0
#>   WNT         3   0  35      0
#>   Group4      1   0   0     59
#>
#> $ModelPerformance
#>        Accuracy Precision Sensitivity F1_Score Specificity AUC_average
#> Group3    0.974     0.859       1.000    0.924       0.969       0.971
#> SHH       1.000     1.000       1.000    1.000       1.000       0.971
#> WNT       0.984     1.000       0.928    0.963       1.000       0.971
#> Group4    0.990     1.000       0.972    0.986       1.000       0.971
NewDataPredictionResult(Model = nb)
#> Subgroup
#> GSM2261711 Group3
#> X78 WNT
#> GSM2261640 Group4
#> GSM2261575 Group4
#> X135 WNT
#> GSM2262184 Group3
#> GSM2261613 Group3
#> X130 WNT
#> GSM2261922 Group4
#> GSM2261980 Group3
This function has the following arguments:
SplitRatio: Train and test split ratio. A value greater than or equal to zero and less than one.
CV: The number of folds for cross-validation. It should be greater than one.
Threshold: The threshold for deciding class probability. A value greater than or equal to zero and less than one.
NCores: The number of cores for parallel computing.
NewData: Methylation beta values input from the ReadMethylFile function.
The NeuralNetworkModel function trains an artificial neural network
model to classify medulloblastoma subgroups using DNA methylation beta
values. Please uncomment the following lines and run the function. The
first time you run this function, set the InstallTensorFlow parameter to
TRUE; this automatically installs Python and the TensorFlow library
(version 2.10-cpu) in a virtual environment. On subsequent runs, set the
parameter to FALSE.
# set.seed(1234)
# fac <- ncol(Data1)
# NewData <- sample(data.frame(t(Data1[,-fac])),10)
# NewData <- cbind(rownames(NewData), NewData)
# colnames(NewData)[1] <- "ID"
# ann <- NeuralNetworkModel(Epochs = 100,
#                           NewData = NewData,
#                           InstallTensorFlow = TRUE)
# ModelMetrics(Model = ann)
# NewDataPredictionResult(Model = ann)
This function has the following arguments:
Epochs: The number of epochs.
NewData: Methylation beta values input from the ReadMethylFile function.
InstallTensorFlow: Logical. The first time you run this function, the TensorFlow library (version 2.10-cpu) must be installed. Default is TRUE.