manymodelr (Development Version 0.2.2.9000)

Tune and build several Machine Learning models.

  • Installing the package

  • From CRAN (0.2.2)

install.packages("manymodelr")

  • From GitHub

# Either of the following installs the package from GitHub
remotes::install_github("Nelson-Gon/manymodelr")
devtools::install_github("Nelson-Gon/manymodelr")
# Same as above, but also builds the vignettes
devtools::install_github("Nelson-Gon/manymodelr", build_vignettes = TRUE)

For the current (unstable) development version, please see the develop branch.

  • Loading the package

library(manymodelr)
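
If the package may not be installed yet, loading it can be guarded. The following is a minimal sketch; `is_installed` is a hypothetical helper written for illustration and is not part of manymodelr.

```r
# Hypothetical helper (not part of manymodelr): TRUE if `pkg` can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("manymodelr")) {
  library(manymodelr)
} else {
  message("manymodelr is not installed; see the installation steps above.")
}
```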

Example usage of major functions

  1. multi_model_1
suppressMessages(library(caret))
set.seed(520)
# Hold out 20% of iris for validation
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
valid_set <- iris[-train_idx, ]
train_set <- iris[train_idx, ]
# 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Train knn and rpart, then predict on the validation set
m <- multi_model_1(train_set, "Species", ".", c("knn", "rpart"),
                   "Accuracy", ctrl, newdata = valid_set, valid = TRUE)

In the above, we trained both models and also obtained predictions on the validation set.

Results:

To get the metrics for all our models, we can proceed as follows:

m$Metrics
# A tibble: 1 x 2
    knn rpart
  <dbl> <dbl>
1 0.933 0.967

To obtain the predicted values (on the validation set in this case):

head(m$Predictions)
# A tibble: 6 x 2
  knn    rpart 
  <fct>  <fct> 
1 setosa setosa
2 setosa setosa
3 setosa setosa
4 setosa setosa
5 setosa setosa
6 setosa setosa

One can also get all the corresponding model statistics as follows:

m$modelInfo
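
To see how an accuracy figure relates to a table of predictions, the metric can be recomputed by hand. The sketch below uses small made-up vectors (`actual` and `predictions` are hypothetical stand-ins, not output from multi_model_1) and plain base R; it is not the package's own implementation.

```r
# Toy stand-ins (hypothetical values), not output from multi_model_1.
actual <- c("setosa", "versicolor", "virginica", "virginica")
predictions <- data.frame(
  knn   = c("setosa", "versicolor", "virginica", "versicolor"),
  rpart = c("setosa", "versicolor", "virginica", "virginica"),
  stringsAsFactors = FALSE
)

# Share of rows where each model's prediction equals the true label.
accuracy <- sapply(predictions, function(p) mean(p == actual))
accuracy
```

With real results, `actual` would be `valid_set$Species` and `predictions` would be `m$Predictions`, assuming the predictions are returned in the same row order as the validation set.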

  2. modeleR

This provides a convenient way to build linear models and generalised linear models, and to carry out analysis of variance (currently). Example usage is shown below:


iris1 <- iris[1:60, ]
iris2 <- iris[60:nrow(iris), ]
m1 <- modeleR(iris1, Sepal.Length, Petal.Length,
              lm, na.rm = TRUE, iris2)

We can get the predicted values as shown below:

head(m1$Predictions)
 Predicted
60  5.985141
61  5.821972
62  6.107518
63  6.025933
64  6.311478
65  5.862764
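
For comparison, a roughly equivalent workflow in base R is sketched below. It assumes that the call above fits Sepal.Length on Petal.Length using iris1 and predicts for iris2; that reading is an assumption based on the argument order, not a description of the package internals.

```r
iris1 <- iris[1:60, ]
iris2 <- iris[60:nrow(iris), ]

# Fit Sepal.Length on Petal.Length using the first 60 rows.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris1)

# Predict for the remaining rows; names follow the row names of iris2.
predicted <- predict(fit, newdata = iris2)
head(predicted)
```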

  3. get_var_corr

As can probably (hopefully) be guessed from the name, this provides a convenient way to get variable correlations. It returns the correlation between one variable and all other variables in the data set if get_all is set to TRUE, or with specific variables only if get_all is set to FALSE.

Sample usage:

corrs <- get_var_corr(mtcars, comparison_var = "mpg",
                      get_all = TRUE)

The result is as follows (Pearson correlation by default):


head(corrs)

Comparison_Var Other_Var      p_value Correlation    lower_ci
1            mpg       cyl 6.112687e-10  -0.8521620 -0.92576936
2            mpg      disp 9.380327e-10  -0.8475514 -0.92335937
3            mpg        hp 1.787835e-07  -0.7761684 -0.88526861
4            mpg      drat 1.776240e-05   0.6811719  0.43604838
5            mpg        wt 1.293959e-10  -0.8676594 -0.93382641
6            mpg      qsec 1.708199e-02   0.4186840  0.08195487
    upper_ci
1 -0.7163171
2 -0.7081376
3 -0.5860994
4  0.8322010
5 -0.7440872
6  0.6696186
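
The columns in this output line up with what base R's `cor.test` reports. As a rough sketch of the same idea (not the package's implementation), one can loop over the other columns and collect the estimate, p-value, and confidence interval; `manual_corrs` is a name chosen here for illustration.

```r
comparison_var <- "mpg"
other_vars <- setdiff(names(mtcars), comparison_var)

# Run a Pearson correlation test of mpg against every other column.
rows <- lapply(other_vars, function(v) {
  test <- cor.test(mtcars[[comparison_var]], mtcars[[v]], method = "pearson")
  data.frame(
    Comparison_Var = comparison_var,
    Other_Var      = v,
    p_value        = test$p.value,
    Correlation    = unname(test$estimate),
    lower_ci       = test$conf.int[1],
    upper_ci       = test$conf.int[2]
  )
})
manual_corrs <- do.call(rbind, rows)
head(manual_corrs)
```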


  4. get_var_corr_

A closely related function is get_var_corr_ (note the trailing underscore), which gives finer control over which correlations to obtain and can perform combination-wise correlations. To get correlations for mpg and vs "against" cyl and disp, one could do:

head(get_var_corr_(mtcars, comparison_var = c("mpg", "vs"), other_var = c("cyl", "disp"), method = "kendall"))

The above gives us the following (Kendall is used strictly for demonstration purposes):


 Comparison_Var Other_Var      p.value Correlation    lower_ci
1            mpg       cyl 6.112687e-10  -0.8521620 -0.92576936
2            mpg      disp 9.380327e-10  -0.8475514 -0.92335937
3            mpg        hp 1.787835e-07  -0.7761684 -0.88526861
4            mpg      drat 1.776240e-05   0.6811719  0.43604838
5            mpg        wt 1.293959e-10  -0.8676594 -0.93382641
6            mpg      qsec 1.708199e-02   0.4186840  0.08195487
    upper_ci
1 -0.7163171
2 -0.7081376
3 -0.5860994
4  0.8322010
5 -0.7440872
6  0.6696186
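
Combination-wise correlation can be pictured as testing every pairing of the two variable sets. The base R sketch below illustrates that idea with Kendall's tau; it is an assumption about the general approach, not the package's code, and `combos` is a name introduced here.

```r
# Every pairing of a comparison variable with an "other" variable.
combos <- expand.grid(
  Comparison_Var = c("mpg", "vs"),
  Other_Var      = c("cyl", "disp"),
  stringsAsFactors = FALSE
)

# Kendall's tau for each pair; exact = FALSE avoids warnings about ties.
combos$Correlation <- unname(mapply(
  function(x, y) {
    cor.test(mtcars[[x]], mtcars[[y]],
             method = "kendall", exact = FALSE)$estimate
  },
  combos$Comparison_Var, combos$Other_Var
))
combos
```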

  5. rowdiff

If one needs to obtain differences between rows, rowdiff is designed to do exactly that.

head(rowdiff(iris,direction="reverse", exclude="non_numeric"))

This gives us the following result:

Sepal.Length Sepal.Width Petal.Length Petal.Width
1           NA          NA           NA          NA
2         -0.2        -0.5          0.0         0.0
3         -0.2         0.2         -0.1         0.0
4         -0.1        -0.1          0.2         0.0
5          0.4         0.5         -0.1         0.0
6          0.4         0.3          0.3         0.2

The NAs can be dealt with as necessary. An NA serves to show the direction in which the differences were computed. See the documentation for more details.
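
The result above is consistent with subtracting the previous row from each row. A minimal base R sketch of that computation follows; it is an illustration of the idea rather than the package's implementation, and `diffs` is a name chosen here.

```r
# Keep only the numeric columns of iris.
numeric_cols <- iris[sapply(iris, is.numeric)]

# Current row minus previous row; the first row has no predecessor, hence NA.
diffs <- as.data.frame(lapply(numeric_cols, function(x) c(NA, diff(x))))
head(diffs)
```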

Space constraints prevent a detailed exploration of the package here. A more thorough walkthrough is provided in the vignettes, which can be opened as shown below:

browseVignettes("manymodelr")

For previous users, please see the NEWS.md file for a list of changes and/or additions. For a complete list of available functions, please use:


help(package="manymodelr")

Thank You and Happy Coding!