Development Version 0.2.5.9000

Installing the package
From CRAN (0.2.4)

install.packages("manymodelr")

From GitHub

# development version (more stable)
remotes::install_github("Nelson-Gon/manymodelr")
# or, equivalently, with devtools
devtools::install_github("Nelson-Gon/manymodelr")
# same install, but also build the vignettes
devtools::install_github("Nelson-Gon/manymodelr", build_vignettes = TRUE)

For the current (unstable) developer version, please see the develop branch.

To install the less stable dev version:

devtools::install_github("Nelson-Gon/manymodelr@develop")

For previous users, please see the NEWS.md file for a list of changes and/or additions.

Loading the package


library(manymodelr)

Example usage of major functions

multi_model_1

suppressMessages(library(caret))
set.seed(520)
# hold out 20% of the rows for validation
train_set <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
valid_set <- iris[-train_set, ]
train_set <- iris[train_set, ]
# 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

m <- multi_model_1(train_set, "Species", ".", c("knn", "rpart"), "Accuracy", ctrl, newdata = valid_set)

The call above trains both models and also returns their predictions on the validation set.
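For readers without caret at hand, the stratified 80/20 split used above can be approximated in base R. This is only a sketch of the idea and not caret's algorithm; `split_idx`, `train_sketch` and `valid_sketch` are hypothetical names, and the rows selected will differ from those chosen by createDataPartition.

```r
# Stratified 80/20 split in base R (illustrative; not caret's implementation).
set.seed(520)

# Row indices grouped by species
rows_by_species <- split(seq_len(nrow(iris)), iris$Species)

# Sample 80% of the row indices within each species
split_idx <- unlist(lapply(rows_by_species, function(rows) {
  sample(rows, size = round(0.8 * length(rows)))
}))

train_sketch <- iris[split_idx, ]
valid_sketch <- iris[-split_idx, ]

# Each species contributes the same number of training rows
table(train_sketch$Species)
```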

Results:

To get the metrics for all our models, we can proceed as follows:

m$Metrics
# A tibble: 1 x 2
    knn rpart
  <dbl> <dbl>
1     1 0.933
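As a reminder of what the Accuracy metric measures, it is the share of cases a model classifies correctly. A minimal sketch on made-up labels (the vectors `truth` and `pred` are invented for illustration and are not package output):

```r
# Accuracy = proportion of predictions that match the true labels.
truth <- c("setosa", "setosa", "versicolor", "virginica")
pred  <- c("setosa", "versicolor", "versicolor", "virginica")

accuracy <- mean(pred == truth)
accuracy
```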

To obtain the predicted values (validation in this case):

head(m$Predictions)
# A tibble: 6 x 2
  knn    rpart 
  <fct>  <fct> 
1 setosa setosa
2 setosa setosa
3 setosa setosa
4 setosa setosa
5 setosa setosa
6 setosa setosa

One can also get all the corresponding model statistics as follows:

m$modelInfo

A related function, multi_model_2, fits a model and makes predictions in a single call. It builds on top of other modeling functions, meaning that it can work with any model from any package. For demonstration purposes, only linear models are shown.

# fit a linear model and get predictions
head(multi_model_2(iris[1:50,],iris[50:99,],"Sepal.Length","Petal.Length","lm"))
 
 Sepal.Length Sepal.Width Petal.Length Petal.Width Species predicted
1          5.1         3.5          1.4         0.2  setosa  4.972378
2          4.9         3.0          1.4         0.2  setosa  6.761943
3          4.7         3.2          1.3         0.2  setosa  6.653485
4          4.6         3.1          1.5         0.2  setosa  6.870402
5          5.0         3.6          1.4         0.2  setosa  6.382339
6          5.4         3.9          1.7         0.4  setosa  6.707714

We can also fit a multilinear model as shown below:

head(multi_model_2(iris[1:50,],iris[50:99,],"Sepal.Length","Petal.Length + Sepal.Width","lm"))
    
 Sepal.Length Sepal.Width Petal.Length Petal.Width Species predicted
1          5.1         3.5          1.4         0.2  setosa  4.902999
2          4.9         3.0          1.4         0.2  setosa  5.771541
3          4.7         3.2          1.3         0.2  setosa  5.714857
4          4.6         3.1          1.5         0.2  setosa  5.761483
5          5.0         3.6          1.4         0.2  setosa  4.972473
6          5.4         3.9          1.7         0.4  setosa  5.476232

As stated, the function can work with any model type and any package. However, it relies on a few other functions that have only been tested on models built with lm, glm, aov, and lme4.
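The general fit-then-predict pattern described in this section can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `old_data`, `new_data`, `form` and `fit` are hypothetical names, and the predictions are attached to the new data here for clarity.

```r
# Fit on one data frame, then predict on another (base R only).
old_data <- iris[1:50, ]
new_data <- iris[51:100, ]

# Build the formula from strings, as the package's string-based interface suggests
form <- reformulate("Petal.Length", response = "Sepal.Length")

fit <- lm(form, data = old_data)

# Attach predictions for the new observations
new_data$predicted <- predict(fit, newdata = new_data)
head(new_data)
```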

fit_model

What if I just want to fit a model and not predict? fit_model does just that. It provides a user-friendly, readable syntax and, again, can build any model type. Example usage is shown below:


# Make some data
iris1 <- iris[1:50, ]
iris2 <- iris[51:100, ]
lm_model <- fit_model(iris1, "Sepal.Width", "Sepal.Length", "lm")
aov_model <- fit_model(iris, "Sepal.Width", "Species", "aov")

# Multilinear
mlm_model <- fit_model(iris1, "Sepal.Width", "Sepal.Length + Petal.Length", "lm")

# glm
glm_model <- fit_model(iris1, "Sepal.Width", "Sepal.Length", "glm")
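To make the string-based interface above more concrete, here is one way such a wrapper could be written in base R. `fit_sketch` is a hypothetical helper invented for this illustration; it is not fit_model itself, and the package's actual implementation may differ.

```r
# A hypothetical string-based model wrapper (not the package's code).
fit_sketch <- function(df, yname, xname, model_type, ...) {
  # Turn "y" and "x1 + x2" into the formula y ~ x1 + x2
  form <- as.formula(paste(yname, "~", xname))
  # Look up the modeling function by name, e.g. "lm", "glm" or "aov"
  model_fun <- match.fun(model_type)
  model_fun(formula = form, data = df, ...)
}

sketch_lm  <- fit_sketch(iris[1:50, ], "Sepal.Width", "Sepal.Length", "lm")
sketch_mlm <- fit_sketch(iris[1:50, ], "Sepal.Width", "Sepal.Length + Petal.Length", "lm")
coef(sketch_lm)
```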

To extract model attributes from the above models, we can use extract_model_info as shown below:

# extract coefficients
extract_model_info(lm_model, "coeffs")

               Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  -0.5694327  0.5217119 -1.091470 2.805148e-01
Sepal.Length  0.7985283  0.1039651  7.680738 6.709843e-10

extract_model_info(lm_model, "p_value")

 (Intercept) Sepal.Length 
2.805148e-01 6.709843e-10 

# get predictors
extract_model_info(lm_model,"predictors")

Sepal.Length

# get response variable
extract_model_info(lm_model,"response")
Sepal.Width

# glm
extract_model_info(glm_model, "aic")

[1] 9.800332

To get several values:

extract_model_info(lm_model,c("p_value","coeffs","predictors"))

$p_value
 (Intercept) Sepal.Length 
2.805148e-01 6.709843e-10 

$coeffs
               Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  -0.5694327  0.5217119 -1.091470 2.805148e-01
Sepal.Length  0.7985283  0.1039651  7.680738 6.709843e-10

$predictors
Sepal.Length

To see the currently supported model types, please see help(extract_model_info). To request support for a given model, please file an issue on the project's GitHub issue tracker. Exploration of other available values is left to the user.
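For comparison, the same pieces of information can be pulled out of an lm object with base R accessors. This sketch only shows the base R route and makes no claim about how extract_model_info obtains them; the object names are chosen for illustration.

```r
# Base R equivalents for the attributes extracted above.
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris[1:50, ])

coefs      <- summary(fit)$coefficients          # coefficient table
p_values   <- coefs[, "Pr(>|t|)"]                # p-values only
predictors <- attr(terms(fit), "term.labels")    # predictor names
response   <- all.vars(formula(fit))[1]          # response name

coefs
```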

add_model_residuals and add_model_predictions

To add predictions or residuals to a data set, we can use add_model_predictions and add_model_residuals respectively.


head(add_model_predictions(lm_model, iris1, iris2))

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species predicted
1          5.1         3.5          1.4         0.2  setosa  5.020265
2          4.9         3.0          1.4         0.2  setosa  4.541148
3          4.7         3.2          1.3         0.2  setosa  4.940413
4          4.6         3.1          1.5         0.2  setosa  3.822473
5          5.0         3.6          1.4         0.2  setosa  4.621001
6          5.4         3.9          1.7         0.4  setosa  3.982179


head(add_model_residuals(lm_model, iris1))

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   residuals
1          5.1         3.5          1.4         0.2  setosa -0.00306166
2          4.9         3.0          1.4         0.2  setosa -0.34335600
3          4.7         3.2          1.3         0.2  setosa  0.01634966
4          4.6         3.1          1.5         0.2  setosa -0.00379751
5          5.0         3.6          1.4         0.2  setosa  0.17679117
6          5.4         3.9          1.7         0.4  setosa  0.15737985

# dplyr compatible
library(dplyr) # provides the %>% pipe
iris1 %>%
  add_model_predictions(model = lm_model, new_data = iris2) %>%
  head()

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species predicted
1          5.1         3.5          1.4         0.2  setosa  5.020265
2          4.9         3.0          1.4         0.2  setosa  4.541148
3          4.7         3.2          1.3         0.2  setosa  4.940413
4          4.6         3.1          1.5         0.2  setosa  3.822473
5          5.0         3.6          1.4         0.2  setosa  4.621001
6          5.4         3.9          1.7         0.4  setosa  3.982179
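The outputs above can be reproduced with base R, which may help clarify what the two helpers add. This is a sketch rather than the package's code; note that, as in the output shown, the predictions are computed on the second data set and attached to the first.

```r
# Base R sketch: attach residuals and predictions as new columns.
iris1 <- iris[1:50, ]
iris2 <- iris[51:100, ]
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris1)

# Residuals of the fitted model, one per row of iris1
with_residuals <- cbind(iris1, residuals = unname(residuals(fit)))

# Predictions for iris2, attached to iris1 as in the output above
with_predictions <- cbind(iris1, predicted = unname(predict(fit, newdata = iris2)))

head(with_residuals)
head(with_predictions)
```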

get_var_corr

As the name suggests, this function provides a convenient way to get variable correlations. It computes the correlation between one variable and all other variables in the data set.

Previously, one would set get_all to TRUE if they wanted to get correlations between all variables. This argument has been dropped in favor of simply supplying an optional other_vars vector if one does not want to get all correlations.

Sample usage:


# get all correlations
corrs <- get_var_corr(mtcars,comparison_var="mpg")

The result is as follows (default method: pearson):


head(corrs)

    comparison_Var other_Var      p_value  correlation    lower_ci   upper_ci
1            mpg       cyl   6.112687e-10  -0.8521620 -0.92576936 -0.7163171
2            mpg      disp   9.380327e-10  -0.8475514 -0.92335937 -0.7081376
3            mpg        hp   1.787835e-07  -0.7761684 -0.88526861 -0.5860994
4            mpg      drat   1.776240e-05   0.6811719  0.43604838  0.8322010
5            mpg        wt   1.293959e-10  -0.8676594 -0.93382641 -0.7440872
6            mpg      qsec   1.708199e-02   0.4186840  0.08195487  0.6696186
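A table of this shape can be built by hand with stats::cor.test, which is a useful way to check individual values. The sketch below is an assumption about the general approach, not the package's implementation; `corr_sketch` and the column names used here are chosen for illustration.

```r
# Correlate one variable with every other column using cor.test (base R).
comparison <- "mpg"
others <- setdiff(names(mtcars), comparison)

corr_sketch <- do.call(rbind, lapply(others, function(v) {
  ct <- cor.test(mtcars[[comparison]], mtcars[[v]], method = "pearson")
  data.frame(
    comparison_var = comparison,
    other_var = v,
    p.value = ct$p.value,
    correlation = unname(ct$estimate),
    lower_ci = ct$conf.int[1],
    upper_ci = ct$conf.int[2],
    stringsAsFactors = FALSE
  )
}))

head(corr_sketch)
```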

Previously, one would also set drop_columns to TRUE if they wanted to drop factor columns. Now, a user simply provides a character vector specifying which column types (classes) should be dropped. It defaults to c("character","factor").

# purely demonstrative
get_var_corr(iris, "Sepal.Length", other_vars = "Petal.Length", drop_columns = c("factor", "character"), method = "spearman", exact = FALSE)

#  comparison_Var    other_var      p.value   correlation
# 1   Sepal.Length Petal.Length 3.443087e-50   0.8818981

A closely related function is get_var_corr_ (note the trailing underscore), which gives finer control over which correlations to obtain and can compute correlations for every pairwise combination of variables.

head(get_var_corr_(mtcars, method="spearman", exact=FALSE))

   comparison_var  other_var      p.value correlation
1            mpg       cyl 4.690287e-13  -0.9108013
2            mpg      disp 6.370336e-13  -0.9088824
3            mpg        hp 5.085969e-12  -0.8946646
4            mpg      drat 5.381347e-05   0.6514555
5            mpg        wt 1.487595e-11  -0.8864220
6            mpg      qsec 7.055765e-03   0.4669358

To use only a few columns, we specify a list of columns in subset_cols:


head(get_var_corr_(mtcars, method="spearman", exact=FALSE, subset_cols=list(c("mpg","disp"),  c("wt","drat"))))
  
      comparison_Var  other_var      p.value  correlation
4             mpg      drat     5.381347e-05   0.6514555
5             mpg        wt    1.487595e-11  -0.8864220
21           disp      drat    1.613884e-05  -0.6835921
22           disp        wt    3.346362e-12   0.8977064
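The combination-wise idea can be sketched in base R with combn, which enumerates every pair of columns. This is an illustration only and not the package's implementation; `all_pairs` and `selected` are hypothetical names. The final filter mimics the subset_cols example above by keeping pairs whose first variable is mpg or disp and whose second is wt or drat.

```r
# Pairwise Spearman correlations for every combination of columns (base R).
pairs <- combn(names(mtcars), 2, simplify = FALSE)

all_pairs <- do.call(rbind, lapply(pairs, function(p) {
  ct <- cor.test(mtcars[[p[1]]], mtcars[[p[2]]],
                 method = "spearman", exact = FALSE)
  data.frame(
    comparison_var = p[1],
    other_var = p[2],
    p.value = ct$p.value,
    correlation = unname(ct$estimate),
    stringsAsFactors = FALSE
  )
}))

# Keep only the pairs of interest
selected <- all_pairs[all_pairs$comparison_var %in% c("mpg", "disp") &
                        all_pairs$other_var %in% c("wt", "drat"), ]
selected
```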

To plot the above, one can use plot_corr as shown below:


plot_corr(mtcars, show_which = "corr",
          round_which = "correlation", decimals = 2,
          x = "other_var", y = "comparison_var",
          plot_style = "squares", width = 1.1,
          custom_cols = c("green", "blue", "red"),
          colour_by = "correlation")

To show significance instead (i.e. based on p-values), set show_which to "signif". The default, show_which = "corr", displays correlations.

plot_corr(mtcars, x="other_var", y="comparison_var",
          plot_style = "squares",show_which = "signif",
          colour_by = "p.value", custom_cols = c("indianred4","skyblue","red"))

You can explore more options via help(plot_corr) or ?plot_corr. Since the function uses a ggplot2 backend, one can change themes by adding theme components to the plot.

rowdiff

If one needs to obtain differences between rows, rowdiff is designed to do exactly that.


head(rowdiff(iris,direction="reverse", exclude="factor"))

This gives us the following result:


  Sepal.Length Sepal.Width Petal.Length Petal.Width
1           NA          NA           NA          NA
2         -0.2        -0.5          0.0         0.0
3         -0.2         0.2         -0.1         0.0
4         -0.1        -0.1          0.2         0.0
5          0.4         0.5         -0.1         0.0
6          0.4         0.3          0.3         0.2
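The result shown above is the difference between each row and the row before it, with NA in the first row. A base R sketch that produces the same kind of table (an illustration, not the package's implementation; `numeric_cols` and `diff_sketch` are hypothetical names):

```r
# Row-wise differences for the numeric columns of iris (base R).
numeric_cols <- iris[sapply(iris, is.numeric)]

# diff() drops the first element, so pad with NA to keep the row count
diff_sketch <- as.data.frame(lapply(numeric_cols, function(col) {
  c(NA, diff(col))
}))

head(diff_sketch)
```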

To replace the NAs introduced by the calculation, we can set na.rm to TRUE and specify na_action (which uses na_replace).


# since reverse, first value is replaced with 0.
head(rowdiff(mtcars,direction="reverse", na.rm=TRUE, na_action="value", value=0))

   mpg  cyl disp  hp  drat     wt  qsec vs am gear carb
1  0.0   0    0   0  0.00  0.000  0.00  0  0    0    0
2  0.0   0    0   0  0.00  0.255  0.56  0  0    0    0
3  1.8  -2  -52 -17 -0.05 -0.555  1.59  1  0    0   -3
4 -1.4   2  150  17 -0.77  0.895  0.83  0 -1   -1    0
5 -2.7   2  102  65  0.07  0.225 -2.42 -1  0    0    1
6 -0.6  -2 -135 -70 -0.39  0.020  3.20  1  0    0   -1

na_replace used above works as shown below.


test_data <- data.frame(A=c(1,2,NA,NA), B= c(1,3,4,NA))
# replace NAs with the mode (most frequent value) of the non-NA values
na_replace(test_data, how="get_mode")
  A B
1 1 1
2 2 3
3 1 4
4 1 1
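For intuition, mode-based replacement can be sketched in base R as below. `mode_sketch` is a hypothetical helper written for this example, not the package's get_mode; it breaks ties by taking the value that appears first, which happens to match the output above.

```r
# Replace NAs in each column with that column's most frequent value (base R).
mode_sketch <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  # Count occurrences of each unique value; ties go to the first one seen
  ux[which.max(tabulate(match(x, ux)))]
}

test_data <- data.frame(A = c(1, 2, NA, NA), B = c(1, 3, 4, NA))

filled <- as.data.frame(lapply(test_data, function(col) {
  col[is.na(col)] <- mode_sketch(col)
  col
}))
filled
```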

The above is less useful since one might want to replace values by group. Using na_replace_grouped, one can achieve just that.


test_groups <- data.frame(groups = c(1, 1, 1, 2, 2, 2), values = c(2, NA, 2, 3, NA, 3))

na_replace_grouped(test_groups,group_by="groups",how="get_mode")

 groups values
1      1      2
2      1      2
3      1      2
4      2      3
5      2      3
6      2      3
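The grouped variant can be approximated in base R with ave, which applies a function within each group and returns a vector of the original length. This is a sketch under the same assumptions as before: `mode_sketch` and `filled_groups` are hypothetical names, and this is not the package's implementation.

```r
# Replace NAs with the within-group mode using ave() (base R).
mode_sketch <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

filled_groups <- data.frame(groups = c(1, 1, 1, 2, 2, 2),
                            values = c(2, NA, 2, 3, NA, 3))

filled_groups$values <- ave(
  filled_groups$values, filled_groups$groups,
  FUN = function(v) {
    v[is.na(v)] <- mode_sketch(v)
    v
  }
)
filled_groups
```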

The use of mean, sd, etc. is no longer supported. Use mde instead, which is focused on missingness.

Space constraints mean that a detailed exploration of the package cannot be made. A more thorough walkthrough is provided in the vignettes that can be opened as shown below:


browseVignettes("manymodelr")

For a complete list of available functions, please use:

 
help(package="manymodelr")

Please note that the 'manymodelr' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Thank You and Happy Coding!
