Welcome to the Variable Selection Accelerator!
Business Needs for a Variable Selection Accelerator:
1.) We want to explain the data in the simplest way possible; redundant predictors should therefore be removed.
2.) Unnecessary predictors add noise to the estimates of the other quantities we are interested in, making those estimates less accurate.
3.) If the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.
Test the App
To test the app, simply select a pre-loaded dataset, then follow the instructions in the next box.
How to use the Variable Selection Accelerator:
1.) Start by uploading your CSV file in the Side Menu Panel. Note that if your data contains missing values, the rows containing those values will be omitted.
2.) Select your independent variables from the summary of your data under the `Data Summary` Tab.
3.) Then open the `Variable Selection` Tab and select your target variable (i.e., the dependent variable).
4.) The algorithms will run on the uploaded dataset and summarize the selected variables.
5.) You may adjust the algorithms' controls from the Side Menu Panel, and also choose which results are visible to you.
Summary of Data
Select your independent variables from the table
Numerical Data
Categorical Data
Correlation Matrix for selected numeric variables
Correlation Matrix
Correlation Plot
You haven't selected a sufficient number of numerical variables for a correlation matrix. Remember that a correlation matrix requires at least 2 variables.
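As a quick illustration of what this correlation matrix contains, here is a minimal pandas sketch; the column names and values are made up, and this is not the app's own code:

```python
import pandas as pd

# Made-up numeric data: "weight" rises linearly with "height",
# while "errors" falls linearly, so the correlations are exactly +1 and -1.
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 66, 74, 82],
    "errors": [9, 7, 5, 3, 1],
})
corr = df.corr()   # pairwise Pearson correlations between numeric columns
print(corr)
```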
First 5 lines of data
Select a Target Variable to begin
Comparison of Selected Classification Algorithms
Comparison of Selected Regression Algorithms
Step-wise Regression - p Value
Algorithm Results
You can find out more at: NCSS Statistical Software Step-wise Regression write-up
The lower the p-value, the higher the variable's rank.
p-value vs Attribute plot
Backward Step-wise Regression - p Value
Algorithm Results
At each iteration, the variable with the highest p-value above the critical alpha value is eliminated from the model.
You can find out more at: NCSS Statistical Software Backward Step-wise Regression write-up
The lower the p-value, the higher the variable's rank.
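The elimination loop described above can be sketched as follows. This is a hedged illustration using ordinary least squares in Python (the app itself presumably runs on R); the simulated dataset and variable names are made up:

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided p-values for each column of X in an OLS fit (intercept added)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = n - Xd.shape[1]
    se = np.sqrt(np.diag((resid @ resid / dof) * np.linalg.inv(Xd.T @ Xd)))
    return 2 * stats.t.sf(np.abs(beta / se), dof)[1:]   # drop the intercept

def backward_stepwise(X, y, names, alpha=0.05):
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_pvalues(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= alpha:          # everything left is significant
            break
        keep.pop(worst)                # eliminate the least significant predictor
    return [names[i] for i in keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)   # x2, x3 are pure noise
selected = backward_stepwise(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```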
p-value vs Attribute plot
Forward Step-wise Regression - p Value
Algorithm Results
At each iteration, the variable with the lowest p-value below the critical alpha value is added to the model.
You can find out more at: NCSS Statistical Software Forward Step-wise Regression write-up
The lower the p-value, the higher the variable's rank.
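The forward version adds, at each iteration, the candidate whose coefficient p-value is lowest and below alpha. Again a hedged OLS sketch on made-up data, not the app's own code:

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided p-values for each column of X in an OLS fit (intercept added)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = n - Xd.shape[1]
    se = np.sqrt(np.diag((resid @ resid / dof) * np.linalg.inv(Xd.T @ Xd)))
    return 2 * stats.t.sf(np.abs(beta / se), dof)[1:]   # drop the intercept

def forward_stepwise(X, y, names, alpha=0.05):
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        best, best_p = None, alpha
        for j in remaining:
            # p-value of candidate j's coefficient when added to the current model
            p = ols_pvalues(X[:, chosen + [j]], y)[-1]
            if p < best_p:
                best, best_p = j, p
        if best is None:               # no remaining candidate is below alpha
            break
        chosen.append(best)
        remaining.remove(best)
    return [names[i] for i in chosen]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)   # x2, x3 are pure noise
selected = forward_stepwise(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```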
p-value vs Attribute plot
Step-wise Regression - AIC
Algorithm Results
AIC captures the trade-off between goodness of fit and complexity of a model.
You can find out more about AIC at: University of Wisconsin - Explanation on AIC
The lower the AIC value, the higher the variable's ranking.
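A minimal sketch of forward selection driven by AIC, using the Gaussian OLS formula AIC = n·log(RSS/n) + 2k (up to a constant). This illustrates the general technique on made-up data; it is not the app's own code:

```python
import numpy as np

def aic(cols, y):
    """AIC of an OLS fit with an intercept (Gaussian likelihood, up to a constant)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(((y - Xd @ beta) ** 2).sum())
    return n * np.log(rss / n) + 2 * Xd.shape[1]   # fit term + complexity penalty

def forward_aic(X, y, names):
    chosen, remaining = [], list(range(X.shape[1]))
    best = aic([], y)                              # intercept-only model
    while remaining:
        score, j = min((aic([X[:, i] for i in chosen + [k]], y), k) for k in remaining)
        if score >= best:                          # no candidate lowers the AIC
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return [names[i] for i in chosen]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)   # x2, x3 are pure noise
selected = forward_aic(X, y, ["x0", "x1", "x2", "x3"])
print(selected)
```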
AIC vs Attribute plot
Backward Step-wise Regression - AIC
Algorithm Results
AIC captures the trade-off between goodness of fit and complexity of a model.
You can find out more about AIC at: University of Wisconsin - Explanation on AIC
The lower the AIC value, the higher the variable's ranking.
AIC vs Attribute plot
Forward Step-wise Regression - AIC
Algorithm Results
AIC captures the trade-off between goodness of fit and complexity of a model.
You can find out more about AIC at: University of Wisconsin - Explanation on AIC
The lower the AIC value, the higher the variable's ranking.
AIC vs Attribute plot
cForest
Algorithm Results
cForest follows the 'mean decrease in accuracy' principle of importance.
The larger the drop in accuracy when a variable is permuted, the more significant that variable.
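cForest itself comes from R's `party` package; as a hedged analogue, the sketch below measures permutation ('mean decrease in accuracy') importance with scikit-learn on made-up data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # the third column is pure noise

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Shuffle each column in turn and record how much the accuracy drops.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)   # one mean accuracy drop per column
```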
Importance vs Attribute plot
Random Forest for Regression
Algorithm Results
For regression, the random forest importance is based on the 'mean decrease in node impurity' principle, where impurity is measured by the residual sum of squares.
You can find out more about random forests at: A comprehensive write-up about random forests
The larger the decrease in node impurity attributable to a variable, the more significant that variable.
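scikit-learn's RandomForestRegressor exposes this impurity-based measure as `feature_importances_`; a hedged sketch on made-up data (the app itself presumably uses an R package such as randomForest):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=400)  # x2, x3: noise

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# feature_importances_ is the normalised mean decrease in node impurity,
# i.e. how much each variable reduces the residual sum of squares overall.
print(rf.feature_importances_)
```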
Importance vs Attribute plot
Lasso Procedure
Algorithm Results
Lasso is a continuous subset-selection algorithm that 'shrinks' the effect of unimportant predictors and can set their coefficients exactly to zero.
Use lambda.1se when you want to select lambda within 1 standard error of the best model.
Use lambda.min when you want to select the lambda with minimum mean cross-validated error.
You can find out more about the Lasso Procedure at: A comprehensive write-up about the lasso procedure
Lambda is the weight given to the regularization term (the L1 norm), so as lambda approaches zero, the loss function of your model approaches the OLS loss function. As you increase the L1 norm, variables will enter the model as their coefficients take non-zero values.
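The `lambda.min`/`lambda.1se` terms above come from R's glmnet, which the app presumably uses. As a hedged Python analogue on made-up data, scikit-learn's LassoCV selects the penalty with the minimum mean cross-validated error (the `lambda.min` counterpart) and drives the noise coefficients to (near) zero:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 4 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=300)   # x2..x5 are pure noise

# Cross-validated lasso: the analogue of glmnet's lambda.min
# (lambda.1se has no built-in counterpart in scikit-learn).
model = LassoCV(cv=5, random_state=0).fit(X, y)
print(dict(zip([f"x{i}" for i in range(6)], model.coef_.round(3))))
```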
Coefficients vs. L1 Norm Plot
Elastic Net Procedure
Algorithm Results
Like lasso, elastic net can generate reduced models by generating zero-valued coefficients.
Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.
Use lambda.1se when you want to select lambda within 1 standard error of the best model. Recommended for Elastic net.
Use lambda.min when you want to select the lambda with minimum mean cross-validated error.
You can find out more about the Elastic Net Procedure and its difference from the Lasso Procedure at: Stanford University - Slides explaining the elastic net and lasso procedure
Lambda is the weight given to the regularization term (the L1 norm), so as lambda approaches zero, the loss function of your model approaches the OLS loss function. As you increase the L1 norm, variables will enter the model as their coefficients take non-zero values.
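To illustrate the correlated-predictor behaviour, here is a hedged scikit-learn sketch on made-up data (again an analogue of the R glmnet setup the app presumably uses). The ridge part of the penalty has a grouping effect, so two nearly identical predictors enter together rather than one being arbitrarily dropped:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
z = rng.normal(size=300)
X = np.column_stack([
    z + 0.05 * rng.normal(size=300),   # two almost-identical predictors
    z + 0.05 * rng.normal(size=300),
    rng.normal(size=300),              # an independent noise column
])
y = 2 * z + rng.normal(scale=0.5, size=300)

# l1_ratio mixes the lasso (L1) and ridge (L2) penalties.
enet = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0).fit(X, y)
print(enet.coef_.round(3))
```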
Coefficients vs. L1 Norm Plot
Boruta Algorithm
Algorithm Results
As Boruta is based on random forest, it too uses the 'mean decrease in Gini' to calculate each variable's importance.
You can find out more about the Boruta Algorithm at: A comprehensive write-up about the boruta algorithm
The higher the importance, the more significant the variable.
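The app presumably uses R's `Boruta` package; the sketch below illustrates only Boruta's core idea in Python, on made-up data: compare each real variable's importance against 'shadow' copies whose values have been shuffled:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=400)   # x2, x3 are pure noise

shadow = X.copy()
for j in range(shadow.shape[1]):       # shuffle each column independently,
    rng.shuffle(shadow[:, j])          # destroying any link with y

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(np.column_stack([X, shadow]), y)
imp = rf.feature_importances_
threshold = imp[X.shape[1]:].max()     # the best shadow importance
confirmed = [f"x{j}" for j in range(X.shape[1]) if imp[j] > threshold]
print(confirmed)                       # the full algorithm repeats this test
```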
Importance vs. Attribute plot
Random Forest for Classification
Algorithm Results
For classification, the random forest importance is based on the 'mean decrease in node impurity', where impurity is measured by the Gini index.
You can find out more about random forests at: A comprehensive write-up about random forests
The larger the mean decrease in node impurity (Gini index), the more significant the variable.
Importance vs. Attribute plot