### Welcome to the Variable Selection Accelerator!

### Business Needs for a Variable Selection Accelerator:

1.) We want to explain the data in the simplest way; redundant predictors should therefore be removed.

2.) Unnecessary predictors add noise to the estimation of the other quantities we are interested in, making those estimates less accurate.

3.) If the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.

### Test the App

To test the app, you can simply **select a pre-loaded dataset.** After that, follow the instructions in the next box.

### How to use the Variable Selection Accelerator:

1.) Start by **uploading your csv file** in the Side Menu Panel. Note that if your data contains missing values, the rows containing those missing values will be omitted.

2.) **Select your independent variables** from the summary of your data under the `Data Summary` Tab.

3.) Then open the `Variable Selection` Tab and **select your target variable** (i.e., the dependent variable).

4.) The algorithms will run on the uploaded dataset and report the selected variables in a summary.

5.) You may **change the controls** of the algorithms from the Side Menu Panel, and also choose which results are visible to you.

### Summary of Data
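The upload step notes that rows containing missing values are omitted (listwise deletion). The app's own code is not shown here, but that behavior can be sketched with pandas (library choice is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for an uploaded csv, with one incomplete row.
df = pd.DataFrame({
    "age":    [23, 35, np.nan, 41],
    "income": [48_000, 62_000, 55_000, 71_000],
})

# Listwise deletion: any row containing a missing value is dropped,
# mirroring what the app reportedly does on upload.
clean = df.dropna()
print(len(clean))
```

Only complete rows survive, so heavily incomplete datasets can shrink substantially after upload.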

### Select your independent variables from the table

### Numerical Data


### Categorical Data


### Correlation Matrix for selected numeric variables

### Correlation Plot

You haven't selected a sufficient number of numerical variables for a correlation matrix. Remember that a correlation matrix requires at least 2 variables.
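As a sketch of what the correlation matrix panel computes, here is the same idea in Python with pandas (illustrative only; the app's own implementation is not shown):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=100),  # strongly correlated with x
    "z": rng.normal(size=100),                      # unrelated noise
})

# Pearson correlation matrix; needs at least 2 numeric columns.
corr = df.corr()
print(corr.round(2))
```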

### First 5 lines of data

#### Select a Target Variable to begin

### Comparison of Selected Classification Algorithms

### Comparison of Selected Regression Algorithms

### Step-wise Regression - p Value

### Algorithm Results


You can find out more at: NCSS Statistical Software Step-wise Regression write-up

**The lower the p-value, the higher the variable's rank.**

### p-value vs Attribute plot


### Backward Step-wise Regression - p Value

### Algorithm Results


At each iteration, the variable with the highest p-value above the critical alpha value is eliminated from the model.

You can find out more at: NCSS Statistical Software Backward Step-wise Regression write-up

**The lower the p-value, the higher the variable's rank.**

### p-value vs Attribute plot


### Forward Step-wise Regression - p Value

### Algorithm Results


At each iteration, the variable with the lowest p-value below the critical alpha value is added to the model.

You can find out more at: NCSS Statistical Software Forward Step-wise Regression write-up

**The lower the p-value, the higher the variable's rank.**

### p-value vs Attribute plot


### Step-wise Regression - AIC

### Algorithm Results


AIC captures the trade-off between goodness of fit and complexity of a model.

You can find out more about AIC at: University of Wisconsin - Explanation on AIC

**The lower the AIC value, the higher the variable's ranking.**
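For Gaussian linear models, AIC can be written (up to an additive constant) as n·ln(RSS/n) + 2k, where k counts the estimated coefficients. A minimal numpy sketch of the fit-versus-complexity trade-off (illustrative only, not the app's code):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # pure noise predictor
y = 4.0 * x1 + rng.normal(size=n)

def ols_aic(predictors, y):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    X = np.column_stack([np.ones(len(y)), *predictors])  # intercept + predictors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1]
    return len(y) * np.log(rss / len(y)) + 2 * k

aic_signal = ols_aic([x1], y)   # model with the real predictor
aic_noise = ols_aic([x2], y)    # model with only the noise predictor
print(aic_signal, aic_noise)
```

The model containing the informative predictor fits far better, so its AIC is much lower and it ranks higher.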

### AIC vs Attribute plot


### Backward Step-wise Regression - AIC

### Algorithm Results


AIC captures the trade-off between goodness of fit and complexity of a model.

You can find out more about AIC at: University of Wisconsin - Explanation on AIC

**The lower the AIC value, the higher the variable's ranking.**

### AIC vs Attribute plot


### Forward Step-wise Regression - AIC

### Algorithm Results


AIC captures the trade-off between goodness of fit and complexity of a model.

You can find out more about AIC at: University of Wisconsin - Explanation on AIC

**The lower the AIC value, the higher the variable's ranking.**

### AIC vs Attribute plot


### cForest

### Algorithm Results


cForest ranks variables using 'mean decrease in accuracy' (permutation) importance.

**The greater the mean decrease in accuracy when a variable is permuted, the more significant the variable.**
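cForest itself is an R routine; the 'mean decrease in accuracy' idea can be illustrated with scikit-learn's permutation importance (an analogous technique, not cForest's exact algorithm), which measures how much accuracy drops when a feature's values are shuffled:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] > 0).astype(int)   # only feature 0 carries signal

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Permute each feature in turn and measure the drop in accuracy:
# a large drop means the model relied heavily on that feature.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```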

### Importance vs Attribute plot


### Random Forest for Regression

### Algorithm Results


For regression, the random forest importance is based on the 'mean decrease in node impurity' principle, where impurity is measured by the residual sum of squares.

You can find out more about random forests at: A comprehensive write-up about random forests

**The greater the decrease in node impurity attributable to a variable, the more significant the variable.**
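A minimal scikit-learn sketch of impurity-based importance for regression (the app presumably uses R's randomForest; this is illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.5, size=n)   # feature 0 dominates

reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is the normalized mean decrease in node impurity,
# which for regression trees is measured by the residual sum of squares.
print(reg.feature_importances_)
```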

### Importance vs Attribute plot


### Lasso Procedure

### Algorithm Results


Lasso is a continuous subset selection algorithm that 'shrinks' the effect of unimportant predictors and can thereby set their coefficients exactly to zero.

Use lambda.1se when you want to select lambda within 1 standard error of the best model.

Use lambda.min when you want to select the lambda with minimum mean cross-validated error.

You can find out more about the Lasso Procedure at: A comprehensive write-up about the lasso net procedure

**Lambda is the weight given to the regularization term (the L1 norm), so as lambda approaches zero, the loss function of your model approaches the OLS loss function. As you increase the L1 norm, variables will enter the model as their coefficients take non-zero values.**
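The lambda.1se/lambda.min terminology suggests the app uses R's glmnet; the same enter-the-model behavior can be sketched with scikit-learn's Lasso (an assumption for illustration), where `alpha` plays the role of lambda:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n = 200
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)  # feature 2 is noise

# As the penalty weight shrinks, coefficients leave zero one by one,
# strongest predictors first.
for alpha in (2.0, 0.5, 0.05):
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coef, 2))
```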

### Coefficients vs. L1 Norm Plot


### Elastic Net Procedure

### Algorithm Results


Like lasso, elastic net can generate reduced models by generating zero-valued coefficients.

Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.

Use lambda.1se when you want to select lambda within 1 standard error of the best model. Recommended for Elastic net.

Use lambda.min when you want to select the lambda with minimum mean cross-validated error.

You can find out more about the Elastic Net Procedure and how it differs from the Lasso Procedure at: Stanford University - Slides explaining the elastic net and lasso procedure

**Lambda is the weight given to the regularization term (the L1 norm), so as lambda approaches zero, the loss function of your model approaches the OLS loss function. As you increase the L1 norm, variables will enter the model as their coefficients take non-zero values.**
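The correlated-predictor claim can be illustrated with scikit-learn's ElasticNet (an assumption; the app presumably uses glmnet). With two identical columns, the ridge part of the penalty spreads weight across both instead of arbitrarily picking one, as lasso tends to:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x, rng.normal(size=n)])   # columns 0 and 1 are identical
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# l1_ratio=0.5 mixes the L1 (lasso) and L2 (ridge) penalties; the strictly
# convex ridge part forces equal coefficients on the duplicated columns.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 2))
```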

### Coefficients vs. L1 Norm Plot


### Boruta Algorithm

### Algorithm Results


Because Boruta is based on random forests, it too uses the 'mean decrease in Gini' measure to calculate the importance of each variable.

You can find out more about the Boruta Algorithm at: A comprehensive write-up about the boruta algorithm

**The higher the importance, the more significant the variable.**
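The Boruta package iterates its comparison many times with a statistical test; below is a single-round sketch of its core shadow-feature idea, using scikit-learn in place of the R implementation (illustrative assumptions throughout):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0).astype(int)  # feature 0 is signal

# Boruta's core idea: append shuffled 'shadow' copies of every feature,
# then confirm a real feature only if its importance beats the best shadow.
shadows = rng.permuted(X, axis=0)       # column-wise shuffle destroys any signal
X_ext = np.hstack([X, shadows])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_ext, y)
imp = rf.feature_importances_           # mean decrease in Gini impurity
threshold = imp[2:].max()               # best shadow importance
confirmed = [j for j in range(2) if imp[j] > threshold]
print(confirmed)
```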

### Importance vs. Attribute plot


### Random Forest for Classification

### Algorithm Results


For classification, the random forest importance is based on the 'mean decrease in node impurity', where impurity is measured by the Gini index.

You can find out more about random forests at: A comprehensive write-up about random forests

**The greater the decrease in node impurity (Gini index), the more significant the variable.**

### Importance vs. Attribute plot
