### Tuning Random Forest

8 Feb 2016

Hi there! This blog post will show you how to tune a random forest in R using a few different techniques, so let's get started:

We will use the Sonar dataset from the mlbench package. This is the dataset used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network. The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those bounced off a roughly cylindrical rock.

Each pattern is a set of 60 numbers in the range 0.0 to 1.0.

Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
library(mlbench)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
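Before tuning anything, it helps to confirm the predictor/response split looks right. A quick sanity check (a small sketch, independent of the code above):

```r
library(mlbench)

data(Sonar)
x <- Sonar[, 1:60]   # 60 numeric energy features
y <- Sonar[, 61]     # class factor: M (metal) vs R (rock)

stopifnot(is.factor(y))  # the response must be a factor for classification
print(dim(x))            # 208 observations, 60 predictors
print(table(y))          # class balance: 111 M vs 97 R
```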

Now let's play around with the algorithm and see if we can tune it to the right parameters. We start with a single fixed value of mtry, the square root of the number of predictors (the usual default for classification):

control <- trainControl(method="repeatedcv", number=10, repeats=3)
set.seed(123)
metric <- "Accuracy"
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_first <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_first)
## Random Forest
##
## 208 samples
##  60 predictor
##   2 classes: 'M', 'R'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 187, 188, 186, 187, 187, 187, ...
## Resampling results
##
##   Accuracy   Kappa      Accuracy SD  Kappa SD
##   0.8366162  0.6684822  0.07008353   0.1430453
##
## Tuning parameter 'mtry' was held constant at a value of 7.745967
## 

Let's explore the train control parameters a little more before drawing any conclusions. Setting search="random" tells caret to sample mtry values at random rather than evaluating a fixed grid; tuneLength controls how many candidates are tried:

ctrl <- trainControl(method="repeatedcv", number=10, repeats=4, search="random")
set.seed(123)
mtry <- sqrt(ncol(x))
rf_randomized <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=12, trControl=ctrl)
print(rf_randomized)
## Random Forest
##
## 208 samples
##  60 predictor
##   2 classes: 'M', 'R'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 187, 188, 186, 187, 187, 187, ...
## Resampling results across tuning parameters:
##
##   mtry  Accuracy   Kappa      Accuracy SD  Kappa SD
##    7    0.8379058  0.6708118  0.07658054   0.1562855
##    9    0.8438636  0.6834194  0.07376081   0.1502520
##   10    0.8296266  0.6542165  0.07528650   0.1532870
##   11    0.8342749  0.6639375  0.07278761   0.1487760
##   17    0.8317208  0.6590220  0.07495766   0.1522259
##   31    0.8185660  0.6316219  0.07738559   0.1585961
##   32    0.8137446  0.6220772  0.08179097   0.1676857
##   33    0.8174838  0.6293132  0.08257270   0.1688494
##   38    0.8051623  0.6049023  0.08710847   0.1776217
##   41    0.8137933  0.6220501  0.08877583   0.1811089
##   46    0.8074351  0.6095327  0.09074714   0.1842144
##   52    0.8052273  0.6046432  0.08472759   0.1726686
##
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 9.
plot(rf_randomized)
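The selected parameter and the refit model can be pulled straight off the train object. A small sketch, assuming rf_randomized from above:

```r
# caret stores the winning parameter in bestTune and refits a final
# model on the full dataset with that value.
best_mtry <- rf_randomized$bestTune$mtry
final_model <- rf_randomized$finalModel
print(best_mtry)

# Predictions from the tuned model (in-sample here, for illustration only;
# the cross-validated accuracy above is the honest performance estimate).
preds <- predict(rf_randomized, newdata=dataset)
print(head(preds))
```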

The randomForest package also provides its own tuning function, tuneRF(), which searches outward from the default mtry for the value with the lowest out-of-bag (OOB) error estimate:

# Algorithm Tune (tuneRF)
set.seed(123)
bestmtry <- tuneRF(x, y, stepFactor=1.5, improve=1e-5, ntree=500)
## mtry = 7  OOB error = 15.87%
## Searching left ...
## mtry = 5     OOB error = 14.9%
## 0.06060606 1e-05
## mtry = 4     OOB error = 16.35%
## -0.09677419 1e-05
## Searching right ...
## mtry = 10    OOB error = 15.38%
## -0.03225806 1e-05

print(bestmtry)
##        mtry  OOBError
## 4.OOB     4 0.1634615
## 5.OOB     5 0.1490385
## 7.OOB     7 0.1586538
## 10.OOB   10 0.1538462
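Note that tuneRF() only returns this table of OOB errors; to actually use the winner you still fit a model yourself. A small sketch, assuming x, y and bestmtry from above:

```r
# Pick the mtry with the lowest OOB error (5 in the run above)
best <- bestmtry[which.min(bestmtry[, "OOBError"]), "mtry"]

set.seed(123)
rf_final <- randomForest(x, y, mtry=best, ntree=500, importance=TRUE)
print(rf_final)
varImpPlot(rf_final)   # which frequency bands matter most
```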