Creating a Weak Learner with a Decision Stump – Machine Learning in Action

Machine Learning in Action
By Peter Harrington

This article, based on chapter 7 of Machine Learning in Action, shows how to use the Adaboost algorithm to create a decision stump that makes a decision on one feature only.

A decision stump is a very simple decision tree. We are going to create a decision stump that makes a decision on one feature only. It’s a tree with only one split, so it’s a stump.

While we are building the Adaboost code, we are going to work with a really simple data set to make sure we have everything straight. You can create a new file called adaboost.py and add the following code:

def loadSimpData():
    datMat = matrix([[ 1. ,  2.1],
        [ 2. ,  1.1],
        [ 1.3,  1. ],
        [ 1. ,  1. ],
        [ 2. ,  1. ]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
  return datMat,classLabels

You can see this data in figure 1. Try choosing one value on one axis that totally separates the circles from the squares. It’s not possible. This is the famous 45 problem that decision trees are notorious for. Adaboost will need to use multiple decision stumps to properly classify this data set. By using multiple decision stumps, we will be able to build a classifier to completely classify the data.

Figure 1 Simple data used to check the Adaboost building functions. It is not possible to choose one threshold on one axis that separates the squares from the circles. Adaboost will need to combine multiple decision stumps to classify this set without an error.

You can load the data set and class labels by typing in:

>>> import adaboost
>>> datMat,classLabels=adaboost.loadSimpData()

Now that we have the data set loaded, we can create a few functions to build our decision stump. Enter the code from listing 1 into adaboost.py and save the file.

Listing 1 Decision stump-generating functions

def stumpClassify(dataMatrix,dimen,threshVal,threshIneq):
    retArray = ones((shape(dataMatrix)[0],1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:,dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:,dimen] > threshVal] = -1.0
    return retArray
    

def buildStump(dataArr,classLabels,D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m,n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m,1)))
    minError = inf     
    for i in range(n):
        rangeMin = dataMatrix[:,i].min(); rangeMax = dataMatrix[:,i].max();
        stepSize = (rangeMax-rangeMin)/numSteps
        for j in range(-1,int(numSteps)+1):
            for inequal in ['lt', 'gt']: 
                threshVal = (rangeMin + float(j) * stepSize)
                predictedVals = stumpClassify(dataMatrix,i,threshVal,inequal)
                errArr = mat(ones((m,1)))
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T*errArr  
                #print "split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError)
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump,minError,bestClasEst

The code in listing 1 contains two functions. The first function, stumpClassify(), simply performs a threshold comparison to classify data. Everything on one side of the threshold is thrown into class -1, and everything on the other side is thrown into class +1. This is done using array filtering by first setting the return array to all 1s, then setting values that do not meet the inequality to -1. We can make this comparison on any feature in the data set, and we can also switch the inequality from greater than to less than. The next function, buildStump(), will iterate over all of the possible inputs to stumpClassify() and find the best decision stump for our data set. Best here will be with respect to the data weight vector D. We’ll see how this is done in a little bit. The function starts out by making sure the data input data is in the proper format for matrix math. Then, it creates an empty dictionary called bestStump, which we will use to store the classifier information corresponding to the best choice of a decision stump given this weight vector D. The variable, numSteps, will be used to iterate over the possible values of the features. We also initialize the variable minError to positive infinity; this variable is used in finding the minimum possible error later. The main portion of the code is three nested for loops. The first one goes over all the features in our data set. We are considering numeric values and we calculate the minimum and maximum to see how large our step size should be. Then, the next for loop loops over these values. It might make sense to set the threshold outside of the extremes of our range, so there are actually two extra steps outside of the range. The last for loop toggles our inequality between greater than and less than. Inside the nested three for loops, we call stumpClassify() with the data set and our three loop variables. stumpClassify() returns its class prediction based on these loop variables. We next create the column vector errArr, which contains a 1 for any value in predictedVals that is not equal to the actual class in labelMat. We multiply these errors by the weights in D and sum the results to give us a single number, weightedError. This is the line where Adaboost interacts with the classifier. We are evaluating our classifier based on the weights D, not on another error measure. If you want to use another classifier, you would need to include this calculation to define the best classifier for D. We next print out all the values. This line can be commented out later, but it is helpful in understanding how this function works. Last, we compare the error to our known minimum error, and if the error is below it, we save this decision stump in our dictionary bestStump. The dictionary, the error, and the class estimates are all returned to the Adaboost algorithm. To see this in action, enter the following in the Python shell:

>>> D = mat(ones((5,1))/5)
>>> adaboost.buildStump(datMat,classLabels,D)
split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.600
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.600
     .
     .
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.600
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.400
({'dim': 0, 'ineq': 'lt', 'thresh': 1.3}, matrix([[ 0.2]]), array([[-1.],
       [ 1.],
       [-1.],
       [-1.],
       [ 1.]]))

As buildStump iterates over all of the possible values you can see the output, and finally you can see the dictionary returned.

Summary

We built a decision stump classifier, which is a single node decision tree, and applied the Adaboost algorithm. Here are some other Manning titles you might be interested in: