When working with machine learning, and especially when you start learning machine learing one of the first things you will encounter is supervised machine learning and writing decision tree based classifiers. A supervised classifier which will leverage a decission tree to classify an object into a group will base itself on provided and already labled data.
The data will (in most cases) consist out of a number of features all describing labled objects in your training data. As an example, provided by Google, we could have a set of objects all havign the label apple or orange. In our case we have mapped apple to the numeric value 0 and orange to the numeric value 1.
The features in oour dataset are the weight of the object (it being an apple or an orange) and the type of skin. The skin of the object is either smooth (like that of an apple) or bumpy (like that of an orange). We have mapped bumpy to the value 0 and smooth to the value 1.
The below code showcases the implementation and also a prediction for a new object compared to the data we have in the learning data.
from sklearn import tree features = [[140, 1], [130, 1], [150, 0], [170, 0]] labels = [0, 0, 1, 1] clf = tree.DecisionTreeClassifier() clf = clf.fit(features, labels) print clf.predict([[150, 0]])
As you can see in the above example we predict what the object with [150, 0] will be, will it be an apple or an orange. The above example is just a couple of lines and is already a first example of a simple machine learning implementation in Python. The reason it will only take this limited amount of lines is that we can leverage all the work already done by the developers of scikit-learn.
You can find the above code example and more examples on my Github project page.