Decision Tree in Machine Learning

Ice Asortse
May 9, 2020 · 5 min read


Data Science: Decision Trees in Machine Learning.

What is a Decision Tree?

Generally, we can define a decision tree as a decision-support tool that uses a tree-like model of decisions and their possible consequences, including resource costs, utility, and outcomes. It is one of many ways to visualize an algorithm based on conditional statements.

Machine Learning with Decision Trees

In machine learning, a decision tree is not just a weak learner; it is a non-parametric supervised learning method used for both classification and regression tasks. Its specific goal is to predict the value of a target variable by learning simple decision rules inferred from the data features.

A non-parametric method simply means that the model makes no assumptions about the distribution of the data or of the errors.
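
As a concrete illustration, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the iris dataset, the 80/20 split, and the random seeds are illustrative choices only:

```python
# A minimal sketch: fitting a decision tree classifier with scikit-learn.
# The iris dataset and the 80/20 split are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn simple decision rules from the training features.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict the target variable for unseen samples.
print("Test accuracy:", clf.score(X_test, y_test))
```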

A good example of a decision tree: when a bank or financial institution wants to offer someone a credit card, it often follows a sequential list of checks to determine the risk involved in doing business with that potential customer; in other words, it wants to know whether it is safe to offer a credit card to that individual.
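
To make those sequential checks concrete, here is a plainly hypothetical sketch; the feature names and thresholds are invented for illustration and are not any real institution's criteria:

```python
# Hypothetical sketch of a bank's sequential checks, written as the nested
# conditions a decision tree encodes. All names and thresholds are invented.
def credit_card_decision(income, credit_score, has_defaults):
    if has_defaults:              # first question: any past defaults?
        return "decline"
    if credit_score >= 700:       # second question: strong credit score?
        return "approve"
    if income >= 50_000:          # third question: sufficient income?
        return "approve with low limit"
    return "decline"

print(credit_card_decision(income=60_000, credit_score=650, has_defaults=False))
# prints: approve with low limit
```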

Let's take a look at the diagram:

[Figure: Decision Tree]

As mentioned earlier, a decision tree is mostly a non-parametric machine learning modeling technique for both regression and classification problems. It is also hierarchical and sequential: questions are asked in order, and the answers determine the classification of the outcome. It is, in essence, a model built from observed data.

Decision tree machine learning lets us base our decisions on factual data. It helps us create logical, mathematical rules for making selections, rather than relying on intuition and subjectivity.

Algorithm

When dealing with decision trees, the algorithm becomes the next question.

We can define an algorithm as a set of rules used to solve a particular problem.

What is decision tree algorithm?

A decision tree algorithm is simply a supervised learning algorithm that can be used for solving regression and classification problems.

Types of decision trees:

Decision trees typically come in two types:

1. Categorical variable decision tree: which is simply a decision tree that has a categorical variable as the target variable.

2. Continuous variable decision tree: a decision tree that has a continuous variable as its target variable. (Both types are sketched in the code below.)
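
scikit-learn mirrors these two types with separate estimators. A minimal sketch, with the datasets chosen purely for illustration:

```python
# Categorical target: DecisionTreeClassifier. Continuous target: DecisionTreeRegressor.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Categorical variable decision tree: predicts a class label.
X_c, y_c = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X_c, y_c)
print(clf.predict(X_c[:1]))  # a class label, e.g. array([0])

# Continuous variable decision tree: predicts a real-valued target.
X_r, y_r = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(random_state=0).fit(X_r, y_r)
print(reg.predict(X_r[:1]))  # a real number
```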

Decision tree terminologies:

1. Decision Node: when a sub-node splits into further sub-nodes, it is called a decision node.

2. Root Node: the root node is the representation of the entire population or samples and can be further divided into more homogeneous sets.

3. Parent and Child Node: any node that is divided into sub-nodes is referred to as a parent node and the sub-nodes are called child of the parent node.

4. Leaf/Terminal Node: any node that does not split any further is called the leaf or terminal node.

5. Branch/Sub-tree: a subsection of the entire tree is referred to as a branch or sub-tree.

6. Splitting: splitting is the process of dividing the node into two or more sub-nodes.

7. Pruning: the removal of sub-nodes from the decision tree; it is practically the opposite of splitting (see the sketch after this list).
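
To see several of these terms on a real tree, here is a small sketch that inspects a fitted scikit-learn tree and then prunes it with cost-complexity pruning; the ccp_alpha value is an arbitrary illustrative choice:

```python
# Inspecting the node structure of a fitted tree, then pruning it.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
print("nodes:", full.tree_.node_count)  # root + decision nodes + leaves
print("leaves:", full.get_n_leaves())   # terminal nodes
print("depth:", full.get_depth())       # length of the longest branch

# Pruning removes sub-trees whose contribution does not justify their
# complexity; the alpha value here is arbitrary, chosen for illustration.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print("leaves after pruning:", pruned.get_n_leaves())
```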

Some assumptions about decision tree

These are some of the assumptions we usually make when using decision tree:

1. The training set is considered as the root.

2. Feature values are preferably categorical; if they are continuous, they are discretized before the decision tree model is built (see the sketch after this list).

3. Records are distributed recursively based on the attribute’s values.

4. A statistical approach is used to determine the placement of attributes as the root or internal nodes of the tree.
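
For assumption 2, a common way to discretize a continuous feature is binning. A minimal sketch using scikit-learn's KBinsDiscretizer, with the bin count and toy values chosen arbitrarily:

```python
# Discretizing a continuous feature into ordinal bins (assumption 2).
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[1.2], [3.7], [5.1], [8.9], [9.4]])  # one continuous feature

# Three equal-width bins; both choices are purely illustrative.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = binner.fit_transform(X)
print(X_binned.ravel())  # [0. 0. 1. 2. 2.], i.e. ordinal bin labels
```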

Normally we have two attribute selection measures:

1. Information gain

2. Gini index

Information Gain

When a node is used in a decision tree to partition the training instances into smaller subsets, there is a change in entropy. This leads us to ask: what is entropy?

Entropy is the measure of uncertainty of a random variable. So, information gain is the measure of change in entropy.

Let us see what it looks like mathematically.

Suppose S is a set of instances, A is an attribute, Sv is the subset of S with A = v, and Values(A) is the set of all possible values of A; then

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

where Entropy(S) = −Σᵢ pᵢ log₂(pᵢ), and pᵢ is the proportion of instances in S belonging to class i.
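
Here is a small sketch of this computation in plain Python/NumPy; the toy labels and the two-way split are invented for illustration:

```python
# Computing entropy and information gain for a toy split.
import numpy as np

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, subsets):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv))."""
    n = len(parent)
    weighted = sum(len(sv) / n * entropy(sv) for sv in subsets)
    return entropy(parent) - weighted

# Toy example: an attribute whose values split the classes perfectly.
S = np.array([0, 0, 0, 1, 1, 1])
S_left, S_right = S[:3], S[3:]  # the subsets Sv induced by the attribute
print(information_gain(S, [S_left, S_right]))  # 1.0, a perfect split
```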

Gini Index

The Gini index is a metric that measures how often a randomly chosen element would be incorrectly identified if it were labeled at random according to the class distribution. The lower the Gini index, the better (more preferred) the split.

Let us see how it works mathematically. For a node whose instances have class proportions pᵢ,

Gini = 1 − Σᵢ pᵢ²
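
A matching sketch of the Gini computation; the toy labels are invented for illustration:

```python
# Computing the Gini index of a node: 1 - sum(p_i^2).
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 0, 0]))  # 0.0, a pure node (the preferred case)
print(gini([0, 0, 1, 1]))  # 0.5, maximally mixed for two classes
```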

Advantages of Decision Trees:

1. Decision trees are very easy to interpret because the model amounts to a set of rules (see the sketch after this list).

2. They follow much the same approach human beings use when making decisions.

3. They require very little hyper-parameter tuning.

4. Decision trees can handle both numerical and categorical data.

5. Decision trees are fairly robust to outliers and require little data preprocessing.
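
For advantage 1, scikit-learn can print a fitted tree as plain, human-readable rules. A minimal sketch, with a shallow max_depth chosen purely for readability:

```python
# Printing a fitted tree as a human-readable set of rules (advantage 1).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each indented line is one decision rule inferred from the features.
print(export_text(clf, feature_names=list(iris.feature_names)))
```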

Disadvantages of Decision Trees:

1. The probability of over-fitting is very high if care is not taken, and an over-fitted tree will fail to generalize (a common mitigation is sketched after this list).

2. Information gain is biased toward attributes with a greater number of categories, so categorical variables with many levels can produce skewed splits.

3. In general, a single decision tree has lower prediction accuracy on a dataset than many other machine learning algorithms.

4. If there are many class labels, the calculations can become complex and time-consuming.
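
For disadvantage 1, a common mitigation is to constrain the tree's growth before fitting. A sketch, with the specific limits chosen for illustration:

```python
# Constraining tree growth to reduce over-fitting (disadvantage 1).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

unconstrained = DecisionTreeClassifier(random_state=0)
constrained = DecisionTreeClassifier(
    max_depth=3,         # cap the number of sequential questions
    min_samples_leaf=5,  # forbid tiny, noise-fitting leaves
    random_state=0,
)

# Cross-validation gives a fairer view of generalization than training accuracy.
print("unconstrained:", cross_val_score(unconstrained, X, y, cv=5).mean())
print("constrained:  ", cross_val_score(constrained, X, y, cv=5).mean())
```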

Conclusion

There are many applications of decision trees in machine learning with direct real-life impact. This blog is only an introduction to the basics of decision trees: the important terminologies, some illustrative examples, and how a decision tree works.
