Learn Machine Learning Algorithms
Introduction
Before we dive into machine learning algorithms, we first need to know what machine learning is!
“Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence.” — Wikipedia
Machine Learning, a subset of Artificial Intelligence, is responsible for much of what we experience in technology today. From voice assistants to email spam filtering, it draws on large amounts of previous data and learns from that experience. The Wikipedia quote above serves as our formal definition.
In order to create machine learning models, we implement algorithms to solve problems and figure out which algorithm works best. At its core, an algorithm takes an input and produces an output according to a defined procedure. Machine learning can be broken down into two categories: Supervised Learning and Unsupervised Learning.
- Supervised Learning — The machine learns from labeled training data, also known as ground truth data. The algorithm compares its predicted results to the ground truth data in order to evaluate its accuracy.
- Unsupervised Learning — Compared to supervised learning, there is no labeled data. The machine learns by finding patterns in the data, which can then be grouped into different clusters.
Machine learning algorithms learn from data and improve with experience so that they can make predictions. In short, you feed the algorithm some data, it learns from that data, and it outputs predictions for whatever you are trying to predict. If the goal is to predict the model of a car, the algorithm is fed input features such as weight, acceleration, and speed; once trained, it makes predictions based on those features. Each algorithm works differently and has its own advantages and disadvantages.
Top Machine Learning Algorithms
1. Linear Regression (Ordinary Least Squares)
A supervised machine learning algorithm used to predict a continuous target variable by finding the best linear relationship between a dependent variable and one or more independent variables. There are two types of regression: simple linear regression and multiple linear regression.
- Simple linear regression uses one independent variable to predict the target variable. The goal is to find the slope and intercept of the line of best fit.

- Multiple linear regression examines the relationship of many independent variables with the dependent variable. Each independent variable gets its own beta coefficient describing its effect, and the prediction is the sum of each coefficient multiplied by its independent variable (plus an intercept).

- In order to calculate the line of best fit, the sum of the squared distances between the actual and predicted values should be as small as possible. The line of best fit with the smallest error provides the best predictions.
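To make this concrete, here is a minimal sketch of simple linear regression fit with the closed-form ordinary least squares formulas (the data points below are made up for illustration):

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]        # roughly y = 2x
intercept, slope = fit_simple_ols(xs, ys)
print(round(slope, 2), round(intercept, 2))   # 1.99 0.09
```

Multiple linear regression generalizes this to a vector of coefficients, but the idea is the same: pick the coefficients that minimize the total squared error.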
2. Logistic Regression

A supervised machine learning algorithm for classification problems. It is used to predict the probability of a categorical dependent variable, where the dependent variable is binary: either 0 or 1. Instead of fitting a straight line as in linear regression, we fit an S-shaped curve called the sigmoid function.
- The y-axis shows a probability between 0 and 1 because the sigmoid function squashes any input into that range.
- If the probability output by the sigmoid function is greater than 0.5, the predicted value becomes 1; if it is less than 0.5, the predicted value becomes 0. There is no middle ground.
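The sigmoid function and the 0.5 decision threshold can be sketched in a few lines (learning the model's coefficients is omitted; `z` here stands for the already-computed linear combination of features):

```python
import math

def sigmoid(z):
    """Squash any real number into the (0, 1) probability range."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z, threshold=0.5):
    """Map the sigmoid probability to a hard 0/1 prediction."""
    return 1 if sigmoid(z) > threshold else 0

print(sigmoid(0.0))         # 0.5: exactly on the decision boundary
print(predict_class(2.0))   # probability ~0.88 -> class 1
print(predict_class(-2.0))  # probability ~0.12 -> class 0
```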
3. K-Nearest Neighbors (KNN)
A supervised machine learning algorithm used for both classification and regression tasks, but mainly for classification. KNN is a distance-based classifier that uses nearby points to make a prediction. To predict which class a data point belongs to, we take the k nearest data points and calculate the distances: the smaller the distance, the more similar the points.

- KNN stores all the training data in the “fit” step, where no distances are calculated. In the “predict” step, we show the KNN algorithm new data points, and distances are then calculated against every single point in the training set.
- When choosing the value of k, a larger k can smooth out noise in the predictions, but too high a value can increase the error and is computationally expensive, while too small a value makes the model sensitive to outliers. There is no right way of finding the best value for k except trial and error.
- Using KNN on smaller datasets works really well, but on larger datasets with high dimensionality prediction becomes very expensive, since every query must compute a distance to every point in the training set.
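The fit/predict behavior above can be sketched as follows: "fitting" is just storing the data, and predicting is a majority vote among the k nearest stored points (the two clusters below are made up for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs. Predict the label of `query`
    by majority vote among the k nearest points (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two small, made-up clusters: class "a" near the origin, class "b" near (5, 5).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1)))  # nearest neighbors are all "a"
print(knn_predict(train, (4, 5)))  # nearest neighbors are all "b"
```

Note how every prediction sorts the entire training set, which is exactly why KNN gets expensive on large datasets.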
4. Decision Trees
A supervised machine learning algorithm built in the form of a tree structure, used for classification and regression but mainly for classification. A decision tree follows a flowchart: the top of the tree structure takes the full dataset, which is continuously broken down into smaller subsets of data. In the definitions below, a node can be described as the point where a path splits.

- Root Node: the beginning of the tree
- Splitting: dividing a node into multiple decision nodes
- Decision Node: a node that results from a split
- Leaf Node: a node that cannot be split any further
- The process of training a decision tree: present a dataset of training examples containing features and a target, then train the tree model by making splits on the features until the data cannot be split any further. New test data is then classified by following the tree structure from the top down.
- The first step is to choose the root node by finding the feature with the maximum information gain; in other words, the feature that gives us the biggest jump toward our answer. To measure information gain, we use entropy, which is the main concept of the algorithm. Entropy is measured between 0 and 1 (for two classes) and describes the level of impurity or uncertainty: a 50/50 split has an entropy of 1 (maximally impure), a 70/30 split has a somewhat lower entropy, and a 0/100 split has an entropy of 0 (pure). Information gain is the reduction in entropy produced by a split, so the feature with the highest information gain — the one whose split lowers entropy the most — is used as the root node, and so on.
- Decision trees are built by recursively evaluating the different features, checking which feature best splits the data at each node, and selecting the split that yields the best separation.
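The entropy and information-gain calculations above can be sketched in a few lines (a minimal sketch of the split criterion only; feature selection and the recursive tree building are omitted):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: 0 for a pure node, 1 for a 50/50 two-class split."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Reduction in entropy from splitting `parent` into the `children` subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

print(round(entropy(["x"] * 5 + ["y"] * 5), 2))   # 50/50 split -> 1.0
print(round(entropy(["x"] * 7 + ["y"] * 3), 2))   # 70/30 split -> 0.88
# A split that perfectly separates the classes gains the full bit of entropy:
print(round(information_gain(["x"] * 5 + ["y"] * 5,
                             [["x"] * 5, ["y"] * 5]), 2))  # -> 1.0
```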
5. Random Forests
Random Forest is a supervised regression and classification algorithm that utilizes decision trees. The idea behind random forest is that a larger number of trees yields a more accurate result. Random forest is an ensemble method: it makes use of more than one decision tree to make a prediction, and it is typically more effective than a single decision tree.
The concept of ensemble methods is based on bagging, which consists of bootstrap resampling and aggregation. Bootstrap resampling takes samples with replacement; the predictions made on those samples are then aggregated into a single prediction. The process of training an ensemble through bagging is as follows:
- Grab a sample with replacement from the dataset
- Train a decision tree on the sample
- Have each decision tree in the ensemble generate a prediction
- Aggregate the predictions from all decision trees into a single prediction
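The four bagging steps above can be sketched as follows. As a stand-in for a full decision tree, the base learner here is a toy one-feature threshold stump (a hypothetical simplification for illustration; a real random forest trains a decision tree on each bootstrap sample):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Step 1: draw len(data) points with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Step 2: train the base learner -- here, a toy stump that
    thresholds midway between the means of the two classes."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:                 # degenerate resample: only one
        only = sample[0][1]                   # class present, always predict it
        return lambda x: only
    thresh = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > thresh else 0

def bagged_predict(stumps, x):
    """Steps 3-4: each learner votes, and the majority vote is the prediction."""
    votes = Counter(stump(x) for stump in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(0)                        # fixed seed for reproducibility
data = [(1.0, 0), (1.2, 0), (0.8, 0), (3.0, 1), (3.2, 1), (2.8, 1)]
stumps = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(stumps, 0.9), bagged_predict(stumps, 3.1))
```

Because each stump sees a slightly different resample, individual mistakes tend to be voted away in the aggregation step — the same intuition behind adding more trees to a random forest.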

- An advantage of the random forest algorithm is that it can handle large datasets, since it works well with many variables. Random forests are also more accurate than a single decision tree, with accuracy generally improving as more trees are added.