Photo by Atharva Tulsi on Unsplash

MACHINE LEARNING THEORY

The basic linear regression model

As we start with machine learning, the first model to understand, in my view, is least squares: a simple model that is easy to fit and gives a lot of insight into your dataset.

This model assumes that the expected value E(Y|X) of the dependent variable is linear in the inputs X1, …, Xp.

Least Squares

Least squares is a supervised learning algorithm that takes an input vector X^T = (X1, X2, …, Xp) and predicts the output Y. The mathematical expression has the form
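As a minimal sketch of the idea (using made-up data, not anything from the post), the least-squares coefficients can be found directly with numpy:

```python
import numpy as np

# Hypothetical toy data: 50 samples, 2 predictors, known true coefficients
rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.randn(50)

# Least squares: solve min ||y - Xb||^2, with a column of ones for the intercept
Xb = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(beta)  # close to the true values [3, 2, -1]
```

With little noise and enough samples, the estimates recover the coefficients used to generate the data.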


Photo by Matt Ridley on Unsplash

MACHINE LEARNING PROJECT

Structuring Machine Learning Projects

When we build any type of project, there are checks that the project should pass. Machine learning projects are no different.

Until now, we have been explaining the mathematics (statistics, probability, linear algebra, calculus) that allows us to understand how machine learning models work. But machine learning is more than an algorithm: training a model by itself is easy; the difficult part is making it useful!

A basic structure for your machine learning projects


Photo by Mihály Köles on Unsplash

MACHINE LEARNING THEORY

The origins of Deep Learning and Support Vector Machines

The separating hyperplanes procedure constructs linear decision boundaries that explicitly try to separate the data into different classes as well as possible. With them, we will define the Support Vector Classifier.

The LDA and logistic regression methods explained in the previous post sometimes make avoidable errors; these can be solved using the following methods.

Rosenblatt's Perceptron

This algorithm is the predecessor of modern deep learning advances. It tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary. The objective is to minimize the following function:
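A minimal sketch of the perceptron update rule, on hypothetical separable data (labels in {-1, +1}): a point is misclassified when y_i (w·x_i + b) ≤ 0, and each stochastic gradient step nudges the hyperplane toward it.

```python
import numpy as np

# Two well-separated hypothetical clusters in 2D
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(20, 2) + 3, rng.randn(20, 2) - 3])
y = np.array([1] * 20 + [-1] * 20)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(100):                      # epochs
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:        # misclassified point
            w += lr * yi * xi             # move the boundary toward correcting it
            b += lr * yi

errors = np.sum(y * (X @ w + b) <= 0)
print(errors)  # 0: the data is separable, so the perceptron converges
```

By the perceptron convergence theorem, the number of updates is finite whenever a separating hyperplane exists.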


Photo by Tim Trad on Unsplash

MACHINE LEARNING THEORY

Linear methods for classification

As we try to classify our data into distinct groups, our predictor G(x) takes values in a discrete set ζ, and we can divide the input space into a collection of regions labeled according to the classification. By linear methods, we mean that the decision boundaries between our predicted classes are linear.

Linear regression of an Indicator Matrix

Each response category has an indicator variable. Thus, if ζ has K classes, there will be K such indicators Y_k, k = 1, …, K, with Y_k = 1 if G = k and 0 otherwise; these are called dummy variables. …
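Building the indicator response matrix is a one-liner in numpy (hypothetical labels, for illustration): each row has a single 1 in the column of its class.

```python
import numpy as np

# Hypothetical class labels for 5 observations, K = 3 classes
G = np.array([0, 2, 1, 0, 2])
K = 3

# Indicator matrix: Y[i, k] = 1 if observation i belongs to class k, else 0
Y = np.eye(K)[G]
print(Y)
```

Linear regression is then fit to each column of Y, and a new point is assigned to the class whose fitted indicator is largest.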


Photo by Ryan Stone on Unsplash

MACHINE LEARNING THEORY

Tuning the basic linear regression model

In previous posts, we introduced least-squares and explained some subset selection techniques. Today we are here to introduce more subset selection and shrinkage models.

Least angle regression(LAR)

Least angle regression is a kind of forward stepwise regression that only enters as much of a predictor as it deserves. At the first step, it identifies the variable most correlated with the response.

LAR adjusts the coefficient of a variable until another variable catches up in terms of correlation with the residual. Then the second variable is added and the coefficients are adjusted again. We repeat the process until all the variables are in the model.

LAR Algorithm

  • Standardize the predictors to have mean zero and unit norm. …
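scikit-learn implements this algorithm as `lars_path`; a short sketch on hypothetical data shows the order in which the variables enter and the resulting coefficient profiles:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Hypothetical regression data with 5 predictors
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)

# method="lar" runs least angle regression as described above
alphas, active, coefs = lars_path(X, y, method="lar")
print(active)        # predictor indices, in the order they entered the model
print(coefs.shape)   # (n_features, n_steps): one coefficient profile per predictor
```

Reading `coefs` column by column traces exactly the process described above: each coefficient grows until another variable catches up in correlation with the residual.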


Photo by Sebastian Kanczok on Unsplash

MACHINE LEARNING EXAMPLES

Comparing distinct linear regression subset selection methods: Ridge, Lasso, and Elastic Net applied

In the last two posts, we have been explaining the theory of linear regression and some subset selection techniques. It’s been very math-focused, but now we know everything we need to apply them using Python.

To know how this application works, don’t miss the last two posts:

Introducing the python libraries and the dataset

This example will use a dataset widely known among data scientists and aspiring data scientists. First, we need to import the libraries that we will be using:

import numpy as np
import pandas as pd
from itertools import cycle
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, RidgeCV, Ridge, LassoCV, ElasticNetCV, lasso_path, enet_path
np.set_printoptions(suppress=True)
rng = np.random.RandomState(seed=42)
pd.options.display.float_format …
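The post's dataset is not shown in this excerpt, so as a sketch of how these imports might be used, here is the same comparison on sklearn's built-in diabetes data (an assumption, substituted for the original dataset):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each CV estimator picks its own regularization strength internally
for name, model in [("ridge", RidgeCV()),
                    ("lasso", LassoCV(random_state=42)),
                    ("enet", ElasticNetCV(random_state=42))]:
    model.fit(X_train, y_train)
    print(name, round(r2_score(y_test, model.predict(X_test)), 3))
```

The CV variants (`RidgeCV`, `LassoCV`, `ElasticNetCV`) remove the need to tune the penalty by hand, which keeps the comparison fair.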


Photo by Greg Rakozy on Unsplash

MACHINE LEARNING THEORY

Tuning the basic linear regression model

In the last post, we explained the most used linear regression machine learning technique, least squares. We explained distinct approaches to multiple linear regression and to regressions with multiple outputs.

But we assumed that we use all the variables in the regression; today we will explain some techniques for selecting only a subset of the variables. We do that for two reasons:

Reasons to use a subset selection

Prediction Accuracy

As we explained in the last post, least-squares estimates often have low bias but large variance. Here is where the bias-variance trade-off enters the game. When estimating a model, the expected prediction error at a point x is:
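The expression referenced above (not visible in this excerpt) is presumably the standard bias-variance decomposition, reconstructed here:

```latex
\mathrm{EPE}(x_0)
  = \sigma^2
  + \mathrm{Bias}^2\!\big(\hat{f}(x_0)\big)
  + \mathrm{Var}\!\big(\hat{f}(x_0)\big)
```

The first term is irreducible noise; subset selection and shrinkage trade a small increase in the bias term for a larger decrease in the variance term.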


Photo by Matt Duncan on Unsplash

CALCULUS FOR DATA SCIENCE AND MACHINE LEARNING

Formulas to accelerate integration

In this post, we will summarise some of the most useful techniques for calculating integrals. These will allow you to avoid the limit notation, though there will always be some difficult integrals.

Primitive functions

A function F satisfying F’ = f is called a primitive of f. Of course, a continuous function f always has a primitive.
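The defining property F' = f can be checked numerically: below is a small sketch taking f(x) = cos(x), whose primitive is F(x) = sin(x), and comparing a centered finite difference of F against f.

```python
import numpy as np

# F(x) = sin(x) is a primitive of f(x) = cos(x); verify F' = f numerically
x = np.linspace(0.0, 3.0, 1000)
h = 1e-5
F = np.sin
f = np.cos

deriv = (F(x + h) - F(x - h)) / (2 * h)   # centered finite difference of F
print(np.max(np.abs(deriv - f(x))))       # tiny: F' matches f everywhere on the grid
```

Any other primitive differs from this one only by a constant, which the derivative removes.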


Photo by Anthony Cantin on Unsplash

CALCULUS FOR DATA SCIENCE AND MACHINE LEARNING

The relation between derivatives and integrals

The fundamental theorem relates differentiation with the integration of functions; it is divided into two theorems. Let’s explain them.

The first fundamental theorem of calculus

Let f be integrable on [a,b], and define F on [a,b] by
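The definition cut off above is presumably the standard one, reconstructed here:

```latex
F(x) = \int_a^x f(t)\,dt, \qquad x \in [a, b],
```

and the theorem states that if f is continuous at c in [a, b], then F is differentiable at c with F'(c) = f(c).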


Photo by Max van den Oetelaar on Unsplash

CALCULUS FOR DATA SCIENCE AND MACHINE LEARNING

The area under the functions

After defining derivatives, we introduce integrals. They are not easy to define, but for now we can understand them as the area between the function and the x-axis.

Integral of a bounded region

First, we will define integrals over bounded regions, assigning to the integral of f on [a, b] the area R(f, a, b). In this example, we use a function that is always positive, but the integral is also defined for negative functions and for functions taking both positive and negative values.

In the next gif, we show how to apply the idea: [a, b] is divided into subintervals, and then the minimum (m_i) and maximum (M_i) values of the function are computed for each subinterval. …
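The idea above can be sketched numerically: below, the lower and upper Riemann sums for f(x) = x² on [0, 1] (a positive example, matching the post's setup) squeeze the true area of 1/3 as the subintervals shrink.

```python
import numpy as np

f = lambda x: x ** 2
a, b, n = 0.0, 1.0, 1000
edges = np.linspace(a, b, n + 1)
width = (b - a) / n

# f is increasing on [0, 1], so m_i is at the left edge and M_i at the right edge
lower = np.sum(f(edges[:-1]) * width)   # sum of m_i * width
upper = np.sum(f(edges[1:]) * width)    # sum of M_i * width
print(lower, upper)                     # both approach 1/3 as n grows
```

When the lower and upper sums converge to the same value, that common value is the integral.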

About

Adrià Serra

Data scientist. This account will share my blog posts about statistics, probability, machine learning, and deep learning. #100daysofML
