The post Tutorial on Linear Regression using Gradient Descent appeared first on DPhi.

In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. Let **X** be the independent variable and **Y** be the dependent variable. We will define a linear relationship between these two variables as follows:

Y = mX + c

Here **m** is the slope of the line and **c** is the y-intercept.

The loss is the error in our predicted value of **m** and **c**. Our goal is to minimize this error to obtain the most accurate value of **m** and **c**.

We will use the Mean Squared Error function to calculate the loss. There are three steps in this function:

- Find the difference between the actual y and the predicted y value (ȳ = mx + c) for a given x.
- Square this difference.
- Find the mean of the squares for every value in X.

E = (1/n) Σ (yᵢ − ȳᵢ)²

Here yᵢ is the actual value and ȳᵢ is the predicted value. Let's substitute the value of ȳᵢ:

E = (1/n) Σ (yᵢ − (mxᵢ + c))²

So we square the error and find the mean, hence the name Mean Squared Error. Now that we have defined the loss function, let's get into the interesting part: minimizing it and finding **m** and **c**.

Gradient descent is an iterative optimization algorithm to find the minimum of a function. Here that function is our Loss Function.

**Understanding Gradient Descent**

Imagine a valley and a person with no sense of direction who wants to get to the bottom of the valley. He goes down the slope and takes large steps when the slope is steep and small steps when the slope is less steep. He decides his next position based on his current position and stops when he gets to the bottom of the valley which was his goal.

Let’s try applying gradient descent to **m** and **c** and approach it step by step:

1. Initially let m = 0 and c = 0. Let L be our learning rate, which controls how much the value of **m** changes with each step. L could be a small value like 0.0001 for good accuracy.
2. Calculate the partial derivative of the loss function with respect to m, and plug in the current values of x, y, m and c to obtain the derivative value **D**:

Dₘ = (−2/n) Σ xᵢ(yᵢ − ȳᵢ)

Dₘ is the value of the partial derivative with respect to **m**. Similarly, let's find the partial derivative with respect to **c**, Dc:

Dc = (−2/n) Σ (yᵢ − ȳᵢ)

3. Now we update the current values of **m** and **c** using the following equations:

m = m − L × Dₘ
c = c − L × Dc

4. We repeat this process until our loss function is a very small value or ideally 0 (which means 0 error or 100% accuracy). The value of **m** and **c** that we are left with now will be the optimum values.

Now going back to our analogy, **m** can be considered the current position of the person. **D** is equivalent to the steepness of the slope and **L** can be the speed with which he moves. The new value of **m** that we calculate using the above equation will be his next position, and **L×D** will be the size of the steps he takes. When the slope is steeper (**D** is larger) he takes longer steps, and when it is less steep (**D** is smaller) he takes smaller steps. Finally, he arrives at the bottom of the valley, which corresponds to our loss = 0.

Now with the optimum value of **m** and **c** our model is ready to make predictions!

Now let’s convert everything above into code and see our model in action!
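The steps above can be sketched like this. The tiny inline dataset here is a stand-in (the original post loads its values from a CSV), and the learning rate is enlarged to suit it:

```python
import numpy as np

# Stand-in data roughly on the line Y = 2X + 1 (the post's CSV is not included here)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

m, c = 0.0, 0.0   # initial guesses
L = 0.01          # learning rate (the post suggests 0.0001; larger suits this tiny set)
epochs = 5000
n = float(len(X))

for _ in range(epochs):
    Y_pred = m * X + c                          # current predictions
    D_m = (-2 / n) * np.sum(X * (Y - Y_pred))   # partial derivative w.r.t. m
    D_c = (-2 / n) * np.sum(Y - Y_pred)         # partial derivative w.r.t. c
    m -= L * D_m                                # update m
    c -= L * D_c                                # update c

print(m, c)
```

On this stand-in data the loop converges close to m = 2 and c = 1, the line it was generated from.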

On the article's dataset, the code prints the learned values of m and c:

1.4796491688889395 0.10148121494753726

Gradient descent is one of the simplest and most widely used algorithms in machine learning, mainly because it can be applied to any function to optimize it. Learning it lays the foundation for mastering machine learning.

This article is also available as a video tutorial –

*Find the data set and code here:* https://github.com/chasinginfinity/ml-from-scratch/tree/master/02%20Linear%20Regression%20using%20Gradient%20Descent

**Note:** *This article was originally published on towardsdatascience.com, and kindly contributed to DPhi to spread the knowledge.*

Become a guide. Become a mentor. We at DPhi welcome you to share your experience in data science – be it your learning journey, experience while participating in Data Science Challenges, data science projects, tutorials and anything else that is related to Data Science. Your learnings could help a large number of aspiring data scientists! Interested? Submit here.


The post Linear Regression in 6 lines of Python appeared first on DPhi.

Today, to perform Linear Regression quickly, we will be using the library scikit-learn. If you don't have it already, you can install it using pip: **pip install scikit-learn**

So now let's start by making a few imports:
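The original code block was not preserved in this copy; the imports described below would look like:

```python
import numpy as np                    # calculations
import pandas as pd                   # importing the .csv data set
import matplotlib.pyplot as plt       # visualizing the data and regression line
from sklearn.linear_model import LinearRegression
```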

We need numpy to perform calculations, pandas to import the data set which is in .csv format in this case, and matplotlib to visualize our data and regression line. We will use the LinearRegression class to perform the linear regression.

Now let's perform the regression:
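A sketch of the fit-and-predict step. The inline arrays stand in for the post's CSV data (loaded with pandas in the original); note that sklearn expects the features as a 2-D array:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # features must be 2-D for sklearn
Y = np.array([2.0, 4.0, 6.0, 8.0])          # stand-in targets (exactly Y = 2X)

reg = LinearRegression().fit(X, Y)  # fit the regression line
Y_pred = reg.predict(X)             # predictions for every value in X
```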

We have our predictions in Y_pred. Now let's visualize the data set and the regression line:
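The plotting step can be sketched as below; the Agg backend and the output file name are only there so the snippet runs headless, and the inline data again stands in for the post's CSV:

```python
import matplotlib
matplotlib.use("Agg")            # headless backend; drop this line when running locally
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
Y = np.array([2.0, 4.0, 6.0, 8.0])
Y_pred = LinearRegression().fit(X, Y).predict(X)

plt.scatter(X, Y)                 # the data points
plt.plot(X, Y_pred, color="red")  # the fitted regression line
plt.savefig("regression.png")     # use plt.show() when running interactively
```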

That's it! You can use any data set of your choice, and even perform Multiple Linear Regression (more than one independent variable) using the LinearRegression class in sklearn.linear_model. Note that this class uses the Ordinary Least Squares method to perform the regression, so the accuracy won't be as high as with more sophisticated techniques. But if you want to make some quick predictions and get some insight into the data set given to you, this is a very handy tool.

You can also watch the video tutorial of this article here:

*Find the data set and code here:* https://github.com/chasinginfinity/ml-from-scratch/tree/master/03%20Linear%20Regression%20in%202%20minutes

**Note:** This article was originally published on towardsdatascience.com, and kindly contributed to DPhi to spread the knowledge.



The post Tutorial on Logistic Regression using Gradient Descent with Python appeared first on DPhi.

In statistics, logistic regression is used to model the probability of a certain class or event. I will be focusing more on the basics and implementation of the model, and not go too deep into the math in this post. Just to give you a heads up, this article is a written version of the video tutorial, which can be found here.

Logistic regression is similar to linear regression because both involve estimating the values of parameters used in the prediction equation based on the given training data. Linear regression predicts the value of a continuous dependent variable, whereas logistic regression predicts the probability of an event or class that depends on other factors. Thus the output of logistic regression always lies between 0 and 1. Because of this property, it is commonly used for classification purposes.


Consider a model with features *x1, x2, x3 … xn*. Let the binary output be denoted by *Y*, that can take the values 0 or 1.

Let *p* be the probability of *Y = 1*, we can denote it as *p = P(Y=1)*.

The mathematical relationship between these variables can be denoted as:

ln(p/(1−p)) = b0 + b1·x1 + b2·x2 + … + bn·xn

Here the term *p/(1−p)* is known as the *odds* and denotes the likelihood of the event taking place. Thus *ln(p/(1−p))* is known as the *log odds* and is simply used to map the probability that lies between 0 and 1 to a range between (−∞, +∞). The terms *b0, b1, b2…* are parameters (or weights) that we will estimate during training.

So this is just the basic math behind what we are going to do. We are interested in the probability *p* in this equation. So we simplify the equation to obtain the value of p:

1. The log term *ln* on the LHS can be removed by raising the RHS as a power of *e*:

p/(1−p) = e^(b0 + b1·x1 + … + bn·xn)

2. Now we can easily simplify to obtain the value of *p*:

p = e^(b0 + b1·x1 + … + bn·xn) / (1 + e^(b0 + b1·x1 + … + bn·xn)) = 1 / (1 + e^−(b0 + b1·x1 + … + bn·xn))

This actually turns out to be the equation of the *Sigmoid Function*, which is widely used in other machine learning applications. The *Sigmoid Function* is given by:

S(x) = 1 / (1 + e^−x)

Now we will be using the derived equation above to make our predictions. Before that, we will train our model to obtain the values of our parameters b0, b1, b2… that results in the least error. This is where the error or loss function comes in.

The loss is basically the error in our predicted value. In other words, it is the difference between our predicted value and the actual value. We will be using the L2 Loss Function to calculate the error. Theoretically, you can use any function to calculate the error. This function can be broken down as:

- Let the actual value be yᵢ. Let the value predicted using our model be denoted as ȳᵢ. Find the difference between the actual and predicted value.
- Square this difference.
- Find the sum across all the values in the training data.

L = Σ (yᵢ − ȳᵢ)²

Now that we have the error, we need to update the values of our parameters to minimize this error. This is where the “learning” actually happens since our model is updating itself based on its previous output to obtain a more accurate output in the next step. Hence with each iteration, our model becomes more and more accurate. We will be using the *Gradient Descent Algorithm* to estimate our parameters. Another commonly used algorithm is the Maximum Likelihood Estimation.

You might know that the partial derivative of a function at its minimum value is equal to 0. So gradient descent basically uses this concept to estimate the parameters or weights of our model by minimizing the loss function. Check out the below video for a more detailed explanation on how gradient descent works.


For simplicity, for the rest of this tutorial let us assume that our output depends only on a single feature *x*. So we can rewrite our equation as:

p = 1 / (1 + e^−(b0 + b1·x))

Thus we need to estimate the values of weights b0 and b1 using our given training data.

1. Initially let b0 = 0 and b1 = 0. Let L be the learning rate. The learning rate controls by how much the values of b0 and b1 are updated at each step in the learning process. Here let L = 0.001.
2. Calculate the partial derivative with respect to b0 and b1. The value of the partial derivative tells us how far the loss function is from its minimum value. It is a measure of how much our weights need to be updated to attain minimum or ideally 0 error. In case you have more than one feature, you need to calculate the partial derivative for each weight b0, b1 … bn, where n is the number of features. For a detailed explanation of the math behind calculating the partial derivatives, check out my video.

3. Next we update the values of b0 and b1:

b0 = b0 − L × D_b0
b1 = b1 − L × D_b1

where D_b0 and D_b1 are the partial derivatives of the loss with respect to b0 and b1.

4. We repeat this process until our loss function is a very small value or ideally reaches 0 (meaning no errors and 100% accuracy). The number of times we repeat this learning process is known as iterations or epochs.

Import the necessary libraries and download the data set here. The data was taken from Kaggle and describes information about a product being purchased through an advertisement on social media. We will be predicting the value of *Purchased*, using a single feature, *Age*. You can have multiple features as well.

We need to normalize our training data and shift the mean to the origin. This is important to get accurate results because of the nature of the logistic equation. This is done by the *normalize* method. The *predict* method simply plugs the values of the weights into the logistic model equation and returns the result. This returned value is the required probability.
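The two helpers described above might look like this (the method names *normalize* and *predict* come from the post; the exact implementation is a reconstruction):

```python
import numpy as np

def normalize(X):
    # shift the mean to the origin and scale by the standard deviation
    return (X - X.mean()) / X.std()

def predict(X, b0, b1):
    # logistic model: p = 1 / (1 + e^-(b0 + b1*x))
    return 1 / (1 + np.exp(-(b0 + b1 * X)))
```

`predict(normalize(ages), b0, b1)` would then return the purchase probabilities.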

The model is trained for 300 epochs or iterations. The partial derivatives are calculated at each iteration and the weights are updated. You can even calculate the loss at each step and see how it approaches zero with each step.
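The training loop can be sketched as follows. The derivative expressions come from applying the chain rule to the L2 loss through the sigmoid; the toy data and variable names are assumptions, not the post's exact code:

```python
import numpy as np

def logistic_train(X, y, L=0.001, epochs=300):
    """Estimate b0 and b1 by gradient descent on the L2 loss (a sketch)."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        y_pred = 1 / (1 + np.exp(-(b0 + b1 * X)))          # sigmoid predictions
        # chain rule: d/db of sum((y - y_pred)^2) through the sigmoid
        D_b0 = -2 * np.sum((y - y_pred) * y_pred * (1 - y_pred))
        D_b1 = -2 * np.sum((y - y_pred) * y_pred * (1 - y_pred) * X)
        b0 -= L * D_b0                                     # update the weights
        b1 -= L * D_b1
    return b0, b1

# toy, already-normalized data: negative x -> class 0, positive x -> class 1
X = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])
b0, b1 = logistic_train(X, y, L=0.01, epochs=3000)
```

On this separable toy data, b1 grows positive so that larger x maps to higher probability, as expected.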

Since the prediction equation returns a probability, we need to convert it into a binary value to be able to make classifications. To do this, we select a threshold, say 0.5 and all predicted values above 0.5 will be treated as 1 and everything else will be 0. You can choose a suitable threshold depending on the problem you are solving.
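The thresholding step can be written as a small helper (the name and signature are assumptions):

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    # probabilities above the threshold become class 1, the rest class 0
    return (np.array(probabilities) > threshold).astype(int)
```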

Here for each value of age in the testing data, we predict if the product was purchased or not and plot the graph. The accuracy can be calculated by checking how many correct predictions we made and dividing it by the total number of test cases. Our accuracy seems to be 85%.


The library sklearn can be used to perform logistic regression in a few lines, as shown using the *LogisticRegression* class. It also supports multiple features. It requires the input values to be in a specific format, hence they have been reshaped before training using the *fit* method.
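A sketch of the sklearn version. The inline ages and labels are stand-ins for the post's Kaggle data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# stand-in ages and purchase labels; the post uses the kaggle data set
X = np.array([22, 25, 28, 47, 52, 46, 56, 55, 60, 62]).reshape(-1, 1)  # 2-D input
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)  # fit expects a 2-D feature array
accuracy = clf.score(X, y)            # mean accuracy on the given data
```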

The accuracy using this is 86.25%, which is very close to the accuracy of our model that we implemented from scratch!

Thus we have implemented a seemingly complicated algorithm easily using Python from scratch, and also compared it with a standard model in sklearn that does the same. I think the most crucial part here is the gradient descent algorithm, and learning how the weights are updated at each step. Once you have learned this basic concept, you will be able to estimate parameters for any function.

**Click here for the entire code and explanation in a Google Colaboratory. You can use it to explore and play around with the code easily.**

**Note:** This article was originally published on towardsdatascience.com, and kindly contributed to DPhi to spread the knowledge.


**References**

- Artificial Intelligence, a modern approach — pg 726, 727
- https://machinelearningmastery.com/logistic-regression-for-machine-learning/
- https://towardsdatascience.com/logit-of-logistic-regression-understanding-the-fundamentals-f384152a33d1
- https://en.wikipedia.org/wiki/Logistic_regression



The post Face Detection in 2 Minutes using OpenCV & Python appeared first on DPhi.

*First of all, make sure you have OpenCV installed. You can install it using pip: pip install opencv-python*

Face detection using Haar cascades is a machine learning based approach where a cascade function is trained with a set of input data. OpenCV already contains many pre-trained classifiers for faces, eyes, smiles, etc. Today we will be using the face classifier. You can experiment with other classifiers as well.

You need to download the trained classifier XML file (haarcascade_frontalface_default.xml), which is available in OpenCV's GitHub repository. Save it to your working location.

To detect faces in images:

A few things to note:

- The detection works only on grayscale images, so it is important to convert the color image to grayscale first.
- The **detectMultiScale** function is used to detect the faces. It takes 3 arguments: the input image, *scaleFactor* and *minNeighbours*. *scaleFactor* specifies how much the image size is reduced at each image scale. *minNeighbours* specifies how many neighbours each candidate rectangle should have to retain it. You can read about it in detail here. You may have to tweak these values to get the best results.
- *faces* contains a list of coordinates for the rectangular regions where faces were found. We use these coordinates to draw the rectangles in our image.

Results:

The only difference here is that we use an infinite loop to loop through each frame of the video. We use *cap.read()* to read each frame. The first value returned is a flag that indicates whether the frame was read correctly; we don't need it. The second value is the still frame itself, on which we will perform the detection.

*Find the code here:* https://github.com/adarsh1021/facedetection

Also, you can see the video tutorial here.

**Note:** *This article was originally published on towardsdatascience.com, and kindly contributed to DPhi to spread the knowledge.*

*Featured Image Credit – https://www.kairos.com/blog/face-detection-explained*



The post COVID-19: Visualising the Impact of Social Distancing in Python appeared first on DPhi.

COVID-19 has taken over the world and brought it to a standstill in just a few months. Total cases worldwide will soon cross half a million, and over 20,000 deaths have been confirmed (figures as of 26th March). The worrying part is that the graph of total cases is still increasing exponentially, showing no signs of slowing down.

Flattening the curve by social distancing seems to be the only way out of this. Many countries have been locked down in the past few weeks, and people have been asked to strictly stay at home. All these measures will not eliminate the virus, but they will help slow down its spread, reducing the pressure on the health care system and thus the fatality rate.

But many people still don't seem to understand the seriousness of social distancing, and how big an impact even a single person can have. The point is, even if you are a healthy individual whom the virus may not affect much, you could still spread it to other people who may be adversely affected by it.

So in this quick post, I will try and visualise the effect of social distancing using python, to see the huge impact every single person could have in stopping the spread of COVID-19, and potentially save thousands of lives.

The goal of this experiment is not to model the spread of the virus, but to understand the impact social distancing has in reducing its spread and to realise its importance.

First let us import the essentials and define a few parameters.
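The parameters might be defined like this. POPULATION, SPREAD_FACTOR and DAYS_TO_RECOVER are the values stated below; DAYS and INITIALLY_AFFECTED are assumptions:

```python
import pandas as pd

DAYS = 30                 # length of the simulation (assumed value)
POPULATION = 100_000      # population of the simulated city
SPREAD_FACTOR = 4         # people an infected person passes the virus to per day
DAYS_TO_RECOVER = 10      # days from infection to recovery
INITIALLY_AFFECTED = 4    # carriers who bring the virus in (assumed value)
```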

Let me explain each parameter:

- DAYS: This is simply the number of days we carry out the simulation.
- POPULATION: The population of our simulated city.
- SPREAD_FACTOR: The number of people an infected person comes in contact with. In a city, an average person is said to be in contact with at least 16 people in a day. Assuming that only a quarter of those people will get infected, I have chosen the SPREAD_FACTOR to be 4. Something to note is that the spread factor depends on many variables and does not stay constant in real life.
- DAYS_TO_RECOVER: The number of days it takes for an infected person to recover. In real life this is also not constant, but 10 is a good average.
- INITIALLY_AFFECTED: The number of people who were initially affected by the virus. They are the carriers who bring the virus from an infected region to a new region, like our hypothetical *city*.

We will use a DataFrame to model a city where each row corresponds to a citizen, and keep track of infected and recovered people. Using the sample function, we can randomly select people from the DataFrame. Here is what we will do:

- Create a DataFrame called city, where each row corresponds to a person in the city. It also contains columns to mark when a person is infected and recovered. Initially, select INITIALLY_AFFECTED random people using sample and mark them as infected. Also mark their recovery day.
- Run a for loop DAYS times to simulate each passing day.
- Check the number of people who have recovered on this day, and mark them as recovered. These people won’t spread the virus anymore.
- On each day, count the number of infected people, use the SPREAD_FACTOR to calculate the newly infected people on that day. So the number of new cases on a day = SPREAD_FACTOR * number of active cases.
- Keep track of the number of active cases and people who recovered for visualising later.
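The steps above can be sketched as follows. This is a reconstruction, not the post's exact code, and it repeats the parameter definitions so it runs on its own:

```python
import numpy as np
import pandas as pd

DAYS = 30                 # assumed value
POPULATION = 100_000
SPREAD_FACTOR = 4
DAYS_TO_RECOVER = 10
INITIALLY_AFFECTED = 4    # assumed value

# one row per citizen; NaN recovery_day means "never infected so far"
city = pd.DataFrame({"infected": False, "recovery_day": np.nan},
                    index=range(POPULATION))
first = city.sample(INITIALLY_AFFECTED)
city.loc[first.index, "infected"] = True
city.loc[first.index, "recovery_day"] = DAYS_TO_RECOVER

stats = []
for day in range(1, DAYS + 1):
    # people whose recovery day has come stop being active cases
    city.loc[city["recovery_day"] == day, "infected"] = False
    active = int(city["infected"].sum())
    new_cases = int(SPREAD_FACTOR * active)   # new cases = SPREAD_FACTOR * active
    if new_cases > 0:
        # random contacts; only the never-infected actually catch the virus
        contacts = city.sample(min(new_cases, POPULATION))
        caught = contacts[contacts["recovery_day"].isna()].index
        city.loc[caught, "infected"] = True
        city.loc[caught, "recovery_day"] = day + DAYS_TO_RECOVER
    stats.append({"day": day,
                  "active": int(city["infected"].sum()),
                  "recovered": int(((city["recovery_day"] <= day)
                                    & ~city["infected"]).sum())})
```

Plotting the `active` column of `stats` against `day` gives the curves discussed below.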

You can see that in around 10 to 15 days, the entire population of 100,000 has been affected and recovered. This assumes that the city was capable of treating 100,000 patients at the same time, and that everyone recovered at the same rate, in 10 days. But do you think this hypothetical city of 100,000 will have a health care system that can take care of 100,000 active cases per day for about a week? Now in reality, the growth may not be so drastic, but it can easily lead to something like this if we take no action at all.

Now let’s take a look at the graph for different values of SPREAD_FACTOR.

Observations:

- SPREAD_FACTOR = 1 (top left): This means every infected person comes in contact with one random person, who gets infected if not already infected. Almost the entire population ends up affected.
- SPREAD_FACTOR = 0.5 (bottom left): For every two infected people, one new person is infected per day. Note that the selection of this new person is done at random, and they are infected only if not already infected. Here the curve is still almost the same as in the first case, but the total cases have gone down by around 20,000.
- SPREAD_FACTOR = 0.25 (top right): For every 4 infected people, one person is infected (if not already infected). In other words, one out of these 4 infected people came in contact with a new person who got infected (the other 3 were practicing social distancing!). This could be a state where all the people are consciously quarantining themselves and practicing social distancing. Compared with the previous case, just by reducing the spread factor by half, the spread has decreased exponentially, and the curve is significantly flatter. Here the health care system should be able to provide good care, since at the peak there are only 40,000 active cases.
- SPREAD_FACTOR = 0.2 (bottom right): Here one out of every 5 infected people came in contact with a new person and spread the infection. The other 4 were in isolation. Not too different from the previous case, but the curve is significantly flatter, and the peak active cases have gone down by almost half!

In the last two cases, you can observe the impact a single person can make on the entire spread of the virus! From this we can conclude that, although the virus spreads exponentially, social distancing also works exponentially, and every single isolated person has an exponential impact on flattening the curve!

Note: I am aware that this is an oversimplification of the real world scenario, but I think it gives us a good understanding of the relationship between the SPREAD_FACTOR and the number of active cases. Also, an exponential function can be simulated easily with math equations, but I think this is more intuitive and easier to understand.

Well, now you know exactly why social distancing is given so much importance! Basically, you are saving lives by sitting at home.

You can **find the code here in this Google Colab**. You can try experimenting with different values for the parameters. Also try visualising the other metrics, like recoveries per day. Instead of having a constant value for the spread factor throughout the simulation, you can try reducing it at different intervals and observe the effects. I noticed that once the damage has been done, there is no going back.

So practice social distancing, wash your hands and remember, we are all in this together!

References:

https://www.worldometers.info/coronavirus/

https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca

https://www.washingtonpost.com/graphics/2020/world/corona-simulator/

**Note**: *This article was originally published on towardsdatascience.com, and kindly contributed to DPhi to spread the knowledge.*



The post Tutorial on Linear Regression Using Least Squares appeared first on DPhi.

In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables (To learn more about dependent and independent variables, read this article). In the case of one independent variable, it is called simple linear regression. For more than one independent variable, the process is called multiple linear regression. We will be dealing with simple linear regression in this tutorial.

Let **X** be the independent variable and **Y** be the dependent variable. We will define a linear relationship between these two variables as follows:

Y = mX + c

This is the equation for a line that you might have studied in high school. **m** is the slope of the line and **c** is the y-intercept. Today we will use this equation to train our model with a given dataset and predict the value of **Y** for any given value of **X**.

Now the challenge here is to determine the optimal values for **m** and **c**, that would give a minimum error for the given dataset. We will be doing this by using the **Least Squares** method.

So to minimize the error we need a way to calculate the error in the first place. A **loss function** in machine learning is simply a measure of how different the predicted value is from the actual value.

In this tutorial, we will be using the **Quadratic Loss Function** to calculate the loss or error in our regression model. It is defined as:

L = Σ (yᵢ − pᵢ)²

where yᵢ is the actual value and pᵢ is the predicted value.

We square the difference because, for the points below the regression line, the difference y − p will be negative, whereas it will be positive for the points above the regression line. Summing these raw differences (negative and positive values) might nullify the error or not give a true picture of the total error of the model. Hence, we sum the square of the difference of the actual value (y) and the predicted value (p) while calculating the loss.

Now that we have determined the loss function, the only thing left to do is minimize it. This is done by finding the partial derivatives of **L** with respect to **m** and **c**, equating them to 0, and then solving for **m** and **c**. After we do the math, we are left with these equations:

m = Σ (xᵢ − x̅)(yᵢ − ȳ) / Σ (xᵢ − x̅)²
c = ȳ − m·x̅

Here x̅ is the mean of all the values in the input **X** and ȳ is the mean of all the values in the desired output **Y**. This is the Least Squares method. Now we will implement this in Python and make predictions.
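A sketch of the implementation; the inline dataset is a stand-in for the tutorial's CSV:

```python
import numpy as np

# stand-in data (the tutorial reads its values from a CSV with pandas)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = X.mean(), Y.mean()
# least squares estimates of the slope and intercept
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
c = y_mean - m * x_mean

print(m, c)
```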

On the tutorial's dataset, the code prints the values of m and c:

1.287357370010931 9.908606190326509

The accuracy won't be high, because we are simply taking a straight line and forcing it to fit the given data in the best possible way. But you can use this to make simple predictions or to get an idea about the magnitude/range of the real value. Also, this is a good first step for beginners in Machine Learning.

*Find the dataset and the code used in the tutorial here:* https://github.com/chasinginfinity/ml-from-scratch/tree/master/01%20Linear%20Regression%20using%20Least%20Squares

Got questions ? Need help ? Contact me!

Email: adarsh1021@gmail.com

LinkedIn: https://www.linkedin.com/in/adarsh-menon-739573146/

Twitter: https://twitter.com/adarsh_menon_

Instagram: https://www.instagram.com/adarsh_menon_/

This article was originally published on towardsdatascience.com and kindly contributed to DPhi to spread the knowledge.


