Linear Regressions

https://mrobeidat.github.io/reading-notes/


How to Run Linear Regression in Python

There are several ways to run a linear regression in Python: you can use NumPy, SciPy, statsmodels, or scikit-learn.

  • Scikit-learn is a powerful Python module for machine learning. It contains functions for regression, classification, clustering, model selection, and dimensionality reduction.

Exploring Boston Housing Data Set

  • The first step is to import the required Python libraries into an IPython Notebook.
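A minimal sketch of such an import cell. The exact set of libraries the original notebook imported is not shown in these notes, so this list is an assumption based on what the later steps need (arrays, data frames, plots, and scikit-learn itself).

```python
import numpy as np               # numerical arrays
import pandas as pd              # tabular data frames
import matplotlib.pyplot as plt  # plotting
import sklearn                   # scikit-learn

# Printing the version is handy, because some scikit-learn APIs changed over time.
print(sklearn.__version__)
```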

  • This data set shipped with the sklearn Python module (as load_boston, until its removal in scikit-learn 1.2), so we access it through scikit-learn.

  • The object boston is a dictionary-like object, so you can explore its keys.

  • Print the feature names of the Boston data set.

  • This data set has 506 instances (rows) and 13 attributes or parameters (columns).
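The exploration steps above can be sketched as follows. Because load_boston is not available in current scikit-learn releases, this sketch builds a hypothetical stand-in called make_boston_like (not a scikit-learn function) that returns random numbers with the same shape and the same 13 feature names. The values are not the real housing data; only the structure is meant to match.

```python
import numpy as np
from sklearn.utils import Bunch

# The 13 Boston feature names, in their usual column order.
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

def make_boston_like(seed=0):
    """Hypothetical stand-in: random numbers with the Boston shape, NOT real data."""
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(506, 13))
    # Synthetic "prices": an arbitrary linear signal plus noise.
    target = 22.0 + data @ rng.normal(size=13) + rng.normal(size=506)
    return Bunch(data=data, target=target,
                 feature_names=np.array(FEATURE_NAMES),
                 DESCR="Synthetic stand-in for the Boston housing data")

boston = make_boston_like()
print(boston.keys())          # dictionary-style keys
print(boston.feature_names)   # the 13 column names
print(boston.data.shape)      # rows and columns of the feature matrix
```

With the real data set loaded into boston, the last three lines would be used unchanged.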

  • Convert boston.data into a pandas data frame.

  • Replace the default numeric column labels with the feature names.

  • boston.target contains the housing prices.

  • Add these target prices to the bos data frame.
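A rough sketch of the data frame steps above. The arrays data and target are random placeholders for boston.data and boston.target, and the column name PRICE is an assumption, since these notes do not say what the target column is called.

```python
import numpy as np
import pandas as pd

feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Placeholders with the Boston shape; swap in boston.data / boston.target if available.
rng = np.random.default_rng(0)
data = rng.normal(size=(506, 13))
target = rng.normal(loc=22.0, scale=9.0, size=506)

bos = pd.DataFrame(data)        # columns are labelled 0..12 at this point
bos.columns = feature_names     # replace the numeric labels with feature names
bos["PRICE"] = target           # "PRICE" is an assumed column name
print(bos.head())
```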

Scikit Learn

Fit a linear regression model and predict the Boston housing prices, using the least squares method to estimate the coefficients.

  • Y = Boston housing price (also called “target” data in scikit-learn)

  • X = all the other features (or independent variables)

Import LinearRegression from the scikit-learn module.

Important functions to keep in mind while fitting a linear regression model are:

  • lm.fit() -> fits a linear model

  • lm.predict() -> Predict Y using the linear model with estimated coefficients

  • lm.score() -> Returns the coefficient of determination (R^2). A measure of how well observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model.
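To see the three functions in action, here is a small sketch on toy data rather than the housing data. The points lie exactly on the line y = 2x + 1, an arbitrary choice made so that the fitted values are easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data lying exactly on y = 2x + 1 (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

lm = LinearRegression()
lm.fit(X, y)                          # estimate slope and intercept by least squares
pred = lm.predict(np.array([[4.0]]))  # predict Y for a new X
r2 = lm.score(X, y)                   # coefficient of determination R^2

print(lm.coef_, lm.intercept_, pred, r2)
```

Because the toy data is perfectly linear, the slope comes out as 2, the intercept as 1, and R^2 as 1 (up to floating-point rounding). Real data will score lower.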

Fitting a Linear Model

Use all 13 parameters to fit a linear regression model. Two other parameters that you can pass to the LinearRegression object are fit_intercept and normalize (normalize was removed in scikit-learn 1.2).

Then construct a data frame that contains the features and their estimated coefficients.
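A sketch of the fit and the coefficient table, under some assumptions: X and Y are synthetic stand-ins with the Boston shape, the true coefficients 1 to 13 are arbitrary, and the column names features and estimatedCoefficients are made up for illustration. The normalize argument is left out because recent scikit-learn releases no longer accept it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Synthetic stand-in data: 506 rows, 13 features, known coefficients 1..13.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(506, 13)), columns=feature_names)
true_coef = np.arange(1, 14, dtype=float)
Y = 22.0 + X.to_numpy() @ true_coef + rng.normal(scale=0.1, size=506)

lm = LinearRegression(fit_intercept=True)
lm.fit(X, Y)

# One row per feature, pairing its name with its estimated coefficient.
coef_table = pd.DataFrame({"features": X.columns,
                           "estimatedCoefficients": lm.coef_})
print("Estimated intercept:", lm.intercept_)
print(coef_table)
```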

The data frame shows a high correlation between RM and prices. Let's draw a scatter plot of the true housing prices against the true RM values.
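A sketch of such a scatter plot. The rm and price arrays are synthetic placeholders (a made-up positive trend plus noise), so the picture only illustrates the plotting calls, not the real relationship. The Agg backend is selected so the sketch also runs without a display; in a notebook that line can be dropped.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Synthetic placeholders for bos.RM and the housing prices.
rng = np.random.default_rng(0)
rm = rng.normal(loc=6.3, scale=0.7, size=506)
price = 9.0 * rm - 34.0 + rng.normal(scale=6.0, size=506)

fig, ax = plt.subplots()
ax.scatter(rm, price, s=10)
ax.set_xlabel("Average number of rooms per dwelling (RM)")
ax.set_ylabel("Housing price")
ax.set_title("Relationship between RM and price")
# In a script with an interactive backend, call plt.show() here.
```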

Predicting Prices

Calculate the predicted prices (Ŷ_i) using lm.predict, then display the first 5 of them.
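A short sketch of this step. The data comes from scikit-learn's make_regression as a synthetic stand-in with the Boston shape, so the printed numbers are not real housing prices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic stand-in with the Boston shape (506 rows, 13 features); not real prices.
X, Y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=0)

lm = LinearRegression().fit(X, Y)
Y_pred = lm.predict(X)   # one predicted value per row of X
print(Y_pred[:5])        # first five predictions
```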

Draw a scatter plot to compare the true prices with the predicted prices.
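One way this comparison could look, again on synthetic make_regression data standing in for the housing data. The dashed diagonal is an addition of this sketch: points on it would be perfect predictions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic stand-in with the Boston shape; not real prices.
X, Y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=0)
Y_pred = LinearRegression().fit(X, Y).predict(X)

fig, ax = plt.subplots()
ax.scatter(Y, Y_pred, s=10)
# Reference line y = x spanning the range of both axes.
lims = [min(Y.min(), Y_pred.min()), max(Y.max(), Y_pred.max())]
ax.plot(lims, lims, color="gray", linestyle="--")
ax.set_xlabel("True prices")
ax.set_ylabel("Predicted prices")
ax.set_title("True prices vs predicted prices")
```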

  • Let's calculate the mean squared error.
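The mean squared error is the average of the squared differences between the true and predicted values: MSE = (1/n) Σ (Y_i − Ŷ_i)². A small worked sketch on four made-up numbers, comparing the NumPy one-liner used in these notes with scikit-learn's mean_squared_error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

# Errors are 1, 0, -1, -2; squares are 1, 0, 1, 4; their mean is 6 / 4 = 1.5.
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)
```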

How to do train-test split:

Divide your data set randomly into training and test sets. Scikit-learn provides a function called train_test_split to do this.

Build a linear regression model using the train-test data sets.
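A sketch of the split and the fit, with assumptions stated up front: the data is a synthetic make_regression stand-in, and test_size=0.33 and random_state=5 are illustrative values, not necessarily the ones behind the numbers reported in these notes.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Boston shape (506 rows, 13 features); not real prices.
X, Y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=0)

# Hold out roughly a third of the rows for testing (assumed split size and seed).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=5)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

# Fit on the training rows only; the test rows stay unseen by the model.
lm = LinearRegression()
lm.fit(X_train, Y_train)
```

Fitting on X_train only is the point of the split: the error on X_test then estimates how the model does on data it has not seen.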

  • Input:

      print("Fit a model X_train, and calculate MSE with Y_train:", np.mean((Y_train - lm.predict(X_train)) ** 2))
    
      print("Fit a model X_train, and calculate MSE with X_test, Y_test:", np.mean((Y_test - lm.predict(X_test)) ** 2))
    
  • Output:

      Fit a model X_train, and calculate MSE with Y_train: 19.5467584735
      Fit a model X_train, and calculate MSE with X_test, Y_test: 28.5413672756
    
  • Residual Plots
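A residual plot shows each prediction against its error. If the model fits well, the points scatter randomly around the zero line with no visible pattern. The sketch below uses synthetic make_regression data and an assumed split (test_size=0.33, random_state=5). The colours and labels are also choices of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Boston shape; not real prices.
X, Y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=5)
lm = LinearRegression().fit(X_train, Y_train)

pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

fig, ax = plt.subplots()
# Residual = predicted value minus true value, plotted against the prediction.
ax.scatter(pred_train, pred_train - Y_train, c="b", s=20, alpha=0.5,
           label="Training data")
ax.scatter(pred_test, pred_test - Y_test, c="g", s=20, label="Test data")
ax.axhline(0, color="k")  # zero-error reference line
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual (predicted - true)")
ax.set_title("Residual plot using training (blue) and test (green) data")
ax.legend()
```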
