Logistic Regression

Posted by GwanSiu on June 17, 2018

1. Logistic Regression

Data: Inputs are continuous vectors of length $K$. Outputs are discrete.

Model: Logistic function applied to dot product of parameters with input vector.
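For the binary case with parameters $\theta\in\mathbb{R}^{K}$ and the logistic (sigmoid) function $\sigma(a)=1/(1+e^{-a})$, this means:

$$p(y=1\mid \mathbf{x},\theta)=\sigma(\theta^{T}\mathbf{x})=\frac{1}{1+\exp(-\theta^{T}\mathbf{x})}$$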

Learning: Finds the parameters that minimize some objective function.

Usually, we minimize the negative log conditional likelihood:
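$$J(\theta)=-\log\prod_{i=1}^{N}p(y^{(i)}\mid \mathbf{x}^{(i)},\theta)=-\sum_{i=1}^{N}\log p(y^{(i)}\mid \mathbf{x}^{(i)},\theta)$$

where $N$ is the number of training examples.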

Why can't we maximize the likelihood (as in Naive Bayes)?

Prediction: Output is the most probable class.
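$$\hat{y}=\arg\max_{y\in\{0,1\}}p(y\mid \mathbf{x},\theta)$$

For binary classification this means predicting $\hat{y}=1$ whenever $\sigma(\theta^{T}\mathbf{x})\geq 0.5$.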

2. Learning Methods

Learning: Three approaches to solve $\theta^{\ast}=\arg\min_{\theta}J(\theta)$

  • Approach 1: Gradient Descent (take larger, more certain, steps opposite the gradient)
  • Approach 2: Stochastic Gradient Descent (SGD) (take many small steps opposite the gradient; see the sketch after this list)
  • Approach 3: Newton's Method (use second derivatives to better follow curvature)
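A minimal sketch of Approach 2 for binary logistic regression, assuming a design matrix `X` whose rows are the $\mathbf{x}^{(i)}$ and labels `y` in $\{0,1\}$; the names `sgd_logistic`, `lr`, and `epochs` are illustrative choices, not from the original post:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgd_logistic(X, y, lr=0.1, epochs=100, seed=0):
    """SGD on the negative log conditional likelihood of logistic regression."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    theta = np.zeros(k)
    for _ in range(epochs):
        for i in rng.permutation(n):        # visit examples in random order
            mu_i = sigmoid(X[i] @ theta)    # predicted probability p(y=1 | x_i)
            grad_i = (mu_i - y[i]) * X[i]   # per-example gradient of the NLL
            theta -= lr * grad_i            # small step opposite the gradient
    return theta
```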

3. Newton’s Method and Logistic Regression

The motivation behind Newton's method is to use a quadratic approximation of our function to make a good guess about where we should step next.

The Taylor series expansion for an infinitely differentiable function $f(x), x\in\mathbb{R},$ about a point $v\in\mathbb{R}$ is:
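$$f(x)=f(v)+f'(v)(x-v)+\frac{1}{2}f''(v)(x-v)^{2}+\frac{1}{3!}f'''(v)(x-v)^{3}+\cdots$$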

The 2nd-order Taylor series approximation cuts off the expansion after the quadratic term:
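$$f(x)\approx \tilde{f}(x)=f(v)+f'(v)(x-v)+\frac{1}{2}f''(v)(x-v)^{2}$$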

The vector version of the Taylor series expansion for an infinitely differentiable function $f(\mathbf{x}), \mathbf{x}\in \mathbb{R}^{K}$, about a point $\mathbf{v}\in\mathbb{R}^{K}$ is:
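$$f(\mathbf{x})=f(\mathbf{v})+\nabla f(\mathbf{v})^{T}(\mathbf{x}-\mathbf{v})+\frac{1}{2}(\mathbf{x}-\mathbf{v})^{T}\nabla^{2}f(\mathbf{v})(\mathbf{x}-\mathbf{v})+\cdots$$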

The 2nd-order Taylor series approximation cuts off the expansion after the quadratic term:
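$$f(\mathbf{x})\approx\tilde{f}(\mathbf{x})=f(\mathbf{v})+\nabla f(\mathbf{v})^{T}(\mathbf{x}-\mathbf{v})+\frac{1}{2}(\mathbf{x}-\mathbf{v})^{T}\nabla^{2}f(\mathbf{v})(\mathbf{x}-\mathbf{v})$$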

Taking the derivative of $\tilde{f}(\mathbf{x})$ with respect to $\mathbf{x}$ and setting it to zero gives us the closed-form minimizer of this (convex) quadratic function:
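$$\nabla\tilde{f}(\mathbf{x})=\nabla f(\mathbf{v})+\nabla^{2}f(\mathbf{v})(\mathbf{x}-\mathbf{v})=0 \quad\Longrightarrow\quad \hat{\mathbf{x}}=\mathbf{v}-(\nabla^{2}f(\mathbf{v}))^{-1}\nabla f(\mathbf{v})$$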

The added term $\Delta \mathbf{x}_{\mathrm{nt}}=-(\nabla^{2}f(\mathbf{x}))^{-1}\nabla f(\mathbf{x})$ is called the Newton step.

The algorithm of Newton's Method (Newton-Raphson method):

Goal: $x^{\ast}=\arg\min_{x}f(x)$

  1. Approximate the function with the 2nd-order Taylor series (the update equations are written out after this list).
  2. Compute its minimizer.
  3. Step to that minimizer.
  4. Repeat.
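Written out with iterates $\mathbf{x}^{(t)}$ (notation chosen here to match the derivation above), one Newton iteration is:

$$\tilde{f}(\mathbf{x})=f(\mathbf{x}^{(t)})+\nabla f(\mathbf{x}^{(t)})^{T}(\mathbf{x}-\mathbf{x}^{(t)})+\frac{1}{2}(\mathbf{x}-\mathbf{x}^{(t)})^{T}\nabla^{2}f(\mathbf{x}^{(t)})(\mathbf{x}-\mathbf{x}^{(t)})$$

$$\mathbf{x}^{(t+1)}=\mathbf{x}^{(t)}-(\nabla^{2}f(\mathbf{x}^{(t)}))^{-1}\nabla f(\mathbf{x}^{(t)})$$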

4. Iteratively Reweighted Least Squares (Newton's Method for Logistic Regression)

As we know, linear regression has a closed-form solution, but logistic regression does not, due to the nonlinearity of the logistic sigmoid function. However, the loss function of logistic regression is convex, so we are guaranteed to find a globally optimal solution. Furthermore, the error function can be minimized by an efficient iterative optimization algorithm based on the Newton-Raphson scheme.

For Logistic Regression:
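In matrix form, writing $X$ for the design matrix, $y$ for the label vector, $\mu_{i}=\sigma(\theta^{T}x^{(i)})$, and $S=\mathrm{diag}\big(\mu_{i}(1-\mu_{i})\big)$ (notation chosen to match the symbols used below), the gradient and Hessian of $J(\theta)$ are:

$$g=\nabla J(\theta)=X^{T}(\mu-y),\qquad H=\nabla^{2}J(\theta)=X^{T}SX$$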

Substituting $g$ and $H$ into the Newton update $\theta\leftarrow \theta - H^{-1}g$, we obtain:
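$$\theta\leftarrow \theta-(X^{T}SX)^{-1}X^{T}(\mu-y)=(X^{T}SX)^{-1}X^{T}S\big(X\theta-S^{-1}(\mu-y)\big)=(X^{T}SX)^{-1}X^{T}Sz$$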

where $z=X\theta-S^{-1}(\mu-y)$.

Compared with the closed-form solution of linear regression, $\theta^{\ast}=(X^{T}X)^{-1}X^{T}y$, the step $\theta\leftarrow \theta-H^{-1}g$ can be reformulated as $\theta=(X^{T}SX)^{-1}X^{T}Sz$, where $z=X\theta-S^{-1}(\mu-y)$. This has the same form as the closed-form solution of linear regression, but with weights $S$ and targets $z$.

Therefore, we can see that the above update yields the minimizer of the weighted least squares problem:
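$$\theta\leftarrow\arg\min_{\theta}\sum_{i=1}^{N}S_{ii}\big(z_{i}-\theta^{T}x^{(i)}\big)^{2}$$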

where $S_{ii}$ is the weight of the $i\text{th}$ “training example” consisting of the pair $(x^{(i)},z_{i})$.
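A minimal IRLS sketch under these definitions, assuming a design matrix `X` (rows $x^{(i)}$, with a bias column already appended if desired) and binary labels `y`; the name `irls_logistic` and the small stabilizer `eps` are illustrative choices, not from the original post:

```python
import numpy as np

def irls_logistic(X, y, n_iters=20, eps=1e-8):
    """Newton's method for logistic regression, phrased as iteratively
    reweighted least squares: each step solves a weighted least squares
    problem with weights S and working response z."""
    n, k = X.shape
    theta = np.zeros(k)
    for _ in range(n_iters):
        mu = 1.0 / (1.0 + np.exp(-X @ theta))   # predicted probabilities
        s = mu * (1.0 - mu)                     # diagonal entries S_ii
        z = X @ theta - (mu - y) / (s + eps)    # working response z = X theta - S^{-1}(mu - y)
        XtS = X.T * (s + eps)                   # X^T S, built by scaling columns of X^T
        theta = np.linalg.solve(XtS @ X, XtS @ z)   # solve (X^T S X) theta = X^T S z
    return theta
```

Note that on linearly separable data the weights $S_{ii}$ shrink toward zero and $\theta$ diverges, which is why a small regularizer or an iteration cap is typically used in practice.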

The explanation on Wikipedia:

The method of iteratively reweighted least squares (IRLS) is used to solve certain optimization problems with objective functions of the form:
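$$\underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\;\sum_{i=1}^{n}\big|y_{i}-f_{i}(\boldsymbol{\beta})\big|^{p}$$

(notation as in that article: observations $y_{i}$, model $f_{i}(\boldsymbol{\beta})$, and exponent $p$)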

by an iterative method in which each step involves solving a weighted least squares problem of the form:
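$$\boldsymbol{\beta}^{(t+1)}=\underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\;\sum_{i=1}^{n}w_{i}\big(\boldsymbol{\beta}^{(t)}\big)\,\big|y_{i}-f_{i}(\boldsymbol{\beta})\big|^{2}$$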

IRLS is used to find the maximum likelihood estimates of a generalized linear model, and in robust regression to find an M-estimator as a way of mitigating the influence of outliers in an otherwise normally distributed data set, for example by minimizing the least absolute errors rather than the least squared errors.