
From Logistic Regression to Support Vector Machines

1. Large Margin Classification

An alternative view of logistic regression

$ h_\theta (x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}} \, , \quad h_\theta (x) \in [0, 1] $

<img src="/images/ml/coursera/ml-ng-w3-02.png" width=“820” height=“500” align=“middle” /img>

Predict $ y = 1 $ when $ h_\theta(x) = g(\theta^T x) \geq 0.5 $, i.e. when $ \theta^T x \geq 0 $.

Predict $ y = 0 $ when $ h_\theta(x) = g(\theta^T x) < 0.5 $, i.e. when $ \theta^T x < 0 $.
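To make the hypothesis and decision rule concrete, here is a minimal NumPy sketch (the names `sigmoid`, `h`, and `predict` are illustrative, and `X` is assumed to be an $m \times n$ design matrix whose rows are training examples with the intercept term already included):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """Hypothesis h_theta(x) = g(theta^T x), evaluated for every row of X."""
    return sigmoid(X @ theta)

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```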

We can compress our cost function’s two conditional cases into one case:

$ \mathrm{Cost}(h_\theta(x),y) = - y \cdot \log(h_\theta(x)) - (1 - y) \cdot \log(1 - h_\theta(x))$

We can fully write out our entire cost function as follows:

$ J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ y^{(i)} \log (h_\theta (x^{(i)})) + (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right] $
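Continuing the sketch above, the (unregularized) cost could be computed as follows; `logistic_cost` is an illustrative name, not part of the course code:

```python
def logistic_cost(theta, X, y):
    """Unregularized cross-entropy cost J(theta), averaged over the m training examples."""
    m = y.shape[0]
    p = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```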

With L2 regularization, logistic regression solves:

$ \min_\theta \ \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \left( -\log h_\theta (x^{(i)}) \right) + (1 - y^{(i)}) \left( -\log (1 - h_\theta(x^{(i)})) \right) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2 $

$ \mathrm{cost}_1(\theta^T x^{(i)}) = -\log h_\theta (x^{(i)}) $
$ \mathrm{cost}_0(\theta^T x^{(i)}) = -\log (1 - h_\theta(x^{(i)})) $

$ \min_\theta \ \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2 $
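A sketch of this regularized objective in code, factoring out the two per-example cost terms (it reuses `sigmoid` and NumPy from the sketch above, and follows the convention implied by the $j = 1, \dots, n$ sum of not penalizing the intercept $\theta_0$):

```python
def cost1_log(z):
    """cost_1(z) = -log(g(z)): per-example cost when y = 1."""
    return -np.log(sigmoid(z))

def cost0_log(z):
    """cost_0(z) = -log(1 - g(z)): per-example cost when y = 0."""
    return -np.log(1.0 - sigmoid(z))

def regularized_cost(theta, X, y, lam):
    """Regularized logistic regression objective; theta[0] (the intercept) is not penalized."""
    m = y.shape[0]
    z = X @ theta
    data_term = np.sum(y * cost1_log(z) + (1.0 - y) * cost0_log(z)) / m
    reg_term = (lam / (2.0 * m)) * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```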

1.1 Optimization Objective

Starting from the regularized logistic regression objective:

$ \min_\theta \ \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2 $

Because scaling the objective by a positive constant does not change the minimizing $\theta$, we can drop the factor $\frac{1}{m}$ and reparameterize the regularization trade-off with $ C = \frac{1}{\lambda} $, which gives the SVM optimization objective:

$ \min_\theta \ C \displaystyle \sum_{i=1}^m \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \displaystyle \sum_{j=1}^n \theta_j^2 $
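In the SVM, $\mathrm{cost}_1$ and $\mathrm{cost}_0$ are replaced by piecewise-linear surrogates of the logistic terms: $\mathrm{cost}_1(z)$ is zero once $z \geq 1$, and $\mathrm{cost}_0(z)$ is zero once $z \leq -1$. A minimal sketch of this objective, assuming the common hinge forms $\max(0, 1 - z)$ and $\max(0, 1 + z)$ for those surrogates:

```python
def svm_cost1(z):
    """Piecewise-linear surrogate for -log h: zero once z >= 1 (hinge form assumed)."""
    return np.maximum(0.0, 1.0 - z)

def svm_cost0(z):
    """Piecewise-linear surrogate for -log(1 - h): zero once z <= -1 (hinge form assumed)."""
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C * sum of per-example surrogate costs + (1/2) * sum_{j>=1} theta_j^2."""
    z = X @ theta
    data_term = C * np.sum(y * svm_cost1(z) + (1.0 - y) * svm_cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```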

1.2 Large Margin Intuition

<img src="/images/ml/coursera/ml-ng-w7-svm-1.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-2.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-3.png" width=“620” height=“400” align=“middle” /img>

1.3 Mathematics Behind Large Margin Classification

<img src="/images/ml/coursera/ml-ng-w7-svm-4.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-5.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-6.png" width=“620” height=“400” align=“middle” /img>

2. Kernels

2.1 Kernels I

2.2 Kernels II

3. SVMs in Practice

3.1 Using An SVM

Reference
