
From Logistic Regression to Support Vector Machines

1. Large Margin Classification

An alternative view of logistic regression

$ h_\theta (x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}} \, , \quad h_\theta (x) \in [0, 1] $

<img src="/images/ml/coursera/ml-ng-w3-02.png" width=“820” height=“500” align=“middle” /img>

Predict $ y = 1 $ when $ h_\theta(x) = g(\theta^T x) \geq 0.5 $, i.e. when $ \theta^T x \geq 0 $.

Predict $ y = 0 $ when $ h_\theta(x) = g(\theta^T x) < 0.5 $, i.e. when $ \theta^T x < 0 $.
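To make the hypothesis and decision rule concrete, here is a minimal NumPy sketch (the names `sigmoid`, `h`, and `predict` are illustrative, and `X` is assumed to be an $m \times n$ design matrix whose rows are training examples with the intercept term already included):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """Hypothesis h_theta(x) = g(theta^T x), evaluated for every row of X."""
    return sigmoid(X @ theta)

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```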

We can compress our cost function’s two conditional cases into one case:

$ \mathrm{Cost}(h_\theta(x),y) = - y \cdot \log(h_\theta(x)) - (1 - y) \cdot \log(1 - h_\theta(x))$

We can fully write out our entire cost function as follows:

$ J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ y^{(i)} \log (h_\theta (x^{(i)})) + (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right] $
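Continuing the sketch above, the (unregularized) cost could be computed as follows; `logistic_cost` is an illustrative name, not part of the course code:

```python
def logistic_cost(theta, X, y):
    """Unregularized cross-entropy cost J(theta), averaged over the m training examples."""
    m = y.shape[0]
    p = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```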

With L2 regularization, logistic regression solves:

$ \min_\theta \ \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \left( -\log h_\theta (x^{(i)}) \right) + (1 - y^{(i)}) \left( -\log (1 - h_\theta(x^{(i)})) \right) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2 $

$ \mathrm{cost}_1(\theta^T x^{(i)}) = -\log h_\theta (x^{(i)}) $
$ \mathrm{cost}_0(\theta^T x^{(i)}) = -\log (1 - h_\theta(x^{(i)})) $

$ \min_\theta \ \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2 $
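A sketch of this regularized objective in code, factoring out the two per-example cost terms (it reuses `sigmoid` and NumPy from the sketch above, and follows the convention implied by the $j = 1, \dots, n$ sum of not penalizing the intercept $\theta_0$):

```python
def cost1_log(z):
    """cost_1(z) = -log(g(z)): per-example cost when y = 1."""
    return -np.log(sigmoid(z))

def cost0_log(z):
    """cost_0(z) = -log(1 - g(z)): per-example cost when y = 0."""
    return -np.log(1.0 - sigmoid(z))

def regularized_cost(theta, X, y, lam):
    """Regularized logistic regression objective; theta[0] (the intercept) is not penalized."""
    m = y.shape[0]
    z = X @ theta
    data_term = np.sum(y * cost1_log(z) + (1.0 - y) * cost0_log(z)) / m
    reg_term = (lam / (2.0 * m)) * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```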

1.1 Optimization Objective

Starting from the regularized logistic regression objective:

$ \min_\theta \ \frac{1}{m} \left[ \displaystyle \sum_{i=1}^m y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2 $

Because scaling the objective by a positive constant does not change the minimizing $\theta$, we can drop the factor $\frac{1}{m}$ and reparameterize the regularization trade-off with $ C = \frac{1}{\lambda} $, which gives the SVM optimization objective:

$ \min_\theta \ C \displaystyle \sum_{i=1}^m \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \displaystyle \sum_{j=1}^n \theta_j^2 $
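In the SVM, $\mathrm{cost}_1$ and $\mathrm{cost}_0$ are replaced by piecewise-linear surrogates of the logistic terms: $\mathrm{cost}_1(z)$ is zero once $z \geq 1$, and $\mathrm{cost}_0(z)$ is zero once $z \leq -1$. A minimal sketch of this objective, assuming the common hinge forms $\max(0, 1 - z)$ and $\max(0, 1 + z)$ for those surrogates:

```python
def svm_cost1(z):
    """Piecewise-linear surrogate for -log h: zero once z >= 1 (hinge form assumed)."""
    return np.maximum(0.0, 1.0 - z)

def svm_cost0(z):
    """Piecewise-linear surrogate for -log(1 - h): zero once z <= -1 (hinge form assumed)."""
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C * sum of per-example surrogate costs + (1/2) * sum_{j>=1} theta_j^2."""
    z = X @ theta
    data_term = C * np.sum(y * svm_cost1(z) + (1.0 - y) * svm_cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```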

1.2 Large Margin Intuition

<img src="/images/ml/coursera/ml-ng-w7-svm-1.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-2.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-3.png" width=“620” height=“400” align=“middle” /img>

1.3 Mathematics Behind Large Margin Classification

<img src="/images/ml/coursera/ml-ng-w7-svm-4.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-5.png" width=“620” height=“400” align=“middle” /img>

<img src="/images/ml/coursera/ml-ng-w7-svm-6.png" width=“620” height=“400” align=“middle” /img>

2. Kernels

2.1 Kernels I

2.2 Kernels II

3. SVMs in Practice

3.1 Using An SVM

Reference
