From Logistic Regression to Support Vector Machines

1. Large Margin Classification

Alternative view of logistic regression

$ h_\theta (x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}} \; , \; h_\theta (x) \in (0, 1) $

Predict $y = 1$ when $h_\theta(x) = g(\theta^T x) \geq 0.5$, that is, when $\theta^T x \geq 0$.

Predict $y = 0$ when $h_\theta(x) = g(\theta^T x) < 0.5$, that is, when $\theta^T x < 0$.
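
The decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original notes: the parameter vector `theta`, the sample matrix `X` (with a leading column of ones for the bias), and the helper names `sigmoid` and `predict` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Return 1 where theta^T x >= 0 (equivalently h >= 0.5), else 0."""
    return (X @ theta >= 0).astype(int)

# Hypothetical parameters and samples; the first column of X is the bias term.
theta = np.array([-1.0, 2.0])
X = np.array([[1.0, 0.0],
              [1.0, 0.5],
              [1.0, 2.0]])

h = sigmoid(X @ theta)      # hypothesis values, each strictly between 0 and 1
labels = predict(theta, X)  # thresholding theta^T x at 0 matches h at 0.5
```

Thresholding $\theta^T x$ at 0 avoids evaluating the exponential at prediction time, since $g$ is monotonic and $g(0) = 0.5$.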

We can compress our cost function’s two conditional cases into one case:

$ \mathrm{Cost}(h_\theta(x),y) = - y \cdot \log(h_\theta(x)) - (1 - y) \cdot \log(1 - h_\theta(x))$

We can fully write out our entire cost function as follows:

$
J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]
$
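
As a rough sketch, the unregularized cost $J(\theta)$ can be computed directly from the formula. The function name `logistic_cost` and the toy data are assumptions for illustration only.

```python
import numpy as np

def logistic_cost(theta, X, y):
    """Average cross-entropy J(theta) over the m rows of X."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x) for every sample
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical toy data; the first column of X is the bias term.
X = np.array([[1.0, 0.5],
              [1.0, -1.5],
              [1.0, 2.0]])
y = np.array([1, 0, 1])

# With theta = 0 every h equals 0.5, so J reduces to log(2).
j_at_zero = logistic_cost(np.zeros(2), X, y)
```

This direct form can overflow or return `inf` when $|\theta^T x|$ is very large, so it is meant as an illustration of the formula rather than a numerically robust implementation.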

Adding an L2 regularization term, training minimizes the following objective over $\theta$:

$
\min\limits_{\theta} \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ y^{(i)} \left(-\log h_\theta (x^{(i)}) \right) + (1 - y^{(i)}) \left( - \log (1 - h_\theta(x^{(i)})) \right) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2
$

Define the per-example cost for each label:

$\mathrm{cost}_1(\theta^T x^{(i)}) = -\log h_\theta (x^{(i)})$

$\mathrm{cost}_0(\theta^T x^{(i)}) = - \log (1 - h_\theta(x^{(i)}))$
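
Because $h_\theta(x) = g(z)$ with $z = \theta^T x$, both costs can be written directly as functions of $z$: $-\log g(z) = \log(1 + e^{-z})$ and $-\log(1 - g(z)) = \log(1 + e^{z})$. The sketch below, with assumed helper names `cost_1` and `cost_0`, evaluates them with `np.logaddexp` and compares the result with the definitions.

```python
import numpy as np

def cost_1(z):
    """Cost of a positive example (y = 1): -log g(z) = log(1 + exp(-z))."""
    return np.logaddexp(0.0, -z)

def cost_0(z):
    """Cost of a negative example (y = 0): -log(1 - g(z)) = log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

z = np.array([-2.0, 0.0, 3.0])   # sample values of theta^T x
h = 1.0 / (1.0 + np.exp(-z))     # hypothesis at those points

matches_definition = (np.allclose(cost_1(z), -np.log(h))
                      and np.allclose(cost_0(z), -np.log(1.0 - h)))
```

`cost_1` shrinks toward 0 as $z$ grows and `cost_0` shrinks toward 0 as $z$ decreases, and the two are mirror images: `cost_0(z) == cost_1(-z)`.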

Substituting these into the regularized objective gives:

$
\min\limits_{\theta} \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2
$

1.1 Optimization Objective

Start from the regularized logistic regression objective:

$
\min\limits_{\theta} \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{\lambda}{2m} \displaystyle \sum_{j=1}^n \theta_j^2
$

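Multiply the objective by $\frac{m}{\lambda}$, a positive constant that does not change the minimizing $\theta$, and let $C = \frac{1}{\lambda}$:

$
\min\limits_{\theta} C \displaystyle \sum_{i=1}^m \left[ y^{(i)} \, \mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \mathrm{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \displaystyle \sum_{j=1}^n \theta_j^2
$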
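
The rescaling can be checked numerically. The sketch below assumes $C = \frac{1}{\lambda}$ and a regularization term of $\frac{1}{2}\sum_j \theta_j^2$, which is what results from multiplying the $\lambda$ form of the objective by $\frac{m}{\lambda}$. All function names and the random data are hypothetical, and the bias $\theta_0$ is left out of the penalty to match the sum over $j = 1, \dots, n$.

```python
import numpy as np

def cost_1(z):
    return np.logaddexp(0.0, -z)   # -log g(z)

def cost_0(z):
    return np.logaddexp(0.0, z)    # -log(1 - g(z))

def data_term(theta, X, y):
    """Sum over examples of y * cost_1 + (1 - y) * cost_0."""
    z = X @ theta
    return np.sum(y * cost_1(z) + (1 - y) * cost_0(z))

def objective_lambda_form(theta, X, y, lam):
    """(1/m) * data term + (lambda / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(y)
    return data_term(theta, X, y) / m + lam / (2 * m) * np.sum(theta[1:] ** 2)

def objective_c_form(theta, X, y, C):
    """C * data term + (1/2) * sum of theta_j^2 for j >= 1."""
    return C * data_term(theta, X, y) + 0.5 * np.sum(theta[1:] ** 2)

# Hypothetical random data; the first column of X is the bias term.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
y = np.array([1, 0, 1, 1, 0])
theta = rng.normal(size=3)
lam = 0.5
m = len(y)

lhs = objective_c_form(theta, X, y, C=1.0 / lam)
rhs = (m / lam) * objective_lambda_form(theta, X, y, lam)
```

For any `theta`, `lhs` and `rhs` agree up to floating-point error, so the two forms differ only by the constant factor $\frac{m}{\lambda}$ and share the same minimizer. A large $C$ corresponds to a small $\lambda$, that is, weak regularization.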

1.2 Large Margin Intuition

1.3 Mathematics Behind Large Margin Classification

2. Kernels

2.1 Kernels I

2.2 Kernels II

3. SVMs in Practice

3.1 Using An SVM

Reference