Brief Summary of SVM - ANN Equivalence

The paper considers only a shallow network architecture with one hidden layer.

Very short and rough summary: The paper establishes the following correspondence:

  • activation functions correspond to kernels
  • hidden nodes correspond to support vectors

A kernel method fixes the activation functions (the kernel) but learns the number of hidden nodes (the support vectors). Neural net learning fixes the number of hidden nodes but learns the activation functions.
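
The correspondence can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's own construction: it assumes a Gaussian (RBF) kernel, and the support vectors, dual coefficients, and bias are made-up values rather than the output of an SVM solver. All function names are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2) (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """Kernel expansion f(x) = sum_i coef_i * k(x, sv_i) + bias."""
    return sum(c * kernel(x, sv) for c, sv in zip(dual_coefs, support_vectors)) + bias

def network_forward(x, hidden_centers, output_weights, output_bias,
                    activation=rbf_kernel):
    """One-hidden-layer network with a single linear output unit.

    Each hidden unit stores one center and outputs activation(x, center).
    """
    hidden = [activation(x, center) for center in hidden_centers]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Illustrative values only (assumed, not fitted to any data).
support_vectors = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
dual_coefs = [0.7, -1.2, 0.5]  # signed dual coefficients, one per support vector
bias = 0.1

x = (0.5, 0.5)
f_svm = svm_decision(x, support_vectors, dual_coefs, bias)
f_net = network_forward(x, support_vectors, dual_coefs, bias)
```

Here the hidden layer has exactly one unit per support vector, the kernel plays the role of the activation function, and the dual coefficients serve as the hidden-to-output weights, so `f_svm` and `f_net` are the same computation.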

More detailed version: Fix n datapoints.

  1. Given an SVM, define a neural network where there is a hidden unit for each support vector. The activation function of a hidden unit is the value of the kernel applied to the input data point and the support vector represented by the hidden unit. The weights from hidden units to the single output unit are derived from the SVM weights obtained in the dual formulation.

  2. Consider a neural network with n hidden units, one for each datapoint. The error function is squared error plus a regularization term. From the regularization operator we can derive Green functions. The Green functions define the activation functions for the hidden units. They also determine the weights from hidden to output units via linear equations.

  3. The main theorem now states, roughly, that for every kernel there is a corresponding regularization operator, and vice versa, such that solving the SVM problem for the kernel is equivalent to finding weights that minimize the regularized squared error criterion. A rough summary of the two directions follows.

    1. Mapping from the kernel: the regularization term becomes the weighted average of entries in the kernel matrix, with the support vector weights as the weights. So the more of these weights are 0, the smaller the complexity of the neural network.
    2. Mapping from the regularization term: think of a vector space of activation functions. A given input vector maps to a member of this space. Define the kernel for two data points as the inner product of their activation functions. This turns out to satisfy Mercer's condition, hence is a valid kernel.
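
The regularization-network side of the construction (one hidden unit per datapoint, output weights obtained from linear equations) can be sketched as follows. This is a minimal illustration under assumptions that do not come from the summary above: the Green function is taken to be a Gaussian, the objective is taken in the standard form sum_i (y_i - f(x_i))^2 + lam * (regularization term), for which the output weights c solve (G + lam * I) c = y, and the data, `lam`, and all function names are hypothetical.

```python
import math

def green(x, z, width=1.0):
    """Assumed Green function: a Gaussian centered at z."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_regularization_network(points, targets, lam):
    """Return the Gram matrix G and weights c solving (G + lam * I) c = y."""
    n = len(points)
    G = [[green(p, q) for q in points] for p in points]
    A = [[G[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return G, solve_linear(A, targets)

def predict(x, points, weights):
    """Network output: one hidden unit per datapoint, linear output unit."""
    return sum(w * green(x, p) for w, p in zip(weights, points))

def regularization_term(G, weights):
    """Weighted combination of Gram-matrix entries: sum_ij c_i c_j G_ij."""
    n = len(weights)
    return sum(weights[i] * weights[j] * G[i][j] for i in range(n) for j in range(n))

# Toy data (assumed for illustration).
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
targets = [0.0, 1.0, 0.0, -1.0]
lam = 0.1

G, weights = fit_regularization_network(points, targets, lam)
penalty = regularization_term(G, weights)
```

In this sketch `penalty` depends only on the output weights and the Gram matrix, so any weight that is exactly 0 removes its hidden unit from both the prediction and the penalty.
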
Updated Tue April 08 2014, 09:47 by oschulte.