{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Intuitions about convolutional neural networks (CNNs), based on the simple and complex cells of the visual cortex. See https://en.wikipedia.org/wiki/Simple_cell and https://en.wikipedia.org/wiki/Complex_cell ."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "On the tensor concept: a tensor generalizes the vector and matrix concepts.\n",
      "\n",
      "   * Vector = 1D tensor: specify one number (the position), get back a real number.\n",
      "   * Matrix = 2D tensor: specify two numbers (row, column), get back a real number.\n",
      "   * D-dimensional tensor: specify D numbers, get back a real number.\n",
      "\n",
      "Typical notation: $T_{ijk}=x$ for 3D. More readable, first-order notation: $T(X,Y,Z)=x$ where $X,Y,Z$ range over the appropriate domains."
     ]
    },
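    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal NumPy sketch of the indexing view above. NumPy arrays stand in for tensors here, and the names `v`, `M`, `T3` and their shapes are illustrative assumptions: a D-dimensional array needs D indices before it returns a single real number."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy\n",
      "\n",
      "v = numpy.arange(4.0)                      # 1D tensor, shape (4,)\n",
      "M = numpy.arange(6.0).reshape(2, 3)        # 2D tensor, shape (2, 3)\n",
      "T3 = numpy.arange(24.0).reshape(2, 3, 4)   # 3D tensor, shape (2, 3, 4)\n",
      "\n",
      "# one index per dimension returns a single real number\n",
      "print(v[2])          # 2.0\n",
      "print(M[1, 2])       # 5.0\n",
      "print(T3[1, 2, 3])   # 23.0\n",
      "\n",
      "# the number of indices needed is the number of dimensions\n",
      "print(T3.ndim)       # 3"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },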
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import cPickle\n",
      "import gzip\n",
      "import os\n",
      "import sys\n",
      "import time\n",
      "\n",
      "import numpy\n",
      "\n",
      "import theano\n",
      "import theano.tensor as T\n",
      "from theano.tensor.signal import downsample\n",
      "from theano.tensor.nnet import conv"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 43
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Compute a single convolution: two random 9x9 filters applied to a symbolic 4D input with three input feature maps."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from theano.tensor.nnet import conv\n",
      "rng = numpy.random.RandomState(23455)\n",
      "\n",
      "# instantiate 4D tensor for input\n",
      "input = T.tensor4(name='input')\n",
      "\n",
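      "# initialize shared variable for weights.\n",
      "# shape: (number of filters, number of input feature maps, filter height, filter width)\n",
      "w_shp = (2, 3, 9, 9)\n",
      "# scale the uniform range by the square root of the fan-in (3 * 9 * 9 inputs per unit)\n",
      "w_bound = numpy.sqrt(3 * 9 * 9)\n",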
      "W = theano.shared( numpy.asarray(\n",
      "            rng.uniform(\n",
      "                low=-1.0 / w_bound,\n",
      "                high=1.0 / w_bound,\n",
      "                size=w_shp),\n",
      "            dtype=input.dtype), name ='W')\n",
      "\n",
      "# initialize shared variable for bias (1D tensor) with random values\n",
      "# IMPORTANT: biases are usually initialized to zero. However in this\n",
      "# particular application, we simply apply the convolutional layer to\n",
      "# an image without learning the parameters. We therefore initialize\n",
      "# them to random values to \"simulate\" learning.\n",
      "b_shp = (2,)\n",
      "b = theano.shared(numpy.asarray(\n",
      "            rng.uniform(low=-.5, high=.5, size=b_shp),\n",
      "            dtype=input.dtype), name ='b')\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 44
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
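      "# build symbolic expression that computes the convolution of input with the filters in W\n",
      "conv_out = conv.conv2d(input, W)\n",
      "\n",
      "# add the bias (broadcast over batch, rows and columns via dimshuffle)\n",
      "# and apply the sigmoid activation, i.e. produce the layer output\n",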
      "output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))\n",
      "\n",
      "# create theano function to compute filtered images\n",
      "f = theano.function([input], output)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 45
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# The previous cell adds the bias and applies the activation function,\n",
      "# i.e. it produces the neural net layer output, using ``dimshuffle``.\n",
      "# A few words on ``dimshuffle``:\n",
      "#   ``dimshuffle`` is a powerful tool for reshaping a tensor;\n",
      "#   it lets you shuffle dimensions around\n",
      "#   and also insert new ones along which the tensor will be\n",
      "#   broadcastable. Example:\n",
      "#   dimshuffle('x', 2, 'x', 0, 1)\n",
      "#   This will work on 3d tensors with no broadcastable\n",
      "#   dimensions. The first dimension will be broadcastable,\n",
      "#   then we will have the third dimension of the input tensor as\n",
      "#   the second of the resulting tensor, etc. If the tensor has\n",
      "#   shape (20, 30, 40), the resulting tensor will have dimensions\n",
      "#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)\n",
      "#   More examples:\n",
      "#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector\n",
      "#    dimshuffle(0, 1) -> identity\n",
      "#    dimshuffle(1, 0) -> swaps the first and second dimensions (transpose)\n",
      "#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)\n",
      "#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)\n",
      "#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB\n",
      "#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB\n",
      "#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 46
    },
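    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The reshuffling described in the comments above can be sketched in plain NumPy, where `transpose` reorders axes and `numpy.newaxis` plays the role of `'x'`. This is an analogy for the shapes involved, not Theano's implementation; the arrays `a`, `bias` and `conv_like` are made up for illustration."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy\n",
      "\n",
      "a = numpy.zeros((20, 30, 40))\n",
      "\n",
      "# analogue of dimshuffle('x', 2, 'x', 0, 1): AxBxC -> 1xCx1xAxB\n",
      "shuffled = a.transpose(2, 0, 1)[numpy.newaxis, :, numpy.newaxis, :, :]\n",
      "print(shuffled.shape)    # (1, 40, 1, 20, 30)\n",
      "\n",
      "# analogue of b.dimshuffle('x', 0, 'x', 'x'): one bias per feature map,\n",
      "# broadcast over batch, rows and columns\n",
      "bias = numpy.array([0.1, -0.2])\n",
      "conv_like = numpy.zeros((3, 2, 4, 4))   # (batch, feature maps, rows, columns)\n",
      "out = conv_like + bias[numpy.newaxis, :, numpy.newaxis, numpy.newaxis]\n",
      "print(out.shape)         # (3, 2, 4, 4)\n",
      "print(out[0, 1, 0, 0])   # -0.2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },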
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "An example of the original image and the detected features is shown at http://deeplearning.net/tutorial/lenet.html#lenet . (I'm missing a library needed to run this example live.)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from theano.tensor.signal import downsample\n",
      "\n",
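      "# symbolic 4D input: (batch size, feature maps, rows, columns)\n",
      "input = T.dtensor4('input')\n",
      "# pool over non-overlapping 2x2 windows\n",
      "maxpool_shape = (2, 2)\n",
      "# ignore_border=True drops the incomplete windows at the right and bottom edges\n",
      "pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)\n",
      "f = theano.function([input], pool_out)\n",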
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 47
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
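      "Randomly assign pixel values to 5x5 images, then show the output of 2x2 max-pooling. Observe which 2x2 squares the selected values come from: with `ignore_border=True` the fifth row and column are dropped (2x2 output), whereas with `ignore_border=False` they form partial windows of their own (3x3 output)."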
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)\n",
      "print 'With ignore_border set to True:'\n",
      "print 'invals[0, 0, :, :] =\\n', invals[0, 0, :, :]\n",
      "print 'output[0, 0, :, :] =\\n', f(invals)[0, 0, :, :]\n",
      "\n",
      "# rebuild the function, this time keeping the partial windows at the borders\n",
      "pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)\n",
      "f = theano.function([input], pool_out)\n",
      "print 'With ignore_border set to False:'\n",
      "print 'invals[1, 0, :, :] =\\n ', invals[1, 0, :, :]\n",
      "print 'output[1, 0, :, :] =\\n ', f(invals)[1, 0, :, :]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "With ignore_border set to True:\n",
        "invals[0, 0, :, :] =\n",
        "[[  4.17022005e-01   7.20324493e-01   1.14374817e-04   3.02332573e-01\n",
        "    1.46755891e-01]\n",
        " [  9.23385948e-02   1.86260211e-01   3.45560727e-01   3.96767474e-01\n",
        "    5.38816734e-01]\n",
        " [  4.19194514e-01   6.85219500e-01   2.04452250e-01   8.78117436e-01\n",
        "    2.73875932e-02]\n",
        " [  6.70467510e-01   4.17304802e-01   5.58689828e-01   1.40386939e-01\n",
        "    1.98101489e-01]\n",
        " [  8.00744569e-01   9.68261576e-01   3.13424178e-01   6.92322616e-01\n",
        "    8.76389152e-01]]\n",
        "output[0, 0, :, :] =\n",
        "[[ 0.72032449  0.39676747]\n",
        " [ 0.6852195   0.87811744]]\n",
        "With ignore_border set to False:"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "invals[1, 0, :, :] =\n",
        "  [[ 0.01936696  0.67883553  0.21162812  0.26554666  0.49157316]\n",
        " [ 0.05336255  0.57411761  0.14672857  0.58930554  0.69975836]\n",
        " [ 0.10233443  0.41405599  0.69440016  0.41417927  0.04995346]\n",
        " [ 0.53589641  0.66379465  0.51488911  0.94459476  0.58655504]\n",
        " [ 0.90340192  0.1374747   0.13927635  0.80739129  0.39767684]]\n",
        "output[1, 0, :, :] =\n",
        "  [[ 0.67883553  0.58930554  0.69975836]\n",
        " [ 0.66379465  0.94459476  0.58655504]\n",
        " [ 0.90340192  0.80739129  0.39767684]]\n"
       ]
      }
     ],
     "prompt_number": 48
    },
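    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the window selection explicit, here is a naive NumPy sketch of 2x2 max-pooling on a single 2D image. It is only an illustration of the two border modes, not how Theano computes them, and the helper `max_pool_2x2` is a hypothetical name. On the same `invals` it should give the 2x2 and 3x3 outputs printed above."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy\n",
      "\n",
      "def max_pool_2x2(img, ignore_border):\n",
      "    # maximum over non-overlapping 2x2 windows of a 2D array\n",
      "    rows, cols = img.shape\n",
      "    if ignore_border:\n",
      "        # drop the incomplete windows at the bottom and right edges\n",
      "        n_r, n_c = rows // 2, cols // 2\n",
      "    else:\n",
      "        # keep them: the edge windows are simply smaller\n",
      "        n_r, n_c = (rows + 1) // 2, (cols + 1) // 2\n",
      "    pooled = numpy.empty((n_r, n_c))\n",
      "    for i in range(n_r):\n",
      "        for j in range(n_c):\n",
      "            # slicing past the edge just yields a smaller window\n",
      "            pooled[i, j] = img[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()\n",
      "    return pooled\n",
      "\n",
      "invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)\n",
      "print(max_pool_2x2(invals[0, 0], ignore_border=True))\n",
      "print(max_pool_2x2(invals[1, 0], ignore_border=False))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },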
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we define one convolutional layer that combines convolution with max-pooling. The nodes in the neural net are replaced by feature maps. Question: how are the \"receptive fields\" chosen? That is, how are output feature maps linked to certain input feature maps? Are the receptive fields generated by a window of fixed size that moves over the input layer?"
     ]
    },
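    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One way to picture the answer, assuming the 'valid' border mode that `conv.conv2d` uses by default: a window of fixed size (the filter height and width) slides over every position of the input, and each output feature map sums over all input feature maps. The sketch below is a naive NumPy loop, not Theano's implementation. It skips the filter flip of a true convolution (so strictly it is a cross-correlation), and `naive_conv2d` and the 12x12 image size are illustrative assumptions."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy\n",
      "\n",
      "def naive_conv2d(images, filters):\n",
      "    # images:  (batch, input maps, height, width)\n",
      "    # filters: (output maps, input maps, filter height, filter width)\n",
      "    n_batch, in_maps, height, width = images.shape\n",
      "    out_maps, in_maps_f, fh, fw = filters.shape\n",
      "    assert in_maps == in_maps_f\n",
      "    out = numpy.zeros((n_batch, out_maps, height - fh + 1, width - fw + 1))\n",
      "    for y in range(height - fh + 1):\n",
      "        for x in range(width - fw + 1):\n",
      "            # receptive field: the same fh x fw window in every input map\n",
      "            patch = images[:, :, y:y + fh, x:x + fw]\n",
      "            # each output map sums over all input maps and window pixels\n",
      "            out[:, :, y, x] = numpy.tensordot(\n",
      "                patch, filters, axes=([1, 2, 3], [1, 2, 3]))\n",
      "    return out\n",
      "\n",
      "demo_rng = numpy.random.RandomState(0)\n",
      "demo_images = demo_rng.rand(1, 3, 12, 12)\n",
      "demo_filters = demo_rng.rand(2, 3, 9, 9)\n",
      "# 12 - 9 + 1 = 4 window positions along each axis\n",
      "print(naive_conv2d(demo_images, demo_filters).shape)   # (1, 2, 4, 4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },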
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "class LeNetConvPoolLayer(object):\n",
      "\n",
      "    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):\n",
      "        \"\"\"\n",
      "        Allocate a LeNetConvPoolLayer with shared variable internal parameters.\n",
      "\n",
      "        :type rng: numpy.random.RandomState\n",
      "        :param rng: a random number generator used to initialize weights\n",
      "\n",
      "        :type input: theano.tensor.dtensor4\n",
      "        :param input: symbolic image tensor, of shape image_shape\n",
      "\n",
      "        :type filter_shape: tuple or list of length 4\n",
      "        :param filter_shape: (number of filters, num input feature maps,\n",
      "                              filter height,filter width)\n",
      "\n",
      "        :type image_shape: tuple or list of length 4\n",
      "        :param image_shape: (batch size, num input feature maps,\n",
      "                             image height, image width)\n",
      "\n",
      "        :type poolsize: tuple or list of length 2\n",
      "        :param poolsize: the downsampling (pooling) factor (#rows,#cols)\n",
      "        \"\"\"\n",
      "        assert image_shape[1] == filter_shape[1]\n",
      "        self.input = input\n",
      "\n",
      "        # initialize weight values: the fan-in of each hidden neuron is\n",
      "        # restricted by the size of the receptive fields.\n",
      "        fan_in =  numpy.prod(filter_shape[1:])\n",
      "        W_values = numpy.asarray(rng.uniform(\n",
      "              low=-numpy.sqrt(3./fan_in),\n",
      "              high=numpy.sqrt(3./fan_in),\n",
      "              size=filter_shape), dtype=theano.config.floatX)\n",
      "        self.W = theano.shared(value=W_values, name='W')\n",
      "\n",
      "        # the bias is a 1D tensor -- one bias per output feature map\n",
      "        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)\n",
      "        self.b = theano.shared(value=b_values, name='b')\n",
      "\n",
      "        # convolve input feature maps with filters\n",
      "        conv_out = conv.conv2d(input, self.W,\n",
      "                filter_shape=filter_shape, image_shape=image_shape)\n",
      "\n",
      "        # downsample each feature map individually, using maxpooling\n",
      "        pooled_out = downsample.max_pool_2d(conv_out, poolsize, ignore_border=True)\n",
      "\n",
      "        # add the bias term. Since the bias is a vector (1D array), we first\n",
      "        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will thus\n",
      "        # be broadcast across mini-batches and feature map width & height\n",
      "        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))\n",
      "\n",
      "        # store parameters of this layer\n",
      "        self.params = [self.W, self.b]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 49
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Finally we can build a CNN that has a single convolutional layer, then a second sigmoid layer on top. See http://deeplearning.net/tutorial/lenet.html#lenet . That page also has an important discussion of tips and tricks. Computing activations in a CNN is more expensive than in a fully connected neural net: in an NN, in effect each hidden node is a single feature map for all nodes in the lower layer, whereas in a CNN the lower layer is in effect partitioned into different receptive fields, and the same filter is evaluated once per receptive field."
     ]
    },
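    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A back-of-the-envelope count may make the cost comparison concrete. The assumptions here are mine: the (2, 3, 9, 9) filter bank from the first example, a hypothetical 32x32 input, and 'valid' convolution. A fully connected layer uses each weight once per input, while the convolutional layer reuses every weight at each window position."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "filter_shape = (2, 3, 9, 9)   # (filters, input maps, filter height, filter width)\n",
      "image_hw = (32, 32)           # hypothetical input height and width\n",
      "\n",
      "n_weights = filter_shape[0] * filter_shape[1] * filter_shape[2] * filter_shape[3]\n",
      "out_h = image_hw[0] - filter_shape[2] + 1   # window positions per column\n",
      "out_w = image_hw[1] - filter_shape[3] + 1   # window positions per row\n",
      "\n",
      "# every weight is applied at every window position\n",
      "conv_mults = n_weights * out_h * out_w\n",
      "# a fully connected layer with the same number of weights uses each once\n",
      "dense_mults = n_weights\n",
      "\n",
      "print(n_weights)                   # 486\n",
      "print(conv_mults)                  # 279936\n",
      "print(conv_mults // dense_mults)   # 576"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },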
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}