{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to the toynn_2023 tool box" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Table of contents\n",
    "\n",
    "1. The class ToyPb\n",
    "2. The class nD_data\n",
    "3. The class ToyNN\n",
    "4. Methods for basic operations on lists of weights\n",
    "5. Methods for optimization
\n", " " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Ie13pHt3gyh8" }, "source": [ "### Standard libraries and the three classes of the tool box toynn_2022 (ToyPb, nD_data, ToyNN)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from toynn_2023 import *\n", "# performs the following:\n", "# import numpy as np\n", "# from numpy import random as nprd\n", "# from matplotlib import pyplot as plt\n", "# from matplotlib import cm as cm\n", "# from copy import deepcopy as dcp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top\n", "          \n", "          \n", " \n", "1.\n", "          \n", "          \n", " \n", "2.\n", "          \n", "          \n", " \n", "3.\n", "          \n", "          \n", " \n", "4.\n", "          \n", "          \n", " \n", "5.\n", "          \n", "          \n", " \n", "bot." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Woacru-MVEgA" }, "source": [ "# 1. The class ToyPb " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Attributes of an object in the class ToyPb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An object in the class ToyPb contains some information about a classification problem for points in a rectangle.\n", "\n", "If _pb_ is in this class:
\n", "($*$) _pb.name_ is a chain.
\n", "($*$) _pb.bounds_ is a tuple of floats $(x_0^-,x_0^+,x_1^-,x_1^+)$ which defines the rectangle.
\n", "($*$) _pb.f_ is an implementation of a numerical function $f(x_0,x_1)$.
The classification problem is the following. Given $x=(x_0,x_1)\\in[x_0^-,x_0^+]\\times[x_1^-,x_1^+]$, determine whether $x$ belongs to $\\Omega$, where \n", "$$\n", "\\Omega:=\\left\\{x \\in[x_0^-,x_0^+]\\times[x_1^-,x_1^+]: f(x)<0\\right\\}.\n", "$$\n", "\n", "There are two other attributes.
\n", "($*$) _pb.loss_ is an implementation of a numerical function $\\ell$.
\n", "($*$) *pb.loss_prime* is an implementation of the derivative $\\ell'$ of $\\ell$.
\n", "The _``loss function''_ $\\ell:\\mathbb{R}\\to\\mathbb{R}$ is used to estimate the error of predictions.
\n", "Given a prediction $\\hat y\\in\\mathbb{R}$ and the correct classification:\n", "$$\n", "y= \\begin{cases}-1&\\text{if }x\\not\\in\\Omega,\\\\\n", "\\,\\,1&\\text{if }x\\in\\Omega,\n", "\\end{cases}\n", "$$\n", "the error (or cost) is measured by $\\ell(\\hat y y)$. The function $\\ell$ should be nondecreasing with $\\ell\\ge0$ and $\\ell(t)$ close to $0$ for large positive values of $t$. The ideal loss function would be \n", "$$\n", "\\ell_{ideal}(t)= \\begin{cases}\\,\\ 0&\\text{if }t>0,\\\\\n", "+\\infty&\\text{if }t\\le0.\n", "\\end{cases}\n", "$$\n", "This would give a zero cost to predictions $y$ with the correct sign and an infinite cost to the others.
\n", "However, to apply the gradient-descent methods we pick a smooth decreasing function for $\\ell$.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, is an example of creation and manipulation of an obect in the class *ToyPb*.
\n", "__Remark:__ The method *show_border()* displays the boundary of the region $\\Omega$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "pb = ToyPb(name = \"disk\", bounds = (-1,1), loss_name = \"softplus\")\n", "\n", "\n", "print(f\"pb.name={pb.name}, pb.bounds={pb.bounds}\")\n", "\n", "\n", "pb.show_border()\n", "plt.title(f\"boundary of $\\Omega$\",fontsize=20)\n", "plt.show()\n", "\n", "loss, loss_prime = pb.loss, pb.loss_prime\n", "t=np.linspace(-3,3,300)\n", "\n", "\n", "plt.figure(figsize=(12,5))\n", "plt.subplot(121)\n", "plt.plot(t,loss(t),'b',label=r\"$\\ell$\")\n", "plt.legend(fontsize=20)\n", "\n", "plt.subplot(122)\n", "plt.plot(t,loss_prime(t),'r',label=r\"$\\ell'$\")\n", "plt.legend(fontsize=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can try with _name_ = \"square\", \"sin\" or \"ring\" and with *loss_name* = \"demanding\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The \\_\\_init\\_\\_ method of the class ToyPb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Parameters of the \\_\\_init\\_\\_ method**\n", "\n", "\\*\\*kwargs :
\n", "     \n", "_f_ : numerical function $(x_0,x_1)\\in\\mathbb{R}^2\\mapsto f(x_0,x_1)\\in\\mathbb{R}$ (optional if _name_ is given)
\n", "     \n", "_name_ : string (optional if _f_ is given)
\n", "     \n", "*bounds*=(-1,1) : a tuple of 2 or 4 floats
\n", "     \n", "*loss*, *loss_prime* : numerical functions: $\\mathbb{R}\\to\\mathbb{R}$ (optional)
\n", "     \n", "*loss_name*: string (optional)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Behaviour of the \\_\\_init\\_\\_ method.**\n", "\n", "The possible values for _name_ are \"sin\", \"affine\", \"disk\", \"square\", \"ring\".\n", "\n", "If _bounds_ is the tuple ($x_-$,$x_+$) with length 2, then *pb.bounds* receives the value ($x_-$,$x_+$,$x_-$,$x_+$).
\n", "If _bounds_ is a tuple with length 4, *pb.bounds* receives the value _bounds_. \n", "\n", "The possible values for *loss_name* are \"softplus\" and \"demanding\".
\n", "For \"softplus\", \n", "$$\n", "\\ell(t)=\\ln\\left(1 + e^{-t}\\right).\n", "$$\n", "For \"demanding\", \n", "$$\n", "\\ell(t)=\\sqrt{(t - 1)^2+1/10} - t + 1\n", "$$\n", "If the parameters *loss_name* and at least one of the parameters *loss* or *loss_prime* is not specified, then the deffect value for the loss function is:\n", "$$\n", "\\ell(t)=\\sqrt{t^2+1/10} - t.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. The class nD_data " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top\n", "          \n", "          \n", "1.\n", "          \n", "          \n", "2.\n", "          \n", "          \n", "3.\n", "          \n", "          \n", "4.\n", "          \n", "          \n", "5.\n", "          \n", "          \n", "bot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An object of the class *nD_data* essentially contains:
\n", "($*$) a set of points of the plane: $x^i=(x^i_0,x^i_1)$ for $i=0,\\dots,n-1$,
\n", "($*$) a set of labels $y^i\\in\\{-1,1\\}$ for $i=0,\\dots,n-1$ corresponding to the exact classification of the points $(x^i_0,x^i_1)$ with respect to a problem *pb* in the class _ToyPb_.,
\n", "($*$) possibly a set of predictions $y^i_{pred}\\in\\mathbb{R}$ for $i=0,\\dots,n-1$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If _data_ is in this class:
\n", "($*$) _data.n_ is an integer (the size of the sets).
\n", "($*$) _data.X_ is a numpy array of size $n\\times2$. With the above notation, *data.X*[i,0]$=x^i_0$, *data.X*[i,1]$=x^i_1$.
\n", "($*$) _data.Y_ is a numpy array of length $n$. With the above notation, *data.Y*[i]$=y^i$.
\n", "($*$) *data.Ypred* is also a numpy array of length $n$ and *data.Y*[i]$=y_{pred}^i$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Remarks:__
\n", "(a) *data.Ypred* is created only if *init_pred=True*. In this case it is initialized as a zero numpy array.
\n", "(b) For computing *data.Y* it is necessary to specify an object *pb* in the class *ToyPb*. The numpy array *data.Y* is then created according to the rule:\n", "$$\n", "\\textit{data.Y}\\text{[i]}:=\n", "\\begin{cases}\n", "-1&\\text{if }\\textit{pb.f}(\\textit{data.X}\\text{[i]})\\geq0,\\\\\n", "\\phantom{-}1&\\text{if }\\textit{pb.f}(\\textit{data.X}\\text{[i]})<0.\n", "\\end{cases}\n", "$$\n", "(c) The method *show_class()* displays the classification." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pb = ToyPb(name = \"disk\", bounds = (-1,1), loss_name = \"softplus\")\n", "\n", "ndata = 1000\n", "data = nD_data(n = ndata, pb = pb, init_pred=True)\n", "print(f\"data.n={data.n}\")\n", "print(f\"data.X.shape={data.X.shape}\")\n", "print(f\"data.Y.shape={data.Y.shape}\")\n", "print(f\"data.Ypred.shape={data.Ypred.shape}\")\n", "\n", "\n", "data.show_class()\n", "\n", "pb.show_border('k--')\n", "\n", "plt.legend(loc=1,fontsize=12)\n", "title1=\"Values of data.Y[i] on the points with\"\n", "title2=\" coordinates (data.X[i,0],data.X[i,1])\"\n", "plt.title(title1 + title2, fontsize=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The \\_\\_init\\_\\_ method of the class nD_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Parameters of the \\_\\_init\\_\\_ method**\n", "\n", "\\*\\*kwargs :
\n", "     \n", "($*$) _n_ : integer (>0)
\n", "     \n", "($*$) _X_ : numpy array of shape *n*$\\times$*2*
\n", "     \n", "($*$) _Y_ : numpy array of length _n_
\n", "     \n", "($*$) _f_ : numerical function $(x_0,x_1)\\in\\mathbb{R}^2\\mapsto f(x_0,x_1)\\in\\mathbb{R}$ (optional)
\n", "     \n", "($*$) _pb_ : object of type ToyPb
\n", "     \n", "($*$) *bounds*=(-1,1) : a tuple of 2 or 4 floats
\n", "     \n", "($*$) *init_pred*=None : boolean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Behaviour of the \\_\\_init\\_\\_ method.**\n", "\n", "If *bounds*$=(x_0^-,x_0^+)$ then the attribute _bounds_ receives $(x_0^-,x_0^+,x_0^-,x_0^+)$.
\n", "If *bounds*$=(x_0^-,x_0^+,x_1^-,x_1^+)$ then the attribute _bounds_ receives _bounds_.
\n", "In the sequel, we denote _bounds_$=(x_0^-,x_0^+,x_1^-,x_1^+)$.\n", "\n", "If _X_ and _Y_ are given they are sent to the corresponding attributes of the object. \n", "\n", "If _X_ and _Y_ are not given, the \\_\\_init\\_\\_ method creates two atributes _X_ and _Y_.
\n", "_X_ is a numpy array of size _n_$\\times$_2_. The coefficients of _X_ are picked randomly *X*[i,0] is picked in $[x_0^-,x_0^+]$ and *X*[i,1] in $[x_1^-,x_1^+]$.
\n", "_Y_ is a numpy array of length _n_ which is defined with the rule.\n", "$$\n", "\\textit{Y}\\text{[i]}:=\n", "\\begin{cases}\n", "-1&\\text{if }\\textit{g}(\\textit{X}\\text{[i,0]},\\textit{X}\\text{[i,1]})\\geq0,\\\\\n", "\\ 1&\\text{if }\\textit{g}(\\textit{X}\\text{[i,0]},\\textit{X}\\text{[i,1]})<0,\n", "\\end{cases}\n", "$$\n", "where _g_=_f_ if _f_ is specified and _g_=_pb.f_ if not.\n", "\n", "\n", "If *init_pred=True* then an attribute *Ypred* is created which receives a zero numpy array of length _n_ (an array with the same shape as _Y_ and with zero entries)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. The class toyNN " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top\n", "          \n", "          \n", " \n", "1.\n", "          \n", "          \n", " \n", "2.\n", "          \n", "          \n", " \n", "3.\n", "          \n", "          \n", " \n", "4.\n", "          \n", "          \n", " \n", "5.\n", "          \n", "          \n", " \n", "bot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An object of the class _toyNN_ contains the characteristics of a neural network (number of hidden layers, number of nodes in each layer, activation function) __but__ not the coefficients of a specific neural network with this shape. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of layers and nodes in a neural network is described as a tuple\n", "$$\n", "\\text{CardNodes}=(a_0,a_1,\\dots,a_{N-1},a_N),\n", "$$\n", "where $a_n$ is the number of nodes in the $n^{\\text{th}}$ layer .
\n", "There are $N-1$ hidden layers.
\n", "The neural networks of interest for the classification problems of part __1__ have two input nodes and one output node. Hence \n", "$$\n", "a_0=2\\qquad\\text{ and }\\qquad a_N=1.\n", "$$\n", "(In the optimization process, the two input nodes will be fed with the coordinates $(x_0,x_1)$ of the points to classify. The output will be a real number that we want to be positive for input points in $\\Omega$ and negative in the other cases.)
\n", "The neural networks are also characterized by an activation function $\\chi$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The atributes of an object *nn* in this class are the following.
\n", "($*$) _nn.N_ is an integer. The number of hidden layers is *nn.N*$-1$.
\n", "($*$) _nn.card_ is a tuple of integers which contains the number of nodes in each layer.
\n", "($*$) _nn.Nparam_ is the number of free coefficients of a neural network of type *nn*. Denoting $N=$_nn.N_ and $(a_0,a_1,\\dots,a_{N-1},a_N)=$*nn.card*, we have \n", "$$\n", "\\textit{nn.Nparam}=\\sum_{n=0}^{N-1}a_na_{n+1} +\\sum_{n=1}^N a_n.\n", "$$\n", "($*$) *nn.coef_bounds* is a 4-tuple of floats. It may be used when the coefficients of the neural network (weights and biasses) are picked randomly in the method *nn.create_rand*.
\n", "($*$) *nn.chi* is an implementation of the activation function $\\chi$.
\n", "($*$) *nn.chi_prime* is an implementation of the derivative $\\chi'$ of $\\chi$.
\n", "($*$) *nn.xx*, *nn.yy* and *nn.zz* are three 2D numpy arrays used in the graphic representations of the neural networks' outputs (in the method *nn.show_pred*).
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, we create a typical object in the class *ToyNN*. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CardNodes = (2, 4, 6, 5, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "\n", "print(f\"nn.N={nn.N}\")\n", "print(f\"nn.card={nn.card}\")\n", "print(f\"nn.coef_bounds={nn.coef_bounds}\")\n", "print(f\"nn.Nparam={nn.Nparam}\")\n", "\n", "chi, chi_prime = nn.chi, nn.chi_prime\n", "t=np.linspace(-3,3,100)\n", "\n", "plt.figure(figsize=(12,5))\n", "plt.subplot(121)\n", "plt.plot(t,chi(t),'b',label=r\"$\\chi$\")\n", "plt.legend(fontsize=20)\n", "plt.subplot(122)\n", "plt.plot(t,chi_prime(t),'r',label=r\"$\\chi'$\")\n", "plt.legend(fontsize=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the method *nn.create_rand()* we can build lists *A*$=$[*W,Bias*] where *W* and *Bias* are both lists of _N_ numpy arrays. The coefficients in these arrays are the parameters of a neural network. _W_ contains the weights of the edges and _Bias_ the weights of the nodes.
\n", "More precisely, for $n=0,\\dots,N-1$ denoting $a_n$ the number of nodes in the $n^{\\text{th}}$ layer:
\n", "      \n", "($*$) *W*[n][i,j]$=:w^{n}_{i,j}$ is the weight of the edge from the $i^{\\text{th}}$ node of layer $n$ to the $j^{\\text{th}}$ node of layer $n+1$.
\n", "      \n", "($*$) *Bias*[n][i]$=:b^{n}_{i}$ is the weight on the $i^{\\text{th}}$ node of layer $n+1$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__How is computed the ouput $h$*(X,A)* provided by a neural network given an input *X*$=:(x_0,x_1)$ ?__\n", "\n", "Let us number the nodes of layer $n$ as $q_j^n$ for $j=0,\\cdots,a_n-1$.
\n", "We define for each node $q_j^n$ of the layers $n\\in\\{0,\\dots,N-1\\}$ an output value $O^n_j$ and for each node $q_j^n$ of the layers $n\\in\\{1,\\dots,N\\}$ an input value $I^n_j$. These quantities are defined as follows.
\n", "      \n", "The layer 0 has two nodes $q^0_0$ and $q^0_1$. We set (recall that *X*$=(x_0,x_1)$), \n", "$$\n", "(q^0_0,q^0_1)\\quad \\longleftarrow\\quad (x_0,x_1).\n", "$$\n", "Then for $n=0,\\dots,N-2$,
\n", "      we set for $j=0,\\dots,a_{n+1}-1$,
\n", "$$\n", "\\begin{array}{rl}\n", "I_j^{n+1}&\\longleftarrow\\ \\displaystyle\\sum_{i=0}^{a_n-1}w^n_{i,j} O_i^n + b_j^{n+1},\\\\\n", "O_j^{n+1}&\\longleftarrow\\ \\chi(I_j^{n+1}).\n", "\\end{array}\n", "$$\n", "The input value $I^N_0$ associated to the unique node of the last layer is given by\n", "$$\n", "I_0^N\\longleftarrow\\ \\sum_{i=0}^{a_{N-1}-1}w^{N-1}_{i,0} O_i^{N-1} + b_0^N.\n", "$$\n", "The output of the neural network with coefficients $A=$[*W,Bias*] for the input data $x=(x_0,x_1)$ is then defined as\n", "$$\n", "h(x,A):=I_0^N.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Example__: below we define an object _nn_ in the class _ToyNN_ and use it to build a list *A*=[*W,Bias*] which contains the weights of a neural network.
\n", "These weights are chosen randomly and uniformly in $[w_-,w_+]$ for the $w^n_{i,j}$'s and in $[b_-,b_+]$ for the $b^n_i$'s where $(w_-,w_+,b_-,b_+)=$*nn.coef_bounds*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CardNodes = (2, 3, 4, 2, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "A=nn.create_rand()\n", "for n in range(nn.N):\n", " print(f\"W[{n}]={A[0][n]}\\n\")\n", "for n in range(nn.N) : \n", " print(f\"Bias[{n}]={A[1][n]}\\n\")\n", " \n", "nn.show(A)\n", "text1=\" The width of the edges is proportional to the absolute values\"\n", "text2=\" of the corresponding weights.\\n The color depends on their signs:\"\n", "text3=\" red if negative, green if positive.\\n The nodes are colored\"\n", "text4=\" according to the sign of the corresponding biasses\"\n", "text5=\" with the same convention.\"\n", "print(text1 + text2 + text3 + text4 + text5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Methods for basic operations on lists of weights " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top\n", "          \n", "          \n", "1.\n", "          \n", "          \n", "2.\n", "          \n", "          \n", "3.\n", "          \n", "          \n", "4.\n", "          \n", "          \n", "5.\n", "          \n", "          \n", "bot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us test the basic operations available in the library.
\n", "We start by creating an object _pb_ of type *ToyPb*, an object _nn_ of type *ToyNN* and then two lists of random weights _A_, _B_.
\n", "In the sequel such objects are called _coef-lists_. For shortness the weights stored in _A_ are denoted $A_i$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pb = ToyPb(name = \"square\", bounds = (-1,1), loss_name = \"softplus\")\n", "\n", "CardNodes = (2, 3, 4, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "A = nn.create_rand()\n", "B = nn.create_rand()\n", "\n", "print(\"A:\")\n", "nn.show(A)\n", "print(\"B:\")\n", "nn.show(B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We add _A_ and 2.5 times _B_ and put the result in a new coef-list _C_, that is\n", "$$\n", "\\textit{C}\\ \\leftarrow\\ \\textit{A}+2.5\\times\\textit{B}.\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "C=nn.add(A,B,c=2.5)\n", "print(\"A + 2.5 x B:\")\n", "nn.show(C)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also put tHe result in _A_." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nn.add(A,B,c=2.5,output=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After this _C_ and _A_ should be equal. Let us check this." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "D=nn.add(A,C,c=-1)\n", "print(\"coefficients of D=A - C\")\n", "print(D[0])\n", "print(D[1])\n", "nn.show(D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can be also usefull to be create a zero coef-list. This is done by:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "E=nn.create_zero() \n", "print(\"a zero coef-list:\")\n", "nn.show(E)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make a (true, deep) copy of a coef_list, we do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "D=nn.copy(A)\n", "print(\"A:\")\n", "nn.show(A)\n", "print(\"D (copy of A):\")\n", "nn.show(D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We change _A_ and check that _D_ has not been modified" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A=nn.create_rand()\n", "print(\"A after modification:\")\n", "nn.show(A)\n", "print(\"D:\")\n", "nn.show(D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other methods for operations on coef_lists:
\n", "      \n", "($*$) if _c_ is a scalar and _A_ is a coef_list, *nn.scal_mult(A,c)* returns the coef list with weights *c*$\\times A_i$.
\n", "      \n", "($*$) if _A_ and _B_ are coef_lists *nn.dot(A,B)* returns the dot products of the two vectors containing all the coefficients of _A_ and _B_, that is \n", "$$\n", "\\sum_i A_i B_i.\n", "$$
\n", "      \n", "($*$) if _A_ is a coef_list, *nn.square(A)* returns a coef-list with the same structure as _A_ and with weights ${A_i}^{\\!2}$.
\n", "      \n", "($*$) if _A_ is a coef-list and _f_ is a numerical function (compatible with numpy) then *nn.maps(f,A)* returns the coef-list with weights $f(A_i)$.
\n", "      \n", "($*$) if _A_ and _B_ are coef-lists and *f* is a numerical function of two variables, then *nn.maps2(A,B)* returns the coef-list with weights $f(A_i,B_i)$.\n", "\n", "In the methods *square*, *maps* and *maps2*, it is possible to precise the parameter *output*$=$False. In this case the result is not returned but put in _A_. \n", "\n", "In the method *maps* (respectively *maps2*), the function _f_ may depend on an additional parameter, precised by *param*$=p$. In this case the computed weigths are $f(A_i,p)$ (resp. $f(A_i,B_i,p)$). See the examples below. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A=nn.create_rand()\n", "B=nn.create_rand()\n", "\n", "print(\"Test of nn.scal_mult:\")\n", "fact=3\n", "C=nn.scal_mult(A,fact)\n", "D=nn.add(A,C,c=-1/fact)\n", "print(\"A-(1/3)*(3*A)=\\n\",D)\n", "\n", "print(\"\\nTest of nn.dot:\")\n", "print(f\"nn.dot(A,B)={nn.dot(A,B):1.5e}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Test of nn.square:\")\n", "A2=nn.square(A)\n", "A4=nn.square(A2)\n", "print(\"A\")\n", "nn.show(A)\n", "print(\"A^2\")\n", "nn.show(A2)\n", "print(\"A^4\")\n", "nn.show(A4)\n", "\n", "print(\"\\nTest of nn.maps:\")\n", "f = lambda x:x**2\n", "fA=nn.maps(f,A)\n", "D=nn.add(A2,fA,-1)\n", "print(\"A^2-f(A) with f(x)=x^2\")\n", "nn.show(D)\n", "\n", "\n", "print(\"\\nTest of nn.maps2:\")\n", "f = lambda x,y:x*y\n", "fAA=nn.maps2(f,A,A)\n", "D=nn.add(A2,fAA,-1)\n", "print(\"A^2-f(A,A) with f(x,y)=x*y\")\n", "nn.show(D)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Test nn.maps with additional parameters:\")\n", "f = lambda x, p : np.sin(p[0]*x)+np.sin(p[1]*x)\n", "p=(np.pi/2,np.pi/6)\n", "fAp=nn.maps(f,A,param=p)\n", "print(\"sin(π/2 A) + sin(π/6 A):\")\n", "nn.show(fAp)\n", "\n", "print(\"\\nTest nn.maps2:\")\n", "f = lambda x,y,p: np.exp(p[0]*x + p[1]*y)\n", "p=(.5,-1.5)\n", "fABp=nn.maps2(f,A,B,param=p)\n", "print(\"exp(1/2 A - 3/2 B):\")\n", "nn.show(fABp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top\n", "          \n", "          \n", "1.\n", "          \n", "          \n", "2.\n", "          \n", "          \n", "3.\n", "          \n", "          \n", "4.\n", "          \n", "          \n", "5.\n", "          \n", "          \n", "bot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5. Methods for optimization " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this part, we present the following methods associated with an object _nn_ of type _ToyNN_. They take as arguments a coef-list _A_ and depending on the method: a numpy array _x_ with lenth 2 and/or a float _y_, or an object _data_ of type *nD_data* and/or an object _pb_ of type _ToyPb_.
\n", "      \n", "($*$) The method _nn.output_ computes the output $h(x,A)$ produced by a neural network with weights *A*$=A_i$ for a given input $x=(x_0,x_1)$.
\n", "      \n", "($*$) the method _nn.descent_ computes the opposite gradient of the function\n", "$$\n", "A\\mapsto \\ell\\left(h(x,A)\\times y\\right)\n", "$$\n", "where $A$, $x$ are as above, $y$ is a tag associated to $x$ and $\\ell$ is a loss function associated with some object _pb_ of type _ToyPb_.
\n", "      \n", "($*$) The method _nn.prediction_ computes the outputs of _A_ at the points of a data set _data_ of type *nD_data* and put the result in the array _data.Ypred_.
\n", "      \n", "($*$) The method *data.show_class* with the argument *pred*=True displays this predicted classification.
\n", "      \n", "($*$) The method *show_pred* computes the outputs _nn.zz_ predicted by a coef-list _A_ on a grid (*nn.xx*,*nn.yy*) and displays the result as a heat map.
\n", "      \n", "($*$) The method *nn.total_loss* computes the mean loss \n", "$$\n", "\\dfrac1{n_d}\\sum_{j=0}^{n_d-1} \\ell\\left(h(X_j,A)\\times y_j\\right),\n", "$$\n", "where the $X_j$'s and $y_j$'s are the points and tags in a data set _data_ of type *nD_data*. Namely, $n_d=$*data.n*, $X_j=$*data.X*[j], $y_j=$*data.Y*[j].
\n", "      \n", "($*$) The method *nn.total_loss_and_prediction* combines the methods *nn.total_loss* and *nn.prediction*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, the user does not need to call _nn.ouput_." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CardNodes = (2, 3, 4, 2, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "A=nn.create_rand()\n", "\n", "x=np.array([0.5,-0.3])\n", "o=nn.output(A,x)\n", "print(f\"x={x}\")\n", "print(f\"output(A,x)={o:1.5f}\")\n", "\n", "x=np.array([-0.75,0.25])\n", "o=nn.output(A,x)\n", "print(f\"x={x}\")\n", "print(f\"output(A,x)={o:1.5f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method nn_descent is the heart of gradient descent algorithms. It computes the opposite gradient with respect to the coefficienst $A_i$ of _A_ of the mapping\n", "$$\n", "F_{x,y}:A\\mapsto \\ell\\left(h(x,A)\\times y\\right).\n", "$$\n", "It returns a coef-list _dA_ with coefficients \n", "$$\n", "(dA)_i =-\\dfrac{\\partial F_{x,y}}{\\partial A_i}(A).\n", "$$\n", "It takes as arguments: a coef-list _A_, a np.array $x$ with length 2, a float _y_ and an object _pb_ of type _ToyPb_ (the loss function $\\ell$ is then *pb.loss*)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pb = ToyPb(name = \"sin\", bounds = (-1,1), loss_name = \"softplus\")\n", "\n", "CardNodes = (2, 3, 4, 2, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "\n", "A=nn.create_rand()\n", "\n", "x=np.array([-0.75,0.25])\n", "y=1\n", "\n", "dA=nn.descent(A,x,y,pb=pb)\n", "\n", "print(f\"x={x}, y={y}\")\n", "print(f\"dA=-Gradient Fxy(A)\")\n", "nn.show(dA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are optional arguments _alpha_ and _B_.
\n", "If the float _alpha_ is specified, the weights of the returned coef-list are \n", "$$\n", "(dA)_i =-\\alpha\\dfrac{\\partial F}{\\partial A_i}(A).\n", "$$\n", "If a coef-list _B_ is specified, the result is not returned but added to _B_. This is handy when using a mini-batch method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x=np.array([-0.75,0.25])\n", "y=1\n", "\n", "print(\"Test of the parameter alpha.\")\n", "dA=nn.descent(A,x,y,pb=pb)\n", "dA_one_half=nn.descent(A,x,y,alpha=1/2, pb=pb)\n", "\n", "D=nn.add(dA,dA_one_half,-2)\n", "print(\"dA(alpha=1) - 2xdA(alpha=1/2)\")\n", "nn.show(D)\n", "\n", "print(\"Test of the parameter B.\")\n", "\n", "DA=nn.create_zero()\n", "\n", "x=np.array([-0.75,0.25])\n", "y=1\n", "nn.descent(A,x,y, B=DA, pb=pb)\n", "print(\"DA after one contribution\")\n", "nn.show(DA)\n", "\n", "x=np.array([0.5,-0.2])\n", "y=-1\n", "print(\"DA after two contributions\")\n", "nn.descent(A,x,y, B=DA, pb=pb)\n", "nn.show(DA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method _nn.prediction_ uses the method _nn.output()_ to compute the predictions of the neural network on the point of a data set _data_ of type *nD_data* and store the result in _data.Ypred_ \n", "\n", "The method *data.show_class(pred=True)* displays these predictions.\n", "\n", "The method *nn.show_pred* compute the predictions of the neural network on a grid and displays these predictions as a heat map." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "pb = ToyPb(name = \"ring\", bounds = (-1,1), loss_name = \"softplus\")\n", "\n", "data = nD_data(n=500, pb=pb)\n", "\n", "CardNodes = (2, 3, 4, 2, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "A=nn.create_rand()\n", "\n", "\n", "data.show_class()\n", "pb.show_border('k--')\n", "plt.axis('off')\n", "plt.title(\"Correct answer\", fontsize=15)\n", "plt.show()\n", "\n", "\n", "nn.prediction(A, data)\n", "\n", "data.show_class(pred=True)\n", "nn.show_pred(A)\n", "pb.show_border('k--')\n", "plt.title(\"predictions of a random A\", fontsize=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To assess the performance of a coef-list _A_ for a given problem _pb_ on a given data set *data*, we use the method *total_loss*. It returns,\n", "$$\n", "\\dfrac1{n_d}\\sum_{i=0}^{n_d-1} \\ell\\left(h(X_i,A)\\times y_i\\right),\n", "$$\n", "where $n_d=$*data.n*, the $X_i$'s and $y_i$'s are the points and tags in _data.X_ and _data.Y_ and $\\ell$ is the function _pb.loss_.
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pb = ToyPb(name = \"ring\", bounds = (-1,1), loss_name = \"softplus\")\n", "\n", "data = nD_data(n=500, pb=pb)\n", "\n", "CardNodes = (2, 3, 4, 2, 1)\n", "nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n", "A=nn.create_rand()\n", "\n", "error = nn.total_loss(A,data,pb=pb)\n", "print(error)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method *nn.total_loss_and_prediction* combines the effects of *nn.total_loss* and *nn.prediction*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "error2 = nn.total_loss_and_prediction(A,data,pb=pb)\n", "print(f\"error ={error},\\nerror2={error2}\")\n", "\n", "data.show_class(pred=True)\n", "nn.show_pred(A)\n", "pb.show_border('k--')\n", "plt.title(\"predictions of A\", fontsize=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top\n", "          \n", "          \n", "1.\n", "          \n", "          \n", "2.\n", "          \n", "          \n", "3.\n", "          \n", "          \n", "4.\n", "          \n", "          \n", "5.\n", "          \n", "          \n", "bot.\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "colab": { "collapsed_sections": [ "GafO0zXoJ6Cx", "5l_mvC1OJ6Da", "ZzS5-IzwaKn3", "89AjhkJ2aKoB" ], "name": "ToyNN_class.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 1 }