"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Ie13pHt3gyh8"
},
"source": [
"### Standard libraries and the three classes of the toolbox toynn_2023 (ToyPb, nD_data, ToyNN)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from toynn_2023 import *\n",
"# performs the following:\n",
"# import numpy as np\n",
"# from numpy import random as nprd\n",
"# from matplotlib import pyplot as plt\n",
"# from matplotlib import cm as cm\n",
"# from copy import deepcopy as dcp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Contents** \n",
" \n",
"1. The class ToyPb \n",
"2. The class nD_data \n",
"3. The class ToyNN \n",
"4. Methods for basic operations on lists of weights \n",
"5. Methods for optimization"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Woacru-MVEgA"
},
"source": [
"# 1. The class ToyPb "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Attributes of an object in the class ToyPb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An object in the class ToyPb contains some information about a classification problem for points in a rectangle.\n",
"\n",
"If _pb_ is in this class: \n",
"($*$) _pb.name_ is a string. \n",
"($*$) _pb.bounds_ is a tuple of floats $(x_0^-,x_0^+,x_1^-,x_1^+)$ which defines the rectangle. \n",
"($*$) _pb.f_ is an implementation of a numerical function $f(x_0,x_1)$. The classification problem is the following. Given $x=(x_0,x_1)\\in[x_0^-,x_0^+]\\times[x_1^-,x_1^+]$ determine whether $x$ belongs to $\\Omega$ where \n",
"$$\n",
"\\Omega:=\\left\\{x \\in[x_0^-,x_0^+]\\times[x_1^-,x_1^+]: f(x)<0\\right\\}.\n",
"$$\n",
"\n",
"There are two other attributes. \n",
"($*$) _pb.loss_ is an implementation of a numerical function $\\ell$. \n",
"($*$) *pb.loss_prime* is an implementation of the derivative $\\ell'$ of $\\ell$. \n",
"The _loss function_ $\\ell:\\mathbb{R}\\to\\mathbb{R}$ is used to estimate the error of predictions. \n",
"Given a prediction $\\hat y\\in\\mathbb{R}$ and the correct classification:\n",
"$$\n",
"y= \\begin{cases}-1&\\text{if }x\\not\\in\\Omega,\\\\\n",
"\\,\\,1&\\text{if }x\\in\\Omega,\n",
"\\end{cases}\n",
"$$\n",
"the error (or cost) is measured by $\\ell(\\hat y y)$. The function $\\ell$ should be nonincreasing with $\\ell\\ge0$ and $\\ell(t)$ close to $0$ for large positive values of $t$. The ideal loss function would be \n",
"$$\n",
"\\ell_{ideal}(t)= \\begin{cases}\\,\\ 0&\\text{if }t>0,\\\\\n",
"+\\infty&\\text{if }t\\le0.\n",
"\\end{cases}\n",
"$$\n",
"This would give a zero cost to predictions $\\hat y$ with the correct sign and an infinite cost to the others. \n",
"However, to apply gradient-descent methods we pick a smooth decreasing function for $\\ell$. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an example of creating and manipulating an object in the class *ToyPb*. \n",
"__Remark:__ The method *show_border()* displays the boundary of the region $\\Omega$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"pb = ToyPb(name = \"disk\", bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"\n",
"print(f\"pb.name={pb.name}, pb.bounds={pb.bounds}\")\n",
"\n",
"\n",
"pb.show_border()\n",
"plt.title(r\"boundary of $\\Omega$\",fontsize=20)\n",
"plt.show()\n",
"\n",
"loss, loss_prime = pb.loss, pb.loss_prime\n",
"t=np.linspace(-3,3,300)\n",
"\n",
"\n",
"plt.figure(figsize=(12,5))\n",
"plt.subplot(121)\n",
"plt.plot(t,loss(t),'b',label=r\"$\\ell$\")\n",
"plt.legend(fontsize=20)\n",
"\n",
"plt.subplot(122)\n",
"plt.plot(t,loss_prime(t),'r',label=r\"$\\ell'$\")\n",
"plt.legend(fontsize=20)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can try with _name_ = \"square\", \"sin\" or \"ring\" and with *loss_name* = \"demanding\"."
]
},
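{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to supply a custom function _f_ instead of a predefined _name_ (see the parameters of the \\_\\_init\\_\\_ method below). The sketch below is an assumed usage, not taken from the library's examples; the elliptic region is an arbitrary choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A problem built from a custom classification function f (a sketch; the\n",
"# ellipse below is an arbitrary choice, and f follows the documented\n",
"# signature f(x0, x1)).\n",
"f_ellipse = lambda x0, x1: x0**2/2 + x1**2 - 1/2\n",
"pb_custom = ToyPb(f = f_ellipse, bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"pb_custom.show_border()\n",
"plt.title(\"boundary of a custom region\", fontsize=15)\n",
"plt.show()"
]
},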
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The \\_\\_init\\_\\_ method of the class ToyPb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Parameters of the \\_\\_init\\_\\_ method**\n",
"\n",
"\\*\\*kwargs : \n",
" \n",
"_f_ : numerical function $(x_0,x_1)\\in\\mathbb{R}^2\\mapsto f(x_0,x_1)\\in\\mathbb{R}$ (optional if _name_ is given) \n",
" \n",
"_name_ : string (optional if _f_ is given) \n",
" \n",
"*bounds*=(-1,1) : a tuple of 2 or 4 floats \n",
" \n",
"*loss*, *loss_prime* : numerical functions: $\\mathbb{R}\\to\\mathbb{R}$ (optional) \n",
" \n",
"*loss_name*: string (optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Behaviour of the \\_\\_init\\_\\_ method.**\n",
"\n",
"The possible values for _name_ are \"sin\", \"affine\", \"disk\", \"square\", \"ring\".\n",
"\n",
"If _bounds_ is the tuple ($x_-$,$x_+$) with length 2, then *pb.bounds* receives the value ($x_-$,$x_+$,$x_-$,$x_+$). \n",
"If _bounds_ is a tuple with length 4, *pb.bounds* receives the value _bounds_. \n",
"\n",
"The possible values for *loss_name* are \"softplus\" and \"demanding\". \n",
"For \"softplus\", \n",
"$$\n",
"\\ell(t)=\\ln\\left(1 + e^{-t}\\right).\n",
"$$\n",
"For \"demanding\", \n",
"$$\n",
"\\ell(t)=\\sqrt{(t - 1)^2+1/10} - t + 1.\n",
"$$\n",
"If *loss_name* is not specified and at least one of the parameters *loss* or *loss_prime* is missing, then the default loss function is:\n",
"$$\n",
"\\ell(t)=\\sqrt{t^2+1/10} - t.\n",
"$$"
]
},
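{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the formulas above can be written out directly and compared with the *pb.loss* built earlier with *loss_name*$=$\"softplus\". The snippet below is a sketch from the formulas, not the library's own implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The named loss formulas written out directly (a sketch from the text above,\n",
"# not the library's own implementation).\n",
"softplus_loss = lambda t: np.log(1 + np.exp(-t))\n",
"demanding_loss = lambda t: np.sqrt((t - 1)**2 + 1/10) - t + 1\n",
"default_loss = lambda t: np.sqrt(t**2 + 1/10) - t\n",
"\n",
"t = np.linspace(-3, 3, 301)\n",
"# pb was built with loss_name=\"softplus\"; the difference should be ~0\n",
"print(np.max(np.abs(pb.loss(t) - softplus_loss(t))))"
]
},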
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. The class nD_data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An object of the class *nD_data* essentially contains: \n",
"($*$) a set of points of the plane: $x^i=(x^i_0,x^i_1)$ for $i=0,\\dots,n-1$, \n",
"($*$) a set of labels $y^i\\in\\{-1,1\\}$ for $i=0,\\dots,n-1$ corresponding to the exact classification of the points $(x^i_0,x^i_1)$ with respect to a problem *pb* in the class _ToyPb_, \n",
"($*$) possibly a set of predictions $y^i_{pred}\\in\\mathbb{R}$ for $i=0,\\dots,n-1$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If _data_ is in this class: \n",
"($*$) _data.n_ is an integer (the size of the sets). \n",
"($*$) _data.X_ is a numpy array of size $n\\times2$. With the above notation, *data.X*[i,0]$=x^i_0$, *data.X*[i,1]$=x^i_1$. \n",
"($*$) _data.Y_ is a numpy array of length $n$. With the above notation, *data.Y*[i]$=y^i$. \n",
"($*$) *data.Ypred* is also a numpy array of length $n$ and *data.Ypred*[i]$=y_{pred}^i$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Remarks:__ \n",
"(a) *data.Ypred* is created only if *init_pred=True*. In this case it is initialized as a zero numpy array. \n",
"(b) For computing *data.Y* it is necessary to specify an object *pb* in the class *ToyPb*. The numpy array *data.Y* is then created according to the rule:\n",
"$$\n",
"\\textit{data.Y}\\text{[i]}:=\n",
"\\begin{cases}\n",
"-1&\\text{if }\\textit{pb.f}(\\textit{data.X}\\text{[i]})\\geq0,\\\\\n",
"\\phantom{-}1&\\text{if }\\textit{pb.f}(\\textit{data.X}\\text{[i]})<0.\n",
"\\end{cases}\n",
"$$\n",
"(c) The method *show_class()* displays the classification."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pb = ToyPb(name = \"disk\", bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"ndata = 1000\n",
"data = nD_data(n = ndata, pb = pb, init_pred=True)\n",
"print(f\"data.n={data.n}\")\n",
"print(f\"data.X.shape={data.X.shape}\")\n",
"print(f\"data.Y.shape={data.Y.shape}\")\n",
"print(f\"data.Ypred.shape={data.Ypred.shape}\")\n",
"\n",
"\n",
"data.show_class()\n",
"\n",
"pb.show_border('k--')\n",
"\n",
"plt.legend(loc=1,fontsize=12)\n",
"title1=\"Values of data.Y[i] on the points with\"\n",
"title2=\" coordinates (data.X[i,0],data.X[i,1])\"\n",
"plt.title(title1 + title2, fontsize=15)\n",
"plt.show()"
]
},
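{
"cell_type": "markdown",
"metadata": {},
"source": [
"The labeling rule of remark (b) can be checked directly. The snippet below is a sketch written for this notebook: it assumes that *pb.f* takes the two coordinates separately (as in the parameter description of \\_\\_init\\_\\_) and accepts numpy arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check of the labeling rule of remark (b).\n",
"# Assumption: pb.f takes the two coordinates (x0, x1) and is numpy-vectorized.\n",
"Y_rule = np.where(pb.f(data.X[:, 0], data.X[:, 1]) >= 0, -1, 1)\n",
"print(np.all(Y_rule == data.Y))   # expected: True"
]
},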
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The \\_\\_init\\_\\_ method of the class nD_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Parameters of the \\_\\_init\\_\\_ method**\n",
"\n",
"\\*\\*kwargs : \n",
" \n",
"($*$) _n_ : integer (>0) \n",
" \n",
"($*$) _X_ : numpy array of shape *n*$\\times$*2* \n",
" \n",
"($*$) _Y_ : numpy array of length _n_ \n",
" \n",
"($*$) _f_ : numerical function $(x_0,x_1)\\in\\mathbb{R}^2\\mapsto f(x_0,x_1)\\in\\mathbb{R}$ (optional) \n",
" \n",
"($*$) _pb_ : object of type ToyPb \n",
" \n",
"($*$) *bounds*=(-1,1) : a tuple of 2 or 4 floats \n",
" \n",
"($*$) *init_pred*=None : boolean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Behaviour of the \\_\\_init\\_\\_ method.**\n",
"\n",
"If *bounds*$=(x_0^-,x_0^+)$ then the attribute _bounds_ receives $(x_0^-,x_0^+,x_0^-,x_0^+)$. \n",
"If *bounds*$=(x_0^-,x_0^+,x_1^-,x_1^+)$ then the attribute _bounds_ receives _bounds_. \n",
"In the sequel, we denote _bounds_$=(x_0^-,x_0^+,x_1^-,x_1^+)$.\n",
"\n",
"If _X_ and _Y_ are given they are sent to the corresponding attributes of the object. \n",
"\n",
"If _X_ and _Y_ are not given, the \\_\\_init\\_\\_ method creates the two attributes _X_ and _Y_. \n",
"_X_ is a numpy array of size _n_$\\times$_2_. The coefficients of _X_ are picked randomly: *X*[i,0] is picked in $[x_0^-,x_0^+]$ and *X*[i,1] in $[x_1^-,x_1^+]$. \n",
"_Y_ is a numpy array of length _n_ which is defined by the rule\n",
"$$\n",
"\\textit{Y}\\text{[i]}:=\n",
"\\begin{cases}\n",
"-1&\\text{if }\\textit{g}(\\textit{X}\\text{[i,0]},\\textit{X}\\text{[i,1]})\\geq0,\\\\\n",
"\\ 1&\\text{if }\\textit{g}(\\textit{X}\\text{[i,0]},\\textit{X}\\text{[i,1]})<0,\n",
"\\end{cases}\n",
"$$\n",
"where _g_=_f_ if _f_ is specified and _g_=_pb.f_ if not.\n",
"\n",
"\n",
"If *init_pred=True* then an attribute *Ypred* is created which receives a zero numpy array of length _n_ (an array with the same shape as _Y_ and with zero entries)."
]
},
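{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, a data set can be built from explicit arrays with the _X_ and _Y_ parameters described above. The sketch below is an assumed usage; the labeling of the points is an arbitrary choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Building an nD_data object from explicit arrays (a sketch; uses the X and Y\n",
"# parameters described above, with an arbitrary labeling of the points).\n",
"Xman = nprd.uniform(-1, 1, size=(100, 2))\n",
"Yman = np.where(Xman[:, 0] * Xman[:, 1] >= 0, -1, 1)\n",
"data_man = nD_data(n = 100, X = Xman, Y = Yman)\n",
"print(data_man.n, data_man.X.shape, data_man.Y.shape)"
]
},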
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. The class ToyNN "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An object of the class _ToyNN_ contains the characteristics of a neural network (number of hidden layers, number of nodes in each layer, activation function) __but__ not the coefficients of a specific neural network with this shape. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of layers and nodes in a neural network is described as a tuple\n",
"$$\n",
"\\text{CardNodes}=(a_0,a_1,\\dots,a_{N-1},a_N),\n",
"$$\n",
"where $a_n$ is the number of nodes in the $n^{\\text{th}}$ layer. \n",
"There are $N-1$ hidden layers. \n",
"The neural networks of interest for the classification problems of part __1__ have two input nodes and one output node. Hence \n",
"$$\n",
"a_0=2\\qquad\\text{ and }\\qquad a_N=1.\n",
"$$\n",
"(In the optimization process, the two input nodes will be fed with the coordinates $(x_0,x_1)$ of the points to classify. The output will be a real number that we want to be positive for input points in $\\Omega$ and negative in the other cases.) \n",
"The neural networks are also characterized by an activation function $\\chi$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The attributes of an object *nn* in this class are the following. \n",
"($*$) _nn.N_ is an integer. The number of hidden layers is *nn.N*$-1$. \n",
"($*$) _nn.card_ is a tuple of integers which contains the number of nodes in each layer. \n",
"($*$) _nn.Nparam_ is the number of free coefficients of a neural network of type *nn*. Denoting $N=$_nn.N_ and $(a_0,a_1,\\dots,a_{N-1},a_N)=$*nn.card*, we have \n",
"$$\n",
"\\textit{nn.Nparam}=\\sum_{n=0}^{N-1}a_na_{n+1} +\\sum_{n=1}^N a_n.\n",
"$$\n",
"($*$) *nn.coef_bounds* is a 4-tuple of floats. It may be used when the coefficients of the neural network (weights and biases) are picked randomly in the method *nn.create_rand*. \n",
"($*$) *nn.chi* is an implementation of the activation function $\\chi$. \n",
"($*$) *nn.chi_prime* is an implementation of the derivative $\\chi'$ of $\\chi$. \n",
"($*$) *nn.xx*, *nn.yy* and *nn.zz* are three 2D numpy arrays used in the graphic representations of the neural networks' outputs (in the method *nn.show_pred*). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, we create a typical object in the class *ToyNN*. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"CardNodes = (2, 4, 6, 5, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"\n",
"print(f\"nn.N={nn.N}\")\n",
"print(f\"nn.card={nn.card}\")\n",
"print(f\"nn.coef_bounds={nn.coef_bounds}\")\n",
"print(f\"nn.Nparam={nn.Nparam}\")\n",
"\n",
"chi, chi_prime = nn.chi, nn.chi_prime\n",
"t=np.linspace(-3,3,100)\n",
"\n",
"plt.figure(figsize=(12,5))\n",
"plt.subplot(121)\n",
"plt.plot(t,chi(t),'b',label=r\"$\\chi$\")\n",
"plt.legend(fontsize=20)\n",
"plt.subplot(122)\n",
"plt.plot(t,chi_prime(t),'r',label=r\"$\\chi'$\")\n",
"plt.legend(fontsize=20)\n",
"plt.show()"
]
},
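{
"cell_type": "markdown",
"metadata": {},
"source": [
"The value of *nn.Nparam* can be recomputed directly from the formula above. The snippet below is a check written for this notebook, not a library method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Recomputing nn.Nparam from the formula above (a check).\n",
"card = nn.card\n",
"n_weights = sum(card[n] * card[n + 1] for n in range(len(card) - 1))\n",
"n_biases = sum(card[1:])\n",
"print(n_weights + n_biases, nn.Nparam)   # the two numbers should coincide"
]
},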
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the method *nn.create_rand()* we can build lists *A*$=$[*W,Bias*] where *W* and *Bias* are both lists of _N_ numpy arrays. The coefficients in these arrays are the parameters of a neural network. _W_ contains the weights of the edges and _Bias_ the weights of the nodes. \n",
"More precisely, for $n=0,\\dots,N-1$ denoting $a_n$ the number of nodes in the $n^{\\text{th}}$ layer: \n",
" \n",
"($*$) *W*[n][i,j]$=:w^{n}_{i,j}$ is the weight of the edge from the $i^{\\text{th}}$ node of layer $n$ to the $j^{\\text{th}}$ node of layer $n+1$. \n",
" \n",
"($*$) *Bias*[n][i]$=:b^{n+1}_{i}$ is the weight on the $i^{\\text{th}}$ node of layer $n+1$ (the superscript matches the layer, consistently with the formulas below)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__How is the output $h(x,A)$ of a neural network computed for a given input $x=(x_0,x_1)$?__\n",
"\n",
"Let us number the nodes of layer $n$ as $q_j^n$ for $j=0,\\cdots,a_n-1$. \n",
"We define for each node $q_j^n$ of the layers $n\\in\\{0,\\dots,N-1\\}$ an output value $O^n_j$ and for each node $q_j^n$ of the layers $n\\in\\{1,\\dots,N\\}$ an input value $I^n_j$. These quantities are defined as follows. \n",
" \n",
"Layer 0 has two nodes $q^0_0$ and $q^0_1$; their output values are set to the coordinates of the input $x=(x_0,x_1)$: \n",
"$$\n",
"(O^0_0,O^0_1)\\quad \\longleftarrow\\quad (x_0,x_1).\n",
"$$\n",
"Then for $n=0,\\dots,N-2$, \n",
" we set for $j=0,\\dots,a_{n+1}-1$, \n",
"$$\n",
"\\begin{array}{rl}\n",
"I_j^{n+1}&\\longleftarrow\\ \\displaystyle\\sum_{i=0}^{a_n-1}w^n_{i,j} O_i^n + b_j^{n+1},\\\\\n",
"O_j^{n+1}&\\longleftarrow\\ \\chi(I_j^{n+1}).\n",
"\\end{array}\n",
"$$\n",
"The input value $I^N_0$ associated with the unique node of the last layer is given by\n",
"$$\n",
"I_0^N\\longleftarrow\\ \\sum_{i=0}^{a_{N-1}-1}w^{N-1}_{i,0} O_i^{N-1} + b_0^N.\n",
"$$\n",
"The output of the neural network with coefficients $A=$[*W,Bias*] for the input data $x=(x_0,x_1)$ is then defined as\n",
"$$\n",
"h(x,A):=I_0^N.\n",
"$$"
]
},
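{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the recursion concrete, here is a minimal sketch of this forward pass in plain numpy. It is **not** the library's code: the helper *forward* and the shape conventions (*W*[n] of shape $a_n\\times a_{n+1}$, *Bias*[n] of length $a_{n+1}$) are assumptions based on the description of *W* and *Bias* above. Under these assumptions it should reproduce *nn.output(A,x)* (see part __5__)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch of the forward pass described above (assumed conventions, not\n",
"# part of toynn_2023): W[n] has shape (a_n, a_{n+1}), Bias[n] has length a_{n+1}.\n",
"def forward(A, x, chi=np.tanh):\n",
"    W, Bias = A\n",
"    N = len(W)\n",
"    O = np.asarray(x, dtype=float)            # layer 0 outputs: (O_0^0, O_1^0) = (x_0, x_1)\n",
"    for n in range(N - 1):                    # hidden layers\n",
"        I = W[n].T @ O + Bias[n]              # I_j^{n+1} = sum_i w_{i,j}^n O_i^n + b_j^{n+1}\n",
"        O = chi(I)                            # O^{n+1} = chi(I^{n+1})\n",
"    return (W[N - 1].T @ O + Bias[N - 1])[0]  # h(x, A) = I_0^N (no activation at the end)\n",
"\n",
"# Sanity check against the library (uncomment once A and nn are defined below):\n",
"# x = np.array([0.5, -0.3]); print(forward(A, x), nn.output(A, x))"
]
},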
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Example__: below we define an object _nn_ in the class _ToyNN_ and use it to build a list *A*=[*W,Bias*] which contains the weights of a neural network. \n",
"These weights are chosen randomly and uniformly in $[w_-,w_+]$ for the $w^n_{i,j}$'s and in $[b_-,b_+]$ for the $b^n_i$'s where $(w_-,w_+,b_-,b_+)=$*nn.coef_bounds*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"CardNodes = (2, 3, 4, 2, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"A=nn.create_rand()\n",
"for n in range(nn.N):\n",
"    print(f\"W[{n}]={A[0][n]}\\n\")\n",
"for n in range(nn.N):\n",
"    print(f\"Bias[{n}]={A[1][n]}\\n\")\n",
" \n",
"nn.show(A)\n",
"text1=\" The width of the edges is proportional to the absolute values\"\n",
"text2=\" of the corresponding weights.\\n The color depends on their signs:\"\n",
"text3=\" red if negative, green if positive.\\n The nodes are colored\"\n",
"text4=\" according to the sign of the corresponding biases\"\n",
"text5=\" with the same convention.\"\n",
"print(text1 + text2 + text3 + text4 + text5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Methods for basic operations on lists of weights "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us test the basic operations available in the library. \n",
"We start by creating an object _pb_ of type *ToyPb*, an object _nn_ of type *ToyNN* and then two lists of random weights _A_, _B_. \n",
"In the sequel such objects are called _coef-lists_. For brevity, the weights stored in _A_ are denoted $A_i$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pb = ToyPb(name = \"square\", bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"CardNodes = (2, 3, 4, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"A = nn.create_rand()\n",
"B = nn.create_rand()\n",
"\n",
"print(\"A:\")\n",
"nn.show(A)\n",
"print(\"B:\")\n",
"nn.show(B)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We add _A_ and 2.5 times _B_ and put the result in a new coef-list _C_, that is\n",
"$$\n",
"\\textit{C}\\ \\leftarrow\\ \\textit{A}+2.5\\times\\textit{B}.\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"C=nn.add(A,B,c=2.5)\n",
"print(\"A + 2.5 x B:\")\n",
"nn.show(C)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also put the result directly in _A_ by passing *output*$=$False."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nn.add(A,B,c=2.5,output=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After this _C_ and _A_ should be equal. Let us check this."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"D=nn.add(A,C,c=-1)\n",
"print(\"coefficients of D=A - C\")\n",
"print(D[0])\n",
"print(D[1])\n",
"nn.show(D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It can also be useful to create a zero coef-list. This is done by:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"E=nn.create_zero() \n",
"print(\"a zero coef-list:\")\n",
"nn.show(E)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make a (true, deep) copy of a coef-list, we do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"D=nn.copy(A)\n",
"print(\"A:\")\n",
"nn.show(A)\n",
"print(\"D (copy of A):\")\n",
"nn.show(D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We modify _A_ in place and check that _D_ has not been affected."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nn.add(A,B,c=1.0,output=False)   # modify A in place (rebinding A would not test the copy)\n",
"print(\"A after modification:\")\n",
"nn.show(A)\n",
"print(\"D:\")\n",
"nn.show(D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other methods for operations on coef-lists: \n",
" \n",
"($*$) if _c_ is a scalar and _A_ is a coef-list, *nn.scal_mult(A,c)* returns the coef-list with weights *c*$\\times A_i$. \n",
" \n",
"($*$) if _A_ and _B_ are coef-lists, *nn.dot(A,B)* returns the dot product of the two vectors containing all the coefficients of _A_ and _B_, that is \n",
"$$\n",
"\\sum_i A_i B_i.\n",
"$$ \n",
" \n",
"($*$) if _A_ is a coef-list, *nn.square(A)* returns a coef-list with the same structure as _A_ and with weights ${A_i}^{\\!2}$. \n",
" \n",
"($*$) if _A_ is a coef-list and _f_ is a numerical function (compatible with numpy) then *nn.maps(f,A)* returns the coef-list with weights $f(A_i)$. \n",
" \n",
"($*$) if _A_ and _B_ are coef-lists and *f* is a numerical function of two variables, then *nn.maps2(f,A,B)* returns the coef-list with weights $f(A_i,B_i)$.\n",
"\n",
"In the methods *square*, *maps* and *maps2*, it is possible to pass the parameter *output*$=$False. In this case the result is not returned but stored in _A_. \n",
"\n",
"In the method *maps* (respectively *maps2*), the function _f_ may depend on an additional parameter, specified by *param*$=p$. In this case the computed weights are $f(A_i,p)$ (resp. $f(A_i,B_i,p)$). See the examples below. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A=nn.create_rand()\n",
"B=nn.create_rand()\n",
"\n",
"print(\"Test of nn.scal_mult:\")\n",
"fact=3\n",
"C=nn.scal_mult(A,fact)\n",
"D=nn.add(A,C,c=-1/fact)\n",
"print(\"A-(1/3)*(3*A)=\\n\",D)\n",
"\n",
"print(\"\\nTest of nn.dot:\")\n",
"print(f\"nn.dot(A,B)={nn.dot(A,B):1.5e}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Test of nn.square:\")\n",
"A2=nn.square(A)\n",
"A4=nn.square(A2)\n",
"print(\"A\")\n",
"nn.show(A)\n",
"print(\"A^2\")\n",
"nn.show(A2)\n",
"print(\"A^4\")\n",
"nn.show(A4)\n",
"\n",
"print(\"\\nTest of nn.maps:\")\n",
"f = lambda x:x**2\n",
"fA=nn.maps(f,A)\n",
"D=nn.add(A2,fA,-1)\n",
"print(\"A^2-f(A) with f(x)=x^2\")\n",
"nn.show(D)\n",
"\n",
"\n",
"print(\"\\nTest of nn.maps2:\")\n",
"f = lambda x,y:x*y\n",
"fAA=nn.maps2(f,A,A)\n",
"D=nn.add(A2,fAA,-1)\n",
"print(\"A^2-f(A,A) with f(x,y)=x*y\")\n",
"nn.show(D)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Test nn.maps with additional parameters:\")\n",
"f = lambda x, p : np.sin(p[0]*x)+np.sin(p[1]*x)\n",
"p=(np.pi/2,np.pi/6)\n",
"fAp=nn.maps(f,A,param=p)\n",
"print(\"sin(π/2 A) + sin(π/6 A):\")\n",
"nn.show(fAp)\n",
"\n",
"print(\"\\nTest nn.maps2:\")\n",
"f = lambda x,y,p: np.exp(p[0]*x + p[1]*y)\n",
"p=(.5,-1.5)\n",
"fABp=nn.maps2(f,A,B,param=p)\n",
"print(\"exp(1/2 A - 3/2 B):\")\n",
"nn.show(fABp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. Methods for optimization "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this part, we present the following methods associated with an object _nn_ of type _ToyNN_. They take as arguments a coef-list _A_ and, depending on the method, a numpy array _x_ of length 2 and/or a float _y_, or an object _data_ of type *nD_data* and/or an object _pb_ of type _ToyPb_. \n",
" \n",
"($*$) The method _nn.output_ computes the output $h(x,A)$ produced by a neural network with weights *A*$=A_i$ for a given input $x=(x_0,x_1)$. \n",
" \n",
"($*$) The method _nn.descent_ computes the opposite of the gradient of the function\n",
"$$\n",
"A\\mapsto \\ell\\left(h(x,A)\\times y\\right)\n",
"$$\n",
"where $A$, $x$ are as above, $y$ is a tag associated with $x$ and $\\ell$ is a loss function associated with some object _pb_ of type _ToyPb_. \n",
" \n",
"($*$) The method _nn.prediction_ computes the outputs of _A_ at the points of a data set _data_ of type *nD_data* and puts the result in the array _data.Ypred_. \n",
" \n",
"($*$) The method *data.show_class* with the argument *pred*=True displays this predicted classification. \n",
" \n",
"($*$) The method *nn.show_pred* computes the outputs _nn.zz_ predicted by a coef-list _A_ on a grid (*nn.xx*,*nn.yy*) and displays the result as a heat map. \n",
" \n",
"($*$) The method *nn.total_loss* computes the mean loss \n",
"$$\n",
"\\dfrac1{n_d}\\sum_{j=0}^{n_d-1} \\ell\\left(h(X_j,A)\\times y_j\\right),\n",
"$$\n",
"where the $X_j$'s and $y_j$'s are the points and tags in a data set _data_ of type *nD_data*. Namely, $n_d=$*data.n*, $X_j=$*data.X*[j], $y_j=$*data.Y*[j]. \n",
" \n",
"($*$) The method *nn.total_loss_and_prediction* combines the methods *nn.total_loss* and *nn.prediction*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In general, the user does not need to call _nn.output_."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"CardNodes = (2, 3, 4, 2, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"A=nn.create_rand()\n",
"\n",
"x=np.array([0.5,-0.3])\n",
"o=nn.output(A,x)\n",
"print(f\"x={x}\")\n",
"print(f\"output(A,x)={o:1.5f}\")\n",
"\n",
"x=np.array([-0.75,0.25])\n",
"o=nn.output(A,x)\n",
"print(f\"x={x}\")\n",
"print(f\"output(A,x)={o:1.5f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method _nn.descent_ is the heart of the gradient-descent algorithms. It computes the opposite of the gradient, with respect to the coefficients $A_i$ of _A_, of the mapping\n",
"$$\n",
"F_{x,y}:A\\mapsto \\ell\\left(h(x,A)\\times y\\right).\n",
"$$\n",
"It returns a coef-list _dA_ with coefficients \n",
"$$\n",
"(dA)_i =-\\dfrac{\\partial F_{x,y}}{\\partial A_i}(A).\n",
"$$\n",
"It takes as arguments: a coef-list _A_, a numpy array $x$ of length 2, a float _y_ and an object _pb_ of type _ToyPb_ (the loss function $\\ell$ is then *pb.loss*)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pb = ToyPb(name = \"sin\", bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"CardNodes = (2, 3, 4, 2, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"\n",
"A=nn.create_rand()\n",
"\n",
"x=np.array([-0.75,0.25])\n",
"y=1\n",
"\n",
"dA=nn.descent(A,x,y,pb=pb)\n",
"\n",
"print(f\"x={x}, y={y}\")\n",
"print(f\"dA=-Gradient Fxy(A)\")\n",
"nn.show(dA)"
]
},
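{
"cell_type": "markdown",
"metadata": {},
"source": [
"A classical way to validate such a gradient is a finite-difference check on one coefficient. The snippet below is a sketch written for this notebook; it assumes $F_{x,y}(A)=\\ell(h(x,A)\\times y)$ with $\\ell=$*pb.loss* and $h=$*nn.output*, as described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Finite-difference check of one component of nn.descent (a sketch, assuming\n",
"# F(A) = pb.loss(nn.output(A, x) * y) as described above).\n",
"F = lambda A_: pb.loss(nn.output(A_, x) * y)\n",
"eps = 1e-6\n",
"A_pert = nn.copy(A)\n",
"A_pert[0][0][0, 0] += eps                # perturb the weight w_{0,0}^0\n",
"fd = (F(A_pert) - F(A)) / eps            # approximates dF/dw_{0,0}^0\n",
"print(f\"finite difference: {fd:1.5e}\")\n",
"print(f\"-dA[0][0][0,0]   : {-dA[0][0][0,0]:1.5e}\")   # the two values should be close"
]
},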
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are optional arguments _alpha_ and _B_. \n",
"If the float _alpha_ is specified, the weights of the returned coef-list are \n",
"$$\n",
"(dA)_i =-\\alpha\\dfrac{\\partial F}{\\partial A_i}(A).\n",
"$$\n",
"If a coef-list _B_ is specified, the result is not returned but added to _B_. This is handy when using a mini-batch method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x=np.array([-0.75,0.25])\n",
"y=1\n",
"\n",
"print(\"Test of the parameter alpha.\")\n",
"dA=nn.descent(A,x,y,pb=pb)\n",
"dA_one_half=nn.descent(A,x,y,alpha=1/2, pb=pb)\n",
"\n",
"D=nn.add(dA,dA_one_half,-2)\n",
"print(\"dA(alpha=1) - 2xdA(alpha=1/2)\")\n",
"nn.show(D)\n",
"\n",
"print(\"Test of the parameter B.\")\n",
"\n",
"DA=nn.create_zero()\n",
"\n",
"x=np.array([-0.75,0.25])\n",
"y=1\n",
"nn.descent(A,x,y, B=DA, pb=pb)\n",
"print(\"DA after one contribution\")\n",
"nn.show(DA)\n",
"\n",
"x=np.array([0.5,-0.2])\n",
"y=-1\n",
"print(\"DA after two contributions\")\n",
"nn.descent(A,x,y, B=DA, pb=pb)\n",
"nn.show(DA)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method _nn.prediction_ uses the method _nn.output()_ to compute the predictions of the neural network on the points of a data set _data_ of type *nD_data* and stores the result in _data.Ypred_. \n",
"\n",
"The method *data.show_class(pred=True)* displays these predictions.\n",
"\n",
"The method *nn.show_pred* computes the predictions of the neural network on a grid and displays them as a heat map."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"pb = ToyPb(name = \"ring\", bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"data = nD_data(n=500, pb=pb)\n",
"\n",
"CardNodes = (2, 3, 4, 2, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"A=nn.create_rand()\n",
"\n",
"\n",
"data.show_class()\n",
"pb.show_border('k--')\n",
"plt.axis('off')\n",
"plt.title(\"Correct answer\", fontsize=15)\n",
"plt.show()\n",
"\n",
"\n",
"nn.prediction(A, data)\n",
"\n",
"data.show_class(pred=True)\n",
"nn.show_pred(A)\n",
"pb.show_border('k--')\n",
"plt.title(\"predictions of a random A\", fontsize=15)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To assess the performance of a coef-list _A_ for a given problem _pb_ on a given data set *data*, we use the method *nn.total_loss*. It returns\n",
"$$\n",
"\\dfrac1{n_d}\\sum_{i=0}^{n_d-1} \\ell\\left(h(X_i,A)\\times y_i\\right),\n",
"$$\n",
"where $n_d=$*data.n*, the $X_i$'s and $y_i$'s are the points and tags in _data.X_ and _data.Y_ and $\\ell$ is the function _pb.loss_. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pb = ToyPb(name = \"ring\", bounds = (-1,1), loss_name = \"softplus\")\n",
"\n",
"data = nD_data(n=500, pb=pb)\n",
"\n",
"CardNodes = (2, 3, 4, 2, 1)\n",
"nn = ToyNN(card = CardNodes, coef_bounds=(-1,1,-1,1), chi=\"tanh\", grid=(-1,1,41))\n",
"A=nn.create_rand()\n",
"\n",
"error = nn.total_loss(A,data,pb=pb)\n",
"print(error)"
]
},
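{
"cell_type": "markdown",
"metadata": {},
"source": [
"The returned value can be recomputed directly from the definition. The snippet below is a check written for this notebook, assuming *nn.output* and *pb.loss* behave as described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Recomputing the mean loss from its definition (a check, assuming nn.output\n",
"# and pb.loss behave as described above).\n",
"losses = [pb.loss(nn.output(A, data.X[i]) * data.Y[i]) for i in range(data.n)]\n",
"print(np.mean(losses), error)   # the two values should coincide"
]
},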
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The method *nn.total_loss_and_prediction* combines the effects of *nn.total_loss* and *nn.prediction*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"error2 = nn.total_loss_and_prediction(A,data,pb=pb)\n",
"print(f\"error ={error},\\nerror2={error2}\")\n",
"\n",
"data.show_class(pred=True)\n",
"nn.show_pred(A)\n",
"pb.show_border('k--')\n",
"plt.title(\"predictions of A\", fontsize=15)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"colab": {
"collapsed_sections": [
"GafO0zXoJ6Cx",
"5l_mvC1OJ6Da",
"ZzS5-IzwaKn3",
"89AjhkJ2aKoB"
],
"name": "ToyNN_class.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 1
}