We've looked at the concept of plasticity at the synaptic level, but when we look at neural populations instead, things get more complicated. Most machine learning models of neural networks use an "optimization" procedure: they usually either minimize an error term or maximize a likelihood term. In a biological network, how can we obtain regional or global information, such as a Hamiltonian or Lagrangian?

Generating Random Numbers

When engaging in modeling, or any kind of exploration with neural networks, there are some functions we need over and over again. One of the most important is generating random numbers according to a distribution. There are many useful ways of doing this, some of which rely on time-tested methods like the Metropolis-Hastings algorithm, and others that are highly optimized for performance. One of the important things to realize is that there are no "true" random numbers in the digital domain - the best we get from computers is a set of "pseudo"-random numbers derived from a seed. This is both helpful and unhelpful. It's helpful when we're trying to replicate an experiment exactly (we frequently have to do this during debugging!). It's unhelpful when we're trying to randomize our experiments to get as close to biological reality as possible. If we want "true" random numbers, we have to get them from quantum devices dedicated to this purpose. These are available on chips, but not commonly found in consumer products.

Energy Functions

The extent to which a local or regional energy function can be useful is determined by the network connectivity, especially the internal convergence and divergence between processing modules. In the dynamic assembly theory of cortical function, modules go into and out of functional assemblies by mutual coupling. There are dozens of ways in which such coupling can be achieved. But does this mean the energy functions need to be coupled too?
Or can each module maintain its own cost function while still participating in a larger set of network calculations?

Gradient Descent

The issues associated with gradient descent are well known and have been extensively studied. There is the issue of convergence, and there is also the issue of rate of convergence. If the algorithm gets stuck in a local minimum, there are ways of restarting it, and ways of avoiding local minima altogether. There is the issue of the vanishing gradient, and a number of ways to resolve or avoid it.

Fitting Parameters

In machine learning, the method of fitting parameters is chosen in advance according to the needs of the programmer. Some kinds of data work better with logistic regression, and other kinds work better with k-means clustering. The programmer usually explores the dataset before deciding which algorithms to try. Human beings don't always have this luxury. We're expected to respond to unknown datasets in real time, and do the best we can in terms of classification and optimization. Humans engage in meta-programming: that is to say, we do the same thing the programmer does, and select the algorithm best suited to the task at hand. Our brains have a library of algorithms to select from. Most of these revolve around statistics - we rarely engage in any geometry more complicated than simple matrix multiplication. If we require derivatives, we extract them up front and transmit them on separate channels.

Entropy

The flip side of generating random numbers is determining how much information there is in a signal. The foundation of this approach lies in information theory, using concepts like entropy. Sometimes we have to know things about the statistics up front to make meaningful calculations about the information - but in real life we often don't have much up-front input, so we have to adjust our models on the fly, based on successive tidbits of incoming information.
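As a minimal sketch of this on-the-fly adjustment, the snippet below maintains a running plug-in estimate of Shannon entropy as symbols arrive one at a time. The two-symbol stream and the plug-in estimator are illustrative assumptions, not anything prescribed here; real neural or experimental data would call for more careful estimators.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Plug-in Shannon entropy (in bits) of an empirical distribution
    given as a mapping from symbol to observed count."""
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total)
               for c in counts.values() if c > 0)

# Update the estimate as each new "tidbit" of data arrives.
counts = Counter()
for symbol in "ABABABAB":   # illustrative fair two-symbol source
    counts[symbol] += 1

print(entropy_bits(counts))  # 1.0 bit for a fair binary source
```

Each incoming symbol only increments one count, so the model adjusts incrementally without needing the statistics up front - exactly the situation described above.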
In the world of Bayesian models and Bayesian parameters, the number of parameters we use often has to be updated based on new information. An example was given earlier in relation to a coin flip that suddenly yields an unexpected result, like "5" when we're only expecting a 0 or 1. This forces us to update the number of parameters in our internal model, because now we know about three states instead of just two.

Non-Euclidean Manifolds

One of the things the brain does exceedingly well is coordinate transformations. Not just geometric transformations, but restructuring along lines that are completely different from the axes of the input signal. One such mapping is the projection of the input onto coordinates that represent information - in particular, we can use the Fisher information as a coordinate system and the Kullback-Leibler divergence as a metric. The resulting manifolds are the subject of information geometry. Such mappings are important everywhere in the brain, even at the most peripheral levels like the retina (Ding et al 2023).

Next: Optimization and Error Signals
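To make the Kullback-Leibler "metric" concrete, here is a small sketch for discrete distributions; the specific distributions are made-up illustrations. Note that D_KL(p‖q) is not symmetric in p and q, which is why information geometry treats it as a divergence rather than a true distance.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits, for two discrete distributions given as
    equal-length lists of probabilities (q must be nonzero wherever p is)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair   = [0.5, 0.5]       # illustrative "expected" distribution
biased = [0.25, 0.75]     # illustrative "observed" distribution

print(kl_divergence_bits(fair, fair))    # 0.0 - identical distributions
print(kl_divergence_bits(fair, biased))  # positive, and differs from the reverse direction
```

The divergence vanishes only when the two distributions coincide, so locally it behaves like a squared distance on the manifold of distributions - the sense in which it serves as a metric above.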