Brian Castle
Navigation


Navigating a maze (or any other scene) is a whole-organism activity that requires coordination between head, neck, body, and eye movements. It requires an egocentric reference frame for behaviors like "turn left" and "turn right", and an allocentric reference frame to understand the placement of objects in the visual scene, and which pathways lead to which goals. The elaboration of a sequence of motor behaviors is the domain of the frontal lobes, in conjunction with the basal ganglia. An area of the midbrain called the substantia nigra is heavily involved in the selection of behaviors, and in the decision to start and stop particular behaviors. This area degenerates in Parkinson's disease. Its pars reticulata has a strong projection to the superior colliculus, and is thought to mediate some of the eye-movement disorders associated with the condition. There is a pathway from the head of the caudate nucleus to the substantia nigra that seems to be specifically involved with eye movements, in particular those related to other motor activities. For example, when one wants to climb a staircase, the eyes typically go to the base of the first step.

Navigation exposes an issue called "credit assignment". When engaged in exploratory navigation, one tries different avenues. Which avenue resulted in the achievement of the goal? Many times it's the most recent avenue tried, but not always. Often there is complicated logic involved in goal attainment, and outcomes can be separated by long delays from the activities that created them. Neural networks handle this by creating "graphs", which psychologists sometimes call schemas, associated with engrams, or memory traces. The same principle applies to eye movements. When we explore a scene, we first scan it broadly; sometimes our brains then require additional detail about a portion of it, so our eyes return to that place and scan it at higher resolution. The details may be important, and may become part of the scene (this is certainly true when one sees a dangerous insect, or when one is appreciating a work of art). There is also the related issue of the "value" assigned to an object, on the basis of current or previous reward and punishment. Value serves as a goal for navigation, and often there is more than one goal; goals can be arranged hierarchically, in parallel, or in any number of ways. At some level objects, goals, values, and the relationships between them all become the same - they are "data". Our brains can arrange the data abstractly, logically, in the form of graphs that represent the relationships - which are nothing more than patterns of firing neurons - no different from what a visual system produces when it gets input from a retina.
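In machine-learning terms, the temporal side of credit assignment is often handled with eligibility traces, which spread a delayed reward back over recently tried avenues, with recent ones getting the most credit. A minimal sketch in Python - the state names, reward, and decay constants are all invented for illustration:

```python
# Eligibility traces (TD(lambda)-style): states visited recently before a
# reward receive the largest share of the credit; earlier states get less.
# All names and constants here are hypothetical.
GAMMA, LAMBDA, ALPHA = 0.9, 0.8, 0.1   # discount, trace decay, learning rate

def update_values(value, episode, reward):
    """episode: ordered list of states visited; reward arrives at the end."""
    trace = {}
    for state in episode:
        # Decay all existing traces, then bump the current state's trace.
        for s in trace:
            trace[s] *= GAMMA * LAMBDA
        trace[state] = trace.get(state, 0.0) + 1.0
    # A single terminal reward is distributed according to each trace.
    for s, e in trace.items():
        value[s] = value.get(s, 0.0) + ALPHA * reward * e
    return value

episode = ["start", "left", "dead_end", "start", "right", "goal"]
values = update_values({}, episode, reward=1.0)
# The most recently visited states ("right", "goal") receive the largest
# updates, but the earlier avenues still get a small share of the credit.
```

Note that "start" ends up with more credit than "dead_end" in this run, because it was visited twice - one of the reasons the most-recent-avenue heuristic isn't always right.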

In relation to the brain systems we've discussed so far, navigation has a special relationship to scene mapping and short-term memory, both of which are organized around the point at infinity on the sensorimotor timeline, and both involve a special kind of encoding transmitted between the hippocampus on the one side and the prefrontal cortex on the other. This is fertile ground for neural network modeling, because the wiring and the neuronal and synaptic mechanisms are complex, and they're nothing like any of the existing machine learning models. Human navigation cannot easily be supported by transformer architectures unless they have hundreds of layers. Does a human brain have hundreds of layers? Well... it could. For one thing, the modular organization of the cerebral cortex is well suited to serial processing. For another, a small patch of active dendrite is like its own little neuron: it sums its inputs and creates a dendritic mini-spike in an all-or-nothing manner. A single neuron could thus embody many "layers" in a convolutional sense. And third, there are architectures in the brain, like the upper portions of the basal ganglia (putamen, globus pallidus), containing many spiny neurons that project in cascades throughout the volume rather than neatly in layers. There are several existing models of cognitive action in relation to motor sequences. One of the more immediately visual is the Spaun model from the Nengo group; there are videos on YouTube showing Spaun engaging in complex cognitive tasks involving motor sequences. This kind of behavior is different from a self-driving car, although there are similarities: a car gets GPS and lidar input, while a human has no such capability. An interesting thing about humans is that we have a hard time adding 2 and 2. Computers can do it in nanoseconds, but humans have to learn the symbols, and then we have to learn the rules. It takes a while for a child to learn the addition and multiplication tables.

How does a neural network create a graph? There are "graph neural networks" in the machine learning world, and they're instructive, but they're not exactly what we're after. We're after something more like the Granger micro-causality graph shown in the first section. We'd like to know all possible paths and the probability associated with each. That is what we'd like to draw from memory, and that is also a model we can use to update our beliefs and our understanding. This is where information geometry can help us. "Possible paths" can be represented as actual paths on a manifold, where the beginning and ending points are sets of conditions. Movement from one point to the next is exactly like moving a multi-jointed limb - one doesn't need to calculate every possible trajectory, one merely needs to move the joints into place. Trajectories "can" be optimized (a dancer might be interested in doing this), but they usually aren't - they're usually brought into a "range" where one trajectory is as good as another, within the computational window allowed by the real-time constraint.
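The kind of structure described above - all possible paths, each with a probability - can be sketched directly as a weighted directed graph with exhaustive path enumeration. A small Python illustration; the node names and transition probabilities are invented:

```python
# A toy directed graph of conditions; edge weights are transition
# probabilities (hypothetical values, purely for illustration).
graph = {
    "start":   {"hallway": 0.7, "garden": 0.3},
    "hallway": {"kitchen": 0.6, "goal": 0.4},
    "garden":  {"goal": 1.0},
    "kitchen": {"goal": 1.0},
}

def all_paths(graph, node, goal, path=(), prob=1.0):
    """Enumerate every path from node to goal with its joint probability."""
    path = path + (node,)
    if node == goal:
        yield path, prob
        return
    for nxt, p in graph.get(node, {}).items():
        yield from all_paths(graph, nxt, goal, path, prob * p)

for path, p in sorted(all_paths(graph, "start", "goal"), key=lambda x: -x[1]):
    print(" -> ".join(path), f"p={p:.2f}")
```

Exhaustive enumeration like this blows up combinatorially on larger graphs, which is one reason a manifold representation - where a path is something you settle into rather than something you enumerate - is attractive.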

A graph, in a neural network, pertains to a sequence of cognitive "actions", each of which is designed to rearrange the sensory reality. It is, in effect, an abstraction of an ordinary motor movement. When a child wishes to hear herself scream, she screams. Later, when she is perhaps the victim of a home-invasion robbery, she already has the graph, and the decision occurs in a larger context: whether this particular portion of the graph is worth activating, or whether it might be better to use a different part.

Information geometry is an advanced study; it requires conversational knowledge of the methods of probability theory, differential geometry, algebra, and statistics. At the end of the day, however, it's the most general and computationally accurate model that can be applied to brain function. A stochastic information model is very real: it's how human beings operate, it's the world in which we're immersed. Right now, neuroscientists are struggling with phase encoding, and that doesn't have to be the case - from an information-geometric standpoint, phase encoding is a logical first step toward the representation of data in terms of information manifolds. In data-land there is only information, and relationships between information. The dynamics are there to coordinate the communication of that information and those relationships. The memory is there to store them, so they can be called up as needed.
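For a concrete taste of the machinery: the Fisher information turns a family of probability distributions into a manifold with a natural metric. For a Bernoulli distribution with parameter p it is 1/(p(1-p)), so "information distance" stretches near the endpoints - the same change in p carries more information at the boundary than in the middle. A minimal numerical sketch:

```python
import math

def fisher_info_bernoulli(p):
    """Fisher information of Bernoulli(p): I(p) = 1 / (p * (1 - p))."""
    return 1.0 / (p * (1.0 - p))

def geodesic_distance(p, q, steps=100000):
    """Numerically integrate ds = sqrt(I(p)) dp between p and q.
    (The closed form is 2 * |asin(sqrt(q)) - asin(sqrt(p))|.)"""
    lo, hi = min(p, q), max(p, q)
    h = (hi - lo) / steps
    return sum(math.sqrt(fisher_info_bernoulli(lo + (i + 0.5) * h)) * h
               for i in range(steps))

# The same 0.10 change in p covers more information distance near the edge:
print(geodesic_distance(0.50, 0.60))  # mid-range
print(geodesic_distance(0.89, 0.99))  # near the boundary: larger
```

That stretching is exactly the kind of thing a flat Euclidean treatment of probabilities misses, and it's why the geometry matters.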

At this point in time, it's still difficult to extract a subspace from the global store. This is an activity that has to occur regularly and quickly, whenever context needs to be delivered to the scene-mapping logic in the hippocampus. Let's say that when navigating we encounter a particular object, identified by its shape. If it's a red object, it's dangerous; if it's a green object, it's safe. So first we have to identify the object, then we have to extract from memory everything we know about that object, and then "in the context of the scene" we have to derive value from the object, ask ourselves whether it's green or red, and either approach or withdraw. We know "most" of the brain circuitry related to this sequence, but we don't know the particulars. We know the perceptual pathways for shape and color, we know the pathways for object recognition and scene mapping, we know the pathways that assign values to perceptions, and we know the pathways for motor actions. What we're missing is how all these subsystems are coordinated. The coordination of an approach/withdraw decision occurs very quickly in humans, in the subsecond-to-seconds range. We're also missing things like how information is actually encoded in the global store. It may be a phase-encoding scheme, but it's likely something considerably more complicated. Whatever it is, it allows access to subspaces as quickly as it allows access to details. Humans have an uncanny ability to "zoom in" to subspaces of memory. If I try, I can remember phone numbers from when I was a child. It takes a few seconds: first my brain brings up a collection of old numbers, then I go "nope, wasn't that one... wasn't that one... aha!", and then I might have to think on the particular number to make sure I get the digits right.
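The identify-retrieve-evaluate-act sequence in that example can be mocked up as a toy pipeline, just to make the data flow explicit. Every object, property, and rule here is hypothetical - this sketches the logic of the sequence, not any claim about the circuitry:

```python
# Toy approach/withdraw pipeline: recognize an object by shape, pull its
# memory "subspace", evaluate color in context, and pick an action.
# All entries are invented for illustration.
memory = {
    "beetle": {"edible": False, "notes": "clicks when touched"},
    "berry":  {"edible": True,  "notes": "grows on low bushes"},
}

def decide(shape, color):
    knowledge = memory.get(shape, {})   # extract the memory subspace
    if color == "red":                  # value assignment in context
        return "withdraw", knowledge
    if color == "green":
        return "approach", knowledge
    return "inspect", knowledge         # ambiguous: go get more detail

action, context = decide("berry", "red")
print(action)   # red means dangerous in this toy world, so: withdraw
```

The hard part, as the paragraph above says, isn't any one of these steps - it's how the brain coordinates all of them in under a second.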

At this point in our knowledge of neuroscience, we still have to guess about the particulars of information geometry. We don't know for sure - but it seems to dovetail quite nicely with what we know about free energy, phase encoding, and modular organization in the brain. An important lesson can be derived from the human visual system: we don't have to do everything geometrically, we can use statistics to our benefit. Calculating the geometry of every surface in a visual scene would be prohibitively expensive, even for a massively parallel brain. But a statistical approach can extract surface information just as easily, and the resulting representation may be even more natural to a neural network than the geometric equivalent. One of the nice things about information geometry is that it has "primitives", much like the stick figures representing human shapes. They are abstractions of manifolds that need to be individuated according to their information content. If a stick figure has its right arm extended, maybe it's telling me to stop. Information like that can be transmitted much more quickly than a geometric analysis of the angle of the arm. The lesson for navigation is that action-result pairs work on joint distributions, just like the causality analysis in the first section. It's a ubiquitous principle in brain design, for both structure and function. The filters in the early visual system extract joint micro-distributions, and the filters in V5 (area MT, the motion-sensitive part of the dorsal visual stream) also extract joint distributions when they're figuring out where to target an eye movement. In information-theoretic terms, a cluster of memories is a joint distribution. In graph-mapping terms, the relationship between actions and outcomes is a joint distribution.
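The joint-distribution claim can be made concrete with mutual information: given a joint distribution over (action, outcome) pairs, the mutual information measures how much knowing the action tells you about the outcome. A small sketch, with invented probabilities:

```python
import math

# Hypothetical joint distribution over (action, outcome) pairs.
joint = {
    ("turn_left",  "goal"):     0.30,
    ("turn_left",  "dead_end"): 0.10,
    ("turn_right", "goal"):     0.05,
    ("turn_right", "dead_end"): 0.55,
}

def mutual_information(joint):
    """I(A;O) = sum over (a,o) of p(a,o) * log2( p(a,o) / (p(a) * p(o)) )."""
    pa, po = {}, {}
    for (a, o), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        po[o] = po.get(o, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * po[o]))
               for (a, o), p in joint.items() if p > 0)

print(f"{mutual_information(joint):.3f} bits")  # about 0.36 bits
```

If actions and outcomes were independent, the result would be zero bits; the further the joint distribution departs from the product of its marginals, the more the action predicts the result.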


Next: Let's Model It!

Back to the Console

Back to the Home Page


(c) 2026 Brian Castle
All Rights Reserved
webmaster@briancastle.com