The Inverse-Optics Problem

Continuing from Chapter 6: Let There Be Light!


Visual perception is a constructive, or generative, function that interprets the visual stimulus by constructing a three-dimensional model of the configuration of objects and surfaces in the external world most likely to have caused that stimulus. We can observe this constructive function in visual illusions like the Kanizsa figure, Tse’s spiral worm, Idesawa’s spiky sphere, and Tse’s Loch Ness Monster. Each of these illusions is perceived as a configuration of three-dimensional objects and surfaces, with a distinct experience of depth and surface orientation at every point on every visible surface. In other words, the experience itself is a three-dimensional structure, a model of external reality expressed in an explicit spatial code, a volumetric image projection mechanism.

Visual perception is analogical in nature: We perceive objects by constructing spatial analogies of them and experiencing those analogies, which we come to believe to be the objects themselves, perceived out where they lie beyond the sensory surface. Visual perception will always seem profoundly paradoxical until we see through the Grand Illusion and recognize the world of experience for what it is: A miniature virtual-reality replica of the external world in an internal representation.

The primary function of visual perception is the construction of this volumetric spatial interpretation of the objects and surfaces in the world most likely to have been responsible for the visual stimulus. This is known as the inverse-optics problem, i.e. undoing the optical projection from the three-dimensional world onto the two-dimensional retinal image. The problem is mathematically under-constrained, because an infinite range of three-dimensional spatial interpretations corresponds to any given visual stimulus. For example, a rectangle on the retina could correspond to any of an infinite range of irregular quadrilaterals spanning different depths, whose corners lie anywhere along the lines of sight through the corners of the retinal image. How does the visual system select, from this infinite set, the one that we perceive?
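The under-constraint is easy to demonstrate with a pinhole-camera model. In the sketch below (plain Python; the focal length `f` and the helper names `project` and `lift_to_3d` are illustrative assumptions), each corner of a retinal rectangle is pushed out to a random depth along its own line of sight. Every such 3-D quadrilateral reprojects onto exactly the same rectangle, so the image alone cannot decide between them.

```python
import random

def project(point, f=1.0):
    """Pinhole projection of a 3-D point (X, Y, Z) onto an image plane at distance f."""
    x, y, z = point
    return (f * x / z, f * y / z)

def lift_to_3d(image_corners, f=1.0, rng=random):
    """Return one of infinitely many 3-D quadrilaterals that project onto image_corners.

    Each corner is placed at an arbitrary depth along its own line of sight,
    so the result is generally an irregular (even non-planar) quadrilateral.
    """
    corners = []
    for u, v in image_corners:
        z = rng.uniform(1.0, 10.0)            # any positive depth would do
        corners.append((u * z / f, v * z / f, z))
    return corners

# A rectangle on the "retina" (image-plane coordinates).
rectangle = [(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5)]

for _ in range(3):
    quad = lift_to_3d(rectangle)
    print("3-D corners:", [tuple(round(c, 2) for c in p) for p in quad])
    print("  projects to:", [tuple(round(c, 6) for c in project(p)) for p in quad])
```

Each run produces a different irregular quadrilateral in depth, yet the projected corners are identical: whatever selects one interpretation has to come from constraints beyond the stimulus itself.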


I propose that the only feasible way to solve such a computationally intractable problem is by way of a parallel analog wave-like algorithm that essentially constructs, or reifies, every possible interpretation simultaneously and in parallel, before selecting from that infinite set the most stable interpretation (or interpretations). The selection criterion seems to involve the Gestalt principles of simplicity and symmetry. This is where mathematics enters perception.

Hochberg & Brooks (1960) found that the probability that a line drawing is interpreted as 2-D rather than 3-D depends on the simplicity of the 2-D interpretation compared to the 3-D one. For example, figure A below is perceived as a cube with equal sides and all right angles, whereas as a 2-D pattern it consists of 7 tiles with different angles and side lengths. Figure D, on the other hand, makes a simple pattern in 2-D, and is thus easily perceived as a 2-D figure of six identical equilateral triangles, whereas in 3-D it represents an unlikely singular viewpoint along a diagonal axis, which is evidently more difficult to perceive.


The fact that most of these figures can be perceived as 2-D or 3-D, with a particular preference for one over the other, and the fact that the ambiguous cases can be observed to flip bistably between two states, suggests that the visual system has constructed both alternatives, and is weighing them against each other in real-time to see which is the simpler interpretation.

Indeed, this is the same principle in evidence in the illusions introduced earlier. A triangle occluding three perfect circles in the Kanizsa figure trumps an interpretation as three perfectly aligned pac-man features. Peter Tse’s volumetric worm tends to be perceived with a simple cylindrical body, its ends capped with perfect hemispheres, bent into a regular spiral around a regular white cylinder. Likewise with Idesawa’s spiky sphere and Tse’s sea monster. This is a visual system based on symmetry, the perceptual equivalent of Occam’s Razor: In the absence of evidence to the contrary, the most likely interpretation is the most symmetrical one. Not because everything in the world is symmetrical. Far from it! But because symmetry is the primitive operational principle of the visual system: we perceive shapes by their symmetries and by their violations of symmetry.

The metric that Hochberg & Brooks found to correspond with the perceptual preference for a 2-D or 3-D interpretation involved the number of edges of the same length and the regularity of the angles between edges. This corresponds to the Gestalt principle of Prägnanz, or simplicity: the simplest interpretation is the most stable in perception. It also corresponds to a metric involving resonance in both 2-D and 3-D contexts.

Start with A, inverse-project into depth, but also within xy plane

Mark centers of symmetry

Extrapolate to implications

Competition between different depths

various examples

Next chapter: Visual ornament





Consider Peter Tse’s volumetric worm depicted below in figure A. The computational function of perception can be defined as the expansion of that 2-D stimulus into a full 3-D experience as shown below in figure B.






The Kanizsa figure is actually a three-dimensional perceptual experience, as the foreground triangle is perceived to occlude three complete circles that continue behind it. There are two aspects to this illusion: the amodal contours of the hidden sectors of the three circles, which are perceived “invisibly” behind the triangle, and the modal contours of the triangle, which take on an actual brightness difference across the illusory edge.









The significance of symmetry is apparent in the phenomenon of the kaleidoscope, whose symmetrical patterns are perceived instantly, before we have time to think about them. The computational algorithm required to detect this kind of symmetry is a combinatorial nightmare, as every piece of the image must be matched against every other piece at every different orientation. The phenomenon of the kaleidoscope strongly implicates a parallel analog wave-like computational principle whereby global patterns of symmetry emerge spontaneously from the simultaneous action of innumerable local forces. This is the Gestalt principle of emergence.
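To make the combinatorics concrete, here is a minimal brute-force sketch, assuming the image is a small square grid and considering only the eight symmetries of a square (four rotations and four mirror reflections). Even this toy compares every cell with its counterpart under every transformation; a general detector would also have to search over all possible centers, axes, and scales, which is where the cost explodes. The function names are hypothetical.

```python
def rotate90(grid):
    """Rotate a square grid 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - c][r] for c in range(n)] for r in range(n)]

def mirror(grid):
    """Reflect a grid left to right."""
    return [list(reversed(row)) for row in grid]

def dihedral_images(grid):
    """Yield (name, image) for the 8 symmetries of a square: 4 rotations, 4 reflections."""
    current = [list(row) for row in grid]
    for k in range(4):
        yield f"rot{90 * k}", current
        yield f"rot{90 * k}+mirror", mirror(current)
        current = rotate90(current)

def symmetries(grid):
    """Brute force: compare every cell with its counterpart under every transformation."""
    original = [list(row) for row in grid]
    return [name for name, image in dihedral_images(grid) if image == original]

kaleido = [
    [1, 0, 0, 1],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [1, 0, 0, 1],
]
half_mirror = [
    [1, 1],
    [2, 2],
]
print(symmetries(kaleido))
print(symmetries(half_mirror))
```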

The emergent symmetry-detection system must operate in a full 3-D context because the simplicity of the 3-D perceptual interpretations is not at all apparent in 2-D projection.






The way that the visual system addresses this vast expansion of possibilities is to construct, or reify, every possible interpretation of the stimulus simultaneously and in parallel.

We can see this simultaneous parallel competition between equally likely interpretations in carefully contrived visual illusions that give exactly equal weight to alternative interpretations, resulting in a bistable, or multi-stable percept that alternates randomly between alternative interpretations.

The way that the visual system addresses this overwhelming expansion of possibilities into the third dimension is to essentially construct, or reify, every possible spatial interpretation simultaneously and in parallel, a computational function only possible in a parallel analog wave-based system. Competition between alternative spatial interpretations eventually results in the emergence of one or more interpretations consistent with the given stimulus.

The primary function of visual perception

Posted in Uncategorized | 9 Comments

Let There Be Light!

Continuing from Chapter 5: Ray-Tracing Algorithms

Ray-tracing algorithms highlight the vital importance of lighting to the interpretation of a scene, whether it be the stark bright light of the sun, the more diffuse glow of the blue sky, indirect light reflected by other surfaces, or light glowing or shining from luminous objects. We see the light not merely as a feature in the scene, but as a directed, causal process that intimately links the light source with the patch of the world that it illuminates: we perceive the light as the causal source of the illumination of the scene.


However, at the same time, we also perceive the reverse inference: The illumination of the scene implicates a light source illuminating it. And the nature of the illumination reveals information about the light source, even if the source itself is currently invisible, or out of our field of view.


An overall color cast across the whole scene implies that it is the illumination source that accounts for the color across the scene. The color that is distributed across the whole scene is attributed to a single causal agent, the light source.
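A classical computational stand-in for this attribution is the gray-world assumption from color-constancy research, swapped in here purely for illustration: if the surfaces in a scene average to neutral gray, any overall color cast in the image average is assigned to the illuminant and divided out. The sketch below assumes linear RGB values and a hypothetical scene of gray surfaces under a reddish light; the estimate is only defined up to overall intensity, which is why it is normalized.

```python
def estimate_illuminant(pixels):
    """Gray-world estimate: mean RGB of the image, scaled so the three channels average 1."""
    n = len(pixels)
    mean = [sum(p[i] for p in pixels) / n for i in range(3)]
    gray = sum(mean) / 3
    return [m / gray for m in mean]

def discount_illuminant(pixels, illuminant):
    """Divide each channel by the estimated illuminant (a von Kries-style correction)."""
    return [tuple(p[i] / illuminant[i] for i in range(3)) for p in pixels]

# Hypothetical scene: gray surfaces of different lightness under a reddish light.
light = (1.2, 1.0, 0.8)
surfaces = [(0.2, 0.2, 0.2), (0.5, 0.5, 0.5), (0.8, 0.8, 0.8)]
image = [tuple(s[i] * light[i] for i in range(3)) for s in surfaces]

estimate = estimate_illuminant(image)
print([round(c, 3) for c in estimate])      # prints [1.2, 1.0, 0.8]: the cast is assigned to the light
print([tuple(round(c, 3) for c in p) for p in discount_illuminant(image, estimate)])
```

The whole cast is explained by a single cause, and once it is divided out the surfaces come back as neutral gray.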


Traced backward from each shadow through its occluder, the direction of the shadows in the scene points toward the light source, even if the light source is outside the field of view, giving a distinct amodal percept of the invisible light source that can be localized with some precision.


This spatial inference is exposed by smoke or mist in the air that reveals the invisible beams of light as a glowing shaft of illumination slanting through empty space, making explicit that which is perceived only implicitly in clear air. The haze turns the invisible amodal percept of the directed beams of light into a visible modal experience.


Multiple shadows cast from every object implicate multiple light sources, perhaps with different colors and properties. Sharp stark shadows indicate a point-like light source. Softer shadows indicate an extended source.
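The link between shadow sharpness and source size follows from similar triangles. As a rough sketch, assuming a source of diameter D, an occluding edge at distance a from the source, and a surface at a further distance b beyond the occluder, measured along the same line (a simplified geometry; the parameter names are illustrative), rays from opposite edges of the source cross at the occluding edge and spread apart again, giving a penumbra of width w = D · b / a.

```python
def penumbra_width(source_diameter, source_to_occluder, occluder_to_surface):
    """Width of the soft shadow edge, by similar triangles through the occluding edge."""
    return source_diameter * occluder_to_surface / source_to_occluder

# A point-like source gives a razor-sharp shadow edge...
print(penumbra_width(0.0, 2.0, 1.0))    # 0.0
# ...while a 0.5 m lamp shade 2 m from the occluder blurs the edge over 0.25 m.
print(penumbra_width(0.5, 2.0, 1.0))    # 0.25
```

A point source (D = 0) gives w = 0, the stark shadow described above; enlarging the source or moving the surface farther from the occluder softens the edge.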


The causal connection between a light source and the world it illuminates becomes even more explicit in the case of local light sources that illuminate only local patches of the world.


We are not at all surprised to see the patch of illumination vanish as the lamp disappears. But we are puzzled if the lamp disappears but its illumination remains! There is no clearer example of the perception of direct causality than that which connects the light source to the patch of the world that it illuminates.


Consider this ray-traced scene of a dollhouse, from the POV-Ray gallery. Notice how each illuminated patch on a wall is accounted for, or “explained” by the light source pointing at it. Without the lamp, the bright patch on the wall might be misinterpreted as an actual discoloration, or bleached stain on the wall. The presence of the lamp helps explain each otherwise anomalous patch of brightness, and at the same time, the patches of brightness reveal and explain the presence of the lamps that are causing them. The cause reveals its effect, at the same time that the effect implicates its own cause.

Clearly, what is happening in perception is a factoring of the image into the intrinsic color or brightness of an object and the pattern of illumination shining on it. When all the surfaces pointing in one direction are brighter than the surfaces pointing in other directions, that suggests that those surfaces are oriented toward the light, thus implicating the direction of illumination. Knowing the direction of illumination, in turn, helps disambiguate the perception of otherwise ambiguous forms.
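This factoring is usually formalized with the Lambertian shading model, I = ρ · max(0, n · l), where ρ is the surface's intrinsic brightness (albedo), n its surface normal, and l the unit direction toward the light. As a minimal sketch, assuming uniform albedo, known surface normals, and only lit (unshadowed) surfaces, the product ρl can be recovered from brightness alone by linear least squares, which splits the image back into "how bright the surface is" and "where the light comes from". The scene values and function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Cramer's rule."""
    d = det3(a)
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x

def normalize(v):
    s = math.sqrt(sum(c * c for c in v))
    return [c / s for c in v]

def estimate_light(normals, intensities):
    """Least-squares fit of I = n . L over lit surfaces, where L = albedo * light direction."""
    ata = [[sum(n[i] * n[j] for n in normals) for j in range(3)] for i in range(3)]
    atb = [sum(n[i] * v for n, v in zip(normals, intensities)) for i in range(3)]
    light = solve3(ata, atb)
    albedo = math.sqrt(sum(c * c for c in light))
    return albedo, [c / albedo for c in light]

# Hypothetical scene: six lit facets of a gray object, albedo 0.7, light up and to the right.
true_direction = normalize([1.0, 1.0, 2.0])
normals = [normalize(v) for v in ([0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1], [-1, 0, 2], [0, -1, 2])]
intensities = [0.7 * max(0.0, sum(a * b for a, b in zip(n, true_direction))) for n in normals]

albedo, direction = estimate_light(normals, intensities)
print(round(albedo, 3), [round(c, 3) for c in direction])
```

Here the normals are given; in perception they are themselves uncertain, so the shape estimate and the illumination estimate have to constrain each other.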

This exemplifies one of the most puzzling recurring themes in perception: a kind of circular inference, where A implies B only if B also implies A. The result therefore cannot be computed in a single pass, but must emerge by a kind of resonance between all parts of the scene as it equilibrates to a globally consistent state. This reciprocal causality is demonstrated most explicitly in the ray-tracing model of mirroring, for example with two mirrors facing each other to create an “infinite tunnel” of reflections. In the first iteration, each mirror records an image of the other by reflection. In the second iteration, each mirror sees its own reflection in the other mirror, so each mirror contains the other mirror within its reflection. In the third iteration, each mirror sees the reflection of the other mirror within its own reflection, producing three reflections in each mirror, and so on through further iterations, potentially to infinity. Performed in analog with two real mirrors and actual light, the computation occurs virtually instantaneously, literally at the speed of light. In a ray-tracing algorithm, this kind of cyclic reflection can require some of the most computation-intensive processing; in fact, a limit must be placed on the number of cycles if the computation is ever to come to an end. And yet in perception, our automatic, instinctive, almost “unconscious” computation of the illumination source when viewing a scene also seems to occur virtually instantaneously, strongly implicating a parallel analog wave-like computation not unlike the actual light reflecting back and forth between two real mirrors.
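The iteration limit can be illustrated with a toy model of two facing mirrors. In this sketch (the function name, the reflectance of 0.9, and the adaptive brightness cutoff are all assumptions for illustration), each bounce adds one nested image whose relative brightness drops by the mirrors' reflectance, and the recursion stops either at a fixed depth limit, analogous in spirit to the trace-depth limit a ray tracer imposes (POV-Ray's is set with `max_trace_level`), or when the remaining contribution becomes negligible.

```python
def trace_between_mirrors(reflectance, max_depth, cutoff=1e-3, depth=0, weight=1.0):
    """Follow a ray bouncing back and forth between two facing mirrors.

    Each bounce adds one nested image whose relative brightness is `weight`,
    then multiplies the weight by the mirrors' reflectance. Recursion stops
    at max_depth (the iteration limit) or once the weight falls below cutoff.
    Returns (number_of_images, summed_brightness).
    """
    if depth >= max_depth or weight < cutoff:
        return 0, 0.0
    images, light = trace_between_mirrors(reflectance, max_depth, cutoff,
                                          depth + 1, weight * reflectance)
    return images + 1, light + weight

for limit in (1, 5, 20, 100):
    n, total = trace_between_mirrors(0.9, limit)
    print(f"limit {limit:3d}: {n:2d} images, total brightness {total:.4f}")

# What real mirrors deliver in one physical step: the infinite geometric series.
print(f"infinite tunnel: {1 / (1 - 0.9):.4f}")
```

The finite traces creep toward the closed-form sum 1/(1 - r), but each extra level of accuracy costs another round of recursion.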

The perception of light propagating through space is a necessary prerequisite to the reliable perception of objects in that space, since the sensory evidence for those objects is the result of their illumination by a light source and their optical projection onto the retina.

Continued in Chapter 7: The Inverse-Optics Problem


Ray-Tracing Algorithms

Continuing from Chapter 4: The Language of the Mind

The computational function performed by mental imagery can be better understood by comparison with computer ray-tracing applications that perform a very similar computational function. Figure 1 shows an example of a ray-traced image of a scene from Friedrich A. Lohmueller’s most excellent web site, generated by the free open-source ray-tracing program POV-Ray, consisting of a sphere floating in space above a ground plane, under the dome of a synthetic sky. A light source is modeled in the sky, whose rays illuminate the sunny side of the sphere, which casts a shadow on the ground plane below. This is exactly the kind of image that might be evoked in mental imagery by the words “Imagine a sphere floating in the air”. Any artist worth their salt could compose such a picture on demand, using paint and canvas to record a mental image that corresponds to those words. The image itself is two-dimensional, a mere projection of the mental image that it depicts. But correctly computing the scene requires consideration of the full three-dimensional spatial configuration, especially the patterns of light and shadow on the sphere, as well as the shadow it casts on the ground below.


The computational algorithm used in ray-tracing involves tracing every ray in straight lines radiating outward from the light source, and following its path through empty space until it strikes the surface of an object in the scene. If the surface is opaque, with a certain color, then the modeled ray is absorbed and re-emitted in that color in all directions from every point on the illuminated surface, as if the whole illuminated surface were glowing with that color. If the object is shiny, then a portion of the ray is traced through a specular reflection, as in a reflection through a mirror, creating a shiny highlight; and if the object is transparent, the ray-tracing algorithm traces the ray through the transparent object, modeling the proper refraction and partial absorption, or filtering, of the transparent, possibly colored medium. Every single ray radiating from every light source is traced in this manner through any number of reflections, refractions, and re-emissions through the scene. Any rays that happen to stray in the direction of the viewpoint or ‘camera’ position, if they fall within the frame of the rendered image, leave a trace on the image where the accumulated rays paint out the ray-traced image of the imagined scene. This is an extraordinarily computation-intensive process that takes many billions of computer cycles to compute. (Actual ray-tracing applications reduce the computational load considerably by starting at the final image plane and tracing the light rays backward toward the sources, thus only having to compute those rays that end on the output image.)
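The backward shortcut described in the parenthesis above can be sketched compactly. This is a minimal illustration in plain Python, not POV-Ray's implementation: one eye ray per character cell is traced from the camera into a hypothetical scene of a sphere floating above a ground plane, shaded with a simple diffuse (Lambertian) term, and a shadow ray is sent toward a single point light to decide whether the hit point lies in shadow. Reflection, refraction, and color are omitted, and all names and scene values are assumptions.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def along(origin, d, t):
    return tuple(o + t * k for o, k in zip(origin, d))

# Hypothetical scene: a sphere floating above the ground plane y = 0, lit by one point light.
SPHERE_C, SPHERE_R = (0.0, 1.0, 5.0), 1.0
LIGHT = (5.0, 10.0, 0.0)
SKY = 0.3
PALETTE = " .:-=+*#%@"

def hit_sphere(origin, d):
    """Nearest positive t with origin + t*d on the sphere (d is a unit vector), or None."""
    oc = sub(origin, SPHERE_C)
    b = dot(oc, d)
    disc = b * b - (dot(oc, oc) - SPHERE_R ** 2)
    if disc < 0:
        return None
    for t in (-b - math.sqrt(disc), -b + math.sqrt(disc)):
        if t > 1e-6:            # small epsilon avoids re-hitting the starting surface
            return t
    return None

def hit_ground(origin, d):
    """Positive t where the ray meets the plane y = 0, or None if it never descends."""
    if d[1] >= 0:
        return None
    return -origin[1] / d[1]

def shade(point, normal):
    """Lambertian brightness, or 0 if the shadow ray toward the light is blocked."""
    to_light = sub(LIGHT, point)
    dist = math.sqrt(dot(to_light, to_light))
    to_light = tuple(x / dist for x in to_light)
    t = hit_sphere(point, to_light)
    if t is not None and t < dist:
        return 0.0
    return max(0.0, dot(normal, to_light))

def trace(origin, d):
    """Trace one eye ray backward from the camera and return its brightness in [0, 1]."""
    ts, tg = hit_sphere(origin, d), hit_ground(origin, d)
    if ts is not None and (tg is None or ts < tg):
        p = along(origin, d, ts)
        return shade(p, norm(sub(p, SPHERE_C)))
    if tg is not None:
        return 0.8 * shade(along(origin, d, tg), (0.0, 1.0, 0.0))
    return SKY

def render(width=40, height=20, camera=(0.0, 1.0, 0.0)):
    """One eye ray per character cell; cells are roughly twice as tall as wide."""
    lines = []
    for j in range(height):
        y = 1 - 2 * (j + 0.5) / height
        row = ""
        for i in range(width):
            x = 2 * (i + 0.5) / width - 1
            b = trace(camera, norm((0.5 * x, 0.5 * y, 1.0)))
            row += PALETTE[min(len(PALETTE) - 1, int(b * len(PALETTE)))]
        lines.append(row)
    return "\n".join(lines)

print(render())
```

Only rays that end on the output image are ever computed, which is exactly the saving the backward strategy buys.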

The remarkable thing is that the human mind can perform this kind of computation virtually instantaneously, in a flash of mental imagery, strongly implicating a parallel analog computational strategy that solves the problem by spatial analogy. Although it takes an artist a certain amount of training and practice to learn to paint realistic scenes with the proper shading, shadows, and reflections, even young children can immediately interpret the result, even with a scene as complex as that in Figure 2 (“Balcony” from the POV-Ray gallery), with multiple cast shadows, reflections, refractions, and complex geometrical shapes. The compelling appeal of ray-tracing algorithms is not so much that computers are capable of performing the computations, which is amazing enough in itself; the truly amazing thing is that people can correctly interpret the results, and they do so, even young children, instantaneously and effortlessly, without any awareness that they are performing a computation at all.

The ray-tracing algorithm demonstrates explicitly the computational function served by the faculty of mental imagery, although what can take many hours of crunching in a digital computer seems to occur near-instantaneously in the human mind.

Figure 2. A ray-traced scene from the POV-Ray gallery.

Ray-tracing applications use a geometrical code to define the structures to be rendered. For example, the sphere at the center of the scene in Figure 1 is defined by a POV-Ray function sphere{ }, which is given parameters that define its location, scale, orientation, and the color of its surface. The ground plane is defined with the function plane{ }, which is also assigned a location, orientation, and color, or patterns of color. These functions serve the same purpose as the concept node in visual perception, representing the general concept in abstract invariant terms, while providing parameters that can specify any particular sphere or plane conforming to the invariant formula. Friedrich A. Lohmueller’s excellent POV-Ray tutorial demonstrates the great variety of different shapes that can be defined by simple geometrical functions, illustrated here, from spheres, cylinders, and cones to boxes and prisms.
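The split between an invariant formula and its free parameters can be sketched with implicit functions. In this Python illustration (not POV-Ray's scene language or internals; `sphere` and `plane` are hypothetical helpers that merely echo its keywords), each primitive is one fixed formula, and a particular sphere or plane is that formula with its parameters filled in. A point lies inside where the function is negative and on the surface where it is zero.

```python
import math

def sphere(center, radius):
    """Invariant formula |p - c| - r; the parameters pick out one particular sphere."""
    def f(p):
        return math.dist(p, center) - radius
    return f

def plane(normal, offset):
    """Invariant formula n . p - d; negative on the side opposite the (unit) normal."""
    def f(p):
        return sum(n * x for n, x in zip(normal, p)) - offset
    return f

# Two instances of the same sphere "concept", plus a ground plane at y = 0.
small_ball = sphere(center=(0.0, 1.0, 0.0), radius=1.0)
big_ball = sphere(center=(4.0, 3.0, 2.0), radius=3.0)
ground = plane(normal=(0.0, 1.0, 0.0), offset=0.0)

for p in [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (4.0, 0.0, 2.0)]:
    print(p, "small:", small_ball(p), "big:", big_ball(p), "ground:", ground(p))
```

The last sample point lies on both the big sphere and the ground plane, where the big sphere rests on the ground.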

Shapes and surfaces can also be defined by any mathematical function that specifies a location for every point on the surface, for example paraboloids and hyperboloids, as shown here, or polynomial, exponential, sinusoidal, or parametric functions: any mathematical function that defines a surface or volume.

Compound shapes can be built up by Constructive Solid Geometry (CSG) operations, logical set-theoretic operations between volumetric solids, such as unions, intersections, differences, etc. For example, the union of the red and yellow spheres defines a shifted double-sphere. The yellow sphere subtracted from the red sphere cuts a spherical bite, or void, from the red sphere, while the intersection of the two spheres is the volume of their geometrical intersection, a lens-shaped volume that perfectly models a spherical lens.

Constructive Solid Geometry (CSG) operations such as union, difference, and intersection
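If solids are represented as implicit functions that are negative inside and positive outside, the set-theoretic CSG operations reduce to a pointwise minimum or maximum, a common convention in signed-distance modeling (not necessarily how POV-Ray evaluates CSG internally). The sketch below, with hypothetical helper names, builds the double sphere, the bitten sphere, and the lens from two overlapping unit spheres and checks which sample points each contains. The inside/outside sign is exact under min and max, although the returned value is then only an approximate distance.

```python
import math

def sphere(center, radius):
    """Signed distance: negative inside, zero on the surface, positive outside."""
    return lambda p: math.dist(p, center) - radius

def union(a, b):
    return lambda p: min(a(p), b(p))          # inside either solid

def intersection(a, b):
    return lambda p: max(a(p), b(p))          # inside both solids

def difference(a, b):
    return lambda p: max(a(p), -b(p))         # inside a but not inside b

red = sphere((0.0, 0.0, 0.0), 1.0)
yellow = sphere((1.0, 0.0, 0.0), 1.0)

shapes = {
    "double sphere": union(red, yellow),
    "bitten sphere": difference(red, yellow),
    "lens": intersection(red, yellow),
}
probes = {"red only": (-0.5, 0.0, 0.0), "overlap": (0.5, 0.0, 0.0), "yellow only": (1.5, 0.0, 0.0)}

for name, solid in shapes.items():
    inside = [label for label, p in probes.items() if solid(p) < 0]
    print(f"{name:14s} contains: {inside}")
```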

Shapes can also be defined by extrusion: moving a two- or three-dimensional shape along a three-dimensional path through space, thereby sweeping out a volume, as shown in these examples. For example, the torus (shown above) is defined by sweeping a flat circle around a circular path normal to its surface. The shape, size, orientation, and even color of the moving shape can change as a function of position along the path, resulting in shapes that swell and shrink in characteristic ways.
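The swept-circle definition of the torus translates directly into an implicit formula. Assuming the sweep path is a circle of radius R in the xz-plane (an arbitrary choice of axis), every point off the axis lies in exactly one meridian plane, so it is inside the swept volume when its distance from the path, measured in that plane, is less than the tube radius r. Letting r vary with the angle θ around the path gives the swelling and shrinking shapes described above. The function names are hypothetical.

```python
import math

def torus(major_radius, minor_radius):
    """Circle of radius minor_radius swept around a circle of radius major_radius in the xz-plane."""
    def f(p):
        x, y, z = p
        # Distance from p to the sweep path, measured within p's own meridian plane.
        radial = math.hypot(x, z) - major_radius
        return math.hypot(radial, y) - minor_radius
    return f

def varying_torus(major_radius, radius_at):
    """Same sweep, but the swept circle's radius changes with angle theta along the path."""
    def f(p):
        x, y, z = p
        theta = math.atan2(z, x)
        radial = math.hypot(x, z) - major_radius
        return math.hypot(radial, y) - radius_at(theta)
    return f

ring = torus(2.0, 0.5)
# Fat where theta = 0, pinched to nothing where theta = pi.
swelling = varying_torus(2.0, lambda theta: 0.25 + 0.25 * math.cos(theta))

for p in [(2.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 0.0, 0.0), (-2.0, 0.1, 0.0)]:
    print(p, "ring:", round(ring(p), 3), "swelling:", round(swelling(p), 3))
```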

Extrusion shapes defined by a sphere that changes size and shape as it traverses its track.

Compound shapes, or arrays of shapes, can be constructed by repeating copies of a basic shape, each one translated, rotated, scaled, or otherwise reshaped to create ever more complex compound shapes.

Compound and patterned shapes, defined by algorithmic loops.

You can define smoother blob objects using a distance metric from some reference frame, which you can imagine as a kind of haze whose density varies with distance from the frame. A density threshold in that haze in turn defines a surface that encloses a volume. For example, the pink triangular blob shown in the upper left is defined on a triangular frame between three vertex points, with a larger distance metric around the points and a smaller metric around each triangular side. The two metrics are blended with a nonlinear equation, resulting in a smooth blobby shape that necks continuously from spherical to cylindrical form.
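The haze-and-threshold idea is essentially the metaball technique. As a sketch, assuming a smooth polynomial falloff chosen for illustration (not claimed to match POV-Ray's blob formula exactly), each component contributes a density that fades to zero at its radius, the contributions are summed, and the surface lies wherever the total crosses a threshold. Two nearby components then merge: the point midway between them lies outside either component's surface on its own, but inside the combined blob, which is the smooth necking described above. All names and values are hypothetical.

```python
import math

THRESHOLD = 0.5

def blob_field(components):
    """Summed 'haze' density from (center, radius, strength) components.

    Each component contributes strength * (1 - (d/radius)**2)**2 inside its radius
    and nothing outside it: one common smooth, compactly supported falloff.
    """
    def density(p):
        total = 0.0
        for center, radius, strength in components:
            d = math.dist(p, center)
            if d < radius:
                total += strength * (1 - (d / radius) ** 2) ** 2
        return total
    return density

def inside(density, p):
    """The blob surface is where density crosses THRESHOLD; inside is above it."""
    return density(p) > THRESHOLD

left = ((-0.6, 0.0, 0.0), 1.0, 1.0)
right = ((0.6, 0.0, 0.0), 1.0, 1.0)
lone = blob_field([left])
pair = blob_field([left, right])

midpoint = (0.0, 0.0, 0.0)
print("lone component covers midpoint:", inside(lone, midpoint))
print("two merged components cover it:", inside(pair, midpoint))
```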

The surfaces of objects can be painted with patterns of ‘pigment’ (colors and textures), which are defined in the very same language as three-dimensional shape, i.e. Euclidean geometry.


If you modulate regular patterns with some kind of randomized turbulence, you can generate a great variety of different patterns that bear striking resemblance to a great variety of natural phenomena.
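A classic instance is the "marble" pattern: regular stripes, sin(x), whose phase is disturbed by turbulence, a sum of noise at doubling frequencies and halving amplitudes. The sketch below uses a simple hash-based value noise as a stand-in for the Perlin-style noise commonly used in procedural texturing; the hash constants, octave count, stripe frequency, and function names are assumptions for illustration. It prints two small ASCII swatches, one with no turbulence and one with turbulence strength 4.

```python
import math

def lattice(ix, iy):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point (ad hoc hash)."""
    h = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return (h ^ (h >> 16)) / 2 ** 32

def value_noise(x, y):
    """Lattice values blended with smoothstep weights: smooth noise in [0, 1)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    sx, sy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)
    top = lattice(ix, iy) + sx * (lattice(ix + 1, iy) - lattice(ix, iy))
    bottom = lattice(ix, iy + 1) + sx * (lattice(ix + 1, iy + 1) - lattice(ix, iy + 1))
    return top + sy * (bottom - top)

def turbulence(x, y, octaves=4):
    """Sum of |signed noise| over octaves of doubling frequency and halving amplitude."""
    total, amplitude, frequency = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * abs(2 * value_noise(x * frequency, y * frequency) - 1)
        amplitude *= 0.5
        frequency *= 2.0
    return total

def marble(x, y, strength):
    """Regular stripes when strength is 0; turbulence bends them into veins. Range [0, 1]."""
    return 0.5 + 0.5 * math.sin(3.0 * x + strength * turbulence(x, y))

PALETTE = " .:-=+*#%@"
for strength in (0.0, 4.0):
    print(f"strength = {strength}")
    for j in range(6):
        print("".join(PALETTE[min(9, int(marble(0.15 * i, 0.3 * j, strength) * 10))]
                      for i in range(48)))
```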

Colors, textures, and patterns defined by geometrical shapes.

Randomized functions and fractals generate more realistic random patterns.

The shapes themselves can also be modulated or replicated in regular arrays, to define compound objects composed of a multitude of sub-units. Random or fractal functions can make these structures quasi-periodic, more similar to certain natural phenomena.


POV-Ray is a truly magnificent tool for defining three-dimensional scenes from your imagination. Its dimensions are those of the artist: color, space, form, and light, the dimensions of conscious experience. But its genius is in the fact that, like virtually all other ray-tracing apps, it uses the natural language of visual perception to express its geometrical structure, because that is the code we all understand. It is the way we think about shapes, and the way we break down compound shapes into their geometrical primitives. We use this visual code not only in free-wheeling visual imagination, but also in perception. Visual perception is the act of hallucinating a scene that is consistent with all the sensory evidence. Perception is as much an act of creation as it is an act of detection. If an artist were given any of the two-dimensional ray-traced scenes above and commissioned to construct a three-dimensional painted sculpture of the scene, the computational function that he would be commissioned to perform is exactly the computational function of perception: The input is a two-dimensional image not unlike a retinal image, and the output is the rich three-dimensional experience that you have, immediately and unconsciously, as soon as you glance at any of the ray-traced images above.

There is a general principle encapsulated in this mathematical approach to defining geometrical shapes, and it is a foundational principle that underlies all of mathematical thought. It applies to every mathematical function, whether a simple linear or planar function, a circular, cylindrical, or spherical arc or shell, or something more complex like a polynomial, exponential, logarithmic, or sinusoidal function in one, two, or three dimensions: whatever the formula that defines the shape, that shape is defined to infinite precision. The function defines a pattern that is independent of scale. It is a pure and perfect descriptor of shape to essentially infinite resolution. The appearance of this concept in ray-tracing software reflects the deep intuitive basis of this way of representing shape in the human mind. Ray-tracing software is designed by people to be used by people, and that is why it uses basic geometrical concepts, starting with points, lines, and planes, to define the shapes that create the imaginary scenes. This is the way that we think about shape, whether in perception or in mental imagery and imagination. The reason why ray-tracing algorithms employ the primitives of Euclidean geometry is the same reason why Euclid chose those same primitives in the first place: because that is the way we conceptualize shape. Euclidean geometry was not so much an invention as a discovery of the basic elements of geometrical thought, and successive generations of geometry students accept Euclidean geometry not as dogma, but because they find it consistent with their natural intuitions about shape, a product of the long evolution of our perceptual and conceptual systems.

Continued in Chapter 6: Let There Be Light!


The Language Of The Mind

Continuing from Chapter 3: Amodal Perception


Images are the primary language of the mind. We think and imagine in terms of scenes and views of objects and events that we conjure into existence and examine from different angles, testing in our mind whether this couch would fit in that corner, or whether this car would fit in that parking space. We simply will the image into existence, and we maneuver it invisibly into position in the real world, to see how much clearance there is and how tight the turns will have to be to get it there. We do this so automatically and instinctively that we are hardly aware of it, or at least aware of it as anything real, due to its amodal invisibility. But we can “see” the mental image of the couch, or car, clearly enough to estimate the clearances, sometimes assisting our mental image with sweeping motions of our hands to indicate the spatial location of the mental image as it is maneuvered into position relative to the real world. Our mind is generating 3-D images continuously in real time, both modal and amodal, in both perception and cognition. The primary function of the mind, what makes it spatially conscious, is its ability to project three-dimensional images into a spatial world of conscious experience, and that volumetric moving colored image, and the invisible amodal framework that supports it, is our only window onto the objective external world beyond the mind.



But the mental image is more than just an image. It is an image constructed to a particular formula that gives meaning to its shape. A mental image of a cube is tied to the abstract concept of a cube, a concept that can attach to any cube as long as it is cubical. The cube can vary in spatial location, it can rotate in orientation, it can zoom up and down in scale, and it can even stretch or morph elastically (within limits) while still maintaining its perceived identity as a cube. The concept of a cube is distinct from the image of a cube, because the concept is invariant to the rotations, translations, and scalings of the cube. And yet the concept is intimately connected to the image, in the sense that the concept inevitably “lights up”, or becomes activated, in our mind whenever a cube, or cubes, are present in our experience, wherever it, or they, are located. The concept provides the ‘handle’ that connects imagery to language, allowing us to report verbally whether we are experiencing the image of a cube. This is the bottom-up process of visual recognition as expressed in neural network models, where the concept is represented by a “node” whose activation represents the recognition of its corresponding image in the sensory stimulus. Bottom-up recognition, at least for simple shapes, exhibits an invariance to rotation, translation, and scale.
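The invariance of the concept can be illustrated with a toy shape signature. Assuming a shape is given as an unordered set of 3-D vertices (a strong simplification, and a hypothetical scheme rather than a model of neural recognition), the sorted pairwise distances divided by the largest one do not change under rotation, translation, or uniform scaling, so a single stored signature can recognize a cube at any location, orientation, and size while rejecting a stretched box.

```python
import itertools
import math

def signature(vertices):
    """Sorted pairwise distances divided by the largest: unchanged by rotation, translation, scale."""
    d = sorted(math.dist(a, b) for a, b in itertools.combinations(vertices, 2))
    return [x / d[-1] for x in d]

# The stored "cube node": the signature of any one cube.
unit_cube = list(itertools.product((0, 1), repeat=3))
CUBE_SIGNATURE = signature(unit_cube)

def looks_like_cube(vertices, tol=1e-9):
    """Bottom-up recognition: does this vertex set match the stored cube signature?"""
    sig = signature(vertices)
    return len(sig) == len(CUBE_SIGNATURE) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(sig, CUBE_SIGNATURE))

def place(vertices, angle, scale, shift):
    """Rotate about the z-axis, scale uniformly, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1],
             scale * z + shift[2]) for x, y, z in vertices]

moved = place(unit_cube, angle=0.7, scale=3.5, shift=(10.0, -2.0, 4.0))
box = [(2 * x, y, z) for x, y, z in unit_cube]     # stretched: a box, not a cube
print(looks_like_cube(moved), looks_like_cube(box))
```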




The exercise of mental imagery, or imagination, reveals that the concept is capable of more than just bottom-up recognition. It is capable of actually generating cubes in mental imagery, and it can do so through that same invariance relation. I can imagine my cube at any location, orientation, and scale, or I can imagine it rotating, translating, and scaling continuously while I am imagining it, especially if I help stabilize the image by following its corners with my fingertips as I move it through space. Mental imagery is a most extraordinary faculty that forms the foundational basis of mathematical thought. Although we have no idea how we perform this extraordinary mental calculation, we can describe the computational function of mental imagery at least in functional terms.


The most basic feature of mental imagery is the mental image space, the space of our imagination, which is extended in three dimensions, within which the images of our imagination appear. Attached to this mental image space is a conceptual representation which can be described as an abstract or symbolic code, an array of concept nodes, that represent shapes such as spheres and cubes and pyramids, invariant to their location, orientation, or scale. The computational function of mental imagery can be described as the top-down projection of an abstract spatial concept into a three-dimensional mental image through an invariance relation, assigning to the imagined object a specific location, orientation, and scale, by a deliberate act of will.
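Run in the other direction, the same invariance relation becomes generative. In this minimal sketch the concept is stored as one canonical cube centered at the origin, and the deliberate act of imagining a particular cube is reduced to choosing four parameters (location, yaw, pitch, and scale; the names and the yaw/pitch parameterization are assumptions). The hypothetical function `imagine` simply applies a similarity transform to the canonical vertices, and imagining the cube rotating amounts to re-instantiating it with a changing parameter.

```python
import itertools
import math

# The stored concept: a canonical cube, centered at the origin, with unit edges.
CANONICAL_CUBE = [(x - 0.5, y - 0.5, z - 0.5) for x, y, z in itertools.product((0, 1), repeat=3)]

def rotation(yaw, pitch):
    """3x3 rotation matrix: pitch about the x-axis first, then yaw about the vertical y-axis."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(ry[i][k] * rx[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def imagine(location, yaw, pitch, scale):
    """Project the invariant concept into one particular cube: scale, rotate, then place it."""
    r = rotation(yaw, pitch)
    return [tuple(location[i] + scale * sum(r[i][k] * v[k] for k in range(3)) for i in range(3))
            for v in CANONICAL_CUBE]

cube = imagine(location=(0.0, 1.5, 2.0), yaw=math.radians(30), pitch=math.radians(20), scale=0.5)
for corner in cube:
    print(tuple(round(c, 3) for c in corner))

# Imagining it rotating is just re-instantiating with a changing parameter.
frames = [imagine((0.0, 1.5, 2.0), math.radians(a), 0.0, 0.5) for a in range(0, 90, 15)]
```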

Continued in Chapter 5: Ray-Tracing Algorithms


Amodal Perception

Continuing from Chapter 2: The Schema As A Mental Image


Amodal perception was first discovered as a property of perception, where it occurs so unconsciously and effortlessly that we tend not to notice it at all. For example if we view two pencils crossed, one pencil occluding the other, the occluded pencil has a great gap of missing data, and yet we perceive that pencil as complete and continuous through the occlusion. Furthermore, we only see the exposed side of each pencil as a colored hemi-cylindrical surface, and yet we perceive each pencil as a whole cylindrical form, complete through its invisible volume all the way to its hidden rear surfaces. The modal percept appears as a colored surface, whereas the amodal percept appears as a solid volume occupying a specific region of space, where it is perceived, but remains completely invisible.


When I see a box on the floor before me, I generally see only three exposed surfaces, whose straight edges are usually tilted by perspective so as not to appear at right angles from my viewpoint, and yet I perceive the box as a right-angled rectangular volume, complete with an interior, which can be perceived as hollow or solid, and with hidden rear surfaces. I can easily reach back behind the box and demonstrate by morphomimesis the exact location and orientation of those hidden surfaces as if I were seeing them transparently through the box. I do not decide to form this mental image, or to make it rectangular. The amodal image forms immediately and automatically based only on the appearance of the visible surfaces, assisted by my past experience with boxes. It is my mind’s way of constructing the simplest three-dimensional explanation for the two-dimensional stimulus on my retina. The invisible image has exactly the shape that I perceive the object to have.


One reason why this amodal percept is so easily overlooked is that it is not really a visual experience as such, even though it is often informed by a visual stimulus. When I encounter a box in pitch darkness, or with eyes closed, and feel it with my palms, I get the same amodal experience of a rectangular volume in a specific location in my space, and I perceive the whole box, through to its rear surfaces, even though I palm only selected faces or corners of the box at a time. And when I turn on the lights, or open my eyes, the tactile texture of the box felt by my palms is experienced on the very same rectangular volume as the color and brightness that I perceive visually. The hardness or flatness of the ground that we feel underfoot, even with eyes closed, is experienced as a property of the very same amodally perceived ground that carries the color, brightness, and texture that we perceive visually with eyes open. In other words, the spatial structure that is our amodal experience of the world is the common ground, or lingua franca, that unites all sensory experience in a modality-independent structural representation of the world, and that amodal structure represents our perceptual and cognitive understanding of the world.

Furthermore, our picture of the world is incomplete without an amodal experience of our own body at the center of our space, which we also experience, even with eyes closed, as a three-dimensional spatial structure with four limbs and a head attached to a trunk. And we can feel our own pose or posture even without looking, using the sense of proprioception, which is experienced as specific patterns of articulation of the explicit spatial structure of our body. Close your eyes and observe the experience of proprioception. The information content of that experience is equal to the information content of a wooden marionette whose limbs and torso are bent into whatever is the current posture of your body. That is the shape of your experience of posture.


The amodal experience of our own body by proprioception, and the amodal experience of the hidden rear faces of perceived objects, are expressed in the same pure geometrical form, like an outline drawing hanging in three-dimensional space, printed in invisible ink that we can magically see or feel. And it is that invisible structure that supports the modal colored surfaces that we experience visually on the exposed faces of colored objects, and that same amodal structure supports the sensory experience of tactile contact.

When I hold a baseball in my hand, the round shape that I feel in the cup of my hand is perceived as part of the same amodal sphere that supports the white leathery surface that I perceive visually. The amodal percept is the lingua franca, or common ground, that unites all sensory experience in a modality-independent structural representation of the world, one that constitutes our cognitive understanding of our self and our world.

The existence of amodal perception compels us to take stock anew of the nature of visual experience. From the earliest days of childhood we have been working under the naive realist assumption that the world we see in our experience is the world itself, viewed directly out there where it lies around us. The fact of amodal perception requires us to revise that view, to acknowledge that the amodal percept is not the object itself, as suggested by naive perception, but rather, the amodal percept is a data structure constructed by our mind based on sensory evidence. And that data structure is expressed in explicit spatial form: It is a solid three-dimensional structure that occupies specific volumes in the space of our experience, and the very existence and shape of that invisible structured experience is identically equal to our understanding of the structure of the world.


Amodal perception is the connecting link between perception and cognition. It has the automatic and instantaneous nature of perceptual recognition, as in the perception of the volume of a box, and yet it is also susceptible to cognitive manipulation – it is possible to modify the shape of your amodal percept by an act of will, for example by imagining (or believing) a sealed box to be full or empty, or by imagining its hidden rear faces to be missing, or punctured, or dented, if we choose to imagine them so. Amodal perception is the earliest, most basic form of mental imagery, one that we certainly share with almost all visual creatures. But amodal perception is also the door that opened perception to cognition and free-wheeling mental imagery, and true human intelligence, connecting the world of direct experience to the world of imagination.

Continued: The Language of the Mind


The Schema As A Mental Image

Continuing from: Chapter 1: The Perceptual Origins of Mathematics

The schema, it seems, is a kind of mental image. But what is mental imagery? There has been considerable debate in the literature over whether mental images even exist as actual images, with many people insisting that they do not see mental images as pictures in their minds. Surely this is a question best left to introspection. When you instruct yourself, for example, to imagine a table in your mind’s eye, what, if anything, do you see? Well, in the first place I see absolutely nothing in the usual sense: the space before my eyes remains an empty void, with nothing in it. And yet at the same time I do see a totally invisible mental image of a table in that same empty space, or at least in a space that is somehow superimposed on my normal visual space. I can both see the mental image, in an invisible ghostly way, and at the same time see nothing at all. Furthermore, what little I see of the imagined table does not always have a specific location, nor a specific size or scale, nor a specific viewing angle, nor a specific color or furniture style. The “image” of the table (if it can be called such) often appears either fleeting and unstable in location, scale, and orientation, or totally abstract and non-spatial, as if expressed only in some symbolic code, like a node labeled “table” in a neural network model. It is this fleeting evanescence and instability of the mental image that allows so many to deny its very existence as a spatially extended image in our imagination.

But although the mental image of a table can remain totally unspecified with respect to location, orientation, and scale, it is clearly possible to imagine a specific table at a specific location, orientation, and scale, and we can even select a color and a furniture style for our imagined table. The mental image remains perfectly invisible; we would still swear there is nothing there in the empty space before us. And yet at the same time we can see the imagined table right there in that empty space, with greater or lesser vividness and detail, even though its appearance in that space seems coincidental and inconsequential, like reflections in a pane of glass superimposed on the world seen through the glass. An artist or sculptor routinely sketches their mental image as if copying from a real image, demonstrating that there is some kind of information present in the mental image, and that this information can be clearly spatially extended, like an actual three-dimensional image of a scene. It is even possible to locate the imagined table at a specific location in the space before us, outlining the spatial limits of its top and sides and legs with our hands, as if polishing the invisible surfaces of the imagined table. I call this exercise morphomimesis: miming the morphology of an imagined object with a wave of your palms, and thus revealing its explicit three-dimensional spatial structure. Although we can mime only two parts of the image at a time with our two palms, the image itself can remain fixed and stable in space during the morphomimesis, demonstrating that it is possible to have a fully specified mental image that has the property of spatial extension across a specific region of space, even though it remains completely invisible in that space.

The fact that it is possible to form a mental image with a specific location and specific dimensions, and to mime its morphology with your palms, is proof that mental images can exist as stable three-dimensional structures that carry a specific information content. A mental image can be formulated to have a specific location and spatial extent, even though it is usually not specified so precisely and often remains in an indeterminate state. The fleeting evanescence and instability of many mental images should not be viewed as counter-evidence for their existence as images, but merely as evidence of a fleeting and unstable imaging system, one that is capable of representing multiple possibilities all superimposed, much like a quantum particle that can exist in multiple states simultaneously. Like a quantum particle, the mechanism or principle underlying the mental image can apparently flip or morph continuously into different forms, unless it is held to a stable state by an act of will.

So let us examine the mental image medium to see what mental images are composed of and how they present themselves to consciousness. Picture, if you will, a square of the geometrical variety, that is, composed of Euclidean lines that span four Euclidean points marking the four corners, to define a square segment of a Euclidean plane, a surface that is perfectly flat and thin. It is possible to imagine such a square of any size; I can rotate it in my mind to any orientation in three dimensions, and I can trace out with my fingertips exactly where I am imagining the square at any moment. In other words, what I see in my mind’s eye is an image, very much like the images I see with normal vision, except that the mental image is totally and completely invisible.

Mental imagery compels us to acknowledge two different types of seeing: One is the regular type of seeing, as when viewing a colored cardboard square whose edges are defined by a visible transition in color and/or brightness across the edge, and the other is a kind of invisible seeing in which imagined objects are completely invisible, and yet we can “see” them as spatially extended structures that can occupy specific volumes of visual space. Michotte (Michotte, 1963; Kanizsa, 1979; Michotte, Thinès & Crabbé, 1991) has called this kind of vision “amodal” perception, because it is perception in the absence of a particular visual modality, such as color or brightness. We see a square in our imagination, but it is in a kind of invisible outline form, like a figure in a geometry text, without color or substance, just a shape.

But it is also possible to imagine a color with your mental image. I can just as easily imagine a red square, or a green one, and I can see my imagined square change color on my command, with a specific square region being painted out with the specific color that I choose to imagine, all the while remaining totally invisible, in the sense that I’d be willing to swear in a court of law that I do not see a colored square before me, even though I can locate with precision the edges and corners of the colored square in my imagination that I don’t see. This ability to conjure into existence any simple geometrical structure I might choose, to imagine it at any location, orientation, and scale I might choose, and even to paint it with imagined color, while remaining acutely aware of the distinction between reality and my imagination, is the foundational origin of mathematical thought, and at the same time it reveals the most basic operational principle behind human intelligence. We think primarily in pictures; the words only follow after the mental image is formed. And the words lose their meanings if they become disconnected from the images that they represent in the symbolic code of language.

Continued Chapter 3: Amodal Perception


The Perceptual Origins of Mathematics

Mathematics is a strange and wonderful mystery that grows more strange and wonderful the more we learn about it. Our first introduction to mathematics in school is through arithmetic, a pragmatic system of accounting and quantification, first developed for applications such as counting sheep, recording debts or payments, counting elapsed time, quantifying distances and areas for travel and surveying, and predicting the motions of the heavenly bodies, and thereby the seasons. These are of course but a tiny sample of the innumerable applications where arithmetic comes in handy. The remarkable thing about math is that all of these varied and diverse problem domains can be addressed using a single general-purpose conceptual tool. All these problem domains are quantified and calculated using the same system of numbers, although the details of that system can vary from one culture to the next.

If we are to trace the origins of mathematics, we must focus on the part that is common to all number systems across all cultures and historical periods, because the common root of mathematics is likely to give a good indication of that component of mathematics which is innate, a genetic heritage, as opposed to a cultural heritage passed down from generation to generation. Lakoff & Núñez (2000) trace the origin of mathematics to an instinctive form of counting called subitizing, in which you can instantaneously ‘see’ the number of things if the number is small, fewer than about five to seven items. This is how you recognize the number of dots on dice.

You don’t have to count the individual dots; you can ‘see’ the numerical quantity immediately, in one glance, even if you have never been taught the names of the numbers. Surely this is the origin of mathematics: the ability to see a difference in quantity, even if those numbers have no name, and even if that quantification only works for small numbers. Human cultural mathematics takes this innate numerical instinct and extends it to express much larger numerical quantities than can be counted by subitizing. But the cultural extension to innate math adheres to the same basic rules that govern the first few numbers, and many of the extensions to the natural numbers, such as the concept of zero, negative numbers, fractions and rational numbers, and irrational and imaginary numbers, follow so naturally from the same rules that they almost seem obvious after the fact, as if they had been there all along, waiting to be discovered as soon as the problems that they resolve were first encountered.


But the concept of numbers runs much deeper than a simple one, two, three. Before you can even think of counting, you have to understand the concept of countable things. Whether you are counting people, coins, or pebbles on a beach, you must decide on the set of things you want to count, and what requirements an item must meet to qualify for inclusion in that set. How big can a pebble be before it counts as a rock instead of a pebble, and how small before it counts as a grain of gravel or sand? The answer depends on why you are counting them, and that requires a fine mathematical judgment. Counting requires that all the members of the set be considered equivalent and interchangeable: they each count for exactly one. This is so natural and obvious an assumption (to us humans) that we don’t even realize that we are making it. But it is an artifact of the human mind, not of the natural world, that things come in identical countable units. In reality pebbles are not at all identical; each one has its own unique size and shape and color, and the distinctions between rock and pebble and gravel are overlapping and indefinite. The fact that we can make ourselves see a set of pebbles as identical units is itself a conceptual feat that is a prerequisite to learning to count them.


Lakoff explains the foundations of mathematical thinking with the concept of schemas. For example, before counting pebbles on the beach, you must first see the beach, and see the pebbles on it, and you must select which pebbles you wish to count based on some criteria that determine which pebbles belong in your countable set. If you are counting merely to demonstrate the concept, you might choose a dozen or so pebbles of a certain size or color and set them apart from the rest, drawing an encompassing circle with your mind that distinguishes these pebbles as the set to be counted. For the process of counting, you must imagine another group, or imaginary circle, into which you either move the pebbles one by one, or perhaps you imagine that circle to engulf the pebbles one by one as you are counting, moving them from the set of pebbles to be counted to the set of pebbles already counted, while keeping a tally of the total by the familiar sequence 1, 2, 3, … . All of the thought before the actual counting begins is wordless thought that we did not have to learn in school; we knew instinctively how to judge whether this pile of pebbles is greater in number than that one. We didn’t need to know how to count for that. We don’t need to know numbers to form a schema in our mind. However, it is impossible to do any real mathematics without first forming a schema in our mind. The schema is the framework that gives meaning to the math. It is what poses the question in the first place, and what interprets the meaning of the results at the end. That is the real mathematical thinking beyond mere arithmetic: the framing of the problem to be solved, the selection of the algorithm to be used to solve it, and the understanding of the significance of the results after the counting is done. The counting itself is the simplest part of the problem – so simple that even a stupid computer can do it.
But conceptualizing the situation and seeing the schema in it is in fact the real mathematical part of the task. The rest is merely arithmetic.
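The two-circle counting procedure described above can be written out as a short program, and doing so makes the point about schema versus arithmetic concrete: the arithmetic is a one-line increment, while everything that decides the answer lives in the selection criterion. This is a minimal sketch. The `Pebble` type, its attributes, the helper names, and the size thresholds in `is_pebble` are all hypothetical, standing in for the purpose-dependent judgment the text describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pebble:
    size_mm: float  # hypothetical attribute
    color: str      # hypothetical attribute


def select_countable(beach: List[Pebble], qualifies: Callable[[Pebble], bool]) -> List[Pebble]:
    """The schema: draw an encompassing circle around the items that meet the criterion."""
    return [p for p in beach if qualifies(p)]


def count_by_transfer(to_count: List[Pebble]) -> Tuple[int, List[Pebble]]:
    """Move items one at a time from the to-be-counted group to the counted group,
    keeping a running tally. Each item counts for exactly one, whatever its size or color."""
    remaining = list(to_count)  # copy so the caller's list is left untouched
    counted: List[Pebble] = []
    tally = 0
    while remaining:
        counted.append(remaining.pop())
        tally += 1  # the entire "arithmetic" of counting
    return tally, counted


beach = [Pebble(12, "grey"), Pebble(40, "white"), Pebble(8, "grey"),
         Pebble(300, "grey"), Pebble(15, "black")]


def is_pebble(p: Pebble) -> bool:
    # Hypothetical criterion: 5-60 mm is a pebble; smaller is gravel, larger is a rock.
    return 5 <= p.size_mm <= 60


n, counted = count_by_transfer(select_countable(beach, is_pebble))
print(n)  # 4 (the 300 mm stone is excluded as a rock)
```

Swapping only the criterion, say to count grey stones of any size, changes the answer to 3 while `count_by_transfer` stays exactly the same.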

Another schema that Lakoff cites as a guiding metaphor for numbers is the concept of pacing off a distance with steps of equal size. This relates to the idea of the number line, with numerical value depicted as spatial extension in one dimension. This opens the possibility of negative numbers, when pacing backward beyond the origin, and it also presents the unit interval as a continuum, infinitely subdivisible into fractional sub-intervals. Again, the actual counting of the paces, the choice of names to call the numbers, or whether to record them in binary or hexadecimal, or using Arabic or Roman numerals, is trivial compared to the real act of mathematical thought, which is seeing the land that requires surveying, knowing which dimensions to measure, making the measurements, and then understanding the significance of the paced-off quantities. The greater part of mathematical thought is not the simple arithmetic that we first learn in school as math. Real mathematical thought is an embodied process, one that cannot be meaningfully separated from our direct experience of our body located in the world.
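The pacing schema can be sketched with exact rational arithmetic. This is an illustration under stated assumptions, not Lakoff's own formalism. `pace` and `subdivide` are hypothetical helper names, positions are Python `Fraction` values, and each step has unit length unless stated otherwise. Pacing backward past the origin produces a negative position, and halving or quartering the unit interval yields fractional points. The last line shows that writing the same quantity in decimal, binary, or hexadecimal changes only the notation.

```python
from fractions import Fraction
from typing import List


def pace(start: Fraction, steps: int, step_length: Fraction = Fraction(1)) -> Fraction:
    """Position after pacing `steps` equal steps from `start`; negative steps pace backward."""
    return start + steps * step_length


def subdivide(a: Fraction, b: Fraction, parts: int) -> List[Fraction]:
    """Boundary points that split the interval [a, b] into `parts` equal sub-intervals."""
    width = (b - a) / parts
    return [a + k * width for k in range(parts + 1)]


origin = Fraction(0)
forward = pace(origin, 3)      # three steps forward: 3
backward = pace(forward, -5)   # five steps back, past the origin: -2
halves = subdivide(origin, Fraction(1), 2)    # 0, 1/2, 1
quarters = subdivide(origin, Fraction(1), 4)  # 0, 1/4, 1/2, 3/4, 1

# Every halving point is also a quartering point: the interval can be refined indefinitely.
assert set(halves) <= set(quarters)

# The same quantity in three notations; only the names change, not the number.
print(str(12), format(12, "b"), format(12, "x"))  # 12 1100 c
```

Using `Fraction` rather than floating point keeps the sub-interval boundaries exact, so the refinement check compares rational numbers directly.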

Lakoff is right in identifying the origin of math in practical hands-on problem solving and basic conceptualization of reality. But the problem with identifying schemas as the origin of mathematics is that the notion of a schema is very vague and abstract. We are all familiar with the experience of forming and using a schema, but we have no idea how we actually do it, or what a schema even is. Lakoff leaves the story of the origins of mathematics hanging at this point. This is the frontier of the terra incognita at the root of mathematics. From here on down we have no idea how we do the most primal mathematics, constructing for ourselves a schema of the world. What does that even mean?

The profound difference between schematic conceptualization and arithmetical calculation has come into sharper contrast in this era of the digital computer, a mechanism that is capable of billions of arithmetical operations per second, and of computational algorithms of fantastic complexity, with virtually perfect reliability and reproducibility, far beyond anything we can accomplish with pencil and paper. The computer is a very useful tool for the mathematician, to plot mathematical thoughts and make them visible on the screen. And yet the digital computer is totally incapable of even the most primitive mathematical thought. The computer is completely incapable of conceptualizing a schema, or of understanding the significance of the quantities that it calculates. It seems that there are two starkly contrasting aspects of math: one that is thoroughly understood and can be performed by stupid machines even better than by humans, and another that we all perform unconsciously and instinctively virtually every waking moment, without any idea how we do it or even what it is we are doing. The focus of this book is on that other aspect of mathematical thought, what it is, and how it works, and ultimately, how we could build an artificial mind that is capable of real mathematical thought, and with it, a spatial consciousness and sense of self-existence like our own.

Continued Chapter 2: The Schema as a Mental Image
