A machine and human’s perception of the world in Augmented Reality

Spatial vision and depth cues

Literally billions of signals are sent to the cerebral cortex for analysis to form an image. A number of spatial and depth cues enable the human brain to decipher these light signals and create our visible reality.

Extra-retinal cues

These types of cues result from physiological processes rather than from light patterns entering the eye.

Accommodation (shifting focus)
Accommodation is an extra-retinal cue that helps the eye shift focus between objects in the foreground and background. The ciliary muscle encircles the lens and changes its shape, and therefore its optical power, so that an observer can rapidly shift focus between different depths. When the eye is looking at distant objects, these muscles are relaxed. The ciliary muscles contract, or accommodate, when the eye needs to focus on nearby objects. This is why it is advisable to look into the distance when you need to relax your eyes.

Vergence
This is a fairly simple idea: both eyeballs rotate so that their lines of sight meet at the object being viewed, which aligns the two images for the brain to process. When looking at objects far away, the lines of sight are almost parallel. When the object is near, the eyeballs rotate inward and converge towards each other. This synchronized rotation is called vergence.
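The geometry behind vergence can be sketched with a little trigonometry. The snippet below is a minimal illustration rather than a model of real eye movements: it assumes both eyes fixate a point straight ahead, and the function name `vergence_angle_deg` and the default eye separation of 0.0635 m (about 2.5 inches) are choices made for this example.

```python
import math

def vergence_angle_deg(distance_m, eye_separation_m=0.0635):
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at distance_m (simplified, symmetric geometry)."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return math.degrees(2 * math.atan((eye_separation_m / 2) / distance_m))

# The angle shrinks towards zero (parallel lines of sight) as distance grows.
for d in (0.25, 1.0, 10.0):
    print(f"{d:5.2f} m -> {vergence_angle_deg(d):.2f} degrees")
```

The angle is largest for the nearest target, which corresponds to the eyes turning inward for close objects.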

Why are accommodation and vergence important concepts for AR?
Users of AR glasses often complain about headaches and eye strain. This happens because the eyes stay focused on a flat panel within inches of the eye. Even though the 3D objects appear far away, the illusion of depth is only simulated. In addition, there is a mismatch in the sensory cues the eyes provide to the brain: the eyes converge on the apparent depth of the virtual object but accommodate to the fixed distance of the display, a mismatch often called the vergence-accommodation conflict. Sustaining this for long periods is the main cause of the discomfort. Reducing this conflict while simulating real depth is an interesting problem to be solved.
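One common way to put a number on the mismatch between focus and convergence is to express both distances in diopters, the reciprocal of the distance in metres. The sketch below does only that arithmetic. The function names and the example distances are assumptions for illustration and do not describe any particular headset.

```python
def focus_demand_diopters(distance_m):
    """Optical demand in diopters (1 / metres) for a target at distance_m."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return 1.0 / distance_m

def depth_cue_mismatch(display_focus_m, virtual_object_m):
    """Difference, in diopters, between where the eye must focus (the display)
    and where the rendered object appears to be (the vergence target)."""
    return abs(focus_demand_diopters(display_focus_m)
               - focus_demand_diopters(virtual_object_m))

# Hypothetical case: image focused at 2 m, virtual object drawn at 0.5 m.
print(depth_cue_mismatch(2.0, 0.5))  # 1.5
```

A mismatch of zero means the object is drawn at the same distance the eye is focused on. The further the virtual object is placed from that distance, the larger the mismatch.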

Retinal or Depth cues

These cues are derived from the light patterns entering the eye. They are either binocular (they depend on having two eyes) or monocular (they can be observed even with one functioning eye).

Stereopsis
We have two eyes that are separated by an average distance of about 2.5 inches. Each eye captures a slightly different image. The perception of depth that arises when the brain processes the slight offset between the two images is called stereopsis. Stereopsis is especially important for immersive head-mounted VR displays. These displays show a separate image to each eye, and even a slight misalignment between the two images leads to a loss of stereopsis, making the VR experience feel unnatural.
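The offset between the two images can be approximated with the standard pinhole stereo relation, disparity = focal length × baseline / depth. The sketch below assumes an idealised pair of parallel virtual cameras. The function name, the focal length of 1000 pixels and the 0.0635 m baseline (about 2.5 inches) are illustrative values, not taken from any real device.

```python
def disparity_px(depth_m, focal_length_px, baseline_m=0.0635):
    """Horizontal shift, in pixels, of a point between the left and right
    images of an idealised pinhole stereo pair: disparity = f * B / Z."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_length_px * baseline_m / depth_m

# Hypothetical virtual camera with a focal length of 1000 pixels.
for z in (0.5, 2.0, 20.0):
    print(f"depth {z:5.1f} m -> disparity {disparity_px(z, 1000):.1f} px")
```

Nearby points shift a lot between the two images and distant points barely shift at all, which is why stereopsis is strongest at close range.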

Move closer to the screen so that each eye sees only a single cube. If you are on a phone, use landscape mode for the best experience.

Stereopsis is the only binocular cue discussed here. The rest of the cues are all monocular.

Motion Parallax
This is a strong depth cue in which closer objects appear to be moving much faster than those far away, even when both are moving at the same speed. The reason is that closer objects pass through your field of view more quickly than far-off objects. This information is important for simulating relative depth between moving objects in a 3D environment.
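A rough way to see why this happens is to compare angular speeds. For an object crossing the line of sight at a right angle, the angular speed is approximately its speed divided by its distance. The sketch below uses that approximation. The function name and the sample speeds and distances are made up for the example.

```python
import math

def angular_speed_deg_per_s(speed_m_per_s, distance_m):
    """Approximate angular speed of an object crossing the line of sight at a
    right angle: omega = v / d (radians per second), converted to degrees."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return math.degrees(speed_m_per_s / distance_m)

# Two objects moving at the same 5 m/s, one 2 m away and one 50 m away.
near = angular_speed_deg_per_s(5.0, 2.0)
far = angular_speed_deg_per_s(5.0, 50.0)
print(f"near: {near:.1f} deg/s, far: {far:.1f} deg/s")
```

Both objects have the same physical speed, yet the near one sweeps across the view 25 times faster because it is 25 times closer.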

Occlusion or interposition
These cues are observed when one object blocks the view of another. The brain registers the blocking object as closer than the object being blocked. Simulating occlusion is especially difficult in AR, where the computer has to know the positions of near and far objects in the view. Depth-sensing cameras are a viable way to solve this for nearby objects.
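With a depth value for every camera pixel, occlusion can be approximated by a per-pixel depth test: draw the virtual pixel only where it is nearer than the real surface. The sketch below shows the idea on a toy image using plain lists. The function name `composite` and the data layout are assumptions for illustration, not the API of any AR framework.

```python
def composite(camera_rgb, camera_depth, virtual_rgb, virtual_depth):
    """Per-pixel depth test: draw the virtual pixel only where the virtual
    object exists (depth is not None) and is nearer than the real surface."""
    out = []
    for cam_row, cam_d_row, vir_row, vir_d_row in zip(
            camera_rgb, camera_depth, virtual_rgb, virtual_depth):
        out_row = []
        for cam_px, cam_d, vir_px, vir_d in zip(
                cam_row, cam_d_row, vir_row, vir_d_row):
            visible = vir_d is not None and vir_d < cam_d
            out_row.append(vir_px if visible else cam_px)
        out.append(out_row)
    return out

# Toy 1x3 image: a real wall at 2 m with a real pillar at 0.8 m in the middle,
# and a virtual cube at 1.5 m covering the two left pixels.
camera_rgb = [["wall", "pillar", "wall"]]
camera_depth = [[2.0, 0.8, 2.0]]
virtual_rgb = [["cube", "cube", None]]
virtual_depth = [[1.5, 1.5, None]]
result = composite(camera_rgb, camera_depth, virtual_rgb, virtual_depth)
print(result)  # [['cube', 'pillar', 'wall']]
```

The cube hides the wall behind it but is itself hidden by the nearer pillar, which is the behaviour the brain expects.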

Deletion and Accretion
This cue is an extension of the motion parallax and occlusion cues. Deletion occurs when an object moves behind another object, while accretion occurs when the object reveals itself in the observer’s viewpoint. If deletion and accretion happen quickly, the object is registered as being closer to the blocking object. Deletion and accretion occur slowly if the two objects are farther apart.

Linear perspective
This depth cue results from the convergence of lines toward a single point in the distance. Parallel lines appear to converge as they recede into the distance. The more the lines converge, the farther away they appear.
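The convergence of parallel lines falls out of the pinhole projection used in most 3D rendering, where a point (x, y, z) lands on the image plane at (f·x/z, f·y/z). The sketch below projects two parallel rails at increasing depth. The function name `project`, the focal length and the rail positions are illustrative assumptions.

```python
def project(x, y, z, focal_length=1.0):
    """Pinhole perspective projection of a 3D point onto the image plane."""
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Two parallel rails, 1 unit either side of the viewer and 1.5 units below
# eye level, sampled at increasing depth.
gaps = []
for z in (2.0, 10.0, 100.0):
    left = project(-1.0, -1.5, z)
    right = project(1.0, -1.5, z)
    gaps.append(right[0] - left[0])
    print(f"z={z:6.1f}  gap between rails on screen = {gaps[-1]:.3f}")
```

The rails stay 2 units apart in the scene, but their separation on screen shrinks towards zero with depth, so they appear to meet at a single point.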

Left: Linear perspective. Right: Kinetic depth effect from a series of silhouettes (Source: Wikipedia)

Kinetic depth effect
This effect is the perception of an object’s structure from its motion. It is especially useful for showing an object’s complex structure even when other depth cues are missing.

Familiar size
This depth cue helps us estimate the size of an object with respect to surrounding elements. It can be especially useful in data visualizations, where showing relative size gives the user a sense of scale for the data.

The green house appears smaller than the yellow house

Relative size
Two objects of similar size at different distances are perceived as different sizes relative to their distance from the observer. Two houses of the same size at different distances cast retinal images of different sizes, and the brain reads this difference as a distance cue.
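The size of the retinal image is usually described by the visual angle an object subtends at the eye, 2·atan(size / (2·distance)). The sketch below compares two identical houses at different distances. The function name and the house dimensions are made-up values for illustration.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Angle subtended at the eye by an object of size_m seen face-on."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# Two houses, both 8 m tall (made-up figure), at 20 m and 80 m.
print(f"{visual_angle_deg(8, 20):.1f} deg vs {visual_angle_deg(8, 80):.1f} deg")
```

The farther house subtends a much smaller angle even though its physical size is the same, and the visual system interprets that difference as distance.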

Relative size and Relative height

Relative height
In most normal settings, objects nearer to you are seen in the lower portion of the visual field, while those farther away are seen in the higher portion.

Atmospheric or aerial perspective
This depth cue is a result of light being scattered by particles such as vapor and smoke. As the distance increases, the contrast between the object and the background decreases.
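In rendering, this loss of contrast is often approximated by blending an object's colour toward the background colour with an exponential falloff, exp(-density × distance). The sketch below uses that simplified fog model. The function name `apply_haze`, the density value and the colours are assumptions chosen for the example, not measured atmospheric data.

```python
import math

def apply_haze(object_rgb, background_rgb, distance_m, density=0.02):
    """Blend an object's colour toward the background with an exponential
    falloff, exp(-density * distance), a common simplified fog model."""
    if distance_m < 0:
        raise ValueError("distance_m must be non-negative")
    visibility = math.exp(-density * distance_m)
    return tuple(visibility * o + (1 - visibility) * b
                 for o, b in zip(object_rgb, background_rgb))

dark_hill = (0.1, 0.2, 0.1)  # hypothetical object colour
sky = (0.7, 0.8, 0.9)        # hypothetical background colour
for d in (0, 50, 200):
    print(d, [round(c, 2) for c in apply_haze(dark_hill, sky, d)])
```

At zero distance the object keeps its own colour. As distance grows its colour drifts toward the sky colour, so its contrast against the background fades.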

Atmospheric perspective

Texture gradient
This is an important cue in which a gradual change in the texture of objects (normally from fine to coarse) gives a perception of depth. The density of the texture elements, the height of each element and the shrinking distance between elements all give a perception of distance.

Texture gradient

Lighting, Shade and Shadows
This is one of the depth cues most commonly used by artists and architects. The angle and sharpness of a shadow influence depth perception. Crisp, clearly defined shadows indicate closer proximity, while fuzzy ones may indicate greater depth. The way light interacts with irregularly shaped objects can also reveal significant information about the object.

Optical expansion
This cue is an extension of the relative size cue and occlusion. As an object’s retinal image increases in size, it appears to be moving closer and starts occluding objects in its path.
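The rate at which the retinal image grows also carries timing information. One classic estimate of the time until an approaching object arrives is the visual angle divided by its rate of change. The sketch below applies that estimate to two samples of a ball's visual angle. The function names and the ball's size, distance and speed are assumptions for illustration.

```python
import math

def visual_angle(size_m, distance_m):
    """Angle in radians subtended by an object seen face-on."""
    return 2 * math.atan(size_m / (2 * distance_m))

def time_to_contact(angle_before, angle_after, dt_s):
    """Estimate seconds until contact from how fast the retinal image grows:
    tau = angle / (rate of change of angle)."""
    rate = (angle_after - angle_before) / dt_s
    if rate <= 0:
        raise ValueError("image is not expanding; object is not approaching")
    return angle_after / rate

# A 0.2 m ball 10 m away approaching at 2 m/s, sampled 0.01 s apart.
before = visual_angle(0.2, 10.0)
after = visual_angle(0.2, 10.0 - 2.0 * 0.01)
tau = time_to_contact(before, after, 0.01)
print(f"estimated time to contact: {tau:.2f} s")
```

The estimate uses only the apparent size and how fast it grows. It comes out close to the 5 seconds implied by the ball's distance and speed, without needing to know either.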

The ball appears to be moving closer
