Chapter 6

 

Chapter 6, Part I
Shape representation and description: Region identification 



Shape representation and description 

Defining the shape of an object can prove to be very difficult. Shape is usually represented verbally or in figures. 
There is no generally accepted methodology of shape description. Further, it is not known what in shape is important. 



Current approaches have both positive and negative attributes; computer graphics or mathematics use effective shape
representations which are unusable in shape recognition and vice versa. 
In spite of this, it is possible to find features common to most shape description approaches. 



Common shape description methods can be characterized from different points of view 
Input representation form: Object description can be based on boundaries or on more complex knowledge of whole
regions 
Object reconstruction ability: That is, whether an object's shape can or cannot be reconstructed from the
description. 
Incomplete shape recognition ability: That is, to what extent an object's shape can be recognized from the
description if objects are occluded and only partial shape information is available. 
Local/global description character: Global descriptors can only be used if complete object data are available for
analysis. Local descriptors describe local object properties using partial information about the objects. Thus, local
descriptors can be used for description of occluded objects. 
Mathematical and heuristic techniques: A typical mathematical technique is shape description based on the Fourier
transform. A representative heuristic method may be elongatedness. 
Statistical or syntactic object description. 
A robustness of description to translation, rotation, and scale transformations: Shape description properties in
different resolutions. 







Sensitivity to scale is even more serious if a shape description is derived, because shape may change substantially with
image resolution. 
Therefore, shape has been studied in multiple resolutions which again causes difficulties with matching corresponding shape
representations from different resolutions. 
Moreover, the conventional shape descriptions change discontinuously. 
A scale-space approach aims to obtain continuous shape descriptions if the resolution changes continuously. 





In many tasks, it is important to represent classes of shapes properly, e.g. shape classes of apples, oranges, pears,
bananas, etc. 
The shape classes should represent the generic shapes of the objects belonging to the same classes well. Obviously,
shape classes should emphasize shape differences among classes while the influence of shape variations within classes
should not be reflected in the class description. 



Despite the fact that we are dealing with two-dimensional shape and its description, our world is three-dimensional and the
same objects, if seen from different angles (or changing position/orientation in space), may form very different 2D
projections. 
The ideal case would be to have a universal shape descriptor capable of overcoming these changes -- to design
projection-invariant descriptors. 



Consider an object with planar faces and imagine how many very different 2D shapes may result from a given face if the
position and 3D orientation of this simple object changes with respect to an observer. In some special cases, like circles
which transform to ellipses, or planar polygons, projectively invariant features (invariants} can be found. 



Object occlusion is another hard problem in shape recognition. However, the situation is easier here (if pure occlusion is
considered, not combined with orientation variations yielding changes in 2D projections as discussed above), since visible
parts of objects may be used for description. 
Here, the shape descriptor choice must be based on its ability to describe local object properties -- if the descriptor only
gives a global object description, such a description is useless if only a part of an object is visible. If a local descriptor is
applied, this information may be used to compare the visible part of the object to all objects which may appear in the
image. Clearly, if object occlusion occurs, the local or global character of the shape descriptor must be considered first. 



Region identification 

Region identification is necessary for region description. One of the many methods for region identification is to label each
region (or each boundary) with a unique (integer) number; such identification is called labeling or coloring, also
connected component labeling. 
Goal of segmentation was to achieve complete segmentation, now, the regions must be labeled. 














Label collision is a very common occurrence -- examples of image shapes experiencing this are U-shaped objects,
mirrored E objects, etc. 
The equivalence table is a list of all label pairs present in an image; all equivalent labels are replaced by a unique label in the
second step. 
The algorithm is basically the same in 4-connectivity and 8-connectivity, the only difference being in the neighborhood
mask shape. 



55:148 Digital Image Processing
55:247 Image Analysis and Understanding 

Chapter 6, Part II
Shape representation and description: Contour-based shape
representation and description 



Chapter 6.2 Overview:

Chain codes 
Simple geometric shape representation 
Fourier transforms of borders 
Boundary description using segment sequences 
B-spline representation 
Other contour-based shape description approaches 
Shape invariants PE5.1



Contour-based shape representation and description 

Region borders must be expressed in some mathematical form. 







Chain codes 

Chain codes describe an object by a sequence of unit-size line segments with a given orientation. 
The first element of such a sequence must bear information about its position to permit the region to be reconstructed. 







If the chain code is used for matching it must be independent of the choice of the first border pixel in the sequence. One
possibility for normalizing the chain code is to find the pixel in the border sequence which results in the minimum integer
number if the description chain is interpreted as a base four number -- that pixel is then used as the starting pixel. 
A mod 4 or mod 8 difference is called a chain code derivative. 




Simple geometric border representation 

The following descriptors are mostly based on geometric properties of described regions. Because of the discrete
character of digital images, all of them are sensitive to image resolution. 
Boundary length 
Curvature 







Bending energy 









Signature 





Chord 

Chord is a line joining any two points of the region boundary is a chord. 
Let b(x,y)=1 represent the contour points, and b(x,y)=0 represent all other points. 









Rotation-independent radial distribution: 





The angular distribution h_a(theta) is independent of scale, while rotation causes a proportional offset 









Fourier transforms of boundaries 

Suppose C is a closed curve (boundary) in the complex plane. 
Traveling anti-clockwise along this curve keeping constant speed, a complex function z(t) is obtained, where t is a time
variable. 







The coefficients T_n of the series are called the Fourier descriptors of the curve C. It is more useful to consider the curve
distance s in comparison to time 



where L is the curve length. The Fourier descriptors T_n are given by 







The descriptors are influenced by the curve shape and by the initial point of the curve. Working with digital image data,
boundary co-ordinates are discrete and the function z(s) is not continuous. Assume that z(k) is a discrete version of z(s),
where 4-connectivity is used to get a constant sampling interval; the descriptors T_n can be computed from the discrete
Fourier transform. 





The Fourier descriptors can be invariant to translation and rotation if the co-ordinate system is appropriately chosen. 





The coefficients a_n, b_n are not invariant, but after the transform 





r_n are translation and rotation invariant. 
To achieve a magnification invariance the descriptors w_n are used: 



The first 10 -- 15 descriptors w_n are found to be sufficient for character description. 







A closed boundary can be represented as a function of angle tangents versus the distance between the boundary points
from which the angles were determined 





The descriptor set is then 





The high quality boundary shape representation obtained using only a few lower order coefficients is a favorable property
common to Fourier descriptors. 



Boundary description using segment sequences 

If the segment type is known for all segments, the boundary can be described as a chain of segment types, a code-word
consisting of representatives of a type alphabet. 



A polygonal representation approximates a region by a polygon, the region being represented using its vertices.
Polygonal representations are obtained as a result of a simple boundary segmentation. 



Another method for determining the boundary vertices is a tolerance interval approach based on setting a maximum
allowed difference e. 





Recursive boundary splitting. 





Boundary segmentation into segments of constant curvature or curve segmentation into circular arcs and straight lines is
used. 
Segments are considered as primitives for syntactic shape recognition procedures. 





Sensitivity of shape descriptors to scale is also important if a curve is to be divided into segments -- a scale-space
approach to curve segmentation. 
Only new segmentation points can appear at higher resolutions, and no existing segmentation points can disappear. 



Fine details of the curve disappear in pairs with increasing size of the Gaussian smoothing kernel, and two segmentation
points always merge to form a closed contour showing that any segmentation point existing in coarse resolution must also
exist in finer resolution. 
Moreover, the position of a segmentation point is most accurate in finest resolution and this position can be traced from
coarse to fine resolution using the scale-space image. 



A multiscale curve description can be represented by an interval tree. 





B-spline representation 

Representation of curves using piecewise polynomial interpolation to obtain smooth curves is widely used in computer
graphics. 
B-splines are piecewise polynomial curves whose shape is closely related to their control polygon - a chain of vertices
giving a polygonal representation of a curve. 
B-splines of the third-order are most common because this is the lowest order which includes the change of curvature. 



Splines have very good representation properties and are easy to compute: 
Firstly, they change their shape less then their control polygon, and do not oscillate between sampling points as
many other representations do. A spline curve is always positioned inside a convex n+1-polygon for a B-spline of
the n-th order. 
Secondly, the interpolation is local in character. If a control polygon vertex changes its position, a resulting change of
the spline curve will occur only in a small neighborhood of that vertex. 
Thirdly, methods of matching region boundaries represented by splines to image data are based on a direct search
of original image data. 







Each part of a cubic B-spline curve is a third-order polynomial, meaning that it and its first and second derivatives are
continuous. B-splines are given by 





































Other contour-based shape description approaches 

Many other methods and approaches can be used to describe two-dimensional curves and contours. 



The Hough transform has excellent shape description abilities. 



Region-based shape description using statistical moments is covered below. 



Fractal approach to shape is gaining attention in image shape description. 



Mathematical morphology can be used for shape description, typically in connection with region skeleton construction. 



Neural networks can be used to recognize shapes in raw boundary representations directly. Contour sequences of
noiseless reference shapes are used for training, and noisy data are used in later training stages to increase robustness;
effective representations of closed planar shapes result. 



Shape invariants 

Shape invariants represent a very active current research area in machine vision. 
Importance of shape invariance has been known for a long time however it is somewhat novel approach in machine vision. 
Invariant theory is not new and many of its principles were introduced in the nineteenth century. 



Shape descriptors discussed so far depend on viewpoint, meaning that object recognition may often be impossible as a
result of changed object or observer position. 



The role of shape description invariance is obvious -- shape invariants represent properties of such geometric
configurations which remain unchanged under an appropriate class of transforms. 



Machine vision is especially concerned with the class of projective transforms. 







Collinearity is the simplest example of a projectively invariant image feature. Any straight line is projected as a straight line
under any projective transform. 



Similarly, the basic idea of the projection-invariant shape description is to find such shape features that are unaffected by
the transform between the object and the image plane. 



A standard technique of projection-invariant description is to hypothesize the pose (position and orientation) of an object
and transform this object into a specific co-ordinate system; then shape characteristics measured in this co-ordinate system
yield an invariant description. 
However, the pose must be hypothesized for each object and each image which makes this approach difficult and
unreliable. 



Application of invariant theory, where invariant descriptors can be computed directly from image data without the need
for a particular co-ordinate system, represents another approach. 



Let corresponding entities in two different co-ordinate systems be distinguished by large and small letters. An invariant of a
linear transformation may be defined as: 
An invariant, I(P), of a geometric structure described by a parameter vector P, subject to a linear transformation T
of the co-ordinates x=TX, is transformed according to I(p)=I(P)|T|^w. 
Here I(p) is the function of the parameters after the linear transformation, and |T| is the determinant of the matrix T. 
In this definition, w is referred to as the weight of the invariant. If w=0, the invariants are called scalar invariants, which are
considered below. 



Invariant descriptors are unaffected by object pose, by perspective projection, and by the intrinsic parameters of the
camera. 
Several examples of invariants are now given. 



Cross ratio: 
The cross ratio represents a classic invariant of a projective line. 
A straight line is always projected as a straight line. Any four collinear points A,B,C,D may be described by the
cross-ratio invariant 



where (A-C) represents the distance between points A and C. Note that the cross ratio depends on the order in which the
four collinear points are labeled. 



Practical Experiment 5.1 

1.Open the image shape-invariants.pgm or shape-invariants.tif using cantata (located in ~dip/examples/images.dir) 
2.Determine coordinates of 4 collinear points from object type 1 and type 2 (corresponding within object types). 
3.Calculate the invariant I (eq. 6.25) for the two shape classes - you may use the simple Matlab shape.m program (located
in ~dip/examples/khoros.dir). 
4.Are the invariants equal for smae objects imaged in different pose? 
5.Are the invariants different for the two shapes? 

Hint: For the star-like object, no collinar points exist directly. However, 4 coplanar lines can always be used to generate 4
collinear points. 



A system of five general coplanar lines forms two invariants 



where M_ijk=(l_i, l_j, l_k). 

l_i=(l_i^1,l_i^2,l_i^3)^T is a representation of a line l_i^1x+l_i^2y+ l_i^3=0, where i is from interval [1,5], and |M| is the
determinant of M. 

If the three lines forming the matrix M_ijk are concurrent, the matrix becomes singular and the invariant is undefined.



A system of five coplanar points is dual to a system of five lines and the same two invariants are formed. 
These two functional invariants can also be formed as two cross ratios of two coplanar concurrent line quadruples. 
Note that even though combinations other than those given in Figure may be formed, only the two presented functionally
independent invariants exist. 









Plane conics: 
A plane conic may be represented by an equation 



for x=(x,y,1)^T. 



Then the conic may also be defined by a matrix C 





For any conic represented by a matrix C, and any two coplanar lines not tangent to the conic, one invariant may be defined





The same invariant can be formed for a conic and two coplanar points. 
Two invariants can be determined for a pair of conics represented by their respective matrices C_1, C_2 normalized so
that |C_i|=1 





For non-normalized conics, the invariants of associated quadratic forms are 





and two true invariants of the conics are 





Two plane conics uniquely determine four points of intersection, and any point that is not an intersection point may be
chosen to form a five-point system together with the four intersection points. 
Therefore, two invariants exist for the pair of conics, as for the five-point system. 



Many man-made objects consist of a combination of straight lines and conics, and these invariants may be used for their
description. 
However, if the object has a contour which cannot be represented by an algebraic curve, the situation is much more
difficult. 



Differential invariants can be formed (e.g. curvature, torsion, Gaussian curvature) which are not affected by projective
transforms. 
These invariants are local - that is, the invariants are found for each point on the curve, which may be quite general. 
Unfortunately, these invariants are extremely large and complex polynomials, requiring up to seventh derivatives of the
curve, which makes them practically unusable due to image noise and acquisition errors, although noise-resistant local
invariants are beginning to appear. 
However, if additional information is available, higher derivatives may be avoided. 



Stability of invariants is another crucial property which affects their applicability. The robustness of invariants to image
noise and errors introduced by image sensors is of prime importance, although not much is known about this. 
Different invariants have different stability and distinguishing powers. 



An example of recognition of man-made objects using invariant description of four coplanar lines, a conic and two lines,
and a pair of coplanar conics is given. 
The recognition system is based on a model library containing over thirty object models - significantly more than that
reported for other recognition systems. 
Moreover, the construction of the model library is extremely easy; no special measurements are needed, the object is
digitized in a standard way and the projectively invariant description is stored as a model. 
Further, there is no need for camera calibration. The recognition accuracy is 100% for occluded objects viewed
from different viewpoints if the objects are not severely disrupted by shadows and specularities. 



55:148 Digital Image Processing
55:247 Image Analysis and Understanding 

Chapter 6, Part III
Shape representation and description: Region-based shape
representation and description 



Chapter 6.3 Overview:

Simple scalar region descriptors 
Moments 
Convex hull 
Graph representation based on region skeleton 
Region decomposition 
Region neighborhood graphs 



Region-based shape representation and description



Simple scalar region descriptors 



A large group of shape description techniques is represented by heuristic approaches which yield acceptable results in
description of simple shapes. 
Heuristic region descriptors: 
area, 
rectangularity, 
elongatedness, 
direction, 
compactness, 
etc. 
These descriptors cannot be used for region reconstruction and do not work for more complex shapes. 



Procedures based on region decomposition into smaller and simpler subregions must be applied to describe more
complicated regions, then subregions can be described separately using heuristic approaches. 



Simple scalar region descriptors 
Area is given by the number of pixels of which the region consists. 
The real area of each pixel may be taken into consideration to get the real size of a region. 
If an image is represented as a rectangular raster, simple counting of region pixels will provide its area. 
If the image is represented by a quadtree, then: 






The region can also be represented by n polygon vertices 



the sign of the sum represents the polygon orientation. 



If the region is represented by the (anti-clockwise) Freeman chain code the following algorithm provides the area 






Euler's number 
(sometimes called Genus or the Euler-Poincare characteristic) describes a simple topologically invariant property of the
object. 
S is the number of contiguous parts of an object and N is the number of holes in the object (an object can consist of
more than one region). 





Projections 
Horizontal and vertical region projections 









Eccentricity 
The simplest is the ratio of major and minor axes of an object. 




Elongatedness 
A ratio between the length and width of the region bounding rectangle. 




This criterion cannot succeed in curved regions, for which the evaluation of elongatedness must be based on maximum
region thickness. 
Elongatedness can be evaluated as a ratio of the region area and the square of its thickness. 
The maximum region thickness (holes must be filled if present) can be determined as the number d of erosion steps that
may be applied before the region totally disappears. 





Rectangularity 
Let F_k be the ratio of region area and the area of a bounding rectangle, the rectangle having the direction k. The
rectangle direction is turned in discrete steps as before, and rectangularity measured as a maximum of this ratio F_k 





Direction 
Direction is a property which makes sense in elongated regions only. 
If the region is elongated, direction is the direction of the longer side of a minimum bounding rectangle. 
If the shape moments are known, the direction \theta can be computed as 





Elongatedness and rectangularity are independent of linear transformations -- translation, rotation, and scaling. 
Direction is independent on all linear transformations which do not include rotation. 
Mutual direction of two rotating objects is rotation invariant. 



Compactness 
Compactness is independent of linear transformations 





The most compact region in a Euclidean space is a circle. 
Compactness assumes values in the interval [1,infty) in digital images if the boundary is defined as an inner boundary, while
using the outer boundary, compactness assumes values in the interval [16,infty). 
Independence from linear transformations is gained only if an outer boundary representation is used. 







Moments 

Region moment representations interpret a normalized gray level image function as a probability density of a 2D random
variable. 
Properties of this random variable can be described using statistical characteristics - moments. 
Assuming that non-zero pixel values represent regions, moments can be used for binary or gray level region description. 
A moment of order (p+q) is dependent on scaling, translation, rotation, and even on gray level transformations and is given
by 





In digitized images we evaluate sums 





where x,y,i,j are the region point co-ordinates (pixel co-ordinates in digitized images). 



Translation invariance can be achieved if we use the central moments 





or in digitized images 





where x_c, y_c are the co-ordinates of the region's centroid 





In the binary case, m_00 represents the region area. 
Scale invariant features can also be found in scaled central moments 





and normalized unscaled central moments 





Rotation invariance can be achieved if the co-ordinate system is chosen such that mu_11 = 0. 
A less general form of invariance is given by seven rotation, translation, and scale invariant moment characteristics 







While the seven moment characteristics presented above were shown to be useful, they are only invariant to translation,
rotation, and scaling. 



A complete set of four affine moment invariants derived from second- and third-order moments is 





All moment characteristics are dependent on the linear gray level transformations of regions; to describe region shape
properties, we work with binary image data (f(i,j)=1 in region pixels) and dependence on the linear gray level transform
disappears. 



Moment characteristics can be used in shape description even if the region is represented by its boundary. 
A closed boundary is characterized by an ordered sequence z(i) that represents the Euclidean distance between the
centroid and all N boundary pixels of the digitized shape. 
No extra processing is required for shapes having spiral or concave contours. 
Translation, rotation, and scale invariant one-dimensional normalized contour sequence moments can be estimated as 





The r-th normalized contour sequence moment and normalized central contour sequence moment are defined as 





Less noise-sensitive results can be obtained from the following shape descriptors 







Convex hull



A region R is convex if and only if for any two points x_1, x_2 from R, the whole line segment defined by its end-points
x_1, x_2 is inside the region R. 
The convex hull of a region is the smallest convex region H which satisfies the condition R is a subset of H. 
The convex hull has some special properties in digital data which do not exist in the continuous case. For instance, concave
parts can appear and disappear in digital data due to rotation, and therefore the convex hull is not rotation invariant in
digital space. 
The convex hull can be used to describe region shape properties and can be used to build a tree structure of region
concavity. 





A discrete convex hull can be defined by the following algorithm which may also be used for convex hull construction. 
This algorithm has complexity O(n^2) and is presented here as an intuitive way of detecting the convex hull. 






More efficient algorithms exist, especially if the object is defined by an ordered sequence of n vertices representing a
polygonal boundary of the object. 
If the polygon P is a simple polygon (self-non-intersecting polygon) which is always the case in a polygonal representation
of object borders, the convex hull may be found in linear time O(n). 
In the past two decades, many linear-time convex hull detection algorithms have been published, however more than half
of them were later discovered to be incorrect with counter-examples published. 



The simplest correct convex hull algorithm was developed by Melkman and is now discussed further. 
Let the polygon for which the convex hull is to be determined be a simple polygon P = v_1, v_2, ... v_n and let the
vertices be processed in this order. 
For any three vertices x,y,z in an ordered sequence, a directional function delta may be evaluated 





The main data structure H is a list of vertices (deque) of polygonal vertices already processed. 
The current contents of H represents the convex hull of the currently processed part of the polygon, and after the detection
is completed, the convex hull is stored in this data structure. 
Therefore, H always represents a closed polygonal curve, H={d_b, ... ,d_t} where d_b points to the bottom of the list and
d_t points to its top. 
Note that d_b and d_t always refer to the same vertex simultaneously representing the first and the last vertex of the closed
polygon. 



Main ideas of the algorithm: 
The first three vertices A,B,C from the sequence P form a triangle (if not collinear) and this triangle represents a
convex hull of the first three vertices. 
The next vertex D in the sequence is then tested for being located inside or outside the current convex hull. 
If D is located inside, the current convex hull does not change. 
If D is outside of the current convex hull, it must become a new convex hull vertex and based on the current convex
hull shape, either none, one, or several vertices must be removed from the current convex hull. 
This process is repeated for all remaining vertices in the sequence P. 





The variable v refers to the input vertex under consideration, and the following operations are defined: 





The algorithm is then; 





The algorithm as presented may be difficult to follow, however, a less formal version would be impossible to implement. 
The following example makes the algorithm more understandable. 






A new vertex should be entered from P, however there is no unprocessed vertex in the sequence P and the convex hull
generating process stops. 
The resulting convex hull is defined by the sequence H={d_b, ... ,d_t}={D,C,A,D} which represents a polygon DCAD,
always in the clockwise direction. 





A region concavity tree is generated recursively during the construction of a convex hull. 
A convex hull of the whole region is constructed first, and convex hulls of concave residua are found next. 
The resulting convex hulls of concave residua of the regions from previous steps are searched until no concave residuum
exists. 
The resulting tree is a shape representation of the region. 







Graph representation based on region skeleton

Objects are represented by a planar graph with nodes representing subregions resulting from region decomposition, and
region shape is then described by the graph properties. 
There are two general approaches to acquiring a graph of subregions: 
The first one is region thinning leading to the region skeleton, which can be described by a graph. 
The second option starts with the region decomposition into subregions, which are then represented by nodes
while arcs represent neighborhood relations of subregions. 



Graphical representation of regions has many advantages; the resulting graphs 
are translation and rotation invariant; position and rotation can be included in the graph definition 
are insensitive to small changes in shape 
are highly invariant with respect to region magnitude 
generate a representation which is understandable 
can easily be used to obtain the information-bearing features of the graph 
are suitable for syntactic recognition 



Graph representation based on region skeleton 
This method corresponds significantly curving points of a region boundary to graph nodes. 
The main disadvantage of boundary-based description methods is that geometrically close points can be far away from one
another when the boundary is described - graphical representation methods overcome this disadvantage. 
The region graph is based on the region skeleton, and the first step is the skeleton construction. 



There are four basic approaches to skeleton construction: 
thinning - iterative removal of region boundary pixels 
wave propagation from the boundary 
detection of local maxima in the distance-transformed image of the region 
analytical methods 
Most thinning procedures repeatedly remove boundary elements until a pixel set with maximum thickness of one or two is
found. The following algorithm constructs a skeleton of maximum thickness two. 







Steps of this algorithm are illustrated in the next Figure. 
If there are skeleton segments which have a thickness of two, one extra step can be added to reduce those to a thickness
of one, although care must be taken not to break the skeleton connectivity. 





Thinning is generally a time-consuming process, although sometimes it is not necessary to look for a skeleton, and one side
of a parallel boundary can be used for skeleton-like region representation. 



Mathematical morphology is a powerful tool used to find the region skeleton. 



Thinning procedures often use a medial axis transform to construct a region skeleton. 



Under the medial axis definition, the skeleton is the set of all region points which have the same minimum distance from the
region boundary for at least two separate boundary points. 



Such a skeleton can be constructed using a distance transform which assigns a value to each region pixel representing its
(minimum) distance from the region's boundary. 



The skeleton can be determined as a set of pixels whose distance from the region's border is locally maximal. 



Every skeleton element can be accompanied by information about its distance from the boundary -- this gives the potential
to reconstruct a region as an envelope curve of circles with center points at skeleton elements and radii corresponding to
the stored distance values. 



Small changes in the boundary may cause serious changes in the skeleton. 



This sensitivity can be removed by first representing the region as a polygon, then constructing the skeleton. 



Boundary noise removal can be absorbed into the polygon construction. 



A multi-resolution approach to skeleton construction may also result in decreased sensitivity to boundary noise. 



Similarly, the approach using the Marr-Hildreth edge detector with varying smoothing parameter facilitates scale-based
representation of the region's skeleton. 









Skeleton construction algorithms do not result in graphs but the transformation from skeletons to graphs is relatively
straightforward. 
Consider first the medial axis skeleton, and assume that a minimum radius circle has been drawn from each point of the
skeleton which has at least one point common with a region boundary. 
Let contact be each contiguous subset of the circle which is common to the circle and to the boundary. 
If a circle drawn from its center A has one contact only, A is a skeleton end-point. 
If the point A has two contacts, it is a normal skeleton point. 
If A has three or more contacts, the point A is a skeleton node-point. 





It can be seen that boundary points of high curvature have the main influence on the graph. 
They are represented by graph nodes, and therefore influence the graph structure. 
If other than medial axis skeletons are used for graph construction, end-points can be defined as skeleton points having just
one skeleton neighbor, normal-points as having two skeleton neighbors, and node-points as having at least three skeleton
neighbors. 
It is no longer true that node-points are never neighbors and additional conditions must be used to decide when
node-points should be represented as nodes in a graph and when they should not. 





Region decomposition 
The decomposition approach is based on the idea that shape recognition is a hierarchical process. 
Shape primitives are defined at the lower level, primitives being the simplest elements which form the region. 
A graph is constructed at the higher level - nodes result from primitives, arcs describe the mutual primitive relations. 
Convex sets of pixels are one example of simple shape primitives. 








The solution to the decomposition problem consists of two main steps: 
The first step is to segment a region into simpler subregions (primitives) and the second is the analysis of primitives. 
Primitives are simple enough to be successfully described using simple scalar shape properties. 
If subregions are represented by polygons, graph nodes bear the following information; 
Node type representing primary subregion or kernel. 
Number of vertices of the subregion represented by the node. 
Area of the subregion represented by the node. 
Main axis direction of the subregion represented by the node. 
Center of gravity of the subregion represented by the node. 
If a graph is derived using attributes 1-4, the final description is translation invariant. 
A graph derived from attributes 1-3 is translation and rotation invariant. 
Derivation using the first two attributes results in a description which is size invariant in addition to possessing
translation and rotation invariance. 





Region neighborhood graphs 

Any time a region decomposition into subregions or an image decomposition into regions is available, the region or image
can be represented by a region neighborhood graph (the region adjacency graph being a special case). 
This graph represents every region as a graph node, and nodes of neighboring regions are connected by edges. 
A region neighborhood graph can be constructed from a quadtree image representation, from run-length encoded image
data, etc. 





Very often, the relative position of two regions can be used in the description process -- for example, a region A may be
positioned to the left of a region B, or above B, or close to B, or a region C may lie between regions A and B, etc. 



We know the meaning of all of the given relations if A,B,C are points, but, with the exception of the relation to be close,
they can become ambiguous if A,B,C are regions. 
For instance, human observers are generally satisfied with the definition: 
The center of gravity of A must be positioned to the left of the leftmost point of B and (logical AND) the rightmost
pixel of A must be left of the rightmost pixel of B 



55:148 Digital Image Processing
55:247 Image Analysis and Understanding 

Chapter 6, Part IV
Shape representation and description: Shape classes



Shape classes 

Representation of shape classes is considered a challenging problem of shape description. 
The shape classes are expected to represent the generic shapes of the objects belonging to the class well and emphasize
shape differences between classes, while the shape variations allowed within classes should not influence the description. 



There are many ways to deal with such requirements. 
A widely used representation of in-class shape variations is determination of class-specific regions in the feature space. 
The feature space can be defined using a selection of shape features described earlier. 
Another approach to shape class definition is to use a single prototype shape and determine a planar warping transform
that if applied to the prototype produces shapes from the particular class. 
The prototype shape may be derived from examples. 



If a set of landmarks can be identified on the regions belonging to specific shape classes, the landmarks can characterize
the classes in a simple and powerful way. 
Landmarks are usually selected as easily recognizable border or region points. 
For planar shapes, a co-ordinate system can be defined that is invariant to similarity transforms of the plane (rotation,
translation, scaling). 
If such a landmark model uses n points per 2D object, the dimensionality of the shape space is 2n. 
Clearly, only a subset of the entire shape space corresponds to each shape class and the shape class definition reduces to
the definition of the shape space subsets. 



Cootes et al. determine principal components in the shape space from training sets of shapes after the shapes are iteratively
aligned. 
The first few principal components characterize the most significant variations in shape. 
Thus, a small number of shape parameters represent the major shape variation characteristics associated with the shape
class. 
Such a shape class representation is referred to as point distribution models and is discussed in detail in Chapter 8 in the
context of image interpretation.