This demo shows our image retrieval system, which enables the user to
search a greyscale image database intuitively by presenting simple sketches.
The database
contains 120 different isolated objects (mostly hand tools) with arbitrary orientation.
Each of these images is represented by a Hidden Markov Model (HMM) which has
been modified in order to obtain rotation and scale invariance.
Thus, no labeling of the
content or the rotation angle of the object is needed when adding new images to
the database.
This is particularly important when using query by sketch due to
the skew which occurs naturally in human handwriting and in drawings.
The retrieved images can be ranked according to the
similarity with the query sketch using the output probabilities of the HMMs.
Furthermore, the use of HMMs allows efficient pruning and thus a fast
retrieval even with large databases.
Experiments with our demonstration system showed good retrieval results
with several users.
Traditionally, image databases have been accessed by textual queries, which requires the indexing of the entire database content. This is a time consuming procedure and may lead to inaccurate retrieval results if the interpretation of the image differs between the person(s) who did the indexing and the user of the database who issues the query. Apart from these disadvantages, the use of textual queries is not intuitive for humans and many users demand man-machine interfaces such as pens or touch screens rather than keyboards. In order to overcome these limitations of textual queries, several systems which use either example images or visual properties to retrieve pictorial information have been presented [1, 2, 3, 4]. In [4] Smith and Chang describe an image retrieval system applicable to the World Wide Web (WWW). Pictorial queries to the constantly growing WWW are a challenging task due to the need for fast and easy to expand algorithms. However, this system uses the histogram of an example image in order to retrieve similar pictorial information, which has the disadvantage, that the query is only based on global features and topological features like shape, etc. are omitted. Query by shape or sketch is used in [1, 2, 5], but important issues such as rotation or scale invariance have not been addressed yet. Rotation invariance is particularly important when dealing with user sketches, due to the skew which occurs naturally in human handwriting. Del Bimbo and Pala [5] present an elastic matching algorithm, which is able to deal with small rotations of the order of 12-15 degrees. The ability to cope with these small rotations should be sufficient in order to deal with the skew of the user sketch, however if the database consists of arbitrary rotated objects, 360 degree rotation invariance is needed. Our database consists mainly of hand tools and thus rotation invariance prevents either a time consuming labeling of the angle of each image or the user from being restricted to draw the tools at a fixed rotation angle. As in [3] we utilize Hidden Markov Models to build up our query by sketch image retrieval system, but after an initial training of the HMMs, we modify our models in order to obtain rotation invariance. Similar modifications might be difficult to integrate into the system presented by Lin et al. [3] due to the use of so called pseudo-2D HMMs (P2DHMMs) which try to model two dimensional shapes but fail to model the dependency of consecutive columns if the rows are presented to the superstates of the P2DHMMs. However, in [3] the use of colour features for the training of the P2DHMMs is described, what we think is a promising approach due to the possibility to express intra colour dependencies easily using the covariance matrixes of the Markov Model framework. In this paper the use of colour features is omitted in order to concentrate on shape as a feature and explore the combination of rotation invariance and HMMs for image retrieval tasks. The use of the methods presented here, together with colour features is also possible.
In [6] Santini and Jain state that image databases should rely on similarity searches, which means that all the images should be ordered with respect to the similarity to the query. This ordering of the database elements according to a measure can elegantly be implemented by using the HMM framework and the likelihood scores of the Markov Models after observation sequences have been shown to the model. Prior to explaining the invariant modeling of the images in our database, a short introduction to HMMs is given in the following section.

Figure 1: Continuous linear Hidden Markov Model
Hidden Markov Models are finite stochastic automata which have been successfully applied to continuous speech [7] and online handwriting recognition [8] but recently also to new applications such as gesture recognition [9] and video indexing [10].
Fig. 1 shows a continuous three state HMM with transition probabilities
and output probability density functions (pdfs)
, where
denotes the actual state at time t,
is a distinct state and
denotes an observation vector.
The pdf
of state
is usually given by a finite Gaussian mixture of the form

where
is the mixture coefficient for the mth mixture and
is a multivariate Gaussian density with mean vector
and covariance matrix
.
A HMM
with N states is fully
described by the N
N-dimensional transition matrix
, the N-dimensional output pdf vector
and the initial state distribution vector
which consists of the
probabilities
. After the model
has been trained using the Baum-Welch-Algorithm, feature sequences
can be scored according to

Usually the likelihood
is estimated by the Viterbi
algorithm, which is an approximation based on the most likely state sequence
. For recognition tasks,
is
used to classify an unknown pattern to class
which satisfies Eq. 3, whereas we propose to use this likelihood as a score for ordering the elements of our database according to the similarity with a sketched query image.

In order to obtain rotation invariant Markov Models, we suggest a modification of the HMM topology and a special feature extraction method, both presented in detail in the next section.
As mentioned in Sec. 2.1 we have to build a feature sequence
from each image when using Hidden Markov Models.
This sequence is calculated by applying the following steps to each image in the database (see also
Fig. 2). After thresholding the greyvalues (a step discussed in
detail in Sec. 3), a lowpass filter is applied for avoiding alias
components in the image which is subsampled in the final step.

Figure 2: Preprocessing and feature extraction steps
This subsampling
is performed in an adaptive way which results in scale invariance and which
leads also to rotation invariance together with the modified HMMs. Samples
are taken on polar coordinates
where the origin is placed at the so
called centre of gravity (COG), calculated on the thresholded
image I(x,y) as follows:

Fig. 3 shows the subsampling raster, which can be expressed by
![]()
where
denotes the subsampled image and
and
the sampling intervals of the radius and the angle, respectively.
The interval
is determined by dividing the distance from the COG to the maximum radius point of the object in the thresholded image by a fixed number (the dimension of the feature vector) whereas
.
After applying this step, feature vectors are formed by those samples in Eq. 5, where
and the sequence
is built by arranging these vectors such that m increases
.

Figure 3: The subsampling raster
The feature sequences calculated on each image of the database are used to
train individual HMMs with linear architectures (see also Fig. 1) by
the Baum-Welch reestimation formula. However, these HMMs do not represent the
feature sequence in a rotation invariant way and thus the models are modified
as follows. Two copies of the initially trained HMM are attached in front of
and behind the original model, with the first HMM (Filler Model 1)
having a modified initial state distribution vector
and the final
one (Filler Model 2) being modified so as to allow the HMM to be in any
state, once the end of the feature sequence is reached.
This concatenation of the HMMs is illustrated in Fig. 4.
If the feature sequence of an object being rotated with respect to the
(unrotated) object used for training the original HMM, is presented twice to the concatenated HMMs, the original model will be aligned to the unrotated part of the sequence by the Viterbi algorithm.
This alignment leads to a high probability
if the two objects have a similar shape.
The method presented here has been animated by techniques applied to word-spotting tasks in speech recognition
(see e.g. [11]).

Figure 4: Concatenated HMMs for rotation invariant modeling
Once the HMM structure shown in Fig. 4 has been trained
and built for every image in the database,
sketches can be presented to the system and
can be used in
order to score every database element according to the similarity to the
query image. If one is only interested in e.g. the five most similar images,
pruning techniques can be used to speed up the retrieval task very efficiently,
due to the use of the HMM framework.
Our database consists of 120 greyscale images of isolated objects (mostly hand tools) with arbitrary orientation. For every image the preprocessing and feature extraction steps presented in Fig. 2 have been applied, followed by the training of an individual HMM as discussed in Sec. 2.2.
The HMMs used in our experiments consist of 30 states and the feature extraction
is carried out by taking five samples at every
(see also Eq. 5).
Sketches can be presented to our retrieval system using our Java-Applet below.
The feature extraction of the sketches is performed using basically the same steps as shown in Fig. 2, apart from the thresholding step.
Thereafter, the feature sequence is scored by the HMMs and the corresponding
images can be sorted according to
.
Fig. 6 presents some results achieved with our system,
where in every row the query sketch is shown first, followed by those three
images with the highest similarity score.
It can be seen in Fig. 6, that the system performs very well and
that images can be retrieved by presenting simple sketches.
Note that the user
of our system does not need to know the rotation angle of the desired object.

Figure 6: Query sketches and retrieved images