Image Database Retrieval of Rotated Objects by User Sketch (Demo)


This demo shows our image retrieval system, which enables the user to search a greyscale image database intuitively by presenting simple sketches. The database contains 120 different isolated objects (mostly hand tools) with arbitrary orientation. Each of these images is represented by a Hidden Markov Model (HMM) which has been modified in order to obtain rotation and scale invariance. Thus, no labeling of the content or the rotation angle of the object is needed when adding new images to the database. This is particularly important when using query by sketch due to the skew which occurs naturally in human handwriting and in drawings. The retrieved images can be ranked according to the similarity with the query sketch using the output probabilities of the HMMs. Furthermore, the use of HMMs allows efficient pruning and thus a fast retrieval even with large databases. Experiments with our demonstration system showed good retrieval results with several users.

Introduction

Traditionally, image databases have been accessed by textual queries, which requires the indexing of the entire database content. This is a time consuming procedure and may lead to inaccurate retrieval results if the interpretation of the image differs between the person(s) who did the indexing and the user of the database who issues the query. Apart from these disadvantages, the use of textual queries is not intuitive for humans and many users demand man-machine interfaces such as pens or touch screens rather than keyboards. In order to overcome these limitations of textual queries, several systems which use either example images or visual properties to retrieve pictorial information have been presented [1, 2, 3, 4]. In [4] Smith and Chang describe an image retrieval system applicable to the World Wide Web (WWW). Pictorial queries to the constantly growing WWW are a challenging task due to the need for fast and easy to expand algorithms. However, this system uses the histogram of an example image in order to retrieve similar pictorial information, which has the disadvantage, that the query is only based on global features and topological features like shape, etc. are omitted. Query by shape or sketch is used in [1, 2, 5], but important issues such as rotation or scale invariance have not been addressed yet. Rotation invariance is particularly important when dealing with user sketches, due to the skew which occurs naturally in human handwriting. Del Bimbo and Pala [5] present an elastic matching algorithm, which is able to deal with small rotations of the order of 12-15 degrees. The ability to cope with these small rotations should be sufficient in order to deal with the skew of the user sketch, however if the database consists of arbitrary rotated objects, 360 degree rotation invariance is needed. Our database consists mainly of hand tools and thus rotation invariance prevents either a time consuming labeling of the angle of each image or the user from being restricted to draw the tools at a fixed rotation angle. As in [3] we utilize Hidden Markov Models to build up our query by sketch image retrieval system, but after an initial training of the HMMs, we modify our models in order to obtain rotation invariance. Similar modifications might be difficult to integrate into the system presented by Lin et al. [3] due to the use of so called pseudo-2D HMMs (P2DHMMs) which try to model two dimensional shapes but fail to model the dependency of consecutive columns if the rows are presented to the superstates of the P2DHMMs. However, in [3] the use of colour features for the training of the P2DHMMs is described, what we think is a promising approach due to the possibility to express intra colour dependencies easily using the covariance matrixes of the Markov Model framework. In this paper the use of colour features is omitted in order to concentrate on shape as a feature and explore the combination of rotation invariance and HMMs for image retrieval tasks. The use of the methods presented here, together with colour features is also possible.

Representing Images by Hidden Markov Models

In [6] Santini and Jain state that image databases should rely on similarity searches, which means that all the images should be ordered with respect to the similarity to the query. This ordering of the database elements according to a measure can elegantly be implemented by using the HMM framework and the likelihood scores of the Markov Models after observation sequences have been shown to the model. Prior to explaining the invariant modeling of the images in our database, a short introduction to HMMs is given in the following section.

 

Introduction to HMMs

  HMM
Figure 1: Continuous linear Hidden Markov Model

Hidden Markov Models are finite stochastic automata which have been successfully applied to continuous speech [7] and online handwriting recognition [8] but recently also to new applications such as gesture recognition [9] and video indexing [10]. Fig. 1 shows a continuous three state HMM with transition probabilities tex2html_wrap_inline402 and output probability density functions (pdfs) tex2html_wrap_inline404, where tex2html_wrap_inline406 denotes the actual state at time t, tex2html_wrap_inline410 is a distinct state and tex2html_wrap_inline412 denotes an observation vector. The pdf tex2html_wrap_inline404 of state tex2html_wrap_inline410 is usually given by a finite Gaussian mixture of the form
equation41
where tex2html_wrap_inline418 is the mixture coefficient for the mth mixture and tex2html_wrap_inline422 is a multivariate Gaussian density with mean vector tex2html_wrap_inline424 and covariance matrix tex2html_wrap_inline426. A HMM tex2html_wrap_inline428 with N states is fully described by the Ntex2html_wrap_inline430N-dimensional transition matrix tex2html_wrap_inline432, the N-dimensional output pdf vector tex2html_wrap_inline434 and the initial state distribution vector tex2html_wrap_inline436 which consists of the probabilities tex2html_wrap_inline438. After the model tex2html_wrap_inline440 has been trained using the Baum-Welch-Algorithm, feature sequences tex2html_wrap_inline442 can be scored according to
equation79
Usually the likelihood tex2html_wrap_inline444 is estimated by the Viterbi algorithm, which is an approximation based on the most likely state sequence tex2html_wrap_inline446. For recognition tasks, tex2html_wrap_inline444 is used to classify an unknown pattern to class tex2html_wrap_inline450 which satisfies Eq. 3, whereas we propose to use this likelihood as a score for ordering the elements of our database according to the similarity with a sketched query image.
 equation97
In order to obtain rotation invariant Markov Models, we suggest a modification of the HMM topology and a special feature extraction method, both presented in detail in the next section.

 

Rotation Invariant Hidden Markov Modeling

As mentioned in Sec. 2.1 we have to build a feature sequence tex2html_wrap_inline454 from each image when using Hidden Markov Models. This sequence is calculated by applying the following steps to each image in the database (see also Fig. 2). After thresholding the greyvalues (a step discussed in detail in Sec. 3), a lowpass filter is applied for avoiding alias components in the image which is subsampled in the final step.

  figure114
Figure 2: Preprocessing and feature extraction steps

This subsampling is performed in an adaptive way which results in scale invariance and which leads also to rotation invariance together with the modified HMMs. Samples are taken on polar coordinates tex2html_wrap_inline456 where the origin is placed at the so called centre of gravity (COG), calculated on the thresholded image I(x,y) as follows:
equation119
Fig. 3 shows the subsampling raster, which can be expressed by
 equation130
where tex2html_wrap_inline460 denotes the subsampled image and tex2html_wrap_inline462 and tex2html_wrap_inline464 the sampling intervals of the radius and the angle, respectively. The interval tex2html_wrap_inline462 is determined by dividing the distance from the COG to the maximum radius point of the object in the thresholded image by a fixed number (the dimension of the feature vector) whereas tex2html_wrap_inline468.

After applying this step, feature vectors are formed by those samples in Eq. 5, where tex2html_wrap_inline470 and the sequence tex2html_wrap_inline454 is built by arranging these vectors such that m increases tex2html_wrap_inline476.

  figure141
Figure 3: The subsampling raster

The feature sequences calculated on each image of the database are used to train individual HMMs with linear architectures (see also Fig. 1) by the Baum-Welch reestimation formula. However, these HMMs do not represent the feature sequence in a rotation invariant way and thus the models are modified as follows. Two copies of the initially trained HMM are attached in front of and behind the original model, with the first HMM (Filler Model 1) having a modified initial state distribution vector tex2html_wrap_inline436 and the final one (Filler Model 2) being modified so as to allow the HMM to be in any state, once the end of the feature sequence is reached. This concatenation of the HMMs is illustrated in Fig. 4. If the feature sequence of an object being rotated with respect to the (unrotated) object used for training the original HMM, is presented twice to the concatenated HMMs, the original model will be aligned to the unrotated part of the sequence by the Viterbi algorithm. This alignment leads to a high probability tex2html_wrap_inline444 if the two objects have a similar shape. The method presented here has been animated by techniques applied to word-spotting tasks in speech recognition (see e.g. [11]).

  figure155
Figure 4: Concatenated HMMs for rotation invariant modeling

Once the HMM structure shown in Fig. 4 has been trained and built for every image in the database, sketches can be presented to the system and tex2html_wrap_inline444 can be used in order to score every database element according to the similarity to the query image. If one is only interested in e.g. the five most similar images, pruning techniques can be used to speed up the retrieval task very efficiently, due to the use of the HMM framework.

Experiments

Our database consists of 120 greyscale images of isolated objects (mostly hand tools) with arbitrary orientation. For every image the preprocessing and feature extraction steps presented in Fig. 2 have been applied, followed by the training of an individual HMM as discussed in Sec. 2.2. The HMMs used in our experiments consist of 30 states and the feature extraction is carried out by taking five samples at every tex2html_wrap_inline484 (see also Eq. 5).

Sketches can be presented to our retrieval system using our Java-Applet below. The feature extraction of the sketches is performed using basically the same steps as shown in Fig. 2, apart from the thresholding step. Thereafter, the feature sequence is scored by the HMMs and the corresponding images can be sorted according to tex2html_wrap_inline444. Fig. 6 presents some results achieved with our system, where in every row the query sketch is shown first, followed by those three images with the highest similarity score. It can be seen in Fig. 6, that the system performs very well and that images can be retrieved by presenting simple sketches. Note that the user of our system does not need to know the rotation angle of the desired object.

  figure181
Figure 6: Query sketches and retrieved images

References

1
M. Flickner et al., ``Query by Image and Video Content: The QBIC System'', IEEE Computer, Sep. 1995, pp. 23-32.

2
A. Pentland, R. W. Picard and S. Sclaroff, ``Photobook: Content-Based Manipulation of Image Databases'', Proc. SPIE Storage and Retrieval Image and Video Databases II, Feb 1994.

3
H.-C. Lin, L.-L. Wang and S.-N. Yang, ``Color Image Retrieval Based on Hidden Markov Models'', Proc. IEEE-ICIP, Vol. 1, 1995, pp. 342-345.

4
J. R. Smith and S.-F. Chang, ``An Image and Video Search Engine for the World-Wide Web'', Symposium on Electronic Imaging: Science and Technology - Storage & Retrieval for Image and Video Databases V, Feb. 1997.

5
A. Del Bimbo and P. Pala, ``Visual Image Retrieval by Elastic Matching of User Sketches'', IEEE Transactions on PAMI, Vol. 19, Feb. 1997, pp. 121-132.

6
S. Santini and R. Jain, ``Similarity Queries in Image Databases'', Proc. CVPR, June 1996.

7
L.R. Rabiner, ``A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition'', Proc. IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257-286.

8
J. Subrahmonia, K. Nathan and M. P. Perrone, ``Writer Dependent Recognition of On-Line Unconstrained Handwriting'', Proc. IEEE ICASSP, Vol. 6, 1996, pp. 3478-3481.

9
G. Rigoll and A. Kosmala, ``New Improved Feature Extraction Methods for Realtime High Performance Image Sequence Recognition'', Proc. IEEE-ICASSP, 1997, pp. 2901-2904.

10
S. Eickeler, A. Kosmala and G. Rigoll, ``A New Approach to Content-Based Video Indexing Using Hidden Markov Models'', Proc. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), June 1997, pp. 149-154.

11
R. C. Rose and D. B. Paul, ``A Hidden Markov Model Based Keyword Recognition System'', Proc. IEEE-ICASSP, Vol. 2, 1990, pp. 129-132.