2006 International Conference on Image Processing

The 2006 International Conference on Image Processing was held in Atlanta, Georgia from last Sunday (10/08/2006) to today (10/11/2006). Thanks to Dr. Tsai, I was able to participate in this conference as a guest, and it was a very enjoyable experience. ICIP is a pretty big conference; this year it hosted 834 technical papers and posters.

I attended many lessons and talked with a number of authors in person at the poster sessions. My short reviews, which follow, cover specific research areas of interest to us. Before that, though I am writing this to share my experiences with you all, I should mention that the size of my memory is strictly limited. In fact, mine is never expandable but only shrinkable, and even worse, it is very volatile. So I beg your generosity regarding this problem of mine.

Let me start this review in chronological order. You may find the full program list here. The materials and conference CD will be available for copying at this week's group meeting.


Seven tutorials were held on Sunday. Based on Dr. Tsai's recommendation, I chose two: Digital Color Management: Encoding Solutions, and Image Processing Techniques in Computer-Aided Detection and Diagnosis.

(TUT-2) Image Processing Techniques in Computer-Aided Detection and Diagnosis (CAD)

Dr. Gurcan is currently a research assistant professor in the Biomedical Informatics Department of Ohio State University. He looks rather young to be a tutorial instructor, but his presentation went smoothly, with plenty of material drawn from his work over many years. It was a nice exposure for me to an area I had never been interested in before. His lecture came in two parts: an introduction to CAD, and the image processing technologies used for CAD.

A number of data types are used in these applications: X-rays (chest, breast, ..), CT scans, MRI, etc. (let us call all of these "body scans" in this review). Basically, the objective of CAD is to help the radiologists who read these body scans detect abnormalities such as tumors (mostly malignant ones), clogged vessels, etc. An implemented system is expected to show a very high True-Positive rate and a very low False-Positive rate. A number of image processing technologies are actively used here, such as texture and boundary recognition for region segmentation, to isolate suspicious areas in the body scan data. So region segmentation using edge connection, contour correction, classification, and clustering are the dominant and most challenging areas in this application.

At a glance, heuristic approaches still dominate in many places. However, considering that this area is relatively new compared to the long history of general computer vision, I expect more and more computer vision technologies to flow into this field. In short, there is plenty of room for improvement in this medical image processing domain. Please see the tutorial textbook for details. The difficulty that kept bothering me during the lesson was that our research area, and specifically my case of handling natural scene images captured in the street, has a very different domain environment that requires content analysis in a very different context.

The second part of his lecture was based on many new slides missing from the tutorial handout, so I asked for a copy of his slides and he promised to send me one. When I get them, I will let you know.

(TUT-4) Digital Color Management: Encoding Solutions

Thomas E. Madden of Eastman Kodak Company, author of the book of the same title, gave the tutorial in, say, a professional way that was worth the money. He clearly showed, through many examples, why colors look so different to humans. His presentation confirmed that the problems I am struggling with in the TSPR project are very common and, in fact, unavoidable in our application domain. Though his presentation did not lead me to a solution (it is not easy at all), it opened my eyes to more factors to consider, and most promisingly, he showed interest in our project and asked me to send him an email. I will report back as soon as I get his reply. This tutorial is basically a summary of his book, with a lot of content (100 pages!). Sadly, the tutorial material is printed in black and white, and he could not share his presentation slides with us due to copyright, since the material is already published. So it would be great if Dr. Tsai could order his book for us; it costs $199, and yes, it is a COLOR book.

Anyway, his talk was biased toward the problem of delivering color from source to target the way the user intends. The capture and viewing environments he considers are rather constrained (at minimum an indoor environment with a limited number of light sources), so his method is not directly applicable to our case. However, he taught us how, when we capture an image using a digital camera, reflective scanner, or slide, to process the captured color values in order to render them on a screen, on paper, or through a projector. Obviously, many factors that affect the appearance of colors are involved in this process. He explained why these factors, some of which I had not been aware of, including brightness, luminance, flare, and human eye adaptation, significantly change the color values people believe they are seeing.

Additionally, he introduced a new appearance-based color encoding method under development in collaboration with several other large industrial companies. He argued that the existing color spaces that can be derived from the RGB color space (and vice versa) cannot predict color appearance, because they lack actual viewing-environment factors such as room brightness, light source, and other conditions.


I tried to attend as many lessons and poster presentations as possible, but with a single body there are clear limits. I chose topics based on their likely relevance to our research approaches. PDF files are attached at the end of each review, or you may get the full copy on one CD from me. My reviews are limited to my own understanding of the works and may be too brief for you to follow, so please download and open the PDF files of interest to you. Some PDF files include my personal notes, embedded while I was listening to the lectures; just ignore them when you print the PDFs. Below is the list of sessions and poster presentations I attended; within those, selected articles are reviewed.

  • (MA-L5) Content Summarization and Clustering
  • (MP-L1) Visual Tracking
  • (TA-L1) Image Segmentation
  • (TP-P9) Stereoscopic and 3-D Processing
  • (WA-L6) Knowledge-Based Image Processing for Classification and Recognition in Surveillance Applications
  • (WP-L4) Semantic Indexing and Retrieval of Images
  • (WP-P8) Image/Video Processing Applications: Models and Methods

Selected Presentations of Interest

Model-free, Statistical Detection and Tracking of Moving Objects

  • Authors: Mark Ross; University of Koblenz
  • Abstract: A novel statistical approach for detection and tracking of objects is presented here, which uses both edge and color information in a particle filter. The approach does not need any prior models of the objects of interest or of the scene. It starts with homogeneous regions as tracking primitives and creates complex objects by merging similar moving regions. Even partially occluded objects in a sequence captured by a moving camera can be tracked efficiently and robust.
  • Review: Their work tracks moving objects using initially generated particles: at each video frame, they find the best match for each particle from the previous frame, then compute the particles' movement to segment the moving-object region from the background. Their video demo shows that the algorithm works well even when multiple moving objects are occluded by other objects. So it looks interesting for our approach of tracking traffic signs in video. However, our video may be taken from a moving car, so the big difference is that everything is moving in our case: no background! We may place particles on the specific colors of our interest after color decomposition, track those particles through the video, and extract and estimate their boundary shape to recognize the sign plate type. I remember that OpenCV provides a basic (not sophisticated) example of this kind of algorithm; search for camshiftdemo under their samples directory.
  • File: Media:0000557.pdf
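To make the predict/match/resample loop concrete for myself, here is a minimal bootstrap particle filter sketch. It is not their algorithm; the Gaussian `measure` function standing in for their edge-and-color likelihood, and all the constants, are my own invented placeholders.

```python
import random

def particle_filter_step(particles, measure, motion_std=2.0):
    """One predict-weight-resample cycle of a minimal bootstrap particle filter.

    particles: list of (x, y) hypotheses for the object center.
    measure:   function (x, y) -> likelihood, e.g. a color-match score.
    """
    # Predict: diffuse each particle with Gaussian motion noise.
    moved = [(x + random.gauss(0, motion_std), y + random.gauss(0, motion_std))
             for x, y in particles]
    # Weight: score each hypothesis against the current frame's observation.
    w = [measure(x, y) for x, y in moved]
    total = sum(w)
    if total == 0.0:
        w = [1.0 / len(w)] * len(w)  # no match anywhere: keep all hypotheses
    else:
        w = [wi / total for wi in w]
    # Resample: draw a new particle set proportionally to the weights.
    return random.choices(moved, weights=w, k=len(moved))

def estimate(particles):
    """Mean of the particle cloud as the object-position estimate."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)
```

Starting from particles scattered over the frame, repeated steps concentrate the cloud on the region where `measure` is high, which is the behavior their demo showed under occlusion.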

Recognition of Multi-Object Events using Attribute Grammars

  • Authors: Seong-Wook Joo; University of Maryland, Rama Chellappa; University of Maryland
  • Abstract: We present a method of representing and recognizing visual events using attribute grammars. In contrast to conventional grammars, attribute grammars are capable of describing features that are not easily represented by finite symbols. Our approach handles multiple concurrent events involving multiple entities by associating unique object identification labels with multiple event threads. Probabilistic parsing and probabilistic conditions on the attributes are used to achieve a robust recognition system. We demonstrate the effectiveness of our method for the task of recognizing vehicle casing in parking lots and events occurring in an airport tarmac.
  • Review: Think of traffic sign recognition as a sequence of causal event detections. For instance, imagine watching a traffic video and trying to find a traffic sign approaching our camera. You will first see its boundary, and only later identify its inner contents as it looks bigger and bigger. This paper presents object recognition using the temporal relationships between primitive events like stop, start, disappear, isPerson, etc. They provide a typed grammar, called Attribute Grammars, so that a user may specify an event sequence of interest much like writing a computer program. As a matter of fact, their implementation is essentially a state model with a probability at each state transition, but the expressive power of Attribute Grammars makes their work look like a more semantic approach. I have also been thinking that we need some kind of sign object description method to represent the relationships between the inner objects of a traffic sign, so this was interesting to me.
  • File: Media:0002897.pdf
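As a toy illustration of the state-model-with-transition-probabilities view (not their actual attribute-grammar parser), an event sequence can be scored against hand-written rules. The "vehicle casing" states, event names, and probabilities below are all made up by me for the example.

```python
def sequence_probability(transitions, events, start="START"):
    """Probability that an event sequence is derivable from a simple
    probabilistic state model.

    transitions: dict mapping (state, event) -> (next_state, probability).
    Returns 0.0 if some event is not allowed in the current state.
    """
    state, prob = start, 1.0
    for ev in events:
        if (state, ev) not in transitions:
            return 0.0
        state, p = transitions[(state, ev)]
        prob *= p
    return prob

# A made-up "vehicle casing" model: a car appears, may park and move again,
# and finally disappears.
CASING = {
    ("START", "appear"):     ("MOVING", 1.0),
    ("MOVING", "stop"):      ("PARKED", 0.6),
    ("MOVING", "disappear"): ("GONE", 0.4),
    ("PARKED", "start"):     ("MOVING", 1.0),
}
```

Their actual system attaches attributes (object IDs, positions) to the symbols and parses probabilistically, but the scoring idea is the same.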

Generation of Long-Term Color and Motion Coherent Partitions

  • Authors: Camilo Dorea, Montse Pardas, Ferran Marques
  • Abstract: This paper describes a technique for generating partition sequences of regions presenting long-term homogeneity in color and motion coherency in terms of affine models. The technique is based on region merging schemes compatible with hierarchical representation frameworks and can be divided into two stages: Partition Tracking and Partition Sequence Analysis. Partition Tracking is a recursive algorithm whereby regions are constructed according to short-term spatio-temporal features, namely color and motion. Partition Sequence Analysis proposes the Trajectory Adjacency Graph (TAG) to exploit the long-term connectivity relations of tracked regions. A novel trajectory merging strategy using color homogeneity criteria over multiple frames is introduced. Algorithm performance is assessed and comparisons to other proposals are drawn by means of established evaluation metrics [1].
  • Review: This is another region segmentation method for video. It shows the fundamental procedures for classifying and segmenting regions from video robustly, even when the camera pans or moves. Their video demonstration showed that their algorithm adapts to camera movement very well. Technically, they use color values to segment regions, and their procedure runs sequentially through motion compensation, marker fitting, region growing and creation, and motion-based splitting.
  • File: Media:0000581.pdf

Multispectral Object Segmentation and Retrieval in Surveillance Video

  • Authors: Ciarán Ó Conaire, Noel O’Connor, Eddie Cooke, Alan Smeaton
  • Abstract: This paper describes a system for object segmentation and feature extraction for surveillance video. Segmentation is performed by a dynamic vision system that fuses information from thermal infrared video with standard CCTV video in order to detect and track objects. Separate background modelling in each modality and dynamic mutual information based threshold are used to provide initial foreground candidates for tracking. The belief in the validity of these candidates is ascertained using knowledge of foreground pixels and temporal linking of candidates. The Transferable Belief Model is used to combine these sources of information and segment objects. Extracted objects are subsequently tracked using adaptive thermo-visual appearance models. In order to facilitate search and classification of objects in large archives, retrieval features from both modalities are extracted for tracked objects. Overall system performance is demonstrated in a simple retrieval scenario.
  • Review: This paper presents a method for merging two different sensor types, infrared and normal CCTV video, for video surveillance. One thing that caught my interest is the Transferable Belief Model (TBM), which I heard of for the first time. The TBM is described as a framework for expressing degrees of belief and for combining evidence from various sources. It also provides a powerful framework for expressing uncertainty (or doubt). Given some evidence, it allows the construction of a basic belief assignment function m : Ω → [0, 1], allocating a belief mass to each possibility such that the masses sum to 1 over all possibilities. I will take a look into the TBM later.
  • File: Media:0002381.pdf
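To see how the evidence-combination step works, here is a sketch of the TBM's unnormalized conjunctive rule on basic belief assignments. The frame of discernment and the masses (a visual cue favoring "person", a thermal cue favoring "car") are invented by me for illustration.

```python
def conjunctive_combine(m1, m2):
    """Combine two basic belief assignments m : 2^Omega -> [0, 1] with the
    unnormalized conjunctive rule of the Transferable Belief Model.

    m1, m2: dicts mapping frozenset of hypotheses -> mass (masses sum to 1).
    Conflicting mass is assigned to the empty set rather than renormalized,
    which is how the TBM keeps an explicit measure of conflict.
    """
    out = {}
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b  # evidence supports the intersection of the two sets
            out[inter] = out.get(inter, 0.0) + mass_a * mass_b
    return out
```

The mass left on the empty set after combination quantifies how much the two sensors disagree, which is useful when deciding whether a foreground candidate is valid.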

Knowledge-based supervised learning methods in a classical problem of video object tracking

  • Authors: Lionel Carminati, Jenny Benois-Pineau, Christian Jennewein
  • Abstract: In this paper we present a new scheme for detection and tracking of specific objects in a knowledge-based framework. The scheme uses a supervised learning method: Support Vector Machines. Both problems, detection and tracking, are solved by a common approach: objects are located in video sequences by a SVM classifier. They are then tracked along the time by a SVM tracker with complete 6 parameters affine model. The method is applied in a video surveillance environment for detection and tracking of frontal view faces. Real time application constraints are met by reduction of support vector set.
  • Review: During this conference, I found many presentations using Support Vector Machines (SVMs) for classification. This article is an extreme case that uses SVMs at every step of object recognition: face detection, tracking, and support-set reduction for recognition. I will look into their work in detail later to see whether SVMs are really such an omnipotent solution, or whether they were simply reluctant to employ other methods. Besides that, they used an affine transformation to estimate the size and orientation of the moving object. This is very relevant to our approach of modeling the distortion of the sign boundary caused by changes in viewer orientation.
  • File: Media:0002385.pdf
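For my own notes, the 6-parameter affine model they track can be recovered exactly from three point correspondences. This is a plain Cramer's-rule sketch, not their SVM tracker; the function names are mine.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(M, v):
    """Solve a 3x3 linear system M x = v by Cramer's rule."""
    d = det3(M)
    out = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = v[r]
        out.append(det3(Mc) / d)
    return out

def estimate_affine(src, dst):
    """Recover (a, b, tx, c, d, ty) with x' = a*x + b*y + tx and
    y' = c*x + d*y + ty from exactly three point correspondences."""
    M = [[x, y, 1.0] for x, y in src]
    a, b, tx = solve3(M, [p[0] for p in dst])
    c, d, ty = solve3(M, [p[1] for p in dst])
    return a, b, tx, c, d, ty
```

Three tracked corners of a sign plate would be enough to estimate how the whole boundary is being sheared and scaled between frames.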

An adaptive mixture color model for robust visual tracking

  • Authors: Antoine Lehuger, Patrick Lechat, Patrick Pérez
  • Abstract: Global color characterization is a very powerful tool to model in a simple yet discriminant way the visual appearance of complex objects. A fixed reference model of this type can be used within both deterministic and probabilistic sequential estimation frameworks to track robustly targets that undergo drastic changes of shape and detailed appearance. However, changes of illumination as well as occlusions require that reference model is updated while avoiding drift. Within the particle filtering framework, we propose to address this adaptation problem using a dynamic mixture of color models with two components which are respectively fixed and rapidly updated. The merit of this approach is demonstrated on the problem of player tracking in team sport videos.
  • Review: This was a very fascinating demonstration of tracking objects under changing illumination. As is well known, object tracking in such an environment is very difficult. For instance, if an object of interest enters a shadow, conventional color- or texture-based tracking methods easily lose it. The idea of this work is simply to create a bigger bounding box overlapping the object, compute the illumination changes in real time, and track the object adaptively.
  • File: Media:0000573.pdf
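The fixed-plus-adaptive idea can be sketched with two color histograms mixed by a weight: the reference component never drifts, while the adaptive component chases illumination changes. The histogram values, mixing weight, and update rate below are invented for illustration, not taken from the paper.

```python
def bhattacharyya(p, q):
    """Similarity between two normalized histograms (1.0 = identical)."""
    return sum((pi * qi) ** 0.5 for pi, qi in zip(p, q))

def mixture_likelihood(obs, ref, adapt, alpha=0.5):
    """Score an observed color histogram against a two-component model:
    a fixed reference component and a rapidly updated component."""
    return (alpha * bhattacharyya(obs, ref)
            + (1 - alpha) * bhattacharyya(obs, adapt))

def update_adaptive(adapt, obs, rate=0.3):
    """Blend the adaptive component toward the latest observation."""
    return [(1 - rate) * a + rate * o for a, o in zip(adapt, obs)]
```

When the object enters a shadow, the reference score collapses but the adaptive component quickly recovers the match, so the mixture keeps a usable likelihood without letting the model drift away permanently.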

Perceptual Feature Selection for Semantic Image Classification

  • Authors: Dejan Depalov, Thrasyvoulos N. Pappas, Dongge Li, Bhavan Gandhi
  • Abstract: Content-based image retrieval has become an indispensable tool for managing the rapidly growing collections of digital images. The goal is to organize the contents semantically, according to meaningful categories. In recent papers we introduced a new approach for semantic image classification that relies on the adaptive perceptual color-texture segmentation algorithm proposed by Chen et al. This algorithm combines knowledge of human perception and signal characteristics to segment natural scenes into perceptually uniform regions. The resulting segments can be classified into semantic categories using region-wide features as medium level descriptors. Such descriptors are the key to bridging the gap between low-level image primitives and high-level image semantics. The segment classification is based on linear discriminant analysis techniques. In this paper, we examine the classification performance (precision and recall rates) when different sets of region-wide features are used. These include different color composition features, spatial texture, and segment location. We demonstrate the effectiveness of the proposed techniques on a database that includes 9000 segments from approximately 2500 photographs of natural scenes.
  • Review: I want to give this paper five stars considering its relevance to our work, though the actual usability of their technology is another matter. The basic assumption behind their approach to image segmentation is that a human eye cannot perceive many different colors at one time. So they reduce the number of colors by grouping the color values in descending frequency order and adaptively selecting a few top colors, which they call adaptive dominant colors, for segmentation, while also considering blob texture. Classification and categorization are done with the K-means algorithm, and for training and classification they use Linear Discriminant Analysis (LDA) [14]. LDA belongs to the class of linear classifiers, which try to find a projection onto a lower-dimensional space such that samples from different classes are well separated. They are at an early stage of this approach: currently, only a single segmented blob is considered an object, which means any two or more connected color objects are counted as n objects, and one object or blob can have only one color and one texture. I thought that if they extend their work by analyzing the spatial relationships between the recognized blobs, the name "semantic image classification" will make more sense. Anyhow, many people in the room agreed that reducing the number of colors, i.e., using a dominant set of colors, is a reasonable and practical approach that generally shows good performance in image segmentation. Technically, we may look at the adaptive dominant color method referenced in their paper and, in addition, at how to merge the recomposed colors with texture analysis results to create segmented blobs.
  • File: Media:0002921.pdf
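A crude version of the dominant-color reduction (frequency-ranked coarse RGB bins, not their adaptive perceptual algorithm) can be sketched like this; the bin size and pixel values are my own placeholders.

```python
from collections import Counter

def dominant_colors(pixels, k=4, bin_size=32):
    """Rank coarse RGB bins by frequency and keep the top k as the
    'dominant colors' used for segmentation."""
    bins = Counter((r // bin_size, g // bin_size, b // bin_size)
                   for r, g, b in pixels)
    return [bin_id for bin_id, _ in bins.most_common(k)]

def quantize(pixels, palette, bin_size=32):
    """Map every pixel onto its nearest dominant-color bin."""
    def dist(p, c):
        return sum((pi // bin_size - ci) ** 2 for pi, ci in zip(p, c))
    return [min(palette, key=lambda c: dist(p, c)) for p in pixels]
```

After quantization, every pixel carries one of only k labels, so connected same-label runs become the candidate blobs that their texture analysis would then refine.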

A Real-Time Adaptive Thresholding for Video Change Detection

  • Authors: Chang Su, Aishy Amer
  • Abstract: A real-time adaptive thresholding algorithm for change detection is proposed in this paper. Based on the estimation of the scatter of regions of change in a difference image, a threshold of each image block is computed discriminatively, then the global threshold is obtained by averaging all the thresholds for image blocks. The block threshold is calculated differently for regions of change and background. Experimental results show the proposed thresholding algorithm performs well for change detection with high efficiency.
  • Review: A good idea: applying different thresholds to each subregion adaptively, in real time. The downside is that their method only handles gray-level images. The Jaccard similarity coefficient (JC) is introduced to measure agreement with the ground-truth data. This measure was also introduced in the CAD tutorial as an index of recognition performance built from True-Positive, True-Negative, False-Positive, and False-Negative classifications.
  • File: Media:0000157.pdf

Summarization and Indexing of Human Activity Sequences

  • Authors: B. Song, N. Vaswani, Amit Roy-Chowdhury
  • Abstract: In order to summarize a video consisting of a sequence of different activities, there are three fundamental problems: tracking the objects of interest, detecting the activity change times and recognizing the new activity. This paper presents an algorithm for achieving all these three tasks simultaneously and presents results on how it can used for indexing and summarizing a real-life video sequence. Human activities are represented by a model for the dynamics of the shape of the human body contour. Measures are designed for detecting both gradual transitions and sudden changes between activity models.
  • Review: One thing that popped into my mind during this presentation is whether we could compare the boundary shapes of traffic signs not based on pixel locations and their spatial relationships, but based on the differences between the tangent vectors along the boundary. This would make shape recognition robust to image orientation changes such as rotation or scaling. Just an idea.
  • File: Media:0002925.pdf
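To check my own idea, here is a sketch of a tangent-based boundary descriptor: the sequence of turning angles along a closed polygon is unchanged by translation, scale, and rotation. This is my sketch, not anything from the paper.

```python
import math

def turning_angles(polygon):
    """Tangent-direction changes (turning angles) along a closed polygon
    boundary, one per vertex; invariant to translation, scale, rotation."""
    n = len(polygon)
    headings = []
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = []
    for i in range(n):
        d = headings[(i + 1) % n] - headings[i]
        while d <= -math.pi:      # wrap into (-pi, pi]
            d += 2.0 * math.pi
        while d > math.pi:
            d -= 2.0 * math.pi
        turns.append(d)
    return turns
```

A rotated or scaled sign boundary yields the same turning-angle sequence (up to the starting vertex), so matching could be done on these sequences instead of on pixel positions.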

Selected Posters of Interest

Developing an Efficient Region Growing Engine for Image Segmentation

  • Authors: Emanuel Gofman
  • Abstract: Image segmentation is a crucial part of image processing applications. Currently available approaches require significant computer power to handle large images. We present an efficient region growing algorithm for the segmentation of multi-spectral images in which the complexity of the most time-consuming operation in region growing, merging segment neighborhoods, is significantly reduced. In addition, considerable improvement is achieved by preprocessing, where adjacent pixels with close colors are gathered and used as initial segments. The preprocessing provides substantial memory savings and performance gain without a noticeable influence on segmentation results. In practice, there is an almost linear dependency between the runtime and image size. Experiments show that large satellite images can be processed using the new algorithm in a few minutes on a moderate desktop computer.
  • Review: An interesting approach to driving down the computing cost of image segmentation by converting a 2D image into a 1D vector and then growing the regions separately.
  • File: Media:0002413.pdf
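For comparison with their optimized engine, here is the plain region-growing baseline they improve upon; the gray-level tolerance criterion below is my stand-in for their multi-spectral color-closeness test.

```python
def grow_regions(image, tol=10):
    """Label connected regions of a gray-level image: start a region at
    each unlabeled pixel and absorb 4-neighbors within tol of the seed."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            seed, stack = image[sy][sx], [(sy, sx)]
            labels[sy][sx] = n_regions
            while stack:  # flood-fill outward from the seed
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(image[ny][nx] - seed) <= tol):
                        labels[ny][nx] = n_regions
                        stack.append((ny, nx))
            n_regions += 1
    return labels, n_regions
```

Their contribution is making the neighborhood-merging step of this kind of loop cheap enough that runtime grows almost linearly with image size.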

Automation of Pavement Surface Crack Detection Using the Continuous Wavelet Transform

  • Authors: Peggy Subirats, Jean Dumoulin, Vincent Legeay, Dominique Barba
  • Abstract: This paper presents a new approach in automation for crack detection on pavement surface images. The method is based on the continuous wavelet transform. In the first step, a separable 2D continuous wavelet transform for several scales is performed. Complex coefficient maps are built. The angle and modulus information are used to keep significant coefficients. Then, wavelet coefficients maximal values are searched and their propagation through scales is analyzed. Finally, a post-processing gives a binary image which indicates the presence or not of cracks on the pavement surface image.
  • Review: I had a long talk with the author, since I have heard about this topic from Dr. Tsai many times. Though I am not familiar with the details of this application, the techniques they use to detect cracks on the road make good sense to me. First, their poster showed the special truck used to photograph the road surface: it has a row of lights and cameras pointed at the ground, and runs at 30 to 80 km/h on average. That is pretty fast, and in fact it has to drive fast to avoid causing accidents. The actual captured road images looked too poor for me to differentiate a road crack from the road texture. The key idea of their approach is finding connected line segments longer than a threshold. To extract these connected lines, i.e., cracks, they create complex coefficient maps from a 2D continuous wavelet transform and apply windows at multiple scales, from coarse to fine, to find connected line segments of the desired length. Since I am not that familiar with this area, it was difficult to catch everything at a glance, but their work looks quite reasonable. Ibrahima may want to look at this paper; I put the author's contact information at the end of this review.
  • File: Media:0003037.pdf

Determination of Color Space for Accurate Change Detection

  • Authors: Youngbae Hwang, Jun-Sik Kim, In So Kweon
  • Abstract: Most change detection methods are based on gray-level images. A gray-level image is regarded as a 1-D projection of three channels of color images. Therefore, more precise change detection results are expected by utilizing color information. We previously developed a change detection scheme using color images. In this paper, we determine which color space should be selected for accurate change detection based on our previous detection scheme. Our method can be applied to various color spaces, including gray-level images. Then we can measure the expected number of error pixels in order to select an appropriate color space which gives the best result among various color spaces. The experiments show that selecting a color space based on measurements results in the fewest error pixels.
  • Review: Do you remember that I once showed you a picture of various color spaces, asking whether we might use a color space other than RGB to compute color distances? This paper addresses exactly that: it detects moving objects by selecting among color spaces adaptively. Two things are given in their work: a Generalized Exponential Model (GEM) for binarization, and the Expected Number of False Pixels (ENFP) to predict which color space will work best at a given time.
  • File: Media:0003021.pdf
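The effect of the color-space choice is easy to demonstrate: a pure brightness change registers strongly in gray level but vanishes in normalized chromaticity. This is my own toy sketch (one-pixel "frames", simple luma and rg-chromaticity conversions), not their GEM/ENFP machinery.

```python
def mean_diff(frame1, frame2, convert):
    """Mean per-pixel Euclidean distance after mapping RGB pixels into a
    chosen color space via `convert`."""
    total = 0.0
    for p, q in zip(frame1, frame2):
        a, b = convert(p), convert(q)
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return total / len(frame1)

def gray(p):
    """Standard luma approximation of the gray level."""
    r, g, b = p
    return (0.299 * r + 0.587 * g + 0.114 * b,)

def chroma(p):
    """Normalized rg chromaticity: brightness-invariant color coordinates."""
    s = sum(p) or 1.0
    return (p[0] / s, p[1] / s)
```

A change detector thresholding the gray difference would fire on a mere illumination change, while the chromaticity difference stays near zero; picking the space with the fewest expected false pixels is exactly the selection problem the paper formalizes.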

Missed presentations and posters that may deserve a look

Generative Models for License Plate Recognition By Using a Limited Number of Training Samples

  • Authors: Alessandro Mecocci, Capasso Tommaso
  • Abstract: Increased mobility and internationalization open new challenges to develop effective traffic monitoring and control systems. This is true for automatic license plate recognition architectures that, nowadays, must handle plates from different countries with different character sets and syntax. While much emphasis has been put on the license plate localization and segmentation, little attention has been devoted to the huge amount of samples that are needed to train the character recognition algorithms. Nevertheless, these samples are difficult to get when dealing with an international-wide scenario that involves many different countries and the related legislations. This paper reports a new algorithm for License Plate recognition, developed under a joint research funded by Autostrade per l’Italia S.p.A., the main Italian highways company. The research aimed at achieving improved recognition rates when dealing with vehicles coming from different European and nearby states. Extensive experimental tests have been performed on a database of about 7.000 images comprising License Plates picked up by portals spread nationally. The overall rate of correct classification is 98.1%
  • File: Media:0002769.pdf

A New Method for Boundary-Based Shape Matching and Retrieval

  • Authors: Minh-Son Dao, Raffaele De Amicis
  • Abstract: This paper presents a novel method for efficient boundary-based shapes matching and retrieval in presence of occlusion. In this method, the geometric and topological information of boundary curves are encoded in the form of longest common sub-curves (LCS) graphs and their similarity is estimated by graph matching. B-Spline is used for approximating the original boundary, then inflection points are detected to split such a B-spline to convex/concave segments. The characteristic string is constructed based on these segments’ canonical frame. After LCS candidates are found, their graphs which are constructed by using its segments as vertices and the weighted walkthrough (WW) between two segments as edges are compared to obtain the optimal match. Thorough experimental results and comparisons demonstrate that our method outperforms traditional LCS- or dynamic programming-based methods in shape matching and enhances the quality of inexact shape retrieval, in particular in the presence of occlusion and affine transformation.
  • File: Media:0001485.pdf

Attention-based vanishing point detection

  • Authors: Fred Stentiford
  • Abstract: Perspective is a fundamental structure that is found to some extent in most images that reflect 3D structure. It is thought to be an important factor in the human visual system for obtaining understanding and extracting semantics from visual material. This paper describes a method of detecting vanishing points in images that does not require prior assumptions about the image being analysed. It enables 3D information to be inferred from 2D images. The approach is derived from earlier work on visual attention that identifies salient regions and translational symmetries.
  • File: Media:0000417.pdf

A Heuristic Binarization Algorithm for Documents with Complex Background

  • Authors: George Cavalcanti, Eduardo Silva, Cleber Zanchettin, Byron Bezerra, Rodrigo Dória, Juliano Rabelo
  • Abstract: This paper proposes a new method for binarization of digital documents. The proposed approach performs binarization by using a heuristic algorithm with two different thresholds and the combination of the thresholded images. The method is suitable for binarization of complex background document images. In experiments, it obtained better results than classical techniques in the binarization of real bank checks.
  • File: Media:0000389.pdf

Author contacts

From their business cards:

  • Thomas E. Madden: Principal Scientist, Consumer Digital Image Science Group, Eastman Kodak Company, Rochester, New York 14650. (585) 477-3379, thomas.madden@kodak.com
  • Emanuel Gofman, Research Staff, IBM Research Lab in Haifa, gofman@il.ibm.com, Office:+972-4-829-6237, Fax: +972-4-8296-6114, Mobile: +972-52-600-6366
  • Dominique Barba, Professor, Deputy director of IRCCyN, Rue Christian Pauc - La Chantreie - BP 50609 - 44306 NANTES CEDEX 3 - FRANCE. Tel: 33(0)2 40 68 30 22, Fax: 33 (0) 2 40 68 32 32, dominique.barba@polytech.univ-nantes.fr, http://www.polytech.univ-nantes.fr