Reference: Large-Scale Simultaneous Multi-Object Recognition and Localization via Bottom Up Search-Based Approach, Chun-Che Wu et al., National Taiwan University, 2012
Link to PDF: dl.acm.org/ft_gateway.cfm?id=2396359&ftid=1311354&dwn=1&CFID=154130435&CFTOKEN=59695977
[/b]
[li] Multiple-object recognition and localization over large-scale object classes.
[/li][li] Proposal: a bottom-up search-based approach that localizes the grid-based search candidates using a Markov Random Field (MRF).
[/li][li] Results: the proposed approach recognizes and localizes multiple objects simultaneously, which reduces response time while maintaining accuracy.
[/li][li] Experimental results show a 40% relative improvement over the state-of-the-art bag-of-words (BoW) model.
[/li][li] Proposes a bottom-up framework for recognizing multiple objects.
[/li][li] Recognizes multiple objects simultaneously with an MRF.
[/li][li] Applies to: object recognition, specifically extracting multiple objects locally from a single image by matching against object databases. Can be used to sort the objects in an image and to remove noise or uninteresting objects.
[/li][li] Theory or application: both.
[/li][li] Improves on: the bag-of-words model for object extraction and classification, going from one recognized object per image to multiple.
[/li][li] Efficient Subwindow Search (ESS) was the first proposal for multi-object recognition; it takes a top-down approach.
[/li][li] ESS iteratively searches for a better bounding window until convergence.
[/li][li] Adaptive Window Search (AWS) works as follows: first, find the most probable object in the image and use spatial verification to locate it; then, build a template-based window around the found object for further search.
[/li][li] Pros: grid-based recognition and search using Markov Random Fields.
[/li][li] Supports multi-object queries from a single image.
[/li][li] Combines approaches from earlier techniques (see AWS and ESS).
[/li][li] Pros of bottom-up multi-object recognition: it suppresses noise, which matters because the image as a whole contains a large number of noisy features.
[/li][li] Local information from each grid makes it possible to recognize multiple objects concurrently.
[/li][li] Cons / possible issues:
[/li][li] How to do multi-object recognition. Large-scale grid-based search and scoring: first, divide the query image into grids.
[/li][li] Calculate the similarity scores of each grid against the candidates.
[/li][li] Aggregate each candidate's grid similarity scores to form its score distribution over the grids.
[/li][li] All candidates compete for ownership of each grid via the MRF.
[/li][li] After optimizing the object partition, calculate the final score of each candidate.
[/li][li] Check spatial consistency between the query and the dataset candidates.
Notes: the similarity score is the intersection of normalized bag-of-words histograms.
Notes: an inverted file serves as the indexing structure, making similarity measurement between the query and large-scale object-class databases efficient.
[/li][li] Multi-object recognition by MRF: the candidates fight for grid ownership using their grid similarity scores.
[/li][li] Formulate the candidates' score distributions as a first-order MRF.
[/li][li] Define the graph G = (V, E), where V is the set of all vertices and E is the set of all edges in G. Each vertex v in V is a grid of the query image, and an edge joins two vertices if their grids are 8-connected in the query (adjacent horizontally, vertically, or diagonally).
[/li][li] Each candidate competes for ownership of the vertices.
[/li][li] They use the Hessian-Affine detector and the SIFT descriptor for local features, then apply hierarchical k-means to quantize the features into a BoW representation.
[/li][/ul]
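The large-scale grid-based search and scoring steps (divide the query into grids, score each grid, aggregate each candidate's score distribution), together with the note that similarity is the intersection of normalized BoW histograms, can be sketched in Python as below. This is a minimal sketch under assumptions: features arrive as hypothetical `(x, y, visual_word)` tuples, each database candidate is just a list of visual-word ids, and the function names and toy data are illustrative rather than taken from the paper.

```python
from collections import Counter


def grid_cell(x, y, width, height, rows, cols):
    """Return the (row, col) cell of a rows x cols grid that contains point (x, y)."""
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row, col


def intersection(a, b):
    """Intersection of two bag-of-words histograms after L1 normalization."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    return sum(min(a[w] / total_a, b[w] / total_b) for w in a if w in b)


def grid_score_distributions(features, width, height, rows, cols, candidates):
    """Step 1: bin features into grids. Step 2: score every grid against every
    candidate. Step 3: collect each candidate's scores into a rows x cols map."""
    grid_hists = [[Counter() for _ in range(cols)] for _ in range(rows)]
    for x, y, word in features:
        r, c = grid_cell(x, y, width, height, rows, cols)
        grid_hists[r][c][word] += 1
    distributions = {}
    for name, words in candidates.items():
        candidate_hist = Counter(words)
        distributions[name] = [
            [intersection(grid_hists[r][c], candidate_hist) for c in range(cols)]
            for r in range(rows)
        ]
    return distributions


# Toy query (hypothetical data): a 100 x 100 image split into a 2 x 2 grid.
features = [(10, 10, 3), (20, 20, 3), (30, 30, 7), (80, 80, 9), (90, 90, 9)]
candidates = {"mug": [3, 3, 7], "book": [9, 9, 12]}
dists = grid_score_distributions(features, 100, 100, 2, 2, candidates)
for name, grid in dists.items():
    print(name, [[round(s, 2) for s in row] for row in grid])
```

Each candidate ends up with one score per grid; that per-candidate map is the score distribution the MRF step then arbitrates.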
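The inverted-file note can be illustrated with a small sketch. It assumes each database object class is summarized by a bag of visual-word ids (hypothetical data) and that the score is the normalized-histogram intersection from the notes; the function names are illustrative. Because a word absent from either histogram contributes min(q, 0) = 0, visiting only the posting lists of the query's words yields the exact intersection while never touching objects that share no word with the query.

```python
from collections import Counter, defaultdict


def build_inverted_file(database):
    """Map each visual word to the (object id, normalized frequency) pairs of the
    database objects that contain it."""
    index = defaultdict(list)
    for obj_id, words in database.items():
        counts = Counter(words)
        total = sum(counts.values())
        for word, count in counts.items():
            index[word].append((obj_id, count / total))
    return index


def score_query(query_words, index):
    """Histogram-intersection score of the query against every database object.
    Only posting lists of words that occur in the query are visited."""
    counts = Counter(query_words)
    total = sum(counts.values())
    scores = defaultdict(float)
    for word, count in counts.items():
        query_weight = count / total
        for obj_id, object_weight in index.get(word, []):
            scores[obj_id] += min(query_weight, object_weight)
    return dict(scores)


# Hypothetical database of object classes, each reduced to visual-word ids.
database = {"mug": [3, 3, 7], "book": [9, 9, 12], "lamp": [7, 12]}
index = build_inverted_file(database)
scores = score_query([3, 7, 7, 12], index)
print(sorted(scores, key=scores.get, reverse=True))  # ['lamp', 'mug', 'book']
```

Objects that share no visual word with the query never appear in `scores`, which is equivalent to giving them a score of 0.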
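The MRF competition (candidates fighting for ownership of grid vertices on an 8-connected graph, then a final score per candidate) can be sketched as follows. The notes do not say how the MRF is optimized, so this sketch uses iterated conditional modes (ICM) with a Potts-style smoothness term as a simple stand-in. It also adds a hypothetical `"background"` label with a fixed score so that weak grids need not be claimed by any object. The parameter values, the background label, and the final-score rule (sum of a candidate's scores over the grids it owns) are all assumptions, not the paper's method.

```python
def eight_connected_edges(rows, cols):
    """Edges E of G = (V, E): one vertex per grid cell, one edge per pair of
    8-connected cells (each undirected edge listed once)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    edges.append(((r, c), (nr, nc)))
    return edges


def icm_ownership(scores, rows, cols, smoothness=0.3, background=0.1, max_iters=10):
    """Assign each grid to one candidate (or 'background') by locally minimizing
    sum_v -score(label_v, v) + smoothness * #{(u, v) in E : label_u != label_v}."""
    labels = list(scores) + ["background"]

    def unary(label, cell):
        r, c = cell
        return -(background if label == "background" else scores[label][r][c])

    neighbors = {(r, c): [] for r in range(rows) for c in range(cols)}
    for u, v in eight_connected_edges(rows, cols):
        neighbors[u].append(v)
        neighbors[v].append(u)

    def local_energy(label, cell, owner):
        disagreements = sum(owner[n] != label for n in neighbors[cell])
        return unary(label, cell) + smoothness * disagreements

    # Start from each grid's best label, ignoring its neighbours.
    owner = {cell: min(labels, key=lambda label: unary(label, cell)) for cell in neighbors}
    for _ in range(max_iters):
        changed = False
        for cell in neighbors:
            best = min(labels, key=lambda label: local_energy(label, cell, owner))
            if best != owner[cell]:
                owner[cell] = best
                changed = True
        if not changed:
            break
    return owner


def final_scores(scores, owner):
    """Hypothetical final score: sum of each candidate's scores over the grids it owns."""
    totals = {name: 0.0 for name in scores}
    for (r, c), label in owner.items():
        if label != "background":
            totals[label] += scores[label][r][c]
    return totals


scores = {
    "mug": [[0.8, 0.7, 0.1], [0.9, 0.3, 0.1], [0.8, 0.7, 0.1]],
    "book": [[0.1, 0.2, 0.6], [0.1, 0.35, 0.7], [0.1, 0.2, 0.6]],
}
owner = icm_ownership(scores, 3, 3)
print(owner[(1, 1)])  # mug
print(final_scores(scores, owner))
```

In this toy run the centre grid, which on its own scores slightly higher for `book`, ends up owned by `mug` because five of its eight neighbours belong to `mug`; that is the smoothing effect the pairwise term provides.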
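The notes say local features (Hessian-Affine detector, SIFT descriptor) are quantized into BoW visual words with hierarchical k-means. A minimal pure-Python sketch of hierarchical k-means (a vocabulary tree) follows, under toy assumptions: 2-D points stand in for 128-D SIFT descriptors, the branching factor and depth are tiny, and initialization is deterministic farthest-point seeding rather than whatever the authors used. The visual-word id here is the path of branch indices from the root to a leaf.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def build_tree(points, branching, depth):
    """Recursively split the descriptors with k-means; a node is a list of
    (center, child) pairs and a leaf is None."""
    if depth == 0 or len(points) < branching:
        return None
    centers = kmeans(points, branching)
    clusters = [[] for _ in centers]
    for p in points:
        clusters[nearest(p, centers)].append(p)
    return [(c, build_tree(cl, branching, depth - 1)) for c, cl in zip(centers, clusters)]


def quantize(descriptor, tree):
    """Descend greedily to a leaf; the path of branch indices is the visual-word id."""
    path, node = [], tree
    while node is not None:
        i = nearest(descriptor, [center for center, _ in node])
        path.append(i)
        node = node[i][1]
    return tuple(path)


# Toy 2-D "descriptors": two far-apart groups, each made of two tight blobs.
points = [(0, 0), (1, 0), (0, 1), (0, 4), (1, 4), (0, 5),
          (100, 0), (101, 0), (100, 1), (100, 4), (101, 4), (100, 5)]
tree = build_tree(points, branching=2, depth=2)
print([quantize(p, tree) for p in [(0.5, 0.5), (0.5, 4.5), (100.5, 0.5), (100.5, 4.5)]])
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Quantizing a descriptor costs about branching × depth distance computations rather than one per leaf, which is why hierarchical k-means scales to the large vocabularies that large-scale object databases need.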