CVDB 2005: Paper Session I: Multimedia Modeling and Querying

EXTENT: Fusing Context, Content, and Semantic Ontology for Photo Annotation

Edward Y. Chang, VIMA Technologies, Santa Barbara

This architecture paper presents EXTENT, a probabilistic framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. EXTENT fuses contextual information (location, time, and camera parameters), photo content (perceptual features), and semantic ontology in a synergistic way. It uses causal strengths to encode the causal relationships among variables, and between variables and semantic labels. Through a landmark-recognition case study, we show that EXTENT can provide high-quality annotation, substantially better than that of traditional unimodal methods.
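As a rough illustration of why fusing modalities can help, the sketch below combines per-cause causal strengths for a single semantic label. The abstract does not state how EXTENT combines strengths, so the noisy-OR rule used here is an assumption, chosen as one common way to combine independent causes. The function name, the variable names, and all numeric strengths are hypothetical and are not taken from the paper.

```python
from typing import Dict


def noisy_or(strengths: Dict[str, float], active: Dict[str, bool]) -> float:
    """Probability that a label holds under a noisy-OR model.

    Assumption: each active cause independently produces the label with
    probability equal to its causal strength. This is an illustrative
    combination rule, not necessarily the one EXTENT uses.
    """
    p_label_absent = 1.0
    for cause, strength in strengths.items():
        if active.get(cause, False):
            # The label stays absent only if every active cause fails.
            p_label_absent *= 1.0 - strength
    return 1.0 - p_label_absent


# Hypothetical causal strengths toward one landmark label.
strengths = {
    "gps_near_landmark": 0.7,    # contextual cue (location)
    "bridge_like_texture": 0.6,  # content cue (perceptual feature)
}

# Unimodal evidence: only the contextual cue is observed.
p_context = noisy_or(strengths, {"gps_near_landmark": True})

# Fused evidence: contextual and content cues are both observed.
p_fused = noisy_or(
    strengths,
    {"gps_near_landmark": True, "bridge_like_texture": True},
)

print(f"context only: {p_context:.2f}, fused: {p_fused:.2f}")
```

Under these made-up numbers the fused estimate is higher than the context-only estimate, which mirrors the abstract's qualitative claim without reproducing any result from the case study.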