Computer Vision

With the proliferation of digital video capture, storage, and communication devices, the amount of information in video form is growing rapidly in personal entertainment, security, and military applications. Sharing and managing this video content effectively is a technical challenge for existing information management systems. Retrieval systems based on semantic labels require substantial manual labeling of the content. They are therefore of limited use in retrieval scenarios where labels are absent or difficult to derive.

Consider the following example of a general video retrieval application. A mobile phone user has just watched a short (e.g., 5 s), low-quality (e.g., QCIF resolution at 10 fps) segment of a soccer game in an advertisement for the season. The user now wants to locate the complete game in SDTV/HDTV format, either in their personal collection of soccer videos or in a content provider's collection. The system must therefore search a video database using this 5 s segment as the query and return the location of the full-size program, if it exists. The 5 s query segment clearly carries no semantic labels, so the matching has to be content-based. The system must also cope with differences in temporal and spatial scale between the query and the stored program (e.g., a 10 fps QCIF clip versus a full-rate SDTV/HDTV broadcast), as well as with the noise and distortion introduced during transmission.