| Mobile Media Metadata (MMM)
Mobile phones with media creation capabilities are
rapidly entering the marketplace in the USA and already have significant
market presence in Asia and Europe. In 2003, more cameraphones were sold worldwide than digital cameras. Cameraphones will bring
about a revolution in consumer imaging because they are not only networked,
but programmable. Software developers can write applications for these
mobile imaging devices, and software can fundamentally change the imaging
experience at the point of image capture. The opportunity is to create
software solutions for cameraphones that can address long-standing
challenges in consumer media creation, sharing, management, and reuse
in a fundamentally new way. We can do this by leveraging the spatio-temporal
context and social community of media capture and use (when, where,
and by whom media is captured, shared, and used) to infer media content,
context, and community and thereby help automate media annotation, retrieval,
sharing, and reuse. As a result of this approach, we believe we will
solve a fundamental problem in consumer adoption of mobile media services:
the need to have content-based access to the media consumers capture
on their mobile devices.
We have conducted fairly large-scale deployments and
user testing of our MMM prototypes with 60 users using MMM1 on the Nokia
3650 cameraphone in 2003-2004 and 60 users using MMM2 on the Nokia
7610 cameraphone in 2004-2005. Our SIMS graduate student users in
IS202
Information Organization and Retrieval have also worked in project
teams to develop numerous innovative mobile media application concepts
based on MMM1
and MMM2.
Mobile Media Metadata 2 (MMM2)

In our MMM research, we leverage regularities in media
and metadata created by communities of users that share common spatial,
temporal, and social contexts to make inferences about the content,
context, and community of media captured on mobile devices (especially
cameraphones). MMM2 uses "context-to-community" inferencing
to support the sharing of mobile media by inferring the likely recipients
for media captured on cameraphones based on the user's and the community's
prior sharing history and contextual metadata such as the time of capture,
CellID and GPS location, and Bluetooth-sensed human co-presence. We
have seen a 2189% increase in the number of photos uploaded per user
per day in MMM2 (1.31) compared to MMM1 (0.06), which appears to be due to
several factors: better image quality in the Nokia 7610 than in the Nokia
3650 (1 megapixel vs. VGA image resolution, plus "night mode" for low light
and digital zoom on the 7610); greater familiarity of the user population
with cameraphones (12 prior cameraphone users in 2004 vs. only 1 in 2003);
the availability of only one camera application in MMM2, rather than two
as in MMM1; automatic
background upload of photos to the MMM2 web photo management application;
and automatic suggestion of sharing recipients on the cameraphone and
in the web application. Our qualitative and quantitative studies have
shown that MMM2 users are pleased with the share guesser's ability to
suggest sharing recipients based on prior sharing history and contextual
metadata, and that they share on average 26% of the photos they capture
and upload with MMM2.
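The "context-to-community" idea described above can be illustrated with a small sketch. This is not the MMM2 share guesser itself: the `ShareEvent` record, the similarity weights, and the function names are all hypothetical, chosen only to show how prior sharing history and contextual metadata (time of capture, CellID, Bluetooth-sensed co-presence) might be combined to rank likely recipients.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareEvent:
    """One photo and its capture context (hypothetical record layout)."""
    hour: int              # hour of capture, 0-23
    cell_id: str           # CellID at capture time
    nearby: frozenset      # people sensed nearby via Bluetooth
    recipients: frozenset  # who the photo was shared with (empty if new)


def context_similarity(a, b):
    """Score how alike two capture contexts are (weights are assumptions)."""
    score = 0.0
    if a.cell_id == b.cell_id:
        score += 1.0
    # Circular distance between hours, so 23:00 and 00:00 are one hour apart.
    hour_gap = min(abs(a.hour - b.hour), 24 - abs(a.hour - b.hour))
    if hour_gap <= 1:
        score += 0.5
    # Jaccard overlap of the Bluetooth-sensed people.
    union = a.nearby | b.nearby
    if union:
        score += len(a.nearby & b.nearby) / len(union)
    return score


def guess_recipients(new_photo, history, top_n=3):
    """Rank past recipients by how similar their photos' contexts are."""
    votes = Counter()
    for past in history:
        weight = context_similarity(new_photo, past)
        if weight > 0:
            for recipient in past.recipients:
                votes[recipient] += weight
    return [name for name, _ in votes.most_common(top_n)]
```

Each past share votes for its recipients with a weight equal to its contextual similarity to the new photo, so people who were repeatedly sent photos taken in similar places, at similar times, and in similar company rise to the top of the suggestion list.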
Mobile Media Metadata 1 (MMM1)

The devices and usage contexts of personal digital
photography are undergoing rapid transformation from the traditional
camera-to-desktop-to-network image pipeline to an integrated mobile
imaging experience. The ascendancy of mobile media capture devices (especially
cameraphones) makes possible a significant new paradigm for digital
imaging because, unlike traditional digital cameras, cameraphones integrate
multimedia capture, programmable processing, wireless networking, rich
user interaction capabilities, personal information management functions,
and automatic contextual metadata all in one device that users carry
with them almost all the time. Our first Mobile Media Metadata prototype
(MMM1) leverages the spatio-temporal context and the social community
of media capture to infer media content. In our approach we:
- Gather all automatically available information at the point of
capture (time, spatial location, phone user, etc.)
- Use metadata similarity algorithms to find similar media that
has been annotated before
- Take advantage of this previously annotated media to make educated
guesses about the content of the newly captured media
- Interact in a simple and intuitive way with the phone user to
confirm and augment system-supplied metadata for captured media
Using this approach, MMM1 guessed the correct location
of the subject of the photo (out of an average of 36.8 possible locations)
100% of the time within the first four guesses, 96% of the time within
the first three guesses, 88% of the time within the first two guesses,
and 69% of the time as the first guess.
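The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MMM1 algorithm: the `Capture` record, the similarity weights, and the one-hour time window are hypothetical, and real metadata similarity would draw on more signals than the three used here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """Metadata available at the point of capture (hypothetical layout)."""
    user: str
    cell_id: str
    timestamp: float    # seconds since the epoch
    location: str = ""  # user-confirmed label; empty for a new photo


def similarity(new, old, time_window=3600.0):
    """Compare two captures on place, person, and time (assumed weights)."""
    score = 0.0
    if new.cell_id == old.cell_id:
        score += 2.0
    if new.user == old.user:
        score += 1.0
    gap = abs(new.timestamp - old.timestamp)
    if gap < time_window:
        score += 1.0 - gap / time_window
    return score


def guess_locations(new, annotated, max_guesses=4):
    """Return location labels ranked by total similarity to the new capture."""
    totals = defaultdict(float)
    for old in annotated:
        if old.location:
            totals[old.location] += similarity(new, old)
    # Highest total first; ties broken alphabetically for stable output.
    ranked = sorted(totals, key=lambda loc: (-totals[loc], loc))
    return [loc for loc in ranked if totals[loc] > 0][:max_guesses]
```

The ranked list corresponds to the ordered guesses presented to the phone user, who confirms or corrects the label. The confirmed label then becomes annotated media that informs later guesses.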

This sister project, led by Prof.
Nancy Van House, is investigating a central problem for technology
design: predicting users and uses for emerging technologies, i.e., doing
user-centered design for users and uses that don't yet exist. This is
especially true in the case of mobile media technology and applications,
in particular cameraphones, which are undergoing rapid growth and transformation.
Designers of mobile media technology and applications in industry and
academia need new methods to project and design for future uses and
users of mobile media. We use the term "social uses" to describe
the higher level motives that guide the specific actions that users
perform. For example, while we may observe that a user performs the
action of emailing a photo to family members, this action (i.e., "what"
the user does) is not the same as the motive informing the action (i.e.,
"why" the user does it), in this case to maintain the social
relationship. Our social science research has uncovered several significant
social uses of personal imaging technology which designers of imaging
and mobile media technology need to understand and design for: constructing
personal and group memory; creating and maintaining social relationships;
self-expression; self-presentation; and functional uses for oneself
and others. These social uses and the associated findings from our
social science research have significant implications for mobile media
technology design and inform our development of design methods aimed
at projecting and designing for future uses and users of mobile media
technology.

Garage Cinema Research is building on Professor Davis'
Media Streams, an iconic visual language and system for media annotation,
retrieval, and resequencing according to semantic descriptions of media
content using manual, semi-automatic, and automatic techniques. The
MSMDX project's goal is to create a platform for collaboratively annotating,
retrieving, sharing, and remixing multimedia content on the World Wide
Web. This platform will be used to discover whether the power of distributed
social networks together with semantic web technology can be exploited
to solve the problem of how to generate useful machine-readable descriptions
of multimedia content. The usefulness of the descriptions produced will
be evaluated by building innovative media services that rely on them.
Active Capture

Active Capture software and interaction design automate
the capture of stills and video for, and of, users. By integrating capture,
processing, and interaction, Garage Cinema Research's Active Capture
approach automates the traditional processes of direction and cinematography.
Using real-time media analysis in an interactive control loop, Active
Capture software structures the user's interaction with a capture device
to record reusable, annotated media assets. Garage Cinema Research is
researching and developing a set of consumer capture scenarios that
support media personalization and reuse as well as design methods and
tools for creating Active Capture applications. The captured media assets
are automatically annotated for later access and reuse in a variety
of applications from Visual IDs to personalized video communications,
marketing, and entertainment.
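As a rough illustration of an interactive control loop driven by media analysis, the sketch below keeps prompting the user until the analysis accepts a short run of consecutive frames. It is only a toy model of the idea: the function name, the `is_acceptable` and `prompt` callbacks, and the `hold` and `max_prompts` parameters are hypothetical and not taken from the Active Capture software.

```python
def active_capture(frames, is_acceptable, prompt, max_prompts=3, hold=2):
    """Direct the user until `hold` consecutive frames pass analysis.

    frames        -- iterable of per-frame analysis results
    is_acceptable -- callback deciding whether a frame is usable
    prompt        -- callback that gives the user corrective direction
    Returns the accepted run of frames, or None if capture fails.
    """
    run = []
    prompts = 0
    for frame in frames:
        if is_acceptable(frame):
            run.append(frame)
            if len(run) >= hold:
                return run  # enough good frames in a row: capture succeeded
        else:
            run = []  # a bad frame breaks the run
            if prompts >= max_prompts:
                return None  # give up rather than nag the user forever
            prompt(frame)
            prompts += 1
    return None  # input ended before a full run was captured
```

For example, with frames scored by some analysis function and `is_acceptable = lambda score: score > 0.7`, the loop re-prompts on each low-scoring frame and returns as soon as two consecutive frames exceed the threshold.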

Garage Cinema Research is researching and developing
software for the mass customization and personalization of media by
structuring media assets into Adaptive Media Templates (AMTs). AMTs
encode media assets in such a way that they can co-adapt input media
assets and compute a unique customized and/or personalized result. Garage
Cinema Research has systematically automated several of the main functions
of cinematic editing, including: reframing and repositioning of images
and video (especially of people); audio-video synchronization; cutting
on motion; 1-shot/2-shot/cutaway editing; audio Foley; a variety of
parametric special effects; and basic editing operations such as keying,
compositing, and sequencing. Garage Cinema Research's automatic editing
functions render high-quality personalized and customized media in seconds
on consumer-level platforms, results that would take skilled operators hours
to produce on expensive hardware. Garage Cinema Research is extending its work
in Adaptive Media Templates to the development of media components that
understand their contents and the principles of their (re)combination.
|