gstreamer/docs/random/thomasvs/metadata

I'll use this doc to describe how I think media info should work from the
perspective of the application developer and end user, and from that
extrapolate what we need to provide that.
RATIONALE
---------
One of the strong points of GStreamer is that it abstracts library dependencies
away. A user is free to choose whatever plug-ins he has, and a developer
can code to the general API that GStreamer provides without having to deal
with the underlying codecs.
It is important that GStreamer also handles media info well and efficiently,
since more often than not the same libraries are needed to do this. So
to avoid applications depending on these libs just to do the media info,
we should make sure GStreamer provides a reasonable and fast abstraction
for this as well.
GOALS
-----
- quickly read and write "tags"
- quickly read stream metadata (technical properties, length, audio props, ...)
- cache both kinds of data transparently
- (possibly) provide bins that do this
- provide a simple API to do this
DEFINITION OF TERMS
-------------------
The user or developer using GStreamer is interested in all information that
describes the stream.  The library handles the different kinds of this
information differently, however, so I will use the following terms to
describe them :
- metadata :
  every kind of information that is tied to the "concept" of the stream,
  and not tied to the actual encoding or representation of the stream.
  - it can be altered without transcoding the stream
  - it would stay the same for different encodings of the file
  - describes properties of the information encoded into the stream
  - examples:
    - artist, title, author
    - year, track order, album
    - comments
- mediainfo :
  every kind of information that is tied to the "codec" used.
  - cannot be altered without transcoding
  - is the set of parameters the stream has been encoded with
  - describes properties of the encoded stream itself
  - examples:
    - bitrate targets (e.g. nominal), encoding mode (e.g. joint stereo)
    - to this we also add "bitrate", but we query this through the pad_query
      interface
- format :
  every kind of information that is tied to the "raw" bitstream
  - cannot be altered without decoding and changing the raw bitstream
  - examples:
    - samplerate, bit depth/width, channels
    - length in time
    - video size, frames per second, colorspace used
  - the format is queried by getting the GstCaps of the pad that sources
    the buffers
  - length in time and tracks for the whole stream
    - gotten through pad queries (see the sketch after this list)
    - stored in variables in the struct
- immediate info :
  - examples:
    - position in time
    - current bitrate
- tracks :
  a media file or stream can contain multiple consecutive streams, which
  we will call "tracks".  GStreamer also has a "track" format used in
  querying and seeking.
  A track should be thought of as the whole of one single piece of media
  inside a physical stream.
  A track can have at most one set of tags, and has fixed "raw" properties.
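
As a concrete illustration of the "gotten through pad queries" point above,
here is a minimal sketch of querying length and position in time.  It uses
today's GStreamer query API (gst_element_query_duration/_position, which
simply forward the query to the pads); the query API at the time this doc
was written differed, and playbin plus the example file name are only
placeholders.

#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  GstElement *pipeline;
  gint64 duration = GST_CLOCK_TIME_NONE;
  gint64 position = GST_CLOCK_TIME_NONE;

  gst_init (&argc, &argv);

  /* any pipeline that prerolls on the file will do; playbin is used
   * here purely for brevity */
  pipeline = gst_element_factory_make ("playbin", NULL);
  g_object_set (pipeline, "uri", "file:///tmp/test.ogg", NULL);
  gst_element_set_state (pipeline, GST_STATE_PAUSED);
  gst_element_get_state (pipeline, NULL, NULL, GST_CLOCK_TIME_NONE);

  /* "length in time ... gotten through pad queries" */
  if (gst_element_query_duration (pipeline, GST_FORMAT_TIME, &duration))
    g_print ("duration: %" GST_TIME_FORMAT "\n", GST_TIME_ARGS (duration));
  if (gst_element_query_position (pipeline, GST_FORMAT_TIME, &position))
    g_print ("position: %" GST_TIME_FORMAT "\n", GST_TIME_ARGS (position));

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}
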
EXAMPLE PIPELINES
-----------------
reading metadata : filesrc ! id3v1
  - would read metadata from the file
  - id3v1 immediately causes filesrc to seek until it has found either
    - the (first) metadata, or
    - that there is no metadata present
  - id3v1 sends out a property notification with name "metadata" and
    a GstCaps structure
resetting and writing content metadata :
  id3v1 reset=true artist="Arid" ! filesink
  - effect: clear the current tag and reset it to only have Arid as artist
  - id3v1 seeks to the right location, clears the tag, and writes the new one
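
To make the first example more concrete, here is a rough sketch of what the
application side of "filesrc ! id3v1" could look like.  The id3v1 element
and its "metadata" property are the proposal above, not an existing element,
so the element name, property name and file location are all assumptions;
only the surrounding GStreamer/GObject calls are standard.

#include <gst/gst.h>

/* called when the proposed id3v1 element notifies about its "metadata"
 * property; the proposal is that the property carries a GstCaps */
static void
on_metadata_notify (GObject *obj, GParamSpec *pspec, gpointer user_data)
{
  GstCaps *metadata = NULL;
  gchar *str;

  g_object_get (obj, "metadata", &metadata, NULL);
  if (!metadata)
    return;
  str = gst_caps_to_string (metadata);
  g_print ("metadata: %s\n", str);
  g_free (str);
  gst_caps_unref (metadata);
}

int
main (int argc, char *argv[])
{
  GstElement *pipeline, *src, *id3;

  gst_init (&argc, &argv);

  pipeline = gst_pipeline_new ("read-metadata");
  src = gst_element_factory_make ("filesrc", NULL);
  id3 = gst_element_factory_make ("id3v1", NULL);   /* proposed element */
  if (!src || !id3)
    g_error ("missing element (id3v1 is only a proposal)");
  g_object_set (src, "location", "/tmp/test.mp3", NULL);

  gst_bin_add_many (GST_BIN (pipeline), src, id3, NULL);
  gst_element_link (src, id3);

  g_signal_connect (id3, "notify::metadata",
      G_CALLBACK (on_metadata_notify), NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* crude for the sketch: give the streaming thread time to find the tag;
   * a real app would sit in its main loop instead */
  g_usleep (G_USEC_PER_SEC);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}
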
COST
----
Querying media info can be expensive.
Any application querying for media info should take this into account and
make sure that it doesn't block the app unnecessarily while the querying
happens.
The app should create an object, hand it a bunch of locations to query,
and connect to the signal that object is going to emit when results come in.
In most cases, querying content data should be fast since it doesn't involve
decoding.
Technical data could be harder to get and thus might be better queried only
when needed.
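
A minimal sketch of that pattern with plain GLib follows.  The media info
object and its signal do not exist yet, so do_query() just fakes the
expensive part; the point is only that the queries run in worker threads
and results come back to the app's main loop via g_idle_add().

#include <glib.h>

typedef struct {
  gchar *location;
  gchar *result;          /* whatever the media info query produced */
} QueryResult;

static gboolean
report_result (gpointer data)      /* runs in the app's main loop */
{
  QueryResult *r = data;

  g_print ("%s: %s\n", r->location, r->result);
  g_free (r->location);
  g_free (r->result);
  g_free (r);
  return G_SOURCE_REMOVE;
}

static void
do_query (gpointer data, gpointer user_data)   /* runs in a worker thread */
{
  QueryResult *r = g_new0 (QueryResult, 1);

  r->location = data;                           /* takes ownership */
  r->result = g_strdup ("fake media info");     /* expensive query goes here */
  g_idle_add (report_result, r);
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  GThreadPool *pool = g_thread_pool_new (do_query, NULL, 2, FALSE, NULL);
  const gchar *locations[] = { "a.ogg", "b.mp3", "c.avi", NULL };
  gint i;

  for (i = 0; locations[i]; i++)
    g_thread_pool_push (pool, g_strdup (locations[i]), NULL);

  g_main_loop_run (loop);   /* a real app already has a main loop running */
  return 0;
}
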
CACHE
-----
Getting media info can be an expensive operation.  It makes sense to cache
the media info queried on-disk to provide rapid access to this data.
It is important however that this is done transparently - the system should
be able to keep working without it, or keep working when you delete this cache.
The API would provide a function like
  gst_media_info_read_cached (media_info, location,
                              GST_MEDIA_INFO_METADATA,
                              GST_MEDIA_INFO_READ_CACHED);
to try and get the cached metadata using the media info object.
- check if the file is cached in the media info cache
- if no, then read the media info and store it in the cache
- if yes, then check the file against its timestamp (or (part of) an md5sum ?)
  - if it was changed, force a new read and store the result in the cache
  - if it wasn't changed, just return the cached media info
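
A sketch of that lookup logic, using an in-memory GHashTable and the file's
mtime as the change check.  The CacheEntry layout and
read_media_info_uncached() are stand-ins for whatever the real media info
API ends up providing; only the GLib calls are real.

#include <time.h>
#include <glib.h>
#include <glib/gstdio.h>

typedef struct {
  time_t mtime;     /* file timestamp when the entry was stored */
  gchar *info;      /* stand-in for the real media info structure */
} CacheEntry;

static GHashTable *cache = NULL;     /* location -> CacheEntry */

/* placeholder for the expensive GStreamer-based query */
static gchar *
read_media_info_uncached (const gchar *location)
{
  return g_strdup_printf ("(media info for %s)", location);
}

const gchar *
read_media_info_cached (const gchar *location)
{
  GStatBuf st;
  CacheEntry *entry;

  if (cache == NULL)
    cache = g_hash_table_new (g_str_hash, g_str_equal);

  if (g_stat (location, &st) != 0)
    return NULL;                     /* file is gone; nothing to cache */

  entry = g_hash_table_lookup (cache, location);
  if (entry != NULL && entry->mtime == st.st_mtime)
    return entry->info;              /* unchanged: return the cached info */

  /* not cached yet, or the file changed: force a new read and store it
   * (any old entry is simply leaked here for brevity) */
  entry = g_new0 (CacheEntry, 1);
  entry->mtime = st.st_mtime;
  entry->info = read_media_info_uncached (location);
  g_hash_table_replace (cache, g_strdup (location), entry);
  return entry->info;
}
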
As an optimization, it might also make sense to have
  GList * gst_metadata_read_many (media_info, GList *locations, ...)
which would allow the back-end to implement this more efficiently.
Suppose an application loads a playlist, for example, then this playlist
could be handed to this function, and a GList of metadata types could
be returned.
Possible implementations :
- one large XML file : would end up being too heavy
- one XML file per dir on system : good compromise; it would still make sense
  to keep this in memory instead of reading and writing it all the time.
  Also, odds are good that users mostly use files from the same dir in one
  app (but not necessarily).
Possible extra niceties :
- matching of moved files, and a similar move of metadata (through user-space
tool ?)
!!! For speed reasons, it might make sense to somehow keep the cache in memory
instead of reparsing the same cache file each time.
!!! For disk space reasons, it might make sense to have a system cache.
Not sure if the complexity added is worth it though.
!!! For disk space reasons, we might want to add an upper limit on the size of
the cache. For that we might need a timestamp on last retrieval of metadata,
so that we can drop the old ones.
The cache should use standard glibc.
FIXME: is it worth it to use gnome-vfs for this ?
STANDARDIZATION OF MEDIAINFO
----------------------------
Different file formats have different "tags". It is not always possible
to map metadata to tags. Some level of agreement on metadata names is also
required.
For media info, the names of the properties should be fairly standard.
We also use the same names as used for properties and capabilities in
GStreamer.
This means we use
- encoded audio
  - "bitrate" (in bits per second; use the most representative value,
    e.g. the average bitrate for a VBR stream)
- raw audio
  - "samplerate" - sampling frequency
  - "channels"
  - "bitwidth" - how wide is the audio in bits
- encoded video
  - "bitrate"
- raw video
  (FIXME: I don't know enough about video; are these correct ?)
  - "width"
  - "height"
  - "colordepth"
  - "colorspace"
  - "fps"
  - "aspectratio"
We must find a way to avoid collisions.  A system stream can contain both
audio and video (so two "bitrate" values) or multiple audio or video streams.
One way to handle this might be to make the metadata set for such a stream
a GList of metadata sets for its elementary streams.
For metadata and tags, the standards are less clear.
Some nice ones to standardize on might be
- artist
- title
- author
- year
- genre (messy though)
- RMS, inpoint, outpoint (calculated through some formula, used for mixing)
TESTING
-------
It is important to write a decent testsuite for this and do speed comparisons
between the library used and the GStreamer implementation.
API
---
struct GstMetadata
{
  gchar *location;
  GstMetadataType type;
  GList *streams;
  GHashTable *values;
};
(streams would be a GList of (again) GstMetadata's.
 "location" would then be reused to indicate an identifier in the stream.
 FIXME: is that evil ?)
GstMetadataType - technical, content
GstMetadataReadType - cached, raw
GstMetadata *
gst_metadata_read (const char *location,
                   GstMetadataType type,
                   GstMetadataReadType read_type);

GstMetadata *
gst_metadata_read_props (const char *location,
                         GList *names,
                         GstMetadataType type,
                         GstMetadataReadType read_type);

GstMetadata *
gst_metadata_read_cached (const char *location,
                          GstMetadataType type,
                          GstMetadataReadType read_type);

GstMetadata *
gst_metadata_read_props_cached (...)

GList *
gst_metadata_read_cached_many (GList *locations,
                               GstMetadataType type,
                               GstMetadataReadType read_type);

GList *
gst_metadata_read_props_cached_many (GList *locations,
                                     GList *names,
                                     GstMetadataType type,
                                     GstMetadataReadType read_type);

GList *
gst_metadata_content_write (const char *location,
                            GstMetadata *metadata);
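
A sketch of how an application might consume this proposed API.  None of
these functions exist yet; GST_METADATA_CONTENT and GST_METADATA_READ_CACHED
are made-up spellings of the "content" and "cached" types listed above, and
string values are assumed in the hash table purely for display.  The snippet
only shows the intended call flow: read (possibly cached) metadata, walk the
values hash table, and recurse into the per-stream entries.

#include <glib.h>
/* the GstMetadata declarations proposed above are assumed to be available */

static void
print_value (gpointer key, gpointer value, gpointer user_data)
{
  g_print ("  %s = %s\n", (gchar *) key, (gchar *) value);
}

static void
show_metadata (const gchar *location)
{
  GstMetadata *meta;
  GList *walk;

  meta = gst_metadata_read_cached (location, GST_METADATA_CONTENT,
      GST_METADATA_READ_CACHED);
  if (meta == NULL)
    return;

  g_print ("%s:\n", meta->location);
  g_hash_table_foreach (meta->values, print_value, NULL);

  /* elementary streams inside a system stream get their own entry,
   * which avoids the "bitrate" collision mentioned earlier */
  for (walk = meta->streams; walk != NULL; walk = walk->next) {
    GstMetadata *sub = walk->data;

    g_print ("  stream %s:\n", sub->location);
    g_hash_table_foreach (sub->values, print_value, NULL);
  }
}
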
SOME USEFUL RESOURCES
---------------------
http://www.chin.gc.ca/English/Standards/metadata_multimedia.html
  - describes multimedia metadata for images
  - makes a distinction between content (descriptive), technical and
    administrative metadata