I'll use this doc to describe how I think media info should work from the
perspective of the application developer and end user, and from that
extrapolate what we need to provide.

RATIONALE
---------

One of the strong points of GStreamer is that it abstracts library
dependencies away.  A user is free to choose whatever plug-ins he has, and
a developer can code against the general API that GStreamer provides
without having to deal with the underlying codecs.

It is important that GStreamer also handles media info well and
efficiently, since more often than not the same libraries are needed to do
this.  So, to avoid applications depending on these libs just to get at the
media info, we should make sure GStreamer provides a reasonable and fast
abstraction for this as well.

GOALS
-----

- quickly read and write "tags"
- quickly read stream metadata (technical properties, length, audio
  properties, ...)
- cache both kinds of data transparently
- (possibly) provide bins that do this
- provide a simple API to do this

DEFINITION OF TERMS
-------------------

The user or developer using GStreamer is interested in all information that
describes the stream.  The library handles these types differently,
however, so I will use the following terms to describe them :

- metadata :
  every kind of information that is tied to the "concept" of the stream,
  and not tied to the actual encoding or representation of the stream
  - it can be altered without transcoding the stream
  - it would stay the same for different encodings of the file
  - describes properties of the information encoded into the stream
  - examples :
    - artist, title, author
    - year, track order, album
    - comments

- mediainfo :
  every kind of information that is tied to the "codec" used
  - cannot be altered without transcoding
  - is the set of parameters the stream has been encoded with
  - describes properties of the encoded stream itself
  - examples :
    - bitrate targets (e.g. nominal), encoding mode (e.g. joint stereo)
    - to this we also add "bitrate", but we query this through the
      pad_query interface

- format :
  every kind of information that is tied to the "raw" bitstream
  - cannot be altered without decoding and changing the raw bitstream
  - examples :
    - samplerate, bit depth/width, channels
    - length in time
    - video size, frames per second, colorspace used
  - the format is queried by getting the GstCaps of the pad that sources
    the buffers
  - length in time and tracks for the whole stream
    - gotten through pad queries
    - stored in variables in the struct
  - immediate info
    - examples :
      - position in time
      - current bitrate

- tracks :
  a media file or stream can contain multiple consecutive streams, which we
  will call "tracks".  GStreamer has a track format used in querying and
  seeking as well.  A track should be thought of as the whole of one single
  piece of media inside a physical stream.  A track can have at most one
  set of tags, and has fixed "raw" properties.

EXAMPLE PIPELINES
-----------------

reading metadata :

  filesrc ! id3v1

  - would read metadata from the file
  - id3v1 immediately causes filesrc to seek until it has found either
    - the (first) metadata, or
    - that there is no metadata present
  - id3v1 sends out a property notification with name "metadata" and a
    GstCaps structure

resetting and writing content metadata :

  id3v1 reset=true artist="Arid" ! filesink

  - effect : clear the current tag and reset it to only have Arid as artist
  - id3v1 seeks to the right location, clears the tag, and writes the new
    one
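As an illustration of the reading pipeline above, this is roughly what an
application could do.  Note that this is only a sketch : the id3v1 element
and its "metadata" property are the hypothetical ones described in this
document, and the surrounding calls follow the current GStreamer core API.

  #include <gst/gst.h>

  /* called when the (hypothetical) id3v1 element notifies us that its
   * "metadata" property - a GstCaps structure - has been set */
  static void
  metadata_notify_cb (GObject *obj, GParamSpec *pspec, gpointer user_data)
  {
    GstCaps *metadata = NULL;

    g_object_get (obj, "metadata", &metadata, NULL);
    if (metadata) {
      gchar *str = gst_caps_to_string (metadata);
      g_print ("metadata: %s\n", str);
      g_free (str);
      gst_caps_unref (metadata);
    }
  }

  int
  main (int argc, char *argv[])
  {
    GstElement *pipeline, *src, *id3;

    gst_init (&argc, &argv);
    if (argc < 2)
      return 1;

    pipeline = gst_pipeline_new ("read-tags");
    src = gst_element_factory_make ("filesrc", "src");
    id3 = gst_element_factory_make ("id3v1", "id3"); /* hypothetical element */
    g_object_set (src, "location", argv[1], NULL);

    gst_bin_add_many (GST_BIN (pipeline), src, id3, NULL);
    gst_element_link (src, id3);

    /* id3v1 is described as sending out a property notification named
     * "metadata" once it has found the tag (or found that there is none) */
    g_signal_connect (id3, "notify::metadata",
        G_CALLBACK (metadata_notify_cb), NULL);

    /* pausing should give id3v1 a chance to scan the file, given the seek
     * behaviour described above */
    gst_element_set_state (pipeline, GST_STATE_PAUSED);
    gst_element_get_state (pipeline, NULL, NULL, GST_CLOCK_TIME_NONE);

    gst_element_set_state (pipeline, GST_STATE_NULL);
    gst_object_unref (pipeline);

    return 0;
  }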
COST
----

Querying media info can be expensive.  Any application querying for media
info should take this into account and make sure that it doesn't block the
application unnecessarily while the querying happens.  The app should
create an object, hand it a bunch of locations to query, and connect to the
signal the object is going to send out.

In most cases, querying content data should be fast since it doesn't
involve decoding.  Technical data could be harder to get and thus might be
better done only when needed.

CACHE
-----

Getting media info can be an expensive operation.  It makes sense to cache
the media info queried on disk to provide rapid access to this data.  It is
important, however, that this is done transparently - the system should be
able to keep working without it, or keep working when you delete this
cache.

The API would provide a function like

  gst_media_info_read_cached (media_info, location,
                              GST_MEDIA_INFO_METADATA,
                              GST_MEDIA_INFO_READ_CACHED);

to try and get the cached metadata using the media info object.

- check if the file is cached in the media info cache
- if no, then read the media info and store it in the cache
- if yes, then check the file against its timestamp (or (part of) an
  md5sum ?)
  - if it was changed, force a new read and store it in the cache
  - if it wasn't changed, just return the cached media info

(A sketch of this lookup flow follows at the end of this section.)

For optimizations, it might also make sense to do

  GList * gst_metadata_read_many (media_info, GList *locations, ...)

which would allow the back-end to implement this more efficiently.  Suppose
an application loads a playlist, for example; this playlist could then be
handed to this function, and a GList of metadata types could be returned
(see the playlist sketch at the end of this section).

Possible implementations :
- one large XML file : would end up being too heavy
- one XML file per dir on the system : a good compromise; it would still
  make sense to keep this in memory instead of reading and writing it all
  the time.  Also, odds are good that users mostly use files from the same
  dir in one app (but not necessarily).

Possible extra niceties :
- matching of moved files, and a similar move of the metadata (through a
  user-space tool ?)

!!! For speed reasons, it might make sense to somehow keep the cache in
    memory instead of reparsing the same cache file each time.
!!! For disk space reasons, it might make sense to have a system cache.
    Not sure if the complexity added is worth it though.
!!! For disk space reasons, we might want to add an upper limit on the size
    of the cache.  For that we might need a timestamp on the last retrieval
    of metadata, so that we can drop the old ones.

The cache should use standard glibc.
FIXME: is it worth it to use gnome-vfs for this ?
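A sketch of the cache lookup flow described above.  Everything here is an
assumption : none of these functions exist yet, CacheEntry, cache_lookup()
and cache_store() stand in for whatever back-end (per-directory XML file,
...) gets chosen, and the change check uses the timestamp variant.  The
GstMetadata names follow the API section further down.

  #include <glib.h>
  #include <glib/gstdio.h>

  /* hypothetical cache entry : the file's mtime when it was cached,
   * plus the cached result */
  typedef struct {
    time_t       mtime;
    GstMetadata *metadata;
  } CacheEntry;

  GstMetadata *
  gst_media_info_read_cached (const char *location, GstMetadataType type)
  {
    GStatBuf st;
    CacheEntry *entry;

    if (g_stat (location, &st) != 0)
      return NULL;                     /* file is gone, nothing to return */

    entry = cache_lookup (location);   /* hypothetical cache back-end */

    if (entry == NULL || entry->mtime != st.st_mtime) {
      /* not cached, or the file changed since it was cached :
       * force a new read and store the result in the cache */
      GstMetadata *fresh = gst_metadata_read (location, type,
          GST_METADATA_READ_RAW);      /* assumed enum value name */
      cache_store (location, st.st_mtime, fresh);
      return fresh;
    }

    /* unchanged : just return the cached media info
     * (ownership/refcounting glossed over here) */
    return entry->metadata;
  }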
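For the playlist case mentioned above, the batch variant would let the
back-end open each per-directory cache file only once.  Again purely a
sketch of the proposed call, with assumed enum value names.

  /* hand a whole playlist to the cache-aware batch call; the returned
   * GList would contain one GstMetadata per location, in the same order */
  GList *locations = NULL, *results, *walk;

  locations = g_list_append (locations, (gpointer) "/music/album/01.ogg");
  locations = g_list_append (locations, (gpointer) "/music/album/02.ogg");

  results = gst_metadata_read_cached_many (locations,
      GST_METADATA_CONTENT, GST_METADATA_READ_CACHED);

  for (walk = results; walk; walk = walk->next) {
    GstMetadata *meta = (GstMetadata *) walk->data;
    /* ... fill the playlist view with artist/title/... from meta ... */
  }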
STANDARDIZATION OF MEDIAINFO
----------------------------

Different file formats have different "tags".  It is not always possible to
map metadata to tags.  Some level of agreement on metadata names is also
required.

For media info, the names of properties should be fairly standard.  We also
use the same names as used for properties and capabilities in GStreamer.
This means we use

- encoded audio
  - "bitrate" (which is bits per second - use the most correct one,
    i.e. average bitrate for VBR, for example)
- raw audio
  - "samplerate" - sampling frequency
  - "channels"
  - "bitwidth" - how wide the audio is in bits
- encoded video
  - "bitrate"
- raw video (FIXME: I don't know enough about video; are these correct ?)
  - "width"
  - "height"
  - "colordepth"
  - "colorspace"
  - "fps"
  - "aspectratio"

We must find a way to avoid collisions.  A system stream can contain both
audio and video (-> bitrate) or multiple audio or video streams.  One way
to do this might be to make the metadata set for a stream a GList of
metadata for the elementary streams.

For metadata and tags, the standards are less clear.  Some nice ones to
standardize on might be
- artist
- title
- author
- year
- genre (messy though)
- RMS, inpoint, outpoint (calculated through some formula, used for mixing)

TESTING
-------

It is important to write a decent testsuite for this and do speed
comparisons between the library used and the GStreamer implementation.

API
---

struct GstMetadata
{
  gchar           *location;
  GstMetadataType  type;
  GList           *streams;
  GHashTable      *values;
};

(streams would be a GList of (again) GstMetadata's.  "location" would then
be reused to indicate an identifier in the stream.  FIXME: is that evil ?)

GstMetadataType     - technical, content
GstMetadataReadType - cached, raw

GstMetadata *
gst_metadata_read (const char *location, GstMetadataType type,
                   GstMetadataReadType read_type);

GstMetadata *
gst_metadata_read_props (const char *location, GList *names,
                         GstMetadataType type,
                         GstMetadataReadType read_type);

GstMetadata *
gst_metadata_read_cached (const char *location, GstMetadataType type,
                          GstMetadataReadType read_type);

GstMetadata *
gst_metadata_read_props_cached (...)

GList *
gst_metadata_read_cached_many (GList *locations, GstMetadataType type,
                               GstMetadataReadType read_type);

GList *
gst_metadata_read_props_cached_many (GList *locations, GList *names,
                                     GstMetadataType type,
                                     GstMetadataReadType read_type);

GList *
gst_metadata_content_write (const char *location, GstMetadata *metadata);

(A short usage sketch of this proposed API is included at the end of this
document.)

SOME USEFUL RESOURCES
---------------------

http://www.chin.gc.ca/English/Standards/metadata_multimedia.html
- describes multimedia data for images
- distinction between content (descriptive), technical and administrative
  metadata
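As referenced from the API section above, a short sketch of how an
application might use the proposed API.  None of this exists yet; the enum
value names and the assumption that the values hash table maps property
names to string values are mine.

  #include <glib.h>

  static void
  print_value (gpointer key, gpointer value, gpointer user_data)
  {
    /* assumes the values hash table maps property names to strings */
    g_print ("  %s: %s\n", (const gchar *) key, (const gchar *) value);
  }

  static void
  print_media_info (const gchar *location)
  {
    GstMetadata *meta;
    GList *walk;

    /* read the content metadata, preferring the on-disk cache */
    meta = gst_metadata_read (location, GST_METADATA_CONTENT,
        GST_METADATA_READ_CACHED);       /* assumed enum value names */
    if (meta == NULL)
      return;

    g_print ("%s\n", meta->location);
    g_hash_table_foreach (meta->values, print_value, NULL);

    /* a system stream carries one GstMetadata per elementary stream;
     * "location" is then reused as an in-stream identifier */
    for (walk = meta->streams; walk; walk = walk->next) {
      GstMetadata *sub = (GstMetadata *) walk->data;
      g_print (" stream %s:\n", sub->location);
      g_hash_table_foreach (sub->values, print_value, NULL);
    }
  }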