diff --git a/ext/ogg/README b/ext/ogg/README index 4a6ccf37d2..79f71e2a77 100644 --- a/ext/ogg/README +++ b/ext/ogg/README @@ -1,3 +1,93 @@ +This document describes some things to know about the Ogg format, as well +as implementation details in GStreamer. + +INTRODUCTION +============ + +ogg and the granulepos +---------------------- + +An ogg stream contains pages with a serial number and a granulepos. +The granulepos is a 64 bit signed integer. It is a value that in some way +represents a time since the start of the stream. +The interpretation as such is however both codec-specific and +stream-specific. + +ogg has no notion of time: it only knows about bytes and granulepos values +on pages. + +The granule position is just a number; the only guarantee for a valid ogg +stream is that within a logical stream, this number never decreases. + +While logically a granulepos value can be constructed for every ogg packet, +the page is marked with only one granulepos value: the granulepos of the +last packet to end on that page. + +theora and the granulepos +------------------------- + +The granulepos in theora is an encoding of the frame number of the last +key frame ("i frame"), and the number of frames since the last key frame +("p frame"). The granulepos is constructed as the sum of the first number, +shifted to the left for granuleshift bits, and the second number: +granulepos = pframe << granuleshift + iframe + +(This means that given a framenumber or a timestamp, one cannot generate + the one and only granulepos for that page; several granulepos possibilities + correspond to this frame number. You also need the last keyframe, as well + as the granuleshift. + However, given a granulepos, the theora codec can still map that to a + unique timestamp and frame number for that theora stream) + + Note: currently theora stores the "presentation time" as the granulepos; + ie. a first data page with one packet contains one video frame and + will be marked with 0/0. Changing that to be 1/0 (so that it + represents the number of decodable frames up to that point, like + for Vorbis) is being discussed. + +vorbis and granulepos +--------------------- + +In Vorbis, the granulepos represents the number of samples that can be +decoded from all packets up to that point. + +In GStreamer, the vorbisenc elements produces a stream where: +- OFFSET is the byte offset this buffer is at; ie a running count of the + number of bytes produced before +- OFFSET_END is the granulepos of the produced vorbis buffer +- TIMESTAMP is the timestamp matching the begin of the buffer +- DURATION is set such that TIMESTAMP + DURATION is the correct +in a raw vorbis stream we use the granulepos as the offset field. + +Ogg media mapping +----------------- + +Ogg defines a mapping for each media type that it embeds. + +For Vorbis: + + - 3 header pages, with granulepos 0. + - 1 page with 1 packet header identification + - N pages with 2 packets comments and codebooks + - granulepos is samplenumber of next page + - one packet can contain a variable number of samples but one frame + that should be handed to the vorbis decoder. + +For Theora + + - 3 header pages, with granulepos 0. + - 1 page with 1 packet header identification + - N pages with 2 packets comments and codebooks + - granulepos is framenumber of last packet in page, where framenumber + is a combination of keyframe number and p frames since keyframe. + - one packet contains 1 frame + + + + +DEMUXING +======== + ogg demuxer ----------- @@ -9,25 +99,25 @@ with great efficiency. 1) the streaming mode. - In this mode, the ogg demuxer receives buffers in the _chain() function which - are then simply submited to the ogg sync layer. Pages are then processed when the - sync layer detects them, pads are created for new chains and packets are sent to - the peer elements of the pads. +In this mode, the ogg demuxer receives buffers in the _chain() function which +are then simply submited to the ogg sync layer. Pages are then processed when +the sync layer detects them, pads are created for new chains and packets are +sent to the peer elements of the pads. - In this mode, no seeking is possible. This is the typical case when the stream is - read from a network source. +In this mode, no seeking is possible. This is the typical case when the +stream is read from a network source. - In this mode, no setup is done at startup, the pages are just read and decoded. - A new logical chain is detected when one of the pages has the BOS flag set. At this - point the existing pads are removed and new pads are created for all the logical - streams in this new chain. +In this mode, no setup is done at startup, the pages are just read and decoded. +A new logical chain is detected when one of the pages has the BOS flag set. At +this point the existing pads are removed and new pads are created for all the +logical streams in this new chain. 2) the random access mode. - In this mode, the ogg file is first scanned to detect the position and length of - all chains. This scanning is performed using a recursive binary search algorithm - that is explained below. + In this mode, the ogg file is first scanned to detect the position and length +of all chains. This scanning is performed using a recursive binary search +algorithm that is explained below. find_chains(start, end) { @@ -97,121 +187,158 @@ testcases | 111 | 222 | BOS EOS +What can an ogg demuxer do? +--------------------------- +An ogg demuxer can read pages and get the granulepos from them. +It can ask the decoder elements to convert a granulepos to time. -ogg and the granulepos ----------------------- +An ogg demuxer can also get the granulepos of the first and the last page of a +stream to get the start and end timestamp of that stream. +It can also get the length in bytes of the stream +(when the peer is seekable, that is). -an ogg streams contains pages with a serial number and a granule pos. The granulepos -is a number that is codec specific and denotes the 'position' of the last sample in -the last packet in that page. +An ogg demuxer is therefore basically able to seek to any byte position and +timestamp. -ogg has therefore no notion about time, it only knows about bytes and granule positions. - -The granule position is just a number, it can contain gaps or can just be any random -number. - - -theora and the granulepos -------------------------- - -the granulepos in theora consists of the framenumber of the last keyframe shifted some -amount of bits plus the number of p/b-frames. - -This means that given a framenumber or a timestamp one cannot generate the granulepos -for that frame. eg frame 10 could have several valid granulepos values depending on if -the last keyframe was on frame 5 or 0. Given a granulepos we can, however, create a -unique correct timestamp and a framenumber. - -in a raw theroa stream we use the granulepos as the offset field. - -The granulepos of an ogg page is the framenumber of the last frame in the page. - - -vorbis and granulepos ---------------------- - -the granulepos in vorbis happens to be the same as the sample counter. conversion to and -from granulepos is therefore easy. - -in a raw vorbis stream we use the granulepos as the offset field. - -The granulepos of an ogg page is the sample number of the next page in the ogg stream. - - -What can ogg do? ----------------- - -An ogg demuxer can read pages and get the granuleposition from it. It can ask the decoder -elements to convert a granulepos to time. - -An ogg demuxer can also get the granulepos of the first and the last page of a stream to -get the start and end timestamp of that stream. It can also get the length in bytes of -the stream (when the peer is seekable, that is). - -An ogg demuxer is therefore basically able to seek to any byte position and timestamp. -When asked to seek to a given granulepos, the ogg demuxer should always convert the -value to a timestamp using the peer decoder element conversion function. It can then -binary search the file to eventually end up on the page with the given granule pos or -a granulepos with the same timestamp. +When asked to seek to a given granulepos, the ogg demuxer should always convert +the value to a timestamp using the peer decoder element conversion function. It +can then binary search the file to eventually end up on the page with the given +granule pos or a granulepos with the same timestamp. Seeking in ogg currently ------------------------ -When seeking in an ogg, the decoders can choose to forward the seek event as a +When seeking in an ogg, the decoders can choose to forward the seek event as a granulepos or a timestamp to the ogg demuxer. In the case of a granulepos, the ogg demuxer will seek back to the beginning of the stream and skip pages until it finds one with the requested timestamp. In the case of a timestamp, the ogg demuxer also seeks back to the beginning of -the stream. For each page it reads, it asks the decoder element to convert the -granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until the -page has a timestamp bigger or equal to the requested one. +the stream. For each page it reads, it asks the decoder element to convert the +granulepos back to a timestamp. The ogg demuxer keeps on skipping pages until +the page has a timestamp bigger or equal to the requested one. -It is therefore important that the decoder elements in vorbis can convert a granulepos -into a timestamp or never seek on timestamp on the oggdemuxer. +It is therefore important that the decoder elements in vorbis can convert a +granulepos into a timestamp or never seek on timestamp on the oggdemuxer. -The default format on the oggdemuxer source pads is currently defined as a the -granulepos of the packets, it is also the value of the OFFSET field in the GstBuffer. +The default format on the oggdemuxer source pads is currently defined as a the +granulepos of the packets, it is also the value of the OFFSET field in the +GstBuffer. +MUXING +====== Oggmux ------ +The ogg muxer's job is to output complete Ogg pages such that the absolute +time represented by the valid (ie, not -1) granulepos values on those pages +never decreases. This has to be true for all logical streams in the group at +the same time. + +To achieve this, encoders are required to pass along the exact time that the +granulepos represents for each ogg packet that it pushes to the ogg muxer. +This is ESSENTIAL: without this exact time representation of the granulepos, +the muxer can not produce valid streams. + +The ogg muxer has a packet queue per sink pad. From this queue a page can +be flushed when: + - total byte size of queued packets exceeds a given value + - total time duration of queued packets exceeds a given value + - total byte size of queued packets exceeds maximum Ogg page size + - eos of the pad + - encoder sent a command to flush out an ogg page after this new packet + (in 0.8, through a flush event; in 0.10, with a GstOggBuffer) + - muxer wants a flush to happen (so it can output pages) + +The ogg muxer also has a page queue per sink pad. This queue collects +Ogg pages from the corresponding packet queue. Each page is also marked +with the timestamp that the granulepos in the header represents. + +A page can be flushed from this collection of page queues when: +- ideally, every page queue has at least one page with a valid granulepos + -> choose the page, from all queues, with the lowest timestamp value +- if not, muxer can wait if the following limits aren't reached: + - total byte size of any page queue exceeds a limit + - total time duration of any page queue exceeds a limit +- if this limit is reached, then: + - request a page flush from packet queue to page queue for each queue + that does not have pages + - now take the page from all queues with the lowest timestamp value + - make sure all later-coming data is marked as old, either to be still + output (but producing an invalid stream, though it can be fixed later) + or dropped (which means it's gone forever) + The oggmuxer uses the offset fields to fill in the granulepos in the pages. +GStreamer implementation details +-------------------------------- +As said before, the basic rule is that the ogg muxer needs an exact time +representation for each granulepos. This needs to be provided by the encoder. + +Potential problems are: + - initial offsets for a raw stream need to be preserved somehow. Example: + if the first audio sample has time 0.5, the granulepos in the vorbis encoder + needs to be adjusted to take this into account. + - initial offsets may need be on rate boundaries. Example: + if the framerate is 5 fps, and the first video frame has time 0.1 s, the + granulepos cannot correctly represent this timestamp. + This can be handled out-of-band (initial offset in another muxing format, + skeleton track with initial offsets, ...) + +Given that the basic rule for muxing is that the muxer needs an exact timestamp +matching the granulepos, we need some way of communicating this time value +from encoders to the Ogg muxer. So we need a mechanism to communicate +a granulepos and its time representation for each GstBuffer. + +(This is an instance of a more generic problem - having a way to attach + more fields to a GstBuffer) + +Possible ways: +- setting TIMESTAMP to this value: bad - this value represents the end time + of the buffer, and thus conflicts with GStreamer's idea of what TIMESTAMP + is. This would cause problems muxing the encoded stream in other muxing + formats, or for streaming. Note that this is what was done in GStreamer 0.8 +- setting DURATION to GP_TIME - TIMESTAMP: bad - this breaks the concept of + duration for this frame. Take the video example above; each buffer would + have a correct timestamp, but always a 0.1 s duration as opposed to the + correct 0.2 s duration +- subclassing GstBuffer: clean, but requires a common header used between + ogg muxer and all encoders that can be muxed into ogg. Also, what if + a format can be muxed into more than one container, and they each have + their own "extra" info to communicate ? +- adding key/value pairs to GstBuffer: clean, but requires changes to + core. Also, the overhead of allocating e.g. a GstStructure for *each* buffer + may be expensive. +- "cheating": + - abuse OFFSET to store the timestamp matching this granulepos + - abuse OFFSET_END to store the granulepos value + The drawback here is that before, it made sense to use OFFSET and OFFSET_END + to store a byte count. Given that this is not used for anything critical + (you can't store a raw theora or vorbis stream in a file anyway), + this is what's being done for now. + +In practice +----------- +- all encoders of formats that can be muxed into Ogg produce a stream where: + - OFFSET is abused to be the granulepos of the encoded theora buffer + - OFFSET_END is abused to be the timestamp corresponding exactly to the + granulepos + - TIMESTAMP is the timestamp matching the begin of the buffer + - DURATION is the length in time of the buffer + +- initial delays should be handled in the GStreamer encoders by mangling + the granulepos of the encoded packet to take the delay into account as + best as possible and store that in OFFSET; + this then brings TIMESTAMP + DURATION to within less + than a frame period of the granulepos's time representation + The ogg muxer will then create new ogg packets with this OFFSET as + the granulepos. So in effect, the granulepos produced by the encoders + does not get used directly. TODO ---- - -- use the OFFSET field in the GstBuffer to store/read the granulepos as - opposed to the OFFSET_END field. - - -Ogg media mapping ------------------ - -Ogg defines a mapping for each media type that it embeds. - -For Vorbis: - - - 3 header pages, with granulepos 0. - - 1 page with 1 packet header identification - - N pages with 2 packets comments and codebooks - - granulepos is samplenumber of next page - - one packet can contain a variable number of samples but one frame - that should be handed to the vorbis decoder. - -For Theora - - - 3 header pages, with granulepos 0. - - 1 page with 1 packet header identification - - N pages with 2 packets comments and codebooks - - granulepos is framenumber of last packet in page, where framenumber - is a combination of keyframe number and p frames since keyframe. - - one packet contains 1 frame - - - +- decide on a proper mechanism for communicating extra per-buffer fields diff --git a/ext/ogg/gstoggmux.c b/ext/ogg/gstoggmux.c index 18a58bccb3..f3fced896f 100644 --- a/ext/ogg/gstoggmux.c +++ b/ext/ogg/gstoggmux.c @@ -574,7 +574,7 @@ gst_ogg_mux_dequeue_page (GstOggMux * mux, GstFlowReturn * flowret) * TODO: it CAN be, but it seems silly to do so? */ buf = g_queue_peek_head (pad->pagebuffers); while (buf && GST_BUFFER_OFFSET_END (buf) == -1) { - GST_LOG_OBJECT (pad->collect.pad, GST_GP_FORMAT " pushing page", -1); + GST_LOG_OBJECT (pad->collect.pad, "[gp -1] pushing page"); g_queue_pop_head (pad->pagebuffers); *flowret = gst_ogg_mux_push_buffer (mux, buf); buf = g_queue_peek_head (pad->pagebuffers); @@ -584,11 +584,17 @@ gst_ogg_mux_dequeue_page (GstOggMux * mux, GstFlowReturn * flowret) if (buf) { /* if no oldest buffer yet, take this one */ if (oldest == GST_CLOCK_TIME_NONE) { + GST_LOG_OBJECT (mux, "no oldest yet, taking from pad %" + GST_PTR_FORMAT " with timestamp %" GST_TIME_FORMAT, + pad->collect.pad, GST_TIME_ARGS (GST_BUFFER_TIMESTAMP (buf))); oldest = GST_BUFFER_END_TIME (buf); opad = pad; } else { /* if we have an oldest, compare with this one */ if (GST_BUFFER_END_TIME (buf) < oldest) { + GST_LOG_OBJECT (mux, "older buffer, taking from pad %" + GST_PTR_FORMAT " with timestamp %" GST_TIME_FORMAT, + pad->collect.pad, GST_TIME_ARGS (GST_BUFFER_TIMESTAMP (buf))); oldest = GST_BUFFER_END_TIME (buf); opad = pad; } @@ -600,7 +606,7 @@ gst_ogg_mux_dequeue_page (GstOggMux * mux, GstFlowReturn * flowret) if (oldest != GST_CLOCK_TIME_NONE) { g_assert (opad); buf = g_queue_pop_head (opad->pagebuffers); - GST_LOG_OBJECT (opad, + GST_LOG_OBJECT (opad->collect.pad, GST_GP_FORMAT " pushing oldest page (end time %" GST_TIME_FORMAT ")", GST_BUFFER_OFFSET_END (buf), GST_TIME_ARGS (GST_BUFFER_END_TIME (buf))); *flowret = gst_ogg_mux_push_buffer (mux, buf);