Subtitles
=========

1. Problem
GStreamer currently does not support subtitles.

2. Proposed solution
  - Elements
  - Text-overlay
  - Autoplugging
  - Scheduling
  - Stream selection

The first thing we'll need is subtitle awareness. I'll focus on AVI/MKV/OGM
here, because I know how that works. The same methods apply to DVD subtitles
as well. The Matroska and Ogg demuxers will need subtitle awareness. For
AVI, this is not needed. We'll also need subtitle stream parsers (for
all popular subtitle formats) that can deal both with parsed streams (MKV,
OGM) and with .sub file chunks (AVI). Sample code is available in
gst-sandbox/textoverlay/.
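
To make the ".sub file chunk" case concrete, here is a minimal sketch of
such a parser. It is not the code from gst-sandbox; it assumes a
MicroDVD-style layout ({start_frame}{end_frame}text, with "|" as a line
break) and a caller-supplied frame rate. The names Cue and parse_sub_chunk
are made up for this illustration. For parsed streams (MKV, OGM) the
container would already supply the timestamps, so only the text payload
would need handling.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float     # seconds from stream start
    duration: float  # seconds
    text: str


# Assumed line layout (MicroDVD-style): {start_frame}{end_frame}text
_LINE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")


def parse_sub_chunk(chunk: str, fps: float = 25.0) -> List[Cue]:
    """Turn a chunk of .sub text into timestamped cues.

    Frame numbers are converted to seconds using fps, which the caller
    has to know (an assumption of this sketch).
    """
    cues = []
    for line in chunk.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        start_frame = int(match.group(1))
        end_frame = int(match.group(2))
        text = match.group(3).replace("|", "\n")
        cues.append(Cue(start_frame / fps,
                        (end_frame - start_frame) / fps,
                        text))
    return cues
```

Each cue carries a start time and a duration, which is the information an
overlay element would need to decide when to show and hide the text.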

Secondly, we'll need a textoverlay filter that takes text and video and
blits the text onto the video. We have several such elements (e.g. the
cairo-based element) in gst-plugins already. Those might need some updates
to work exactly as expected.
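
As a rough illustration of what "blitting" means here, the sketch below
blends a pre-rendered text coverage mask onto a single luma plane. It is
not how the existing gst-plugins elements work; the frame representation
(lists of 8-bit samples) and the name blit_mask are assumptions chosen to
keep the example self-contained. Rendering the text into the mask (e.g.
with cairo) is left out.

```python
from typing import List

Plane = List[List[int]]  # rows of 8-bit samples (0..255)


def blit_mask(frame: Plane, mask: Plane, x: int, y: int,
              colour: int = 255) -> Plane:
    """Return a copy of frame with mask blended in at (x, y).

    mask holds per-pixel coverage (0 = transparent, 255 = opaque).
    Pixels that fall outside the frame are clipped.
    """
    out = [row[:] for row in frame]
    height = len(frame)
    width = len(frame[0]) if frame else 0
    for my, mask_row in enumerate(mask):
        for mx, alpha in enumerate(mask_row):
            fy, fx = y + my, x + mx
            if 0 <= fy < height and 0 <= fx < width:
                # Standard "over" blend of the text colour on the video.
                out[fy][fx] = (alpha * colour
                               + (255 - alpha) * out[fy][fx]) // 255
    return out
```

Returning a copy keeps the unblitted frame available, at the cost of an
extra allocation per frame; a real element would more likely write in
place.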

Thirdly, playbin will need to handle all that. We expect subtitle streams
to end up as subimages or plain text (or XHTML text). Note that playbin
should also allow access to the unblitted subtitle as text (if available)
for accessibility purposes.

A problem popping up is that subtitles are not continuous streams. This is
especially noticeable in the MKV/OGM case, because there the input of data
depends on the other streams, so we'll only notice delays inside an element
when we've received the next data chunk. There are two possible solutions:
using timestamped filler events or using decoupled subtitle overlay elements
(bins, probably). The difficulty with the first is that it only works well in
the AVI/.sub case, where we will notice discontinuities before they become
problematic. The second is more difficult to implement, but works for both
cases.
A) fillers
Imagine that two subtitles come after each other, with 10 seconds of no-data
in between. By parsing a .sub file, we would notice the gap immediately and
could send a filler event (or empty data) whose timestamp and duration cover
it.
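
The filler idea for the .sub case could be sketched as follows. This is
only an illustration: the Event type, the with_fillers name and the use of
text=None to mark a filler are assumptions of this sketch, not an existing
GStreamer interface.

```python
from typing import List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    start: float          # seconds
    duration: float       # seconds
    text: Optional[str]   # None marks a filler (no subtitle to show)


def with_fillers(cues: List[Tuple[float, float, str]],
                 min_gap: float = 0.0) -> List[Event]:
    """Insert a filler for every gap between consecutive cues.

    cues is a list of (start, duration, text) sorted by start time.
    Gaps no longer than min_gap are not filled.
    """
    events = []
    position = 0.0  # end of the data sent so far
    for start, duration, text in cues:
        gap = start - position
        if gap > min_gap:
            events.append(Event(position, gap, None))
        events.append(Event(start, duration, text))
        position = max(position, start + duration)
    return events
```

With two 2-second subtitles starting at 0s and 12s, this yields one filler
starting at 2s with a duration of 10s, which matches the situation described
above.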
B) decoupled
Imagine this text element:
------------------------------
video ----- | actual element |out
|        /  -----------------|
text - -                     |
------------------------------
where the text pad is decoupled, like a queue. When no text data is available,
the pad will have received no data, and the element will render no subtitles.
The actual element can be a bin here, containing another subtitle rendering
element. Disadvantage: it requires threading, and the element itself is (in
concept) kinda gross. The element can be embedded in playbin to hide this
fact (i.e. not be available outside the scope of playbin).
Whichever solution we take, it'll require effort from the implementer.
Scheduling (process, not implementation) knowledge is assumed.
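
The decoupled approach (B) might look roughly like the sketch below. The
class and method names are hypothetical, and Python's thread-safe
queue.Queue stands in for the decoupled text pad. The point it tries to
show is that the video side never waits for text: if nothing has arrived,
the frame simply passes through without a subtitle.

```python
import queue
from typing import Any, Optional, Tuple


class DecoupledOverlay:
    """Video-driven overlay with a queue-like, non-blocking text input."""

    def __init__(self) -> None:
        self._pending = queue.Queue()  # the decoupled text pad
        self._next = None     # cue taken off the queue but not yet due
        self._active = None   # cue currently on screen

    def push_text(self, start: float, duration: float, text: str) -> None:
        # May be called from another thread; never blocks the video side.
        self._pending.put((start, duration, text))

    def push_video(self, timestamp: float,
                   frame: Any) -> Tuple[Any, Optional[str]]:
        # Pick up every queued cue that has become due, without blocking.
        while True:
            if self._next is None:
                try:
                    self._next = self._pending.get_nowait()
                except queue.Empty:
                    break
            if self._next[0] > timestamp:
                break
            self._active, self._next = self._next, None
        # Drop the active cue once its display time has passed.
        if self._active is not None:
            start, duration, _ = self._active
            if timestamp >= start + duration:
                self._active = None
        text = self._active[2] if self._active is not None else None
        return frame, text  # a real element would render text here
```

In this sketch push_video returns the text alongside the frame instead of
rendering it, so the "actual element" from the diagram is left out.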

Stream selection is a problem that audio has, too. We'll need a solution for
this at the playback bin level, e.g. playbin. By muting all unused streams
and dynamically unmuting the selected stream, this is easily solved. Note
that synchronization needs to be checked in this case. The solution is not
hard, but someone has to do it.
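
A minimal sketch of the mute/unmute idea, with hypothetical names
(StreamSelector, select, push) and none of playbin's actual machinery.
Buffers from muted streams are dropped; synchronization after a switch,
which the text above says needs checking, is not handled here.

```python
from typing import Any, Hashable, Iterable, Optional


class StreamSelector:
    """Lets exactly one stream through and drops data from the others."""

    def __init__(self, stream_ids: Iterable[Hashable],
                 active: Optional[Hashable] = None) -> None:
        # Every stream starts muted except the (optional) active one.
        self._muted = {sid: sid != active for sid in stream_ids}

    def select(self, stream_id: Hashable) -> None:
        """Unmute stream_id and mute all other streams."""
        if stream_id not in self._muted:
            raise KeyError(stream_id)
        for sid in self._muted:
            self._muted[sid] = sid != stream_id

    def push(self, stream_id: Hashable, buffer: Any) -> Optional[Any]:
        """Return buffer if its stream is unmuted, else None (dropped)."""
        return None if self._muted[stream_id] else buffer
```

The same selector would work for audio and subtitle streams alike, which is
why this belongs at the playbin level rather than in each element.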

3. Written by
Ronald S. Bultje <rbultje@ronald.bitfreak.net>, Dec. 25th, 2004


Appendix A: random IRC addition
<Company> interesting question: would it be a good idea to have a "max-buffer-length" property?
<Company> that way demuxers would know how often they'd need to generate filler events
<BBB> hm...
<BBB> I don't think it's good to make that variable
<Company> dunno
<Company> (i'm btw always looking at this from the midi perspective, too)
<Company> (because both subtitles and midi are basically the same in this regard)
<BBB> and do you mean 'after the stream has advanced <time> and we didn't read a new subtitle in this mkv stream, we should send a filler'?
<Company> yeah
<BBB> it goes for avi with large init_delay values, too
<Company> so you don't need to send fillers every frame
<BBB> right
<BBB> can't we just set that to, for example, 1s?
<BBB> it's fairly random, but still
<Company> that's another option, too
<Company> though you could write all file parsers with max-delay=MAXINT
<Company> would make them a lot easier
<BBB> it's true that queue size, for example, depends on this value
<BBB> e.g. if you make this 5s and set queue size to 1s, it'll hang
<Company> right
<BBB> whereas if you set it to 1s and queue size to 5s, you waste space
<BBB> :)
<BBB> you ought to set it to max-delay * (n_streams + 1)
<BBB> or so
<BBB> or -1
<BBB> I forgot
<BBB> ohwell
<Company> if you'd use filtercaps and queue sizes in your app, you could at least work around deadlocks
<BBB> yeah
<Company> though ideally it should just work of course...
<BBB> good point...
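
The rule discussed above ("after the stream has advanced <time> and we
didn't read a new subtitle, send a filler") could be sketched like this for
the MKV/OGM case. FillerTracker and its methods are hypothetical names, and
the 1s default is just the example value from the discussion, not a
decided setting.

```python
from typing import List, Tuple


class FillerTracker:
    """Decides when a demuxer should emit fillers on a subtitle stream."""

    def __init__(self, max_delay: float = 1.0) -> None:
        self.max_delay = max_delay   # seconds without subtitle data
        self._covered_until = 0.0    # end of data or fillers sent so far

    def on_subtitle(self, start: float, duration: float) -> None:
        # A real subtitle covers its own time span.
        self._covered_until = max(self._covered_until, start + duration)

    def on_stream_position(self, position: float) -> List[Tuple[float, float]]:
        """Return the (start, duration) fillers due at this position.

        position is how far the other streams have advanced, in seconds.
        """
        fillers = []
        while position - self._covered_until >= self.max_delay:
            fillers.append((self._covered_until, self.max_delay))
            self._covered_until += self.max_delay
        return fillers
```

A larger max_delay means fewer fillers but, as noted in the discussion,
downstream queues then have to be sized to hold at least that much data.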