The internal elements are only created when caps on both video and subtitle pads
are known.
Prior to that, a GST_QUERY_CAPS on a video sink pad would just return ANY
instead of giving a hint of what downstream can actually handle and
prefers. This could result in upstream elements (such as decoders) deciding on
chosing (in the best cases) a non-optimal caps or (in the worst case) caps that
couldn't be handled by the elements downstream of subtitleoverlay.
In order to fix that, we assume that all subtitle "elements" handle the subtitle
overlay composition feature/meta and handle `GST_QUERY_CAPS` ourselves if the
internal elements aren't present yet.
Fixes#3176
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5834>