Implement a new memory type wrapping CVPixelBuffer.
There are two immediate advantages:
a) Make the GstMemory itself retain the CVPixelBuffer. Previously,
the containing GstBuffer was solely responsible for the lifetime of
the backing CVPixelBuffer.
With this change, we remove the GST_MEMORY_FLAG_NO_SHARE so that
GstMemory objects be referenced by multiple GstBuffers (doing away
with the need to copy.)
b) Delay locking CVPixelBuffer into CPU memory until it's actually
mapped -- possibly never.
The CVPixelBuffer object is shared among references, shares and
(in planar formats) planes, so a wrapper GstAppleCoreVideoPixelBuffer
structure was introduced to manage locking.
https://bugzilla.gnome.org/show_bug.cgi?id=747216
When doing GLMemory avfvideosrc negotiates UYVY. This change allows avfvideosrc
! tee name=t ! ... ! glimagesink t. ! ... ! gldownload ! vtenc_h264 ! ...
to do GLMemory and 0-copy with the encoder (with the CV meta).
Change texture format from BGRA to NV12. This allows a pipeline like avfvideosrc
! tee name=t ! ... ! glimagesink t. ! ... ! gldownload ! vtenc_h264 ! ... to
negotiate GLMemory. This makes the glimagesink branch much faster (obviously)
and triggers the 0-copy path between avfvideosrc and vtenc (using the CV meta).
Combined this results in a huge perf improvement on iOS (25-30% of CPU time in a
pipeline like the one above).
Note that this doesn't introduce a new shader conversion in the sink, since BGRA
textures had to be copied/converted from format=BGRA,texture-target=RECTANGLE to
format=RGBA,texture-target=2D anyway.
Fixate to the highest possible resolution and fps. Otherwise by default we end
up fixating at 2fps and the lowest supported resolution, which is hardly what
someone who bought an overpriced smartphone wants.
We need a static lock to protect various NVENC methods in _set_format(). Without
this the CPU use increases dramatically on initialisation of the element when
there are multiple elements being initialised at the same time.
https://bugzilla.gnome.org/show_bug.cgi?id=759742
When the mode of decklinkvideosink is set to "auto", the sink claims to
support the full set of caps that it can support for all modes. Then, every
time new caps are set, the sink will automatically find the correct mode for
these caps and set it.
Caveat: We have no way to know whether a specific mode will actually work for
your hardware. Therefore, if you try sending 4K video to a 1080 screen, it
will silently fail, we have no way to know that in advance. Manually setting
that mode at least gave the user a way to double-check what they are doing.
https://bugzilla.gnome.org/show_bug.cgi?id=759600
Otherwise qtkitvideosrc fails to build on OSX 10.10.4
because QTKit has been deprecated since OS X 10.9.
Also set -mmacosx-version-min=10.8 in front to allow
the user or cerbero to override the version.
https://bugzilla.gnome.org/show_bug.cgi?id=745564
Add gst_gl_memory_allocator_get_default to get the default allocator based on
the opengl version. Allows us to stop hardcoding the PBO allocator which isn't
supported on gles2.
Fixes GL upload on iOS9 among other things.
- Create GstGLVideoAllocationParams which is a GstGLAllocationParams subclass.
- Make it possible to allocate glmemory objects directly if no frills are
needed.
Prefer GLMemory over sysmem. Also now when pushing GLMemory we push the
original formats (UYVY in OSX, BGRA in iOS) and leave it to downstream to
convert.
It was added back in the day to make texture sharing work by default with
glimagesink inside playbin. These days glimagesink accepts (and converts) YUV
internally so it's no longer needed.
Switch to using IOSurface instead of CVOpenGLTextureCache on OSX. The latter can't be
used anymore to do YUV => RGB with opengl3 on El Capitan as GL_YCBCR_422_APPLE
has been removed from the opengl3 driver. Also switch to NV12 from UYVY, which
was the only YUV format supported by CVOpenGLTextureCache.
First of a few commits to stop using CVOpenGLTextureCache on OSX and use
IOSurfaces directly instead. CVOpenGLTextureCache hasn't been updated for OpenGL
3 which is why texture sharing is currently disabled on OSX.
rename gst-launch --> gst-launch-1.0
replace old elements with new elements(ffmpegcolorspace -> videoconvert, ffenc_** -> avenc_**)
fix caps in examples
https://bugzilla.gnome.org/show_bug.cgi?id=759432
It will fail and cause the sink to crash. Instead wait until the window is
visible again before checking if the swapchain really has to be recreated.
https://bugzilla.gnome.org/show_bug.cgi?id=741608
The video decoders tried calling gst_buffer_add_*meta() on non-writable
buffer resulting in warnings of this kind:
gstamcvideodec.c:921 (_gl_sync_render_unlocked): WARNING: amcvideodec
Failed to create the transformation meta for the gl_sync 0xabc03848
buffer 0xabb01b40 (0)
https://bugzilla.gnome.org/show_bug.cgi?id=758694
Some devices only ever keep one buffer available in the GL queue resulting in
multiple calls to release_output_buffer only causing one frame to be rendered.
If there is a queue after amcvideodec (even playsink's small one), then
multiple buffers are pushed but only a small fraction of them are actually
rendered on time. The rest will either render some number of frames ahead of
where they are meant to be or timeout waiting for a frame that's already been
rendered.
Solved by moving the release_output_buffer into the sync_meta the is pushed
downstream. When downstream renders, the custom sync implementation attempts
to release the current buffer (if not already released) and render. Once the
frame has been rendered to the screen, the next frame is released and is
hopefully available by the time the next frame is to be rendered.
This fixes a perceived frame jitter in the output.
Year 12: I still don't understand how negotiation works.
Apparently gst_pad_query_caps doesn't do what I thought it did. To get the
actual caps that can flow through vtdec:src we must call gst_pad_peer_query_caps
with the template caps as filter.
Fixes negotiation with stuff that doesn't understand GLMemory (hello videoscale).
This provides a performance and power usage improvement by removing
the texture copy from an OES texture to 2D texture.
The flow is as follows
1. Generate the output buffer with the required sync meta with the incrementing
push counter and OES GL memory
1.1 release_output_buffer (buf, render=true) and push downstream
2. Downstream waits for on the sync meta (timed wait) or drops the frame (no wait)
2.1 Timed wait for the frame number to reach the number of frame callbacks fired
2.2 Unconditionally update the image when the wait completes (success or fail).
Sets the affine transformation matrix meta on the buffer.
3. Downstream renders as usual.
At *some* point through this the on_frame_callback may or may not fire. If it
does fire, we can finish waiting early and render. Otherwise we have to
wait for a timeout to occur which may cause more buffers to be pused into the
internal GL queue which siginificantly decreases the chances of the
on_frame_callback to fire again. This is because the frame callback only occurs
when the internal GL queue changes state from empty to non-empty.
Because there is no way to reliably correlate between the number of buffers
pushed and the number of frame callbacks received, there are a number of
workarounds in place.
1. We self-increment the ready counter when it falls behind the push counter
2. Time based waits as the frame callback may not be fired for a certain frame.
3. It is assumed that the device can render at speed or performs some QoS of
the interal GL queue (which may not match the GStreamer QoS).
It holds that we call SurfaceTexture::updateTexImage for each buffer pushed
downstream however there's no guarentee that updateTexImage will result in
the exact next frame (it could skip or duplicate) so synchronization is not
guaranteed to be accurate although it seems to be close enough to be unable
to discern visually. This has not changed from before this patch. The current
requirement for synchronization is that updateTexImage is called at the point in
time when the buffers is to be rendered.
https://bugzilla.gnome.org/show_bug.cgi?id=757285
Rework negotiation implementing GstVideoDecoder::negotiate. Make it possible to
switch texture sharing on and off at runtime. Useful to (eventually) turn
texture sharing on in pipelines where glimagesink is linked only after
decoding has already started (for example OWR).
Improve decode error handling by avoiding calling into GstVideoDecoder from the
VT decode callback. This removes contention on the GST_VIDEO_DECODER_STREAM_LOCK
which used to make the decode callback slow enough for VT to start dropping lots
of frames once the first frame was dropped.
Otherwise, gst_vtenc_negotiate_profile_and_level will double-release as
it checks for profile_level != NULL. This caused crashes when the
vtenc instance is stopped and then restarted.
https://bugzilla.gnome.org/show_bug.cgi?id=757935
Use gst_gl_sized_gl_format_from_gl_format_type to get the format passed to
CVOpenGLESTextureCacheCreateTextureFromImage. Before this change extracting the
second texture from the pixel buffer was failing on ios 9.1.
No need to use G_GINT64_FORMAT for potentially negative values of
GstClockTimeDiff. Since 1.6 these can be handled with GST_STIME_ARGS.
Plus it creates more readable values in the logs.
https://bugzilla.gnome.org/show_bug.cgi?id=757480
Solved with a simple shader templating mechanism and string replacements
of the necessary sampler types/texture accesses and texture coordinate
mangling for rectangular and external-oes textures.
Add the various tokens/strings for the differnet texture types (2D, rect, oes)
Changes the GLmemory api to include the GstGLTextureTarget in all relevant
functions.
Update the relevant caps/templates for 2D only textures.
Otherwise we're going to return times starting at 0 again after shutting down
an element for a specific input/output and then using it again later.
https://bugzilla.gnome.org/show_bug.cgi?id=755426
GstVideoDecoder has its own logic for detecting when to reconfigure
which ultimately calls decide_allocation and results in a new
texture cache that has not been configured from our reconfigure check.
https://bugzilla.gnome.org/show_bug.cgi?id=755156
Fixes playback to GL memory on iOS, where the colours are messed
up by passing Luminance/LuminanceAlpha textures where
color convert expects R/RG textures.
https://bugzilla.gnome.org/show_bug.cgi?id=754504
We were converting all times to our internal running times, that is the time
the sink itself spent in PLAYING already. But forgot to do that for the
running time calculated from the buffer timestamps. As such, all buffers were
scheduled much later if the pipeline's running time did not start at 0.
This happens for example if a base time is explicitly set on the pipeline.
https://bugzilla.gnome.org/show_bug.cgi?id=754528
Casting to UINT from HMIXER generates the following warning with
64bit Windows target MinGW:
gstdirectsoundsrc.c: In function 'gst_directsound_src_mixer_find':
gstdirectsoundsrc.c:733:30: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
mmres = mixerGetDevCaps ((UINT) dsoundsrc->mixer,
^
cc1: all warnings being treated as errors
We can use portable GPOINTER_TO_UINT() macro for this propose.
https://bugzilla.gnome.org/show_bug.cgi?id=754756
Instead of checking for the gstreamer-video-1.0 package is installed,
just assume it is since we already check for the -base dependency.
With this replace the GST_VIDEO_* variables in makefiles and directly
link with libgstvideo.
https://bugzilla.gnome.org/show_bug.cgi?id=753820
Also implement framerate handling correctly by borrowing the code from
ximagesrc. GstBaseSrc::get_times() can't be used for that, we have to
implement proper waiting ourselves.
The block that is dispatched async to the main thread assumed the
wrapping GstAvSampleVideoSink to be alive. However, at the time of
the block execution the GstObject instance that is deferenced to access
the CA layer might already be freed, which caused occasional crashes.
Instead, we now only pass the CoreAnimation layer that needs to be
released to the block. We use __block to make sure the block is not
increasing the refcount of the CA layer again on its own.
https://bugzilla.gnome.org/show_bug.cgi?id=753081
CMBlockBuffer offers a model similar to GstBuffer, as it can
consist of multiple non-consecutive memory blocks.
Prior to this change, what we were doing was:
1) Incorrect:
CMBlockBufferCreateWithMemoryBlock does not copy the data,
but we gst_buffer_unmap'd right away.
2) Inefficient:
If the GstBuffer consisted of non-contiguous memory blocks,
gst_buffer_map resulted in malloc / memcpy.
With this change, we construct a CMBlockBuffer out of individual mapped
GstMemory objects. CMBlockBuffer is made to retain the GstMemory
objects (through the use of CMBlockBufferCustomBlockSource), so the
original GstBuffer can be unref'd.
https://bugzilla.gnome.org/show_bug.cgi?id=751241
All goto fail happen before ret is set. ret must be NULL, and the only
thing the fail statement block does is return NULL. Replacing the jumps to
do this return directly.
CID #1311329
CMBlockBufferGetDataLength would return the entire data length, while
size of individual blocks can be smaller. Iterate over the block buffer
and add the individual (possibly non-contiguous) memory blocks.
https://bugzilla.gnome.org/show_bug.cgi?id=751071
When AVFoundation indicates a supported frame rate range, add it to
the caps. This is important for devices such as the iPhone 6, which
indicate a single AVFrameRateRange of 2fps - 60fps.
https://bugzilla.gnome.org/show_bug.cgi?id=751048
In JNI_OnLoad() we will already get the Java VM passed and could
just directly use that. gstreamer_android-1.0.c will now provide
this to us.
Reason for this is that apparently not all Android system are
providing the JNI functions to get the currently running Java VMs, so
we would fail to get. With this we will always be able to get the Java
VM on such systems.
We only need that if no Java VM is running yet, and all usual cases,
i.e. when calling GStreamer from an actual Android app, there will already
be a Java VM we can just use.
It seems like some phones come without that symbol, let's hope they come
with the other symbol but for now don't make a missing JNI_CreateJavaVM fatal.
This allows us to signal what kind of audio we are expecting to record,
which should tell the system to apply filters (such as echo
cancellation, noise suppression, etc.) if required.
Even when we fail to encode frame, we should still enqueue it so
it could be passed into handle_frame (with output_buffer == NULL).
Otherwise, we risk GstVideoEncoder's queue of frames growing unbounded.
Note: We're slightly changing the renegotiation code to accommodate for
frames without output buffers, but this commit takes no ownership over
the way negotiation is being done.
https://bugzilla.gnome.org/show_bug.cgi?id=750669
VTCompressionSessionEncodeFrame retains the CVPixelBuffer during
encoding, and will release it as soon as it can (e.g. before it even
calls our callback). This means we can safely release input buffer
at this point, possibly allowing the system to reuse it sooner.
https://bugzilla.gnome.org/show_bug.cgi?id=750671
Copying arbitrary metas is going to cause problems and this should really be
handled by the base class. It overrides most other things already anyway,
including timestamp and duration. Those are just set here now so we can
insert the frame sorted into the queue.
https://bugzilla.gnome.org/show_bug.cgi?id=748922
OMX.Exynos. codecs are existing on some devices like the
Galaxy S5 mini, and cause random crashes (of the device,
not the app!) and generally misbehave. That specific device
has other codecs that work with a different name, but let's
just give them marginal rank in case there are devices that
have no other codecs and these are actually the only working
ones
On some devices there are codecs that don't start with OMX., while
there are also some that do. And on some of these devices the ones
that don't start with OMX. just crash during initialization while
the others work. To make things even more complicated other devices
have codecs with the same name that work and no alternatives.
So just give a lower rank to these non-OMX codecs and hope that
there's an alternative with a higher rank.
Also stagefright gives codecs starting with OMX. a higher rank too and
considers other codecs that don't start with OMX. as software codecs.
This decoder does not work if width and height field are not set
in the sinkpad caps. Let's make this explicit by adding them to
the template caps.
https://bugzilla.gnome.org/show_bug.cgi?id=749655
It is incorrect to modify the frame properties after passing them, since
VTCompressionSessionEncodeFrame takes reference and we have no control
over when it's being used.
In fact, the code can be simplified. We just preallocate the frame
properties for keyframe requests, and pass NULL otherwise.
https://bugzilla.gnome.org/show_bug.cgi?id=748467
Unless stopRequest is set, we should unlock conditionally -- otherwise,
the 'create:' method can wake up to an empty buffer queue
and pull a nil buffer.
https://bugzilla.gnome.org/show_bug.cgi?id=748054
gst_ks_device_provider_probe() is a no-braier, just runs ks_enumerate_devices()
and reports the results.
Monitoring is a bit more tricky. We have to create a dummy message-processing
window and register device change notifications for it.
As kernel streaming can (and should) be used for audio capture and audio
playback, this change also has certain placeholders for such.
https://bugzilla.gnome.org/show_bug.cgi?id=747757
The autodetection mode was broken because a race condition in the input mode
setting. The mode could be reverted back when it was replaced in
the streaming thread by the old mode in the middle of mode changed callback.
We now fill GErrors for everything that could throw an exception, and method
calls now always return a gboolean and their value in an out-parameter to
distinguish failures from other values.
The shm-area-property tells the name of the shm area used by the element. This
is useful for cases where shmsink is not able to clean up (calling
shm_unlink()), e.g. if it is in a sandbox.
https://bugzilla.gnome.org/show_bug.cgi?id=675134
while having the default vtdec at secondary rank. This allows decodebin/playbin
to prefer the hardware based decoders, and if that fails to initialize because
hardware resources are busy to fall back to e.g. the libav based h264 decoder
instead of the software based vtdec (which is slower), and only fall back to
the software based vtdec if there is no higher ranked decoder available.
Using requestMediaDataWhenReadyOnQueue the layer will execute a block
when it would like more frames. Using this we can provide the current
frame and avoid needlessly filling the layer's buffer queue causing
older frames to be displayed when under resource pressure.
Otherwise we might set bogus values or GST_CLOCK_TIME_NONE.
Also make sure to reset the caps field to NULL after unreffing
the caps to prevent accidential use afterwards, and unref any
old caps before we remember new caps.
Otherwise we will still have a reference to the surface left, which would
prevent activating the sink again later. E.g. after we lost the device.
Hopefully fixes https://bugzilla.gnome.org/show_bug.cgi?id=744615
Add the diff between the external time when we went to playing and
the external time when the pipeline went to playing. Otherwise we
will always start outputting from 0 instead of the current running
time.
gstdecklink.cpp: In member function 'virtual HRESULT GStreamerDecklinkInputCallback::VideoInputFrameArrived(IDeckLinkVideoInputFrame*, IDeckLinkAudioInputPacket*)':
gstdecklink.cpp:498:22: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (capture_time > m_input->clock_start_time)
^
gstdecklink.cpp:503:22: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (capture_time > m_input->clock_offset)
^
The driver has an internal buffer of unspecified and unconfigurable size, and
it will pull data from our ring buffer as fast as it can until that is full.
Unfortunately that means that we pull silence from the ringbuffer unless its
size is by conincidence larger than the driver's internal ringbuffer.
The good news is that it's not required to completely fill the buffer for
proper playback. So we now throttle reading from the ringbuffer whenever
the driver has buffered more than half of our ringbuffer size by waiting
on the clock for the amount of time until it has buffered less than that
again.
The ringbuffer's acquire() is too early, and ringbuffer's start() will only be
called after the clock has advanced a bit... which it won't unless we start
scheduled playback.
Not from the decklink clock. Both will return exactly the same time once the
decklink clock got slaved to the pipeline clock and received the first
observation, but until then it will return bogus values. But as both return
exactly the same values, we can as well use the pipeline clock directly.
There is no reason to pre-roll more buffers here as we have our own ringbuffer
with more segments around it, and we can immediately provide more buffers to
OpenSL ES when it requests that from the callback.
Pre-rolling a single buffer before starting is necessary though, as otherwise
we will only output silence.
Lowers latency a bit, depending on latency-time and buffer-time settings.
4 is the "typical" number of buffers defined by Android's OpenSL ES
implementation, and its code is optimized for this. Also because we
have our own ringbuffer around this, we will always have enough
buffering on our side already.
Allows for more efficient processing.
The pseudo buffer pool code was using gst_buffer_is_writable()
alone to try and figure-out if cached buffer could be reused.
It needs to check for memory writability too. Also check map
result and fix map flags.
https://bugzilla.gnome.org/show_bug.cgi?id=734264
Use YUV instead of RGB textures, then convert using the new apple specific
shader in GstGLColorConvert. Also use GLMemory directly instead of using the
GL upload meta, avoiding an extra texture copy we used to have before.
When doing texture sharing we don't need to call CVPixelBufferLockBaseAddress to
map the buffer in CPU. This cuts about 10% relative cpu time from a vtdec !
glimagesink pipeline.
Otherwise we might start the scheduled playback before the audio or video streams are
actually enabled, and then error out later because they are enabled to late.
We enable the streams when getting the caps, which might be *after* we were
set to PLAYING state.
Otherwise we might start the streams before the audio or video streams are
actually enabled, and then error out later because they are enabled to late.
We enable the streams when getting the caps, which might be *after* we were
set to PLAYING state.
This API has been deprecated for eternities and microsoft
stopped shipping the headers in 2010 accoding to wikipedia,
so let's just remove it and focus on bringing the plugins
based on the newer APIs up to snuff.
This fixes handling of flushing seeks, where we will get a PAUSED->PLAYING
state transition after the previous one without actually going to PAUSED
first.
Otherwise we will overflow the internal buffer of the hardware
with useless frames and run into an error. This is necessary until
this bug in basesink is fixed:
https://bugzilla.gnome.org/show_bug.cgi?id=742916
decklinkvideosink might be added later to the pipeline, or its state might
be handled separately from the pipeline. In which case the running time when
we (last) went into PLAYING state will be different from the pipeline's.
However we need our own start time to tell the Decklink API, which running
time should be displayed at the moment we go to PLAYING and start scheduled
rendering.
... and hope that everything will be fine. This shouldn't really happen but
previously happened during shutdown. It should be fixed in videoencoder now,
but better be on the safe side here.
Use AVF provided timings to timestamp output buffers. Use the running time at
the time the first buffer is produced to base timestamps on. Report 1-frame
latency based on the negotiated framerate instead of hardcoding 4ms latency.
The property is in kbit/s and we store it in bit/s, so just multiply and
divide by 1000. No need to put a factor of 8 in there.
kVTCompressionPropertyKey_AverageBitRate is also in bit/s according to
its documentation.
This updates the dshowvideosink to work with the GStreamer 1.0.x APIs, and
avoids the use of deprecated GLib threading API that are now used since
GLib 2.32+
https://bugzilla.gnome.org/show_bug.cgi?id=699364
No idea where the DecklinkAPIDispatch.cpp comes from on Windows,
but this should still work. Will just become a problem once we
use other parts of the API.
Otherwise we're going to starve other elements if the decklink clock
is slower than the pipeline clock, or starts much later.
Of course this will still cause problems if the decklink clock and ours are
completely out of sync, or running at a very different rate. But this at least
works better now.
If we just count the frames and calculate timestamps from that, all frames
will arrive late in the sink as we have a live source here. Instead take
the pipeline clock at capture time as reference.
We have to handle the callback object a bit different:
a) it needs a virtual destructor
b) we need to set the callback to NULL when we're done with the output
c) create a new one every time
https://bugzilla.gnome.org/show_bug.cgi?id=740616
We will run into an assertion in set_caps() if we try to change
caps while the source is already running. Don't try to find new
caps in GstBaseSrc::negotiate() to prevent caps changes.
The object lock only protects the session, as we modify
the session from other threads when the bitrate property
is changed. Don't hold it much longer than for session
related things.
And we need to release the video decoder stream lock before
enqueueing a frames. It might wait for our callback to dequeue
a frame from another thread, which will then take the stream
lock too and deadlock.
It is not required on OSX apparently and was only added in 10.9.6 there.
Calculating the correct level from the configuration is not trivial, so let's
just not set a level at all here.
DVB-T2 supports 5, 10 and 1.712 MHz
Order of the enum values (new values after _AUTO)
has been kept congruent with the one in the v4l
API for consistency
Previously known as DMB-T/H, this is the
terrestial DTV broadcast standard currently
used by the People's Republic of China,
Hong Kong, Laos and Macau (officially),
and by Malaysia, Iraq, Jordan, Syria and
Lebanon (experimentally).
These apply to ISDB-T, DVB-T2 and DTMB
Order of the enum values (new rates after _AUTO)
has been kept congruent with the one in the v4l
API for consistency.
According to the v4l-dvb API docs, these are only
used for DVB-T2 at the moment.
Order of the enum values (new rates after _AUTO)
has been kept congruent with the one in the v4l
API for consistency.
iOS has special stride requirements that we don't know yet, so copy
input buffers into buffers allocated by iOS for now.
Later we should check the stride and probably provide a buffer pool for these
buffers so upstream can directly write in there.