BufferPools
-----------

This document proposes a mechanism to build pools of reusable buffers. The
proposal should improve performance and help to implement zero-copy use cases.

Last edited: 2009-09-01 Stefan Kost

Current Behaviour
-----------------

Elements either create their own buffers or request downstream buffers via
pad_alloc. There is hardly any reuse of buffers; instead they are usually
disposed after being rendered.
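
For reference, the two existing paths look roughly like this in a 0.10
element. This is only a minimal sketch; the helper name produce_buffer and
the srcpad/caps/size variables are assumed to be provided by the element:

  #include <gst/gst.h>

  static GstFlowReturn
  produce_buffer (GstPad * srcpad, GstCaps * caps, guint size,
      gboolean use_pad_alloc)
  {
    GstBuffer *buf = NULL;
    GstFlowReturn ret;

    if (use_pad_alloc) {
      /* request a buffer from the downstream element */
      ret = gst_pad_alloc_buffer_and_set_caps (srcpad, GST_BUFFER_OFFSET_NONE,
          size, caps, &buf);
      if (ret != GST_FLOW_OK)
        return ret;
    } else {
      /* create an own buffer; it gets unreffed (and usually freed)
       * downstream after being rendered */
      buf = gst_buffer_new_and_alloc (size);
      gst_buffer_set_caps (buf, caps);
    }

    /* ... fill the buffer ... */

    return gst_pad_push (srcpad, buf);
  }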

Problems
--------

- hardware based elements like to reuse buffers as they e.g.
  - mlock them (dsp)
  - establish an index<->address relation (v4l2)
- not reusing buffers has overhead and makes run time behaviour
  non-deterministic:
  - malloc (which usually becomes an mmap for bigger buffers and thus a
    syscall) and free (which can trigger compaction of free lists in the
    allocator)
  - shm alloc/attach, detach/free (xvideo)
- some use cases cause memcpys
- not having the right amount of buffers (e.g. too few buffers in v4l2src)
- receiving buffers of the wrong type (e.g. plain buffers in xvimagesink)
- receiving buffers with wrong alignment (dsp)
- some use cases cause unneeded cache flushes when buffers are passed between
  user and kernel space

What is needed
--------------

Elements that sink raw data buffers of usually constant size would like to
maintain a bufferpool. These could be sinks or encoders. We need mechanisms to
select and dynamically update:

- the bufferpool owners in a pipeline
- the bufferpool sizes
- the queued buffer sizes, alignments and flags
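
One conceivable mechanism for gathering these values is a custom query that
carries the requirements in a GstStructure. The sketch below assumes such a
query; the query nick ("bufferpool") and the field names ("num-buffers",
"alignment") are made up for illustration and are not existing API:

  #include <gst/gst.h>

  /* hypothetical custom query used to collect bufferpool requirements */
  static GstQuery *
  make_bufferpool_query (void)
  {
    static GstQueryType bp_query_type = 0;
    GstStructure *s;

    if (!bp_query_type)
      bp_query_type = gst_query_type_register ("bufferpool",
          "bufferpool requirements query");

    s = gst_structure_new ("bufferpool",
        "num-buffers", G_TYPE_UINT, 0, "alignment", G_TYPE_UINT, 0, NULL);
    return gst_query_new_application (bp_query_type, s);
  }

  /* a sink or encoder that wants to own a pool answers the query by
   * filling in its requirements */
  static void
  answer_bufferpool_query (GstQuery * query, guint num_buffers,
      guint alignment)
  {
    GstStructure *s = gst_query_get_structure (query);

    gst_structure_set (s, "num-buffers", G_TYPE_UINT, num_buffers,
        "alignment", G_TYPE_UINT, alignment, NULL);
  }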

Proposal
--------

Querying the bufferpool size and buffer alignments can work similar to latency
queries (gst/gstbin.c:{gst_bin_query,bin_query_latency_fold}). Aggregation is
quite straightforward: number-of-buffers is summed up and for the alignment we
take the MAX value.
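
Building on the query sketched above, the per-element answers could be folded
in the bin roughly like this (again just a sketch; the BufferPoolFold struct
and the function name are invented here):

  #include <gst/gst.h>

  /* sum the buffer counts and take the maximum alignment, in the spirit
   * of bin_query_latency_fold */
  typedef struct
  {
    guint num_buffers;          /* running sum */
    guint alignment;            /* running max */
  } BufferPoolFold;

  static void
  bufferpool_fold_one (GstQuery * query, BufferPoolFold * fold)
  {
    GstStructure *s = gst_query_get_structure (query);
    guint num = 0, align = 0;

    gst_structure_get_uint (s, "num-buffers", &num);
    gst_structure_get_uint (s, "alignment", &align);

    fold->num_buffers += num;
    fold->alignment = MAX (fold->alignment, align);
  }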

Bins need to track which elements have been selected as bufferpool owners and
update if those are removed (FIXME: in which states?).

Bins would also need to track if elements that replied to the query are removed
and update the bufferpool configuration (event). Likewise the addition of new
elements needs to be handled (query and, if the configuration changed, update
with an event).
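
As an illustration, a bin (or the application) could watch for such changes
with the existing "element-added" and "element-removed" signals of GstBin.
The renegotiate_bufferpool_config() helper below is hypothetical and only
hints at the re-query plus update step:

  #include <gst/gst.h>

  /* hypothetical helper: re-run the aggregation and, if the result
   * changed, distribute the new configuration (e.g. with an event) */
  static void
  renegotiate_bufferpool_config (GstBin * bin)
  {
    GST_DEBUG_OBJECT (bin, "bufferpool configuration needs to be redone");
    /* ... re-issue the bufferpool query and send an update event ... */
  }

  static void
  on_element_changed (GstBin * bin, GstElement * element, gpointer user_data)
  {
    renegotiate_bufferpool_config (bin);
  }

  static void
  watch_bin (GstBin * bin)
  {
    g_signal_connect (bin, "element-added",
        G_CALLBACK (on_element_changed), NULL);
    g_signal_connect (bin, "element-removed",
        G_CALLBACK (on_element_changed), NULL);
  }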

Bufferpool owners need to handle caps changes to keep the queued buffers valid
for the negotiated format.

The bufferpool could be a helper GObject (like we use GstAdapter). It would
manage a collection of GstBuffers. For each buffer it tracks whether it is in
use or available. The bufferpool in gst-plugins-good/sys/v4l2/gstv4l2bufferpool
might be a starting point.
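
A minimal sketch of such a helper, using a plain struct and a GQueue instead
of full GObject boilerplate (all names are hypothetical):

  #include <gst/gst.h>

  typedef struct
  {
    GQueue free_buffers;        /* buffers that are currently available */
    guint buffer_size;
    guint num_buffers;
  } SketchBufferPool;

  static SketchBufferPool *
  sketch_buffer_pool_new (guint num_buffers, guint buffer_size)
  {
    SketchBufferPool *pool = g_new0 (SketchBufferPool, 1);
    guint i;

    g_queue_init (&pool->free_buffers);
    pool->buffer_size = buffer_size;
    pool->num_buffers = num_buffers;
    for (i = 0; i < num_buffers; i++)
      g_queue_push_tail (&pool->free_buffers,
          gst_buffer_new_and_alloc (buffer_size));
    return pool;
  }

  /* take an available buffer; returns NULL when all buffers are in use */
  static GstBuffer *
  sketch_buffer_pool_acquire (SketchBufferPool * pool)
  {
    return g_queue_pop_head (&pool->free_buffers);
  }

  /* give a buffer back to the pool instead of freeing it */
  static void
  sketch_buffer_pool_release (SketchBufferPool * pool, GstBuffer * buf)
  {
    g_queue_push_tail (&pool->free_buffers, buf);
  }

A real implementation would also need to hook into buffer disposal (e.g. via a
GstBuffer subclass with a custom finalize, as the v4l2 pool does) so that
buffers come back to the pool automatically when they are unreffed downstream.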

Scenarios
---------

v4l2src ! xvimagesink
~~~~~~~~~~~~~~~~~~~~~
- v4l2src would report 1 buffer (do we still want the queue-size property?)
- xvimagesink would report 1 buffer

v4l2src ! tee name=t ! queue ! xvimagesink t. ! queue ! enc ! mux ! filesink
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- v4l2src would report 1 buffer
- xvimagesink would report 1 buffer
- enc would report 1 buffer

filesrc ! demux ! queue ! dec ! xvimagesink
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- dec would report 1 buffer
- xvimagesink would report 1 buffer

Issues
------

Does it make sense to also have pools for sources, or should they always use
buffers from a downstream element?

Do we need to add +1 to the aggregated buffer count when allocating, so that
one buffer can be in flight? E.g. can we push buffers quickly enough to have
e.g. v4l2src ! xvimagesink working with 2 buffers? What about
v4l2src ! queue ! xvimagesink?

More attributes are needed on buffers to reduce the overhead even further:

- padding: when using buffers on hardware one might need to pad the buffer at
  the end to a specific alignment
- mlock: hardware that uses DMA needs the buffer memory locked; if a buffer is
  already memory locked, it can be used by other hardware based elements as is
- cache flushes: hardware based elements usually need to flush cpu caches when
  sending results, as the dma based memory writes do not update values that
  may still be cached on the cpu. If no element further down the pipeline
  actually reads from this memory area, we could avoid the flushes. Other
  hardware elements and elements with ANY caps (tee, queue, capsfilter) are
  examples of elements that do not read the data.
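
Purely as an illustration of the kind of per-buffer attributes meant here
(none of these flags or fields exist in GStreamer today):

  #include <glib.h>

  typedef enum
  {
    POOL_BUFFER_FLAG_MLOCKED     = (1 << 0),  /* memory is mlock()ed, usable
                                               * for DMA as is */
    POOL_BUFFER_FLAG_NEEDS_FLUSH = (1 << 1)   /* cpu caches must be flushed
                                               * before a cpu reader uses it */
  } PoolBufferFlags;

  typedef struct
  {
    guint flags;                /* PoolBufferFlags */
    guint padding;              /* bytes of alignment padding at the end */
  } PoolBufferAttributes;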