diff --git a/subprojects/gst-docs/markdown/additional/design/machine-learning-analytics.md b/subprojects/gst-docs/markdown/additional/design/machine-learning-analytics.md
index 247f31ffac..ae452d29f3 100644
--- a/subprojects/gst-docs/markdown/additional/design/machine-learning-analytics.md
+++ b/subprojects/gst-docs/markdown/additional/design/machine-learning-analytics.md
@@ -266,77 +266,6 @@
 advantage is the ability to keep a relation description between tensors in a
 refinement context. On the other hand, this mode of transporting analytics
 results makes negotiation of tensor-decoders in particular difficult.
 
-### Negotiation
-Being able to negotiate the required analysis pre/post-processing, and to
-automatically inject the elements needed to perform it, would be very
-valuable: it would minimize the effort of porting an analytics pipeline
-between different platforms while making use of the acceleration available.
-A tensor-decoder bin, auto-plugging of pre-processing (considering the
-acceleration available), auto-plugging of the inference element (optimized
-for the platform), post-processing, and a tensor-decoder bin selecting the
-required tensor-decoders, potentially from multiple functionally equivalent
-ones better adapted to the platform, are all aspects to consider when
-designing the negotiation involved in an analytics pipeline.
-
-#### Negotiating Tensor-Decoder
-As described above, a tensor-decoder needs to know 4 attributes of a tensor
-to know if it can handle it:
-
-1. Tensor dimension cardinality (not explicitly required in some cases)
-2. Tensor dimensions
-3. Tensor datatype
-4. Tensor type (identifier of the analytics-result encoding semantics)
-
-Note that 1, 2 and 3 could be encoded into 4, but this is not desirable
-because 1, 2 and 3 are useful for selecting semantically-agnostic tensor
-processors.
-
-A tensor-decoder can handle multiple tensor types. This could be expressed in
-the sinkpad(s) template by a list of arrays, one for each combination of
-tensor types it can handle, but this would make the sinkpad(s) caps difficult
-to read. To avoid this problem, when a tensor-decoder handles multiple
-tensors the tensor type is a category that encapsulates all the tensor types
-it can handle. Referring again to YOLOv3's 3 tensors (small, medium, large):
-all 3 would have the same tensor-type identifier, e.g. YOLOv3, and each
-tensor itself would have a sub-type field distinguishing it ('small',
-'medium', 'large'). The same also applies to FastSAM's 2 tensors
-('FastSAM-masks', 'FastSAM-logits'), where both would be represented by the
-same tensor type ('FastSAM') at the pad capability level.
-
-When a tensor is stored as a meta, the allocation query needs to be used to
-negotiate the tensor-decoder. TODO: expand how this would work.
-
-
-##### Tensor-Decoder Sinkpad Caps Examples
-
-Examples assuming object detection on a video frame
-```
-PadTemplates:
-  SINK template: 'vsink' // Tensor attached to the buffer
-    Availability: 'always'
-    Capabilities:
-      video/x-raw
-        format: {...}
-        width: {...}
-        height: {...}
-        framerate: {...}
-
-  SINK template: 'tsink' // Tensor
-    Availability: 'always'
-    Capabilities:
-      tensor/x-raw
-        shape: {}  // This represents a x b x ... x z
-        datatype: {(enum) "int8", "float32", ...}
-        type: {(string)"YOLOv3", (string)"YOLOv4", (string)"SSD", ...}
-```
-
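-For illustration, a minimal sketch of how the 'tsink' caps above could be
-built with the GstCaps API; `tensor/x-raw` and its `datatype` and `type`
-fields are the hypothetical types from this document, not an existing
-GStreamer media type:
-
-```c
-#include <gst/gst.h>
-
-/* Sketch: build the 'tsink' pad template from the example above.
- * "tensor/x-raw" and its fields are hypothetical and taken from this
- * design document; they are not an existing GStreamer media type. */
-static GstPadTemplate *
-make_tensor_sink_template (void)
-{
-  GstCaps *caps = gst_caps_from_string (
-      "tensor/x-raw, "
-      "datatype = (string) { int8, float32 }, "
-      "type = (string) { YOLOv3, YOLOv4, SSD }");
-  GstPadTemplate *templ = gst_pad_template_new ("tsink",
-      GST_PAD_SINK, GST_PAD_ALWAYS, caps);
-
-  gst_caps_unref (caps);
-  return templ;
-}
-```
-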
-##### Tensor-Decoder Srcpad(s)
-Typically the srcpad caps will be the same as the sinkpad caps, but they can
-differ. In general a tensor-decoder only attaches an analytics-meta to the
-buffer; consuming the analytics-meta is left to other downstream elements.
-It's also possible for a tensor-decoder to have very different caps on its
-srcpad. This can be the case when the analytics result is difficult to
-represent, as with text-to-speech or super-resolution. In these cases the
-tensor-decoder could produce media directly: audio for TTS or an image for
-super-resolution.
-
 ### Inference Sinkpad(s) Capabilities
 Sinkpad capability, before being constrained based on the model, can be any
 media type, including `tensor`. Note that multiple sinkpads can be present.
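A minimal sketch of the pattern in the closing context lines above (an
inference element whose sinkpad accepts any media type until the model
constrains it); all names and values below are assumptions for illustration:

```c
#include <gst/gst.h>

/* Sketch: the inference element's sink template accepts any media type,
 * including the hypothetical "tensor/x-raw", until a model is loaded. */
static GstStaticPadTemplate sink_template =
    GST_STATIC_PAD_TEMPLATE ("sink", GST_PAD_SINK, GST_PAD_ALWAYS,
        GST_STATIC_CAPS_ANY);

/* Once the model is known, the element would narrow the caps it accepts
 * to what the model can consume; the 640x480 input is an assumption. */
static GstCaps *
constrain_sink_caps_for_model (void)
{
  return gst_caps_new_simple ("video/x-raw",
      "width", G_TYPE_INT, 640,
      "height", G_TYPE_INT, 480,
      NULL);
}
```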