Settle on 7x7 gaussian convolution kernels. They may be slightly less
accurate than the previous 9x9 ones but are fast enough to be usable on
i915. About a 20% speed gain (again, roughly measured with
videotestsrc and glimagesink sync=false). No noticeable rendering
difference with current effects.
Get rid of the buggy and complicated hls conversion code for the sin
effect. The only thing needed was hue anyway, and it is easily calculated
using Preucil's formula for rgb to polar coordinates conversion.
Now works on i915 (removed all the IF blocks). Still needs some tuning,
I wonder if it will ever work properly.
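
For reference, a minimal sketch of the hue calculation in Python rather
than GLSL. It assumes the formula meant is the atan2 form commonly quoted
as Preucil's; the function name preucil_hue is hypothetical, and how the
shader itself evaluates the angle is not stated in this log.

```python
import math


def preucil_hue(r, g, b):
    """Return the hue angle in radians, in (-pi, pi], with pure red at 0.

    Assumed form: hue = atan2(sqrt(3) * (G - B), 2R - G - B).
    Hue is undefined for greys (R == G == B); Python's atan2(0, 0)
    returns 0.0 in that case.
    """
    return math.atan2(math.sqrt(3.0) * (g - b), 2.0 * r - g - b)


if __name__ == "__main__":
    for name, rgb in (("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))):
        print(name, round(math.degrees(preucil_hue(*rgb)), 3))
```

Only the hue is computed; no lightness or saturation is needed, which is
the point of dropping the full hls conversion.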
Add a new convenience function in GstGLFilter that just draws an input
texture to a target texture using a simple shader with just a "tex"
uniform sampler.
Move draw_texture from glfiltersobel to glfilter. Other plugins still
need to be updated to use it.
Rework Sobel a little bit again, making it work like the old one:
1. desaturate input texture
2. calculate horizontal convolution for x gradient and vertical
convolution for y gradient at the same time (halves the number of
needed texture lookups)
3. store results in a single texture (red and green channel)
4. calculate remaining convolution (same as above switching vertical and
horizontal)
5. calculate length of gradient using red and green as x and y
components.
Optimize wherever possible, store kernels as constants in the shaders,
remove unneeded uniforms. Restore invert property carefully avoiding
using IF.
Still not sure if "full color" convolution will be needed: glfiltersobel
is intended as a demo filter, and xray, the only effect which uses
sobel, only needs edge intensity. Dropping it for now.
Reimplement sobel in a multipass fully separated convolution:
- calculate x gradient map convolving first horizontally with blurring
kernel and then vertically with differentiating kernel
- calculate y gradient map convolving first vertically with blurring
kernel and then horizontally with differentiating kernel
- calculate length of the gradient vector
Particular care was needed with normalization of the blurring kernel and
with grey level offset of the differentiating one to prevent overflow of
rgb values from the [0.0,1.0] range in intermediate passes.
Now works on i915.
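
The passes above can be sketched on the CPU as follows. This is a
minimal Python illustration, not the shader code: the 3-tap kernels, the
0.5 grey offset, the clamp-to-edge borders and all function names are
assumptions chosen so that every intermediate value stays inside
[0.0,1.0]. The names gx and gy follow the wording of this log.

```python
import math

BLUR = [0.25, 0.5, 0.25]   # normalized: weights sum to 1.0
DIFF = [-0.5, 0.0, 0.5]    # scaled so the result spans [-0.5, 0.5]
GREY = 0.5                 # offset that moves [-0.5, 0.5] into [0.0, 1.0]


def convolve_rows(img, kernel, offset=0.0):
    """Convolve every row with a 1-D kernel, clamping at the borders."""
    width = len(img[0])
    half = len(kernel) // 2
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = offset
            for k, weight in enumerate(kernel):
                xx = min(max(x + k - half, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def convolve_cols(img, kernel, offset=0.0):
    """Convolve every column by transposing around convolve_rows."""
    return transpose(convolve_rows(transpose(img), kernel, offset))


def gradient_maps(img):
    """Return the two offset gradient maps, each within [0.0, 1.0]."""
    gx = convolve_cols(convolve_rows(img, BLUR), DIFF, GREY)
    gy = convolve_rows(convolve_cols(img, BLUR), DIFF, GREY)
    return gx, gy


def sobel_magnitude(img):
    """Length of the gradient vector after removing the grey offset."""
    gx, gy = gradient_maps(img)
    return [[math.hypot(a - GREY, b - GREY) for a, b in zip(ra, rb)]
            for ra, rb in zip(gx, gy)]
```

With these assumed kernels a hard 0-to-1 step yields a magnitude of 0.5
on the two pixels next to the edge and 0.0 elsewhere.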
Thanks to Eric Anholt I've finally understood (at least I hope) how to
count texture indirections and save some. Texture sampling dependent
on the result of some math counts as an indirection phase. Grouped
texture lookups with no math involved count as a single indirection.
Math on the coordinates counts as an indirection.
So the best thing is to group all the math involving coordinates and
then do all the lookups.
This saves enough indirections to make glfilterblur and glow effect
work, albeit a bit slowly, on i915.
Remove unused uniforms from the laplacian filter. Also remove if
kernel[i] != 0 checks so that it compiles where IF is not available.
Again, big thanks to Eric Anholt for the hints.
Apparently assigning gl_TexCoord to a temp counts as an indirection.
Using it directly avoids that and limits indirections to four, which
does not exceed the i915 limit. Now xpro effect works on i915.
Get rid of polar coordinates in the twirl effect. The same can be done
using a rotation matrix, saving alu instructions and, most of all,
avoiding the use of the evil atan() function (which uses IF operators).
Calculate rotation angle in a saner, more understandable way.
Works on i915! (Hope it still works elsewhere too as I'm not able to
test at the moment)
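
A minimal Python sketch of the rotation-matrix approach, for
illustration only. The quadratic falloff of the angle, the default
parameters and the name twirl_coord are assumptions; the log does not
say how the shader derives the angle.

```python
import math


def twirl_coord(x, y, cx=0.5, cy=0.5, radius=0.5, max_angle=math.pi / 2):
    """Rotate a texture coordinate around (cx, cy) without atan().

    The angle depends only on the distance from the centre, so no
    conversion to polar coordinates is needed.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    # Hypothetical falloff: strongest at the centre, zero at `radius`.
    t = max(0.0, 1.0 - r / radius)
    angle = max_angle * t * t
    c, s = math.cos(angle), math.sin(angle)
    # 2x2 rotation matrix applied to the centred coordinate.
    return (cx + c * dx - s * dy, cy + s * dx + c * dy)
```

A rotation leaves the distance from the centre unchanged, so only the
angle of the lookup coordinate moves, as with the polar version.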
Get rid of polar coordinates in the tunnel effect, as the same can easily
be done by just clamping the radius and multiplying.
Remove the evil atan() call that uses branching and a lot of unneeded alu
instructions. Now works on i915!
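
A minimal Python sketch of the clamp-and-multiply idea. The centre, the
max_radius value and the name tunnel_coord are assumptions made for the
example, not values taken from the shader.

```python
import math


def tunnel_coord(x, y, cx=0.5, cy=0.5, max_radius=0.25):
    """Clamp the distance from the centre while keeping the direction."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    # Scale factor is 1.0 inside max_radius and max_radius / r outside.
    # The tiny lower bound avoids a division by zero at the centre
    # without introducing a branch.
    scale = min(r, max_radius) / max(r, 1e-9)
    return (cx + dx * scale, cy + dy * scale)
```

Points inside the radius are looked up unchanged; points outside are
pulled back onto the circle of radius max_radius along the same
direction, which gives the stretched "tunnel" look.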
Generate a normalized gaussian kernel with given size and standard
deviation on the fly.
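
A minimal Python sketch of such a generator, assuming a 1-D kernel for
separable convolution and an odd size. The name gaussian_kernel and the
argument checks are hypothetical, not the plugin's C code.

```python
import math


def gaussian_kernel(size, sigma):
    """Return `size` gaussian weights centred on the middle tap.

    The weights are divided by their sum, so they add up to 1.0 and the
    shader needs no separate normalization constant.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(gaussian_kernel(7, 3.0))
```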
Remove "norm_const" uniform from convolution shaders and provide a
normalized kernel instead. Remove norm_offset uniform as it was always
zero; will reintroduce it if really needed in the future. Thanks to Eric
Anholt for suggesting it.
Save some ALU instructions by calculating the coordinate for the
texture lookup directly instead of summing an offset.
Still exceeds the maximum number of indirect texture lookups on i915;
the only solution I see is using a 3x3 kernel.
Reduce the number of registers by calculating the texture lookup offset
on the fly. It was just a simple sequence, no need to store it in an array.
Fixes the maximum number of registers exceeded error with i915. Still
exceeds maximum indirect texture lookups and maximum ALU instructions.
Maybe we should give up some blur goodness and use slightly smaller
kernels.
Apparently saving a few texture lookups for zero kernel elements is
definitely not worth the use of branching. This way convolution
fragment programs also work where the IF operator is not supported
(tested on i915 and nouveau). See also discussion on bug #615696.
Thanks to Eric Anholt for spotting this.