Settle with 7x7 gaussian convolution kernels, maybe slightly less
accurate than previous 9x9 but fast enough to be able to use it on i915.
About a 20% percent speed gain (again, roughly measured with
videotestsrc and glimagesink sync=false). No noticeable rendering
difference with current effects.
Rework Sobel a little bit again making it work as the old one:
1. desaturate input texture
2. calculate horizontal convolution for x gradient and vertical
convolution for y gradient at the same time (halves the number of
needed texture lookups)
3. store results in a single texture (red and green channel)
4. calculate remaining convolution (same as above switching vertical and
horizontal)
5. calculate length of gradient using red and green as x and y
components.
Optimize wherever possible, store kernels as constants in the shaders,
remove unneeded uniforms. Restore invert property carefully avoiding
using IF.
Still not sure if "full color" convolution will be needed, glfiltersobel
is to be intended as a demo filter and xray, the only effect which uses
sobel only needs edge intensity. Dropping it for now.
Reimplement sobel in a multipass fully separated convolution:
- calculate x gradient map convolving first horizontally with blurring
kernel and then vertically with differentiating kernel
- calculate y gradient map convolving first vertically with blurring
kernel and then horizonally with differentiating kernel
- calculate length of the gradient vector
Particular care was needed with normalization of the blurring kernel and
with grey level offset of the differentiating one to prevent overflow of
rgb values from the [0.0,1.0] range in intermediate passes.
Now works on i915.
Generate a normalized gaussian kernel with given size and standard
deviation on the fly.
Remove "norm_const" uniform from convolution shaders and provide a
normalized kernel instead. Remove norm_offset uniform as it was always
zero, will reintroduce it if really needed in the future. Thanks to Eric
Anholt for suggesting it.
Save some ALU instruction calculating directly the coordinate for
texture lookup instead of summing an offset.
Still exceed maximum indirect texture lookups on i915, the only solution
I see is using a 3x3 kernel.
For now only identity, mirror and squeeze effects are available.
Maybe some factorization is needed about compilation shader
before to put the other effects since only a copy/past is needed,
at least until effect number 9: heat.
The effects from 10:sepia to 15:glow require more work.