Don't try to interpolate the chroma samples, the used algorithm only
works for horizontal cositing. Let's switch to a faster and safer
version until we handle chroma siting correctly in the fastpaths.
Rework the orc code to be around 10% faster and support arbitrary matrices.
Pass the matrix parameters to the YUV->RGB functions to make them work
for all matrices. This enables more and faster fastpath conversions.
See https://bugzilla.gnome.org/show_bug.cgi?id=721701
This fast-path was adding 128 to every component including
alpha while it should only be done for all components except
alpha. This caused wrong alpha values to be generated.
Also remove the high-quality I420 to BGRA fast-path as it needs
the same fix, which causes an additional instruction, which causes
orc to emit more than 96 variables, which then just crashes.
This can only be fixed in orc by breaking ABI and allowing more
variables.