ffmpeg/libswscale
Sebastian Pop bd83191271 swscale/aarch64: use multiply accumulate and increase vector factor to 4
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is 25% on Graviton1 A1 instances based on Cortex-A72 CPUs:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 CPUs:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-17 23:41:47 +01:00
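
The commit message above describes two ideas: accumulating filter products with multiply accumulate, and producing 4 output samples per loop iteration instead of 2. The following is a minimal portable C sketch of that idea, not the NEON assembly from the patch. The function names `hscale_8_to_15_ref` and `hscale_8_to_15_x4` are hypothetical; the arithmetic (sum of `src * filter`, shift right by 7, clip to 15 bits) is assumed to follow the scalar 8-bit to 15-bit horizontal scaler in libswscale.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a filtered value to the 15-bit output range after the >> 7 scaling. */
static int16_t clip15(int val)
{
    val >>= 7;
    return (int16_t)(val > 32767 ? 32767 : val);
}

/* Scalar reference: one output sample per outer iteration. */
static void hscale_8_to_15_ref(int16_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter, const int32_t *filterPos,
                               int filterSize)
{
    assert(filterSize > 0);
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        dst[i] = clip15(val);
    }
}

/* Hypothetical 4-way variant: four independent accumulators per iteration,
 * each updated with a multiply accumulate. A SIMD version would keep these
 * accumulators in vector registers; here they are plain ints. */
static void hscale_8_to_15_x4(int16_t *dst, int dstW, const uint8_t *src,
                              const int16_t *filter, const int32_t *filterPos,
                              int filterSize)
{
    int i = 0;
    assert(filterSize > 0);
    for (; i + 4 <= dstW; i += 4) {
        int acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filterSize; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += (int)src[filterPos[i + k] + j] *
                          filter[filterSize * (i + k) + j];
        for (int k = 0; k < 4; k++)
            dst[i + k] = clip15(acc[k]);
    }
    /* Remaining 0..3 outputs fall back to the scalar reference. */
    if (i < dstW)
        hscale_8_to_15_ref(dst + i, dstW - i, src, filter + filterSize * i,
                           filterPos + i, filterSize);
}
```

Both functions should produce identical output for the same inputs; the 4-way form only changes how much independent work is exposed per iteration, which is what lets a vector unit or a wide out-of-order core overlap the multiply accumulates.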
aarch64 swscale/aarch64: use multiply accumulate and increase vector factor to 4 2019-12-17 23:41:47 +01:00
arm arm: swscale: Only compile the rgb2yuv asm if .dn aliases are supported 2018-03-31 21:54:56 +03:00
ppc swscale: Fix AltiVec/VSX build with recent GCC 2019-10-04 08:58:17 +03:00
tests swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytes 2019-05-13 13:39:49 +02:00
x86 swscale/x86/swscale: Fix undefined left shifts of negative numbers 2019-09-28 17:24:32 +02:00
Makefile Merge commit '92db5083077a8b0f8e1050507671b456fd155125' 2017-05-04 19:59:30 -03:00
alphablend.c avutil: Rename FF_CEIL_COMPAT to AV_CEIL_COMPAT 2016-01-27 16:36:46 +00:00
bayer_template.c
gamma.c swscale: re-enable gamma 2015-09-04 19:00:20 -03:00
hscale.c avutil: Rename FF_CEIL_COMPAT to AV_CEIL_COMPAT 2016-01-27 16:36:46 +00:00
hscale_fast_bilinear.c
input.c swscale: Add support for NV24 and NV42 2019-05-12 07:51:02 -07:00
libswscale.v build: Change structure of the linker version script templates 2016-05-29 16:43:11 +02:00
log2_tab.c
options.c swscale/options: Use AV_OPT_TYPE_PIXEL_FMT 2016-11-20 13:00:22 +01:00
output.c swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template() 2019-10-16 19:17:57 +02:00
rgb2rgb.c swscale/rgb : move shuffle func shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 in order to add SIMD 2018-03-24 20:22:02 +01:00
rgb2rgb.h swscale/rgb2rgb : cosmetic, move shuffle_bytes func declaration 2018-03-24 20:22:17 +01:00
rgb2rgb_template.c lsws/rgb2rgb_template: Do not compile unneeded shuffle functions on big-endian. 2018-06-10 03:22:59 +02:00
slice.c lsws/slice: Move a misplaced const. 2017-03-08 00:33:21 +01:00
swscale.c swscale/swscale: cosmetics 2019-09-27 10:58:30 +02:00
swscale.h doxygen: Standardize root-level modules 2016-08-02 22:15:25 -07:00
swscale_internal.h swscale/ppc: Move VSX-using code to its own file 2018-12-04 02:59:07 +01:00
swscale_unscaled.c swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper 2019-12-10 16:09:14 +01:00
swscaleres.rc
utils.c swscale/utils: Fix invalid left shifts of negative numbers 2019-09-28 17:24:32 +02:00
version.h Bump minor versions again on master to keep 4.2 versions separate from master 2019-07-21 18:36:31 +02:00
vscale.c swscale: cleanup unused code 2016-03-31 16:36:16 -03:00
yuv2rgb.c swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables() 2019-01-01 21:11:47 +01:00