Commit Graph

24 Commits

Author SHA1 Message Date
Wu Jianhua
4041c1029b libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512()
We introduced a ff_horiz_slice_avx2/512() implemented on a new algorithm.
In a nutshell, the new algorithm does three things, gathering data from
8/16 rows, blurring data, and scattering data back to the image buffer.
Here we used a customized transpose 8x8/16x16 to avoid the huge overhead
brought by gather and scatter instructions, which is dependent on the
temporary buffer called localbuf added newly.

Performance data:
ff_horiz_slice_avx2(old): 109.89
ff_horiz_slice_avx2(new): 666.67
ff_horiz_slice_avx512: 1000

Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-08-29 19:58:33 +02:00
Wu Jianhua
68a2722aee libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512()
The new vertical slice with AVX2/512 acceleration can significantly
improve the performance of Gaussian Filter 2D.

Performance data:
ff_verti_slice_c: 32.57
ff_verti_slice_avx2: 476.19
ff_verti_slice_avx512: 833.33

Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com>
Co-authored-by: Jin Jun <jun.i.jin@intel.com>
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-08-29 19:58:33 +02:00
Andreas Rheinhardt
8be701d9f7 avfilter/avfilter: Add numbers of (in|out)pads directly to AVFilter
Up until now, an AVFilter's lists of input and output AVFilterPads
were terminated by a sentinel and the only way to get the length
of these lists was by using avfilter_pad_count(). This has two
drawbacks: first, sizeof(AVFilterPad) is not negligible
(i.e. 64B on 64bit systems); second, getting the size involves
a function call instead of just reading the data.

This commit therefore changes this. The sentinels are removed and new
private fields nb_inputs and nb_outputs are added to AVFilter that
contain the number of elements of the respective AVFilterPad array.

Given that AVFilter.(in|out)puts are the only arrays of zero-terminated
AVFilterPads an API user has access to (AVFilterContext.(in|out)put_pads
are not zero-terminated and they already have a size field) the argument
to avfilter_pad_count() is always one of these lists, so it just has to
find the filter the list belongs to and read said number. This is slower
than before, but a replacement function that just reads the internal numbers
that users are expected to switch to will be added soon; and furthermore,
avfilter_pad_count() is probably never called in hot loops anyway.

This saves about 49KiB from the binary; notice that these sentinels are
not in .bss despite being zeroed: they are in .data.rel.ro due to the
non-sentinels.

Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-20 12:53:58 +02:00
Andreas Rheinhardt
1b20853fb3 avfilter/internal: Factor out executing a filter's execute_func
The current way of doing it involves writing the ctx parameter twice.

Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-15 21:33:25 +02:00
Andreas Rheinhardt
18ec426a86 avfilter/formats: Factor common function combinations out
Several combinations of functions happen quite often in query_format
functions; e.g. ff_set_common_formats(ctx, ff_make_format_list(sample_fmts))
is very common. This commit therefore adds functions that are equivalent
to commonly used function combinations in order to reduce code
duplication.

Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-08-13 17:36:22 +02:00
Andreas Rheinhardt
a04ad248a0 avfilter: Constify all AVFilters
This is possible now that the next-API is gone.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 11:48:05 -03:00
James Almer
3c77584be8 avfilter/vf_gblur: add missing arch check
Removed by mistake in 2b4da1cb8c where it
should have been replaced instead.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 15:45:40 -03:00
James Almer
2b4da1cb8c x86/vf_gblur: fix postscale_slice prologue
x86_32 ABI does not pass float arguments directly on xmm regs, and the Win64
ABI uses only the first four regs for this purpose.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 13:33:20 -03:00
Paul B Mahol
44cf3a2b16 avfilter/x86/vf_gblur: add postscale SIMD 2021-02-16 21:12:11 +01:00
Paul B Mahol
058db59e16 avfilter/vf_gblur: factor out postscale function 2021-02-16 21:12:11 +01:00
Paul B Mahol
34922dffca avfilter/vf_gblur: add float format support 2021-02-12 21:09:51 +01:00
Paul B Mahol
1b26f27026 avfilter/vf_gblur: add support for 12bit yuva formats 2019-11-18 17:26:59 +01:00
Paul B Mahol
1e35519fe0 avfilter/vf_gblur: fix undefined behaviour
Fixes #8292
2019-10-16 19:29:56 +02:00
Paul B Mahol
64a805883d avfilter/vf_gblur: fix heap-buffer overflow
Fixes #8282
2019-10-16 12:13:04 +02:00
Paul B Mahol
33e69806aa avfilter/vf_gblur: switch to ff_filter_process_command() 2019-10-14 11:40:17 +02:00
Paul B Mahol
da9337c911 avfilter/vf_gblur: add support for commands 2019-10-06 15:34:28 +02:00
Ruiling Song
83f9da7768 avfilter/vf_gblur: add x86 SIMD optimizations
The horizontal pass get ~2x performance with the patch
under single thread.

Tested overall performance using the command(avx2 enabled):
./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
For single thread, the fps improves from 43 to 60, about 40%.
For multi-thread, the fps improves from 110 to 130, about 20%.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
2019-06-12 08:53:11 +08:00
Ruiling Song
06ba4783a0 lavfi/gblur: doing several columns at the same time
Instead of doing each column one by one, doing several columns
together gives about 30% better performance.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
2019-05-08 10:39:43 +08:00
Paul B Mahol
bd6c57d532 avfilter: add support for gray14 format 2018-09-09 19:10:44 +02:00
Paul B Mahol
bac508fec1 avfilter: add support for GRAY9 and GBRAP10 2017-08-07 13:11:09 +02:00
Paul B Mahol
27ebdcf079 avfilter: add GRAY10 and GRAY12 to some filters
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-04-10 18:13:02 +02:00
Michael Niedermayer
6294247730 avfilter/vf_gblur: Increase supported pixel count from 31bit to 32bit in filter_postscale()
Fixes CID1396252

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-27 22:16:37 +01:00
Paul B Mahol
443c9fab57 avfilter/vf_gblur: add sigmaV option, different vertical filtering
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2016-09-04 23:59:45 +02:00
Paul B Mahol
ee605aa730 avfilter: add gblur filter
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2016-09-04 15:33:05 +02:00