Commit Graph

38 Commits

Author SHA1 Message Date
Diego Biurrun
fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
Paul B Mahol
49bbfb9d13 avfilter: add arbitrary audio FIR filter
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-05-09 20:47:52 +02:00
Muhammad Faiz
1e69ac9246 avfilter/avf_showcqt: cqt_calc optimization on x86
on x86_64:
        time    PSNR
plain   3.303   inf
SSE     1.649   107.087535
SSE3    1.632   107.087535
AVX     1.409   106.986771
FMA3    1.265   107.108437

on x86_32 (PSNR compared to x86_64 plain):
        time    PSNR
plain   7.225   103.951979
SSE     1.827   105.859282
SSE3    1.819   105.859282
AVX     1.533   105.997661
FMA3    1.384   105.885377

FMA4 test is not available

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2016-06-08 16:09:43 +07:00
Ronald S. Bultje
5ce703a6bf vf_colorspace: x86-64 SIMD (SSE2) optimizations. 2016-04-12 16:42:48 -04:00
Thomas Mundt
5024a82e95 avfilter/vf_bwdif: add x86 SIMD
Signed-off-by: Thomas Mundt <loudmax@yahoo.de>
2016-03-13 10:06:21 +01:00
Paul B Mahol
5740dc27e1 avfilter/vf_w3fdif: add x86 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-10 17:33:43 +02:00
Paul B Mahol
ac74e857a2 avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-06 21:01:24 +02:00
Paul B Mahol
9762554dd0 avfilter/vf_blend: add x86 SIMD for some modes
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-03 21:26:17 +02:00
Paul B Mahol
160556c9ad avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-02 17:40:57 +02:00
James Darnley
bff7242608 avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions
Speed of all modes increased by a factor between 7.4 and 19.8 largely depending
on whether bytes are unpacked into words.  Modes 2, 3, and 4 have been sped-up
by a factor of 43 (thanks quick sort!)

All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.

With a contribution from James Almer <jamrial@gmail.com>
2015-07-14 23:50:50 +00:00
Ronald S. Bultje
ae4c9ddebc vf_psnr: sse2 optimizations for sum-squared-error.
The internal line accumulator for 16bit can overflow, so I changed that
from int to uint64_t in the C code. The matching assembly looks a little
weird but output looks correct.

(avx2 should be trivial to add later.)

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-14 17:57:14 +02:00
Ronald S. Bultje
dfc58584b4 vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.
Both are 2-2.5x faster than their C counterpart.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-07-14 05:07:07 +02:00
Arwa Arif
4c38e960d0 avfilter: Port mp=eq/eq2 to lavfi
Code adapted from James Darnley's port
Some fixes from Paul B Mahol <onemda@gmail.com>

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-01-26 00:14:04 +01:00
James Almer
da02ee127a x86/vf_pp7: port dctB_mmx to yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-09 20:02:27 -03:00
Arwa Arif
a299cd5ab3 lavfi: port mp=pp7 to libavfilter
The only difference with mp=pp7 is that default mode is "medium", as stated
in the MPlayer docs, rather than "hard".

Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
2015-01-09 17:26:31 +01:00
James Almer
466e32bf25 x86/vf_fspp: port inline asm to yasm
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-26 15:39:51 -03:00
Arwa Arif
bdc4db0ee3 lavfi: port mp=fspp to a native libavfilter filter
Signed-off-by: Stefano Sabatini <stefasab@gmail.com>
2014-12-24 16:29:18 +01:00
Michael Niedermayer
fb3eb57369 avfilter/tinterlace: add Support for ff_lowpass_line_avx() & ff_lowpass_line_sse2()
Based-on: 2e1704059a by Kieran Kunhya

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 04:02:33 +01:00
Michael Niedermayer
6f373d75e8 Merge commit '2e1704059ae8625beda2ffde847ad22c5ba416dc'
* commit '2e1704059ae8625beda2ffde847ad22c5ba416dc':
  vf_interlace: Add SIMD for lowpass filter

Conflicts:
	libavfilter/vf_interlace.c
	libavfilter/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-15 02:39:49 +01:00
Kieran Kunhya
2e1704059a vf_interlace: Add SIMD for lowpass filter
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2014-11-15 00:35:31 +01:00
James Almer
864f9326fb x86/vf_noise: move asm code to a separate file
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-10-17 00:44:35 -03:00
skal
406a9ccffe avfilter/vf_idet: MMX/MMXEXT/SSE2 implementation of idet's filter_line()
integration by Neil Birkbeck, with help from Vitor Sessak.
core SSE2 loop by Skal (pascal.massimino@gmail.com)

Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-04 22:19:00 +02:00
Robert Krüger
4a38eeec38 Revert "Revert "vf_yadif: move x86 init code to x86/yadif.c""
This reverts commit 975110a85e.

Signed-off-by: Robert Krüger <krueger@lesspain.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-14 14:19:14 +01:00
Michael Niedermayer
975110a85e Revert "vf_yadif: move x86 init code to x86/yadif.c"
This reverts commit a87b17f328.
This reduces the amount of non LGPL code, making a relicensing to LGPL
easier

Conflicts:

	libavfilter/vf_yadif.c
	libavfilter/x86/yadif.c
	libavfilter/x86/yadif_template.c
	libavfilter/yadif.h

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-12-01 20:26:26 +01:00
Michael Niedermayer
1ea28ffc4d Merge commit '0e730494160d973400aed8d2addd1f58a0ec883e'
* commit '0e730494160d973400aed8d2addd1f58a0ec883e':
  avfilter: x86: Port gradfun filter optimizations to yasm

Conflicts:
	libavfilter/x86/vf_gradfun_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-24 10:35:39 +02:00
Daniel Kang
0e73049416 avfilter: x86: Port gradfun filter optimizations to yasm
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-10-23 14:50:27 +02:00
Paul B Mahol
9c774459a9 avfilter: port pullup filter from libmpcodecs
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2013-09-17 17:03:36 +00:00
Clément Bœsch
a2c547ffec lavfi: add spp filter. 2013-06-14 01:27:22 +02:00
James Darnley
0a5814c9ba yadif: x86 assembly for 9 to 14-bit samples
These smaller samples do not need to be unpacked to double words
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2).  It also avoids emulating the missing double word
instructions on older instruction sets.

Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.

Athlon64:
1809275 decicycles in C,    32718 runs, 50 skips
 911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster

Core2Quad:
 921363 decicycles in C,     32756 runs, 12 skips
 486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
 293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
 284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:54 +01:00
James Darnley
17e7b49501 yadif: x86 assembly for 16-bit samples
This is a fairly dumb copy of the assembly for 8-bit samples but it
works and produces identical output to the C version.  The options have
been tested on an Athlon64 and a Core2Quad.

Athlon64:
1810385 decicycles in C,    32726 runs, 42 skips
1080744 decicycles in mmx,  32744 runs, 24 skips, 1.7x faster
 818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster

Core2Quad:
 924025 decicycles in C,     32750 runs, 18 skips
 623995 decicycles in mmx,   32767 runs,  1 skips, 1.5x faster
 406223 decicycles in sse2,  32764 runs,  4 skips, 2.3x faster
 387842 decicycles in ssse3, 32767 runs,  1 skips, 2.4x faster
 307726 decicycles in sse4,  32763 runs,  5 skips, 3.0x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:34 +01:00
Diego Biurrun
e66240f22e avfilter: x86: consistent filenames for filter optimizations 2013-02-04 15:00:47 +01:00
Diego Biurrun
76d90125cd vf_hqdn3d: x86: Add proper arch optimization initialization 2013-02-01 13:11:45 +01:00
Daniel Kang
899157b308 yadif: Port inline assembly to yasm
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-09 18:41:02 +01:00
Justin Ruggles
f96f1e06a4 x86: af_volume: add SSE2-optimized s16 volume scaling 2012-12-05 11:23:37 -05:00
Diego Biurrun
f6c38c5f4e avfilter: call x86 init functions under if (ARCH_X86), not if (HAVE_MMX) 2012-10-12 19:58:51 +02:00
Loren Merritt
7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Nolan L
d5f187fd33 Add gradfun filter, ported from MPlayer.
Patch by Nolan L nol888 <=> gmail >=< com.

See thread:
Subject: [FFmpeg-devel] [PATCH] Port gradfun to libavfilter (GCI)
Date: Mon, 29 Nov 2010 07:18:14 -0500

Originally committed as revision 25942 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-12-12 17:59:10 +00:00
Aurelien Jacobs
fa6f4ebc08 use a Makefile in x86 subdir
Originally committed as revision 25234 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-09-27 21:50:26 +00:00