ffmpeg

Commit Graph

Author	SHA1	Message	Date
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2023-02-01 04:23:55 +01:00
Andreas Rheinhardt	2b94f23b06	swresample/x86/audio_convert: Remove obsolete MMX functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-14 01:28:29 +02:00
James Almer	acdd672506	x86/audio_convert: fix clobbering of xmm registers Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-10-01 22:40:50 -03:00
James Almer	f37a5dcb55	swresample/x86: add missing colon to labels Silences warnings with Nasm Signed-off-by: James Almer <jamrial@gmail.com>	2015-07-26 02:51:13 -03:00
James Almer	f7ed997a6d	x86/swr: make pack_8ch functions work with compilers without aligned stack Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-15 13:57:37 -03:00
James Almer	59ac93f6af	x86/swr: add SSE/AVX unpack_6ch functions int32/float only Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-12 15:40:03 -03:00
James Almer	6abf00d615	x86/swr: load constants outside the loop in pack_6ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-11 01:11:46 -03:00
James Almer	975ff6a3c6	x86/swr: disable pack_8ch functions on msvc/icl x86_32 Until a proper fix is committed. Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-31 16:38:33 -03:00
James Almer	5f14f9e984	x86/swr: add missing alignment check to pack_6ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-31 13:35:11 -03:00
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-30 23:05:27 -03:00
James Almer	edff061fb0	x86/swr: add ff_float_to_int32_a_avx2 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-11-07 15:01:35 -03:00
James Almer	b385c4c6a3	x86/swr: replace sse4 instructions in pack_6ch with sse ones There's no benefit from using blendps here except on CPUs with AVX, where it's faster than shufps according to Intel's documentation. As such, rename the sse4 functions to sse/sse2 and use shufps instead. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-11-06 20:54:00 -03:00
Ronald S. Bultje	ad75d2b590	x86: Fix compilation with nasm on PPC & OS/2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:36:19 +02:00
Michael Niedermayer	ca2818b881	swresample/x86/audio_convert: add emms to CONV Might fix Ticket1874 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-06-18 02:26:36 +02:00
Michael Niedermayer	3174616f59	Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73' * commit '6860b4081d046558c44b1b42f22022ea341a2a73': x86: include x86inc.asm in x86util.asm cng: Reindent some incorrectly indented lines cngdec: Allow flushing the decoder cngdec: Make the dbov variable have the right unit cngdec: Fix the memset size to cover the full array cngdec: Update the LPC coefficients after averaging the reflection coefficients configure: fix print_config() with broke awks Conflicts: libavcodec/x86/ac3dsp.asm libavcodec/x86/dct32.asm libavcodec/x86/deinterlace.asm libavcodec/x86/dsputil.asm libavcodec/x86/dsputilenc.asm libavcodec/x86/fft.asm libavcodec/x86/fmtconvert.asm libavcodec/x86/h264_chromamc.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264_idct.asm libavcodec/x86/h264_idct_10bit.asm libavcodec/x86/h264_intrapred.asm libavcodec/x86/h264_intrapred_10bit.asm libavcodec/x86/h264_weight.asm libavcodec/x86/vc1dsp.asm libavcodec/x86/vp3dsp.asm libavcodec/x86/vp56dsp.asm libavcodec/x86/vp8dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-10-31 13:43:33 +01:00
Carl Eugen Hoyos	52be5428c0	Add some missing _EXTERNAL suffixes to yasm source files.	2012-08-31 15:39:03 +02:00
Michael Niedermayer	c88e60af76	swr/x86: 10l, missed some SSE2 instructions in code marked as SSE. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-07-05 15:28:10 +02:00
Michael Niedermayer	a927641e7a	libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 20:56:18 +02:00
Michael Niedermayer	ca986a06ad	libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 20:53:30 +02:00
Michael Niedermayer	c4047ad9e0	libswresample: make NOP_N macro less picky on its parameters Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 20:45:32 +02:00
Michael Niedermayer	57bc91c710	libswresample: Change FLOAT_TO_INT32_N to need 1 register less same speed on sandy bridge Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 20:44:08 +02:00
Michael Niedermayer	ecfdd125f1	libswresample-simd: rename 6ch pack to what it is Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 20:31:12 +02:00
Michael Niedermayer	429b964e25	libswresample-simd: make the converter registers parameters Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 20:30:13 +02:00
Michael Niedermayer	b3915c4b70	libswresample: cosmetics Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 19:32:06 +02:00
Michael Niedermayer	24c0d1583c	libswresample: unaligned AVX/SSE4 float and int32 6ch pack Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 19:31:59 +02:00
Justin Ruggles	6f67d9833b	libswresample: Implement MMX, SSE4 and AVX 6ch float and int32 packing function. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-13 19:31:59 +02:00
Michael Niedermayer	cbbc472467	swr-x86-simd: add ff_unpack_2ch_int16_to_int16/int32/float_a_ssse3 more than 10% faster (tested on sandybridge) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-06 19:39:52 +02:00
Michael Niedermayer	72ae583b7d	swr-x86-simd: stereo unpack S16/S32/FLT-> S16/S32/FLT SSE/SSE2 (16 new SIMD functions) Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-06 17:25:52 +02:00
Michael Niedermayer	adfa53b91f	swr-x86-SIMD: 3 instructions less for stereo planar->packed s32/flt->s16 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-06 17:25:52 +02:00
Michael Niedermayer	5f4e18cd16	swr: replace the remaining 2 audio convert SIMD macros by the new ones Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-05 19:59:57 +02:00
Michael Niedermayer	df5ff103cd	swr: fix internal asm labels Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-05 19:43:11 +02:00
Michael Niedermayer	b6f4f0d9ef	swr: fix PACK_2CH register count Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-05 19:42:52 +02:00
Michael Niedermayer	aae3119643	swr: replace planar->planar/packed->packed FLT<->S16/S32 SIMD by new macros this simplifies the code Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-05 19:41:39 +02:00
Michael Niedermayer	47055b8913	swr: implement stereo S16/S32/FLT->S16/S32/FLT planar->packed in SSE/SSE2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-05 18:32:34 +02:00
Michael Niedermayer	e8dd7928c8	swr: change simd len argument to be in samples instead of dst bytes. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-05 18:32:34 +02:00
Michael Niedermayer	c1fe2db376	swr: add ff_int32_to_float_a_avx Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-05-03 15:58:51 +02:00
Michael Niedermayer	65722e7fc5	swr: int32_to_int16_mmx/sse Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-29 14:20:35 +02:00
Michael Niedermayer	73edb58c3c	swr: float_to_int16_sse2() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-29 12:18:14 +02:00
Michael Niedermayer	5932938c9a	swr: float_to_int32_sse2() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-29 11:37:32 +02:00
Michael Niedermayer	b72a0f9c23	swr: add int16_to_float_sse2() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-28 19:07:30 +02:00
Michael Niedermayer	832c3b10d2	swr: add int32_to_float_sse2 could be done for sse/3dnow too if someone wants Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-28 17:06:11 +02:00
Michael Niedermayer	95057b1972	swr: int16->int32: use the old index negate trick to avoid 2 adds Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-28 17:06:11 +02:00
Michael Niedermayer	113738d6c2	swr: more correct cglobal parameters to int16->int32 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-28 17:06:11 +02:00
Michael Niedermayer	fa5daaca0d	swr: separate functions for aligned & unaligned If someone has an idea on how to do this cleaner, it's welcome Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-28 13:15:44 +02:00
Michael Niedermayer	bcc66ff0e4	swr: add int16_to_int32_mmx/sse Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-04-28 13:15:44 +02:00

45 Commits