Commit Graph

45 Commits

Author SHA1 Message Date
Lynne bbe95f7353
x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Andreas Rheinhardt 2b94f23b06 swresample/x86/audio_convert: Remove obsolete MMX functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-14 01:28:29 +02:00
James Almer acdd672506 x86/audio_convert: fix clobbering of xmm registers
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-01 22:40:50 -03:00
James Almer f37a5dcb55 swresample/x86: add missing colon to labels
Silences warnings with Nasm

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:51:13 -03:00
James Almer f7ed997a6d x86/swr: make pack_8ch functions work with compilers without aligned stack
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-15 13:57:37 -03:00
James Almer 59ac93f6af x86/swr: add SSE/AVX unpack_6ch functions
int32/float only

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-12 15:40:03 -03:00
James Almer 6abf00d615 x86/swr: load constants outside the loop in pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-11 01:11:46 -03:00
James Almer 975ff6a3c6 x86/swr: disable pack_8ch functions on msvc/icl x86_32
Until a proper fix is committed.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 16:38:33 -03:00
James Almer 5f14f9e984 x86/swr: add missing alignment check to pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 13:35:11 -03:00
James Almer 37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
James Almer edff061fb0 x86/swr: add ff_float_to_int32_a_avx2
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-07 15:01:35 -03:00
James Almer b385c4c6a3 x86/swr: replace sse4 instructions in pack_6ch with sse ones
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-06 20:54:00 -03:00
Ronald S. Bultje ad75d2b590 x86: Fix compilation with nasm on PPC & OS/2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:36:19 +02:00
Michael Niedermayer ca2818b881 swresample/x86/audio_convert: add emms to CONV
Might fix Ticket1874

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-18 02:26:36 +02:00
Michael Niedermayer 3174616f59 Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
  x86: include x86inc.asm in x86util.asm
  cng: Reindent some incorrectly indented lines
  cngdec: Allow flushing the decoder
  cngdec: Make the dbov variable have the right unit
  cngdec: Fix the memset size to cover the full array
  cngdec: Update the LPC coefficients after averaging the reflection coefficients
  configure: fix print_config() with broke awks

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/dct32.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/fft.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_idct_10bit.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_intrapred_10bit.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Carl Eugen Hoyos 52be5428c0 Add some missing _EXTERNAL suffixes to yasm source files. 2012-08-31 15:39:03 +02:00
Michael Niedermayer c88e60af76 swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-05 15:28:10 +02:00
Michael Niedermayer a927641e7a libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:56:18 +02:00
Michael Niedermayer ca986a06ad libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:53:30 +02:00
Michael Niedermayer c4047ad9e0 libswresample: make NOP_N macro less picky on its parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:45:32 +02:00
Michael Niedermayer 57bc91c710 libswresample: Change FLOAT_TO_INT32_N to need 1 register less
same speed on sandy bridge

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:44:08 +02:00
Michael Niedermayer ecfdd125f1 libswresample-simd: rename 6ch pack to what it is
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:31:12 +02:00
Michael Niedermayer 429b964e25 libswresample-simd: make the converter registers parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:30:13 +02:00
Michael Niedermayer b3915c4b70 libswresample: cosmetics
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:32:06 +02:00
Michael Niedermayer 24c0d1583c libswresample: unaligned AVX/SSE4 float and int32 6ch pack
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:31:59 +02:00
Justin Ruggles 6f67d9833b libswresample: Implement MMX, SSE4 and AVX 6ch float and int32 packing function.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:31:59 +02:00
Michael Niedermayer cbbc472467 swr-x86-simd: add ff_unpack_2ch_int16_to_int16/int32/float_a_ssse3
more than 10% faster (tested on sandybridge)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 19:39:52 +02:00
Michael Niedermayer 72ae583b7d swr-x86-simd: stereo unpack S16/S32/FLT-> S16/S32/FLT SSE/SSE2 (16 new SIMD functions)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 17:25:52 +02:00
Michael Niedermayer adfa53b91f swr-x86-SIMD: 3 instructions less for stereo planar->packed s32/flt->s16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 17:25:52 +02:00
Michael Niedermayer 5f4e18cd16 swr: replace the remaining 2 audio convert SIMD macros by the new ones
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:59:57 +02:00
Michael Niedermayer df5ff103cd swr: fix internal asm labels
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:43:11 +02:00
Michael Niedermayer b6f4f0d9ef swr: fix PACK_2CH register count
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:42:52 +02:00
Michael Niedermayer aae3119643 swr: replace planar->planar/packed->packed FLT<->S16/S32 SIMD by new macros
this simplifies the code

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:41:39 +02:00
Michael Niedermayer 47055b8913 swr: implement stereo S16/S32/FLT->S16/S32/FLT planar->packed in SSE/SSE2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 18:32:34 +02:00
Michael Niedermayer e8dd7928c8 swr: change simd len argument to be in samples instead of dst bytes.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 18:32:34 +02:00
Michael Niedermayer c1fe2db376 swr: add ff_int32_to_float_a_avx
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-03 15:58:51 +02:00
Michael Niedermayer 65722e7fc5 swr: int32_to_int16_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 14:20:35 +02:00
Michael Niedermayer 73edb58c3c swr: float_to_int16_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 12:18:14 +02:00
Michael Niedermayer 5932938c9a swr: float_to_int32_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 11:37:32 +02:00
Michael Niedermayer b72a0f9c23 swr: add int16_to_float_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 19:07:30 +02:00
Michael Niedermayer 832c3b10d2 swr: add int32_to_float_sse2
could be done for sse/3dnow too if someone wants

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer 95057b1972 swr: int16->int32: use the old index negate trick to avoid 2 adds
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer 113738d6c2 swr: more correct cglobal parameters to int16->int32
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer fa5daaca0d swr: seperate functions for aligned & unaligned
If someone has an idea on how to do this cleaner, its welcome

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 13:15:44 +02:00
Michael Niedermayer bcc66ff0e4 swr: add int16_to_int32_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 13:15:44 +02:00