Commit Graph

45 Commits

Author SHA1 Message Date
Lynne
bbe95f7353 x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
RET in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessarily, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Andreas Rheinhardt
2b94f23b06 swresample/x86/audio_convert: Remove obsolete MMX functions
x64 always has MMX, MMXEXT, SSE and SSE2, which means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions on x64 (unless one, e.g., explicitly
disables SSE2). So, given that the only systems that
benefit from these functions are truly ancient 32-bit x86s,
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-14 01:28:29 +02:00
James Almer
acdd672506 x86/audio_convert: fix clobbering of xmm registers
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-01 22:40:50 -03:00
James Almer
f37a5dcb55 swresample/x86: add missing colon to labels
Silences warnings with NASM.

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:51:13 -03:00
James Almer
f7ed997a6d x86/swr: make pack_8ch functions work with compilers without aligned stack
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-15 13:57:37 -03:00
James Almer
59ac93f6af x86/swr: add SSE/AVX unpack_6ch functions
int32/float only

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-12 15:40:03 -03:00
James Almer
6abf00d615 x86/swr: load constants outside the loop in pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-11 01:11:46 -03:00
James Almer
975ff6a3c6 x86/swr: disable pack_8ch functions on msvc/icl x86_32
Until a proper fix is committed.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 16:38:33 -03:00
James Almer
5f14f9e984 x86/swr: add missing alignment check to pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 13:35:11 -03:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
James Almer
edff061fb0 x86/swr: add ff_float_to_int32_a_avx2
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-07 15:01:35 -03:00
James Almer
b385c4c6a3 x86/swr: replace sse4 instructions in pack_6ch with sse ones
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-06 20:54:00 -03:00
Ronald S. Bultje
ad75d2b590 x86: Fix compilation with nasm on PPC & OS/2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:36:19 +02:00
Michael Niedermayer
ca2818b881 swresample/x86/audio_convert: add emms to CONV
Might fix Ticket1874

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-18 02:26:36 +02:00
Michael Niedermayer
3174616f59 Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73'
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
  x86: include x86inc.asm in x86util.asm
  cng: Reindent some incorrectly indented lines
  cngdec: Allow flushing the decoder
  cngdec: Make the dbov variable have the right unit
  cngdec: Fix the memset size to cover the full array
  cngdec: Update the LPC coefficients after averaging the reflection coefficients
  configure: fix print_config() with broke awks

Conflicts:
	libavcodec/x86/ac3dsp.asm
	libavcodec/x86/dct32.asm
	libavcodec/x86/deinterlace.asm
	libavcodec/x86/dsputil.asm
	libavcodec/x86/dsputilenc.asm
	libavcodec/x86/fft.asm
	libavcodec/x86/fmtconvert.asm
	libavcodec/x86/h264_chromamc.asm
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264_idct.asm
	libavcodec/x86/h264_idct_10bit.asm
	libavcodec/x86/h264_intrapred.asm
	libavcodec/x86/h264_intrapred_10bit.asm
	libavcodec/x86/h264_weight.asm
	libavcodec/x86/vc1dsp.asm
	libavcodec/x86/vp3dsp.asm
	libavcodec/x86/vp56dsp.asm
	libavcodec/x86/vp8dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-31 13:43:33 +01:00
Carl Eugen Hoyos
52be5428c0 Add some missing _EXTERNAL suffixes to yasm source files.
2012-08-31 15:39:03 +02:00
Michael Niedermayer
c88e60af76 swr/x86: 10l, missed some SSE2 instructions in code marked as SSE.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-05 15:28:10 +02:00
Michael Niedermayer
a927641e7a libswresample-simd: Add ff_pack_6ch_float_to_int32_a_avx and ff_pack_6ch_float_to_int32_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:56:18 +02:00
Michael Niedermayer
ca986a06ad libswresample-simd: add ff_pack_6ch_int32_to_float_a_avx and ff_pack_6ch_int32_to_float_a_sse4
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:53:30 +02:00
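In these commit subjects, "pack" refers to planar->packed conversion (interleaving separate channel buffers into one buffer), and "unpack" to the reverse. As a hedged illustration of what a 6-channel pack computes, here is a scalar C sketch of the int32-to-float case. The function name, the NB_CH macro and the 2^-31 scale factor are assumptions for illustration; this is not code from libswresample or from the SIMD functions added in this commit.

```c
#include <stdint.h>

#define NB_CH 6 /* hypothetical: channel count for this sketch */

/* Hypothetical scalar reference for a 6-channel planar -> packed conversion.
 * Each int32 sample is mapped into [-1.0, 1.0) by multiplying with 2^-31. */
static void pack_6ch_int32_to_float_ref(float *dst, const int32_t *const src[NB_CH], int len)
{
    const float scale = 1.0f / 2147483648.0f; /* 2^-31, exactly representable */

    for (int i = 0; i < len; i++)
        for (int ch = 0; ch < NB_CH; ch++)
            /* Interleave: frame i occupies dst[NB_CH*i] .. dst[NB_CH*i + NB_CH - 1]. */
            dst[NB_CH * i + ch] = (float)src[ch][i] * scale;
}
```

A SIMD version does the same work several samples at a time, which is why the register-level shuffling (and later the choice between shufps and blendps) matters for these functions.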
Michael Niedermayer
c4047ad9e0 libswresample: make NOP_N macro less picky on its parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:45:32 +02:00
Michael Niedermayer
57bc91c710 libswresample: Change FLOAT_TO_INT32_N to need 1 register less
Same speed on Sandy Bridge.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:44:08 +02:00
Michael Niedermayer
ecfdd125f1 libswresample-simd: rename 6ch pack to what it is
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:31:12 +02:00
Michael Niedermayer
429b964e25 libswresample-simd: make the converter registers parameters
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 20:30:13 +02:00
Michael Niedermayer
b3915c4b70 libswresample: cosmetics
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:32:06 +02:00
Michael Niedermayer
24c0d1583c libswresample: unaligned AVX/SSE4 float and int32 6ch pack
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:31:59 +02:00
Justin Ruggles
6f67d9833b libswresample: Implement MMX, SSE4 and AVX 6ch float and int32 packing function.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-13 19:31:59 +02:00
Michael Niedermayer
cbbc472467 swr-x86-simd: add ff_unpack_2ch_int16_to_int16/int32/float_a_ssse3
More than 10% faster (tested on Sandy Bridge).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 19:39:52 +02:00
Michael Niedermayer
72ae583b7d swr-x86-simd: stereo unpack S16/S32/FLT-> S16/S32/FLT SSE/SSE2 (16 new SIMD functions)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 17:25:52 +02:00
Michael Niedermayer
adfa53b91f swr-x86-SIMD: 3 instructions less for stereo planar->packed s32/flt->s16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-06 17:25:52 +02:00
Michael Niedermayer
5f4e18cd16 swr: replace the remaining 2 audio convert SIMD macros by the new ones
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:59:57 +02:00
Michael Niedermayer
df5ff103cd swr: fix internal asm labels
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:43:11 +02:00
Michael Niedermayer
b6f4f0d9ef swr: fix PACK_2CH register count
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:42:52 +02:00
Michael Niedermayer
aae3119643 swr: replace planar->planar/packed->packed FLT<->S16/S32 SIMD by new macros
this simplifies the code

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 19:41:39 +02:00
Michael Niedermayer
47055b8913 swr: implement stereo S16/S32/FLT->S16/S32/FLT planar->packed in SSE/SSE2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 18:32:34 +02:00
Michael Niedermayer
e8dd7928c8 swr: change simd len argument to be in samples instead of dst bytes.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-05 18:32:34 +02:00
Michael Niedermayer
c1fe2db376 swr: add ff_int32_to_float_a_avx
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-05-03 15:58:51 +02:00
Michael Niedermayer
65722e7fc5 swr: int32_to_int16_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 14:20:35 +02:00
Michael Niedermayer
73edb58c3c swr: float_to_int16_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 12:18:14 +02:00
Michael Niedermayer
5932938c9a swr: float_to_int32_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-29 11:37:32 +02:00
Michael Niedermayer
b72a0f9c23 swr: add int16_to_float_sse2()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 19:07:30 +02:00
Michael Niedermayer
832c3b10d2 swr: add int32_to_float_sse2
could be done for sse/3dnow too if someone wants

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer
95057b1972 swr: int16->int32: use the old index negate trick to avoid 2 adds
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
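The "index negate trick" in the commit above is, as we read it, the common pattern of pointing the base pointers at the end of the buffers and running one negative index up to zero. In assembly, the single increment then advances every buffer and sets the flags for the loop branch, so neither a separate compare nor per-pointer adds are needed. A minimal C sketch under that assumption follows; the function name is hypothetical, the scaling by 65536 (a left shift by 16) is assumed to match the int16-to-int32 conversion, and a C compiler will not necessarily emit the same instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the negative-index loop pattern (not FFmpeg code). */
static void int16_to_int32_negidx(int32_t *dst, const int16_t *src, ptrdiff_t len)
{
    /* Point both bases one past the last element... */
    int32_t *dst_end = dst + len;
    const int16_t *src_end = src + len;

    /* ...and walk a single index from -len up to 0. In asm, the increment
     * of i sets the flags the branch tests, so no cmp is required. */
    for (ptrdiff_t i = -len; i < 0; i++)
        dst_end[i] = (int32_t)src_end[i] * 65536; /* same as << 16, no UB for negatives */
}
```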
Michael Niedermayer
113738d6c2 swr: more correct cglobal parameters to int16->int32
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 17:06:11 +02:00
Michael Niedermayer
fa5daaca0d swr: separate functions for aligned & unaligned
If someone has an idea on how to do this more cleanly, it's welcome.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 13:15:44 +02:00
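The aligned variants (presumably the `_a_` in names such as ff_float_to_int32_a_sse2) can use aligned vector loads and stores only when the buffers actually meet the vector alignment, which is also what the later "missing alignment check" commit is about. One hedged way to picture the split is a C dispatcher that checks pointer alignment and falls back to an unaligned version otherwise. The function-pointer type, the 16-byte requirement and every name below are assumptions for illustration, not libswresample's actual selection logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter signature for this sketch. */
typedef void (*conv_fn)(int32_t *dst, const int16_t *src, ptrdiff_t len);

/* True if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical stand-ins: a real aligned version would use aligned loads/stores. */
static void conv_aligned(int32_t *dst, const int16_t *src, ptrdiff_t len)
{
    for (ptrdiff_t i = 0; i < len; i++)
        dst[i] = (int32_t)src[i] * 65536;
}

static void conv_unaligned(int32_t *dst, const int16_t *src, ptrdiff_t len)
{
    for (ptrdiff_t i = 0; i < len; i++)
        dst[i] = (int32_t)src[i] * 65536;
}

/* Pick the aligned variant only when both buffers are 16-byte aligned. */
static conv_fn pick_conv(int32_t *dst, const int16_t *src)
{
    return (is_aligned(dst, 16) && is_aligned(src, 16)) ? conv_aligned : conv_unaligned;
}
```

Keeping two separate functions avoids a per-call alignment branch inside the hot loop; the cost is duplicated code, which is presumably what the commit message's request for a cleaner approach refers to.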
Michael Niedermayer
bcc66ff0e4 swr: add int16_to_int32_mmx/sse
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-28 13:15:44 +02:00