Commit Graph

36 Commits

Author SHA1 Message Date
Rémi Denis-Courmont c177108ae1 lavu/riscv: add <intmath.h> optimisations
This provides some micro-optimisations for signed integer clipping, and
support for bit weight with the Zbb extension.
2022-09-13 16:50:43 -03:00
James Almer 28d5a3a74a lavu: rename and move ff_parity to av_parity
av_popcount is not defined in intmath.h.

Reviewed-by: ubitux
Signed-off-by: James Almer <jamrial@gmail.com>
2016-01-07 20:04:24 -03:00
Clément Bœsch 2ce29d1765 lavu: add ff_parity() 2016-01-07 22:51:31 +01:00
Ganesh Ajjanagadde 0dd8a3d71e lavu/intmath: add faster clz support
This should be useful for the sofalizer filter.

Reviewed-by: Kieran Kunhya <kierank@ob-encoder.com>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
2015-12-19 09:35:34 -08:00
Michael Niedermayer 00efaa7983 avutil/intmath: fix undefined behavior in ff_ctzll_c()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-22 14:10:42 +02:00
Matt Oliver b0bb1dc62d lavu/intmath.h: Move x86 only msvc/icl functions to x86 specific header.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2015-10-19 13:40:51 +11:00
Ganesh Ajjanagadde 55d3e97970 avutil/intmath: use de Bruijn based ff_ctz
It has already been demonstrated that the de Bruijn method has benefits
over the current implementation: commit 971d12b7f9.
That commit implemented it for long long, this extends it to the int version.

Tested with FATE.

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2015-10-14 13:39:42 -04:00
Ronald S. Bultje 93866c2aa2 intmath: remove av_ctz.
It's a non-installed header and only used in one place (flacenc).
Since ff_ctz is static inline, it's fine to use that instead.
2015-10-11 18:03:10 -04:00
Michael Niedermayer 2a4d1a66e8 avutil/intmath: Change debruijn_ctz64 to use 8bit elements
This reduces the memory & cache need from 256 to 64 bytes
the code also seems faster with this change

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-11 04:21:01 +02:00
Ganesh Ajjanagadde 971d12b7f9 avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
This uses Stein's binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
to get a roughly 4x speedup over Euclidean GCD on standard architectures
with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
At the moment, the compiler intrinsic is used on GCC and Clang due to
its easy availability.

Quick note regarding overflow: yes, subtractions on int64_t can, but the
llabs takes care of that. The llabs is also guaranteed to be safe, with
no annoying INT64_MIN business since INT64_MIN being a power of 2, is
shifted down before being sent to llabs.

The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
GCC, this is provided by a built-in. On Microsoft, there is a
BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
Apparently it is not available on 32 bit builds; so this may or may not
work correctly. On Intel, per the documentation there is only an
intrinsic for _bit_scan_forward and people have posted on forums
regarding _bit_scan_forward64, but often their documentation is
woeful. Again, I don't have it, so I can't test.

As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
use a compiled version based on the De-Bruijn method of Leiserson et al:
http://supertech.csail.mit.edu/papers/debruijn.pdf.

Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
make fate.

aac-am00_88.err:
builtin:
714 decicycles in av_gcd,    4095 runs,      1 skips

de-bruijn:
1440 decicycles in av_gcd,    4096 runs,      0 skips

previous:
2889 decicycles in av_gcd,    4096 runs,      0 skips

Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-11 04:08:41 +02:00
Timothy Gu c5d9e9b354 doxygen: Remove lavu_internal group
There is no use in an internal group for a public API documentation.
2015-08-22 10:07:05 -07:00
James Almer 78347549a4 avutil/intmath: check for ICC before GCC
Intel compiler also defines __GNUC__, so the Intel specific intrinsics were not
really being used.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-18 19:59:34 -03:00
James Almer bc65abc8d7 libavutil: add x86 optimized av_popcount
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-25 19:58:00 -03:00
Michael Niedermayer f8607cfb0a avutil/intmath: Add () to protect the ff_log2() argument
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 00:20:49 +01:00
Matthew Oliver 2060f4cbba avutil/intmath: enable builtin intrinsics for icl and msvc.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-10-26 17:20:55 +01:00
Reimar Döffinger 1a558cec64 intmath.h: Remove duplicated ARM include.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2014-08-31 18:33:27 +02:00
Michael Niedermayer 7d26be63c2 Merge commit '5ff998a233d759d0de83ea6f95c383d03d25d88e'
* commit '5ff998a233d759d0de83ea6f95c383d03d25d88e':
  flacenc: use uint64_t for bit counts
  flacenc: remove wasted trailing 0 bits
  lavu: add av_ctz() for trailing zero bit count
  flacenc: use a separate buffer for byte-swapping for MD5 checksum on big-endian
  fate: aac: Place LATM tests and general AAC tests in different groups
  build: The A64 muxer depends on rawenc.o for ff_raw_write_packet()

Conflicts:
	doc/APIchanges
	libavutil/version.h
	tests/fate/aac.mak

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-05 22:51:20 +01:00
Justin Ruggles dfde8a34e5 lavu: add av_ctz() for trailing zero bit count 2012-11-05 15:32:29 -05:00
Michael Niedermayer aa760b1735 Merge commit '2d09b36c0379fcda8f984bc8ad8816c8326fd7bd'
* commit '2d09b36c0379fcda8f984bc8ad8816c8326fd7bd':
  doc/platform: Add info on shared builds with MSVC
  doc/platform: Move a caveat down to the notes section
  ARM: reinstate optimised intmath.h
  ffv1: update to ffv1 version 3

Conflicts:
	doc/platform.texi
	libavcodec/ffv1.c
	libavcodec/ffv1.h
	libavcodec/ffv1dec.c
	libavcodec/ffv1enc.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-21 16:13:55 +02:00
Michael Niedermayer dcbff35199 Merge commit 'd15c21e5fa3961f10026da1a3080a3aa3cf4cec9'
* commit 'd15c21e5fa3961f10026da1a3080a3aa3cf4cec9':
  avutil: Add a copy of ff_sqrt_tab back into avutil to restore ABI compatibility
  avutil: make some tables visible again
  avutil: remove inline av_log2 from public API
  celp_math: rename ff_log2 to ff_log2_q15

Conflicts:
	libavutil/libavutil.v

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-21 13:35:42 +02:00
Mans Rullgard ebe46b8063 ARM: reinstate optimised intmath.h
Use of the ARM optimised intmath.h was accidentally dropped in 9734b8b.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-20 17:26:37 +01:00
Mans Rullgard 8c0a3d5fe0 avutil: remove inline av_log2 from public API
This removes inline av_log2 and av_log2_16bit from the public API,
instead exporting them as regular functions.  In-tree code still
gets the inline and otherwise optimised variants.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-20 12:28:45 +01:00
Michael Niedermayer e335658370 Merge commit '9734b8ba56d05e970c353dfd5baafa43fdb08024'
* commit '9734b8ba56d05e970c353dfd5baafa43fdb08024':
  Move avutil tables only used in libavcodec to libavcodec.

Conflicts:
	libavcodec/mathtables.c
	libavutil/intmath.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-10-12 14:26:46 +02:00
Diego Biurrun 9734b8ba56 Move avutil tables only used in libavcodec to libavcodec. 2012-10-11 18:29:36 +02:00
Mans Rullgard 5b170c0bea x86: remove FASTDIV inline asm
GCC 4.3 and later do the right thing with the plain C code.  Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value.  At least
two gcc configurations miscompile the inline asm in some situations.

In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr.  On Intel i7 and later, this imul is faster 32-bit mul.  On
older Intel and all AMD, it is slightly slower.  On Atom it is much
slower.

Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible.  If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-22 14:29:10 +01:00
Michael Niedermayer 3699960690 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  build: x86: Only compile mpegvideo optimizations when necessary
  configure: Drop fastdiv option
  build: Make the E-AC-3 encoder select the AC-3 encoder
  fate: flac: Only run tests requiring samples when samples are available

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-22 14:37:03 +02:00
Diego Biurrun 66baa45801 configure: Drop fastdiv option
There is no point in having the user disable any fastdiv macros.
Besides the condition implementation was broken and only disabled
the C implementation, but no platform specific assembly versions.
2012-08-22 01:02:18 +02:00
Michael Niedermayer 0b9a69f244 Merge remote-tracking branch 'qatar/master'
* qatar/master: (22 commits)
  aacdec: Fix PS in ADTS.
  avconv: Consistently use PIX_FMT_NONE.
  dsputil: use cpuflags in x86 emu_edge_core
  dsputil: use movups instead of movdqu in ff_emu_edge_core_sse()
  wma: initialize prev_block_len_bits, next_block_len_bits, and block_len_bits.
  mov: Remove some redundant and obsolete comments.
  Add libavutil/mathematics.h #includes for INFINITY
  doxy: structure libavformat groups
  doxy: introduce an empty structure in libavcodec
  doxy: provide a start page and document libavutil
  doxy: cleanup pixfmt.h
  regtest: split video encode/decode tests into individual targets
  ARM: add explicit .arch and .fpu directives to asm.S
  pthread: do not touch has_b_frames
  avconv: cleanup the transcoding loop in output_packet().
  avconv: split subtitle transcoding out of output_packet().
  avconv: split video transcoding out of output_packet().
  avconv: split audio transcoding out of output_packet().
  avconv: reindent.
  avconv: move streamcopy-only code out of decoding loop.
  ...

Conflicts:
	avconv.c
	libavcodec/aaccoder.c
	libavcodec/pthread.c
	libavcodec/version.h
	libavutil/audioconvert.h
	libavutil/avutil.h
	libavutil/mem.h
	tests/ref/vsynth1/dv
	tests/ref/vsynth1/mpeg2thread
	tests/ref/vsynth2/dv
	tests/ref/vsynth2/mpeg2thread

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-11-23 04:02:17 +01:00
Luca Barbato 757cd8d876 doxy: provide a start page and document libavutil
Introduce a basic layout, the subpages are currently left empty.

Split libavutil in multiple groups as example of the structure
2011-11-22 17:16:02 +01:00
Mans Rullgard 2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Måns Rullgård a955b59658 Remove macro duplication between common.h and intmath.h
Originally committed as revision 24086 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-07 17:27:43 +00:00
Måns Rullgård 2e874c7704 intmath: whitespace cosmetics
Originally committed as revision 24085 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-07 17:27:39 +00:00
Måns Rullgård b90b1b4c3c Fix build on configurations without fast av_log2()
This is a bit hackish.  I will try to think of something nicer, but
this will do for now.

Originally committed as revision 22366 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-09 01:19:28 +00:00
Måns Rullgård 94ca624fbc Move ff_sqrt() to libavutil/intmath.h
Originally committed as revision 22345 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-08 21:19:56 +00:00
Måns Rullgård 75fb5c24ed Move FASTDIV macro to intmath.h
Originally committed as revision 21335 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-19 23:25:36 +00:00
Måns Rullgård 544f5a922f Optimise av_log2 with clz when available
10% faster flac decoding on x86 and ARM.

Originally committed as revision 21217 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-14 19:58:12 +00:00