Commit Graph

13 Commits

Author SHA1 Message Date
Lynne
1978b143eb
checkasm: add av_tx FFT SIMD testing code
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
2021-04-24 17:19:17 +02:00
Lynne
ff71671d88
lavu/tx: add parity revtab generator version
This will be used for SIMD support.
2021-04-24 17:17:30 +02:00
Lynne
0072a42388
lavu/tx: add full-sized iMDCT transform flag 2021-04-24 17:17:27 +02:00
Lynne
89da62f2fc
lavu/tx: refactor power-of-two FFT
This commit refactors the power-of-two FFT, making it faster and
halving the size of all tables, making the code much smaller on
all systems.
This removes the big/small pass split, because on modern systems
the "big" pass is always faster, and even on older machines there
is no measurable speed difference.
2021-04-24 17:17:20 +02:00
Lynne
aa910a7ecd
lavu/tx: minor code style improvements and additional comments 2021-04-24 17:17:15 +02:00
Lynne
8e94b7cff0
lavu/tx: invert permutation lookups
out[lut[i]] = in[i] lookups were 4.04 times(!) slower than
out[i] = in[lut[i]] lookups for an out-of-place FFT of length 4096.

The permutes remain unchanged for anything but out-of-place monolithic
FFT, as those benefit quite a lot from the current order (it means
there's only 1 lookup necessary to add to an offset, rather than
a full gather).

The code was based around non-power-of-two FFTs, so this wasn't
benchmarked early on.
2021-02-27 04:21:05 +01:00
Lynne
5ca40d6d94
lavu/tx: support in-place FFT transforms
This commit adds support for in-place FFT transforms. Since our
internal transforms were all in-place anyway, this only changes
the permutation on the input.

Unfortunately, research papers were of no help here. All focused
on dry hardware implementations, where permutes are free, or on
software implementations where binary bloat is of no concern so
storing dozen times the transforms for each permutation and version
is not considered bad practice.
Still, for a pure C implementation, it's only around 28% slower
than the multi-megabyte FFTW3 in unaligned mode.

Unlike a closed permutation like with PFA, split-radix FFT bit-reversals
contain multiple NOPs, multiple simple swaps, and a few chained swaps,
so regular single-loop single-state permute loops were not possible.
Instead, we filter out parts of the input indices which are redundant.
This allows for a single branch, and with some clever AVX512 asm,
could possibly be SIMD'd without refactoring.

The inplace_idx array is guaranteed to never be larger than the
revtab array, and in practice only requires around log2(len) entries.

The power-of-two MDCTs can be done in-place as well. And it's
possible to eliminate a copy in the compound MDCTs too, however
it'll be slower than doing them out of place, and we'd need to dirty
the input array.
2021-02-21 17:05:16 +01:00
Lynne
06a8596825
lavu: support arbitrary-point FFTs and all even (i)MDCT transforms
This patch adds support for arbitrary-point FFTs and all even MDCT
transforms.
Odd MDCTs are not supported yet as they're based on the DCT-II and DCT-III
and they're very niche.

With this we can now write tests.
2021-01-13 17:34:13 +01:00
Lynne
91e1625db1
lavu/tx: clip when converting table values to fixed-point
INT32_MAX (2147483647) isn't exactly representable by a floating point
value, with the closest being 2147483648.0. So when rescaling a value
of 1.0, this could overflow when casting the 64-bit value returned from
lrintf() into 32 bits.
Unfortunately the properties of integer overflows don't match up well
with how a Fourier Transform operates. So clip the value before
casting to a 32-bit int.

Should be noted we don't have overflows with the table values we're
currently using. However, converting a Kaiser-Bessel window function
with a length of 256 and a parameter of 5.0 to fixed point did create
overflows. So this is more of insurance to save debugging time
in case something changes in the future.
The macro is only used during init, so it being a little slower is
not a problem.
2021-01-09 20:54:56 +01:00
Anton Khirnov
e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Lynne
e1c84856bb lavu/tx: improve 3-point fixed precision
There's just no reason not to when its so easy (albeit messy) and its also
reducing the precision of all non-power-of-two transforms that use it.
2020-02-14 19:58:14 +00:00
Lynne
e8f054b095 lavu/tx: implement 32 bit fixed point FFT and MDCT
Required minimal changes to the code so made sense to implement.
FFT and MDCT tested, the output of both was properly rounded.
Fun fact: the non-power-of-two fixed-point FFT and MDCT are the fastest ever
non-power-of-two fixed-point FFT and MDCT written.
This can replace the power of two integer MDCTs in aac and ac3 if the
MIPS optimizations are ported across.
Unfortunately the ac3 encoder uses a 16-bit fixed point forward transform,
unlike the encoder which uses a 32bit inverse transform, so some modifications
might be required there.

The 3-point FFT is somewhat less accurate than it otherwise could be,
having minor rounding errors with bigger transforms. However, this
could be improved later, and the way its currently written is the way one
would write assembly for it.
Similar rounding errors can also be found throughout the power of two FFTs
as well, though those are more difficult to correct.
Despite this, the integer transforms are more than accurate enough.
2020-02-13 17:10:34 +00:00
Lynne
42e2319ba9 lavu/tx: add support for double precision FFT and MDCT
Simply moves and templates the actual transforms to support an
additional data type.
Unlike the float version, which is equal or better than libfftw3f,
double precision output is bit identical with libfftw3.
2019-08-02 01:19:52 +01:00