2534 Commits

Author SHA1 Message Date
Andreas Rheinhardt b189550137 lib*/version.h: Bump Versions after release/5.0 branch
This is done a second time for 5.0 because master was
merged into 5.0 so that it contains the recent DOVI additions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 14:29:06 +01:00
Andreas Rheinhardt c512be9a90 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 13:40:03 +01:00
Andreas Rheinhardt 20b0d24c2f Makefile: Redo duplicating object files in shared builds
In case of shared builds, some object files containing tables
are currently duplicated into other libraries: log2_tab.c,
golomb.c, reverse.c. The check for whether this is duplicated
is simply whether CONFIG_SHARED is true. Yet this is crude:
E.g. libavdevice includes reverse.c for shared builds, but only
needs it for the decklink input device, which given that decklink
is not enabled by default will be unused in most libavdevice.so.

This commit changes this by making it more explicit about what
to duplicate from other libraries. To do this, two new Makefile
variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains
the objects that are duplicated from other libraries in case of
shared builds; STLIBOBJS contains stuff that a library has to
provide for other libraries in case of static builds. These new
variables provide a way to enable/disable with a finer granularity
than just whether shared builds are enabled or not. E.g. lavd's
Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o

Another example is provided by the golomb tables. These are provided
by lavc for static builds, even if one uses a build configuration
that makes only lavf use them. Therefore lavc's Makefile contains
STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile
has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o.
E.g. in case the MXF muxer is the only component needing these tables
only libavformat.so will contain them for shared builds; currently
libavcodec.so does so, too.
(There is currently a CONFIG_EXTRA group for golomb. But actually
one would need two groups (golomb_avcodec and golomb_avformat) in
order to know when and where to include these tables. Therefore
this commit uses a Makefile-based approach for this and stops
using these groups for the users in libavformat.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-01-04 05:01:04 +01:00
Michael Niedermayer 4be85c9331 lib*/version.h: Bump Versions after release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:10:46 +01:00
Michael Niedermayer f3964a59e1 lib*/version.h: Bump Versions before release/5.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-01-03 22:08:31 +01:00
rcombs 3e00b9e395 swscale/x86/init: use isSemiPlanarYUV
Fixes P210/P410 cases introduced (and broken) in 88d804b7ff
2021-12-23 01:41:03 -06:00
rcombs 88d804b7ff swscale: add P210/P410/P216/P416 output 2021-12-22 18:38:40 -06:00
Alan Kelly eebe406c80 libswscale: Test AV_CPU_FLAG_SLOW_GATHER for hscale functions.
This is instead of EXTERNAL_AVX2_FAST so that the avx2 hscale functions
are only used where they are faster.
2021-12-21 17:44:53 -03:00
James Almer eab91c3e2e x86/scale_avx2: don't use $ for hex literals
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 17:29:21 -03:00
Alan Kelly 9092e58c44 x86/scale_avx2: Change asm indent from 2 to 4 spaces.
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:42:04 -03:00
Alan Kelly 86663963e6 x86/swscale: fix minor coding style issues
Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 13:16:04 -03:00
James Almer 76a3f961f8 x86/scale_avx2: add missing check for AVX2 assembler support
Should fix compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-16 09:41:56 -03:00
Alan Kelly f900a19fa9 libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.
Fixes so that fate under 64 bit Windows passes.

These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-12-15 20:04:59 -03:00
Andreas Rheinhardt 3be6fe9a56 swscale/yuv2rgb: Silence a set-but-unused-variable warning
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-12-03 16:10:51 +01:00
rcombs f0204de47d swscale: add P210/P410/P216/P416 input 2021-11-28 16:40:43 -06:00
Mark Reid 3f4ce004b8 swscale/input: clip rgbf32 values before lrintf
If the float pixel * 65535.0f > 2147483647.0f,
lrintf may overflow and return negative values, depending on the implementation.
The results for nan and +/-inf values may also be implementation-defined.

clip the value first so lrintf always works.

values <     0.0f, -inf, nan = 0.0f
values > 65535.0f, +inf      = 65535.0f

old timings
 195960 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 186120 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 188645 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 183625 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 181157 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 177533 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 175689 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 232960 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 221380 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 216640 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 213505 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 211558 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 210596 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 210202 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 161680 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 153540 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 148255 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 140600 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 132935 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128531 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 140933 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 190980 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 176080 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 167980 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 164685 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 162751 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 162404 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 167849 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

new timings
 183320 decicycles in planar_rgbf32le_to_uv,       1 runs,      0 skips
 175700 decicycles in planar_rgbf32le_to_uv,       2 runs,      0 skips
 179570 decicycles in planar_rgbf32le_to_uv,       4 runs,      0 skips
 172932 decicycles in planar_rgbf32le_to_uv,       8 runs,      0 skips
 168707 decicycles in planar_rgbf32le_to_uv,      16 runs,      0 skips
 165224 decicycles in planar_rgbf32le_to_uv,      32 runs,      0 skips
 163423 decicycles in planar_rgbf32le_to_uv,      64 runs,      0 skips

 184940 decicycles in planar_rgbf32be_to_uv,       1 runs,      0 skips
 185150 decicycles in planar_rgbf32be_to_uv,       2 runs,      0 skips
 185790 decicycles in planar_rgbf32be_to_uv,       4 runs,      0 skips
 185472 decicycles in planar_rgbf32be_to_uv,       8 runs,      0 skips
 185277 decicycles in planar_rgbf32be_to_uv,      16 runs,      0 skips
 185813 decicycles in planar_rgbf32be_to_uv,      32 runs,      0 skips
 185332 decicycles in planar_rgbf32be_to_uv,      64 runs,      0 skips

 145400 decicycles in planar_rgbf32le_to_y,       1 runs,      0 skips
 145100 decicycles in planar_rgbf32le_to_y,       2 runs,      0 skips
 143490 decicycles in planar_rgbf32le_to_y,       4 runs,      0 skips
 136687 decicycles in planar_rgbf32le_to_y,       8 runs,      0 skips
 131271 decicycles in planar_rgbf32le_to_y,      16 runs,      0 skips
 128698 decicycles in planar_rgbf32le_to_y,      32 runs,      0 skips
 127170 decicycles in planar_rgbf32le_to_y,      64 runs,      0 skips

 156020 decicycles in planar_rgbf32be_to_y,       1 runs,      0 skips
 146990 decicycles in planar_rgbf32be_to_y,       2 runs,      0 skips
 142020 decicycles in planar_rgbf32be_to_y,       4 runs,      0 skips
 141052 decicycles in planar_rgbf32be_to_y,       8 runs,      0 skips
 138973 decicycles in planar_rgbf32be_to_y,      16 runs,      0 skips
 138027 decicycles in planar_rgbf32be_to_y,      32 runs,      0 skips
 143939 decicycles in planar_rgbf32be_to_y,      64 runs,      0 skips

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-11-15 16:50:10 -03:00
Mark Reid 74e49cc583 swscale/input: unify grayf32 funcs with rgbf32 funcs
This is meant to be a cosmetic change

old timings:
  42780 UNITS in grayf32le,       1 runs,      0 skips
  56720 UNITS in grayf32le,       2 runs,      0 skips
  67265 UNITS in grayf32le,       4 runs,      0 skips
  58082 UNITS in grayf32le,       8 runs,      0 skips
  63512 UNITS in grayf32le,      16 runs,      0 skips
  52720 UNITS in grayf32le,      32 runs,      0 skips
  46491 UNITS in grayf32le,      64 runs,      0 skips

  68500 UNITS in grayf32be,       1 runs,      0 skips
  66930 UNITS in grayf32be,       2 runs,      0 skips
  62305 UNITS in grayf32be,       4 runs,      0 skips
  55510 UNITS in grayf32be,       8 runs,      0 skips
  50216 UNITS in grayf32be,      16 runs,      0 skips
  44480 UNITS in grayf32be,      32 runs,      0 skips
  42394 UNITS in grayf32be,      64 runs,      0 skips

new timings:
  46660 UNITS in grayf32le,       1 runs,      0 skips
  51830 UNITS in grayf32le,       2 runs,      0 skips
  53390 UNITS in grayf32le,       4 runs,      0 skips
  50910 UNITS in grayf32le,       8 runs,      0 skips
  44968 UNITS in grayf32le,      16 runs,      0 skips
  40349 UNITS in grayf32le,      32 runs,      0 skips
  38330 UNITS in grayf32le,      64 runs,      0 skips

  39980 UNITS in grayf32be,       1 runs,      0 skips
  49630 UNITS in grayf32be,       2 runs,      0 skips
  53540 UNITS in grayf32be,       4 runs,      0 skips
  59767 UNITS in grayf32be,       8 runs,      0 skips
  51206 UNITS in grayf32be,      16 runs,      0 skips
  44743 UNITS in grayf32be,      32 runs,      0 skips
  41468 UNITS in grayf32be,      64 runs,      0 skips

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-14 17:12:13 +01:00
Soft Works 58dce6f010 swscale/swscale: check SWS_PRINT_INFO flag for printing alignment warnings
This makes the output consistent with a similar warning just a few
lines above, where this flag is checked in the same way.

Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
2021-11-13 19:55:32 +01:00
Mark Reid d2379bd6a0 swscale/input: fix planar_rgb16_to_a for gbrap10be and gbrap12be formats
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-11-04 11:52:33 +01:00
Michael Niedermayer 8316b2a15f swscale/swscale: Improve *ColorspaceDetails() doxy
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer 5f3a160b42 swscale/utils: Improve return codes of sws_setColorspaceDetails()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Michael Niedermayer c7699f95bb swscale/utils: Set all threads to the same colorspace even on failure
Fixes: ./ffplay dav.y4m -vf "scale=hd1080:threads=4"
Found-by: Paul
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-24 16:54:36 +02:00
Wu Jianhua 2c734a8496 libswscale/x86/rgb2rgb: add shuffle_bytes avx2
Performance data(Less is better):
    shuffle_bytes_ssse3   3.64654
    shuffle_bytes_avx2    0.94288

Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
2021-10-15 10:59:20 +02:00
Michael Niedermayer f801207568 swscale/swscale: Pass slice location into unscaled code also for dst scaling
Fixes: alphablend=checkerboard

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer 06d6726588 swscale/alphablend: Fix slice handling
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-03 20:38:29 +02:00
Michael Niedermayer 9f40b5badb swscale/swscale_internal: Avoid unsigned for slice parameters
Mixing unsigned and signed often leads to unexpected arithmetic results.
Fixes: out of array write
Found-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-30 19:47:15 +02:00
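
As a generic illustration of the remark in 9f40b5badb that mixing unsigned and signed leads to unexpected arithmetic results, consider the sketch below. The function names and the bounds check are hypothetical and are not the code that was changed in swscale; they only show how an unsigned slice height can silently defeat a range check.

```c
#include <assert.h>

/* Hypothetical check with an unsigned slice height. The subtraction is
 * carried out in unsigned arithmetic, so the result can never be negative
 * and the comparison is always true. */
int slice_fits_mixed(int dst_h, int slice_y, unsigned slice_h)
{
    return dst_h - slice_y - slice_h >= 0;
}

/* The same check with signed parameters behaves as intended. */
int slice_fits_signed(int dst_h, int slice_y, int slice_h)
{
    assert(slice_h >= 0);
    return dst_h - slice_y - slice_h >= 0;
}
```

With dst_h = 10, slice_y = 8 and slice_h = 4 the slice ends two rows past the image: the signed version rejects it, while the mixed version accepts it, which is the kind of mistake that can end in an out-of-array write.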
Manuel Stoeckl 32329397e2 swscale: add input/output support for X2BGR10LE
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
Manuel Stoeckl ca594df622 swscale/yuv2rgb: fix conversion to X2RGB10
This resolves a problem where conversions from YUV to X2RGB10LE
would produce color values a factor 4 too small, because an 8-bit
value was placed in a 10-bit channel.

Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-09-26 16:26:10 +02:00
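
The factor-4 error described in ca594df622 can be reproduced with a small packing sketch. It assumes the X2RGB10 layout of 2 unused bits followed by 10 bits each of R, G and B (most significant first); the function names are hypothetical, and bit replication is used here as one common way to widen 8-bit values, which is not necessarily how the actual fix works.

```c
#include <assert.h>
#include <stdint.h>

/* Widen an 8-bit value to 10 bits by bit replication: 0 -> 0, 255 -> 1023. */
uint32_t expand8to10(uint8_t v)
{
    uint32_t v10 = ((uint32_t)v << 2) | ((uint32_t)v >> 6);
    assert(v10 <= 1023);
    return v10;
}

/* Pack correctly widened components into one X2RGB10 word. */
uint32_t pack_x2rgb10(uint8_t r, uint8_t g, uint8_t b)
{
    return expand8to10(r) << 20 | expand8to10(g) << 10 | expand8to10(b);
}

/* The mistake: 8-bit values placed unchanged into the 10-bit channels,
 * so full-scale 255 is stored where 1023 is expected. */
uint32_t pack_x2rgb10_unscaled(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint32_t)r << 20 | (uint32_t)g << 10 | (uint32_t)b;
}
```

For white input (255, 255, 255) the first function yields 0x3FFFFFFF, while the unscaled variant leaves the top two bits of every channel empty, giving values roughly four times too small.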
Andreas Rheinhardt 1ea3650823 Replace all occurences of av_mallocz_array() by av_calloc()
They do the same.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Andreas Rheinhardt 044a7c08dc swscale/swscale: Disable x86-specific code for other arches
SSE2 is x86 specific, yet due to the call to av_get_cpu_flags()
compilers were unable to optimize the checks (and the call) away
on other arches.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
Andreas Rheinhardt f440c422b7 swscale/swscale: Fix races when using unaligned strides/data
In this case the current code tries to warn once; to do so, it uses
ordinary static ints to store whether the warning has already been
emitted. This is both a data race (and therefore undefined behaviour)
as well as a race condition, because it is really possible for multiple
threads to be the one thread to emit the warning. This is actually
common since the introduction of the new multithreaded scaling API.

This commit fixes this by using atomic integers for the state;
furthermore, these are not static anymore, but rather contained
in the user-facing SwsContext (i.e. the parent SwsContext in case
of slice-threading).

Given that these atomic variables are not intended for synchronization
at all (but only for atomicity, i.e. only to output the warning once),
the atomic operations use memory_order_relaxed.

This affected the nv12, nv21, yuv420, yuv420p10, yuv422, yuv422p10 and
yuv444 filter-overlay FATE-tests.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
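
The warn-once scheme described in f440c422b7 can be sketched with C11 atomics as below. The struct and function names are hypothetical stand-ins for the user-facing context and the warning path; what is taken from the commit message is the idea of a per-context atomic flag that is accessed with memory_order_relaxed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the user-facing context. */
typedef struct ScalerCtx {
    atomic_int unaligned_warned;   /* 0 until the warning has been printed */
} ScalerCtx;

/* Returns 1 if this call printed the warning, 0 if an earlier call did. */
int warn_unaligned_once(ScalerCtx *ctx)
{
    assert(ctx);
    /* atomic_exchange returns the previous value, so exactly one caller
     * observes 0, even when several threads race here. Relaxed ordering
     * suffices because the flag does not guard any other data. */
    if (atomic_exchange_explicit(&ctx->unaligned_warned, 1,
                                 memory_order_relaxed))
        return 0;

    fprintf(stderr, "warning: unaligned data, expect a speed loss\n");
    return 1;
}
```

Keeping the flag in the context rather than in a static variable also means that each user-facing context warns independently.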
Andreas Rheinhardt a1255a350d libswscale/options: Add parent_log_context_offset to AVClass
This allows associating log messages from slice contexts with
the user-visible SwsContext.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-19 23:52:37 +02:00
James Almer 5fe648d04a libswscale/swscale: initialize all dst plane pointers in sws_receive_slice()
Fixes valgrind warnings about use of uninitialised values.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-09-07 09:44:58 -03:00
Anton Khirnov d6fdc78e91 sws: implement slice threading 2021-09-06 09:17:53 +02:00
Anton Khirnov 42cd64c182 sws: add a new scaling API 2021-09-06 09:16:52 +02:00
Andreas Rheinhardt 2c05ee092b avutil/internal, swresample/audioconvert: Remove cpu.h inclusions
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-07-22 14:33:45 +02:00
Michael Niedermayer 7874d40f10 swscale/slice: Fix wrong return on error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 15:21:37 +02:00
Michael Niedermayer fa1e158ef6 swscale/utils: Use full chroma interpolation for rgb4/8 and dither none
Dither none is only implemented in full chroma interpolation for these rgb formats.
It is also such an obscure choice (producing less nice images) that implementing it in the
other code paths makes no sense.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Michael Niedermayer 7528532550 swscale/output: Implement dither none for yuv2rgb_write_full()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Michael Niedermayer 997f9cfc12 swscale/slice: Check slice for allocation failure
Fixes: null pointer dereference
Fixes: alloc_slice.mp4

Found-by: Rafael Dutra <rafael.dutra@cispa.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-09 12:29:03 +02:00
Anton Khirnov 37c0fe49b7 sws: move updating the palette higher up
It does not interact in any way with the code setting up the image
pointers/strides, so it should not be intermixed with it.
2021-07-03 16:13:40 +02:00
Anton Khirnov d6649d9a3b sws: move initializing dither_error higher up
It does not interact in any way with the code setting up the image
pointers/strides, so it should not be intermixed with it.
2021-07-03 16:13:10 +02:00
Anton Khirnov e188985598 sws: move the early return for zero-sized slices higher up
Place it right after the input parameter validation. There is no point
in performing any setup if the sws_scale() call won't do anything.
2021-07-03 16:09:43 +02:00
Anton Khirnov a91e6c927e sws: simplify setting sliceDir 2021-07-03 16:09:21 +02:00
Anton Khirnov ff753f41dd sws: merge handling frame start into a single block
Also, return an error code on failure rather than 0.
2021-07-03 16:09:07 +02:00
Anton Khirnov 1b11a324fe sws: make checking for the start of a new frame more explicit 2021-07-03 16:07:22 +02:00
Anton Khirnov 0fb014b7bb sws: reset sliceDir at the end of sws_scale()
Makes it more clear that resetting it does not interact with the scaling
code that it is currently intermixed with.
2021-07-03 16:05:39 +02:00
Anton Khirnov 1f80789bf7 sws: rename SwsContext.swscale to convert_unscaled
That function pointer is now used only for unscaled conversion.
2021-07-03 15:57:53 +02:00
Anton Khirnov fe490ec165 sws: separate the calls to scaled vs unscaled conversion
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.

This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.
2021-07-03 15:57:13 +02:00
Anton Khirnov 0f8e0957d2 sws: do not reallocate scratch buffers for each slice 2021-07-03 15:56:16 +02:00
Anton Khirnov 2730639259 sws: group the parameters validity checks together
Also, fail with an error code rather than 0.
2021-07-03 15:31:18 +02:00
Anton Khirnov c05cab34a9 sws: initialize {src,dst}Stride2 consistently with {src,dst}2 2021-07-03 15:31:08 +02:00
Anton Khirnov d3d8e09640 sws: cosmetics
Reindent after previous commit, rewrap long lines.
2021-07-03 15:30:56 +02:00
Anton Khirnov f136493d03 sws: factor out cascaded scaling 2021-07-03 15:30:34 +02:00
Anton Khirnov a2254aedc9 sws: cosmetics
Reindent after previous commit, split long lines.
2021-07-03 15:30:20 +02:00
Anton Khirnov 44f12718bf sws: factor out gamma-correct scaling 2021-07-03 15:29:50 +02:00
Anton Khirnov e355af9be9 sws: return an error code on invalid parameters to sws_scale() 2021-07-03 15:29:35 +02:00
Anton Khirnov 21a4e48f88 sws: reindent after previous commit 2021-07-03 15:29:22 +02:00
Anton Khirnov 27acca1af0 sws: factor out updating the palette 2021-07-03 15:28:46 +02:00
Anton Khirnov f8c21ccbfc sws: remove unnecessary braces
There used to be more code inside them, but it was removed in
6de58b4903.
2021-07-03 15:28:36 +02:00
Peter Lundblad da0abbbb01 libswscale: Make sws_init_context thread safe.
Call ff_sws_rgb2rgb_init via ff_thread_once instead of checking one of the
variables it updates.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-07-01 23:49:41 +02:00
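
The once-only initialization described in da0abbbb01 follows a familiar pattern, sketched here with POSIX pthread_once standing in for ff_thread_once. The table, the counter and the function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t tables_once = PTHREAD_ONCE_INIT;
static unsigned char  invert_table[256];  /* hypothetical shared table */
static int            tables_init_runs;   /* counts initializer runs */

static void init_tables(void)
{
    for (int i = 0; i < 256; i++)
        invert_table[i] = (unsigned char)(255 - i);
    tables_init_runs++;
}

/* Hypothetical context setup. A racy version would test one of the values
 * written by init_tables() to decide whether to call it; pthread_once()
 * guarantees a single run even under concurrent calls. */
int context_init(void)
{
    int ret = pthread_once(&tables_once, init_tables);
    assert(ret == 0);
    return ret;
}

int tables_init_count(void)     { return tables_init_runs; }
unsigned char invert(unsigned char v) { return invert_table[v]; }
```

However many contexts are initialized, and from however many threads, init_tables() runs exactly once.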
Limin Wang 43295ae6a9 swscale/swscale_unscaled: don't use the optimized bgr24toYV12 unscaled conversion when width%2
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2021-06-06 12:34:05 +08:00
Anton Khirnov 85ba17f36d Bump major versions of all libraries.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-27 11:48:05 -03:00
Andreas Rheinhardt ea2d9b7a2e libswscale: Remove unused deprecated functions, make used ones static
Deprecated in 3b905b9fe6.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2021-04-27 10:43:11 -03:00
Andreas Rheinhardt f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Alan Kelly 3ce8d09244 libswscale/x86/yuv2yuvX: Removes unrolling for mmx and mmxext
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
Alan Kelly dc57762cb4 libswscale/x86/swscale: Only call ff_yuv2yuvX functions if the input size is > 0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-04-01 20:47:52 +02:00
Michael Niedermayer c361fa9e21 Bump minor versions after release branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-03-20 01:02:11 +01:00
Michael Niedermayer c67d2a2875 Bump Versions before release/4.4 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-03-20 01:01:12 +01:00
Andreas Rheinhardt c23a5523b5 swscale/x86/swscale: Remove unused ASM constants
The last user of g15Mask, r15Mask, g16Mask and r16Mask was disabled
in 77a416e8aa and finally removed in
36e8de07ed62609df45d064b56501e3084d25723; b15Mask and b16Mask were
apparently always unused (except for in_asm_used_var_warning_killer,
a function that only existed to make the compiler not optimize ASM
constants away).
w10 is unused since d604bab901, w02
since ef423a6618.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:47:54 +01:00
Andreas Rheinhardt aad597a93c swscale/x86/rgb2rgb: Remove unused ASM constants
mask24hh etc. are unused since f099fbf5f3,
mask32b and mask32r since 296609f859,
mask32g since b38d487466 and mask32 since
f8a138be52.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:45:17 +01:00
Andreas Rheinhardt 49db6e4b4e swscale/x86/yuv2rgb: Remove unused ASM constants
mmx_grnmask is unused since 531f97b0c3,
the other constants since e934194b6a.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2021-02-24 09:43:14 +01:00
Chip Kerchner e7f53d6ac9 lsws/ppc/yuv2rgb_altivec: Fix build in non-VSX environments
Add inline function for vec_xl if VSX is not supported. vec_xl intrinsic
is only available on POWER 7 or higher.

Fixes ticket #8750.

Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
2021-02-22 23:19:21 -05:00
James Almer 1a555d3c60 swscale/x86/yuv2yuvX: use the movsxdifnidn helper macro
Simplifies code

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer ebb48d85a0 swscale/x86/yuv2yuvX: use movq to load 8 bytes in all non-AVX2 functions
mova expands to movq on non-XMM functions

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer d512ebbaed swscale/x86/yuv2yuvX: use the SPLATW helper macro
Simplifies code

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:43 -03:00
James Almer c00567647e swscale/x86/swscale: fix mix of inline and external function definitions
This includes removing pointless static function forward declarations.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-18 18:47:42 -03:00
James Almer c2bf1dcace swscale/x86/swscale: fix compilation with old yasm
Where AVX2 may not be supported.

Signed-off-by: James Almer <jamrial@gmail.com>
2021-02-17 21:09:36 -03:00
Alan Kelly 554c2bc708 swscale: move yuv2yuvX_sse3 to yasm, unrolls main loop
And other small optimizations for ~20% speedup.
2021-02-17 21:21:03 +01:00
Carl Eugen Hoyos 2687070d9b lsws/ppc/yuv2rgb: Fix transparency converting from yuv->rgb32.
Based on 68363b69 by Reimar Döffinger.

Fixes ticket #9077.
2021-01-24 17:17:29 +01:00
Anton Khirnov e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Anton Khirnov c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Jeremy Leconte 29cef1bcd6 libswscale: avoid UB nullptr-with-offset.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-12-24 15:27:56 +01:00
Andriy Gelman 1200264fc4 swscale/rgb2rgb_template: use shuffle macro on big-endian arches
Fixes fate-qtrle-32bit on big-endian.

The macro does a simple byte swap on uint8 array without any casts, so
it's valid on big-endian arches.

The mentioned test was failing because the byteswap function
shuffle_bytes_3210_c() is used in the pixel format conversion
(argb->bgra).

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andriy Gelman <andriy.gelman@gmail.com>
2020-12-12 23:07:22 -05:00
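
The reason a byte-wise shuffle is valid on any host, as argued in 1200264fc4, is visible in a plain C sketch. The function below is a hypothetical illustration and not the macro from rgb2rgb_template: it reverses the byte order of every 4-byte pixel (for example argb to bgra) using only uint8_t accesses, so host endianness never enters the picture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bytes of each 4-byte pixel: output byte order is 3, 2, 1, 0.
 * src_size is in bytes and is assumed to be a multiple of 4. */
void shuffle_3210_sketch(const uint8_t *src, uint8_t *dst, size_t src_size)
{
    assert(src_size % 4 == 0);
    for (size_t i = 0; i < src_size; i += 4) {
        dst[i + 0] = src[i + 3];
        dst[i + 1] = src[i + 2];
        dst[i + 2] = src[i + 1];
        dst[i + 3] = src[i + 0];
    }
}
```

A version that loaded each pixel as a uint32_t and swapped it with shifts would need endian-specific handling; the per-byte form does not.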
Carl Eugen Hoyos 46e362b765 lsws/x86/yuv2rgb: Fix compilation with mmxext or ssse3 disabled.
Fixes ticket #8986.
2020-11-14 15:37:57 +01:00
Marton Balint 993429cfb4 swscale/x86/yuv2rgb: fix crashes when loading alpha from unaligned buffers
Regression since fc6a5883d6 on SSSE3 enabled
CPUs.

Fixes ticket #8955.

Signed-off-by: Marton Balint <cus@passwd.hu>
2020-11-02 00:31:34 +01:00
Jan Ekström 7ea4bcff7b swscale/utils: override forced-zero formats back to full range
Fixes vf_scale outputting RGB AVFrames with limited range flagged
in case either input or output specifically sets the range.

This is the reverse of the logic utilized for RGB and PAL8 content
in sws_setColorspaceDetails.
2020-10-11 12:58:13 +03:00
Jan Ekström 3fe24fe232 swscale/utils: split range override check into its own function 2020-10-11 12:58:13 +03:00
Mark Reid a48adcd136 libswcale/input: use more accurate planer rgb16 yuv conversions
These conversions appear to exhibit the same rounding error as the rgbf32 formats did.
I separated the rounding value from the 16 and 128 offsets; I think it makes it a little clearer.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-10-06 17:56:52 +02:00
Mark Reid 453004fde6 libswcale/input: use more accurate rgbf32 yuv conversions
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-10-02 14:59:52 +02:00
Mark Reid 6bf57c6a2a libswscale/tests: add floatimg_cmp test
changes since v1:
- made into fate test
- fixed c90 warnings
- tests more intermediate formats
- tested on BE mips too

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-10-02 14:59:52 +02:00
James Almer 621e2625e0 swscale/x86/output: add missing AVX2 support preprocessor wrappers
Fixes compilation with old yasm

Signed-off-by: James Almer <jamrial@gmail.com>
2020-08-20 15:14:56 -03:00
Paul B Mahol 9d58cdb4ba swscale: do not drop half of bits from 16bit bayer formats 2020-08-08 12:03:42 +02:00
Limin Wang 7c8ad72f1c swscale/yuv2rgb: cosmetics
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-07-25 10:20:42 +08:00
Fei Wang 8544783280 swscale/yuv2rgb: consider x2rgb10le on big endian hardware
This fixes the FATE failure reported by filter-pixfmts* for x2rgb10le on big
endian hardware.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-20 21:00:00 +02:00
Michael Niedermayer 663f024415 swscale/tests/swscale: use 1 for indicating erros
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-16 17:44:53 +02:00
Michael Niedermayer 24c575e0aa swscale/tests/swscale: Initialize res to a non random error code
Regression since: 3adffab073

-1 is consistent with what other error paths return

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-14 22:05:02 +02:00
Michael Niedermayer ec27c1827c swscale/tests/swscale: Fix incorrect return code check
Regression since: 3adffab073

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-07-14 22:05:02 +02:00
James Almer ba3e771a42 x86/yuv2rgb: fix crashes when storing data on unaligned buffers
Regression since fc6a5883d6 on SSSE3 enabled
CPUs.

Fixes ticket #8747

Signed-off-by: James Almer <jamrial@gmail.com>
2020-07-14 14:06:04 -03:00
Lynne 3adffab073 swscale/tests: check return value of sws_scale 2020-07-09 10:33:19 +01:00
Lynne 3e098cca6e aarch64/yuv2rgb_neon: fix return value
We return 0 for this particular architecture but should instead be
returning the number of lines.
Fixes users who check the return value matches what they expect.
2020-07-09 10:33:14 +01:00
Nelson Gomez 360be03b8a swscale: cosmetic fixes
Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>
2020-06-14 16:34:07 +01:00
Nelson Gomez bc01337db4 swscale/x86/output: add AVX2 version of yuv2nv12cX
256 bits is just wide enough to fit all the operands needed to vectorize
the software implementation, but AVX2 is needed for a couple of
instructions like cross-lane permutation.

Output is bit-for-bit identical to C.

Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>
2020-06-14 16:34:07 +01:00
Nelson Gomez 7c39c3c1a6 swscale: make yuv2interleavedX more asm-friendly
Extracting information from SwsContext in assembly is difficult, and
rearranging SwsContext just for asm access didn't look good. These
functions only need a couple of fields from it anyway, so just make
them parameters in their own right.

Signed-off-by: Nelson Gomez <nelson.gomez@microsoft.com>
2020-06-14 16:34:07 +01:00
Limin Wang 67a07dc778 swscale/utils: return better error code from initFilter()
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-06-14 21:54:40 +08:00
Limin Wang 8efecc9063 swscale/utils: reindent
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-06-14 21:54:40 +08:00
Limin Wang a408d03ee6 swscale/utils: remove FF_ALLOC_ARRAY_OR_GOTO macros
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
2020-06-13 06:59:19 +08:00
Fei Wang c721b45014 swscale: Add swscale input/output support for X2RGB10LE
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2020-06-12 17:56:15 +01:00
Michael Niedermayer c5079bf3bc Bump minor versions after branching 4.3
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-06-08 22:49:04 +02:00
Michael Niedermayer 0a8a96c251 Bump minor versions to separate 4.3 from master
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-06-08 22:49:04 +02:00
Martin Storsjö e0604d508e swscale: aarch64: Add a NEON implementation of interleaveBytes
This allows speeding up format conversions from yuv420 to nv12.

                             Cortex A53      A72      A73
interleave_bytes_c:             86077.5  51433.0  66972.0
interleave_bytes_neon:          19701.7  23019.2  15859.2
interleave_bytes_aligned_c:     86603.0  52017.2  67484.2
interleave_bytes_aligned_neon:   9061.0   7623.0   6309.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:38:17 +03:00
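As a rough illustration of what interleaveBytes does in the yuv420 to nv12 case mentioned in e0604d508e above, here is a minimal scalar sketch in C. The name interleave_row and its single-row signature are assumptions made for this example; the real libswscale function also takes a height and strides, and the NEON version processes many bytes per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-row helper, not the libswscale signature.
 * Writes dst = src1[0], src2[0], src1[1], src2[1], ...
 * dst must have room for 2 * width bytes. */
static void interleave_row(const uint8_t *src1, const uint8_t *src2,
                           uint8_t *dst, size_t width)
{
    for (size_t i = 0; i < width; i++) {
        dst[2 * i]     = src1[i];   /* e.g. a U sample */
        dst[2 * i + 1] = src2[i];   /* e.g. the matching V sample */
    }
}
```

Calling it once per chroma row with the U and V planes as src1 and src2 yields the interleaved UV plane that nv12 expects.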
Josh de Kock 70b14cc8d6 swscale: arm: fix NEON hscale init
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used. At the moment filterAlign is
set to 8 but in the future when extra NEON assembly for specific sizes is
added they will need to have checks here too.

The immediate usecase for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).

This applies the same fix from 718c8f9aa5
on the 32 bit arm version of the function, fixing fate-checkasm-sw_scale
there.

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:33:46 +03:00
Josh de Kock 718c8f9aa5 swscale: fix NEON hscale init
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used. At the moment filterAlign is
set to 8 but in the future when extra NEON assembly for specific sizes is
added they will need to have checks here too.

The immediate usecase for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).

Signed-off-by: Josh de Kock <josh@itanimul.li>
2020-05-15 10:29:30 +01:00
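The guard described in 718c8f9aa5 and 70b14cc8d6 above might look roughly like the sketch below. It assumes that "X8 filter sizes" means filter sizes that are a multiple of 8; the enum, the function name and the parameters are hypothetical and do not mirror the actual init code.

```c
/* Hypothetical names; the real init code assigns function pointers
 * on SwsContext rather than returning an enum. */
enum hscale_impl { HSCALE_C, HSCALE_NEON };

static enum hscale_impl select_hscale(int have_neon,
                                      int lum_filter_size,
                                      int chr_filter_size)
{
    /* Only pick the NEON path when both filter sizes are ones it
     * supports (assumed here: multiples of 8). */
    if (have_neon &&
        lum_filter_size % 8 == 0 &&
        chr_filter_size % 8 == 0)
        return HSCALE_NEON;

    return HSCALE_C;
}
```

With such a check in place, a test harness can request any filter size and still get a working function back, which is the checkasm use case the commit message mentions.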
Mark Reid fabeef22d9 libswscale: fix for floating point formats, require full chroma
upon more floating point testing, looks like I missed adding this bit.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-12 01:00:28 +02:00
Mark Reid b4967fc71c libswscale: add output support for AV_PIX_FMT_GBRAPF32
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-05 20:06:58 +02:00
Mark Reid ba5d0515a6 libswscale: add input support AV_PIX_FMT_GBRAPF32
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-05 20:06:58 +02:00
Andreas Rheinhardt 2fae000994 swscale/vscale: Increase type strictness
libswscale/vscale.c makes extensive use of function pointers and in
doing so it converts these function pointers to and from a pointer to
void. Yet this is actually against the C standard:
C90 only guarantees that one can convert a pointer to any incomplete
type or object type to void* and back with the result comparing equal
to the original which makes pointers to void generic pointers to
incomplete or object type. Yet C90 lacks a generic function pointer
type.
C99 additionally guarantees that a pointer to a function of one type may
be converted to a pointer to a function of another type with the result
and the original comparing equal when converting back.
This makes any function pointer type a generic function pointer type.
Yet even this does not make pointers to void generic function pointers.

Both GCC and Clang emit warnings for this when in pedantic mode.

This commit fixes this by using a union that can hold one member of any
of the required function pointer types to store the function pointer.
This works even for C90.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
2020-04-27 23:34:31 +02:00
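The union approach described in 2fae000994 above can be sketched as follows. All type and function names here are hypothetical and the signatures are simplified; libswscale/vscale.c uses different ones. The point is only that each function pointer is stored in a member of its own type, so no conversion through a pointer to void is needed.

```c
#include <stdint.h>

/* Hypothetical function pointer types for two kinds of vertical scalers. */
typedef int (*plane1_fn)(const int16_t *src, uint8_t *dst, int w);
typedef int (*planeX_fn)(const int16_t **src, int n, uint8_t *dst, int w);

/* One member per required function pointer type, instead of a void *. */
typedef union {
    plane1_fn plane1;
    planeX_fn planeX;
} vscale_fn;

typedef struct {
    int       is_planeX;  /* tag: which union member is valid */
    vscale_fn fn;
} vscale_ctx;

static int shift_plane1(const int16_t *src, uint8_t *dst, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(src[i] >> 7);
    return w;
}

static int sum_planeX(const int16_t **src, int n, uint8_t *dst, int w)
{
    for (int i = 0; i < w; i++) {
        int acc = 0;
        for (int j = 0; j < n; j++)
            acc += src[j][i];
        dst[i] = (uint8_t)(acc >> 7);
    }
    return w;
}

/* Dispatch through the member that matches the stored tag. */
static int run_vscale(const vscale_ctx *c, const int16_t **src, int n,
                      uint8_t *dst, int w)
{
    return c->is_planeX ? c->fn.planeX(src, n, dst, w)
                        : c->fn.plane1(src[0], dst, w);
}
```

The sketch uses C99 loop declarations for brevity; the union itself has nothing C99-specific about it, which matches the commit's remark that the technique works even for C90.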
Martin Storsjö 9025d5c5ce swscale: aarch64: Don't clobber callee-saved registers v8-v15
Signed-off-by: Martin Storsjö <martin@martin.st>
2020-04-21 23:41:13 +03:00
Martin Storsjö 872790b1f9 swscale: aarch64: Avoid using the x18 register
The x18 is a reserved platform register on Darwin and Windows.

x8/w8 seems to be unused in this function though (and same about
x10 and x14), so there's really no reason to use x18 here - just change
the uses of x18/w18 into x8/w8 instead without any further rewrites.

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-04-20 00:09:34 +03:00
Michael Niedermayer be3c29e379 swscale/yuv2rgb: Fix vertical dither offset with slices
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-12 16:36:47 +02:00
Michael Niedermayer e057e83a4f swscale/output: Fix integer overflow in yuv2rgb_write_full() with out of range input
Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int'
Fixes: ticket8293

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-04 22:09:46 +02:00
Michael Niedermayer 49ba1879ad swscale/output: Fix integer overflow in alpha computation in yuv2gbrp16_full_X_c()
Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int'
Fixes: ticket8322

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-04 22:09:46 +02:00
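Both overflow reports above (e057e83a4f and 49ba1879ad) involve results that exceed INT_MAX (2147483647) when int is 32 bits wide: 1169365504 + 981452800 = 2150818304 and 524280 * 4432 = 2323608960. One generic way to avoid this kind of undefined behaviour is to do the arithmetic in a wider type and clip afterwards. The sketch below illustrates that idea only; it is not necessarily the fix these commits applied, and the helper names are made up for the example.

```c
#include <limits.h>
#include <stdint.h>

/* Clip a 64-bit intermediate back into the range of int. */
static int clip_to_int(int64_t v)
{
    if (v > INT_MAX)
        return INT_MAX;
    if (v < INT_MIN)
        return INT_MIN;
    return (int)v;
}

/* Add in 64 bits so the intermediate cannot overflow (assumes 32-bit int). */
static int add_clipped(int a, int b)
{
    return clip_to_int((int64_t)a + b);
}

/* Multiply in 64 bits for the same reason. */
static int mul_clipped(int a, int b)
{
    return clip_to_int((int64_t)a * b);
}
```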
Ruiling Song 4700f7d6fc swscale/swscale: remove useless code
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-04-03 00:58:07 +02:00
Carl Eugen Hoyos 5f8c383452 lsws/input: Do not change transparency range.
Fixes ticket #8509.
2020-03-11 22:55:49 +01:00
Ting Fu 828f7db5d9 libswscale/x86/yuv2rgb: Fix segmentation fault when loading unaligned data
Fixes ticket #8532

Signed-off-by: Ting Fu <ting.fu@intel.com>
2020-02-26 11:10:46 +01:00
Linjie Fu d2aa1fbfd4 swscale: Add swscale input support for Y210LE
Add swscale input support for Y210LE. Output support and a FATE
test could be added later if software CSC to this packed format
is required.

Signed-off-by: Linjie Fu <linjie.fu@intel.com>
2020-02-24 00:09:51 +00:00
Ting Fu fc6a5883d6 libswscale/x86/yuv2rgb: add ssse3 version
Tested using this command:
/ffmpeg -pix_fmt yuv420p -s 1920*1080 -i ArashRawYuv420.yuv \
-vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null

The fps increases from 389 to 640 on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz

Signed-off-by: Ting Fu <ting.fu@intel.com>
2020-02-10 15:08:33 +01:00
Gautam Ramakrishnan da399e2135 libswscale/utils.c: Fix bug #8255
Bug #8255 points out a double free error in the libswscale/utils.c file.
The double free is because the pointer to cascaded_context of an
sw_context is not set to NULL after freeing it. When the sw_context
is later freed, sws_freeContext is called on the cascaded_context,
causing a double free.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-02-09 23:33:18 +01:00
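The double free pattern described in da399e2135 above, and the usual remedy of clearing the pointer after freeing, can be sketched like this. Ctx, ctx_alloc, ctx_free and ctx_drop_cascaded are hypothetical stand-ins, not the real SwsContext or sws_freeContext; the live_contexts counter exists only to make the behaviour observable.

```c
#include <stdlib.h>

/* Hypothetical context that owns a nested (cascaded) context. */
typedef struct Ctx {
    struct Ctx *cascaded;
} Ctx;

static int live_contexts = 0;

static Ctx *ctx_alloc(void)
{
    Ctx *c = calloc(1, sizeof(*c));
    if (c)
        live_contexts++;
    return c;
}

/* Frees a context together with whatever it still owns. */
static void ctx_free(Ctx *c)
{
    if (!c)
        return;
    ctx_free(c->cascaded);
    c->cascaded = NULL;
    free(c);
    live_contexts--;
}

/* Drops only the nested context, e.g. on an error path. */
static void ctx_drop_cascaded(Ctx *c)
{
    ctx_free(c->cascaded);
    /* Without this assignment, a later ctx_free(c) would free the
     * nested context a second time. */
    c->cascaded = NULL;
}
```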
Ting Fu e934194b6a libswscale/x86/yuv2rgb: Change inline assembly into nasm code
The original inline assembly and the NASM code reach the same fps when run from the command line.
The NASM code has almost no impact on performance.

Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-02-05 17:41:59 +01:00
Michael Niedermayer d48e510124 swscale/input: Fix several invalid shifts related to rgb2yuv constants
Fixes: Invalid shifts
Fixes: #8140
Fixes: #8146

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-22 21:50:49 +01:00
Michael Niedermayer 7b7f97532b swscale/output: Fix several invalid shifts in yuv2rgb_full_1_c_template()
Fixes: Invalid shifts
Fixes: #8320

Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-22 18:41:46 +01:00
Michael Niedermayer a6ca22c118 swscale/swscale: Fix several invalid shifts related to vChrDrop
Fixes: Invalid shifts
Fixes: #8166
Fixes: filter-crop_scale_vflip FATE-test

Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-22 18:41:46 +01:00
Carl Eugen Hoyos 96fab29e96 Silence "string-plus-int" warning shown by clang.
libswscale/utils.c:89:42: warning: adding 'unsigned long' to a string does not append to the string [-Wstring-plus-int]
2020-01-06 22:38:56 +01:00
Sebastian Pop c3a17ffff6 swscale/aarch64: use multiply accumulate and shift-right narrow
This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
horizontal adds by using fused multiply adds. The patch also uses ld1r to load
one element and replicate it across all lanes of the vector. The patch also
improves the clipping code by removing the shift right instructions and
performing the shift with the shift-right narrow instructions.

I see 8% difference on an m6g instance with neoverse-n1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-01-04 20:59:31 +01:00
Zhao Zhili 1e3e547a5b swscale/utils: remove access of AV_PIX_FMT_NB
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-31 12:37:47 +01:00
Sebastian Pop bd83191271 swscale/aarch64: use multiply accumulate and increase vector factor to 4
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is 25% on Graviton1 A1 instances based on A-72 cpus:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 cpus:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-17 23:41:47 +01:00
Limin Wang 8558c231fb swscale/swscale_unscaled: add AV_PIX_FMT_GBRAP10 for LE and BE conversion wrapper
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-10 16:09:14 +01:00
Ting Fu 039a0ebe6f libswscale/swscale_unscaled.c: remove redundant code
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-06 11:25:29 +01:00
Limin Wang a5e24be52a swscale/swscale_unscaled: fix gbrap10be md5 different on big endian system
You can reproduce it with the command below:
./ffmpeg -f lavfi -i "testsrc=duration=1:rate=30" -vf format=gbrap10 -vcodec rawvideo \
    -pix_fmt gbrap10le -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact  \
    -frames:v 1 -f nut md5:

little-endian:
f91e2edd8098276579c1929e5e160416
big-endian:
ba4d011dbbdc78ccbf6cc7d698630929

Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-11-01 14:43:16 +01:00
Michael Niedermayer d260621089 swscale/output: Avoid 64bit in Alpha in yuv2ya16_X_c_template()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-16 19:17:57 +02:00
Michael Niedermayer 3e6682931b swscale/output: Correct Alpha in yuv2ya16_X_c_template()
Untested, no testcase

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-16 19:17:57 +02:00
Michael Niedermayer 4f4ca675e5 swscale/output: Implement Luma computation from yuv2ya16_X_c_template() without 64bit
This also reverts 21838cad2f
The revert is in this commit to avoid 2 fate updates

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-16 19:17:57 +02:00
Daniel Kolesa e6625ca41f swscale: Fix AltiVec/VSX build with recent GCC
The argument to vec_splat_u16 must be a literal. By making the
function always inline and marking the arguments const, gcc can
turn those into literals, and avoid build errors like:

swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal

Fixes #7861.

Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-10-04 08:58:17 +03:00
Daniel Kolesa 1bdb47b734 swscale: Replace illegal vector keyword usage in altivec code
While this technically compiles in current ffmpeg, this is only
because ffmpeg is compiled in strict ISO C mode, which disables
the builtin 'vector' keyword for AltiVec/VSX. Instead this gets
replaced with a macro inside altivec.h, which defines vector to
be actually __vector, which accepts random types.

Normally, the vector keyword should be used only with plain
scalar non-typedef types, such as unsigned int. But we have the
vec_(s|u)(8|16|32) macros, which can be used in a portable manner,
in util_altivec.h in libavutil.

This is also consistent with other AltiVec/VSX code elsewhere in
the tree.

Fixes #7861.

Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-10-04 08:58:17 +03:00
Andreas Rheinhardt e2646e23be swscale/utils: Fix invalid left shifts of negative numbers
Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411,
vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-28 17:24:32 +02:00
Andreas Rheinhardt 736c7c20e7 swscale/x86/swscale: Fix undefined left shifts of negative numbers
This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-28 17:24:32 +02:00
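For context on e2646e23be and 736c7c20e7 above: in C99 and C11, left-shifting a negative signed value is undefined behaviour. A common way to express the same scaling without the shift is to multiply by a power of two, as in the sketch below. The helper name is hypothetical, and the sketch assumes n is at most 30 and that the product fits in int; it illustrates the general technique rather than the exact changes in these commits.

```c
/* Scale x by 2^n without shifting a possibly negative value.
 * Assumes n <= 30 and that the result is representable in int. */
static int mul_pow2(int x, unsigned n)
{
    return x * (1 << n);
}
```

Here only the non-negative constant 1 is shifted, which is well defined, and compilers typically turn the multiplication back into a shift instruction.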
Limin Wang cde1d70a39 swscale/swscale: cosmetics
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-27 10:58:30 +02:00
Paul B Mahol 21838cad2f swscale/output: fix signed integer overflow for ya16
Fixes #7666.
2019-09-26 15:56:47 +02:00
Limin Wang 29bde4b3b6 swscale/swscale: delete unwanted assignments
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-09 18:16:06 +02:00
Linjie Fu ef1342650f swscale/output: fix some code indentations
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-09-06 22:06:12 +02:00
Chip Kerchner 3a557c5d88 lsws/ppc/yuv2rgb_altivec: Replace vec_lvsl/vec_perm with vec_xl
gcc 6.x and 7.x generate wrong code for little endian machines
for the vec_lvsl/vec_perm instruction combos in some cases.
The bug was fixed in version 8.x
If these instructions are replaced with vec_xl, the problem goes
away for all versions of the compilers.

Fixes ticket #7124.
2019-08-13 02:21:24 +02:00
Michael Niedermayer 80bb65fafa Bump minor versions again on master to keep 4.2 versions separate from master
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-07-21 18:36:31 +02:00
Michael Niedermayer 22db337a40 Bump minor versions to separate 4.2 from master
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-07-21 18:36:18 +02:00
Michael Niedermayer 9d269301f0 swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytes
Some formats use longer names than 12.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-05-13 13:39:49 +02:00
Adam Richter b8ed493061 libswscale: Fix possible string overflow in test.
In libswscale/tests/swscale.c, the function fileTest() calls sscanf
with a "%12s" conversion on the character arrays srcStr[] and dstStr[],
which are only 12 bytes.  So, if the input string is 12 characters, a
terminating null byte can be written past the end of these arrays.

This bug was found by cppcheck.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-05-13 13:39:40 +02:00
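The off-by-one described in b8ed493061 above comes from the fact that "%12s" stores up to 12 characters plus a terminating null byte, so the destination needs at least 13 bytes. A minimal sketch of a safe pairing of buffer size and field width follows; the macro, the function name and the simplified "src -> dst" line format are assumptions for this example, not the format used by the actual test.

```c
#include <stdio.h>
#include <string.h>

#define PIXFMT_NAME_LEN 12   /* must match the width in the format string */

/* Parses "<src> -> <dst>" into two buffers of PIXFMT_NAME_LEN + 1 bytes.
 * Returns the number of fields sscanf converted (2 on success). */
static int parse_pair(const char *line,
                      char src[PIXFMT_NAME_LEN + 1],
                      char dst[PIXFMT_NAME_LEN + 1])
{
    /* %12s writes at most 12 characters and then the null byte,
     * which is why each buffer is one byte larger than the width. */
    return sscanf(line, "%12s -> %12s", src, dst);
}
```

A 12-character name such as yuva444p16le then fits exactly, with the terminator landing in the last byte of the buffer.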
Philip Langdale 4fa4f1d7a9 swscale: Add test for isSemiPlanarYUV to pixdesc_query
Lauri had asked me what the semi planar formats were and that reminded
me that we could add it to pixdesc_query so we know exactly what the
list is.
2019-05-12 07:51:02 -07:00
Philip Langdale cd48318035 swscale: Add support for NV24 and NV42
The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.

Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.
2019-05-12 07:51:02 -07:00
Lauri Kasanen e25bddf5fc swscale/ppc: Shorten power8 tests via a var 2019-05-07 10:08:16 +03:00
Lauri Kasanen a2a16206aa swscale/ppc: VSX-optimize hScale16To*
./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw

32-bit mul, power8 only

2x speedup for hScale8To19_vsx (x86 SSE2 is 2.37):
  30896 UNITS in hscale,    8192 runs,      0 skips
  63956 UNITS in hscale,    8192 runs,      0 skips

2.06 for hScale16To15_vsx:
  30531 UNITS in hscale,    8192 runs,      0 skips
  63161 UNITS in hscale,    8192 runs,      0 skips
2019-05-07 10:08:16 +03:00
Lauri Kasanen 3437111f17 swscale/ppc: Indent 2019-05-07 10:08:16 +03:00
Lauri Kasanen 9456adc223 swscale/ppc: VSX-optimize hScale8To19
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

2.26 speedup (x86 SSE2 is 2.32):
  23772 UNITS in hscale,    4096 runs,      0 skips
  53862 UNITS in hscale,    4096 runs,      0 skips
2019-05-07 10:08:16 +03:00
Lauri Kasanen d0e4d0429e swscale/ppc: VSX-optimize hscale_fast
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw

4.27 speedup for hyscale_fast:
  24796 UNITS in hyscale_fast,    4096 runs,      0 skips
   5797 UNITS in hyscale_fast,    4096 runs,      0 skips

4.48 speedup for hcscale_fast:
  19911 UNITS in hcscale_fast,    4095 runs,      1 skips
   4437 UNITS in hcscale_fast,    4096 runs,      0 skips
2019-04-30 14:41:28 +03:00
Lauri Kasanen ce92ee4b4f swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

~2x speedup:

rgb24
  24431 UNITS in yuv2packed2,   16384 runs,      0 skips
  13783 UNITS in yuv2packed2,   16383 runs,      1 skips
bgr24
  24396 UNITS in yuv2packed2,   16384 runs,      0 skips
  14059 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  26815 UNITS in yuv2packed2,   16383 runs,      1 skips
  12797 UNITS in yuv2packed2,   16383 runs,      1 skips
bgra
  27060 UNITS in yuv2packed2,   16384 runs,      0 skips
  13138 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  26998 UNITS in yuv2packed2,   16384 runs,      0 skips
  12728 UNITS in yuv2packed2,   16381 runs,      3 skips
bgra
  26651 UNITS in yuv2packed2,   16384 runs,      0 skips
  13124 UNITS in yuv2packed2,   16384 runs,      0 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
2019-04-11 09:08:51 +03:00
Lauri Kasanen 8607e29fa3 swscale/ppc: VSX-optimize yuv2rgb_full_X
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

32-bit mul, power8 only.

~6.4x speedup:

rgb24
 214278 UNITS in yuv2packedX,   16384 runs,      0 skips
  33249 UNITS in yuv2packedX,   16384 runs,      0 skips
bgr24
 214616 UNITS in yuv2packedX,   16384 runs,      0 skips
  33233 UNITS in yuv2packedX,   16384 runs,      0 skips
rgba
 214517 UNITS in yuv2packedX,   16384 runs,      0 skips
  33271 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214973 UNITS in yuv2packedX,   16384 runs,      0 skips
  33397 UNITS in yuv2packedX,   16384 runs,      0 skips
argb
 214613 UNITS in yuv2packedX,   16384 runs,      0 skips
  33310 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214637 UNITS in yuv2packedX,   16384 runs,      0 skips
  33330 UNITS in yuv2packedX,   16384 runs,      0 skips
2019-04-07 09:20:34 +03:00
Lauri Kasanen 3256e949be swscale/ppc: VSX-optimize yuv2rgb_full_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
            -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

32-bit mul, power8 only.

~4x speedup:

rgb24
  52763 UNITS in yuv2packed2,   16384 runs,      0 skips
  13453 UNITS in yuv2packed2,   16384 runs,      0 skips
bgr24
  53144 UNITS in yuv2packed2,   16384 runs,      0 skips
  13616 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  52796 UNITS in yuv2packed2,   16384 runs,      0 skips
  12904 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52732 UNITS in yuv2packed2,   16384 runs,      0 skips
  13262 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  52661 UNITS in yuv2packed2,   16384 runs,      0 skips
  12879 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52662 UNITS in yuv2packed2,   16384 runs,      0 skips
  12932 UNITS in yuv2packed2,   16384 runs,      0 skips
2019-04-07 09:20:33 +03:00
Lauri Kasanen 50e672bc54 swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
2019-04-07 09:20:31 +03:00
Lauri Kasanen 7adce3e64c swscale/ppc: VSX-optimize yuv2422_X
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
          -cpuflags 0 -v error -

7.2x speedup:

yuyv422
 126354 UNITS in yuv2packedX,   16384 runs,      0 skips
  16383 UNITS in yuv2packedX,   16382 runs,      2 skips
yvyu422
 117669 UNITS in yuv2packedX,   16384 runs,      0 skips
  16271 UNITS in yuv2packedX,   16379 runs,      5 skips
uyvy422
 117310 UNITS in yuv2packedX,   16384 runs,      0 skips
  16226 UNITS in yuv2packedX,   16382 runs,      2 skips
2019-03-31 12:41:34 +03:00
Lauri Kasanen 9a2db4dc61 swscale/ppc: VSX-optimize yuv2422_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

5.1x speedup:

yuyv422
  19339 UNITS in yuv2packed2,   16384 runs,      0 skips
   3718 UNITS in yuv2packed2,   16383 runs,      1 skips
yvyu422
  19438 UNITS in yuv2packed2,   16384 runs,      0 skips
   3800 UNITS in yuv2packed2,   16380 runs,      4 skips
uyvy422
  19128 UNITS in yuv2packed2,   16384 runs,      0 skips
   3721 UNITS in yuv2packed2,   16380 runs,      4 skips
2019-03-31 12:41:33 +03:00
Lauri Kasanen a6a31ca3d9 swscale/ppc: VSX-optimize yuv2422_1
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
            -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

15.3x speedup:

yuyv422
  14513 UNITS in yuv2packed1,   32768 runs,      0 skips
    949 UNITS in yuv2packed1,   32767 runs,      1 skips
yvyu422
  14516 UNITS in yuv2packed1,   32767 runs,      1 skips
    943 UNITS in yuv2packed1,   32767 runs,      1 skips
uyvy422
  14530 UNITS in yuv2packed1,   32767 runs,      1 skips
    941 UNITS in yuv2packed1,   32766 runs,      2 skips
2019-03-31 12:41:32 +03:00
Michael Niedermayer 8865ae959b swscale/swscale_unscaled: Fix chroma slice height
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-03-28 22:47:32 +01:00
Dong, Jerry c47fada298 swscale/swscale_unscaled: fix incomplete conversion of nv12 to u/v planes when width/height is not a multiple of 2
Signed-off-by: Dong, Jerry <jerry.dong@intel.com>
Signed-off-by: Decai Lin <decai.lin@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-03-28 20:28:43 +01:00
Lauri Kasanen 681957b88d swscale/ppc: VSX-optimize yuv2rgb_full
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

This uses 32-bit mul, so POWER8 only.

The following output formats get about 4.5x speedup:

rgb24
  39980 UNITS in yuv2packed1,   32768 runs,      0 skips
   8774 UNITS in yuv2packed1,   32768 runs,      0 skips
bgr24
  40069 UNITS in yuv2packed1,   32768 runs,      0 skips
   8772 UNITS in yuv2packed1,   32766 runs,      2 skips
rgba
  39759 UNITS in yuv2packed1,   32768 runs,      0 skips
   8681 UNITS in yuv2packed1,   32767 runs,      1 skips
bgra
  39729 UNITS in yuv2packed1,   32768 runs,      0 skips
   8696 UNITS in yuv2packed1,   32766 runs,      2 skips
argb
  39766 UNITS in yuv2packed1,   32768 runs,      0 skips
   8672 UNITS in yuv2packed1,   32766 runs,      2 skips
bgra
  39784 UNITS in yuv2packed1,   32768 runs,      0 skips
   8659 UNITS in yuv2packed1,   32767 runs,      1 skips
2019-03-27 09:05:08 +02:00
Lauri Kasanen 81a4719d8e swscale: Remove duplicated code
In this function, the exact same clamping happens both in the if and unconditionally.
2019-03-27 09:00:06 +02:00
Lauri Kasanen 6b5ea90eac swscale/ppc: Add av_unused to template vars only used in one includer 2019-03-20 10:21:55 +02:00
Lauri Kasanen ac3062f1a4 swscale/ppc: Clean up some mixed decl warnings 2019-03-20 10:21:53 +02:00
Lauri Kasanen 8522d219ce libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

With TIMER_REPORT skips disabled:
yuv420p9le
  12412 UNITS in planarX,  131072 runs,      0 skips
  73136 UNITS in planarX,  131072 runs,      0 skips
yuv420p9be
  12481 UNITS in planarX,  131072 runs,      0 skips
  73410 UNITS in planarX,  131072 runs,      0 skips
yuv420p10le
  12322 UNITS in planarX,  131072 runs,      0 skips
  72546 UNITS in planarX,  131072 runs,      0 skips
yuv420p10be
  12291 UNITS in planarX,  131072 runs,      0 skips
  72935 UNITS in planarX,  131072 runs,      0 skips
yuv420p12le
  12316 UNITS in planarX,  131072 runs,      0 skips
  72708 UNITS in planarX,  131072 runs,      0 skips
yuv420p12be
  12319 UNITS in planarX,  131072 runs,      0 skips
  72577 UNITS in planarX,  131072 runs,      0 skips
yuv420p14le
  12259 UNITS in planarX,  131072 runs,      0 skips
  72516 UNITS in planarX,  131072 runs,      0 skips
yuv420p14be
  12440 UNITS in planarX,  131072 runs,      0 skips
  72962 UNITS in planarX,  131072 runs,      0 skips
yuv420p16le
  10548 UNITS in planarX,  131072 runs,      0 skips
  73429 UNITS in planarX,  131072 runs,      0 skips
yuv420p16be
  10634 UNITS in planarX,  131072 runs,      0 skips
 150959 UNITS in planarX,  131072 runs,      0 skips

Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-02-05 09:34:53 +02:00
Michael Niedermayer fe17f9b956 swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables()
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-01 21:11:47 +01:00
Lauri Kasanen 8dd9df9ecd swscale/output: Altivec-optimize float yuv2plane1
This function wouldn't benefit from VSX instructions, so I put it
under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -

3743 UNITS in planar1,   65495 runs,     41 skips

-cpuflags 0

23511 UNITS in planar1,   65530 runs,      6 skips

grayf32be

4647 UNITS in planar1,   65449 runs,     87 skips

-cpuflags 0

28608 UNITS in planar1,   65530 runs,      6 skips

The native speedup is 6.28133, and the bswapping one 6.15623.
Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-26 20:28:58 +01:00
Lauri Kasanen b4c8c03b00 swscale/output: VSX-optimize 16-bit yuv2plane1
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -

2120 UNITS in planar1,   65393 runs,    143 skips

-cpuflags 0

19157 UNITS in planar1,   65512 runs,     24 skips

9.03632 speedup, 16be similarly.

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-14 19:09:11 +01:00
Lauri Kasanen 1046cba24b swscale/output: VSX-optimize nbps yuv2plane1
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
-f null -vframes 100 -v error -nostats -

Speedups:
yuv2plane1_9BE_vsx	11.2042
yuv2plane1_9LE_vsx	11.156
yuv2plane1_10BE_vsx	9.89428
yuv2plane1_10LE_vsx	10.3637
yuv2plane1_12BE_vsx	9.71923
yuv2plane1_12LE_vsx	11.0404
yuv2plane1_14BE_vsx	10.1763
yuv2plane1_14LE_vsx	11.2728

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-12 01:56:57 +01:00
Lauri Kasanen 78c7ff7d25 swscale/ppc: Move VSX-using code to its own file
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-04 02:59:07 +01:00
Lauri Kasanen 46c5693ea3 swscale/output: Altivec-optimize yuv2plane1_8
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,      8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,      3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-26 02:56:25 +01:00
Martin Vignali 86e6f0dbc7 swscale : add support for YUVA444P12 and YUVA422P12 2018-11-24 16:24:47 +01:00
Michael Niedermayer 517573a670 Bump minor version for master after 4.1 branchpoint
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-02 00:53:07 +01:00
Michael Niedermayer 780d5e30a0 Bump minor versions for branching 4.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-02 00:15:32 +01:00
Martin Vignali 156120fcf8 swscale/swscale_unscaled : rename packed_16bpc_bswap
The function is used for both packed and planar formats.
2018-10-24 21:21:20 +02:00
Martin Vignali 26bf4a4050 swscale/unscaled : add grayf32 le to be 2018-10-24 21:21:14 +02:00
Martin Vignali 3db33b446f swscale/utils : simplify unscaled initial test for float pixfmt 2018-10-24 21:21:10 +02:00
Martin Vignali db4771af81 swscale : add YA16 LE/BE output 2018-10-18 21:43:24 +02:00
Martin Vignali 658bbc0060 swscale/x86/rgb2rgb.asm : add Ivo Van Poorten name to the top of the file
suggested by Carl Eugen Hoyos
2018-10-18 21:43:19 +02:00
Martin Vignali 296609f859 swscale/x86/rgb2rgb : port shuffle 2103 mmxext to external asm and remove inline asm version 2018-10-13 14:12:41 +02:00
Martin Vignali 04afdbb560 swscale/x86/rgb2rgb : remove mmx version for shuffle2103 2018-10-13 14:12:36 +02:00
Paul B Mahol 931e7c050e swscale/swscale_unscaled: add gbrap -> packed rgb path 2018-09-09 22:58:26 +02:00
Martin Vignali bdd6754648 swscale/swscale : small cosmetic 2018-08-22 11:36:15 +02:00
Martin Vignali 3af1c4ea7d swscale : treat float input data as uint 16bpc
Currently, float input is converted to 16-bit uint in the input stage, but
hScale16To19 and hScale16to15 still derive the shift from the source depth
(32 bits), which yields an invalid shift for the data.

So, for float input, shift the value as for 16 bpc uint.
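
The effect of the wrong shift can be illustrated with a simplified model. This is a sketch under stated assumptions, not the swscale code: the names `scale_shift` and `scale_to_15bit` are hypothetical, the filter is reduced to a single tap with an assumed fixed-point weight of 1 << 14, and the shift is assumed to be the depth minus one.

```python
FILTER_ONE = 1 << 14  # assumption: fixed-point weight of a single-tap filter


def scale_shift(depth: int, is_float: bool) -> int:
    """Shift for a 15-bit result (hypothetical helper).

    Float input has already been converted to 16-bit uint, so it is
    treated as depth 16 instead of its nominal 32 bits.
    """
    effective_depth = 16 if is_float else depth
    return effective_depth - 1


def scale_to_15bit(sample: int, depth: int, is_float: bool, fixed: bool = True) -> int:
    """Scale one sample to 15 bits; fixed=False derives the shift from depth alone."""
    sh = scale_shift(depth, is_float) if fixed else depth - 1
    return min((sample * FILTER_ONE) >> sh, (1 << 15) - 1)


# A full-scale sample from float input (nominal depth 32, stored as 16-bit uint).
print(scale_to_15bit(0xFFFF, 32, True, fixed=False))  # prints 0
print(scale_to_15bit(0xFFFF, 32, True))               # prints 32767
```

In this model, a shift of 31 discards the whole sample, while a shift of 15 maps full-scale input to full-scale 15-bit output, the same as for 16 bpc uint input.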
2018-08-22 11:36:09 +02:00
Sergey Lavrushkin 582bc5a348 libswscale: Adds conversions from/to float gray format.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-08-14 18:22:39 +02:00
Carl Eugen Hoyos 3a56ade1f3 lsws/rgb2rgb_template: Do not compile unneeded shuffle functions on big-endian.
Fixes the following warnings:
In file included from libswscale/rgb2rgb.c:128:0:
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3210_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3012_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_1230_c' defined but not used
2018-06-10 03:22:59 +02:00
Paul B Mahol b9dd058f7a swscale: add gray14 support
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2018-05-05 21:35:31 +02:00
Martin Vignali 07a566e7d6 swscale/swscale_unscaled : add X86_64 (SSE2 and AVX) for uyvyto422
and checkasm test
2018-04-22 19:15:32 +02:00
Michael Niedermayer 3c1ecb057d Bump minor versions after release/4.0 branching
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-04-16 12:35:12 +02:00