Flaky JSON error in encoder tests #1330

@Dan-Flores

Context:

We have encoder tests that rely on ffprobe to compare our encoded media's metadata with ffmpeg's.

Problem:

These tests are flaky and sometimes fail on Windows + FFmpeg8 with the error: json.decoder.JSONDecodeError: Expecting ',' delimiter.

The tests usually pass on retry, but in the meantime they add noisy failures to PRs. We should either find and fix the underlying cause, or switch to a more reliable output format, such as ffprobe's default format, which we use in _get_video_metadata.
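
As a rough sketch of the second option: ffprobe's default writer prints each frame as a `[FRAME]` ... `[/FRAME]` block of `key=value` lines, which can be parsed without `json`. The helper name `parse_default_frames` is hypothetical, and the ffprobe command in the comment is an assumption, since the command the tests actually run is not shown here. All values come back as strings, so `pts` would be `"0"` rather than `0` as in the JSON output.

```python
# Assumed invocation (not taken from the test suite):
#   ffprobe -v error -select_streams v:0 -show_entries frame=pts,pts_time -of default <path>


def parse_default_frames(text):
    """Parse [FRAME]...[/FRAME] blocks of key=value lines into a list of dicts."""
    frames = []
    current = None  # dict for the frame being read, or None outside a frame
    nested = 0      # depth of sub-sections (e.g. side data) inside a frame
    for raw in text.splitlines():  # splitlines() also handles Windows "\r\n"
        line = raw.strip()
        if not line:
            continue
        if line == "[FRAME]":
            if current is not None:
                raise ValueError("[FRAME] opened before the previous one closed")
            current, nested = {}, 0
        elif line == "[/FRAME]":
            if current is None:
                raise ValueError("[/FRAME] without a matching [FRAME]")
            frames.append(current)
            current = None
        elif line.startswith("[/"):
            nested -= 1
        elif line.startswith("["):
            nested += 1
        elif current is not None and nested == 0:
            # Only keep top-level frame fields; nested sections are skipped.
            key, sep, value = line.partition("=")
            if sep:
                current[key] = value
    if current is not None:
        # An open frame at end of input means the output was cut short.
        raise ValueError("unterminated [FRAME] section: output looks truncated")
    return frames


sample = (
    "[FRAME]\r\npts=0\r\npts_time=0.000000\r\n[/FRAME]\r\n"
    "[FRAME]\r\npts=33\r\npts_time=0.033000\r\n[/FRAME]\r\n"
)
frames = parse_default_frames(sample)
```

A parser like this still fails if the output is cut off in the middle of a frame, but it does so with an explicit message instead of a generic delimiter error.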

Example failures:

TestAudioEncoder::test_against_cli on Windows + FFmpeg8, for mp3 and flac:

Details
================================== FAILURES ===================================
__ TestAudioEncoder.test_against_cli[to_file-flac-32000-1-999999999-asset0] ___

self = <test.test_encoders.TestAudioEncoder object at 0x000001D7D3AB19A0>
asset = TestAudio(filename='nasa_13013.mp4.audio.mp3', default_stream_index=0, stream_infos={0: TestAudioStreamInfo(sample_rat...,  2.3536e-04,  2.7501e-04,  2.1080e-04,
         -2.1618e-05, -8.9567e-05, -4.4332e-04, -5.0099e-04, -2.7716e-04]])]})
bit_rate = 999999999, num_channels = 1, sample_rate = 32000, format = 'flac'
method = 'to_file'
tmp_path = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_against_cli_to_file_flac_38')
capfd = <_pytest.capture.CaptureFixture object at 0x000001D7D599EAD0>
with_ffmpeg_debug_logs = None

    @needs_ffmpeg_cli
    @pytest.mark.parametrize("asset", (NASA_AUDIO_MP3, SINE_MONO_S32))
    @pytest.mark.parametrize("bit_rate", (None, 0, 44_100, 999_999_999))
    @pytest.mark.parametrize("num_channels", (None, 1, 2))
    @pytest.mark.parametrize("sample_rate", (8_000, 32_000))
    @pytest.mark.parametrize(
        "format",
        [
            # TODO: https://github.com/pytorch/torchcodec/issues/837
            pytest.param(
                "mp3",
                marks=pytest.mark.skipif(
                    IS_WINDOWS and ffmpeg_major_version <= 5,
                    reason="Encoding mp3 on Windows is weirdly buggy",
                ),
            ),
            pytest.param(
                "wav",
                marks=pytest.mark.skipif(
                    ffmpeg_major_version == 4,
                    reason="Swresample with FFmpeg 4 doesn't work on wav files",
                ),
            ),
            "flac",
        ],
    )
    @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like"))
    def test_against_cli(
        self,
        asset,
        bit_rate,
        num_channels,
        sample_rate,
        format,
        method,
        tmp_path,
        capfd,
        with_ffmpeg_debug_logs,
    ):
        # Encodes samples with our encoder and with the FFmpeg CLI, and checks
        # that both decoded outputs are equal
    
        encoded_by_ffmpeg = tmp_path / f"ffmpeg_output.{format}"
        subprocess.run(
            ["ffmpeg", "-i", str(asset.path)]
            + (["-b:a", f"{bit_rate}"] if bit_rate is not None else [])
            + (["-ac", f"{num_channels}"] if num_channels is not None else [])
            + ["-ar", f"{sample_rate}"]
            + [
                str(encoded_by_ffmpeg),
            ],
            capture_output=True,
            check=True,
        )
    
        encoder = AudioEncoder(self.decode(asset).data, sample_rate=asset.sample_rate)
        params = dict(
            bit_rate=bit_rate, num_channels=num_channels, sample_rate=sample_rate
        )
        if method == "to_file":
            encoded_by_us = tmp_path / f"output.{format}"
            encoder.to_file(dest=str(encoded_by_us), **params)
        elif method == "to_tensor":
            encoded_by_us = encoder.to_tensor(format=format, **params)
        elif method == "to_file_like":
            file_like = io.BytesIO()
            encoder.to_file_like(file_like, format=format, **params)
            encoded_by_us = file_like.getvalue()
        else:
            raise ValueError(f"Unknown method: {method}")
    
        captured = capfd.readouterr()
        if format == "wav":
            assert "Timestamps are unset in a packet" not in captured.err
        if format == "mp3":
            assert "Queue input is backward in time" not in captured.err
        if format in ("flac", "wav"):
            assert "Encoder did not produce proper pts" not in captured.err
        if format in ("flac", "mp3"):
            assert "Application provided invalid" not in captured.err
    
        assert_close = torch.testing.assert_close
        if sample_rate != asset.sample_rate:
            if platform.machine().lower() == "aarch64":
                rtol, atol = 0, 1e-2
            else:
                rtol, atol = 0, 1e-3
    
            if sys.platform == "darwin":
                assert_close = partial(assert_tensor_close_on_at_least, percentage=99)
        elif format == "wav":
            rtol, atol = 0, 1e-4
        elif format == "mp3" and asset is SINE_MONO_S32 and num_channels == 2:
            # Not sure why, this one needs slightly higher tol. With default
            # tolerances, the check fails on ~1% of the samples, so that's
            # probably fine. It might be that the FFmpeg CLI doesn't rely on
            # libswresample for converting channels?
            rtol, atol = 0, 1e-3
        else:
            rtol, atol = None, None
    
        if IS_WINDOWS_WITH_FFMPEG_LE_70 and format == "mp3":
            # We're getting a "Could not open input file" on Windows mp3 files when decoding.
            # TODO: https://github.com/pytorch/torchcodec/issues/837
            return
    
        samples_by_us = self.decode(encoded_by_us)
        samples_by_ffmpeg = self.decode(encoded_by_ffmpeg)
    
        assert_close(
            samples_by_us.data,
            samples_by_ffmpeg.data,
            rtol=rtol,
            atol=atol,
        )
        assert samples_by_us.pts_seconds == samples_by_ffmpeg.pts_seconds
        assert samples_by_us.duration_seconds == samples_by_ffmpeg.duration_seconds
        assert samples_by_us.sample_rate == samples_by_ffmpeg.sample_rate
    
        if method == "to_file":
>           validate_frames_properties(actual=encoded_by_us, expected=encoded_by_ffmpeg)

test\test_encoders.py:387: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test\test_encoders.py:64: in validate_frames_properties
    frames_actual, frames_expected = (
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
test\test_encoders.py:65: in <genexpr>
    json.loads(
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\__init__.py:352: in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:345: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <json.decoder.JSONDecoder object at 0x000001D7B88BF8C0>
s = '{\n    "frames": [\n        {\n            "pts": 0,\n            "pts_time": "0.000000",\n            "duration": 23...": "0.024000",\n            "sample_fmt": "s32",\n            "nb_samples": 768,\n            "channels": 1\n        }'
idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
>           obj, end = self.scan_once(s, idx)
                       ^^^^^^^^^^^^^^^^^^^^^^
E           json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1640 column 10 (char 44777)

C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:361: JSONDecodeError
---------------------------- Captured stderr call -----------------------------
[AVFormatContext @ 000001D7CFE9D4C0] Opening 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_against_cli_to_file_flac_38\output.flac' for reading
[file @ 000001D7D30FBB00] Setting default whitelist 'file,crypto,data'
[flac @ 000001D7CFE9D4C0] Format flac probed with size=2048 and score=100
[flac @ 000001D7CFE9D4C0] Before avformat_find_stream_info() pos: 8286 bytes read:32768 seeks:0 nb_streams:1
[flac @ 000001D7CFE9D4C0] All info found
[flac @ 000001D7CFE9D4C0] After avformat_find_stream_info() pos: 28766 bytes read:32768 seeks:0 frames:1
[SWR @ 000001D7D338CA80] Using fltp internally between filters
[flac @ 000001D7CFE9D4C0] first_dts 0 not matching first dts 396288 (pts 396288, duration 2304) in the queue
[AVIOContext @ 000001D7CB58BB40] Statistics: 489375 bytes read, 0 seeks
[AVFormatContext @ 000001D7CFE9DCC0] Opening 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_against_cli_to_file_flac_38\ffmpeg_output.flac' for reading
[file @ 000001D7D30FC340] Setting default whitelist 'file,crypto,data'
[flac @ 000001D7CFE9DCC0] Format flac probed with size=2048 and score=100
[flac @ 000001D7CFE9DCC0] Before avformat_find_stream_info() pos: 8365 bytes read:32768 seeks:0 nb_streams:1
[flac @ 000001D7CFE9DCC0] All info found
[flac @ 000001D7CFE9DCC0] After avformat_find_stream_info() pos: 28845 bytes read:32768 seeks:0 frames:1
[SWR @ 000001D7D338CA80] Using fltp internally between filters
[flac @ 000001D7CFE9DCC0] first_dts 0 not matching first dts 396288 (pts 396288, duration 2304) in the queue
[AVIOContext @ 000001D7CB58F7C0] Statistics: 489454 bytes read, 0 seeks
=========================== short test summary info ===========================
FAILED test/test_encoders.py::TestAudioEncoder::test_against_cli[to_file-flac-32000-1-999999999-asset0] - json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1640 column 10 (char 44777)
=========== 1 failed, 1232 passed, 595 skipped in 547.98s (0:09:07) ===========

TestVideoEncoder::test_video_encoder_against_ffmpeg_cli on Windows + FFmpeg8, for flv and probably other formats:

Details
================================== FAILURES ===================================
_ TestVideoEncoder.test_video_encoder_against_ffmpeg_cli[30-to_file-encode_params2-flv] _

self = <test.test_encoders.TestVideoEncoder object at 0x000001D382E05950>
tmp_path = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_video_encoder_against_ffm14')
format = 'flv'
encode_params = {'crf': None, 'pixel_format': 'yuv420p', 'preset': 'ultrafast'}
method = 'to_file', frame_rate = 30

    @needs_ffmpeg_cli
    @pytest.mark.parametrize(
        "format",
        (
            "mov",
            "mp4",
            "avi",
            "mkv",
            "flv",
            pytest.param(
                "webm",
                marks=[
                    pytest.mark.slow,
                    pytest.mark.skipif(
                        ffmpeg_major_version == 4
                        or (IS_WINDOWS and ffmpeg_major_version >= 6),
                        reason="Codec for webm is not available in this FFmpeg installation.",
                    ),
                ],
            ),
        ),
    )
    @pytest.mark.parametrize(
        "encode_params",
        [
            {"pixel_format": "yuv444p", "crf": 0, "preset": None},
            {"pixel_format": "yuv420p", "crf": 30, "preset": None},
            {"pixel_format": "yuv420p", "crf": None, "preset": "ultrafast"},
            {"pixel_format": "yuv420p", "crf": None, "preset": None},
        ],
    )
    @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like"))
    @pytest.mark.parametrize("frame_rate", [30, 29.97])
    def test_video_encoder_against_ffmpeg_cli(
        self, tmp_path, format, encode_params, method, frame_rate
    ):
        pixel_format = encode_params["pixel_format"]
        crf = encode_params["crf"]
        preset = encode_params["preset"]
    
        if format in ("avi", "flv") and pixel_format == "yuv444p":
            pytest.skip(f"Default codec for {format} does not support {pixel_format}")
    
        source_frames = self.decode(TEST_SRC_2_720P.path)
    
        # Encode with FFmpeg CLI
        temp_raw_path = str(tmp_path / "temp_input.raw")
        with open(temp_raw_path, "wb") as f:
            f.write(source_frames.permute(0, 2, 3, 1).cpu().numpy().tobytes())
    
        ffmpeg_encoded_path = str(tmp_path / f"ffmpeg_output.{format}")
        # Some codecs (ex. MPEG4) do not support CRF or preset.
        # Flags not supported by the selected codec will be ignored.
        ffmpeg_cmd = [
            "ffmpeg",
            "-y",
            "-f",
            "rawvideo",
            "-pix_fmt",
            "rgb24",  # Input format
            "-s",
            f"{source_frames.shape[3]}x{source_frames.shape[2]}",
            "-r",
            str(frame_rate),
            "-i",
            temp_raw_path,
        ]
        if pixel_format is not None:  # Output format
            ffmpeg_cmd.extend(["-pix_fmt", pixel_format])
        if preset is not None:
            ffmpeg_cmd.extend(["-preset", preset])
        if crf is not None:
            ffmpeg_cmd.extend(["-crf", str(crf)])
        # Output path must be last
        ffmpeg_cmd.append(ffmpeg_encoded_path)
        subprocess.run(ffmpeg_cmd, check=True)
        ffmpeg_frames = self.decode(ffmpeg_encoded_path).data
    
        # Encode with our video encoder
        encoder = VideoEncoder(frames=source_frames, frame_rate=frame_rate)
        encoder_output_path = str(tmp_path / f"encoder_output.{format}")
    
        if method == "to_file":
            encoder.to_file(
                dest=encoder_output_path,
                pixel_format=pixel_format,
                crf=crf,
                preset=preset,
            )
            encoder_frames = self.decode(encoder_output_path)
        elif method == "to_tensor":
            encoded_output = encoder.to_tensor(
                format=format,
                pixel_format=pixel_format,
                crf=crf,
                preset=preset,
            )
            encoder_frames = self.decode(encoded_output)
        elif method == "to_file_like":
            file_like = io.BytesIO()
            encoder.to_file_like(
                file_like=file_like,
                format=format,
                pixel_format=pixel_format,
                crf=crf,
                preset=preset,
            )
            encoder_frames = self.decode(file_like.getvalue())
        else:
            raise ValueError(f"Unknown method: {method}")
    
        assert ffmpeg_frames.shape[0] == encoder_frames.shape[0]
    
        # MPEG codec used for avi format does not accept CRF
        percentage = 94 if format == "avi" else 99
    
        # Check that PSNR between both encoded versions is high
        for ff_frame, enc_frame in zip(ffmpeg_frames, encoder_frames):
            res = psnr(ff_frame, enc_frame)
            assert res > 30
            assert_tensor_close_on_at_least(
                ff_frame, enc_frame, percentage=percentage, atol=2
            )
    
        # Only compare video metadata on ffmpeg versions >= 6, as older versions
        # are often missing metadata
        if ffmpeg_major_version >= 6 and method == "to_file":
            fields = [
                "duration",
                "duration_ts",
                "r_frame_rate",
                "time_base",
                "nb_frames",
            ]
            ffmpeg_metadata = self._get_video_metadata(
                ffmpeg_encoded_path,
                fields=fields,
            )
            encoder_metadata = self._get_video_metadata(
                encoder_output_path,
                fields=fields,
            )
            assert ffmpeg_metadata == encoder_metadata
    
            # Check that frame timestamps and duration are the same
            fields = ("pts", "pts_time")
            if format != "flv":
                fields += ("duration", "duration_time")
>           ffmpeg_frames_info = self._get_frames_info(
                ffmpeg_encoded_path, fields=fields
            )

test\test_encoders.py:1172: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test\test_encoders.py:669: in _get_frames_info
    frames = json.loads(result.stdout)["frames"]
             ^^^^^^^^^^^^^^^^^^^^^^^^^
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\__init__.py:352: in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:345: in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <json.decoder.JSONDecoder object at 0x000001D3E8A36E40>
s = '{\n    "frames": [\n        {\n            "pts": 0,\n            "pts_time": "0.000000"\n        },\n        {\n    ... "pts_time": "0.933000"\n        },\n        {\n            "pts": 967,\n            "pts_time": "0.967000"\n        }'
idx = 0

    def raw_decode(self, s, idx=0):
        """Decode a JSON document from ``s`` (a ``str`` beginning with
        a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
    
        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.
    
        """
        try:
>           obj, end = self.scan_once(s, idx)
                       ^^^^^^^^^^^^^^^^^^^^^^
E           json.decoder.JSONDecodeError: Expecting ',' delimiter: line 122 column 10 (char 2412)

C:\Users\runneradmin\miniconda3\envs\test\Lib\json\decoder.py:361: JSONDecodeError
---------------------------- Captured stderr call -----------------------------
ffmpeg version 8.0.1 Copyright (c) 2000-2025 the FFmpeg developers
  built with clang version 22.1.0
  configuration: --prefix=/d/bld/ffmpeg_1773007679189/_h_env/Library --cc=clang.exe --cxx=clang++.exe --nm=llvm-nm --ar=llvm-ar --disable-doc --enable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libharfbuzz --enable-libfontconfig --enable-libopenh264 --enable-libdav1d --ld=lld-link --target-os=win64 --enable-cross-compile --toolchain=msvc --host-cc=clang.exe --extra-libs=ucrt.lib --extra-libs=vcruntime.lib --extra-libs=oldnames.lib --strip=llvm-strip --disable-stripping --host-extralibs= --disable-libopenvino --enable-gpl --enable-libx264 --enable-libx265 --enable-libmp3lame --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libvorbis --enable-libopus --enable-librsvg --enable-libjxl --enable-libwebp --enable-ffplay --enable-vulkan --enable-libshaderc --pkg-config=/d/bld/ffmpeg_1773007679189/_build_env/Library/bin/pkg-config
  libavutil      60.  8.100 / 60.  8.100
  libavcodec     62. 11.100 / 62. 11.100
  libavformat    62.  3.100 / 62.  3.100
  libavdevice    62.  1.100 / 62.  1.100
  libavfilter    11.  4.100 / 11.  4.100
  libswscale      9.  1.100 /  9.  1.100
  libswresample   6.  1.100 /  6.  1.100
[rawvideo @ 00000230F0BD3FC0] Estimating duration from bitrate, this may be inaccurate
Input #0, rawvideo, from 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_video_encoder_against_ffm14\temp_input.raw':
  Duration: 00:00:01.00, start: 0.000000, bitrate: 663552 kb/s
  Stream #0:0: Video: rawvideo (RGB[24] / 0x18424752), rgb24, 1280x720, 663552 kb/s, 30 tbr, 30 tbn
[out#0/flv @ 00000230F0B97FC0] Codec AVOption preset (Encoding preset) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some decoder which was not actually used for any stream.
Stream mapping:
  Stream #0:0 -> #0:0 (rawvideo (native) -> flv1 (flv))
Press [q] to stop, [?] for help
Output #0, flv, to 'C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_video_encoder_against_ffm14\ffmpeg_output.flv':
  Metadata:
    encoder         : Lavf62.3.100
  Stream #0:0: Video: flv1 ([2][0][0][0] / 0x0002), yuv420p(tv, progressive), 1280x720, q=2-31, 200 kb/s, 30 fps, 1k tbn
    Metadata:
      encoder         : Lavc62.11.100 flv
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
[out#0/flv @ 00000230F0B97FC0] video:334KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.201926%
frame=   30 fps=0.0 q=31.0 Lsize=     335KiB time=00:00:01.00 bitrate=2743.2kbits/s speed=9.94x elapsed=0:00:00.10

=========================== short test summary info ===========================
FAILED test/test_encoders.py::TestVideoEncoder::test_video_encoder_against_ffmpeg_cli[30-to_file-encode_params2-flv] - json.decoder.JSONDecodeError: Expecting ',' delimiter: line 122 column 10 (char 2412)
=========== 1 failed, 1217 passed, 596 skipped in 525.29s (0:08:45) ===========
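
In both traces above, the tail of `s` ends right after a frame object's closing `}`, with no closing `]` or `}` for the document, and the reported error position sits at that point. That is consistent with ffprobe's stdout reaching the test incomplete, although the traces alone do not prove it. A minimal sketch of a wrapper that would make this visible in future failures follows; `FfprobeOutputError` and `loads_with_context` are hypothetical names, not existing helpers in the test suite.

```python
import json


class FfprobeOutputError(RuntimeError):
    """Hypothetical error type carrying extra context about bad ffprobe output."""


def loads_with_context(stdout, tail_chars=60):
    """Parse JSON, and on failure report whether the text looks cut short."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        # If the parser only gave up at the very end of the text, the
        # document was most likely truncated rather than corrupted midway.
        at_end = exc.pos >= len(stdout.rstrip())
        hint = "looks truncated" if at_end else "is malformed mid-document"
        raise FfprobeOutputError(
            f"{exc.msg}: output {hint} "
            f"(error at char {exc.pos} of {len(stdout)}, "
            f"tail={stdout[-tail_chars:]!r})"
        ) from exc


# A document cut off after the first frame, shaped like the traces above.
truncated = '{\n    "frames": [\n        {\n            "pts": 0\n        }'
try:
    loads_with_context(truncated)
except FfprobeOutputError as err:
    message = str(err)
```

Logging the stdout length and tail this way would not fix the flakiness, but it should help tell a truncated capture apart from genuinely invalid JSON the next time the failure shows up.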
