
DXF goroutine leak after create index job is cancelled #64129

@tangenta

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Enable tidb_enable_dist_task.

Run an add index job, then cancel the DDL job once a reorg timeout has occurred.
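
For reference, the two steps above could be written out as SQL roughly as follows. This is only a sketch: the index name and column are hypothetical placeholders (the report does not give them), the ALTER TABLE has to run in one session while the cancel is issued from another, and how to provoke the reorg timeout is not covered here.

```go
package main

import "fmt"

// reproStatements returns the SQL for the reproduce steps.
// table, index and column are hypothetical placeholders; jobID is the ID
// shown by ADMIN SHOW DDL JOBS for the running add index job.
func reproStatements(table, index, column string, jobID int64) []string {
	return []string{
		// Step 1: enable the distributed execution framework (DXF).
		"SET GLOBAL tidb_enable_dist_task = ON",
		// Step 2a: start the add index job (session A, blocks until done).
		fmt.Sprintf("ALTER TABLE %s ADD INDEX %s (%s)", table, index, column),
		// Step 2b: cancel it from session B after the reorg timeout occurred.
		fmt.Sprintf("ADMIN CANCEL DDL JOBS %d", jobID),
	}
}

func main() {
	for _, s := range reproStatements("orders_1", "idx_demo", "col_demo", 8501) {
		fmt.Println(s + ";")
	}
}
```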

2. What did you expect to see? (Required)

The DDL job is rolled back and no DXF global task is left running.

3. What did you see instead (Required)

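The job is in the "rollback done" state, but the DXF global task and its subtasks are still running. The outputs below show the job state, a subtask with task-key keyspace_a/ddl/backfill/8501 still logging roughly two days after the job's END_TIME, and a goroutine blocked in executeDistTask.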

mysql> admin show ddl jobs where job_id = 8501;
+--------+---------+------------+-----------+--------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------------+----------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE  | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME                | START_TIME                 | END_TIME                   | STATE         | COMMENTS |
+--------+---------+------------+-----------+--------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------------+----------+
|   8501 | c66_dc  | orders_1   | add index | none         |      8497 |     8414 | 112374347 | 2025-10-21 14:01:32.171000 | 2025-10-21 14:01:32.221000 | 2025-10-21 14:07:59.871000 | rollback done |          |
+--------+---------+------------+-----------+--------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------------+----------+
1 row in set (0.05 sec)
[2025/10/23 20:15:05.176 +00:00] [WARN] [engine_mgr.go:64] ["build ingest engine failed"] [keyspaceName=SYSTEM] [task-id=1] [task-key=keyspace_a/ddl/backfill/8501] [subtaskID=2] [step=read-index] ["job ID"=8501] ["index ID"=8] [error="lock held by current process"] [errorVerbose="lock held by current process\n(1) attached stack trace\n  -- stack trace:\n  | github.com/cockroachdb/pebble/vfs.defaultFS.Lock\n  | \t/root/go/pkg/mod/github.com/cockroachdb/pebble@v1.1.4-0.20250120151818-5dd133a1e6fb/vfs/file_lock_unix.go:50\n  | github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFS).Lock\n  | \t/root/go/pkg/mod/github.com/cockroachdb/pebble@v1.1.4-0.20250120151818-5dd133a1e6fb/vfs/disk_health.go:679\n  | github.com/cockroachdb/pebble.LockDirectory\n  | \t/root/go/pkg/mod/github.com/cockroachdb/pebble@v1.1.4-0.20250120151818-5dd133a1e6fb/open.go:1068\n  | github.com/cockroachdb/pebble.Open\n  | \t/root/go/pkg/mod/github.com/cockroachdb/pebble@v1.1.4-0.20250120151818-5dd133a1e6fb/open.go:116\n  | github.com/pingcap/tidb/pkg/lightning/backend/local.(*engineManager).openEngineDB\n  | \t/workspace/source/tidb/pkg/lightning/backend/local/engine_mgr.go:230\n  | github.com/pingcap/tidb/pkg/lightning/backend/local.(*engineManager).openEngine\n  | \t/workspace/source/tidb/pkg/lightning/backend/local/engine_mgr.go:236\n  | github.com/pingcap/tidb/pkg/lightning/backend/local.(*Backend).OpenEngine\n  | \t/workspace/source/tidb/pkg/lightning/backend/local/local.go:833\n  | github.com/pingcap/tidb/pkg/lightning/backend.EngineManager.OpenEngine\n  | \t/workspace/source/tidb/pkg/lightning/backend/backend.go:271\n  | github.com/pingcap/tidb/pkg/ddl/ingest.(*litBackendCtx).Register\n  | \t/workspace/source/tidb/pkg/ddl/ingest/engine_mgr.go:62\n  | github.com/pingcap/tidb/pkg/ddl.(*readIndexStepExecutor).buildLocalStorePipeline\n  | \t/workspace/source/tidb/pkg/ddl/backfilling_read_index.go:358\n  | github.com/pingcap/tidb/pkg/ddl.(*readIndexStepExecutor).runLocalPipeline\n  | \t/workspace/source/tidb/pkg/ddl/backfilling_read_index.go:160\n  | github.com/pingcap/tidb/pkg/ddl.(*readIndexStepExecutor).RunSubtask\n  | \t/workspace/source/tidb/pkg/ddl/backfilling_read_index.go:206\n  | github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*BaseTaskExecutor).runSubtask.func1\n  | \t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:466\n  | github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*BaseTaskExecutor).runSubtask\n  | \t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:467\n  | github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*BaseTaskExecutor).Run\n  | \t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:344\n  | github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*Manager).startTaskExecutor.func2\n  | \t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/manager.go:345\n  | github.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).RunWithLog.func1\n  | \t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:181\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1700\nWraps: (2) lock held by current process\nError types: (1) *withstack.withStack (2) *errutil.leafError\ngithub.com/pingcap/errors.Trace\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20250523034308-74f78ae071ee/juju_adaptor.go:15\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*engineManager).openEngineDB\n\t/workspace/source/tidb/pkg/lightning/backend/local/engine_mgr.go:231\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*engineManager).openEngine\n\t/workspace/source/tidb/pkg/lightning/backend/local/engine_mgr.go:236\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*Backend).OpenEngine\n\t/workspace/source/tidb/pkg/lightning/backend/local/local.go:833\ngithub.com/pingcap/tidb/pkg/lightning/backend.EngineManager.OpenEngine\n\t/workspace/source/tidb/pkg/lightning/backend/backend.go:271\ngithub.com/pingcap/tidb/pkg/ddl/ingest.(*litBackendCtx).Register\n\t/workspace/source/tidb/pkg/ddl/ingest/engine_mgr.go:62\ngithub.com/pingcap/tidb/pkg/ddl.(*readIndexStepExecutor).buildLocalStorePipeline\n\t/workspace/source/tidb/pkg/ddl/backfilling_read_index.go:358\ngithub.com/pingcap/tidb/pkg/ddl.(*readIndexStepExecutor).runLocalPipeline\n\t/workspace/source/tidb/pkg/ddl/backfilling_read_index.go:160\ngithub.com/pingcap/tidb/pkg/ddl.(*readIndexStepExecutor).RunSubtask\n\t/workspace/source/tidb/pkg/ddl/backfilling_read_index.go:206\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*BaseTaskExecutor).runSubtask.func1\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:466\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*BaseTaskExecutor).runSubtask\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:467\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*BaseTaskExecutor).Run\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:344\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*Manager).startTaskExecutor.func2\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/manager.go:345\ngithub.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).RunWithLog.func1\n\t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:181\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700"]
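
The task-key in the log line above ends with the same number as the cancelled DDL job (8501), which is what ties the still-running subtask to that job. A minimal sketch for pulling the job ID out of such a key when scanning logs; the key shape `.../ddl/backfill/<jobID>` is inferred from this single log line and is an assumption, and `jobIDFromTaskKey` is a hypothetical helper, not a TiDB function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// jobIDFromTaskKey extracts the trailing job ID from a task key shaped like
// "keyspace_a/ddl/backfill/8501". The shape is assumed from one log line.
func jobIDFromTaskKey(key string) (int64, error) {
	const marker = "ddl/backfill/"
	i := strings.LastIndex(key, marker)
	if i < 0 {
		return 0, fmt.Errorf("task key %q does not contain %q", key, marker)
	}
	// Everything after the marker is expected to be the decimal job ID.
	return strconv.ParseInt(key[i+len(marker):], 10, 64)
}

func main() {
	id, err := jobIDFromTaskKey("keyspace_a/ddl/backfill/8501")
	fmt.Println(id, err) // prints: 8501 <nil>
}
```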
goroutine 115111222 [semacquire, 7999 minutes]:
sync.runtime_Semacquire(0xc0b75cfde8?)
        /usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*WaitGroup).Wait(0x7298560?)
        /usr/local/go/src/sync/waitgroup.go:118 +0x48
golang.org/x/sync/errgroup.(*Group).Wait(0xc045bb72c0)
        /root/go/pkg/mod/golang.org/x/sync@v0.16.0/errgroup/errgroup.go:56 +0x25
github.com/pingcap/tidb/pkg/ddl.(*worker).executeDistTask(0xc051eb25b0, 0xc032990000, {0x7fe8b58, 0xc016de09a0}, 0xc016ddb300)
        /workspace/source/tidb/pkg/ddl/index.go:2917 +0xbfb
github.com/pingcap/tidb/pkg/ddl.(*worker).addTableIndex(0xc051eb25b0, 0xc0166a9600?, {0x7fe8b58, 0xc016de09a0}, 0xc016ddb300)
        /workspace/source/tidb/pkg/ddl/index.go:2599 +0x75
github.com/pingcap/tidb/pkg/ddl.runReorgJobAndHandleErr.func1()
        /workspace/source/tidb/pkg/ddl/index.go:1713 +0xc6
github.com/pingcap/tidb/pkg/ddl.(*worker).runReorgJob.func1()
        /workspace/source/tidb/pkg/ddl/reorg.go:380 +0x29
github.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run.func1()
        /workspace/source/tidb/pkg/util/wait_group_wrapper.go:167 +0x4c
created by github.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run in goroutine 115096206
        /workspace/source/tidb/pkg/util/wait_group_wrapper.go:165 +0x73
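
The goroutine above has been parked in a Wait() call for 7999 minutes. As a generic illustration of how that shape of leak can arise (this is not TiDB's code and makes no claim about the actual root cause), the sketch below uses a plain sync.WaitGroup in place of errgroup: a waiter stays blocked after cancellation whenever a worker it waits on does not observe the cancelled context. All names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waitReturns reports whether wg.Wait() returns within d.
func waitReturns(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// startWorker launches one worker waiting on a result that never arrives.
// With honorCancel=false the worker ignores ctx and blocks forever.
func startWorker(ctx context.Context, honorCancel bool) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	result := make(chan struct{}) // never written to: work that never finishes
	wg.Add(1)
	go func() {
		defer wg.Done()
		if !honorCancel {
			<-result // cancellation cannot unblock this receive
			return
		}
		select {
		case <-result:
		case <-ctx.Done(): // cancellation unblocks the worker
		}
	}()
	return wg
}

// waiterUnblocksAfterCancel cancels the context, then checks whether Wait returns.
func waiterUnblocksAfterCancel(honorCancel bool) bool {
	ctx, cancel := context.WithCancel(context.Background())
	wg := startWorker(ctx, honorCancel)
	cancel() // stands in for cancelling the job
	return waitReturns(wg, 200*time.Millisecond)
}

func main() {
	fmt.Println("worker ignores cancel, Wait returns:", waiterUnblocksAfterCancel(false))
	fmt.Println("worker honors cancel, Wait returns:", waiterUnblocksAfterCancel(true))
}
```

In the first case Wait never returns and both the worker and the waiter are leaked; in the second, cancellation propagates and Wait returns promptly.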

4. What is your TiDB version? (Required)

master (e4f8ba9)

Metadata

Labels

affects-8.1, affects-8.5, component/ddl, severity/major, type/bug
