<fix>[kvm]: remove TPM from VM individually instead of batch delete#3688

Closed
MatheMatrix wants to merge 1 commit into feature-zsv-5.0.0-vm-support-vtpm-and-secuceboot from sync/wenhao.zhang/zsv-ldap-3

Conversation

@MatheMatrix
Owner

Change the TPM deletion logic from a batch SQL delete to individual
deletion using a While loop. This ensures proper cleanup for each TPM
by calling the removeTpmFromVm method. Added error handling so the loop
continues even if deletion fails for an individual TPM.

Changes:

  • Replace batch SQL delete with While loop iteration
  • Call removeTpmFromVm for each TPM to ensure proper cleanup
  • Add error handling to continue on individual failures
  • Log warnings for failed TPM deletions
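
The per-item, continue-on-failure behaviour listed above can be sketched in plain Java. This is a minimal illustration only: `BestEffortCleanup` and `cleanEach` are hypothetical names, and the synchronous loop stands in for the asynchronous While loop used in the actual change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the rollback loop; none of these names are ZStack APIs.
class BestEffortCleanup {
    /**
     * Runs cleanup on every item, one at a time. A failure on one item is
     * logged and recorded, and the remaining items are still processed.
     * Returns the items whose cleanup failed.
     */
    static <T> List<T> cleanEach(List<T> items, Consumer<T> cleanup) {
        List<T> failed = new ArrayList<>();
        for (T item : items) {
            try {
                cleanup.accept(item);
            } catch (RuntimeException e) {
                // Log and continue, mirroring "log warnings for failed TPM deletions".
                System.err.printf("failed to clean up %s, continuing: %s%n", item, e.getMessage());
                failed.add(item);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        List<String> failed = cleanEach(List.of("tpm-1", "tpm-2", "tpm-3"), uuid -> {
            if (uuid.equals("tpm-2")) {
                throw new IllegalStateException("simulated failure");
            }
            removed.add(uuid);
        });
        System.out.println("removed=" + removed + " failed=" + failed);
    }
}
```

Returning the failed items makes it possible for a caller to decide on a fallback afterwards instead of only logging.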

Resolves: ZSV-11439
Related: ZSV-11310

Change-Id: I616d746d6a6f677a6772796f63676e6177676371

sync from gitlab !9550

@coderabbitai

coderabbitai bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 645e7af1-c015-43f1-abce-67a512c9cda6

📥 Commits

Reviewing files that changed from the base of the PR and between d2feb84 and ab338a6.

📒 Files selected for processing (1)
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java

Walkthrough

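In the "persist-TPM-VO" rollback, KvmTpmManager no longer batch-deletes TpmVO rows by UUID. It now iterates over reply.getInventories(), builds a RemoveTpmFromVmContext (with a new public boolean field force) for each cloned TPM, and calls removeTpmFromVm(), which skips the VM state check when context.force==true. A failure on any item is logged as a warning without blocking the rest, and trigger.rollback() is called once after all items have finished.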

Changes

Cohort / File(s) Summary
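TPM rollback and context field
plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java
In handle(CloneVmTpmMsg), the rollback changes from a batch delete of TpmVO by UUID to iterating over reply.getInventories(), building a RemoveTpmFromVmContext (new public field boolean force) for each TPM, and calling removeTpmFromVm(...) one by one. Each call completes its completion after success/fail; failures are logged as warnings and the loop continues. The "check-vm-status" step of removeTpmFromVm gains skipIf(data -> context.force) so that the VM state check is skipped in force mode.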

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,200,255,0.5)
    participant Caller
    end
    rect rgba(200,255,200,0.5)
    participant KvmTpmManager
    participant WhileLoop as While
    end
    rect rgba(255,200,200,0.5)
    participant VMService as removeTpmFromVm
    participant DB
    end

    Caller->>KvmTpmManager: trigger rollback for "persist-TPM-VO"
    KvmTpmManager->>WhileLoop: iterate over reply.getInventories()
    WhileLoop->>VMService: removeTpmFromVm(RemoveTpmFromVmContext{vmUuid,tpmUuid,force=true})
    VMService->>VMService: skip "check-vm-status" if context.force
    VMService->>DB: delete/update TPM association records
    DB-->>VMService: return success/failure
    VMService-->>WhileLoop: completion.done() (completes on success or failure)
    WhileLoop-->>KvmTpmManager: all TPM operations finished
    KvmTpmManager->>Caller: trigger.rollback()

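Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I am a little rabbit, handling each one gently and with care,

a forced tug that asks no state, and smiling on when failures are there.
Footprints stay behind in the logs, and breath comes easy after rollback,
with one last soft call: the transaction is wrapped.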

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
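Title check ✅ Passed The title clearly and accurately describes the main change: TPM batch deletion is replaced with one-by-one deletion, which fully matches the core change in the code.
Description check ✅ Passed The description details the change from a batch SQL delete to one-by-one deletion with a While loop, the call to removeTpmFromVm, and the error handling, all of which relate to the actual code change.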

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java`:
- Around line 461-475: The rollback path can leave orphaned TpmVO/VmHostFile*
records because removeTpmFromVm(RemoveTpmFromVmContext) performs VM state checks
and detachKeyProviderFromTpm() and returns on failure; modify KvmTpmManager to
perform a best-effort cleanup when that call fails in the While.each fail()
handler: add a new helper (e.g., forceRemoveTpmFromVm or forceDeleteTpmRecords)
that accepts the same RemoveTpmFromVmContext and deletes the TpmVO and related
VmHostFile* rows directly (skip VM state validation and key-detach logic), or
alternatively execute the equivalent SQL delete/update statements here in the
fail() block as a best-effort cleanup, log any errors but do not propagate them,
and call whileCompletion.done() after attempting the force-delete so rollback
does not leave orphan records.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fbd0b81e-824d-4cd4-b231-8be364c84249

📥 Commits

Reviewing files that changed from the base of the PR and between 912b02f and c366e06.

📒 Files selected for processing (1)
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java

@zstack-robot-1
Collaborator

Comment from yaohua.wu:

Review: MR !9550 — ZSV-11439

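Jira context: No associated ZSTAC-format Jira issue was found (the MR references ZSV-11439 / ZSV-11310, which do not use a standard project key, so details could not be fetched). The review below is based on the code change itself.
Target branch: feature-zsv-5.0.0-vm-support-vtpm-and-secuceboot | Merge status: can_be_merged | Conflicts: none

Change overview

The rollback logic of the "persist-TPM-VO" step in handle(CloneVmTpmMsg) changes from a batch SQL delete of TpmVO to calling removeTpmFromVm() for each TPM. The intent is for rollback to perform a full cleanup (host file deletion + key detach + DB record deletion) rather than only deleting DB records.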


Critical

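1. [KvmTpmManager.java:~461] The VM state check blocks rollback cleanup and leaves orphan records

The first step of removeTpmFromVm() is check-vm-status, which verifies that the VM is in SUPPORT_VM_STATES_FOR_TPM_OPERATION. But this is the clone rollback path: the target VM is still being created by the clone, so its state is very likely Creating or another transitional state, and most probably not in the supported list.

When the VM state check fails:

  • The whole removeTpmFromVm() FlowChain fails and never reaches the DB cleanup in step 5
  • The outer fail() callback only does logger.warn(...) + whileCompletion.done() → the TpmVO record is left behind permanently
  • The old implementation, SQL.New(TpmVO.class).in(...).delete(), ran unconditionally and was not affected by VM state

Impact: after a failed clone, orphan TpmVO records remain in the database while the corresponding VM may already have been cleaned up, which leaves the data inconsistent.

Suggestions:

  • Option A (recommended): add a skipVmStateCheck flag to RemoveTpmFromVmContext and set it to true on the rollback path so that the step 1 check is skipped
  • Option B: add a best-effort SQL.New(TpmVO.class).eq(TpmVO_.uuid, removeContext.tpmUuid).delete() as a fallback in the fail() callback
  • Option C: do not route the rollback path through removeTpmFromVm(); keep the old batch SQL delete and use the full flow only on the regular API delete path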
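
Option A can be illustrated with a small, self-contained model. It is a sketch under assumptions, not the project's code: `RemoveTpmSketch`, its `Context` class, the step names returned as strings, and the contents of `SUPPORTED_STATES` are all hypothetical stand-ins for the FlowChain and for SUPPORT_VM_STATES_FOR_TPM_OPERATION.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the removal flow; names only echo the review, not ZStack's API.
class RemoveTpmSketch {
    // Assumed contents; the real supported-state set is not shown in this PR.
    static final Set<String> SUPPORTED_STATES = Set.of("Running", "Stopped");

    static class Context {
        String vmState;
        boolean skipVmStateCheck; // meant to be set to true only by the rollback path
    }

    /** Returns the steps that ran; throws if the state check rejects the VM. */
    static List<String> remove(Context ctx) {
        List<String> steps = new ArrayList<>();
        if (!ctx.skipVmStateCheck) {
            if (!SUPPORTED_STATES.contains(ctx.vmState)) {
                throw new IllegalStateException(
                        "VM state " + ctx.vmState + " does not support removing TPM");
            }
            steps.add("check-vm-status");
        }
        // The DB cleanup is reached on both paths once the check passes or is skipped.
        steps.add("remove-db-records");
        return steps;
    }

    public static void main(String[] args) {
        Context rollback = new Context();
        rollback.vmState = "Creating";
        rollback.skipVmStateCheck = true;
        System.out.println(remove(rollback));
    }
}
```

With the flag set, a VM in a transitional state still reaches the DB cleanup, while the default path keeps its state protection.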

Warning

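1. [KvmTpmManager.java:~461-475] detachKeyProviderFromTpm may fail during rollback

Step 4 of removeTpmFromVm() unconditionally calls tpmKeyBackend.detachKeyProviderFromTpm(tpmUuid). In the clone rollback scenario:

  • If clone-encrypted-resource-key-if-needed (a NoRollbackFlow) has not run yet or failed partway through, some target TPMs may never have been bound to a key provider
  • Whether detachKeyProviderFromTpm() is safe (a no-op) when no binding exists needs to be confirmed
  • If it throws, the whole removeTpmFromVm() fails → the DB cleanup in step 5 is skipped as well

Suggestion: confirm how detachKeyProviderFromTpm() behaves when there is no binding. If it throws, step 4 needs a defensive check or a try-catch.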
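
The no-op behaviour this warning asks about can be shown with a toy model. It is only a sketch of what an idempotent detach looks like: `KeyBindingSketch` and its in-memory map are hypothetical and say nothing about how tpmKeyBackend is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of TPM-to-key-provider bindings.
class KeyBindingSketch {
    private final Map<String, String> providerByTpm = new HashMap<>();

    void attach(String tpmUuid, String providerUuid) {
        providerByTpm.put(tpmUuid, providerUuid);
    }

    /**
     * Idempotent detach: returns true if a binding was removed and false if
     * none existed. A missing binding is treated as a no-op, not an error.
     */
    boolean detach(String tpmUuid) {
        return providerByTpm.remove(tpmUuid) != null;
    }

    public static void main(String[] args) {
        KeyBindingSketch bindings = new KeyBindingSketch();
        System.out.println(bindings.detach("tpm-1")); // nothing bound yet
        bindings.attach("tpm-1", "provider-1");
        System.out.println(bindings.detach("tpm-1"));
    }
}
```

If the real method behaves like this, calling it for a TPM that was never bound would not abort the flow before the DB cleanup.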

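2. [KvmTpmManager.java:~460] The rollback path performs unnecessary host operations

In the clone flow, persist-TPM-VO only creates DB records and does not create any host files (the swtpm state files are produced by the host-side clone operation). Steps 2-3 of removeTpmFromVm() query VmHostFileVO and send an agent command. In the rollback scenario these queries always return nothing and step 3 is always skipped. This does not cause errors, but it adds unnecessary DB queries and asynchronous overhead.

This is further evidence that rollback should use a more lightweight cleanup path.


Suggestion

1. [KvmTpmManager.java:~460] Consider using While.all() instead of While.each()

Each TPM belongs to a different target VM (the entries of msg.getDstVmUuidList() are all distinct), so the deletions do not depend on each other. Running them in parallel with .all() can speed up rollback after a multi-VM clone fails.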
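
The idea of running independent deletions in parallel while still tolerating individual failures can be sketched with the standard library. This is not the While.all() implementation: `ParallelCleanup` and `cleanAll` are hypothetical names, and `CompletableFuture` is used here only as a stand-in to show the shape of the approach.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical parallel variant of a best-effort cleanup loop.
class ParallelCleanup {
    /**
     * Starts cleanup for all items concurrently, waits for every task to
     * finish, and returns the items whose cleanup threw. The order of the
     * returned list is not guaranteed.
     */
    static <T> List<T> cleanAll(List<T> items, Consumer<T> cleanup) {
        Queue<T> failed = new ConcurrentLinkedQueue<>();
        CompletableFuture<?>[] futures = items.stream()
                .map(item -> CompletableFuture
                        .runAsync(() -> cleanup.accept(item))
                        // Record the failure and recover so join() below never throws.
                        .exceptionally(e -> { failed.add(item); return null; }))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join();
        return List.copyOf(failed);
    }

    public static void main(String[] args) {
        List<String> failed = cleanAll(List.of("tpm-1", "tpm-2", "tpm-3"), uuid -> {
            if (uuid.equals("tpm-2")) {
                throw new IllegalStateException("simulated failure");
            }
        });
        System.out.println("failed=" + failed);
    }
}
```

The single join after starting all tasks corresponds to calling trigger.rollback() only once every per-TPM operation has completed.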


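Architectural considerations

The intent of this change is good: performing a full cleanup on rollback (rather than only deleting DB records) is more robust. But removeTpmFromVm() was designed for the normal API path (a user explicitly deleting a TPM), and its precondition (the VM state check) does not fit the rollback scenario.

The core issue is that the normal path and the rollback path have different cleanup requirements:

  • Normal path: validate → send host command → detach key → delete DB records (strict flow)
  • Rollback path: clean up everything on a best-effort basis, with no precondition that can block it (lenient flow)

Consider splitting or parameterizing removeTpmFromVm() so that it can serve both scenarios.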

Verdict: REVISION_REQUIRED


🤖 Robot Reviewer

@MatheMatrix MatheMatrix force-pushed the sync/wenhao.zhang/zsv-ldap-3 branch 2 times, most recently from b8c8bb9 to d2feb84 Compare April 7, 2026 09:00

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java`:
- Around line 305-309: The VM-state check in RemoveTpmFromVmContext is inverted:
change the condition that currently uses context.ignoreError to require the
check only when ignoreError is false (i.e. use !context.ignoreError && <vm state
check>) so rollback and normal RemoveTpmMsg paths correctly enforce VM-state
validation; also consider renaming the boolean field ignoreError to
skipVmStateCheck (and update all references) if its purpose is only to bypass VM
state validation to avoid future misuse.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8b6c33c4-742a-40c5-a4e4-44946d5e4f37

📥 Commits

Reviewing files that changed from the base of the PR and between c366e06 and b8c8bb9.

📒 Files selected for processing (1)
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java

Comment on lines 305 to +309
static class RemoveTpmFromVmContext {
    String vmInstanceUuid;
    String tpmUuid;
    boolean ignoreError;


⚠️ Potential issue | 🔴 Critical

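The ignoreError condition is inverted, so this rollback fix does not actually take effect.

At Line 328, the flow now calls fail() for an unsupported VM state only when ignoreError=true, while the ordinary RemoveTpmMsg path, where the flag defaults to false, skips the state check entirely. As a result, the clone rollback at Line 467 still fails in states such as transient/Creating, the outer fail() only logs a warning and moves on, and the newly created TpmVO is still left behind. At the same time, the normal delete path loses its original state protection. The condition should be changed to !context.ignoreError && .... Also, if this flag exists only so that rollback can skip the state check, consider narrowing its name to something like skipVmStateCheck to avoid future misuse.

🛠️ Suggested change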
 static class RemoveTpmFromVmContext {
     String vmInstanceUuid;
     String tpmUuid;
-    boolean ignoreError;
+    boolean skipVmStateCheck;
@@
-                    if (context.ignoreError && !SUPPORT_VM_STATES_FOR_TPM_OPERATION.contains(vm.getState())) {
+                    if (!context.skipVmStateCheck && !SUPPORT_VM_STATES_FOR_TPM_OPERATION.contains(vm.getState())) {
                         trigger.fail(err(VM_STATE_ERROR,
                                 "The current VM state does not support removing TPM operations")
                                 .withOpaque("support.vm.state", SUPPORT_VM_STATES_FOR_TPM_OPERATION));
                         return;
                     }
@@
-                    removeContext.ignoreError = true;
+                    removeContext.skipVmStateCheck = true;

Also applies to: 328-333, 463-468
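
The inversion is easiest to see by evaluating both conditions side by side. The sketch below copies only the boolean structure of the two conditions from the diff above; `StateCheckCondition` and its method names are hypothetical, and `stateSupported` stands for SUPPORT_VM_STATES_FOR_TPM_OPERATION.contains(vm.getState()).

```java
// Hypothetical helper that isolates the two conditions quoted in the diff.
class StateCheckCondition {
    /** Condition before the fix: fails only when the flag is set. */
    static boolean buggyFails(boolean ignoreError, boolean stateSupported) {
        return ignoreError && !stateSupported;
    }

    /** Condition after the fix: fails only when the check is NOT skipped. */
    static boolean fixedFails(boolean skipVmStateCheck, boolean stateSupported) {
        return !skipVmStateCheck && !stateSupported;
    }

    public static void main(String[] args) {
        boolean stateSupported = false; // e.g. a VM in a transitional state
        for (boolean flag : new boolean[] {false, true}) {
            System.out.printf("flag=%b buggyFails=%b fixedFails=%b%n",
                    flag, buggyFails(flag, stateSupported), fixedFails(flag, stateSupported));
        }
    }
}
```

For an unsupported state, the original condition rejects exactly the rollback call (flag set) and lets the normal path through, which is the opposite of the intended behaviour.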

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java (1)

475-480: ⚠️ Potential issue | 🟠 Major

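Rollback failure still lacks a fallback DB cleanup and may leave orphan records.

When removeTpmFromVm fails (for example, detachKeyProviderFromTpm throws because no key is bound), the flow aborts before reaching the remove-db-records step. The current fail() callback only logs and continues without any fallback cleanup, so the newly created TpmVO record is left behind.

In the clone scenario, persist-TPM-VO only creates the DB record and no key binding has happened yet. If a later step fails and triggers rollback, detachKeyProviderFromTpm may throw or fail because there is no binding record, which skips the DB deletion.

Consider adding a best-effort DB cleanup in the fail() callback:

🛠️ Suggested fix
 @Override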
 public void fail(ErrorCode errorCode) {
     logger.warn(String.format("failed to delete tpm for VM[%s] but still continue: %s",
             tpm.getVmInstanceUuid(), errorCode.getReadableDetails()));
+    // Best-effort fallback: ensure DB records are cleaned up
+    try {
+        SQL.New(TpmVO.class).eq(TpmVO_.uuid, tpm.getUuid()).delete();
+        SQL.New(VmHostFileVO.class)
+                .eq(VmHostFileVO_.vmInstanceUuid, tpm.getVmInstanceUuid())
+                .eq(VmHostFileVO_.type, VmHostFileType.TpmState)
+                .delete();
+    } catch (Exception e) {
+        logger.warn(String.format("fallback DB cleanup for TPM[%s] also failed: %s",
+                tpm.getUuid(), e.getMessage()));
+    }
     whileCompletion.done();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java` around lines
475 - 480, The fail() callback in removeTpmFromVm currently only logs and
returns, which can leave a persisted TpmVO orphan when later steps (e.g.,
detachKeyProviderFromTpm) fail; update the fail() implementation to perform a
best-effort DB cleanup: lookup and remove the TpmVO created for the VM (use
tpm.getVmInstanceUuid() or tpm.getUuid() to find the record) inside a try/catch,
handle and log any exceptions but do not rethrow, then finally call
whileCompletion.done(); ensure the cleanup is idempotent so it won’t fail if the
record is already removed.
🧹 Nitpick comments (2)
plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java (2)

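305-309: Suggested field rename: ignoreError → skipVmStateCheck

The name ignoreError suggests that all errors are ignored, but the field is actually used only to skip the VM state check (the skipIf at Line 323). Per the coding guidelines, a more precise name such as skipVmStateCheck is recommended to avoid later misuse.

♻️ Suggested rename for more self-explanatory code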
 static class RemoveTpmFromVmContext {
     String vmInstanceUuid;
     String tpmUuid;
-    boolean ignoreError;
+    boolean skipVmStateCheck;
 
     List<VmHostFileVO> hostFiles;

Also update the references at Line 323 and Line 468.

As per coding guidelines: "Avoid boolean parameters that make the meaning unclear."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java` around lines
305 - 309, The boolean field ignoreError on class RemoveTpmFromVmContext is
misleading because it only controls skipping the VM state check; rename the
field to skipVmStateCheck and update all usages to match (e.g., the condition
passed into skipIf and any other references such as where RemoveTpmFromVmContext
is constructed or read) so the intent is explicit and conforms to the guideline
about boolean parameter clarity; ensure getters/setters or direct accesses to
ignoreError are replaced with skipVmStateCheck consistently.

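464-464: Consider using While.all() to run in parallel and speed up rollback.

The code currently uses While.each() to delete each TPM serially. The TPM deletions for different VMs are independent of each other, so While.all() could run them in parallel to shorten rollback time.

♻️ Use parallel execution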
-new While<>(reply.getInventories()).each((tpm, whileCompletion) -> {
+new While<>(reply.getInventories()).all((tpm, whileCompletion) -> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java` at line 464,
The rollback currently iterates serially using new
While<>(reply.getInventories()).each(...) which deletes TPMs one-by-one; change
this to use While.all(reply.getInventories()) with the same per-item lambda so
TPM deletions run in parallel, keeping the existing completion/exception
handling (referencing reply.getInventories(), While.each -> replace with
While.all, and the lambda used in KvmTpmManager) to reduce rollback time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 37055ff3-afb8-405f-8f17-fd67f69cc799

📥 Commits

Reviewing files that changed from the base of the PR and between b8c8bb9 and d2feb84.

📒 Files selected for processing (1)
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java

@MatheMatrix MatheMatrix force-pushed the sync/wenhao.zhang/zsv-ldap-3 branch from d2feb84 to ab338a6 Compare April 7, 2026 10:05
@zstack-robot-2 zstack-robot-2 deleted the sync/wenhao.zhang/zsv-ldap-3 branch April 7, 2026 13:00