
Running on a RTX5060Ti #67

@MovLab2

Description


It runs very slowly with `--acceleration none`. When I use:

(personalive) PS D:\PersonaLive> python inference_online.py --acceleration tensorrt

host: 0.0.0.0
port: 7860
reload: False
mode: default
max_queue_size: 0
timeout: 0.0
safety_checker: False
taesd: True
ssl_certfile: None
ssl_keyfile: None
debug: False
acceleration: tensorrt
engine_dir: engines
config_path: ./configs/prompts/personalive_online.yaml

C:\Users\movla.conda\envs\personalive\lib\site-packages\diffusers\models\dual_transformer_2d.py:20: FutureWarning: DualTransformer2DModel is deprecated and will be removed in version 0.29. Importing DualTransformer2DModel from diffusers.models.dual_transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel, instead.
deprecate("DualTransformer2DModel", "0.29", deprecation_message)

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Loading engine from file ./pretrained_weights/tensorrt/unet_work.engine...
[EngineModel] Detected dynamic shape for d00: (1, -1, 320) -> Using Max Profile: (1, 16384, 320)
[EngineModel] Detected dynamic shape for d01: (1, -1, 320) -> Using Max Profile: (1, 16384, 320)
[EngineModel] Detected dynamic shape for d10: (1, -1, 640) -> Using Max Profile: (1, 4096, 640)
[EngineModel] Detected dynamic shape for d11: (1, -1, 640) -> Using Max Profile: (1, 4096, 640)
[EngineModel] Detected dynamic shape for d20: (1, -1, 1280) -> Using Max Profile: (1, 1024, 1280)
[EngineModel] Detected dynamic shape for d21: (1, -1, 1280) -> Using Max Profile: (1, 1024, 1280)
[EngineModel] Detected dynamic shape for m: (1, -1, 1280) -> Using Max Profile: (1, 256, 1280)
[EngineModel] Detected dynamic shape for u10: (1, -1, 1280) -> Using Max Profile: (1, 1024, 1280)
[EngineModel] Detected dynamic shape for u11: (1, -1, 1280) -> Using Max Profile: (1, 1024, 1280)
[EngineModel] Detected dynamic shape for u12: (1, -1, 1280) -> Using Max Profile: (1, 1024, 1280)
[EngineModel] Detected dynamic shape for u20: (1, -1, 640) -> Using Max Profile: (1, 4096, 640)
[EngineModel] Detected dynamic shape for u21: (1, -1, 640) -> Using Max Profile: (1, 4096, 640)
[EngineModel] Detected dynamic shape for u22: (1, -1, 640) -> Using Max Profile: (1, 4096, 640)
[EngineModel] Detected dynamic shape for u30: (1, -1, 320) -> Using Max Profile: (1, 16384, 320)
[EngineModel] Detected dynamic shape for u31: (1, -1, 320) -> Using Max Profile: (1, 16384, 320)
[EngineModel] Detected dynamic shape for u32: (1, -1, 320) -> Using Max Profile: (1, 16384, 320)
Failed to enable xformers: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers
D:\PersonaLive\inference_online.py:240: DeprecationWarning:
on_event is deprecated, use lifespan event handlers instead.

    Read more about it in the
    [FastAPI docs for Lifespan Events](https://fastapi.tiangolo.com/advanced/events/).

@self.app.on_event("shutdown")
init done
INFO: Started server process [15440]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)

I get this output:

[screenshot]

alternating with this:

[screenshot]

every second or so. Any ideas?

Update: I got it running on the RTX 5060 Ti with `python inference_online.py --acceleration tensorrt`, but it is too slow for any real work.

[screenshot]

Summary of Fixes for RTX 5060 Ti (PersonaLive TensorRT)
Root Cause
PyTorch's ONNX tracer was forcing FP32 during autocast blocks while model weights were FP16, causing dtype mismatches at BatchNorm and Conv layers.
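Not from the thread itself, just a minimal sketch of the failure mode described above (the layer and shapes are arbitrary): an FP16-weight layer fed an FP32 tensor raises the classic FloatTensor/HalfTensor mismatch at call time.

```python
import torch

# FP16 weights, as the exported model had...
conv = torch.nn.Conv2d(3, 8, kernel_size=3).half()
x = torch.randn(1, 3, 16, 16)  # ...but the tracer kept this input FP32

try:
    conv(x)
except RuntimeError as e:
    print(f"dtype mismatch: {e}")
```

Forcing the model and every floating-point input to the same dtype before tracing, as the fixes below do, removes this mismatch.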

Files Modified (5 files)

| File | Change |
| --- | --- |
| torch2trt.py | Force model to FP16 + eval mode; set `auto_cast=False` in the export call |
| onnx_export.py | Autocast manager set to `torch.float16`; cast only floating-point tensors |
| framed_models.py | Removed `torch.amp.autocast` block (no longer needed) |
| FAN_feature_extractor.py | Removed `x = x.float()` from `FAN_SA.forward` |
| webcam/util.py | Fixed `array_to_image` to handle FP16 value ranges ([-1, 1] → [0, 255]) |
Key Code Changes

1. torch2trt.py

```python
model = model.to(device=device, dtype=torch.float16)
model.eval()
export_onnx(..., auto_cast=False, dtype=torch.float16)
```
2. onnx_export.py

```python
with torch.autocast("cuda", dtype=torch.float16):  # was missing dtype
```

and

```python
if torch.is_floating_point(tensor):
    inputs[key] = tensor.to(dtype=dtype)
```
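A standalone sketch of that pattern (the function name and input keys here are mine, not PersonaLive's): cast only floating-point tensors so integer inputs such as timestep ids keep their dtype.

```python
import torch

def cast_floating_inputs(inputs, dtype=torch.float16):
    """Cast floating-point tensors to `dtype`; leave integer tensors untouched."""
    return {
        key: tensor.to(dtype=dtype) if torch.is_floating_point(tensor) else tensor
        for key, tensor in inputs.items()
    }

inputs = {"latents": torch.randn(1, 4), "timestep": torch.tensor([10])}
casted = cast_floating_inputs(inputs)
print(casted["latents"].dtype, casted["timestep"].dtype)  # torch.float16 torch.int64
```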
3. framed_models.py

REMOVED this entire block:

```python
with torch.amp.autocast('cuda', dtype=torch.float16):
    new_motion_hidden_states = self.motion_encoder(motion)
```

CHANGED to:

```python
new_motion_hidden_states = self.motion_encoder(motion)
```
4. FAN_feature_extractor.py

REMOVED:

```python
x = x.float()
```
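For context (my own toy tensors, not the FAN code): an unconditional `.float()` silently upcasts, so everything downstream of it sees FP32 even when the rest of the pipeline runs FP16, which is exactly the mismatch removed here.

```python
import torch

x = torch.randn(2, 3, dtype=torch.float16)  # FP16 activations from the pipeline
y = x.float()                               # unconditional upcast to FP32
print(x.dtype, y.dtype)  # torch.float16 torch.float32
```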

5. webcam/util.py

```python
def array_to_image(image_array, normalize=True):
    # Check the [0, 1] range first so it isn't swallowed by the wider [-1, 1] branch
    if image_array.min() >= 0.0 and image_array.max() <= 1.0:
        image_array = image_array * 255.0
    elif image_array.min() >= -1.0 and image_array.max() <= 1.0:
        image_array = (image_array + 1.0) / 2.0 * 255.0
    image_array = np.clip(image_array, 0, 255).astype(np.uint8)
    return Image.fromarray(image_array)
```
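To illustrate the [-1, 1] → [0, 255] mapping with throwaway values (not actual PersonaLive output):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0], dtype=np.float16)  # FP16 model output in [-1, 1]
y = np.clip((x.astype(np.float32) + 1.0) / 2.0 * 255.0, 0, 255).astype(np.uint8)
print(y)  # [  0 127 255]
```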
Environment Requirement
CUDA Toolkit 12.8.2 (provides nvcc.exe)

Result

| Before | After |
| --- | --- |
| Build failed: FloatTensor vs HalfTensor | ✅ Build succeeds |
| TensorRT engine would not build | ✅ Engine builds in ~6 minutes |
| Output pixelated or black | ✅ Normal output |
Commands to Run

```powershell
conda activate personalive
python torch2trt.py
python inference_online.py --acceleration tensorrt
```

I hope this helps someone.
