Add support for 4bit JoyCaption #344
Conversation
tl;dr: bitsandbytes is quantizing an nn.NonDynamicallyQuantizableLinear output projection in JoyCaption. We revert it back to an nn.Linear (as nn.NonDynamicallyQuantizableLinear is not a public type). We also set the dtype in a few more places where we previously didn't, or didn't always.
load_in_4bit=True,
bnb_4bit_quant_type='nf4',
bnb_4bit_compute_dtype=self.dtype,
bnb_4bit_quant_storage=self.dtype,
Could you explain why you added this?
This is something I don't fully understand to be honest, but the investigation I've done so far suggests it's not currently making anything worse. If that is incorrect, this part of the PR can be made to only happen for JoyCaption fairly trivially. Let me know if you would like to go that route, and I'll make the change.
My theory is that without specifying our quant storage dtype to match our model dtype, the parts of the model that aren't quantizable (just the attention head? I'm not sure) are still expecting data in the model-native dtype. That's just a rough theory though; I didn't dig into the why once the problem went away so easily.
Without this fix, when running JoyCaption in 4bit, we get a type mismatch error between uint8 and bf16. With this change, the quantized data is stored in the same dtype as the model, which makes the error go away. For the record, the valid storage types are int8, uint8, fp16, bf16, fp32, and fp64, and the default is torch.uint8 as per above.
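To make that concrete, here is roughly what the quantization config ends up looking like. This is a sketch rather than the code as it appears in the repo; `model_dtype` stands in for whatever `self.dtype` resolves to:

```python
import torch
from transformers import BitsAndBytesConfig

model_dtype = torch.bfloat16  # stand-in for self.dtype

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=model_dtype,  # de-quantized matmuls run in bf16
    bnb_4bit_quant_storage=model_dtype,  # packed 4-bit params stored as bf16 instead of the default torch.uint8
)
```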
I looked at the video-memory usage with and without this change and it doesn't seem to increase (on models you can run without the change). The parameter's doc comment ("the storage type to pack the quantized 4-bit params") suggested to me that multiple tensors may be packed into this type, which makes sense for them to support given they're normally storing 4-bit values in an 8-bit data type. My local test runs also showed no difference in how long it takes to caption images on a 4bit model with and without this change.
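In case it's useful, the comparison I ran was roughly of this shape (a sketch only; `load_joycaption` and `caption_one_image` are made-up stand-ins for the project's own loading and captioning code):

```python
import torch

def peak_vram_gib(quantization_config):
    # Measure peak allocated VRAM for one load + one captioning pass.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = load_joycaption(quantization_config)  # hypothetical: the project's loader
    caption_one_image(model)                      # hypothetical: a single captioning pass
    return torch.cuda.max_memory_allocated() / 2**30
```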
# If our out_proj was converted into a nn.Linear4bit, replace
# it with the original nn.Linear. JoyCaption's out-projection
# layer is not dynamically quantizable.
if isinstance(attention.out_proj, bitsandbytes.nn.Linear4bit):
What would be the case where this is not true? Isn't this fixed for the given model?
I think no, it would currently always be true. This check was an afterthought I added while repackaging this for the PR.
Basically, my thought process was: "what if [the model authors] fix whatever it is about the model that BNB is currently unhappy with, so that this hack is no longer needed and instead causes the model to break and/or silently misbehave?" I wanted one more layer of shielding against that possibility, and since this code should only run once per run, it seemed fairly safe to add.
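For context, the swap is conceptually something like the following. This is a sketch of the idea rather than the PR's code; the function name is made up, and recovering the weights via `dequantize_4bit` is one possible approach, not necessarily the one used here:

```python
import torch.nn as nn
import bitsandbytes as bnb

def restore_out_proj(attention, dtype):
    # Only undo the conversion if bitsandbytes actually quantized out_proj;
    # if a future model revision avoids the problem, this is a no-op.
    if not isinstance(attention.out_proj, bnb.nn.Linear4bit):
        return
    quantized = attention.out_proj
    restored = nn.Linear(
        quantized.in_features,
        quantized.out_features,
        bias=quantized.bias is not None,
        device=quantized.weight.device,
        dtype=dtype,
    )
    # One way to get the original weights back: unpack the 4-bit tensor.
    restored.weight.data = bnb.functional.dequantize_4bit(
        quantized.weight.data, quantized.weight.quant_state
    ).to(dtype)
    if quantized.bias is not None:
        restored.bias.data = quantized.bias.data.to(dtype)
    attention.out_proj = restored
```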
Another way to do this might be to load the model normally, but without the weights[1], to avoid double-loading the model, inspect whether out_proj is a torch.nn.NonDynamicallyQuantizableLinear before BNB mutates it, and use that to guide our decision to swap a bitsandbytes.nn.Linear4bit back to a torch.nn.Linear. That solution seemed like a lot more work for something that might never happen.
[1] Is this possible via the torch meta device?
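To partly answer my own footnote: I believe the meta device does allow this. Something along these lines should build the module tree without allocating any weight storage, so the pre-quantization class of each out_proj can be inspected (a sketch; the checkpoint id is a placeholder and AutoModel stands in for whatever class the loader actually uses):

```python
import torch
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("some/joycaption-checkpoint")  # placeholder id
with torch.device("meta"):
    # Parameters are created on the meta device, so no weights are materialized.
    skeleton = AutoModel.from_config(config)

# Report the pre-bitsandbytes class of every out_proj in the model.
for name, module in skeleton.named_modules():
    if name.endswith("out_proj"):
        print(name, type(module).__name__)
```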
works well on a 16 GB card. thank you!