[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Dataset collection and preprocessing framework for NLP extreme multitask learning
[CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives
A comprehensive collection of work on learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across the training, inference, and post-inference stages.
This training offers an intensive exploration of frontier reinforcement learning techniques for large language models (LLMs). It covers advanced topics such as Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), and reasoning LLMs, and demonstrates practical applications such as fine-tuning.
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
An easy Python package for running quick, basic QA evaluations. The package includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. It also supports prompting the OpenAI and Anthropic APIs.
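As a point of reference for the metrics this package names, here is a minimal sketch of SQuAD-style exact match and token-level F1; the function names are illustrative, not the package's actual API:

```python
# Illustrative sketch of standard QA metrics (exact match, token-level F1);
# not taken from the package above -- its real API may differ.
from collections import Counter

def exact_match(prediction: str, reference: str) -> bool:
    """Strict string equality after lowercasing and trimming whitespace."""
    return prediction.strip().lower() == reference.strip().lower()

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # min counts per token
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat", "a cat sat"))  # 0.666...
```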
Official Repo for Paper: "Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios"
Learning to route instances for Human vs AI Feedback (ACL Main '25)
Revealing and unlocking the context boundary of reward models
[ACL 2024 Findings] DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
ToolRM: Towards Agentic Tool-Use Reward Modeling
Code for SFT and RL
Implementation for our COLM paper "Off-Policy Corrected Reward Modeling for RLHF"
The code used in the paper "DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging"
Source code of our paper "Transferring Textual Preferences to Vision-Language Understanding through Model Merging", ACL 2025
A Group Relative Reward Model (GRRM) framework that improves machine translation quality and reasoning by ranking candidates within a group through comparative analysis, rather than evaluating each candidate with an isolated metric.
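To illustrate the group-relative idea in the GRRM description above (this is a sketch in that spirit, not the authors' implementation), a candidate's reward can be standardized against the other candidates for the same source sentence instead of being scored in isolation:

```python
# Illustrative group-relative reward signal: standardize each candidate's
# raw score against its group. Not the GRRM authors' code.
import statistics

def group_relative_rewards(raw_scores: list[float]) -> list[float]:
    """Z-score each candidate's reward within its group."""
    mean = statistics.mean(raw_scores)
    std = statistics.pstdev(raw_scores) or 1.0  # guard against zero spread
    return [(s - mean) / std for s in raw_scores]

# Four translations of the same source sentence, scored by some base metric:
print(group_relative_rewards([0.71, 0.65, 0.80, 0.62]))
```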
Building an LLM with RLHF involves fine-tuning on human-labeled preferences. Following "Learning to Summarize from Human Feedback", it combines supervised learning, reward modeling, and PPO to improve response quality and alignment.
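Several of the repositories above train reward models on pairwise human preferences. As a reference point, here is a minimal sketch (not taken from any listed repo) of the Bradley-Terry pairwise loss that underlies this setup, assuming the reward model emits one scalar per response:

```python
# Minimal sketch of the pairwise (Bradley-Terry) reward-model loss used in
# RLHF pipelines such as "Learning to Summarize from Human Feedback":
# train the reward model so the human-preferred response scores higher.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Example: scalar rewards for a batch of three preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.1])
r_rejected = torch.tensor([0.4, 0.9, 1.0])
print(pairwise_reward_loss(r_chosen, r_rejected))  # lower when chosen > rejected
```

The fitted reward model then provides the scalar reward signal that PPO maximizes during the RL fine-tuning stage.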
RewardAnything: Generalizable Principle-Following Reward Models