Ligong Han

Ligong Han | 韩立功

I am a Research Scientist at the MIT-IBM Watson AI Lab and Red Hat AI, working on Generative AI with a focus on controllable and precise generation, particularly in diffusion models and LLMs. I obtained my PhD in Computer Science from Rutgers University in 2024, advised by Prof. Dimitris Metaxas. During my PhD, I've spent time at Google Research, MIT-IBM Watson AI Lab, Snap Research, NEC Labs America, Tencent, and the Robotics Institute working as a research intern.

Previously, I earned my master's degree from Carnegie Mellon University and my bachelor's from Chien-Shiung Wu College, Southeast University.

Email: lastnamefirstname [at] gmail [dot] com or firstname.lastname [at] rutgers [dot] edu

Email / CV / Google Scholar / Github / LinkedIn / Twitter

News

02-2025	Two papers accepted to CVPR-2025!
01-2025	Three papers accepted to ICLR-2025!
10-2024	One paper accepted to WACV-2025!
10-2024	One paper accepted to NeurIPS-2024!
02-2024	One paper accepted to CVPR-2024!
10-2023	Two papers accepted to WACV-2024!
09-2023	One paper accepted to NeurIPS-2023!
07-2023	One paper accepted to ICCV-2023!
06-2023	One paper accepted to MICCAI-2023!
06-2023	One paper accepted to TMLR!
03-2023	Our paper Constructive Assimilation is accepted at GCV-2023.
02-2023	Two papers accepted to CVPR-2023!

Research

Selected publications are highlighted. (* equal contribution, † corresponding author)

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin, Mingyu Zhao, Shiyu Zhao, Zhenting Wang, Xiaoxiao He, Ligong Han, Tong Che, Dimitris Metaxas.
Accepted to International Conference on Learning Representations (ICLR), 2025
[arXiv] [Github] [bibtex]

TLDR: A low-rank formulation for visual prompting that enables shared and patch-specific information across image patches, improving performance and training efficiency.

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang, Krishnateja Killamsetty, Shivchander Sudalairaj, Wenlong Zhao, Seungwook Han, Abhishek Bhandwaldar, Guangxuan Xu, Kai Xu, Ligong Han, Luke Inglis, Akash Srivastava.
Accepted to International Conference on Learning Representations (ICLR), 2025
[arXiv] [InstructLab] [bibtex]

TLDR: A study on fine-tuning small LLMs (3B–7B) that challenges common training practices, optimizes efficiency, and provides practical guidance to make LLM research more accessible.

🎲 DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
Xiaoxiao He, Ligong Han^†, Quan Dao, Song Wen, Minhao Bai, Di Liu, Han Zhang, Martin Renqiang Min, Juefei Xu, Chaowei Tan, Bo Liu, Kang Li, Hongdong Li, Junzhou Huang, Faez Ahmed, Akash Srivastava, Dimitris Metaxas.
Accepted to Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[arXiv] [Project Page] [bibtex]

TLDR: Discrete diffusion inversion for precise and flexible content editing by recording noise sequences and masking patterns during the reverse process.

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang*, Haizhou Shi*, Ligong Han, Dimitris Metaxas, Hao Wang.
Accepted at Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv] [bibtex]

TLDR: We derive a novel Bayes by Backprop (BBB) framework for Low Rank Adaptation (LoRA) of Large Language Models (LLMs).

🥤 Spectrum-Aware Parameter Efficient Fine-Tuning
Xinxi Zhang*, Song Wen*, Ligong Han*^†, Juefei Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris Metaxas.
Accepted at Winter Conference on Applications of Computer Vision (WACV), 2025
[arXiv] [Github] [bibtex]

TLDR: A framework for parameter-efficient fine-tuning by adjusting both singular values and their basis vectors, balancing computational efficiency and representation capacity.

Implicit In-context Learning
Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris Metaxas.
Accepted to International Conference on Learning Representations (ICLR), 2025
[arXiv] [Project Page] [Github] [bibtex]

TLDR: Improves upon traditional In-context Learning by extracting a context vector from demonstration examples and injecting it into the model's activation space, achieving few-shot performance with zero-shot cost.

Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas.
Accepted to Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv] [Project Page] [Github] [bibtex]

TLDR: Solving inverse problems for 3D human pose and shape reconstruction with score guidance in the latent space of a diffusion model.

ProxEdit: Improving Tuning-Free Real Image Editing with Proximal Guidance
Ligong Han^†, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Yuxiao Chen, Di Liu, Qilong Zhangli, Anastasis Stathopoulos, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas.
Accepted at Winter Conference on Applications of Computer Vision (WACV), 2024
[arXiv] [poster] [Github] [bibtex]

TLDR: We introduced proximal guidance to enhance diffusion-based tuning-free real image editing in two frameworks, Negative Prompt Inversion and Mutual Self-Attention Control. Our algorithms, ProxNPI and ProxMasaCtrl, overcome limitations and achieve high-quality editing with computational efficiency.

On the Stability-Plasticity Dilemma in Continual Meta-Learning: Theory and Algorithm
Qi Chen, Changjian Shui, Ligong Han, Mario Marchand.
Accepted at Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
[arXiv] [poster] [Github] [bibtex]

TLDR: This paper presents a theoretical framework and a novel algorithm for Continual Meta-Learning (CML) that effectively balances stability to prevent forgetting previous tasks and plasticity for learning from new tasks.

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
Ligong Han^†, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang.
Accepted at International Conference on Computer Vision (ICCV), 2023
[arXiv] [Unofficial Code] [PEFT-SVD] [Project Page] [poster] [bibtex]

TLDR: This paper presents an approach for customizing (T2I) diffusion models, fine-tuning singular values of weight matrices to reduce overfitting and model storage, while introducing a text-based single-image editing framework and a data-augmentation technique for multi-subject generation.

DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction
Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris Metaxas.
Accepted at International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2023
[arXiv] [Github] [bibtex]

TLDR: We propose a morphology-guided diffusion model for 3D cardiac volume reconstruction, by interpolation in its latent space. The model outperforms strong baselines including GAN- and DiffAE-based methods.

Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies
Ligong Han^†, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava.
Accepted to Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023
[arXiv] [poster] [Github] [bibtex]

TLDR: This study proposes a method to assimilate generated views with expert transformations in contrastive learning, improving the state-of-the-art by up to 3.6% on three datasets and providing a comprehensive analysis of various view generation and assimilation methods.

∿ SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren.
Accepted to Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv] [Github] [Project Page] [bibtex]

TLDR: This work proposes a model-based guidance technique for single-image editing using pre-trained diffusion models, addressing overfitting issues and enabling content creation with only one given image, while also introducing a patch-based fine-tuning method for generating images of arbitrary resolution.

Learning Articulated Shape with Keypoint Pseudo-labels from Web Images
Anastasis Stathopoulos, Georgios Pavlakos, Ligong Han, Dimitris Metaxas.
To appear in Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv] [code & data] [Project Page] [bibtex]

TLDR: The paper introduces a method for monocular 3D reconstruction of articulated objects with minimal labeled data, using category-specific keypoint estimators and data selection to improve performance.

StyleGAN-Fusion: Diffusion Guided Domain Adaptation of Style-based Generators
Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, Ahmed Elgammal.
Accepted at Winter Conference on Applications of Computer Vision (WACV), 2024
[arXiv] [Github] [Project Page] [bibtex]

TLDR: This paper demonstrates the use of score distillation sampling as a critic to adapt GAN generators to new domains using text prompts, leveraging large-scale text-to-image diffusion models and achieving high quality and controllability in domain adaptation for both 2D and 3D image generation.

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Ligong Han^†, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov.
Accepted at Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv] [poster] [Github] [Project Page] [bibtex]

TLDR: The paper presents a multimodal video generation framework using a bidirectional transformer and improved techniques to generate high-quality, diverse video sequences, achieving state-of-the-art results on four datasets.

AE-StyleGAN: Improved Training of Style-Based Auto-Encoders
Ligong Han*^†, Sri Harsha Musunuri*, Martin Renqiang Min, Ruijiang Gao, Yu Tian, Dimitris Metaxas.
Accepted at Winter Conference on Applications of Computer Vision (WACV), 2022
[arXiv] [poster] [Github] [bibtex]

TLDR: Training a style-based autoencoder end-to-end, resulting in a more disentangled latent space and improved image inversion and generation quality.

Enhancing Counterfactual Classification via Self-Training
Ruijiang Gao, Max Biggs, Wei Sun, Ligong Han.
Accepted to AAAI Conference on Artificial Intelligencen (AAAI), 2022
[arXiv] [Github] [bibtex]

TLDR: The paper proposes a Counterfactual Self-Training (CST) algorithm that uses pseudolabeling to address the challenge of partial feedback in settings like pricing, online marketing, and precision medicine, treating it as a domain adaptation problem, and demonstrates its effectiveness on both synthetic and real datasets.

Hierarchically Self-supervised Transformer for Human Skeleton Representation Learning
Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris Metaxas.
Accepted to European Conference on Computer Vision (ECCV), 2022
[arXiv] [Github] [bibtex]

TLDR: The paper introduces a self-supervised hierarchical pre-training scheme with Hi-TRS, capturing multi-level dependencies and achieving state-of-the-art performance in skeleton-based tasks.

Disentangled Recurrent Wasserstein Autoencoder
Jun Han*, Martin Renqiang Min*, Ligong Han*, Li Erran Li, Xuan Zhang.
Accepted to International Conference on Learning Representations (ICLR Spotlight, scored among top 4%), 2021
[arXiv] [Code] [bibtex]

TLDR: R-WAE is a framework for unsupervised disentangled sequential representation learning, outperforming baselines in disentanglement and video generation by optimizing Wasserstein distance and mutual information.

Dual Projection Generative Adversarial Networks for Conditional Image Generation
Ligong Han^†, Martin Renqiang Min, Anastasis Stathopoulos, Yu Tian, Ruijiang Gao, Asim Kadav, Dimitris Metaxas.
Accepted to International Conference on Computer Vision (ICCV), 2021
[arXiv] [poster] [Github] [bibtex]

TLDR: Dual Projection GAN (P2GAN) balances data matching and label matching in cGANs, improving class separability and sample quality.

Human-AI Collaboration with Bandit Feedback
Ruijiang Gao, Maytal Saar-Tsechansky, Maria De-Arteaga, Ligong Han, Min Kyung Lee, Matthew Lease.
Accepted to International Joint Conference on Artificial Intelligence (IJCAI), 2021
[arXiv] [Github] [bibtex]

TLDR: The paper introduces a novel human-machine collaboration approach in a bandit feedback setting that improves decision-making performance by exploiting human-machine complementarity and personalizing routing for multiple human decision-makers.

Robust Conditional GAN from Uncertainty-Aware Pairwise Comparisons
Ligong Han^†, Ruijiang Gao, Mun Kim, Xin Tao, Bo Liu, Dimitris Metaxas.
Accepted to AAAI Conference on Artificial Intelligencen (AAAI), 2020
[arXiv] [poster] [Github] [bibtex]

TLDR: PC-GAN is a novel generative adversarial network using weak supervision (via the proposed Elo rating network) from pairwise comparisons for image attribute editing, achieving robust performance and comparable results to fully-supervised methods.

Unbiased Auxiliary Classifier GANs with MINE
Ligong Han^†, Anastasis Stathopoulos, Tao Xue, Dimitris Metaxas.
Accepted to Conference on Computer Vision and Pattern Recognition Workshops (CVPRW DeepMind Travel Award), 2020
[arXiv] [Github] [bibtex]

TLDR: We propose Unbiased Auxiliary GANs (UAC-GAN) that leverage the Mutual Information Neural Estimator (MINE) and a novel projection-based statistics network architecture to address the biased distribution issue in AC-GANs, resulting in improved performance on three datasets.

Unsupervised Domain Adaptation via Calibrating Uncertainties
Ligong Han^†, Yang Zou, Ruijiang Gao, Lezi Wang, Dimitris Metaxas.
Accepted to Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019
[arXiv] [Github] [bibtex]

TLDR: We propose a Renyi entropy regularization (RER) framework for unsupervised domain adaptation, which adapts from source to target domain by calibrating predictive uncertainties using variational Bayes learning, and demonstrate its effectiveness on three domain-adaptation tasks.

Learning Generative Models of Tissue Organization with Supervised GANs
Ligong Han^†, Robert F. Murphy, Deva Ramanan.
Accepted to Winter Conference on Applications of Computer Vision (WACV), 2018
[arXiv] [Github] [bibtex]

TLDR: We propose a two-stage and an end-to-end supervised GAN approaches for generating realistic electron microscope images with densely annotated sub-cellular structures.

Misc

[2015] MATLAB Code for Axis Label Alignment in 3D Plots
[File Exchange] [Github]

Align axis labels nicely in parallel with axes in MATLAB (3-D) plots. This file was selected as MATLAB Central Pick of the Week.

Website source from Jon Barron.