Deliberative Alignment improves LLM robustness to jailbreak attacks, but does it introduce new vulnerabilities? We designed five novel attacks targeting models with Deliberative Alignment: three that bypass reasoning and two that exploit the reasoning itself, achieving attack success rates above 80%.
@article{chen2025bag,title={Bag of Tricks for Subverting Reasoning-based Safety Guardrails},author={Chen, Shuo and Han, Zhen and He, Bailan and Si, Shengyu and Wu, Jingpei and Torr, Philip and Tresp, Volker and Gu, Jindong},journal={Under Review},year={2025},}
COLM
Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs
Yuan He, Bailan He, Zifeng Ding, and 8 more authors
LLMs often produce asymmetric predictions for logically equivalent facts, leading to factual inconsistency and hallucination-like errors. We conducted a large-scale empirical study showing that the entity frequency distribution in pre-training data induces systematic bias in model predictions, identifying a root cause of factual inconsistency.
@article{he2025asymmetry,title={Supposedly Equivalent Facts That Aren't? Entity Frequency in Pre-training Induces Asymmetry in LLMs},author={He, Yuan and He, Bailan and Ding, Zifeng and Lupidi, Alisia and Zhu, Yuqicheng and Chen, Shuo and Zhang, Caiqi and Chen, Jiaoyan and Ma, Yunpu and Tresp, Volker and others},journal={Conference on Language Modeling (COLM)},year={2025},}
WACV
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
Shuo Chen, Zhen Han, Bailan He, and 4 more authors
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
MLLMs’ ability to leverage visual context for few-shot adaptation was not well understood. We conducted a large-scale analysis of modality contributions and co-developed MMICES, a mixed-modality demonstration selection method that improves accuracy on downstream tasks while selecting demonstrations more efficiently.
@article{chen2024multimodal,title={Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?},author={Chen, Shuo and Han, Zhen and He, Bailan and Buckley, Mark and Torr, Philip and Tresp, Volker and Gu, Jindong},journal={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},year={2025},}
2024
ICLR
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Shuo Chen, Zhen Han, Bailan He, and 5 more authors
International Conference on Learning Representations (ICLR), 2024
MLLMs like GPT-4V lacked systematic evaluation against multimodal jailbreak attacks, especially those combining text and images. We developed a red-teaming benchmark with 1,445 harmful prompts across 11 safety categories covering uni- and multimodal attacks, benchmarking 11 proprietary and open-source models.
@article{chen2024red,title={Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?},author={Chen, Shuo and Han, Zhen and He, Bailan and Ding, Zifeng and Yu, Wenqian and Torr, Philip and Tresp, Volker and Gu, Jindong},journal={International Conference on Learning Representations (ICLR)},year={2024},}
2023
ISWC
ForecastTKGQuestions: A benchmark for temporal question answering and forecasting over temporal knowledge graphs
Zifeng Ding, Zongyue Li, Ruoxia Qi, and 8 more authors
Existing TKGQA benchmarks assume access to the full temporal KG, including future facts, and thus cannot evaluate forecasting capabilities. We proposed the new task of forecasting TKGQA and created a large-scale benchmark with entity prediction, yes/unknown, and fact reasoning questions.
@article{ding2023forecast,title={ForecastTKGQuestions: A benchmark for temporal question answering and forecasting over temporal knowledge graphs},author={Ding, Zifeng and Li, Zongyue and Qi, Ruoxia and Wu, Jingpei and He, Bailan and Ma, Yunpu and Meng, Zhao and Chen, Shuo and Liao, Ruotong and Han, Zhen and others},journal={International Semantic Web Conference},pages={541--560},year={2023},}
IJCNN
Learning meta-representations of one-shot relations for temporal knowledge graph link prediction
Zifeng Ding, Bailan He, Jingpei Wu, and 3 more authors
Dynamic knowledge graphs require robust reasoning under temporal evolution, few-shot relations, and forward-looking tasks. We co-developed a lightweight temporal graph encoder, proposed concept-aware few-shot inductive methods, and introduced meta-representations for one-shot relations.
@inproceedings{ding2023meta,title={Learning meta-representations of one-shot relations for temporal knowledge graph link prediction},author={Ding, Zifeng and He, Bailan and Wu, Jingpei and Ma, Yunpu and Han, Zhen and Tresp, Volker},booktitle={2023 International Joint Conference on Neural Networks (IJCNN)},pages={1--10},year={2023},organization={IEEE},}