Energy-Based Transformers are Scalable Learners and Thinkers by Filipa Lino

Filipa Lino
PhD Student in Electrical and Computer Engineering
SIPg/ISR

This presentation summarizes the paper “Energy-Based Transformers are Scalable Learners and Thinkers”, which combines energy-based models with standard Transformers to let neural networks iteratively refine and self-verify their predictions, trading extra computation for better out-of-distribution generalization.

Presentation

Key Takeaways

  • Energy-Based Transformers learn an energy landscape over predictions and refine them through gradient-based “thinking” steps (sketched in code after this list).
  • They approximate “System 2” reasoning by allocating more computation to harder predictions and less to easier ones.
  • Early experiments in language and video modeling suggest that this may be a promising direction for scalable, generalizable reasoning.
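To make the first takeaway concrete, here is a minimal sketch of the “thinking as optimization” idea, not the authors’ implementation: a toy energy model scores (context, prediction) pairs, and refinement is plain gradient descent on that score with respect to the prediction. All names here (`ToyEnergyModel`, `think`, `steps`, `lr`) are hypothetical illustrations.

```python
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    """Hypothetical stand-in for an Energy-Based Transformer: maps a context
    and a candidate prediction to a scalar energy (lower = more compatible)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 1))

    def forward(self, context: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def think(model: nn.Module, context: torch.Tensor, steps: int = 8, lr: float = 0.1):
    """Gradient-based 'thinking': refine an initial guess by descending the
    learned energy landscape with respect to the prediction."""
    y = torch.zeros_like(context, requires_grad=True)  # initial guess
    for _ in range(steps):
        energy = model(context, y).sum()
        (grad,) = torch.autograd.grad(energy, y)
        y = (y - lr * grad).detach().requires_grad_(True)  # one refinement step
    with torch.no_grad():
        return y.detach(), model(context, y)  # refined prediction + its energy

model = ToyEnergyModel(dim=16)
context = torch.randn(4, 16)
prediction, final_energy = think(model, context)
print(final_energy)  # per-example energies after refinement (self-verification signal)
```

In this toy setup, the final energy doubles as a self-verification signal, and one could stop the loop early once the energy or its gradient norm falls below a threshold, which is one way to spend less compute on easy predictions and more on hard ones, in the spirit of the second takeaway.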

Reference

Gladstone, A., Nanduru, G., Islam, M. M., Han, P., Ha, H., Chadha, A., Du, Y., Ji, H., Li, J., and Iqbal, T. (2025). Energy-Based Transformers are Scalable Learners and Thinkers. arXiv:2507.02092. https://arxiv.org/pdf/2507.02092
