
Filipa Lino
PhD Student in Electrical and Computer Engineering
SIPg/ISR
This presentation summarizes the paper “Energy-Based Transformers are Scalable Learners and Thinkers”, which combines energy-based models with standard Transformers to let neural networks iteratively refine and self-verify their predictions, trading extra computation for better out-of-distribution generalization.
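Concretely, each "thinking" step can be viewed as gradient descent on a learned energy function. The notation below (context x, candidate prediction ŷ, energy E_θ, step size α) is standard energy-based-model shorthand rather than the paper's exact formulation:

$\hat{y}_{t+1} = \hat{y}_t - \alpha \, \nabla_{\hat{y}} E_\theta(x, \hat{y}_t)$

Lower energy corresponds to a prediction the model judges more compatible with the context, so the same update both refines the prediction and implicitly verifies it.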
Key Takeaways
- Energy-Based Transformers learn an energy landscape over predictions and refine them through gradient-based “thinking” steps (see the sketch after this list).
- They approximate “System 2” reasoning by allocating more computation to harder predictions and less to easier ones.
- Early experiments in language and video modeling suggest that this may be a promising direction for scalable, generalizable reasoning.
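A minimal PyTorch sketch of that gradient-based refinement loop is below. `ToyEnergyModel`, the `think` function, and the step count and step size are illustrative assumptions for exposition only; the paper's models are Transformers trained to shape the energy landscape, not this toy MLP.

```python
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    """Scores a (context, prediction) pair with a scalar energy.

    A stand-in MLP, not the Transformer architecture from the paper;
    it only illustrates the interface E(x, y) -> scalar.
    """
    def __init__(self, ctx_dim=16, pred_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + pred_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def think(model, x, y_init, steps=8, step_size=0.1):
    """Refine a prediction by gradient descent on the energy.

    Each step moves y downhill on E(x, y); more steps means more
    "thinking". The step count and step size here are illustrative
    choices, not values from the paper.
    """
    y = y_init.clone().requires_grad_(True)
    for _ in range(steps):
        energy = model(x, y).sum()            # scalar for autograd
        (grad,) = torch.autograd.grad(energy, y)
        y = (y - step_size * grad).detach().requires_grad_(True)
    return y.detach()

if __name__ == "__main__":
    model = ToyEnergyModel()
    x = torch.randn(4, 16)                    # batch of contexts
    y0 = torch.randn(4, 16)                   # initial guesses
    y = think(model, x, y0)
    # The mean energy should drop after refinement: the "verifier"
    # accepts the refined predictions more than the initial guesses.
    print(model(x, y0).mean().item(), model(x, y).mean().item())
```

Because the number of `think` steps is a free inference-time knob, harder inputs can simply be given more iterations, which is the sense in which these models trade extra computation for better predictions.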
Reference
Energy-Based Transformers are Scalable Learners and Thinkers, 2025. arXiv:2507.02092. https://arxiv.org/pdf/2507.02092
