I’ve been incredibly busy lately, so I was only able to start writing my year-end summary after January 1st. Regardless, the fact that I’ve started writing is a good thing in itself.
Regarding the Layoff
When I was asked to join the Llama 4 firefighting in late January 2025, as someone who has always worked in Reinforcement Learning (RL), I actually drew a 2x2 reward matrix beforehand for the following four outcomes (although at the time, given the immense pressure, refusing was hardly a real option):
| | Agree to Help | Refuse to Help |
|---|---|---|
| Llama 4 Succeeds | Become Hero | Be Marginalized |
| Llama 4 Fails | Tried our best | Blamed for not stepping up |
At that time, my thinking was: if we go to help, then even if the project ultimately fails, at least we did our best and can have a clear conscience. Unfortunately, what actually happened was a fifth outcome that fell outside my plan. That taught me a lesson and gave me a better understanding of the complexity of life.
Despite this, during those few months of hard work, we got our hands dirty with the core issues of RL in LLMs: RL stability, training-inference interaction, architecture design, the interplay of pre-training and mid-training, long context, various ways of generating data, and RL infrastructure design. This first-hand experience is precious and changed my mindset profoundly.
Having been at Meta for over ten years, leaving at some point was inevitable; it doesn’t make sense for me to stay until retirement. But inertia took over due to various financial and family reasons. Over the last two years, I started to harbor the little secret wish that “Meta, please fire me,” which ironically made me more open, bold, relaxed, and confident. When I took my first “recharge” (a month-long leave every five years, part of the company benefits) at the end of 2023, I came very close to leaving, but ended up not signing the offer. Now Meta has made that decision for me, which is actually quite good.
This turmoil, along with the ups and downs of the year, has provided a wealth of new material for my upcoming fiction writing. As the saying goes, “Misfortune in an official career is fortune for the poet; verses become skillful when writing of vicissitudes.” A life that is too flat and boring isn’t necessarily a fun one.
At the start of 2021, because of a few sentences of self-reflection about “why none of my papers got accepted,” I received a “Meets Most” rating (a mediocre performance review). While shocked, rather than complaining, I decided to pretend that I had just been promoted. As it turned out, half a year later that promotion came true. And the work from early 2021 that no one cared about? It won the ICML Best Paper Honorable Mention in July 2021 and became a relatively famous paper in representation learning.
For a period after October 22nd, my communication channels were basically exploding. I received countless messages and emails every day, along with invitations for remote meetings or meetups; I simply couldn’t handle it all, and it took a few weeks for things to gradually cool down. I’m very thankful for all the care I have received. If I missed any messages, please forgive me.
While I received many reach-outs and offers from well-known companies, I finally decided to become a co-founder of a new stealth startup. I am still young, so why not take some risk and try something interesting? No details for now; work quietly first.
My Research Directions in 2025
My main research directions in 2025 were Large Model Reasoning and Opening the Black Box.
Since our work on Continuous Latent Space Reasoning (Coconut, COLM’25) was released at the end of 2024, it has set off a wave of interest in this direction throughout 2025. People explored how to use the idea in Reinforcement Learning and pre-training, how to improve its training and computational efficiency, and so on. Although our group was pulled away to work on Llama shortly afterwards and could not keep digging deep into it, I found this very encouraging. We still published a theoretical analysis (Reasoning by Superposition, NeurIPS ‘25) in the first half of the year, showing exactly how continuous latent space reasoning works, which drew quite a bit of attention.
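To make the core idea concrete, here is a minimal sketch of the continuous-thought loop: instead of sampling a discrete token at each reasoning step, the model’s last hidden state is fed back as the next input embedding. This assumes a Hugging Face causal LM whose hidden size equals its embedding size; the wrapper `continuous_thoughts`, the choice of GPT-2, and the number of latent steps are illustrative, not the actual Coconut training pipeline.

```python
# Hypothetical sketch of continuous latent-space ("continuous thought") reasoning,
# in the spirit of Coconut. The loop below is illustrative, not the official code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM whose hidden size equals its embedding size
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def continuous_thoughts(prompt: str, num_latent_steps: int = 4) -> torch.Tensor:
    """Run a few reasoning steps purely in latent space: the last hidden state
    of the previous step is appended as the next input embedding, instead of
    sampling a discrete token."""
    ids = tok(prompt, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(ids)            # [1, T, d]
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]    # [1, 1, d] "continuous thought"
        embeds = torch.cat([embeds, last_hidden], dim=1)  # feed it back as the next input
    return embeds  # prompt embeddings + latent thoughts, ready for text decoding

# Example: produce 4 latent reasoning steps before any text is emitted.
latent_seq = continuous_thoughts("Question: 3 + 5 * 2 = ?", num_latent_steps=4)
print(latent_seq.shape)
```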
Another direction is improving the reasoning efficiency of large models. Our Token Assorted (ICLR’25) first learns discrete tokens in the latent space via a VQ-VAE, then mixes these discrete tokens with text tokens for post-training; this reduces inference cost while improving performance. Our DeepConf decides whether a reasoning path should be terminated early by monitoring the confidence of each generated token, which significantly reduces the number of tokens used for reasoning while performing better in majority-vote settings. ThreadWeaver accelerates inference by creating parallel reasoning Chains of Thought and post-training on them. In addition, we also experimented with training reasoning models with RL on dLLMs (Sandwiched Policy Gradient) and with learning to reason on small models (MobileLLM-R1).
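As a rough illustration of the DeepConf-style idea (my own simplified sketch; the windowed average and the threshold are illustrative choices, not the paper’s exact criterion), a decoding loop can track the probability of each chosen token and abandon a reasoning path once recent confidence drops too low:

```python
# Simplified, hypothetical sketch of confidence-based early termination of a
# reasoning path, in the spirit of DeepConf. Works with any Hugging Face-style
# causal LM `model` and tokenizer `tok`.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_with_confidence_stop(model, tok, prompt, max_new_tokens=256,
                                  window=16, min_conf=0.3):
    ids = tok(prompt, return_tensors="pt").input_ids
    confs = []
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]             # next-token logits
        probs = F.softmax(logits, dim=-1)
        next_id = probs.argmax(dim=-1, keepdim=True)     # greedy for simplicity
        confs.append(probs.gather(-1, next_id).item())   # confidence of chosen token
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tok.eos_token_id:
            break
        # Terminate this path early if recent token confidence is too low.
        if len(confs) >= window and sum(confs[-window:]) / window < min_conf:
            return tok.decode(ids[0]), False             # path abandoned
    return tok.decode(ids[0]), True                      # path kept for majority vote
```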
On interpretability (i.e., opening the black box of neural networks), I have been spending time on Grokking for about two years, as a natural extension of my previous representation learning work. While we know some behaviors of representation learning (e.g., representations can collapse, they may learn PCA-like features, etc.), it remains elusive what kind of representations are actually learned, what the dynamics of the learning process look like, how they relate to the structure of the input data, and what kind of generalization the model can achieve. Analyzing the phenomenon of Grokking, in particular the phase transition from memorization to generalization, seems to me the best way to unlock this mystery.
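For readers unfamiliar with the setup, the classic way to observe grokking is to train a small model on modular arithmetic with a partial training split and strong weight decay, and watch test accuracy jump long after training loss has saturated. The sketch below is my own minimal version; the architecture and hyperparameters are illustrative and may need tuning to reproduce the phase transition.

```python
# Minimal grokking-style experiment on modular addition (a + b) mod p.
# Illustrative recipe: small model, 50% training split, strong weight decay.
import torch
import torch.nn as nn

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b) pairs
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = len(pairs) // 2                                       # half the data for training
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(
    nn.Embedding(p, 128),     # shared embedding for both operands
    nn.Flatten(),             # concatenate the two operand embeddings
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            test_acc = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
        # Typically the train loss collapses early while test accuracy stays near
        # chance for a long time, then jumps: the memorization-to-generalization
        # phase transition discussed above.
        print(step, loss.item(), test_acc.item())
```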
It is indeed a very difficult problem. I started with COGS (NeurIPS ‘25), which can only deal with special cases (a perfect training set, no training dynamics), and I was not satisfied. Fortunately, with extensive interaction with GPT-5, my recent work Provable Scaling Laws seems to have made strong progress. It characterizes the training dynamics of feature emergence quite well and provides a clear picture of how the transition between memorization and generalization happens, as well as the amount of data needed for that transition (i.e., provable scaling laws). While it still relies on a special data distribution (group structure), it goes beyond previous approaches such as NTK. For a detailed explanation, please check this X post.
I really like our paper The Path Not Taken, released around the year-end. It provides a preliminary answer, at the weight level, to why the behaviors of RL and SFT (Supervised Fine-Tuning) are so different.
- SFT causes overfitting and catastrophic forgetting. On the surface, we blame the training data for not being “on-policy” enough. A deeper reason, however, is that the principal components of the weights are directly and heavily modified by external data, destabilizing the “foundation” of the model and leading to a significant performance drop.
- RL, because it trains on on-policy data, leaves the principal components of the weights unchanged and only alters the minor ones. This avoids catastrophic forgetting, and the distribution of weight deltas tends to be sparser (especially under bf16 quantization). A rough weight-level diagnostic for this picture is sketched below.
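To make the picture concrete, here is a hypothetical diagnostic (my own sketch, not the paper’s methodology): project a checkpoint’s weight delta onto the top singular directions of the base weights and measure how much of it lands there. Under the claims above, SFT deltas should put far more mass in the principal subspace than RL deltas.

```python
# Hypothetical diagnostic: what fraction of a fine-tuning weight delta lies in the
# principal (top-k singular) directions of the base weight matrix?
import torch

def principal_mass_ratio(w_base: torch.Tensor, w_tuned: torch.Tensor, k: int = 32) -> float:
    """Fraction of the delta's squared Frobenius norm inside the subspace spanned
    by the top-k left/right singular vectors of w_base."""
    delta = w_tuned - w_base
    U, S, Vh = torch.linalg.svd(w_base, full_matrices=False)
    U_k, V_k = U[:, :k], Vh[:k, :].T
    delta_principal = U_k @ (U_k.T @ delta @ V_k) @ V_k.T   # project onto top-k subspace
    return (delta_principal.norm() ** 2 / delta.norm() ** 2).item()

# Example with random stand-ins for a single layer's base and tuned weights.
w_base = torch.randn(1024, 1024)
ratio = principal_mass_ratio(w_base, w_base + 0.1 * torch.randn_like(w_base))
print(f"fraction of delta in principal subspace: {ratio:.3f}")
```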
Regarding Interpretability
While “how AI works so well” may not be the favorite topic of many AI researchers, my belief is that it will be crucial in the end. To see this, consider two future scenarios:
- Scenario One: If we reach AGI or even ASI purely through Scaling, the value of all human labor drops to zero, and AI solves every problem as a giant black box, then ensuring that this superintelligence remains benevolent, does not deceive, and does not do evil in hidden ways becomes the most urgent task. To solve this, we must have interpretability.
- Scenario Two: If the path of Scaling ultimately fails and we lose the race against exponentially growing resource demands, humans will keep looking for other solutions. We will have to ask why the model performs well and what makes it fail. Once we want to understand, we return to interpretability research.
In either case, we need interpretability. Even if AI ends up being an omniscient, omnipotent, and benevolent god, humans, driven by curiosity and the desire to find inner value, will inevitably study why AI performs so well. After all, a “black box” implies the birth of a chain of suspicion; when AI reaches or exceeds the average human level at an exponential rate, the “Dark Forest” rules from The Three-Body Problem might appear in a different form. But unlike the gloomy future the novel depicts, we have a way to address it: opening the black box.
I think the most challenging and most difficult part of interpretability is achieving a first-principles explanation: starting from the intrinsic structure of the data itself, how and why do models converge to these decoupled, sparse, low-rank, modular, and composable emergent features and circuits? How are these emergent structures related to the model architecture, the optimization algorithm, and the training hyperparameters? Only then will interpretability truly move from biology-like evidence collection and explanation to physics-like derivation from first principles. Eventually, this will guide practice and pave the way for the model design of the next generation of artificial intelligence.
Comparing this to physics four hundred years ago: we currently have AI Tycho Brahes (collecting evidence) and some AI Keplers (proposing hypotheses), but we do not yet have an AI Newton (discovering principles).
When that day comes, I believe the world will be turned upside down.