Blog

How test-time training allows models to ‘learn’ long documents instead of just caching them

Ben Dickson

By treating language modeling as a continual learning problem, the TTT-E2E architecture achieves the accuracy of full-attention Transformers on 128k context tasks while matching the speed of linear models.

VL-JEPA is a lean, fast vision-language model that rivals the giants

Ben Dickson

Meta’s VL-JEPA outperforms massive vision-language models on world modeling tasks by learning to predict "thought vectors" instead of text tokens.

URM shows how small, recurrent models can outperform big LLMs in reasoning tasks

Ben Dickson

The key to solving complex reasoning isn't stacking more transformer layers, but refining the "thought process" through efficient recurrent loops.

The hidden architecture behind AI systems that don’t break under growth

Contributor

Most systems break at 100x growth. Real scalability depends on architecture, data quality, and organizational design, not just writing better code.

A few interesting observations on Gemini 3 Flash

Ben Dickson

Google revealed little about its Gemini 3 Flash model, so we had to speculate about what is going on under the hood.

How Nvidia changed the open source AI game with Nemotron 3

Ben Dickson

As the industry shifts from chatbots to multi-agent workflows, Nvidia's Nemotron 3 offers a blueprint for efficient, long-context reasoning.

Why AI benchmarks are broken

Ben Dickson

AI labs are racing to overtake each other on key industry benchmarks. But this intense race has stripped the benchmarks of most of their value.

Salesforce tackles the ‘brittleness’ of web agents with new WALT framework

Ben Dickson

WALT abstracts away the chaos of dynamic layouts, allowing AI to focus on high-level planning instead of low-level clicks.

Beyond raw intelligence: How Poetiq cracked the ARC-AGI-2 benchmark

Ben Dickson

The verified solution achieves 54% accuracy on the semi-private test set, outperforming Gemini 3 Deep Think at less than half the cost.

OpenAI’s code red: The curse of being at the forefront of AI

Ben Dickson

OpenAI’s problem is not that it no longer has the best model, but that it is widely perceived to have fallen behind.

Why sandboxing OpenClaw doesn’t stop data exfiltration

Google brings multi-token prediction to Gemma 4 LLMs

How Memory Sparse Attention scales LLM memory to 100 million tokens

Claude Code is leaking API keys into public package registries

Anthropic’s MCP vulnerability: When ‘expected behavior’ becomes a supply chain nightmare

Applied ML: When ‘perfect’ becomes the enemy of ‘good’

AI can’t replace software engineers yet, but here is how to…

How to turbocharge your product and market research with DeepSearch

How looking differently at data can save your machine learning project

Building a solid data foundation for generative AI applications

The evolution of LLM tool-use from API calls to agentic applications

What makes DeepSeek-V3.2 so efficient?

What to know about Claude Opus 4.5

OpenAI’s GPT-5: A reality check for the AI hype train

OpenAI’s grand return to open source: unpacking the gpt-oss release

AI is writing your code, but who’s reviewing it?

Machine learning in space: Building intelligent systems for the harshest environments

Decoding the brain, inspiring AI: How Rahul Biswas is bridging neuroscience…

The cash flow conundrum: How technology is reshaping small business finance

What to know about the security of open-source machine learning models
