Training large language models usually requires a cluster of GPUs. FlashOptim changes the math, enabling full-parameter training on fewer accelerators.
As AI agents take on longer tasks, the KV cache of LLMs has become a massive bottleneck. Discover how sparse attention techniques are freeing up GPU memory.
Semantic Chaining exploits the fragmented safety architecture of multimodal models, bypassing filters by hiding prohibited intent within a sequence of benign edits.
RePo, Sakana AI’s new technique, solves the "needle in a haystack" problem by allowing LLMs to organize their own memory.
Stop reacting to compliance violations and start preventing them. See how AI empowers organizations to turn regulatory discipline into an engine for innovation and growth.
Brute-forcing larger context windows is hitting a mathematical wall. Here is how MIT’s new framework solves "context rot" to process 10 million tokens and beyond.
Microsoft’s Rho-Alpha upgrades Vision-Language-Action models with tactile data to bridge the gap between semantic reasoning and low-level motor control.
Vulnerability in Perplexity’s BrowseSafe shows why single models can’t stop prompt injection
Ben Dickson
Lasso Security compromised Perplexity’s BrowseSafe guardrail model for AI browsers, proving that "out-of-the-box" tools fail to stop prompt injection attacks.
How test-time training allows models to ‘learn’ long documents instead of just caching them
Ben Dickson
By treating language modeling as a continual learning problem, the TTT-E2E architecture achieves the accuracy of full-attention Transformers on 128k context tasks while matching the speed of linear models.
Meta’s VL-JEPA outperforms massive vision-language models on world modeling tasks by learning to predict "thought vectors" instead of text tokens.