Blog

How self-improving harnesses are rewriting the agent engineering playbook

Ben Dickson

With harness engineering becoming a main focus of AI engineering, new frameworks allow AI agents to write their own execution logic and optimize their performance.

How Nvidia’s ASPIRE framework accelerates robot programming with self-improving AI

Ben Dickson

ASPIRE and the new era of self-improving AI frameworks are drastically reducing token costs and deployment friction for real-world robotics applications.

How the AI arms race moved from smart models to full-stack infrastructure

Ben Dickson

A breakdown of how OpenAI, Nvidia, Google, and Amazon are shifting their development strategies to capture value across every layer of the tech stack.

Why LLMs should stop thinking out loud (and what comes after chain-of-thought)

Ben Dickson

Chain-of-Thought prompting is slow, expensive, and largely an illusion. The future of machine reasoning happens in latent space.

Beyond vibe coding: How Codev 3.0 engineers the AI-powered dev team

Ben Dickson

Casual AI prompting breaks down as codebases grow. Codev introduces strict protocols and multi-model reviews to help teams ship maintainable software.

How Cursor’s Composer 2.5 uses self-distillation to beat the frontier LLMs at coding

Ben Dickson

A deep look at the self-distillation techniques that make Composer 2.5 such a great coding model (and the hidden tradeoffs they introduce to AI reasoning).

Vertical integration as AI infrastructure: What 21D’s full arch implant system teaches us about building autonomous clinical AI

Contributor

A technical breakdown of how 21D built an end-to-end autonomous AI pipeline for one of medicine's most complex procedures, and the architectural decisions that made it work.

Why sandboxing OpenClaw doesn’t stop data exfiltration

Ben Dickson

Research into Nvidia’s NemoClaw reveals that sandboxes don't stop AI agents like OpenClaw from leaking data. We need to rethink security from first principles.

Google brings multi-token prediction to Gemma 4 LLMs

Ben Dickson

How Gemma 4’s multi-token prediction and community-driven DFlash are speeding up local LLM throughput by 3-6x.

How Memory Sparse Attention scales LLM memory to 100 million tokens

Ben Dickson

Memory Sparse Attention (MSA) scales LLM context windows to an unprecedented 100 million tokens while preserving accuracy.

Applied ML: When ‘perfect’ becomes the enemy of ‘good’

AI can’t replace software engineers yet, but here is how to…

How to turbocharge your product and market research with DeepSearch

How looking differently at data can save your machine learning project

Building a solid data foundation for generative AI applications

Demystifying loop engineering: Get more from AI agents, avoid loopmaxxing

Why the future of agentic AI is all about the harness

The evolution of LLM tool-use from API calls to agentic applications

What makes DeepSeek-V3.2 so efficient?

What to know about Claude Opus 4.5

AI is writing your code, but who’s reviewing it?

Machine learning in space: Building intelligent systems for the harshest environments

Decoding the brain, inspiring AI: How Rahul Biswas is bridging neuroscience…

The cash flow conundrum: How technology is reshaping small business finance

What to know about the security of open-source machine learning models

Moving beyond passive RAG: How to implement active memory reconstruction for AI agents
