Some Thoughts on Inception Labs' Diffusion LLM Mercury API
March 27, 2025 · 5 min read
Overview
Inception Labs is a frontier lab working on diffusion-based large language models. Their model architecture is fascinating and could pave the way for future breakthroughs in AI.
After receiving an invite to their API beta last night, I finally found a few minutes today to sit down and play with their mercury-coder-small model. A quick overview of the API (a minimal call sketch follows the list):
- The context window is currently limited to 16K tokens, with output capped at 8K tokens.
- Does not yet support tool-calling.
- API latency is currently quite high.
- Beta API is free to use.
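Here's roughly what a call looks like. This is a minimal sketch assuming the beta exposes an OpenAI-compatible chat-completions endpoint; the base URL and environment variable name are my assumptions, not documented values.

```python
# Minimal sketch of a beta API call, assuming an OpenAI-compatible
# chat-completions endpoint. The base URL and env var are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key=os.environ["INCEPTION_API_KEY"],     # hypothetical env var name
)

resp = client.chat.completions.create(
    model="mercury-coder-small",
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    max_tokens=512,  # well under the current 8K output cap
)
print(resp.choices[0].message.content)
```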
So far I've only had time to run some multi-run needle-in-a-haystack (NIAH) tests; I'm hoping to find time for more comprehensive evals over the weekend. Quick thoughts on model behaviour (a sketch of my NIAH harness follows the list):
- Due to the limited context window and restrictive rate limiting, more complex benchmarks are hard to run at the moment.
- Very curious behaviour: its needle-retrieval accuracy at the beginning of the context window is very weak! Across a 10-turn NIAH run, it failed to retrieve the needle every time at 10% depth, but had no problem at 50% or 90% depth. This is surprising, as most other LLMs tend to perform well at the beginning and end of the context and instead struggle in the middle.
- Severe hallucination, which was my main worry with this model: it often outputs a lot of nonsense.
- Very high token throughput, though I'll need some more time to quantify this properly (a rough timing sketch is below).
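For reference, here's a rough sketch of the kind of multi-run NIAH harness I've been using. It's a simplified illustration rather than my exact script: the needle, question, and filler are placeholders, and `complete` is assumed to be a small prompt-to-answer wrapper around the chat call shown earlier.

```python
# Simplified multi-run NIAH harness. `complete` is a hypothetical
# prompt -> answer wrapper around the API call sketched above.
NEEDLE = "The magic number is 48291."
QUESTION = "What is the magic number mentioned in the context?"

def build_haystack(filler: str, depth: float, target_chars: int) -> str:
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    body = (filler * (target_chars // len(filler) + 1))[:target_chars]
    pos = int(len(body) * depth)
    return body[:pos] + "\n" + NEEDLE + "\n" + body[pos:]

def run_niah(complete, filler: str, runs: int = 10) -> dict:
    """Retrieval accuracy per needle depth across `runs` independent attempts."""
    scores = {}
    for depth in (0.1, 0.5, 0.9):  # the 10%, 50%, 90% depths mentioned above
        hits = 0
        for _ in range(runs):
            # ~40K chars of filler keeps the prompt inside the 16K-token window
            context = build_haystack(filler, depth, target_chars=40_000)
            hits += "48291" in complete(f"{context}\n\n{QUESTION}")
        scores[depth] = hits / runs
    return scores
```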
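And for when I get around to quantifying throughput, something like the timing below should give a rough tokens-per-second figure. It assumes the response reports usage in the OpenAI schema; note that wall-clock time includes the API's currently high latency, so this understates the model's raw decode speed.

```python
import time

def measure_throughput(client, prompt: str, model: str = "mercury-coder-small") -> float:
    """Rough throughput: completion tokens per second of wall-clock time."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024,
    )
    elapsed = time.perf_counter() - start
    return resp.usage.completion_tokens / elapsed
```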
Overall, a fascinating model, and I'm lucky to be part of the beta. I'll be playing with this more. Congrats to the Inception Labs team on the launch!