Show HN: Bhumi – OSS Python Library with a Rust Core for 2.5x Faster LLM Inference
bhumi.trilok.ai

Read the full blog post at https://rach.codes/blog/Introducing-Bhumi (click on Reader to see the technical breakdown!).

AI inference should be fast, but in practice it's painfully slow. Inference bottlenecks slow down LLM-powered chatbots and AI workflows everywhere.
I built Bhumi to fix that. Bhumi is a Python library designed for developers, yet its performance-critical core is implemented in Rust (via PyO3) for near-native speed. This hybrid approach delivers up to 2.5x faster response times across providers like OpenAI, Anthropic, and Gemini—without changing the underlying model.
THE PROBLEM: SLOW AI INFERENCE
Most LLM clients suffer from three main issues:

1. Batch Processing Overhead – Clients wait for the full response instead of streaming data as it's ready (see the toy sketch below).
2. Inefficient Buffers – Default buffer sizes aren’t tuned for AI-generated text.
3. Validation Bottlenecks – Tools like Pydantic slow down structured response handling.
Bhumi tackles these challenges with a smarter architecture that blends Python’s ease of use with Rust’s raw speed.
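To make the first problem concrete, here is a toy Python sketch (my illustration, not Bhumi code; every name in it is made up) that measures time-to-first-token for a streamed response versus waiting for the full thing:

    import asyncio
    import time

    async def fake_model(n_tokens: int = 50, delay: float = 0.02):
        # Stand-in for an LLM that emits one token every `delay` seconds.
        for i in range(n_tokens):
            await asyncio.sleep(delay)
            yield f"tok{i} "

    async def main():
        start = time.perf_counter()
        first_token_at = None
        async for _ in fake_model():
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
        total = time.perf_counter() - start
        # A streaming client can show output at ~0.02s; a batch client
        # keeps the user waiting the full ~1s before showing anything.
        print(f"first token: {first_token_at:.2f}s, full response: {total:.2f}s")

    asyncio.run(main())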
HOW BHUMI MAKES AI FASTER

1. Rust-Based Streaming: Python's async is useful, but integrating Rust through PyO3 brings near-native performance. Streaming inference starts instantly, cutting response times by up to 2.5x.
2. Smarter Buffer Management: Quality-Diversity algorithms (like MAP-Elites) dynamically discover optimal buffer sizes, boosting throughput by roughly 40% (a toy version of this search is sketched after this list).
3. Replacing Pydantic with Satya: Pydantic was a performance sink, so I built Satya, a Rust-backed validation library that accelerates structured outputs dramatically.
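Here is a toy MAP-Elites-style loop for point 2 (my illustration, not Bhumi's actual tuner; the throughput function is invented). The quality-diversity idea is to keep the best-scoring "elite" per niche of buffer size, then mutate elites rather than searching blindly:

    import random

    def throughput(buf_size: int) -> float:
        # Hypothetical objective: peaks near 8 KiB, with measurement noise.
        return -abs(buf_size - 8192) / 8192 + random.uniform(0, 0.05)

    # Niches: crude behavioral descriptor = order of magnitude (bit length).
    elites: dict[int, tuple[int, float]] = {}  # niche -> (buf_size, score)

    def consider(buf_size: int) -> None:
        niche = buf_size.bit_length()
        score = throughput(buf_size)
        if niche not in elites or score > elites[niche][1]:
            elites[niche] = (buf_size, score)

    for _ in range(200):
        if elites and random.random() < 0.7:
            parent, _ = random.choice(list(elites.values()))
            consider(max(256, int(parent * random.uniform(0.5, 2.0))))  # mutate an elite
        else:
            consider(random.randint(256, 65536))  # random exploration

    best_size, best_score = max(elites.values(), key=lambda e: e[1])
    print(f"best buffer size found: {best_size} bytes")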
PERFORMANCE BENCHMARKS
• OpenAI: 2.5x faster response times
• Anthropic: 1.8x faster
• Gemini: 1.6x faster
• Minimal extra memory overhead
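If you want to sanity-check numbers like these on your own workload, a minimal timing harness is enough. The sketch below assumes a client exposing the completion API shown later in this post:

    import asyncio
    import time

    async def avg_latency(completion_fn, prompt: str, runs: int = 5) -> float:
        # Average wall-clock seconds per completion over `runs` calls.
        t0 = time.perf_counter()
        for _ in range(runs):
            await completion_fn([{"role": "user", "content": prompt}])
        return (time.perf_counter() - t0) / runs

    # Usage, once a client is configured as in the example below:
    #   print(asyncio.run(avg_latency(client.completion, "Hello!")))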
Bhumi is provider-agnostic, allowing you to switch between OpenAI, Anthropic, Groq, and more with a simple config change.
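With the LLMConfig pattern from the example below, a provider switch is one changed model string (the Anthropic model id here is illustrative):

    from bhumi.base_client import BaseLLMClient, LLMConfig

    openai_client = BaseLLMClient(LLMConfig(api_key="sk-...", model="openai/gpt-4o-mini"))
    claude_client = BaseLLMClient(LLMConfig(api_key="sk-ant-...", model="anthropic/claude-3-5-sonnet"))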
USING BHUMI (WITH TOOL USE & STRUCTURED OUTPUTS)
Bhumi makes tool integration effortless. For example, here’s how you can register a weather tool in Python:
    import asyncio
    from bhumi.base_client import BaseLLMClient, LLMConfig

    async def get_weather(location: str) -> str:
        return f"The weather in {location} is 75°F"

    async def main():
        config = LLMConfig(api_key="sk-...", model="openai/gpt-4o-mini")
        client = BaseLLMClient(config)
        client.register_tool(name="get_weather", func=get_weather)
        response = await client.completion([
            {"role": "user", "content": "What's the weather in SF?"}
        ])
        print(response["text"])

    asyncio.run(main())
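The heading also promises structured outputs. Rather than reproduce Satya's exact API here, this plain-Python sketch shows the kind of per-field validation work that sits on the hot path in pure Python and that a Rust-backed validator can move out of the interpreter:

    from dataclasses import dataclass

    @dataclass
    class Weather:
        location: str
        temperature_f: float

    def validate_weather(raw: dict) -> Weather:
        # In pure Python, every isinstance check and dict lookup costs time;
        # run this on thousands of streamed responses and it adds up.
        if not isinstance(raw.get("location"), str):
            raise TypeError("location must be a string")
        if not isinstance(raw.get("temperature_f"), (int, float)):
            raise TypeError("temperature_f must be a number")
        return Weather(raw["location"], float(raw["temperature_f"]))

    print(validate_weather({"location": "SF", "temperature_f": 75}))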
WHAT'S NEXT?

I'm actively working on:
• Supporting More Providers & Models
• Adaptive Streaming Optimizations
• Advanced Structured Outputs & Tooling
Bhumi is a Python-first library powered by a Rust core under the hood for performance.
Check out Bhumi on GitHub at https://github.com/justrach/bhumi or reach out at me@rachit.ai.