Show HN: Extrai – An open-source tool to fight LLM randomness in data extraction

github.com

4 points by elias_t 15 hours ago

A few months ago I was working on a flight search engine that would include pet transport costs (I know a few by heart, but storing them and doing the calculations in the UI would be nice). While collecting pet pricing from several airlines, I struggled to extract the data in a common format without hallucinated values.

That's when I thought: What if I use multiple LLMs and take the most common response to improve accuracy?
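
To make the idea concrete, here is a minimal sketch of field-by-field majority voting over several LLM answers. It is only an illustration of the principle, not Extrai's actual code or API: the function name `majority_vote`, the `min_agreement` threshold, and the sample airline data are all made up for this example.

```python
from collections import Counter
from typing import Any


def majority_vote(
    responses: list[dict[str, Any]], min_agreement: float = 0.5
) -> dict[str, Any]:
    """Return, for each field, the value that most responses agree on.

    A field is set to None when no single value is shared by more than
    `min_agreement` of the responses (hypothetical threshold).
    """
    if not responses:
        return {}

    fields = {key for response in responses for key in response}
    consensus: dict[str, Any] = {}
    for field in sorted(fields):
        # repr() lets us count unhashable values such as lists or dicts.
        counts = Counter(repr(r.get(field)) for r in responses)
        winner_repr, votes = counts.most_common(1)[0]
        if votes / len(responses) > min_agreement:
            # Recover the original value that produced the winning repr.
            consensus[field] = next(
                r.get(field) for r in responses if repr(r.get(field)) == winner_repr
            )
        else:
            consensus[field] = None
    return consensus


# Made-up answers from three LLM calls on the same document.
responses = [
    {"airline": "Example Air", "cabin_fee_eur": 50, "max_weight_kg": 8},
    {"airline": "Example Air", "cabin_fee_eur": 50, "max_weight_kg": 8},
    {"airline": "Example Air", "cabin_fee_eur": 75, "max_weight_kg": 8},
]
print(majority_vote(responses))
```

Here the outlier fee of 75 is outvoted by the two answers that agree on 50. Returning None when there is no clear majority is one possible design choice: it surfaces disagreement instead of silently picking a value.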

That idea became this project. You provide your documents, an SQLModel schema, an LLM provider, and a description of what you'd like to extract, and Extrai does the rest, including storing the results in a DB afterwards.

There are a few other features, such as SQLModel generation based on your documents, hierarchical extraction to handle nested objects more efficiently, and built-in analytics.

Feedback is more than welcome, since it's still a work in progress!

I also built a landing page for people interested in a managed solution (and had fun with Three.js along the way!).

Let me know what you think.

Landing page: extrai.xyz

GitHub: https://github.com/Telsho/Extrai