Genuine question: I appreciate the comments about MongoDB being much better than it was 10 years ago, but Postgres is also much better today than it was then. In what situations is Mongo better than Postgres? Why choose Mongo in 2025?
Mongo is Web scale.
Choose Mongo if you need web scale.
Mongo is a genuinely distributed and scalable DB, while Postgres is a single-server DB, so the main consideration could be whether you need to scale beyond a single server.
Ahhh, this sounds familiar! https://www.youtube.com/watch?v=b2F-DItXtZs
High availability is more important than scalability for most.
An AWS availability zone tends to suffer at least one failure a year. Some are disclosed. Many are not. So that database you are running on a single instance will eventually die.
The question is whether you want to do something about it or just suffer the outage.
It's sad that this was downvoted. It's literally true. MongoDB vs. vanilla Postgres is not in Postgres' favor with respect to horizontal scaling. It's the same situation with Postgres vs. MySQL.
That being said there are plenty of ways to shard Postgres that are free, e.g. Citus. It's also questionable whether many need sharding. You can go a long way with simply a replica.
Postgres also has plenty of its own strengths. For one, you can get a managed solution without being locked into MongoDB the company.
Citus is owned by Microsoft.
And history has not been nice to startups like this continuing their products over the long term.
That's why, unless a feature is built in and supported upstream, it's not feasible for most to depend on it.
that's fair, but that's true of mongodb itself too. I wouldn't count that against either of them.
MongoDB makes money selling and supporting MongoDB.
Microsoft does not make money supporting Citus.
Simple.
Postgres is hard, you have to learn SQL. SQL is hard and mean.
Mongo means we can just dump everything into a magic box and worry about it later. No tables to create.
But there is little time, we need to ship our CRUD APP NOW! No one on the team knows SQL!
I'm actually using Postgres via Supabase for my current project, but I would probably never use straight up Postgres.
Postgres supports JSONB natively. With extensions it can even speak the Mongo wire protocol, and you can shove unstructured JSON into it.
It has supported this since 9.4: https://www.postgresql.org/docs/current/datatype-json.html
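To make that concrete, here is a minimal sketch of the ergonomics, assuming psycopg 3 and a bare documents table (the database and table names are just placeholders):

    # Assumes psycopg 3, a database named "app", and
    # CREATE TABLE documents (content JSONB) already run.
    import psycopg
    from psycopg.types.json import Json

    with psycopg.connect("dbname=app") as conn:
        # Shove unstructured JSON in; Json() handles the encoding.
        conn.execute(
            "INSERT INTO documents (content) VALUES (%s)",
            [Json({"title": "Hello", "tags": ["db", "json"]})],
        )
        # ->> extracts a field as text; @> is containment and can
        # use a GIN index on the content column.
        rows = conn.execute(
            "SELECT content->>'title' FROM documents WHERE content @> %s",
            [Json({"tags": ["db"]})],
        ).fetchall()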
I don't necessarily agree with the above justifications, but in my experience this is basically why teams pick Mongo.
It's easier to get started with.
Even as a JSON document store I'd rather use postgres with a jsonb column.
a) MongoDB has built-in, supported, proven scalability and high availability features. PostgreSQL does not. If it weren't for cloud offerings like AWS Aurora providing them, no company would even bother with PostgreSQL at all. It's 2025; these features are non-negotiable for most use cases.
b) MongoDB does one thing well: JSON documents. If your domain model is built around that, then nothing is faster. Seriously, nothing. You can do tuple updates on complex structures at speeds that would cripple PostgreSQL.
c) Nobody who is architecting systems ever thinks this way. It is never MongoDB or PostgreSQL. They specialise in different things and have different strengths. It is far more common to see both deployed.
A) Postgres easily scales to billions of rows without breaking a sweat. After that, shard. It's definitely negotiable.
So does a text file.
Statements like yours are meaningless when you aren't specific about the operations, schema, access patterns etc.
If you have a single-server, relational use case then PostgreSQL is great. But like all technology, it's not great at everything.
Then use a text file.
In all seriousness, calling Postgres’ scalability “not-negotiable for most use cases” is wild.
What's wild is you misrepresenting what I said, which was:
"built-in, supported, proven scalability and high availability"
PostgreSQL does not have any of this. It's only good for a single server instance which isn't really enough in a cloud world where instances are largely ephemeral.
> It's 2025; these features are non-negotiable for most use cases.
Excuse me? I do enterprise apps, along with most of the developers I know. We run like 100 transactions per second and can easily survive hours of planned downtime.
It's 2025, computers are really fast. I barely need a database, but ACID makes transaction processing so much easier.
MongoDB has had multi-document ACID transactions since version 4.0, back in 2018. I encourage folks to at least read up on the topic they are claiming to have expertise in.
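A minimal pymongo sketch of what that looks like (4.2 extended transactions to sharded clusters); the connection string and collection names are placeholders:

    # Assumes a MongoDB replica set; "bank"/"accounts" are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    accounts = client.bank.accounts

    def transfer(session):
        # Both updates commit together or abort together.
        accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}}, session=session)
        accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}}, session=session)

    with client.start_session() as session:
        session.with_transaction(transfer)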
Great response. All arguments are valid and fair.
I understand the criticisms, but in my experience, MongoDB has come a long way. Many of the earlier issues people mention have been addressed. Features like sharding, built-in replication, and flexible schemas have made scaling large datasets much smoother for me. It’s not perfect, but it’s a solid choice.
I think the number of people working on large enterprise systems here is a lot smaller than one would think.
Whenever a fly.io post about sqlite ends up here, there is a scary number of comments about using sqlite in way more scenarios than it should be.
Why would I use anything other than sqlite?
Easy. Sometimes it's more than you need, and there's no reason to use sqlite when you can just write things to a flat text file that you can `grep` against.
True. I often have the feeling that the enterprise crowd doesn't visit Hacker News.
Bloomberg says it was a $220M cash & stock deal: https://www.bloomberg.com/news/articles/2025-02-24/mongodb-b...
I'd rather they focus on performance.
The latest MongoDB is still slower than MongoDB 3.4, an almost 10-year-old release, for both reads and writes.
Can you share more details about the conditions under which it is slow in recent versions? We moved from 3.x to 7 for our main database and after adding a few indexes we were missing we have seen at least an order of magnitude speed up.
Most regular inserts and regular selects: https://medium.com/serpapi/mongodb-benchmark-3-4-vs-4-4-vs-5...
We have an internal benchmark with MongoDB 8.x, but it shows the same pattern of disappointing results.
I think 8 was a release purely focused on performance, with some big improvements. Comparing to 3.4 is kinda unfair... you were fast, with the tradeoff of half your data missing half the time.
That might explain the write performance degradation, but not the reads.
MongoDB had consistency issues before v5, if I recall, so take that for what it's worth.
Only skimmed through the release... I hope they continue supporting the API, but the acquisition does come with a little higher confidence that the company behind it is not collecting all your data. Voyage has some interesting embedding models that I have been hesitant to fully utilize due to lack of confidence in the startup behind it.
This blog post outlines the new roadmap: https://www.mongodb.com/blog/post/redefining-database-ai-why...
They commit to supporting the API in step 1, but it's not entirely clear to me whether that commitment continues through steps 2-3...
How does MongoDB still have that much available to spend? Everyone I know moved off it years ago.
We use it a lot for a specific use-case and it works great. Mongo has come a long, long way since its release over a decade ago, and if you keep it on majority read and write concerns, it's very reliable.
Also, on some things, it allows us to pivot much faster. And now with the help of LLMs, writing "Aggregation Pipelines" is very fast.
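For anyone wondering what running on majority read/write looks like in code, a minimal pymongo sketch (connection string and names are placeholders):

    # w="majority": writes are acknowledged only after a majority of
    # replica-set members have them; majority reads only return such data.
    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    db = client.get_database(
        "app",
        read_concern=ReadConcern("majority"),
        write_concern=WriteConcern(w="majority"),
    )
    db.events.insert_one({"type": "signup"})  # durable across failover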
Pretending a pile of json is a database is great for pivoting, not so great for anything else.
Maintaining apps built on MongoDB is soul killing.
I've been using Mongo while developing some analysis / retrieval systems around video, and this is the correct answer. Aggregation pipelines allow me to do really powerful search around amorphous / changing data. Adding a way to automatically update / recalculate embeddings to your database makes even more sense.
Do you have any tricks for writing and debugging pipelines? I feel like there are so many little hiccups that I spend ages figuring out if that one field name needs a $ or not.
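The rule of thumb that usually resolves it: keys (stage names, operators, output field names) are bare, while a string in a value position takes a leading $ when it means "read the value of this field". A tiny sketch, with collection and field names made up:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017").shop  # placeholder db

    pipeline = [
        {"$match": {"status": "active"}},   # key side of a match: no $
        {"$group": {
            "_id": "$customer_id",          # value side: $ reads the field
            "total": {"$sum": "$amount"},
        }},
        {"$sort": {"total": -1}},
    ]
    for row in db.orders.aggregate(pipeline):
        print(row)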
Pretty sure they achieved fiscal nirvana by exploiting enterprise brain rot. You hook em, they accumulate tech debt for years, all their devs leave, now they can't move away & you can start increasing prices. Eventually the empty husk will topple over but that's still years away.
Is it possible that they simply have a good product?
They do have a good product, but "they accumulate tech debt for years, all their devs leave, now they can't move away" is the story of the place I worked at a few years ago. The database was such a disorganized, inconsistent mess that no-one had the stomach (or budget) to try and get off it.
Impossible! It's not based on sqlite, postgres or written in rust, so it must be terrible!
I never understood this argument; there are many great products running on Java, PHP, Ruby, JavaScript... All of these languages have a "crowd" that hates them for historic and/or esoteric reasons.
Great products are in my opinion a function of skill and care. The only benefit a "popular" tool or language gets you is a greater developer pool for hiring.
Then they get acquired by BloodMoor and they squeeze every last cent out of the remaining customers.
Unironically, this.
That's what I thought, but every single candidate I interviewed mentioned MongoDB as their recent reference document database. I asked the last candidate if they were self-hosting; the answer was no, they used MongoDB Cloud.
You can't use the embeddings/vector search stuff this refers to in self-hosted anyway; it's only implemented in their Atlas Cloud product. It makes it a real PITA to test locally. The Atlas Dev local container didn't work the same when I tried it earlier in the year.
I self host a handful of mongodb deployments for personal projects and manage self hosted mongo deployments of almost a hundred nodes for some companies. Atlas can get very expensive if you need good IO.
If you're a developer, you want to use MongoDB as a database, not be a MongoDB SRE and DBA.
That's the reason for using Atlas.
Precisely, and if you are an enterprise, you want the option to request priority support and have a lot of features out of the box. Also, some of the search features are unfortunately only available in Atlas.
Everyone you know put a dollar in the donation basket while moving off. Mongo collected it all and bought Voyage AI.
$2.3B in cash as of last quarter
Because they are web-scale obviously.
Well, it's referred to as a cash-and-stock deal but I can't find any more detail about how much is stock:
https://seekingalpha.com/news/4412466-mongodb-acquires-voyag...
There are a lot of people still on it, including the place I worked at last.
It was starting to get expensive though, so we were experimenting with other document stores (dynamodb was being trialled, since we were already AWS for most things, just around the time I left)
This may be a shock to many HN readers, but MongoDB's revenue has been growing quite fast in the last few years (from $400M in 2020 to $1.7B in 2024). They've been pushing Atlas pretty hard in the enterprise world. I have no experience with it myself, but I've heard some decently positive things about it (ease of set up and maintenance, reliability).
How is MongoDB still a thing when there are already several ways to handle JSON in Postgres, including Microsoft's new documentdb extension:
https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
What am I missing? Are Mongo users simply front end folks who didn't have time to learn basic SQL or back end architecture?
I will copy and paste a comment I wrote here previously:
"MongoDB ships with horizontal sharding out-of-the-box, has idiomatic and well-maintained drivers for pretty much every language you could want (no C library re-use), is reasonably vendor-neutral and can be run locally, and the data modeling it encourages is both preferential for some people as well as pushes users to avoid patterns that don't scale very well with other models. Whether these things are important to you is a different question, but there is a lot to like that alternatives may not have answers for. If you currently or plan on spending > 10K per month on your database, I think MongoDB is one of the strongest choices out there."
I have also run Postgres at very large scale. Postgres' JSONB has some serious performance drawbacks that don't matter if you don't plan on spending a lot of money to run your database, but MongoDB does solve those problems. This new documentdb extension from Microsoft may solve some of the pain, but this is some very rough code if you browse around, and Postgres extensions are quite painful to use over the long term.
The reality is that it is not possible to run vanilla Postgres at scale. It's possible to fix its issues with third party solutions or cobbling together your own setup, but it takes a lot of effort and knowledge to ensure you've done things correctly. It's true that many people never reach that scale, but if you do, you're willing to spend a lot of money on something that works well.
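For context on the out-of-the-box claim: once a sharded cluster (mongos, config servers, shards) is up, turning sharding on is a couple of admin commands. A sketch via pymongo, with every name a placeholder:

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos-host:27017")  # connect to the router
    client.admin.command("enableSharding", "app")
    client.admin.command(
        "shardCollection", "app.events",
        key={"customer_id": "hashed"},  # hashed key spreads writes evenly
    )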
> MongoDB ships with horizontal sharding out-of-the-box
Maybe it's better than it was, but my experience with MongoDB a decade ago is that horizontal sharding didn't work very well. We constantly ran into data corruption and performance issues with rebalancing the shards. So much so that we had a company party to celebrate moving off of MongoDB.
> my experience with Mongodb a decade ago
So before the Apple Watch was released.
Why is this relevant today? Technology changes very quickly.
MongoDB is not the same as Postgres and jsonb.
Also, I'd challenge your thinking - ultimately the goal is to solve problems. You don't necessarily need SQL, or relations for that matter. That being said, naively modeling your stuff in MongoDB (or other things like DynamoDB) will cause you severe pain...
What's also true, which people forget, is that naively modeling your stuff with a relational database will also cause you pain. As they sometimes say: normalize until it hurts, then denormalize to scale and make it work.
The number of places I've seen that skip the second part and have extremely normalized databases makes me cringe. It's like people think joins are free...
Then your implementation can be as simple as CREATE TABLE documents (content JSONB);. But I suspect a PK and some metadata columns like timestamps will come in handy.
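Something like this fuller version, as a sketch (names are illustrative; the GIN index is there so @> containment queries don't sequential-scan):

    import psycopg

    with psycopg.connect("dbname=app") as conn:
        conn.execute("""
            CREATE TABLE documents (
                id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                content    JSONB NOT NULL,
                created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
                updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
            )
        """)
        conn.execute(
            "CREATE INDEX documents_content_gin ON documents USING GIN (content)"
        )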
Sigh - MongoDB is not the same as creating a table with JSONB. For one, you don't have to deal with handling connections. That being said, Postgres is great, but it's not the same.
Postgres has ways to simplify connection management, if that is a blocker for you (pooling, pgbouncer, postgrest, etc)
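For example, application-side pooling is one import away; a minimal sketch with psycopg_pool (the DSN and pool sizes are placeholders):

    from psycopg_pool import ConnectionPool

    pool = ConnectionPool("dbname=app", min_size=1, max_size=10)

    with pool.connection() as conn:  # borrow; returned to the pool on exit
        conn.execute("SELECT 1")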
I have seen a few rather large production MongoDB deployments. I don't understand how so many people chose it as the basis of their applications. A non-negligible number of the MongoDB deployments I have seen basically treat MongoDB as a memory dump, where they scan from some key and hope for the best. I have never seen a MongoDB solution where I thought it was better than if they had just chosen any SQL server.
SQL, or rather just some schema-based database, has a ton of advantages. Besides speed, there is a huge benefit for developers in being able to look at a schema and see how the relationships in the data work. MongoDB usually involves looking at a de facto schema, but with fewer guarantees on types, relations, or existence, and then trawling code for how it's used.
We use their atlas offering. It’s a bit pricey but we are very happy with it. It’s got a bunch of stuff integrated - vectors, json (obviously), search and charting along with excellent support for drivers and very nice out of the box monitoring.
Now I could possibly spend a bunch of time and do the same thing with open source DBs - but why? I have a small team and stuff to deliver. Atlas allows me to do it fast.
Similar here; there are gotchas though. Some versions ago they changed their query optimization engine, and some of our "slow aggregations" became "unresponsive aggregations" because suboptimal indexes were suddenly used. We had to use hints to force proper indexing. Their columnar DB offering is quite bad - I'd say if there's a need for analytical functionality, it's better to go with a different DB. The oplog changes format - and although that's expected, it still hurts me every now and then when I need to check something. Similarly, at some point they changed how nested arrays are updated in the changestream, which broke our auditing (it's not recommended to use the changestream for auditing; we still did ;) ). We've started using NVMe instances for some of our more heavily used clusters; well, it turned out recovery of an NVMe cluster is much, much slower than a standard cluster. But all in all I really like MongoDB. If there are no relations, it's a good choice. It's also good for prototyping.
There are a ton of hosted Postgres providers that do all of that and more and are just as simple to use. Neon.tech is really easy to set up, and if you need more of a BaaS (Firebase alternative), Supabase. Plus, no vendor lock-in. I've moved vendors several times, most recently AWS RDS to Neon, and it was nearly seamless. Was originally on Heroku Postgres going way back. Try getting off Atlas…
Ha - easier said than done in an enterprise, especially when nothing is wrong. Maybe the $$, but at some point the effort involved with supply chain and reengineering dwarfs any “technical” benefit.
This is why startups like to get into a single supply chain contract with an enterprise - it's extremely hard to get it set up, but once done, very easy to reuse the template.
If you can learn Mongo you can learn SQL and 'back end architecture'. Let's be honest: the basics are hardly difficult no matter what tool you're using.
Just because Postgres is good doesn't mean other things can't also be good (and better for some use cases).
Enterprise sales
Mongo is Firestore for enterprise.
That knowledge simply isn't widespread. Modern Postgres users would never suggest Mongo, but a generation of engineers was taught that Mongo is the NoSQL solution, even though it's essentially legacy tech.
I just ran into a greenfield project where the dev reached for Mongo and didn't have a good technical reason for it beyond "I'm handling documents". Probably wasn't aware of alternatives. FWIW, Postgres would've been a great fit for it; they were modeling research publications.
Um because it must be worth 2 billion if this acquisition is worth $220 million. I know there’s rules about discussion quality on this site, so I guess we can’t question that.
10x exit in a couple years, quite nice on the VC side!
On the tech side ... no idea what Mongo's plan is ... their embedding model is not SOTA, does not even outperform the open ones out there, and reranking is a dead end in 2025.
I think the value is in Voyage's team, their user base, and having a vision that aligned with Mongo's.
Congrats!
>their embedding model is not SOTA, does not even outperform the open ones out there, and reranking is a dead end in 2025.
Are you referring to the MTEB leaderboard? It's widely believed many of those test datasets are considered during the training of most open-source text embedding models, hence why you see novel + private benchmarks discussed in many launch blogs that don't exclusively refer to MTEB. There are problems there, and it would be great to see more folks in the search benchmark dataset production space like what Marqo AI has done in recent months.
Also, what makes you say reranking is dead? Mongo doesn't provide it out of the box, but many other search providers like ES, Pinecone, and OpenSearch do, so it must provide some value to their customers? Maybe you're saying it's overrated in terms of how many apps actually need it?
disclosure: I work on vector search at Mongo
>Maybe you're saying it's overrated in terms of how many apps actually need it?
Yes, my comment leans more towards that, rather than suggesting it's useless.
Taking a step back, accuracy/quality of retrieval is critical as input to anything generated b/c your generated output is only as good as your input. And right now folks are struggling to adopt generative use cases due to risk and fear of how to control outputs. Therefore I think this could be bigger than you think.
Interesting take. Have you benchmarked models on your own data? Cause at this point everything is contaminated so I find it impossible to tell what proper sota is. Also - most folks still just use openai. Last time I checked, reranking always performs better than pure vector search. And to my knowledge it's still the superior fusion method for keyword and vector results.
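That fusion pattern is simple to sketch: pool candidates from both retrievers, then let a cross-encoder order the union. This assumes sentence-transformers, stand-in retrieval functions, and documents as plain strings:

    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def fused_search(query, keyword_search, vector_search, k=10):
        # keyword_search / vector_search are stand-ins for BM25 and ANN
        # retrieval; each returns a list of document strings.
        candidates = list(set(keyword_search(query) + vector_search(query)))
        scores = reranker.predict([(query, doc) for doc in candidates])
        ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
        return [doc for doc, _ in ranked[:k]]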
In my experience, storing RAG chunks with a little bit of context helps a lot when doing the retrieval, then you can skip the whole "rerank" bit and halve your cost and latency.
With embedding/generative models becoming better with time, the need for a rerank step will be optimized away.
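The contextual-chunk trick mentioned above, sketched (embed_fn is a stand-in for whatever embedding API you use):

    def contextualize(doc_title, section, chunk):
        # The stored/embedded text carries its own context, so plain
        # vector retrieval is often good enough to skip a rerank pass.
        return f"Document: {doc_title}\nSection: {section}\n\n{chunk}"

    def index_chunks(doc_title, section, chunks, embed_fn):
        texts = [contextualize(doc_title, section, c) for c in chunks]
        return list(zip(texts, embed_fn(texts)))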
Looks like everyone is jumping into the AI game. Is there a bubble?
Voyage AI basically builds embedding models for vector search
You don't hear the big AI providers talk about embeddings much, but I have to believe in the long run that companies building SOTA foundational LLMs are going to ultimately have the best embedding models.
Unless you can get to a point where you can make these models small enough that they basically sit in the DB layer of an application...
That, and because embedding models are much easier to improve with at-scale usage (hence why everyone has a deep search/research/RAG tool built into their AI web app now).
Is Voyage AI web-scale yet?
Voyage AI post: https://blog.voyageai.com/2025/02/24/joining-mongodb/
and the mongo blog post for how it will be used: https://www.mongodb.com/blog/post/redefining-database-ai-why...
What's the calculus here? If I'm a developer choosing a low-level primitive such as a database, I'm likely quite opinionated about which models I use.
If I had to guess they might see embedding models become small and optimised enough to the point that they can pull them into the DB layer as a feature instead of being something devs need to actively think about and build into their app.
Or it could just be an expansion to their cloud offering. In a lot of cases embedding models just need to be 'good enough' and cheap and/or convenient is a winning GTM approach.
Curious - do people migrate due to the price tag, ease of use, something else?