Very interesting, tempted to apply to the company as this sounds very promising and fun to work on...
I'm curious about the business side of things, to give some context, some open source models that startups use today are:
1. The "open core" model, Gitlab being a good example. They try to split features that are open or closed/enterprise depending on the buyer.
2. The AGPL model, Mongodb used to do this, today a popular example is Grafana and their collection of products.
3. The Apache + cloud backend model, the core being standalone working with Apache license while building a value added managed service. I think this is what Synadia is doing with NATS.
4. The "source available" model, not really open source, but worth mentioning as it's very popular recently. Examples Mongodb, Elastic, Cochroachdb and TimescaleDB. This is often combined with open source such that some parts are open source, others source available.
With this as a reference, Nikita, how would you explain how Neon thinks about licensing and eventually building a healthy business? It's obvious a managed database service is the money maker, but how do you think about competitors taking the project and building managed services with minimal or no code contributions? I'm sure you guys have thought a lot about this; it would be interesting to hear some thoughts and reasoning for or against different options.
(Note: This is not meant to be an extensive explanation of these business models, just a high-level overview. If I have miscategorized some company above, feel free to correct me in a comment.)
It's 3. Our intention is to only monetize DBaaS revenue and open source all/most of the tech under the Apache 2.0 license. It's similar to Databricks' approach, except that Databricks over time built out Photon, which is proprietary. We will stay away from this, ideally forever. Enduring technologies are fully open source. We see an opportunity to build a standard scalable storage tier for Postgres and maybe for other engines over time (other engines are off strategy right now).
Tips for transitioning to database development: learn Rust, start working on systems, ideally get a systems job, and optimize for being prolific. Write a lot of code.
Are there plans to release an HTTP API to make it easier to use with services like Fastly Compute@Edge and Cloudflare Workers? And if so would the API be global or region specific?
One thing I haven't seen with "serverless" databases is an easy way to dictate where data is stored. Mongo has a pretty clever mechanism in their global clusters to specify which regions/nodes/tags a document is stored in, and then you simply specify that you want to connect to the nearest cluster. Assuming your compute is only dealing with documents in the same region as the incoming request, this ends up working really well: you have a single, multi-region DB, yet in practice reads/writes go to the nearest node if the data model is planned accordingly.
A real-world example of how I am using this in Mongo today: I run automated trading software that is deployed to many AWS regions, in order to trade as close to the respective exchange as possible. I tag each order, trade, position, etc. with the exchange region that it belongs to, and I get really fast reads and writes because those documents go to the closest node in the same AWS region. The big win here is that this is a single cluster, so my admin dashboard can still easily connect to one cluster and query across all of these regions without changing any application code. Of course these admin/analytics queries are slower, but absolutely worth the trade-off.
Absolutely! We are working on it right now and call this “regions”. We already have a proxy - you will notice that the connection string is project_name.cloud.neon.tech.
We are working on deploying the proxy globally and routing read traffic to the nearest region.
We also have some multi-master designs in collaboration with Dan Abadi. But this will take a second to build.
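For readers wondering what that looks like from the application side, here is a minimal sketch assuming a standard Postgres driver (psycopg2); the host format comes from the comment above, while the database name, user, and password are placeholders.

```python
# Minimal sketch: connecting through the Neon proxy endpoint.
# Host format is taken from the comment above; credentials and dbname are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="project_name.cloud.neon.tech",  # regional routing happens behind this hostname
    dbname="main",                        # placeholder database name
    user="app_user",                      # placeholder credentials
    password="secret",
    sslmode="require",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
```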
Kind of, yes. There are a few user creation details we need to polish so that you can follow the tutorial word for word without needing to do any click-ops in the console.
I've seen a clever approach to location-specific data at rest: partitioning and postgres_fdw were combined to store specific data on specific clusters in certain regions, and on top of that, partitioning was used to get the global view again. Really nice.
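For the curious, here is a rough sketch of that pattern with hypothetical names: a parent table partitioned by region, where one partition is local and another is a foreign table served by a remote cluster via postgres_fdw, so each region keeps its own rows while the parent still provides the global view.

```python
# Sketch of region-local storage with declarative partitioning + postgres_fdw.
# Server names, table names, hosts and credentials are hypothetical placeholders.
import psycopg2

ddl = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- A remote server holding the EU rows (hypothetical host/dbname).
CREATE SERVER IF NOT EXISTS eu_cluster
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'eu.db.example.com', dbname 'orders');
CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
    SERVER eu_cluster OPTIONS (user 'app', password 'secret');

-- Parent table partitioned by region; the global view lives here.
CREATE TABLE orders (
    id      bigint NOT NULL,
    region  text   NOT NULL,
    payload jsonb
) PARTITION BY LIST (region);

-- Local partition for the region this cluster serves.
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us');

-- Remote partition: rows for 'eu' are stored on the EU cluster.
CREATE FOREIGN TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu')
    SERVER eu_cluster OPTIONS (table_name 'orders_eu');
"""

with psycopg2.connect("dbname=orders") as conn, conn.cursor() as cur:
    cur.execute(ddl)
```

Queries against `orders` see all regions (with cross-region latency for the remote partitions), while each region's hot path only touches its local partition.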
Depending on the cost per database, a service like this could allow one to build SaaS apps with a single-tenant model [1] in a really simple and elegant way.
I've always liked this model since it gives the best isolation, and not having to code with multi-tenancy in mind really simplifies development and security. But maintaining the infrastructure for this is hard and costly.
How does the memory cache work? The architecture looks great, but so much of PG efficiency is cache hit rate. Do the pageservers have a distribution plan? Plans to add something like DynamoDB Accelerator?
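(For context on why cache hit rate matters so much: on vanilla Postgres it is usually watched via the standard pg_stat_database view. A minimal sketch of that query follows, with the connection string as a placeholder; how Neon's pageserver cache surfaces the equivalent number is a separate question.)

```python
# Sketch: buffer cache hit ratio for the current database on vanilla Postgres.
# On Neon the cache layers differ, but this is the metric the parent comment refers to.
import psycopg2

query = """
SELECT round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(query)
    print(f"buffer cache hit ratio: {cur.fetchone()[0]}%")
```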
This is something I've been hoping to see for a long time. I've long wanted a serverless postgres solution that would enable me to inexpensively put up a db and throw hasura on top of it.
I have a question about performance though. Let's say I have a table with millions of rows. Would it automatically scale resources to get me sub-second queries on it?
Ultimately that's what I desire in such a solution. Something that will let me throw whatever at it and will handle the scaling to keep whatever queries I throw at it fast.
I see there's branching support. Are there plans for version control features? I'd like to be able to treat the database as a collection of versioned immutable artifacts. Diff database branches, merge/patch, revert to a previous known good version. This would allow me to eliminate "migrations" and simply stage a set of changes, test them, and then merge or revert them, knowing I'm getting exactly what worked before without writing additional code to ensure it or walk it back.
Everybody today knows that if you're gonna change your database [schema], you need a system to migrate the DDL changes intelligently so you don't break something, then run that in lower environments to test, then promote it up to higher envs and run it, then deploy your apps that use the changes. But of course it may be impossible to revert those changes after that point, requiring an entire database snapshot restore. So not only are there serious operational concerns, making any operations around this time-consuming and fraught with peril, but you need to set up a migration solution (language-specific, framework-specific, or agnostic) and make sure you architect your application to only make changes in a specific way. (All of this, by the way, is only necessary because the database is one big mutable state machine.)
...whereas if it worked more like version control, you could make any change, commit it, and get a commit ID. If that caused problems, you could just `cloudpg revert $change_id`, and then there would be no need to carefully architect the app, changes wouldn't be fraught with peril, and we could be more agile with database-driven design. The database would obviously need to be intelligent enough to figure out how to revert any change, which is why this has to be a database-specific feature and not just a git revert.
Use Case 2. Merging Changes
This sort of follows on the above (making changes more agile). If you have branches, and 4 different devs are working on 4 different database changes, how do you merge and deploy those all safely, and handle reversions safely? Well if all database changes had versions, and we could diff the changes between versions, then we could treat the database like a Git repo and merge/rebase all the changes to the database together at the same time as the code. Again, no need to go back and refactor migration scripts or the app design, because the database is essentially just version-controlled code.
The same things would apply to upgrading/downgrading database versions, bringing up or restoring new servers, possibly even making replication easier, maybe other things we haven't thought of yet.
It would need to revert only the changes in that commit while preserving all new changes afterwards. I don't have the foggiest how to make that happen, that's up to the ninjas at neon!
There would probably need to be a fallback mechanism if, for example, a new column was created and new data was entered into it, and then the revert removes the column. Probably it could keep pointers to such things ("there is a database D with a table T with a column C and rows [a,b,c,d]"), so that if the change is re-reverted later, an extra merge instruction could pop the reference back into place like nothing happened. Somebody with an actual CS background must have better ideas than me :)
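For what it's worth, the "diff database branches" part of this can be roughly approximated today from the outside with schema-only dumps. A sketch, assuming two branch endpoints reachable as ordinary Postgres hosts (the hostnames and database name are placeholders, and this only covers schema, not data):

```python
# Rough sketch: approximate a "branch diff" by diffing schema-only dumps.
# Branch hostnames and dbname are placeholders; requires pg_dump on PATH.
import difflib
import subprocess

def schema_dump(host: str, dbname: str = "main") -> list[str]:
    """Return the schema-only dump of one branch as a list of lines."""
    out = subprocess.run(
        ["pg_dump", "--schema-only", "--no-owner", "-h", host, "-d", dbname],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.splitlines(keepends=True)

main_schema = schema_dump("main-branch.example.net")       # placeholder host
feature_schema = schema_dump("feature-branch.example.net")  # placeholder host

for line in difflib.unified_diff(main_schema, feature_schema,
                                 fromfile="main", tofile="feature"):
    print(line, end="")
```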
This is a tricky problem to solve. Reverting to a commit - that's easy, and we already have all the infrastructure to allow you to do that. The tricky part is:
* Make a schema change
* push into prod
* accumulate some new data
* revert just the schema change
In our discussions we call this separating schema and data, which would allow you to have different schemas on the same data.
It's tricky to do in Neon because the storage knows nothing about schemas. It stores pages with no idea what's on them.
But we have some ideas on how to do this with logical replication, where we would run a transform on top of the logical replication stream to keep two branches in sync. Not this year though.
It would be a killer feature, but the thing is that Git has a conflict resolution feature that puts the working tree in an invalid state until the user has manually fixed certain issues, so that you cannot commit until you manually fix the files and mark those as such.
Doing so for a database seems less desirable from an availability perspective, especially with high-throughput databases.
But the manual intervention of conflict resolution in a Git repo is done in a working tree. The named branch's HEAD remains the same until the conflict has been resolved and the commit is merged/pushed. So nobody sees the conflict at all, they only ever see either the old code, or the merged resolved code.
I've been wondering if it makes sense for databases to start offering CRDTs natively, so that you can perform merges that are algorithmically conflict-free.
In a sense, a CRDT is a database. But the 'conflict-free' part is only because it defines semantics for conflicting operations.
But, because SQL has conflict resolution by cancellation of one of the two conflicting modifications, I don't think that it is reasonable (or even possible) to merge 2 divergent databases in a single way that always conforms to the needs of the developer and/or application.
CRDTs are pretty hard to implement at the Neon storage level - pages don’t know what’s written on them. I don’t know of a general purpose database that supports CRDTs. It would be cool though!
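To make the "conflict-free only because it defines semantics for conflicting operations" point concrete, here is a toy grow-only counter (G-counter), one of the simplest CRDTs: each replica increments only its own slot, and a merge is an element-wise max, so merges commute and divergent replicas always converge.

```python
# Toy G-counter CRDT: each replica only increments its own slot,
# and merge takes the element-wise max, so replicas always converge.
from collections import defaultdict

class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = defaultdict(int)

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] += n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts[rid], n)

# Two replicas diverge, then merge in either order and agree.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(5)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 8
```

The price is exactly what the parent says: the data type has to define what "conflicting" means up front, which general-purpose SQL semantics do not.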
I see how having multiple Postgres compute nodes scales up reads, which is great! Does separating the "write" functionality into the Pagekeepers service allow writes to scale as well?
Seems like there's no upper limit for scaling up reads, just wondering how this architecture affects write throughput. Would love to hear more!
You can't infinitely scale writes. You can make your storage bandwidth infinite, but you are still limited by how much Postgres WAL one instance can pump into the storage.
I think we can do a lot of good things here over time and have plenty of ideas. But for now it's a single-writer system. The good news is that there is so much open source tech around Postgres that it might not be a gargantuan task in the future.
What’s the ‘page’ in Pageserver? Is it a B-tree page? Or disk page?
I have a hunch that Pageservers contain the disk pages, so they store the B-tree (or maybe LSM) pages, and compute traverses those pages to find the relevant page(s). I am curious how it all fits together and how fetches from disk/Pageservers work.
AWS Aurora Serverless can't scale down to 0, at least not on v2, so the minimum cost to play is $40/month.
I'm sure Neon's design can handle the hobbyist WordPress DB or personal project DB for <=$10/month since it can scale down to 0, and it likely would not be terrible for a cold start on a small personal website to take 1-3 seconds for the first page load. Sure, that's bad for SEO, but I think a LOT of people want a DB that is managed and simple for <$20/month. Again, I'm very hopeful Neon will have a nice pricing number here when they figure out their model, since a very tiny shared tenant should be very inexpensive under their model.
How does shared memory get handled across nodes? I'd imagine there's quite a bit of in-memory state for things like sequences. For that specific example you could preallocate chunks of them, as there's no guarantee of them being contiguous, but I'm guessing there are more complicated examples.
Our compute node is basically a usual Postgres where we intercept the WAL write and page read streams. In other words, we don't do any compute sharding, since we want to preserve vanilla Postgres compatibility, and any sharded solution would be riddled with issues like the ones you've mentioned.
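Side note: even though Neon keeps a single unsharded compute, the "preallocate chunks" idea from the question already exists in vanilla Postgres as per-session sequence caching. A minimal sketch (sequence name and connection string are placeholders):

```python
# Sketch: per-session sequence preallocation in vanilla Postgres.
# Each session grabs a block of 100 values at a time, so values are not
# contiguous across sessions, but nextval() needs no cross-session coordination.
import psycopg2

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute("CREATE SEQUENCE IF NOT EXISTS order_id_seq CACHE 100;")
    cur.execute("SELECT nextval('order_id_seq');")
    print(cur.fetchone()[0])
```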
Can you describe how compute scaling works? I do a lot of work in PostGIS, which can have big CPU needs depending on the function being invoked, so my workload can look super bursty from a CPU perspective.
We are still experimenting with compute scaling, and the tech preview includes only a small fixed compute container. We have a custom proxy in front of a compute, and we can quickly change the underlying compute container with a bigger/smaller one. That works fine if you don't use per-session semantics and transactions are short-lived. But there are a lot of tradeoffs on what to do if there is a long-lived transaction or session-level object. So if we need to scale up the container in the presence of long transactions, we can:
1) roll back long transactions and enforce upscale
2) wait for a better moment to upscale (potentially forever)
3) try to do a live migration of running Postgres to another node (like VM live migrations, or CRIU-like process migration) and preserve long-running transaction
So far, we plan to start with some combination of 1+2 -- should be fine for web/OLTP kind of load. But ultimately, we want to arrive at 3), but that approach has way more technical risks.
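Options 1 and 2 work best when transactions stay short, and applications can help from their side with standard Postgres timeouts. A minimal sketch (the timeout values are arbitrary placeholders):

```python
# Sketch: bound transaction length so a compute resize rarely has to wait on,
# or roll back, a long-running transaction. Timeout values are arbitrary examples.
import psycopg2

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    # Abort statements that run longer than 5s in this session.
    cur.execute("SET statement_timeout = '5s';")
    # Terminate this session if it sits idle inside an open transaction for >10s.
    cur.execute("SET idle_in_transaction_session_timeout = '10s';")
    cur.execute("SELECT count(*) FROM pg_stat_activity;")
    print(cur.fetchone()[0])
```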
Huh, interesting. So your postgres can basically run arbitrary code on behalf of the user? I'm talking about stuff like https://tada.github.io/pljava/ where you can use the DB to invoke whatever code you like, outside of any sandbox. Like, could I upload a PL/Java function that probes your internal network? How are you making that secure?
That is true. And that is why we do not have the UI for loading extension binaries and do not give root access to the compute node. Yet. Of course, some containerization is in place, but it is not as tight as we would like for arbitrary code execution.
Still, there are no technical limitations. Our test suite already uses Neon-specific SQL functions from a C extension (https://github.com/neondatabase/postgres/tree/7faa67c3ca53fc...). At the very least, providing a lot of popular extensions out-of-the-box is on our roadmap once we figure out the security, no special repacking needed. As compute nodes should already be pretty isolated from each other, I don't think allowing arbitrary code will require a redesign.
Yes, it's a lot of work. But a lot less compared to building a database from scratch. There is a blog post coming out from Heikki that covers the tricky spots.
For many workloads it's on par. A few are slower. The pathological case: the working set doesn't fit in the memory allotted to Postgres but fits in the file system page cache. There is no page cache in Neon.
- Buffer cache evictions of freshly dirtied pages are bad. This is because we request pages with a hint about the latest change that was evicted, and with newer and newer changes being evicted from buffers you might start to be limited by the write-through latency to the Pageserver instead of only the Safekeeper.
- Commit latency can suffer due to cross-AZ communication -- with 3 safekeepers spread across as many AZs, the second-slowest response is the limiting factor.
- Write amplification in the whole system is quite high. Plain PG does ~2x (WAL + page), while we have many times that. Admittedly, these writes are spread around several systems, where any one system only really needs to write 1x WAL volume for the data that it is responsible for, but the Pageserver is currently configured for something like 4x write amplification due to 4 stages of LSM-tree compaction.
Very suitable:
- Anything with a write-based working set that fits in the buffers of the primary instance;
- Databases that are unused most of the time;
- Databases with a lot of read replicas (potentially with Neon only providing the read replicas, not the write node/hot standby);
- Apps that want to run analytics on the data, but don't want to transfer O(datasize) data to a different DB every time, and also don't want to deal with the problems of long-running transactions.
Not very suitable:
- OLTP with a write set that doesn't fit in the caches;
We are using AWS Aurora (non-serverless) and getting crushed by IOPS charges. We do very little caching in our app, which keeps the architecture simple, but the downside is that we end up relying on DB-level caching implicitly. So when the DB instance has enough memory, IOPS are modest, but once we outgrow it, we have to scale compute up just to get more RAM or else we get destroyed on IOPS.
Seems like from a cost standpoint Neon does a good job of avoiding EBS entirely and thus this particular problem would be sidestepped. Is that right?
From the design I've seen, the user-dependent data is mostly in S3. EBS should only have minimal involvement -- probably only used for active data (waiting to be written, or perhaps cached data to save an S3 call for recently requested data).
Looking at the design they are going for, I expect they can scale horizontally really well. I'm really looking forward to this as a low-cost hobbyist DB
Yes, Aurora gets really expensive on IOPS. They basically charge you for pulling pages from their own page servers. In their design they colocate the pageserver and safekeeper functionality. I believe we will be cheaper in this case, yes.
Sounds very exciting, but curious about performance. Of course there will be downsides, it would be nice if this could be characterized somehow as part of the docs or site.
Congrats on the launch! Does it support logical decoding plug-ins? I.e. could I use for instance Debezium for streaming changes out of Neon to Kafka etc.?
Any plugin that doesn't access the file system directly but uses the appropriate file system and buffer manager APIs should work. But because we scale to 0 when we detect no activity, we might have issues with background tasks, so your mileage may vary.
Next, we have not yet optimized for replication outside Neon Cloud, nor do we have sideloading of extensions, so to use Debezium you'd have to self-host Neon for now.
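For anyone who wants to poke at the decoding path itself before Debezium support lands, here is a minimal sketch against a self-hosted, Postgres-compatible instance using the test_decoding plugin that ships with Postgres; the slot name, table, and connection string are placeholders, and wal_level must be set to logical.

```python
# Sketch: create a logical replication slot and peek at decoded changes.
# Uses the test_decoding output plugin shipped with Postgres; Debezium would use
# pgoutput / its own connector instead. Requires wal_level = logical and slot privileges.
import psycopg2

conn = psycopg2.connect("dbname=app")
conn.autocommit = True  # keep each statement in its own transaction (slot creation is picky about transaction state)
cur = conn.cursor()

cur.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');")
cur.execute("CREATE TABLE IF NOT EXISTS t (id int PRIMARY KEY, v text);")
cur.execute("INSERT INTO t VALUES (1, 'hello');")

# Peek (non-destructively) at what the plugin decoded from the WAL.
cur.execute("SELECT data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);")
for (change,) in cur.fetchall():
    print(change)
```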
You can already build and run all base components on your local machine, see instructions here: https://github.com/neondatabase/neon/tree/d11c9f9fcb950ac263... . You can run the tests instead if you want more insight into how a particular piece works. You probably want to attach your S3-compatible storage to Pageserver and Safekeepers; some Ansible scripts with command-line flags are in the repository.
To run Neon components on multiple machines, you should be able to create the `.neon` data directory via `neon_local init` and then share the generated configuration files across machines and tweak network settings. You can refer to our documentation to understand the terminology and the intended hosting configuration: https://neon.tech/docs/storage-engine/architecture-overview/
However, there are still two missing bits: the self-hosting documentation and the Neon Control Plane (web UI + K8S-based compute nodes orchestrator). So you don't get automatic scale-to-zero at the moment out-of-the-box, although all hooks and the PostgreSQL proxy we're using at pg.neon.tech are there.
We are considering open-sourcing the Control Plane, so stay tuned. As for documentation and support, Nikita has already answered.
So you can self-host our storage, and we are making it easier and easier. Mostly we want to write one Helm chart that will package it all.
We are not planning to support on prem deployments commercially. We are working with partners like Percona to do it eventually, but those conversations are too early to commit to anything.
>EBS volumes are very expensive, AWS throttle your throughput unless you pay for provisioned IOPs
How does using local SSD instead of the default EBS help achieve higher throughput though? I get that it's cheaper and lower latency, but I fail to see how the rebalance magic solves the throughput trade-off for everyone unless they throw more RAM at it.
I'm not familiar with Postgres or cloud architectures and I have a few dumb questions. Does the pageserver act as a page cache for S3? Does the Postgres compute also have an internal cache? If so, this looks like multiple levels of memory cache connected by networks.
Yes. The issue is that the compute doesn't have enough cache AND you need scratch space to update pages. You could theoretically do it on the compute too, and this would be a valid design. It's a bit harder to make work with read replicas.
I read that pageservers are shared between users. What if the memory capacity of the pageserver becomes the bottleneck? E.g. some users perform full table scans and make most page accesses fall back to S3. Sorry for one more dumb question.
The answer is more pageservers. Right now we have a one-to-many relationship: one pageserver serves many tenants, but each tenant lives on one pageserver. We will shard pageservers and make it many-to-many. The good news is that the pageserver workload is constant space, so it's relatively easy to schedule, unlike query processing workloads, which have joins and are not constant in memory.
What sort of cold start times do you hope to achieve? If it's serverless with no cost while not in use (aka scale to zero), you can't run those containers all the time. For comparison, the serverless version of Azure SQL takes ~50 seconds to cold start.
How long before we get public access? I signed up for the beta a few weeks ago but I haven't heard back aside from having to fill in a bunch of surveys.
Let us bump you up. We received a lot of sign-ups, mostly due to the HN effect, and we are now onboarding as fast as we can while fixing small issues. Please email us at beta@neon.tech. We will be asking for feedback in return.
People say that. The underlying tech is different. It's closer to Aurora than to PlanetScale. PlanetScale is shared-nothing, which breaks MySQL compatibility. We are 100% Postgres compatible.
We are not ready to answer this because pricing is always a model that is built on top of your COGS (cost of goods sold) and validated on real usage. We just started to onboard users - we have hundreds now, but it's just the first week of doing it.
The rule of thumb for cloud infrastructure software margins is 60-70%, so we need to get there over time at least. Sorry for the non-answer; I hope we will sort it out soon.