MiniCouchDB in Rust

lukevp · on June 7, 2020

This is really cool, thanks for your work! I really love couchdb. I have seen tickets for document based ACLs on their github recently, I think this is the last feature they need to open up usage across many more domains. The core app works great for replication between dbs and to the edge using pouchdb, but almost everyone is turned off by the database-per-user that is required to use it. If they solve that I think their adoption could skyrocket, if they can get over the hurdle of years of neglecting this core feature where everyone became disillusioned by it. It’s positioned as a “put your db up on the internet and interact/replicate over http” but it doesn’t work for most non trivial use cases because everyone can then see the data, so you end up layering adapters on to do ACL. This should really just be built in to the core app.

xrd · on June 7, 2020

This is interesting to me.

One of the things that really turned me off from firebase is the rules system where you have to build an ACL there. It's been a few years since I've played heavily with it, but I really disliked trying to essentially build my authorization system into a backend which I couldn't run locally and was primarily edited in their web console.

For me, writing a proxy that sits in front of couchdb is very simple. I use JWT tokens that get passed between components in the system. I keep my authorization logic out of couch. I can write normal unit tests on my proxy. And my pouchdb client code mirrors the structure of my backend structure, which is a mental model I really prefer.
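A minimal sketch of the kind of check such a proxy could run before forwarding a request, assuming HS256-signed tokens and a hypothetical `db` claim that names the one database the caller may touch. The function names and the claim are illustrative, not taken from any real setup, and expiry checks are left out for brevity.

```python
import base64
import hashlib
import hmac
import json


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token (only used here to make the sketch testable)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes):
    """Return the claims if the signature is valid, otherwise None."""
    try:
        header, payload, sig = token.split(".")
        expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return None
        if json.loads(_b64url_decode(header)).get("alg") != "HS256":
            return None
        return json.loads(_b64url_decode(payload))
    except (ValueError, TypeError, AttributeError):
        return None


def may_forward(claims, path: str) -> bool:
    """Allow the request only if it targets the database named in the (hypothetical) db claim."""
    if not claims:
        return False
    db = path.lstrip("/").split("/", 1)[0]
    return db != "" and db == claims.get("db")
```

Because the check is a pair of plain functions, it can be unit tested without a running CouchDB, which is the property the comment above values.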

I guess I'm saying I think having a separate database for each user makes more sense to me, not less.

The only thing I'm struggling with is running code when a document is updated. I have a polling client that watches for the _global_changes updates, but it seems really hacky. I wish there was a better way to get access to all database changes that looked like firestore functions.

kiwicopple · on June 8, 2020

This is almost exactly the architecture I’m thinking about for Supabase, except using our Real-time server instead of polling. If you’re open to it, I’d love to chat to you about how you’ve implemented this? My email is in my profile

tehbeard · on June 7, 2020

_changes should be available in an event stream format for continuous consumption, and you can apply filter functions to the feed?

xrd · on June 7, 2020

It is, but I am using the global changes feed (so I don't need to create a process for each database which doesn't scale for me). And you have to manually create that database to have it work. And I have to filter based on my own code, I just wish there was a way to subscribe to high level events. In general it works fine but I keep thinking there could be a better way.
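A rough sketch of the "filter based on my own code" step, assuming the feed is consumed as one JSON object per line with blank heartbeat lines, and assuming global-changes rows carry ids of the form `event:dbname` (for example `updated:somedb`). The handler table and function names are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Tuple

Handler = Callable[[str], None]


def parse_global_change(line: str) -> Optional[Tuple[str, str]]:
    """Turn one feed line into (event, database), or None if it is not an event row."""
    line = line.strip()
    if not line:
        return None  # heartbeat line
    row = json.loads(line)
    event, sep, db = row.get("id", "").partition(":")
    if not sep or not db:
        return None  # e.g. a trailing last_seq row
    return event, db


def dispatch(lines: Iterable[str], handlers: Dict[str, Handler]) -> int:
    """Call the handler registered for each event type; return how many rows were handled."""
    handled = 0
    for line in lines:
        parsed = parse_global_change(line)
        if parsed is None:
            continue
        event, db = parsed
        handler = handlers.get(event)
        if handler is not None:
            handler(db)
            handled += 1
    return handled
```

Usage would look like `dispatch(feed_lines, {"updated": on_db_updated})`, where `feed_lines` is whatever iterator the HTTP client exposes over the response body.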

garrensmith · on June 7, 2020

Thanks for reading. We have some plans around this, see the RFC - https://github.com/apache/couchdb-documentation/pull/424

In general, we are trying to move away from running your whole application in CouchDB. We would prefer that you have an application layer in front of your CouchDB instance. We recommend that you put up a proxy if you want to expose your replication to PouchDB. That way you can add security around that endpoint.

lukevp · on June 7, 2020

Thanks for the reply, that’s the RFC I was referring to, and it’s great that that’s being reviewed! If you look back on the history of couchdb on HN for example, the lack of document based ACLs continues to come up as a serious limitation of couchdb. Because it speaks http(s) natively, it is a perfect db that can be used directly in place of a lot of the rest of your stack if your use case is a MBaaS. I see couch primarily positioned to compete with firebase firestore, but the perspective I see from maintainers is that it should be competing with mongodb instead (especially with the integration of mango queries and the push to put proxies in front of it.)

I feel that moving further in this direction to make couch a competitor at the DB layer will be the death of couchdb in the long run, because it is removing the only real advantage it has over its peers, which is https based replication to the browser and between peer DBs, and the fact that it’s built on Erlang and can handle replication to many peers with ease. If you make us proxy and wrap the Erlang app in our crappy (excuse the language) business layers like Java and C# to build the proxy ACLs against a separate DB, you’ve totally covered up the huge benefit of the safety and concurrency of Erlang in handling the replication to endpoints for us.

kungfooguru · on June 7, 2020

Curious why they suggest a reverse proxy. It may be a holdover from a time before Erlang SSL got so many improvements? Heroku moved SSL termination to Erlang from ELB's and saw great improvements, plus they released a useful lib https://github.com/heroku/snit

I suppose it could be related to their use of mochiweb still for the web layer. Maybe they'll add on HTTP/2 eventually and no longer recommend a reverse proxy.

There are alternatives for HTTP/1.1 and HTTP/2 in Erlang, like Elli (http/1) https://github.com/elli-lib/elli and Chatterbox (http/2) library https://github.com/joedevivo/chatterbox

daleharvey · on June 7, 2020

This is an architecture I use a lot. The reverse proxy is there to implement the replication, having your application understand the replication protocol is a big ask. Instead, your app takes http requests, handles the ones it is supposed to, and forwards database requests directly to (c|p)ouchdb (after checking auth)

https://github.com/daleharvey/noted/blob/master/index.js is a very simple example of how it can work

lukevp · on June 7, 2020

Right, but if your application is just a straight proxy that’s stripping out a bearer jwt or something to validate, you can’t control which documents are being synced. So you have to have 1) some understanding of the underlying protocol so you can parse the request and reject it based on user if that user does not have read/write access to the db, or 2) replicate your authorization code into couchdb with users and db permissions, at which point you have the same db per user issue (or lack thereof) plus you now have 2 user stores and 2 layers. Once this gets sufficient complexity and you try to architect something non trivial (eg. User a owns write access to document b but shares read only access to document b to user C) you end up praying to god you built the protocol parsing just right or you might accidentally let users arbitrarily write to each other’s documents.
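To make option 1 concrete, here is a hedged sketch of a default-deny rule for the db-per-user case: the proxy forwards only an explicit allowlist of method and endpoint pairs, and only for the caller's own database. The `userdb-<hex of username>` naming follows the couch_peruser convention as I understand it; the allowlist itself is illustrative and not a complete list of what replication needs.

```python
# (method, first path segment after the database name); "" means the database root.
ALLOWED = {
    ("GET", ""),
    ("GET", "_changes"),
    ("POST", "_changes"),
    ("POST", "_revs_diff"),
    ("POST", "_bulk_docs"),
    ("POST", "_bulk_get"),
    ("GET", "_local"),
    ("PUT", "_local"),
}


def user_db(username: str) -> str:
    """Database name for a user, assuming the userdb-<hex> convention."""
    return "userdb-" + username.encode("utf-8").hex()


def is_allowed(username: str, method: str, path: str) -> bool:
    """Default deny: anything not explicitly listed for the user's own database is rejected."""
    parts = [p for p in path.split("?", 1)[0].split("/") if p]
    if not parts or parts[0] != user_db(username):
        return False
    endpoint = parts[1] if len(parts) > 1 else ""
    return (method.upper(), endpoint) in ALLOWED
```

This only covers the easy case. As soon as two users share one document, the rule has to look inside request bodies and changes rows, which is exactly where the risk described above comes from.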

I acknowledge that this is the same issue in a traditional dbms, but a traditional dbms doesn’t try to handwave away this complexity from you. It doesn’t show examples on its site about how easy replication is to set up, only for you to be betrayed later, after you’ve already integrated and marveled at the couchdb sync performance, when you find you can’t build real permissions and would’ve been better off using Postgres and getting a bigger community and ACID/CP. And let’s be honest, almost every app is gonna need a rdbms at least for transactional data anyway (we know you’re not storing stripe billing records in your couchdb), so then the question becomes, is the sync protocol even good enough anymore to warrant using couchdb?

lukevp · on June 7, 2020

We really shouldn't be writing our own authentication layers (or anything security-related for that matter) unless necessary.

Case in point, this example code has a massive security issue that allows anyone to impersonate without tokens if there is an active authentication request open. Hopefully no one has used this example code to build a production system that has real user data.

This is a perfect example of why this should really be part of CouchDB/PouchDB itself and not something each person must write themselves. This should be solved once, solved right, vetted by the community, and be easy to fall into a pit of success.

I really like CouchDB and PouchDB as a product, but this insistence that this is the right path is really holding you guys back.

matlin · on June 7, 2020

You're probably right about writing your own authentication layer but it still shouldn't be part of CouchDB or PouchDB. A better solution is for some OSS project to build a standard proxy that applies the document level authorization that everyone is asking for. No reason for it to be built-in.

lukevp · on June 7, 2020

I would agree if couchdb wasn't positioning itself as being exposed directly on the internet. If CouchDB is meant to be proxied behind another system / auth stack, then why does it have CORS support and cookie auth built in?

Both of those features are purely a client-side concern and exist because the original intent of CouchDB was real-time replication to browsers. Otherwise the proxy could do the CORS as well as the authentication and CouchDB would only require an http-level authentication pattern (like basic auth).

Having said that, I do agree that this should be compartmentalized and the end user should be able to pick and choose what features they want to allow, but I don't think that this should continue to be a separate concern that everyone is building themselves, it should be a first-party solution.

foolmeonce · on June 7, 2020

It seems to me like the pieces never really came together at the right time to make couchapps and I find that a bit sad.. I don't really like having apache in the middle, but ssl wasn't ready, then references to external services went away, now html view designs drop out just as the JS support is modernized..

I think it's generally a problem that taken alone these features aren't useful until they are all together good enough to invert the system such that the DB answers and occasionally proxies instead of the other way around.

lukevp · on June 7, 2020

Great perspective. Totally agree. We are using nginx to proxy to couchdb and do the ssl termination, and it’s great that we can proxy the DB traffic like this since it’s all just http, but couch has always felt one step away from being an absolutely killer dominator of the MBaaS space to me. It’s like a self hosted firestore, except it has no story around auth. If it also had a way to trigger user code on the server side in other languages that would be incredible.

xrd · on June 7, 2020

This is a great comment.

In all fairness, they did have an auth story, right, and recent documentation suggests they reconsidered that path and now suggest keeping it out of couchdb. So, to me, this says auth was something they never could get right because it is complicated. And I took that a step further to think that it's better to have it outside because you can use any auth solution you want, instead of what the couchdb people felt was the best way to do it (it's smart that they realized they got it wrong and changed course).

I'm confused why everyone here seems to think reverse proxies and auth proxies are complicated. Isn't it the case that all apps of any complexity are a bunch of small services wired together behind a proxy? My auth proxy is all of 50 lines of code, my reverse proxy is 8 lines in nginx conf and it's all held together with a docker compose file that is declarative and works locally as well as on my production server.

lukevp · on June 7, 2020

Thanks for the shout out.

The problem with auth is in the authorization realm, not the authentication realm. It's super easy to do authentication in nginx for example using client certificates, basic auth, api keys, or in your app layer via checking a separate DB or cache for tokens, say via a JWT.

However, it's not trivial to say user "a" owns document "b" but not document "c", and that user "a", in fact, shouldn't even know that document "c" exists at all, while also maintaining the replication that CouchDB has built in.

What we want is one single DB that has all documents, and can replicate all documents on the server level with another DB, but can expose a user-specific changes feed that replicates only data that user has access to. I should be able to grant / revoke access to a document at any time and have it propagate, and all documents should be owned only by the creating user by default.
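A toy model of that behaviour, purely as a sketch: one store holds an ACL per document, new documents default to owner-only, and each user's changes feed is the global feed filtered by read access. The class and method names are made up for illustration and are not taken from the RFC or from CouchDB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Acl:
    owner: str
    readers: Set[str] = field(default_factory=set)


class AclStore:
    def __init__(self) -> None:
        self._acls: Dict[str, Acl] = {}

    def create(self, doc_id: str, user: str) -> None:
        # Owner-only by default, never wide open.
        self._acls[doc_id] = Acl(owner=user)

    def grant_read(self, doc_id: str, user: str) -> None:
        self._acls[doc_id].readers.add(user)

    def revoke_read(self, doc_id: str, user: str) -> None:
        self._acls[doc_id].readers.discard(user)

    def can_read(self, user: str, doc_id: str) -> bool:
        acl = self._acls.get(doc_id)
        return acl is not None and (user == acl.owner or user in acl.readers)

    def can_write(self, user: str, doc_id: str) -> bool:
        acl = self._acls.get(doc_id)
        return acl is not None and user == acl.owner

    def changes_for(self, user: str, changes: List[dict]) -> List[dict]:
        # Rows for unreadable documents are dropped, so the user never learns they exist.
        return [row for row in changes if self.can_read(user, row["id"])]
```

Note that revoking here only hides future changes. Removing copies a client has already replicated is a separate problem that this sketch does not touch.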

The choice they are requiring us to make is to shard data ourselves by user (which is unusable in the context of 2 users sharing data), or implement a separate layer that can understand CouchDB replication and can do the filtering of the changes feed as well as the write access to ensure that the documents are restricted correctly.

Take this a step further and now that I have to implement my own authentication AND authorization outside of CouchDB and protect it from the end user directly accessing it, and you could ask the question of why do I even need CouchDB then? Why not just speak CouchDB replication protocol on top of my own auth database and store documents there too?

xrd · on June 7, 2020

I was thinking about my comment, and thought: "I need to ask if you mean authz or authn?" That's funny you said it first.

All those things you note are important, and I just can't see how CouchDB would get those things right inside CouchDB.

My impressions with Firebase/Firestore were that they tried to do that with their "rules" system, and it was a not-quite-JS declarative system that relied on you understanding the non-standard parts of their auth system and the things they exposed. I always felt like this was going to be a huge hole in my app and I would have no way to validate all the edge cases. I feel like this is a complicated beast to do in a generic way and CouchDB was smart to leave it to the app developer, rather than the devops/sysadmin role.

Aren't you being a little stingy with your appreciation for the sync part of CouchDB/PouchDB and a little bit overblown with your worries about understanding the CouchDB "protocol?"

Replication and sync are really challenging problems even if you just think about sharing data in two places, and when you start having your JS code deal with revisions, it gets messy really quickly. That's what appealed to me about PouchDB was never having to really think about sync, other than how to handle conflicts.
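For a sense of what "handling conflicts" involves, here is a small sketch of deterministic winner selection among conflicting leaf revisions. It assumes the rule as I understand it from the CouchDB docs: non-deleted revisions beat deleted ones, then the higher generation number wins, then the higher revision hash breaks ties. The dict shape is hypothetical.

```python
from typing import List, Tuple


def rev_key(rev: str) -> Tuple[int, str]:
    """Split a revision id like '2-abc' into (generation, hash)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(leaves: List[dict]) -> dict:
    """Pick the same winner on every replica, given the same set of leaf revisions."""
    return max(
        leaves,
        key=lambda leaf: (not leaf.get("deleted", False), *rev_key(leaf["rev"])),
    )
```

The losing revisions are not discarded. They stay available as conflicts, and the application still has to decide whether to merge or delete them.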

But, the CouchDB protocol is just HTTP. And, making a proxy to talk to that is as simple as importing a http proxy module. You are just responsible for your authz logic, which is hard, but at least you can make it exactly the way you want it. I don't really see how mapping your authz logic onto standard HTTP verbs like POST, GET, PUT is that complicated.

Having said all this, you clearly have thought through this stuff deeply, so I'm very interested in hearing your thoughts here because I'm sure my comments are wrong past the surface.

xrd · on June 7, 2020

And, this might be an interesting open source project to collaborate on. I'd be very game for that. xrdawson at the google owned mail system. :)

tgtweak · on June 7, 2020

In general the fewer technologies you need to bring into the mix the fewer failure domains you need to consider. Let's also add that having an option to provide this natively allows for safe/sane defaults that prevent someone from botching a reverse proxy authentication setup which is by no means trivial to your application developer.

I can't see someone making a meaningful document level ACL outside of the db without some serious effort, nor would I consider home-baked external authentication for replication simple for the same crowd.

As a side note, I don't think it's good practice to expose your db to the internet, but if you look at Mongo's Atlas this doesn't seem to be as big of an issue as it may have been in the past.

karmelapple · on June 7, 2020

We moved to CouchDB because we didn’t want to reinvent syncing. We wanted a tech that could handle the online and offline case well, and CouchDB and some supporting libraries (PouchDB, some native mobile OS libraries) helped with this.

Syncing was a solved problem in the CouchDB world, so we moved to CouchDB.

Access controls also seem like they should be a “solved problem,” something we can adopt that works well with CouchDB.

Is there some plan for a standard ACL approach for CouchDB users? If we need to put it into some app in front of the database, that eliminates the “solved” nature of sync... or at least introduces added complexities like a reverse proxy.

I really hope CouchDB could bring in some ACL management into the ecosystem as a standard, drop-in sort of approach, and not something that requires custom development for every single app that wants to use CouchDB / Cloudant.

lukevp · on June 7, 2020

That RFC/PR adds ACLs at the document level to CouchDB, but it has been open for a year now. I hope they merge it as well. Until that PR the party line has always been to do ACL outside the DB like you would with any other DB, which really doesn’t make much sense to me.

tgtweak · on June 7, 2020

Have a look at the ($3000/core) enterprise version of Microsoft SQL server and check the field and row level permissions with full inherited RBAC... It's a feature that has some need in the enterprise world. Ignoring these features because you can do it in the application is a bit of a sabotage... take the logic far enough and you'll eventually get to the conclusion that maybe you should build your own DB too.

karmelapple · on June 7, 2020

I just read this more carefully, and my gut reaction is disappointment, but hopefully you can clarify if I’m misunderstanding anything.

Right at the top it says “Make the db-per-user pattern obsolete.”

Db-per-user enables trivial syncing between multiple devices for a single user. That means someone’s tablet, phone, and laptop can all stay in sync thanks to CouchDB’s replication. And we can do it without building anything extra, as long as we have a client library that talks CouchDB’s http API.

Does this new approach eliminate that possibility? Will we now need a server in front of the database for that?

lukevp · on June 7, 2020

No, the problem right now is couchdb has no ACLs at the document level so a common pattern is to use a separate DB per user. If there are document level ACLs, you will still have to create a user in couchdb per user but can now use a single database and assign the permissions at the document level. You could continue using the current pattern if you wanted to. This new architecture has HUGE ramifications though because it enables easy ways to share / make public data just by changing ACLs, and it drastically simplifies backup and restore as well as multi master replication because a super-user (admin) can be used server side to sync everything. Honestly this 1 change should have been a main priority since the inception of couchdb because it further expands the key differentiator of couchdb, which as you said is to enable endpoint syncing for a subset of the overall documents down to end user devices for more performant access, creation, and offline use. This is assuming there is a way to default ACLs on documents to the creating user or role and not wide open. The wide open by default nature of couchdb was a major misstep and why they disabled “admin party” in 3.0.

MuffinFlavored · on June 7, 2020

can’t you store a user_id property/column and do checks based on that?

lukevp · on June 7, 2020

You can, but one of the big advantages of couch/pouch is that you can replicate all the way down to the edge and back in real-time. If you have to put up an auth layer in between it becomes more of a traditional architecture. You can see this in RxDB, a js replicating datastore with observable events on the client side. They built support for both couchdb and GraphQL based replication, so I think this plus hasura would get you replicating GraphQL + a way to easily implement auth within hasura, and a reusable api that would work outside of the pouch/couch replication. Not to mention you also get schemas on both sides.

That’s the main issue with this whole thing. The solid replication and change events and conflict resolution all together is the only way that couch is compelling at all compared to other dbs in my opinion, and having to partition the db to the user level for security without adding a proxy just defeats the whole purpose for most use cases.

rstarast · on June 7, 2020

Oh, I first thought this was mimicking the pretty complete Go https://github.com/alicebob/miniredis, which goes to some effort to be a complete and faithful implementation of the Redis interface.

Still pretty cool, do you intend to develop this further?

garrensmith · on June 7, 2020

Thanks. I'm not sure. I have some other project ideas I would probably work on next. But I might use this as my go to project to test out specific Rust patterns I've learnt.

ruuda · on June 7, 2020

The page only says “You need to enable JavaScript to run this app.”

I expected some kind of live demo, so I enabled Javascript. It turned out to be a static page with text. Not even a toggle-able menu on narrow screens, just a static page.

Why does it need Javascript to render a static page?

Wingy · on June 7, 2020

Probably so that the author can learn whatever JS framework they used. Overkill tech to learn on a personal site before a production site.

radicalriddler · on June 8, 2020

It is React, based on the unminified source files that I can see.

Dowwie · on June 7, 2020

The 'mini-redis' project wasn't intended to build a redis clone as much as it was to teach many of the async I/O concepts in Rust with an example that uses an extensive amount of code comments. Such examples have been missing from the Rust ecosystem.

If you were inspired by mini-redis, why not go the extra mile and turn it into another great educational resource?

nullwarp · on June 7, 2020

This site is completely unusable without javascript, just spits out "You need to enable JavaScript to run this app."

dmix · on June 7, 2020

Both helpful git repos for learning how to architect async Rust apps using Tokio. `mini-redis` is especially well documented (I understand the current MiniCouchDB is still early!). Thanks for sharing.

jamil7 · on June 8, 2020

This is a very cool project and a fun read, I didn't know CouchDB 4 would be built on FoundationDB. Without trying to hijack this thread, I've been scratching my head lately looking for an open source solution to offline-first, collaborative document editing on native mobile clients. The closest I can get is couchbase lite, does anyone know if something like this could be built for CouchDB? Or any other solutions?

anilgulecha · on June 8, 2020

I've been working with yjs, and that is probably the best option for collaborative editing at the moment. Look into it. Reach out if you have any questions.

_npki · on June 8, 2020

https://pouchdb.com/ is a JavaScript port of CouchDB, running in the browser and capable of replicating with normal CouchDB.

jamil7 · on June 8, 2020

Thank you! I have come across it before but wasn't sure how I could run it in a native (kotlin / swift) environment. From memory it also isn't particularly well supported in React Native.

js4all · on June 7, 2020

Awesome, I tried to get it running, but I get value: Missing("couch_directory"). There is nothing mentioned in the readme about this. Can anyone help?

kiddico · on June 7, 2020

Just a heads up: in dark mode your Opensource page doesn't adjust font color (black text on dark purple.)

I wish I knew rust and could say something more substantial lol.

garrensmith · on June 7, 2020

Awesome thanks for the heads up. I just fixed it.

aswanson · on June 7, 2020

If you think about it, all future database development should be written in Rust.

asjw · on June 7, 2020

Because the other several DBMS not written in Rust have shown what problem exactly?

kasperni · on June 7, 2020

You didn't really want to write a database, did you?

aswanson · on June 8, 2020

Nope.