I've had some middling success with this by using CLAUDE.md and language features. Two approaches in C#: 1) use partial classes and create a 'rule' in CLAUDE.md to never touch certain named files, e.g. User.cs (edits allowed) vs. User.Protected.cs (off-limits by convention), and 2) a no-AI-allowed attribute, e.g. [DontModifyThisClassOrAttributeOrMethodOrWhatever], plus instructions to never modify anything it's applied to. The second can be much more granular, and Claude Code seems to respect it.
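For illustration, a minimal sketch of the two approaches; the attribute name is a hypothetical stand-in and the CLAUDE.md wording is just an example:

    using System;

    // NoAIEdits.cs - hypothetical marker attribute for approach 2.
    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Method | AttributeTargets.Property)]
    public sealed class NoAIEditsAttribute : Attribute { }

    // User.cs - edits allowed.
    public partial class User
    {
        public string DisplayName { get; set; }
    }

    // User.Protected.cs - off-limits per the CLAUDE.md rule
    // ("Never edit *.Protected.cs files or members marked [NoAIEdits]").
    public partial class User
    {
        [NoAIEdits]
        public decimal CalculateBilling() => 0m; // hand-written logic stays here
    }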
In .NET consulting land many years ago I wanted the following: a faster way to build reports (primarily tables, sometimes with charts) for clients. I found the vast majority of the work I was doing was taking basic SQL queries and getting them into a fast, pretty JS/HTML front-end. There are some solutions out there, but they're pretty enterprisey, clunky, and expensive (e.g. SSRS and its ilk).
I ended up finding https://datatables.net to take care of the front-end side of things and duct-taped it to a mini-ORM (https://github.com/schotime/NPoco) that came with a simple parameter-safe query generator. The end result was the ability to write one POCO defining the columns and their metadata, combined with one POCO containing an in-code SQL query. Once those were defined, all a developer needed to do from the front-end was ask for the column POCO, and suddenly a fully server-side-powered table would appear with support for (server-side!) pagination, page saving, export to CSV/Excel, column hiding/showing, column rearrangement, etc.
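Roughly what that pair of POCOs looks like; the attribute shape and property names below are illustrative guesses, not our actual API:

    using System;

    // Stand-in for the [Data] column-metadata attribute mentioned later.
    [AttributeUsage(AttributeTargets.Property)]
    public sealed class DataAttribute : Attribute
    {
        public string SqlName { get; set; }   // expression the column maps to in the query
        public bool Sortable { get; set; }
        public bool Searchable { get; set; }
    }

    // Column POCO - this is all the front-end asks for to render the table.
    public class CustomerOrdersColumns
    {
        [Data(SqlName = "c.Name", Sortable = true, Searchable = true)]
        public string CustomerName { get; set; }

        [Data(SqlName = "o.Total", Sortable = true)]
        public decimal OrderTotal { get; set; }
    }

    // Query POCO - the raw SQL that the query builder later decorates with
    // WHERE / ORDER BY / paging before handing it to NPoco.
    public class CustomerOrdersQuery
    {
        public string Sql => @"
            SELECT c.Name, o.Total
            FROM Customer c
            JOIN [Order] o ON o.CustomerId = c.Id";
    }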
I had to write a ton of glue and learned a massive amount about SQL Server in the process. While the mini-ORM gave me a very simple way to inject things like WHERE clauses and ORDER BYs, I ended up writing all of the actual logic to convert the request coming from DataTables into actual SQL. At first I thought I was writing an ORM, but really it was just a very specific use-case query builder.
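A very stripped-down sketch of that conversion step, assuming NPoco's Sql builder and a simplified request object (the real DataTables protocol sends columns[i][...] and order[i][...] parameters that map onto this):

    using System.Linq;
    using NPoco;

    // Simplified stand-in for the DataTables server-side request.
    public class TableRequest
    {
        public int Start { get; set; }          // paging offset
        public int Length { get; set; }         // page size
        public string Search { get; set; }      // global search box value
        public string SortColumn { get; set; }  // must be validated against the column metadata
        public bool SortAscending { get; set; }
    }

    public static class TableSqlBuilder
    {
        public static Sql Build(string baseSql, TableRequest req, string[] searchableColumns)
        {
            var sql = Sql.Builder.Append(baseSql);

            // The search value is always parameterized; column names come from our own
            // metadata whitelist, never from the client.
            if (!string.IsNullOrEmpty(req.Search))
                sql = sql.Append(
                    "WHERE " + string.Join(" OR ", searchableColumns.Select(c => c + " LIKE @0")),
                    "%" + req.Search + "%");

            if (!string.IsNullOrEmpty(req.SortColumn))
                sql = sql.Append("ORDER BY " + req.SortColumn + (req.SortAscending ? " ASC" : " DESC"));

            return sql; // NPoco's Page<T>() then handles the paging from Start/Length
        }
    }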
Around the time I got that much of it working, a co-worker put me in contact with a friend of his at another consulting shop nearby who he swore was doing the exact same thing as me, but with Dapper (https://github.com/DapperLib/Dapper). A few beers and weeks later we had both sat down, evaluated each other's work, and decided to join forces. He ended up preferring NPoco and I ended up preferring a TON of his query generation (he also had support for basically all the non-complex types (read: everything except hierarchyid and friends), so it was an awesome merger).
In modern times, we've gotten to the point where you can write a single C# class that implements our IMagicalQuery interface with a single SQL statement inside. Using the magic of Roslyn and a Visual Studio plugin we wrote for ourselves, you can smack a button and it will generate all the other supporting classes needed, and update them if your query changes at all (one of our biggest problems was that changing the query required you to change [DataAttribute] values on the matching column metadata class, which was very easy to forget to do and annoying).
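One of those definitions ends up looking something like the following; IMagicalQuery's real shape is more involved, so treat this as a hand-wavy sketch:

    // Guessed-at minimal shape of the interface.
    public interface IMagicalQuery
    {
        string Sql { get; }
    }

    public class OverdueInvoicesReport : IMagicalQuery
    {
        public string Sql => @"
            SELECT i.InvoiceNumber, c.Name AS CustomerName, i.DueDate, i.Balance
            FROM Invoice i
            JOIN Customer c ON c.Id = i.CustomerId
            WHERE i.Balance > 0";
    }

    // The Roslyn-based plugin reads the SELECT list above and (re)generates the column
    // metadata class with matching [Data] attributes, so the query and metadata can't drift apart.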
We also have the awesome feature, born out of necessity, of removing joins when hiding columns. Say we've got a 30+ JOIN query (yes, that is necessary for some of our clients; no, it shouldn't be; we don't always get to design the databases!) and the client hides 1 of the 3 columns that depend on the joined Foobar table - nothing happens. But if they hide all 3, our magical query builder will drop that join entirely from the actual SQL that gets passed to the mini-ORM. The query in our C# class is untouched; it just deletes the offending join before shipping the SQL off to the server. We've been able to get some awesome performance out of that.
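The idea, reduced to a sketch (not our actual implementation): each join carries the list of columns that depend on it, and a join is only emitted when at least one of those columns is still visible in the current request.

    using System.Collections.Generic;
    using System.Linq;

    public class JoinClause
    {
        public string Sql { get; set; }                 // e.g. "LEFT JOIN Foobar f ON f.Id = x.FoobarId"
        public string[] DependentColumns { get; set; }  // columns that can't be produced without this join
    }

    public static class JoinPruner
    {
        public static IEnumerable<JoinClause> Prune(
            IEnumerable<JoinClause> joins, ISet<string> visibleColumns)
        {
            // Keep a join only if something it feeds is actually being displayed.
            return joins.Where(j => j.DependentColumns.Any(visibleColumns.Contains));
        }
    }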
It is sadly never going to be something we release because the reality is that it is a monster made by two savages. Technically either of us can push it out at any time - we both agreed on MIT from the get-go as we both also agreed it's a hydra that no one would ever want to buy. That said we've both profited - our shoppe is now known as the one that can get you the data your people need 10x faster than the other guys. We do SMB consulting so word of mouth is king.
I probably made it sound all roses and kittens, so let's talk negatives.
1) SQL Server only. We have tons of SQL Server-specific dialect stuff because, well, 99% of our clients use SQL Server.
2) We only support the bits of SQL we need. Initially it was the basics: numerics, approximate numerics, datetimes, strings. Later on we added TVP support, some very limited spatial support, and sql_variant. But it has all been as-needed.
3) Adding features that go up the entire vertical is...an experience. Pretty much only my partner in crime or I touch that. It basically requires you to get all of the following into your brain: the entire DataTables API; the 5 or so ES6 modules that wrap around DataTables and handle type conversions, data conversions, query mapping, etc.; the 20+ C# classes that do the heavy lifting to make sure type safety doesn't go boom, generate the SQL, and power the magical performance features like JOIN removal; and then how NPoco uses that data, in case something goes funky there. I generally set aside a full day or two whenever we need to add a full-stack feature because it's just a headache.
4) We write SQL in SSMS, literally copy-paste it into a C# @"" string, and then parse it with Roslyn later and rewrite it. I know exactly where I'm going in the afterlife...
For future stuff, we're planning a version that generates a stored procedure directly into an SSDT project so that we can actually store our SQL in a project that...is meant to store SQL (what a novel idea!). It would simplify our C# query to literally just be "EXEC <proc> <here's some params!>", but we haven't quite figured out how we're going to pull it off without losing features.
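Under that plan, the report class from the earlier sketch would shrink to something like this (the proc name and parameter are hypothetical):

    public class OverdueInvoicesReport : IMagicalQuery
    {
        // The Roslyn tooling would emit dbo.GetOverdueInvoices into the SSDT project and
        // keep its parameter list in sync with the column metadata - and would need a new
        // home for tricks like JOIN removal, which is the open question.
        public string Sql => "EXEC dbo.GetOverdueInvoices @CustomerId";
    }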
I've only done a few 30+TB moves, the largest being 50TB with a similar table structure - one table per machine holding calibration readings that were constantly streamed in from a MUCH beefier RabbitMQ instance. Old CPU, not a ton of cores, spinning disks, etc. Same base problem. I think the best thing to do here would be to change the order of some things around.
1. Buy the hardware first with blazing drives and roaring CPUs and get it stood up before you start trying to change your schema on a potato
2. Do his last step now: set up an AG and let SQL Server do its thing
3. Now go start planning your ETL/schema changes and writing scripts while you wait 1-3 days (or in my case 10 as we had to suspend during business hours) for data to migrate.
4. Fail over, promote, and start collecting data on your superbox
5. If the world didn't explode, do the same thing at your other DC
6. Now, if you absolutely must, apply and test all your schema changes in production instead of dev first, but at least enjoy the fact that those changes run exponentially faster!
You could also use log shipping, but with those restore times I think you'd have data loss. Haven't done that in years - I've been lucky enough to only deal with projects where we shut down, restore, and do a full-downtime migration. So much nicer...
Having step 1 of your solution be "eliminate the hardware constraints that made this job painful" is kind of a cheat. We can all do a better job on any problem if we lift those constraints.
I think we all agree on that, but the question was not whether this was an efficiently managed migration. I think we can assume OP made a similar argument up the chain (he got them to add SSD storage, at least).
I went on a similar journey, but for OpenID Connect. While the spec is fantastic https://openid.net/specs/openid-connect-core-1_0.html#Overvi..., I found the same thing to be true - very little explanation of why. For example, it's very clear how each flow works and therefore how to implement it, but not clear why there are so many of them. While researching and building my own implementation I eventually ran into IdentityServer3 https://identityserver.github.io/Documentation/ which had a nice intro video explaining things clearly. I also quit building my own at that point, since their offering is very well done and uses the same stack as the rest of our software. I wouldn't say the docs are a good resource, but they helped a bit. There's also a version 4 now, though the documentation looks about the same.
Also not a good resource, but acceptable: Pluralsight. There is one straight-up OAuth course covering all the basics and then quite a few language/framework-specific ones, e.g. how to implement OAuth in Node/ASP.NET/etc. The OAuth course was dry but had some decent information - though I did quit halfway through it because of IdentityServer, so take that with a grain of salt.
I really do recommend checking out IdentityServer4 though, unless you're implementing this specifically to learn / have fun / etc. And if you don't care for the Microsoft ecosystem, I've heard nice things about Hydra https://github.com/ory/hydra which is a similar Go offering.
Awesome, thanks for this. I think I've read the Road to Hell article 3 or 4 times at this point. It makes a little more sense each time as I learn more.
I totally agree there are so many resources about implementation, and honestly it's pretty straightforward. My guess is that because of this, people don't think to question it and simply assume it's all necessary. And maybe it is, but in my experience necessity is often tied to specific assumptions that may not be true for a given use case.
With OAuth in particular, I suspect a lot of the details are tied to the assumption that you have to do a full redirect in order to authenticate. But my emauth.io service uses email over a back-channel to authenticate, so the user can stay on the app page while they verify their identity. So at the very least you don't have to worry about redirect hijacking.
If you want a more batteries-included, less-glue-requiring IoC container, I can't recommend Autofac enough. Always up-to-date, responsive developers, thorough documentation. I chose it over ten years ago after reading https://www.manning.com/books/dependency-injection-in-dot-ne...? and have never looked back.
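For anyone who hasn't seen it, the basic usage is pleasantly small; a minimal sketch (the service types here are placeholders):

    using Autofac;

    public interface IReportService { void Run(); }
    public class ReportService : IReportService { public void Run() { /* ... */ } }

    public static class CompositionRoot
    {
        public static IContainer Build()
        {
            var builder = new ContainerBuilder();
            builder.RegisterType<ReportService>().As<IReportService>().InstancePerLifetimeScope();
            return builder.Build();
        }
    }

    // Resolve from a scope so disposal is handled for you:
    // using (var scope = CompositionRoot.Build().BeginLifetimeScope())
    // {
    //     scope.Resolve<IReportService>().Run();
    // }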
More OVH numbers using your same pgbench settings. I've got a $99/mo GAM1 - Intel i7-7700K (no OC), 4C/8T @ 4.2GHz, 64GB RAM, no SoftRAID (using ESXi), 2x450GB NVMe - running 3 VMs. Two VMs are using < 50 MHz; the other is using 30% of the CPU for an empty Minecraft server.
With the Minecraft server on:
progress: 60.0 s, 6956.5 tps, lat 1.437 ms stddev 1.485
With the Minecraft server off:
progress: 60.0 s, 7275.5 tps, lat 1.374 ms stddev 1.975
Looks like not having SoftRAID is killing me. I wish I could test with it on so I could see a comparison of the E3 vs. the 7700K, but I would need to re-image.
For fun, a random $5 lowest-tier DigitalOcean droplet running a TeamSpeak server (which is using about 0.07% CPU):
progress: 60.0 s, 1507.3 tps, lat 6.631 ms stddev 5.809
And for even more fun, WSL running on my personal/WFH rig - i7-7820X @ 4.8GHz OC, 1TB 950 Pro NVMe, 32GB RAM (I let this one run a bit longer since WSL has performance issues and I was curious whether I'd see variance):
progress: 60.0 s, 1172.4 tps, lat 8.516 ms stddev 6.017
progress: 120.0 s, 1271.3 tps, lat 7.863 ms stddev 4.817
progress: 180.0 s, 1274.0 tps, lat 7.849 ms stddev 4.943
progress: 240.0 s, 1266.1 tps, lat 7.896 ms stddev 4.875
progress: 300.0 s, 1239.2 tps, lat 8.069 ms stddev 5.211
progress: 360.0 s, 1213.1 tps, lat 8.242 ms stddev 5.447
progress: 60.0 s, 3295.3 tps, lat 3.034 ms stddev 1.906
progress: 120.0 s, 3384.4 tps, lat 2.955 ms stddev 2.341
progress: 180.0 s, 3294.9 tps, lat 3.035 ms stddev 2.384
progress: 240.0 s, 3025.3 tps, lat 3.305 ms stddev 2.158
progress: 300.0 s, 3158.5 tps, lat 3.166 ms stddev 2.435
progress: 360.0 s, 3097.8 tps, lat 3.228 ms stddev 3.078
progress: 60.0 s, 4114.9 tps, lat 2.429 ms stddev 7.033
progress: 120.0 s, 4143.9 tps, lat 2.414 ms stddev 6.656
progress: 180.0 s, 3845.5 tps, lat 2.598 ms stddev 7.774
progress: 240.0 s, 4324.4 tps, lat 2.314 ms stddev 7.129
progress: 300.0 s, 3892.1 tps, lat 2.569 ms stddev 8.179
progress: 360.0 s, 4071.5 tps, lat 2.457 ms stddev 7.833
Doesn't look like they scale too well, unfortunately.