This incident highlights a problem that is often overlooked in the debate about feature branches versus feature toggles.
I've worked with both feature branches and feature toggles, and while long-lived feature branches can be painful to work with, what with all the merge conflicts, they do have the advantage that problems tend to be uncovered and resolved in development before they hit production.
When feature toggles go wrong, on the other hand, they go wrong in production -- sometimes, as was the case here, with catastrophic results. I've always been nervous about the fact that feature toggles and trunk based development mean merging code into main that you know for a fact to be buggy, immature, insufficiently tested and in some cases knowingly broken. If the feature toggles themselves are buggy and don't cleanly separate your production code from your development code, you're asking for trouble.
This particular case had an additional problem: they were repurposing an existing feature toggle for something else. That's just asking for trouble.
That's interesting. Whenever I have an issue with a flag, it gets picked up in the dev/test/uat environments (everything gets tested, especially that the code behaves the same as before with the flag off). The code change never reaches production. And if for some reason the code under the flag is wrong and it has reached production (something unexpected, unseen), undoing the change takes only as long as it takes to switch the flag back (plus however long the cache takes to update, if you have a cache).
That's a good approach if you can cleanly separate out the old code from the new code, and if you can make sure that you've got all the old functionality behind the switch. Unfortunately this can be difficult at times. Feature toggles involving UI elements, third party services or legacy code can be difficult to test automatically, for example. Another risk is accidental exposure: if a feature toggle gets switched on prematurely for whatever reason, you'll end up with broken code in production.
The cases where I've experienced problems with feature toggles have been where we thought we were swapping out all the functionality, but it later turned out that, due to some subtlety or nuance of the system that we weren't familiar with, we had overlooked something.
Feature toggles sound like a less painful way of managing changes, but you really need to have a disciplined team, a well architected codebase, comprehensive test coverage and a solid switching infrastructure to avoid getting into trouble with them. My personal recommendation is to ask the question, "What would be the damage that would happen if this feature were switched on prematurely?" and if it's not a risk you're prepared to take, that's when to move to a separate branch.
Railway-oriented programming is an interesting concept and it has its use cases, but it needs to come with a massive health warning. I've often seen it used in practice to reinvent exception handling badly, and this is something I consider particularly ill advised, because exceptions, when understood and used correctly, provide a much cleaner and more effective way of handling error conditions in most cases.
The thing about exceptions is that in most cases, they make the safe option the default. An error condition is an indication that your code cannot do what its specification says it does, and in that case you need to stop what you are doing, because to continue regardless means that your code will be operating under assumptions that are incorrect, potentially corrupting data. Error conditions can happen for a wide variety of reasons, many of which you do not anticipate and cannot plan for, and in those cases the only safe option is to clean up if necessary and then propagate the error up to the caller. Exceptions do this automatically for you by default (you need to explicitly override it with a try/catch block), but alternative approaches, such as railway oriented programming, require you to add a whole lot of extra boilerplate code that is easy to forget and easy to get wrong. If you can't handle the error condition on the way up the call stack, you log it at the top level and report a generic error to the user.
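To make the "safe by default" point concrete, here is a minimal C# sketch (the order-processing names are invented for illustration): the intermediate methods contain no error-handling code at all, yet a failure can never be silently swallowed -- it propagates until it reaches the one place that deals with it.

    using System;

    public class OrderProcessor
    {
        // Neither of these methods mentions errors at all: if anything below them
        // fails, execution stops and the exception propagates automatically.
        public void PlaceOrder(string orderId) => ChargeCustomer(orderId);

        private void ChargeCustomer(string orderId) =>
            throw new InvalidOperationException($"Payment gateway rejected order {orderId}");

        // The single place where unhandled failures are dealt with: log the details,
        // show a generic message, and never pretend the operation succeeded.
        public void HandleRequest(string orderId)
        {
            try
            {
                PlaceOrder(orderId);
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine(ex);                       // stand-in for the error log
                Console.WriteLine("Sorry, something went wrong."); // generic message to the user
            }
        }
    }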
Having said that, I see two particular use cases for this kind of technique. The first is situations where you need to handle specific, well defined and anticipated errors right at the point at which they occur. Validation is one example that comes to mind; another example is where you are trying to fetch a file or database record that does not exist. The second is situations where exception handling is not available for whatever reason. Asynchronous code using promises (for example with jQuery) is pretty much an exact implementation of railway oriented programming, but since modern JavaScript has async/await, we can now use exception handling in these scenarios.
> Exceptions do this automatically for you by default (you need to explicitly override it with a try/catch block), but alternative approaches, such as railway oriented programming, require you to add a whole lot of extra boilerplate code that is easy to forget and easy to get wrong.
The unfortunate thing about exceptions (in mainstream languages) is that they handle this invisibly. Figuring out, at compile time, what sort of exceptions can appear inside a given function is not obvious.
That's the big payoff of ROP: you can look at any function signature and immediately know what sort of errors can come out of it.
Mitigating the downside of ROP (boilerplate) can be done to various extents, depending on the language. Haskell has do-notation. In F#, using the result computation expression [0] can make your code extremely clean:
    type LoginError = InvalidUser | InvalidPwd | Unauthorized of AuthError

    let login (username : string) (password : string) : Result<AuthToken, LoginError> =
        result {
            // requireSome unwraps a Some value or gives the specified error if None
            let! user = username |> tryGetUser |> Result.requireSome InvalidUser
            // requireTrue gives the specified error if false
            do! user |> isPwdValid password |> Result.requireTrue InvalidPwd
            // Error value is wrapped/transformed (Unauthorized has signature AuthError -> LoginError)
            do! user |> authorize |> Result.mapError Unauthorized
            return user |> createAuthToken
        }
Could we do the reverse, i.e. mitigate the downside of exceptions? Is there a linter, code analyzer, or some other compile-time tool that can integrate with a Java IDE and automatically display the uncaught exceptions that might be thrown by a given line of code?
Java has a compiler check that forces you to write catch blocks or `throws` clauses on functions that call other functions which might throw.
The feature is called "checked exceptions", and I believe it has largely fallen out of favour because of its inconvenience.
Sometimes it feels like developers are going in circles while trying to find the most optimal way to handle errors.
Checked exceptions are one of the main reasons I’m sticking with Java, even though Java lacks the ability to abstract over sets of checked exceptions, which does cause some inconvenience. It’s unfortunate that no other mainstream languages have been taking that approach.
With result types, you typically don’t get automatic exception propagation. I agree that overall it’s a spectrum of syntactic convenience; checked exceptions effectively form a sum type together with the regular return type.
That sounds awesome! Do you have that flag set on a big codebase? Was it a big hassle to turn it on (like, did you have to remediate a bunch of code that didn't handle exceptions before you could check it in)? Have you seen any big changes since enabling it?
People are now realizing that having the errors a function can cause right in the type system may actually have been a good idea, but when you point out that Result is not the only way and that Java's checked exceptions do the exact same thing (and so does the Zig error handling mechanism, which is a third variant of the idea), they come up with all sorts of easily dismissible nonsense to explain why the two are very different.
Well, that's kind of true! The fact that checked exceptions are inconvenient doesn't change the fact that they are equivalent to returning a Result type (the implementation is obviously different, but I think we don't need to mention that).
A future version of Java could totally make it more convenient, and perhaps even make the implementation cheaper, such that it would be not just nearly the same, but literally the same as in Rust or other similar languages.
Does it even need a new language version? If the compiler already spits out the error "hey, your Fart() function should be annotated with 'throws ButtsException'", couldn't an IDE relatively easily be configured to automatically add the `throws` clauses?
I’m not the parent, but exception declarations are IMO necessary for a stable API contract. It’s exactly the same reason why return types are explicit. The actual issue in Java is that you can’t abstract over an arbitrary-length list (sum) of checked-exception types -- i.e. variadic type parameters -- except when rethrowing from multi-catch clauses.
> The unfortunate thing about exceptions (in mainstream languages) is that they handle this invisibly. Figuring out, at compile time, what sort of exceptions can appear inside a given function is not obvious.
Figuring out, at compile time, what sort of exceptions appear inside a given function is a futile exercise in many contexts, and railway oriented programming does not fix it. Java tried this with checked exceptions and it fell out of favour because it became too unwieldy to manage properly.
In any significantly complex codebase, the number of possible failure modes can be large; many of them are ones that you do not anticipate, and of those that you can anticipate, many are ones that you cannot meaningfully handle there and then on the spot. In these cases, the only thing that you can reasonably do is propagate the error condition up the call stack, performing any necessary cleanup on the way out.
"Handling this invisibly" is also known as "convention over configuration." In languages that use exceptions, everyone understands that this is what is going on and adjusts their assumptions accordingly.
> Java tried this with checked exceptions and it fell out of favour because it became too unwieldy to manage properly.
Because they did a half-assed job of it, and required the user to explicitly propagate error signatures. Inference and exception polymorphism are essential.
Checked exceptions always seemed to me to be an exercise in self-flagellation and enumerating badness, when most of the time there are a handful of specific errors that require special handling, and everything else can be logged, returned as an error, or allowed to crash.
The problem is that the callee can’t decide for the caller which exceptions will require special handling. And for the caller to be able to make an informed decision about that, the possible exceptions need to be documented. Since this includes exceptions thrown from further down the call stack, checked exceptions are about the only practical way to ensure that all possible failure modes get documented, so that callers are able to properly take them into account in their program logic.
If you want to (and are able to) document all possible failure modes, then checked exceptions will give you that. As far as I can tell, railway oriented approaches can't.
Unfortunately, you can only do that when the number of possible failure modes is fairly limited. In a complex codebase with lots of different layers, lots of different third party components, and lots of different abstractions and adapters, it can quickly become pretty unwieldy. And then you end up with someone or other deciding to take the easy way out and declaring their method as "throws Exception" which kind of defeats the purpose.
> No; you simply abstract the underlying subsystem’s exceptions in your own types, the same way you do with any other type.
That's all very well as long as people actually do that. It doesn't always happen in practice. And even when they do, the abstractions are likely to be leaky ones.
> And yes, “railway oriented approaches” can absolutely do this.
How? Please provide a code sample to demonstrate how you would do so.
> That's all very well as long as people actually do that. It doesn't always happen in practice. And even when they do, the abstractions are likely to be leaky ones.
They don’t have a choice under a “railway oriented” API in a typesafe language — they must translate the subsystem’s error types to their own error type.
If the abstraction is leaky, at least it’s well-specified.
How is that worse than having no abstraction at all, and leaving callers with no idea what error cases an API might raise?
> How? Please provide a code sample to demonstrate how you would do so.
In what language? What data structure?
If we assume Haskell and Either, then it can be as trivial as:
You adjust the reported failure modes to the abstraction level of the respective function, wrapping underlying exceptions if necessary. You don’t leak implementation details via the exception types. Callers can still unwrap and inspect the underlying original exceptions if they want, but their types won’t typically be part of the function’s interface contract, similar to how specific subtypes of the declared exception types are usually not part of the contract.
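Concretely, in exception terms that wrapping is just a catch-and-rethrow at the layer boundary. A minimal C# sketch, with made-up settings-store names:

    using System;
    using System.IO;

    // The exception type that forms part of this layer's contract.
    public class SettingsStoreException : Exception
    {
        public SettingsStoreException(string message, Exception inner) : base(message, inner) { }
    }

    public class SettingsStore
    {
        private readonly string path;

        public SettingsStore(string path) => this.path = path;

        public string LoadRaw()
        {
            try
            {
                return File.ReadAllText(path);
            }
            catch (IOException ex)
            {
                // Callers see a failure at this layer's level of abstraction. The original
                // IOException is still available as InnerException for anyone who wants to
                // dig, but its type is not part of this method's contract.
                throw new SettingsStoreException($"Could not load settings from {path}", ex);
            }
        }
    }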
I think the conventional way exceptions are implemented is pretty bad.
First, a lot of languages make you use an awkward, unnecessary scope to catch an exception. E.g., you want to declare a variable and initialize it to the value of a function that can throw (and assign some other value if it does). Well, you've got to split the declaration and initialization, putting the declaration outside the scopes that try and catch create. That one's an unforced error -- languages don't have to do that to use exceptions, but for some reason many do. It's pretty weird to have to add homespun utilities for fundamental control flow scenarios.
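A minimal C# illustration of that annoyance (ParsePort is a made-up helper):

    using System;

    class Demo
    {
        // Hypothetical parser that throws on bad input.
        static int ParsePort(string value) => int.Parse(value);

        static void Main()
        {
            // The declaration has to live outside the scope that try/catch creates,
            // so declaration and initialization end up split.
            int port;
            try {
                port = ParsePort("not-a-number");
            }
            catch (FormatException) {
                port = 8080; // fall back to a default
            }
            Console.WriteLine(port);
        }
    }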
But the bigger issue is that you really want to handle the error conditions at the lowest level where you have enough context to do so correctly. That's usually pretty low, but exceptions default to "send it all the way to the top". The default is either invisible or invisible in practice, depending on the language, and wrong, so programs end up riddled with these issues. You tend to end up with these higher-level functions that can throw all kinds of exceptions, many of which are meaningless to the caller. E.g. someone adds a file cache one day and all of a sudden some higher-level HandleRequest function can throw an IO exception... because the cache code didn't handle it... because they never even realized it was a possibility. You couldn't design a better mechanism for creating leaky abstractions.
I think anything a function might return needs to be an explicit part of its signature, and a caller needs to handle it explicitly, even if only to indicate "pass it up the line". The language doesn't need to require a lot of boilerplate to do this.
That's just my experience from having lived through it.
I think Rust has shown very well how ROP with first-class syntax support pretty much eliminates all boilerplate code. IMHO Rust nailed error handling with the `Result` type/trait.
It came to my mind too, but then I got confused: what if the type of the Result changes along the function call chain and you want to propagate errors with minimal effort?
Then I saw this Stack Overflow question, and it seems that the ? operator does quite a smart thing and is as easy to use as possible.
anyhow is the most commonly used crate for type-erased errors (1), nothing more than that but also nothing less
this means that when returned from a library, a Result<_, anyhow::Error> _is often an anti-pattern_ (often, not always!)
but if you write an application, it's pretty common to have many, many places in the code where you can be sure that no upstream code needs more fine-grained error handling (because your workspace is the most upstream code), so using anyhow is a pretty common and convenient choice
Though it's not unlikely that anyhow will fade into being mostly unused in the future as currently missing rustc/std features land, though not anytime soon.
But luckily this doesn't matter: due to how `?` works you can trivially convert errors on the fly no matter which you use (well, kinda -- there is an unlucky overlap between the orphan rules and the From wildcard implementations in the anyhow crate, but we can ignore that for this discussion).
(1): It's basically a form of Box<dyn Error + Send + Sync + 'static> which also has thin-pointer optimizations, (can) by default include a stack trace, and adds some convenience methods.
Sure, but the question was specifically looking for the "minimum effort" solution. I almost brought up thiserror but that just makes things more complicated. If you're writing a Rust application and just want to propagate errors, anyhow is currently the most popular way to do that.
anyhow is for type-erased errors, which are mainly used for the kind of errors you propagate upward without handling them in any fine-grained way. It's mainly used in applications (instead of libraries). For example, in a web server, anyhow errors will likely yield Internal Server Errors.
thiserror provides a derive (codegen) to easily create your own error type. It's much more often used by libraries, but if an application doesn't want to handle these errors they will likely be converted into anyhow errors. A very common use case is to apply it to an enum which represents "one of many errors", e.g. as a dumb example `enum Error { BadArgument(...), ConstraintViolation(...), ... }`, and it's not absurd in some cases to have a mixture, e.g. an enum variant `Unexpected(anyhow::Error)` which represents various very unexpected errors which likely could be bugs and which you might have considered panicking on, but decided to propagate instead to avoid panic-related problems
I don't understand why this answer is buried deep in a thread & isn't included in the Rust Book, even though it's been conventional wisdom among experienced Rustaceans for a few years now.
Download counts don't mean very much here as I'm fairly sure both crates are common transitive dependencies. Or in other words, millions of programmers aren't individually choosing Anyhow or Thiserror on a monthly basis -- they're just dependencies of other rust crates or apps.
And agreeing with the other reply, nobody jumps up and down with joy when choosing an error handling crate. You pick the right poison for the job and try not to shed a tear for code beauty as you add in error handling.
In my mind, the difference between errors-as-values and exceptions is most useful when describing domain-specific errors and other issues that you have to handle in support of the domain/problem space. To me, domain errors make sense as errors-as-values, but your database being unreachable is unrelated to the domain and makes sense as an exception.
> another example is where you are trying to fetch a file or database record that does not exist
I think this depends on whether or not you expect the file/record to exist. Handling a request from a user where the user provided the id used for lookup? The lookup itself is validation of the user input. But if you retrieved a DB record that has a blob name associated with it and your blob storage says that a blob doesn't exist by that name? I find that to be a great situation for an exception.
The errors-or-exception line is fuzzy and going to be dependent on your team and the problems you're solving, but I've found that it's a decent rule of thumb.
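To sketch that rule of thumb in C# (the names are invented for illustration): the lookup keyed by user input reports "not found" as a value, while a blob that the database insists should exist but doesn't becomes an exception.

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Hypothetical domain types, for illustration only.
    public record Order(string Id, string AttachmentName);

    public class OrderService
    {
        private readonly Dictionary<string, Order> orders;  // stand-in for the database
        private readonly Dictionary<string, byte[]> blobs;  // stand-in for blob storage

        public OrderService(Dictionary<string, Order> orders, Dictionary<string, byte[]> blobs)
        {
            this.orders = orders;
            this.blobs = blobs;
        }

        // Domain case: the id came from user input, so "not found" is an anticipated
        // outcome and gets reported as a value rather than as an exception.
        public Order FindOrder(string idFromUser) =>
            orders.TryGetValue(idFromUser, out var order) ? order : null;

        // Infrastructure case: the database record says the blob should exist, so a
        // missing blob means the system is in a state we did not anticipate -- throw.
        public Stream OpenAttachment(Order order)
        {
            if (!blobs.TryGetValue(order.AttachmentName, out var bytes))
                throw new InvalidOperationException(
                    $"Blob '{order.AttachmentName}' referenced by order {order.Id} is missing.");
            return new MemoryStream(bytes);
        }
    }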
"The first is situations where you need to handle specific, well defined and anticipated errors right at the point at which they occur"
Barring system-level errors, can you give an example of an error state that's not like that, one that would instead merit an exception? I would like to understand your point of view: is it the nature of the problem, or the constraints of the runtime, that makes exceptions preferable?
In the C++ code I need to write, we can (1) check the data for error conditions at the beginning, (2) let the application crash if the check fails, and (3) use the recorded error state to debug and fix the error in the initial checking code.
The data my code needs to process is fairly straightforward -- data abiding by some known CAD data format or given geometric topology -- so the error conditions are "quite easy" to tackle in the sense that there is an understanding of what correct data looks like in the first place.
Missing dependencies. External services having gone offline. Timeouts. Foreign key violations. Data corruption. Invalid user input. Incorrect assumptions about how a third party library works. Incorrectly configured firewalls. Bugs in your code. Subtle incompatibilities between libraries, frameworks or protocols. Botched deployments. Hacking attacks. The list is endless.
Probably not so much of an issue if you're dealing with well validated CAD data and most of your processing is in-memory using your own code. But if you're working with enterprise applications talking to each other via microservices written by different teams with different levels of competence, legacy code (sometimes spanning back decades), complex and poorly documented third party libraries and frameworks, design decisions that are more political than technical, and so on and so forth, it can quickly mount up.
> External services having gone offline, timeouts, and invalid user input are expected conditions you should handle locally.
Not necessarily. You should only handle expected conditions locally if there is a specific action that you need to take in response to them -- for example, correcting the condition that caused the error, retrying, falling back to an alternative, or cleaning up before reporting failure. Even if you do know what all the different failure modes are, you will only need to do this in a minority of cases, and those will be determined by your user stories, your acceptance criteria, your business priorities and your budgetary constraints. That is what I mean by "expected conditions." Ones that are (or that in theory could be) called out on your Jira tickets or your specification documents.
For anything else, the correct course of action is to assume that your own method is not able to fulfil its contract and to report that particular fact to its caller. Which is what "yeeting exceptions up the call stack" actually does.
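For instance, a timeout calling some external service might be one of those anticipated conditions with a specific action attached -- retry once, then fall back to a cached value -- while everything else propagates. A minimal C# sketch (the quote service and names are hypothetical):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public class QuoteClient
    {
        private readonly HttpClient http = new HttpClient { Timeout = TimeSpan.FromSeconds(2) };

        // Timeouts are anticipated and have a specific response: retry once, then fall
        // back to the cached value. Anything else propagates to the caller, because this
        // method has no meaningful way to correct it.
        public async Task<string> GetQuoteAsync(string url, string cachedQuote)
        {
            for (var attempt = 0; attempt < 2; attempt++)
            {
                try
                {
                    return await http.GetStringAsync(url);
                }
                catch (TaskCanceledException)
                {
                    // HttpClient surfaces a timeout as a cancellation; try again.
                }
            }
            return cachedQuote;
        }
    }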
> Almost everything else you listed represents a bug in your software that should terminate execution.
Well of course it represents a bug in your software, but you most certainly do not terminate execution altogether. You perform any cleanup that may be necessary, you record an event in your error log, and you show a generic error message to whoever needs to know about it, whether that be the end user or your support team.
Again, what action you need to do in these cases will depend on your user stories, your acceptance criteria, your business priorities and your budgetary constraints. But it is usually done right at the top level of your code in a single location. That is why "yeeting exceptions up the call stack" is appropriate for these cases.
You only terminate execution altogether if your process is so deeply diseased that for it to continue would cause even more damage. For example, memory corruption or failures of safety-critical systems.
> I’m more than a little shocked that you think yeeting exceptions up the call stack is appropriate for these cases.
I hope I've clarified what "yeeting exceptions up the call stack" actually does.
The alternative to "yeeting exceptions up the call stack" when you don't have any specific cleanup or corrective action that you can do is to continue execution regardless. This is almost never the correct thing to do as it means your code is running under assumptions that are incorrect. And that is a recipe for data corruption and all sorts of other nasties.
How do you know what to cleanup when you have no idea which APIs might throw, what stack frames might have been skipped when they do throw, and what state was left broken by yeeting a stack-unwinding exception up your call stack?
You clean up processing that your own method is responsible for. For example, rolling back transactions that it has started, deleting temporary files that it has created, closing handles that it has opened, and so on and so forth. You rarely if ever need to know what kind of exception was thrown or why in order to do that.
You can only assume that the methods you have called have left their own work in a consistent state despite having thrown an exception. If they haven't, then they themselves have bugs and the appropriate cleanup code needs to be added there. Or, if it's a third party library, you should file a bug report or pull request with their maintainers.
You don't try to clean up other people's work for them. That would just cause confusion and result in messy, tightly coupled code that is hard to understand and reason about.
Usually, no, you don't. You only write a try ... catch or try ... finally block around the entire method body, from the point where you create the resources you may need to clean up to the point where you no longer need them. For example:
    var myFile = File.OpenText(filename);
    try {
        string s;
        while ((s = myFile.ReadLine()) != null) {
            var entity = ProcessLine(s);
            // do whatever you need to do to entity
        }
    }
    finally {
        myFile.Dispose();
    }
C# gives you the using keyword as syntactic sugar for this:
    using (var myFile = File.OpenText(filename)) {
        string s;
        while ((s = myFile.ReadLine()) != null) {
            var entity = ProcessLine(s);
            // do whatever you need to do to entity
        }
    }
It isn't in practice. Only a minority of methods actually need it.
It's certainly far, far better than having to add exactly the same check after every method call. Which is only what you need to do if you're working in a situation where exceptions are not an option.
I'll add that C# also has using statements that dispose the object when the current scope exits (including if it exits due to an exception); this significantly cuts down on ugliness.
C++/Rust are different because exceptions in those languages are expensive and culturally contraindicated.
For the runtime-hosted languages the author is talking about (JVM, CLR, Python etc.), optionally throwing an exception is much cheaper than constantly creating and unwrapping Result objects. Your example is a perfect case where one would prefer to throw: say you have a parser that parses your file and the parser is expensive because the files are large. You are better off throwing out of your parsing iteration than doing a Result.map in your hot loop. (However you might want to wrap the top level of the parser in a Result and return that.)
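Something like this C# sketch of that last parenthetical (a made-up lines-of-integers format): the hot loop just throws, and only the top level translates the outcome into a result value.

    using System;
    using System.Collections.Generic;
    using System.IO;

    public static class Parser
    {
        // Hot loop: a malformed line throws, which costs nothing on the happy path,
        // instead of wrapping every single line in a Result and mapping over it.
        private static List<int> ParseLines(TextReader reader)
        {
            var values = new List<int>();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                values.Add(int.Parse(line)); // throws FormatException on bad input
            }
            return values;
        }

        // Top level: translate the exception into a single success/failure result
        // for callers that want to branch on the outcome.
        public static (List<int> Values, string Error) TryParse(TextReader reader)
        {
            try
            {
                return (ParseLines(reader), null);
            }
            catch (FormatException ex)
            {
                return (null, ex.Message);
            }
        }
    }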
I disagree that exceptions are better in most cases. Exceptions aren't captured effectively in most type systems so it's hard to ensure you've covered all your bases. When used effectively, discriminated unions for return types force you to handle all the cases and the result is much more robust in my experience.
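C# doesn't have discriminated unions proper, but here is a rough approximation of what "force you to handle all the cases" looks like, using an abstract record hierarchy and a switch expression (the login outcomes are made up):

    using System;

    // A closed set of outcomes, approximating a discriminated union in C#.
    public abstract record LoginResult;
    public sealed record Success(string Token) : LoginResult;
    public sealed record InvalidPassword() : LoginResult;
    public sealed record AccountLocked(DateTime Until) : LoginResult;

    public static class LoginHandler
    {
        // The switch expression keeps every outcome in front of the caller. C# cannot
        // prove that the record hierarchy is closed, so a catch-all arm is still needed;
        // a true discriminated union (F#, Rust) would make the match fully exhaustive.
        public static string Describe(LoginResult result) => result switch
        {
            Success s       => $"Logged in with token {s.Token}",
            InvalidPassword => "Wrong password",
            AccountLocked a => $"Account locked until {a.Until:u}",
            _               => throw new InvalidOperationException("Unknown result type")
        };
    }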
In a previous job, I joined a team that was supposed to be introducing DevOps to the organisation.
It started out well -- we spent a few months hacking with Terraform, Docker, Vagrant, Kubernetes, and related technologies to implement an infrastructure-as-code approach -- automating the process of provisioning and decommissioning servers, and building a setup where development teams could deploy updates with a simple git push.
Unfortunately it all went downhill fairly rapidly. We ended up spending the majority of our time manually applying security patches to a bunch of snowflake servers that we'd lifted-and-shifted from another hosting provider to AWS, and fielding support requests from the development teams. Within a year, we were being told in no uncertain terms by our project manager that we were an operations team, not a development team.
It felt like a complete bait-and-switch. Within two years, I had left the organisation in question and moved on to a new job elsewhere doing actual development again. Last I heard, the entire team had been disbanded.
It sounds like the author of this article must have had a very similar experience. I wonder just how common it is. It seems that in many places, "DevOps" is all Ops and no Dev.
> It seems that in many places, "DevOps" is all Ops and no Dev.
This was definitely my experience at my last couple of jobs. At my last one they "scaled out their DevOps team" by hiring tons of juniors with next to no software development background. And then they "empowered" teams by assigning the juniors to each dev group. As a result, we ended up having to train them how to do their core jobs, which... went about as well as you'd think.
Eventually, there was an attempt to shift everyone to kubernetes. They had a special "DevOps" team build a layer on top of it to handle the non-kubernetes aspects of deployment as well, and somehow manage them together using Helm. If you're wondering "what the hell does that mean", well, it turned out nobody really knew. These "DevOps" engineers didn't really seem to understand kubernetes core concepts, and just ended up hacking away with some scripts on top of terraform delivered via Helm until something got configured. It was incredibly slow to deliver, hard to use, and I just stayed away from it until some exec threw down the mandates. (And then everyone started quitting because it was an absolute disaster.)
Ultimately, these are really stories about bad management, not really anything to do with DevOps. But that's how these things roll - some new hot concept comes to town, and bad managers try to adopt the term, without really understanding it.
What you just described is the exact same reason I started working on https://stacktape.com 3 years ago.
When doing the market research, I talked to ~150-200 companies (mostly SMBs). Everyone was trying to "do DevOps". But the complexity of running a Kubernetes cluster (or a custom AWS setup using ECS) is just overwhelming for most teams.
In most cases, the DevOps/platform team requires at least 2-3 experienced people that have successfully done this before.
Considering how few DevOps people with that kind of experience are currently available on the market, it's no surprise that only the "coolest" companies around get to hire them. These successful companies then write blogposts about how successful they were.
And the circle starts all over again. Less successful companies follow them and (in most cases) fail.
Most of these companies don't admit it, or don't admit it soon enough. They also don't write blogposts about their failures.
From my experience and research, roughly 70-80% of companies fail to deliver the expected results (or deliver them with an order of magnitude more effort than initially expected). Yet 90-95% of the content we get to read about these topics is overwhelmingly positive.
PS.: If you don't have an A-tier DevOps teams, check out https://stacktape.com. I promise it will make your life easier.
It's not always bad managers or management. The fancy new hotness gets pushed onto them by smooth-talking evangelists. For every criticism or question they have a snazzy little quip and retort that makes you look like an idiot for not knowing "the obviousness" of your errors and how this new fad/tech/framework/methodology solves it. And if that retort doesn't work, they just tell you "it's standard in the industry, dunno what you want me to tell you".
And from the outside, this all just looks like resume padding and job security. DevOps is the new priesthood: subscribe or be reduced to irrelevance by config warriors.
> The fancy new hotness gets pushed onto them by smooth-talking evangelists.
Gets pushed onto them top-down. This means there is no real competition, no real empiricism, no real comparative merit involved in the switch.
How should never be imposed top-down. It's only goals and how success is measured which should be top-down. The classic example of this was when Jeff Bezos mandated that all Amazon software systems should be accessible by APIs over the internal network, or else one would be fired.
Whenever I've seen How imposed top down, I've only seen the lower level managers talk about how they could put off and passive aggressively stymie the initiative.
> We ended up spending the majority of our time … fielding support requests from the development teams
This has been my experience at 3 different small-medium companies now. A too-small DevOps team suddenly ends up in the critical path for even the most trivial software task, and then engineering productivity grinds to a halt. I think a much better pattern would be to enable dev teams to self serve. Set up the required infrastructure and guard rails, then let teams handle their own deployments and infrastructure. Give people what they need to do it themselves instead of having to open a support ticket for everything.
> I think a much better pattern would be to enable dev teams to self serve. Set up the required infrastructure and guard rails, then let teams handle their own deployments and infrastructure.
I think that's how DevOps is actually supposed to be done in the first place. You view Ops -- and the code used to manage and support it -- as a product, and get a specialised team of experienced Devs (and architects) to build it.
Once you've got the basic infrastructure and architecture in place, you then train up the individual development teams to customise it, extend it and troubleshoot it as they need to. In much the same way as they do with any other software product.
My experience is that what inevitably happens is the ops team goes and writes a layer on top of Kubernetes, and now instead of dealing with Kubernetes you're dealing with a half-baked, poorly written abstraction with zero documentation and no StackOverflow, on top of Kubernetes. So you need to become an expert in both.
Most organizations don't have the resources, mindset, or skills to support a software library product and should only do it as a last resort.
The problem is that most developers won’t know how to do it without constantly footgunning themselves.
If the DevOps team is staffed well enough to develop integrations that won’t allow that AND won’t get in the way, and then to train folks and ‘keep the line’ enough to stop the scope creep - they’re probably not at a shitty small/medium-sized shop.
In practice, "DevOps" usually turns out to be one of the following:
1. An operations team with a different name.
2. A platform team with a different name.
3. A development team with a different name.
4. A "CI/CD team".
5. A role (ex. "dev who automates ops", "ops who codes", "support specialist who codes", all three in one).
6. A chart that the delivery manager maintains.
Here is what DevOps should actually be:
1. Delivering rapidly and consistently with extremely high levels of confidence.
2. The right people address problems correctly, immediately, the first time, and fix it so it doesn't happen again.
3. That's it.
Increased velocity is what businesses get promised, yes. It’s what you want.
The reality is that you can’t just magically make that happen.
Draw a line. Now draw the rest of the owl! Easy~
The problem has never been understanding what the desired state is… it’s always been that getting from the current state to the desired state is very very hard, and continually road blocked by:
- people who don’t want to learn new skills
- people who don’t want to cede control they currently have (over process and product)
- a lack of clarity on who is responsible for what systems
Devops is a load of hype.
There’s never been any reliable process to move to rapid delivery from it.
Yes, some teams have managed to get something that works, and there are a lot of tools and a lot of training which have resulted in overall better SRE processes.
…but by and large, that’s because of using better tools (eg. infrastructure as code) not because of devops.
When was the last time you had a “devops” guy you got to do something for you?
Right. That’s ops.
When was the last time something broke and the people responsible for making sure it never happened again were the “platform team” or an SRE?
Ops again.
You had all that before you started your devops journey.
It’s all just ops, with slightly better tooling, less outages, higher reliability and absolutely zero increase in product velocity.
Devops was the promise that by bridging operations and development you could get high reliability and faster iteration by having teams that could “cut through” the red tape and get things done.
That appears, largely, not to work.
Yes, developers that understand systems tend to build more reliable software.
No, it is not faster to do it that way, the transition will be painful, and, because businesses mostly care about iteration speed more than reliability, even a technical success often fails to deliver on its business value.
It has worked well for organizations that embraced it. It hasn't worked well for organizations that paid it lip service. That's the way of the world. There is a path laid out by DevOps methods and dozens of ways to get there, but the path doesn't walk itself.
Note that this admits strategies where nothing of consequence is ever delivered (but each deploy has some quantifiable and measured churn), and the people that break stuff get credit for fixing the stuff they broke.
I've watched this particular breed of organizational cancer destroy many companies and products.
The end game is that people creating useless, but highly visible churn get promoted, as do the ones that repeatedly break stuff. Even if that doesn't happen, the engineers that want to build stuff inevitability flee.
That's where value chain management comes in. If you can't show business value being delivered, there's no point to any of it.
It's also worth acknowledging when you don't need DevOps. Banks, for example, shouldn't need it. Their entire purpose is to be slow and reliable. Most of their money is literally just old people keeping lots of money in one place and not touching it. They shouldn't need to churn on features and ship constantly.
I've not seen one place that has escaped this problem though.
Something is either old and nobody feels like fixing it, or something doesn't fit into the current constraints of the platform feature set. So they build something on their own, probably undocumented and without consulting the supposed DevOps team, but still using parts of the platform that have not been declared an API. But when you want to exercise the freedom you supposedly built with your platform, all these edge cases fall back on you and inhibit change. "We can't change the ingress controller, we rely on this implicit behavior of it." "You can't change how database credentials are provisioned, we pulled these from the Cloud SQL console and committed them to git." And facilitating any change as soon as you can cover the use-case is a fight with stakeholders and POs that I usually have no nerve for. "Why do we need to do anything??? It works?" And then you get blamed when it breaks. I love this job.
The big problem that I have with Clean Code -- and with its sequel, Clean Architecture -- is that for its most zealous proponents, it has ceased to be a means to an end and has instead become an end in itself. So they'll justify their approach by citing one or other of the SOLID principles, but they won't explain what benefit that particular SOLID principle is going to offer them in that particular case.
The point that I make about patterns and practices in programming is that they need to justify their existence in terms of value that they provide to the end user, to the customer, or to the business. If they can't provide clear evidence that they actually provide those benefits, or if they only provide benefits that the business isn't asking for, then they're just wasting time and money.
One example that Uncle Bob Martin hammers home a lot is separation of concerns. Separation of concerns can make your code a lot easier to read and maintain if it's done right -- unit testing is one good example here. But when it ceases to be a means to an end and becomes an end in itself, or when it tries to solve problems that the business isn't asking for, it degenerates into speculative generality. That's why you'll find project after project after project after project after project with cumbersome and obstructive data access layers just because you "might" want to swap out your database for some unknown mystery alternative some day.
The problem with explicit error handling is that it's all too easy to get it wrong (by forgetting to check the return value) and when it does go wrong, it goes wrong silently, introducing a risk of leaving you with corrupt data. In production.
The beauty of exceptions, on the other hand, is that the default option is the safe one. Sure, forgetting to add error handling may leave you presenting a user with a stack trace, but at least you're not billing them for something that never gets delivered.
> The beauty of exceptions, on the other hand, is that the default option is the safe one
Except when it's not. Exceptions tend not to solve the problem, only make it subtly worse. The biggest wart on exceptions is the fact that they introduce non-local control flow. All of a sudden, any function you call can cause you to jump out of your current function. In any situation where an unhandled error will corrupt your state, exceptions have that problem as well, on top of the fact that they are invisible.
An `err` aliased to `_` or shadowed can be found by a linter or a human reading the code. A function `foo()` causing your stack to blow up and possibly corrupt any IO you are doing is worse. Therefore, the guys in Java-land discovered checked exceptions, which were even more controversial, and arguably led to languages like Go and Rust dropping exceptions in general.
> The problem with explicit error handling is that it's all too easy to get it wrong (by forgetting to check the return value) and when it does go wrong, it goes wrong silently, introducing a risk of leaving you with corrupt data. In production.
The problem with exception error handling is that it's all too easy to get it wrong (by forgetting to complete the handle code) and when it does go wrong, it goes wrong silently, introducing a risk of leaving you with corrupt data. In production.
In either case, error handling needs to be code reviewed. The best thing to do is to make the right thing the easiest, minimal-friction thing to do. Unfortunately, getting off the "happy path" is often a messy business. My suggestion is to explicitly implement a standard "developer scaffold" to be used when filling in the error handling during development, with penalties for not using it. This makes it easier to find where error handling needs to be fully fleshed out.
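To make the "developer scaffold" idea concrete, one possible C# shape for it (purely illustrative, not anyone's actual convention) is a single helper that makes unfinished error handling loud at runtime and easy to grep for before release:

    using System;

    // A deliberately noisy placeholder for error handling that still needs to be
    // fleshed out: easy to grep for, and it fails fast instead of failing silently.
    public static class Scaffold
    {
        public static Exception Unhandled(Exception cause, string context)
        {
            // Log however your project logs, then hand back a wrapped exception to throw,
            // so the gap stays visible until someone writes the real handling.
            Console.Error.WriteLine($"TODO error handling [{context}]: {cause}");
            return new NotImplementedException($"Error handling not written yet: {context}", cause);
        }
    }

    // Usage during development:
    //
    //     try {
    //         SaveInvoice(invoice);    // SaveInvoice is a hypothetical example
    //     } catch (Exception ex) {
    //         throw Scaffold.Unhandled(ex, "saving invoice");
    //     }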
> The beauty of exceptions, on the other hand, is that the default option is the safe one. Sure, forgetting to add error handling may leave you presenting a user with a stack trace, but at least you're not billing them for something that never gets delivered.
The entire reason why Unit Testing and Test First made it into Extreme Programming and Agile methods, is that it was way too easy for end-users to see error notifiers and stack traces in production in Smalltalk.
> The problem with exception error handling is that it's all too easy to get it wrong (by forgetting to complete the handle code) and when it does go wrong, it goes wrong silently, introducing a risk of leaving you with corrupt data. In production.
It's much more difficult to "forget completing the handling code" than it is to overwrite a golang error value and not handle it (which does happen in code bases). The default behavior of unhandled exceptions is to bubble up, unlike golang errors, which can get silently dropped quite easily -- and I've seen this in large code bases. Unless you're writing
try { ... } catch (..) { /* do nothing */ }
in which case you explicitly opt into doing nothing, and for which there are linters that catch this sort of behavior automatically.
> in which case you explicitly opt into doing nothing, and for which there are linters that catch this sort of behavior automatically.
There are linters for golang. No reason why those sorts of tools and community norms shouldn't squash those sorts of behaviors. (As has happened for race conditions in golang!)
I used such linters, and they fail at certain things.
Of course. They're just a tool.
For the amount of concurrency done, golang is doing pretty good. I know of no other programming community which has such community standards on using race condition checking to the same extent as linting.
Which is why it is strictly superior to use a language feature which does not have this issue to begin with (e.g. exceptions or actual compiler enforced error handling like Rust).
> I know of no other programming community which has such community standards on using race condition checking
Compare (again) with Rust, or with languages with proper immutable data structures like Scala and Java
To paraphrase you from earlier: As far as I know, race conditions happen and leak out into production with Scala and Java.
The lesson of golang, and really the lesson of programming in the large for the past 30 years, is that it's not enough to have mathematical or methodological power in guaranteeing correctness in a program. If that were the case, formal methods would have won decades ago, and we'd all be using such environments. It's not about having the most rigorous widget. It's getting programmers to do the right thing, month after month, year after year, in large numbers, across different management teams. Arguing the strength of constructs within programming languages is just an idle meta-anatomic measurement exercise.
That said, I like lots of things about Java, Scala, and Rust. I just happen to really like golang for a different set of reasons. Golang is concurrency+programming-in-the-large-human-factors "Blub," in a way that refutes PG's essay about Blub.
Yes - that was implied in the same sentence where I wrote "(e.g. exceptions or actual compiler enforced error handling like Rust)." :-)
> As far as I know, race conditions happen and leak out into production with Scala and Java.
They do. But there are more techniques in those languages (such as immutable collections, as their type systems are actually able to model and implement them) to mitigate the issue.
> and really the lesson of programming in the large for the past 30 years, is that it's not enough to have mathematical or methodological power in guaranteeing correctness in a program.
I agree there. I actually don't abide by the ideas that using a language like Haskell or Idris or what have you will automagically grant you superior or error-free software. I actually like what talks like these have to say:
However, that does not mean we disregard established practices, only to try to reinvent them in a bad way as we see golang trying to do (no generics - even though many other languages implemented them "correctly", bad error handling, no sum types, null pointers, and so on). They purposely disregarded these works without good arguments at all, and now they're struggling to find workarounds, or continue to dismiss them as non-issues, even though they clearly are.
The way I see it, the moment Java gets fibers and value types implemented, and GraalVM's native compilation is mature enough for use, golang's appeal as a devops language where it ended up (however low it is currently - as it's mainly overhyped) becomes even less. C#/.NET is the other alternative making very good progress, and they already have async, value types, and NGEN (though I don't know how mature the latter is for production).
> programming-in-the-large-human-factors
This seems to continually get mentioned without anything to back it up. Sorry, but from what I've seen, Java and C# are strictly superior for "programming in the large". Things like "fast compile times" aren't even a factor in front of Java and C# with their incremental compilation. And the fact that the language is "simple" just means that you will end up with more verbose and complex code, with more places for things to go wrong, and yes I've seen it in larger golang code bases.
As other commenters have noted, this post is seven years old. My position on feature branches has evolved (and softened somewhat as well) in the meantime.
> TL:DR Successive, well intentioned, changes to architecture and technology throughout the lifetime of an application can lead to a fragmented and hard to maintain code base. Sometimes it is better to favour consistent legacy technology over fragmentation.
Nice idea in theory, sometimes impossible in practice.
A few years ago I came onto a project that had a very clearly defined separation of concerns, with a business layer, data access layer, presentation layer, and Entity Framework. This was resulting in a number of SQL queries that took over a minute to run and caused web pages to time out.
I ended up cutting right through the layers, bypassing Entity Framework altogether and replacing it with hand-crafted SQL. This ended up cutting down the query time from six minutes to three seconds.
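To give a flavour of the kind of change (the schema here is invented for illustration, not the actual project's): one hand-crafted query that lets the database do the aggregation, in place of an Entity Framework query that pulled far too much back into memory first.

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class OrderReports
    {
        // Hand-crafted SQL in place of the generated Entity Framework query:
        // the aggregation happens in the database rather than in memory.
        public static Dictionary<int, decimal> TotalsByCustomer(string connectionString, DateTime cutoff)
        {
            const string sql = @"
                SELECT o.CustomerId, SUM(l.Price) AS Total
                FROM Orders o
                JOIN OrderLines l ON l.OrderId = o.Id
                WHERE o.PlacedOn >= @cutoff
                GROUP BY o.CustomerId";

            var totals = new Dictionary<int, decimal>();
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@cutoff", cutoff);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        totals[reader.GetInt32(0)] = reader.GetDecimal(1);
                    }
                }
            }
            return totals;
        }
    }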
Abstractions and hard-interfaces rarely result in increased efficiency.
Breaking through the barriers and merging layers often allows a more efficient solution, in the same way that denormalisation increases performance by ignoring the "rules".
Well in the example I've just given they reduced query times from six minutes to three seconds. If that isn't increased efficiency, then I don't know what is.
The fact is that sometimes you have to ignore the "rules," because the "rules" were designed to serve a purpose that does not apply in your particular case, or perhaps never even applied at all in the first place.
The problem with trying to separate your business layer from your data access layer is that it's often difficult if not impossible to identify which concerns go into which layer. Take paging and sorting for example. If you treat that as a business concern, you end up with your database returning more data than necessary, and your business layer ends up doing work that could have been handled far more efficiently by the database itself. On the other hand, if you treat it as a data access concern, you end up being unable to test it without hitting the database.
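As a C# sketch of that trade-off (the customer store is hypothetical):

    using System.Collections.Generic;
    using System.Linq;

    public interface ICustomerStore
    {
        // Hands back every customer; paging then happens in the business layer.
        IEnumerable<string> GetAllCustomerNames();

        // The implementation pushes ORDER BY / OFFSET / FETCH into the SQL, so only
        // one page crosses the wire -- but you can't exercise it without a real database.
        IList<string> GetCustomerNamePage(int page, int pageSize);
    }

    public static class CustomerQueries
    {
        // Option 1: paging and sorting as a business concern. Easy to unit test against
        // an in-memory ICustomerStore, but the database returns every row.
        public static IList<string> PageInBusinessLayer(ICustomerStore store, int page, int pageSize) =>
            store.GetAllCustomerNames()
                 .OrderBy(name => name)
                 .Skip(page * pageSize)
                 .Take(pageSize)
                 .ToList();

        // Option 2: paging and sorting as a data access concern. Efficient, but the real
        // logic lives in the SQL behind GetCustomerNamePage and needs the database to test.
        public static IList<string> PageInDataLayer(ICustomerStore store, int page, int pageSize) =>
            store.GetCustomerNamePage(page, pageSize);
    }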
You need to realise that software development always involves trade-offs. Blindly sticking to the "rules" is cargo cult, and it never achieves the end results that it is supposed to.
Incidentally I wrote a whole series of blog posts a while ago where I cast a critical eye over the whole n-tier/3-layer architecture and explained why it isn't all that it's made out to be.
Man... I read your blog. It's all true. I find it so hard to explain this to Jr devs. They read a lot of best practices and take them as a religion. Lots of unneeded code gets created.
Pet projects need not take up all of your time. All you need is a few hours every so often — an evening or so once every couple of months, or one weekend a year would set you head and shoulders above a lot of people. The whole dichotomy of dark matter developers with no pet projects at all versus "passionate programmers" who eat, sleep and breathe code is a completely false one.
All employers are looking for is some publicly available evidence that you actually have the skills you've listed on your CV.
I started keeping a comprehensive developer diary (I actually refer to it as lab notes) back in December, and it's made a considerable difference to how I think about what I'm coding.
The way I write it is similar to test-driven development — I write down each step in what I'm doing before I do it. When I run a command, I copy the command itself into my notes, then once it's run I copy the important parts of the output (e.g. any error messages).
It really comes into its own when I want to pick up on a task that I had put to one side a few days previously, or when I run into a problem that I think I'd encountered before. It also makes it much easier to avoid the trap of trying the same thing several times before you realise that you're going round and round in circles.
It's also useful for writing documentation. You can just copy and paste what you've written in your lab notes into your commit summaries, Jira tickets, e-mails, spec, help files, whatever, then tidy up as appropriate.
Is there a reason why you made the structure of the notes depend on time? Aren't notes that are about the same topic getting very far apart from each other?
I think searching works well when you are a month or maybe three into the project. How does this work after, say, five years?
If you organize by topic, you have to decide which topic to put information in. This means you have to pause and think for 10 seconds about your organization system before writing anything. 10 seconds is enough to lose your train of thought.
If you organize by time, you don't have to decide where to put it. It always goes in "today". You have to pause for 0 seconds.
I can confirm, I do this also and posted so elsewhere. But the key is ruthless simplicity. Just a blob of plain text every day and any searchable codes, resist all urges to add formatting, drawings, tables, categories, etc. It's more important to put as few mental obstacles in the way of writing in it as you can, and even if you write a lot every day, it still isn't all that much text, and is pretty easy to manage and search conventionally.
Not GP, but I have kept a 'logbook' file per year for ten years, with similar contents. I'm sure there's a better way than relying on OS X Spotlight to drum up things from five years ago, but it works well enough for me.
I do time-stamping based note taking too using org-capture in emacs, and I also use tags to put my notes in some broad category. Tag auto-completion using counsel package does not need me to remember the exact tag string if it's used previously.
Also, I'd have thought that with weekend projects you're more likely to find things out by experimentation and reading the documentation than by asking questions. With work or classroom projects, you have to work with a fixed spec, a prescribed technology, all sorts of Best Practices, and deadlines. Weekend projects give you much more freedom in all four respects.