I hope Epic wins this. I would buy from Apple again if they would open their platforms. I think it's ridiculous that when I was working on an app, the app would expire after a week unless I paid them more money for the privilege of keeping an app I wrote on a phone I paid for. Never mind software other people made that I should be free to choose to run.
You can distribute an app for free if you are part of their developer program - which is something like $99 for all the needed software and tooling (perhaps one of the lowest SDK prices historically?). They will actually subsidize all your bandwidth / storage / distribution costs in that case.
True, that is why Android and feature phones rule in such countries, not iOS, which makes it a moot point, since customers aren't using Apple devices anyway.
Is it really though? Bing has been consistently worse than Google over the years. Maybe our definition of vendor lock-in needs to change.
Honestly at this point it doesn't even matter how good Bing is - we've been unconsciously trained to work with Google's algorithm in particular and they just have a de facto monopoly on the mental process a person goes through to formulate a search. Everyone's workflow everywhere will be worse and take more time if they voluntarily stop using Google, that's not what I consider a fair competitive landscape.
This reminds me of a meetup I attended last fall, they were talking about the Spectre/Meltdown issues. I asked the presenters if anything in chip manufacturing/verification processes had changed as a result of that and they seemed surprised.
To me, when a software bug shows up in a critical system, that means you actually have a logistics bug. Airplane control software should not be allowed to have bugs. CPUs should not be allowed to have bugs. And OS's should not be allowed to crash (looking at you Microsoft).
When one of these things happens, in my opinion the correct response is _not_ to just release fixes and workarounds and then say "we'll try really hard to not let it happen again." You do that, sure. But the first time you see airplane software malfunction, that means you need to change the way the software is written and released so that the whole class of issues will not ever happen again. You don't stop at a public apology, you don't fire the person that unintentionally wrote the bug. If you have to hire mathematicians to formally prove the critical paths of the software, you do that. If it costs 10x more to release bug-free software, oh well, you do that.
All of these corporate people thinking they can save money by spending less on quality are extremely naive. You can do a financial analysis of this, but they're doing it wrong. Did you ever consider what the cost of a whole generation just not trusting air travel at all would be?
>But the first time you see airplane software malfunction, that means you need to change the way the software is written and released so that the whole class of issues will not ever happen again.
This is pretty good intuition but often a systemic change is not economically feasible. For avionics software at least, a rewrite of the software would likely have to be recertified from scratch before it would be allowed to fly.
We do, however, have several different quality assurance programs in Aerospace that are supposed to address this sort of thing.
Once you identify the root cause, the process found to be deficient is supposed to have a Process Owner who is required to create a preventive and corrective action plan to prevent a recurrence, with more severe problems requiring more robust action plans. Done right, the process owner is supposed to be empowered to make the changes that need to be made.
These systems tend to be evolutions of ISO 9000 as pioneered by Toyota (IIRC). They are highly bureaucratic and soul-sucking, but they are also the least-shitty solution that's been tried.
Are you willing to pay 10x more for the product with that supposed extra reliability (100% vs 99.99966%)? Before you answer, remember that perfection cannot be proven ex ante; it can only be assured.
You should also keep in mind that real systems have fault modes aside from software bugs and hardware glitches, such as unanticipated edge cases and user error, which may dominate your actual failure statistics.
You are correct, but airplane companies already do that for the most part and much much more.
The difference in reliability between normal software and airplane software is so vast that "best practices" from normal software cannot be applied to airplane software, since that would be gross criminal negligence. To explain: in the 10 years prior to the 737-MAX problems there were 50,000,000 flights, and software was not implicated in a single passenger air fatality. The average flight is ~5,000 km, which is ~4-5 hours. So, in ~250,000,000 flight-hours, there were two crashes due to software. A plane takes ~3 minutes to fall from cruising altitude, so we can model this as a downtime of 6 minutes per 250,000,000 hours, which gives us a downtime ratio of 1 in 2,500,000,000, or 99.99999996% uptime (yes, that is 9 9s). In contrast, I think most software people would agree that AWS is high quality. The AWS SLA specifies 99.99% uptime (a 1 in 10,000 downtime ratio). So, by this metric, airplane software is 250,000x more reliable than normal high-quality software.
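The back-of-the-envelope arithmetic above can be checked with a quick sketch. All inputs are the commenter's rough estimates (flight counts, durations, fall time), not official figures:

```python
# Rough reliability arithmetic from the comment above.
# All inputs are the commenter's rough estimates, not official figures.
flight_hours = 50_000_000 * 5              # ~50M flights at ~5 hours each
crashes = 2
fall_minutes = 3                           # ~3 minutes to fall from cruise
downtime_hours = crashes * fall_minutes / 60   # 6 minutes total = 0.1 hours

downtime_ratio = downtime_hours / flight_hours  # ~1 in 2.5 billion
aws_downtime_ratio = 1 - 0.9999                 # AWS SLA: 99.99% uptime

print(f"1 in {1 / downtime_ratio:,.0f}")        # 1 in ~2,500,000,000
print(f"{aws_downtime_ratio / downtime_ratio:,.0f}x more reliable than the AWS SLA")
```

The last line prints the ~250,000x figure quoted in the comment.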
The point of this is that the standard for airplanes is almost inconceivably high compared to normal software. To think that they are incompetent or suggest that all they need to do is adopt X or Y common-sense/best-practice is a gross misunderstanding of what is being done and what needs to be done to improve. It would be like someone trying to tell a civil engineer making a 50-story skyscraper that they really need to adopt high quality wood construction techniques from makers of doghouses. To actually improve it, you need to consider practices 250,000x better than "best practices" and go from there.
To put it another way, the solutions are actually really really good, unfortunately the problems are really really really really hard.
Not to detract from your point that aeronautical industry software is reliable (it is), but the 737 MAXes that crashed were all new planes. There wasn't even 24 months between the first delivery of a MAX to the model being grounded.
The issues with the MAX were also clearly preventable and there were multiple failures of the systems (regulators, internal reviews, etc.) that were in place to catch these kinds of issues.
But as you point out, the aeronautical industry has an excellent track record for software reliability, if you evaluate reliability by hull losses. By other metrics, it's a bit more debatable (eg. the integer overflow for Dreamliners such that they need to be restarted at least every 248 days), but still keeps people moving safely.
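As an aside, the 248-day figure is consistent with a signed 32-bit counter ticking in hundredths of a second. That tick unit is an assumption here, but it is the commonly cited explanation for the reboot requirement:

```python
# A signed 32-bit counter incremented every 10 ms (one centisecond)
# overflows after 2**31 ticks; that works out to roughly 248 days.
ticks = 2**31                  # max positive value of a signed 32-bit int
seconds = ticks / 100          # one tick per hundredth of a second
days = seconds / (60 * 60 * 24)
print(f"{days:.1f} days")      # 248.6 days
```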
Yes. I included the MAX because otherwise the software-related fatalities over the last 10 years would be 0. If you look at just the MAX, the low end in terms of flights is ~200,000, with an average of 3 hours per flight. Using the same time basis as above, that is a 1 in 6,000,000 downtime ratio, or 99.99998% uptime, which is 600x better than AWS by my previously used metric. The software of an unconscionable deathtrap is 600x better than extremely high quality server software.
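The MAX-only numbers follow from the same downtime model (again, the flight count and durations are the commenter's rough estimates):

```python
# Same downtime model, restricted to just the 737 MAX fleet:
# ~200,000 flights at ~3 hours each, two crashes, ~3 minutes of fall each.
max_flight_hours = 200_000 * 3          # 600,000 flight-hours
downtime_hours = 2 * 3 / 60             # 0.1 hours
ratio = downtime_hours / max_flight_hours   # 1 in 6,000,000

print(f"1 in {1 / ratio:,.0f}")             # 1 in 6,000,000
print(f"{(1 / ratio) / 10_000:,.0f}x AWS's 1-in-10,000")   # 600x
```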
My primary point is that many people look at these failures and incorrectly conclude that the processes in place are objectively terrible and below average. This leads to them discounting the processes in these systems in favor of policies from vastly less reliable systems that they think are quality-focused or "best practices" because they, fairly, think "bad" in a safety-critical context means the same as regular "bad", so regular "amazing" is clearly better. In truth, "unconscionable deathtrap" and "gross criminal negligence" in the airplane world is more of a synonym for "amazing beyond belief" in the rest of the software industry. The correct takeaway is understanding that regular "amazing" is actually orders of magnitude worse than "unconscionable deathtrap" and is thus completely inadequate for the job. As a corollary, if you do not think you are doing "way better than amazing" you are probably not doing an adequate job in these contexts.
To reiterate, the solutions are really really good, unfortunately the problems are really really really really hard.
I do totally agree with your larger points, but these numbers just don’t make any sense, and analysis like this could do unintended damage to your otherwise good points. Would it perhaps be better to cite the industry testing practices and procedures, the volume of testing, the regulations, training, feedback loop, redundancies, and all the other safety efforts behind airline software?
Uptime is not a comparable metric in any way. Aircraft computers often reboot every flight or every day. AWS downtimes don’t typically result in fatalities. The fall time of the 737 MAX before it impacts isn’t ‘downtime’, and simply cannot be used to summarize the reliability of aviation software as a whole. Arriving at 250,000x this way makes it a meaningless number, and you didn’t account for the bug in the linked article in your reliability estimate at all.
No, not really. How would a normal software engineer evaluate the processes if stated? There is no frame of reference for what is effective or not if you do not trace to quantitative outcomes. Like, if I said: "The industry uses an autoregressive failure model with 175 billion parameters, 10x more than any previous non-sparse failure model," would that mean anything? (It does not; I just replaced "language" with "failure" in the GPT-3 abstract.) How can anybody tell what is an effective or ineffective process if they do not trace to an actual outcome? 10x as many tests and code mean nothing if they test nothing of value. Redundancies are irrelevant if they are completely correlated. Regulations mean nothing if they encode ineffective or meaningless techniques (look at security standards which require antiviruses). One of the only ways to compare processes and not be tricked by fancy words, especially as a non-expert, is to look at and compare actual outcomes.
I agree that the metric I chose is somewhat sloppy, but you can afford to be sloppy when you are comparing things with such disparate outcomes. Sure, maybe we are not comparing a 1-story house to a 50-story skyscraper, it is only a 30-story skyscraper, but that has little impact on the fact that they are fundamentally different, and to declare that they are even remotely comparable is a massive category error.
I, however, disagree that "uptime" is a nonsense metric, though there are absolutely better ones. "Uptime" in this context means duration/probability of critical operational failure which is an extremely relevant metric. That AWS does not result in fatalities during critical operational failure has no bearing on whether critical operational failure occurred or not, it just means that it matters less. A valid quibble is that I am using crashes as a proxy for failure which discounts critical software failures that did not cause critical operational failure due to non-software redundancy, but again, the outcomes are so disparate it beggars belief that this would bridge the gap.
As for aircraft computers being rebooted frequently, true. So? I am comparing full system reliability during operation, not individual components. It is not like individual AWS servers run indefinitely; they are rebooted frequently, but the system as a whole stays operational due to redundancy and migration.
The reliability estimate does account for the bug. The bug did not cause a critical operational failure. It could cause a critical operational failure in an extremely unlikely case if it remained undetected and no measures were taken to avoid or correct for it. However, it was detected and countermeasures have been put into place, so the processes in place continue to achieve their intended goal of preventing critical operational failure. So, the outcome-based estimate continues to be accurate.
Just to be clear, an outcome-based estimate is not perfect. By its nature, it only looks at the past, so has no true predictive power. You can not use an outcome-based estimate to predict the effects of process changes. However, it is a relatively unbiased way of evaluating if prior processes were effective which we can use to inform us which processes of the past were actually effective or not and the effects of process changes.
> The reliability estimate does account for the bug. The bug did not cause a critical operational failure.
Yes it did cause operational failure! An airplane turning itself the wrong direction is an outcome, and an extremely serious one.
There was a bug that put people at risk, and you are saying that just because a human caught it and it didn’t crash the plane, it doesn’t count as unreliable?! You’ve just rationalized ignoring all bugs that don’t cause fatal crashes when estimating software reliability. This is making your point weaker, not stronger. You’re arguing that software reliability should only be measured by fatalities. If you really want to go that way, one might conclude that “normal software” like AWS is infinitely more reliable than aviation software, because it never killed anyone. By discounting any bugs that don’t lead to plane crashes, you are undermining your own claim that aviation software is “250,000x” more reliable than other kinds of software.
This kind of analysis, the insistence that reliability is high because death has not occurred often, has played a major role in several high-profile accidents. In the shuttle disaster, for one, it was specifically called out that reliability estimates were exaggerated. The Therac-25 incident is another case where engineers failed to understand what happened for a long time due to vastly exaggerated estimates of the system’s safety and reliability.
No, uptime still makes zero sense to compare, it is a nonsense metric in this context. Uptime is a measure of continuous operation, and planes aren’t in continuous operation. Simple as that. It’s a metric that does not apply to aircraft, no matter how you spin it.
There are multiple cases of major software failure in military and aviation from systems being in continuous operation for too long. There was a thread just the other day about an airline’s safety procedures specifically requiring in writing a reboot every 30 days due to known bugs.
And you’re ignoring that the 737 MAX did not suffer system operational failure. The system didn’t go down, it kept working. If the system had gone down, those people might have survived. The crash happened precisely because the buggy system kept working. If you want to count the downtime of the system, you maybe ought to count all the flight hours the plane would have flown since the crash, rather than using a bogus concept of only the ratio of fall time to all flight hours to estimate industry reliability. Again, that ratio is completely and utterly meaningless as a proxy for software reliability.
“Downtime” in normal software is not always caused by catastrophic failure, sometimes it’s due to maintenance and upgrades, sometimes it’s due to low performance, sometimes it’s caused by people actively attempting to fix bugs during uptime. None of those things happen during an airplane’s uptime.
> One of the only ways to compare processes and not be tricked by fancy words, especially as a non-expert, is to look and compare actual outcomes.
I’m not arguing against comparing outcomes. I’d agree that looking at outcomes is a good thing, if, and only if, you are actually fair about seeing all outcomes. I’m suggesting that pointing at the more easily verifiable volume of testing effort and safety concern around aviation software, compared to how much testing and verification happens on ‘normal software’, might adequately persuade someone unfamiliar with it that aviation software testing and bugs are taken far more seriously than the testing and bugs of web apps.
This is not about saving money! You can't simply shutdown manufacturing of Intel or other chips that have Spectre/Meltdown issues because that would leave us with essentially no usable CPUs for new computers!
The Spectre/Meltdown issues are deep and architectural, not simple to fix. It's not just a batch of CPUs that's the problem, but all of them.
Besides, if a CPU ships with a bug that can be fixed via a microcode patch, then it would be a tremendous economic waste for all humanity to throw those CPUs out.
Even when new CPUs come out that can be shown not to have Spectre/Meltdown issues, it will take a long time to replace the installed base of those that do because it's not a matter of a little bit of money, but a matter of a great deal of money and opportunity costs.
So microcode patches and software mitigations are all there is. Absolutist attitudes don't help.
“But the first time you see airplane software malfunction, that means you need to change the way the software is written and released so that the whole class of issues will not ever happen again.”
This sounds pretty good in theory but in practice you will just trade the current set of issues against new issues.
In reality, Systems and their interactions are so complex that there is no amount of software design that can avoid bugs and fixing them. We sure can improve but it would be naive to think you can design 100% reliability into something like an airplane.
You are about 100% right on the mark here. There is only one slight problem: people don't want to pay for very high quality software except in a very limited number of fields.
In a way, every real software improvement (not fancy language flavor 'x' of the year, but entirely new ways of developing software) has had the main goal of letting us write software with fewer bugs, faster.
That's the whole reason we have abstractions, compilers, syntax checkers, static analyzers and so on. In spite of all those, software still has bugs and budgets are still not sufficient to write bug-free software.
On another note: this problem is getting worse over time. As tools improved codebases got larger and the number of users multiplied at an astounding rate resulting in many more live instances of bugs popping up. After all, software that contains bugs but that is never run is harmless, only when you run buggy software many times does the price of those bugs really add up.
Somewhere we took a wrong turn and we decided that more of the same is a better way to compete than to have one of each that is perfected and honed until the bugs have been (mostly...) ironed out.
> If you have to hire mathematicians to formally prove the critical paths of the software, you do that. If it costs 10x more to release bug-free software, oh well, you do that.
If you’re trying to keep planes from crashing at all costs, sure. If you’re trying to reduce deaths from travel, that’s a terrible plan. Every family that you price out of commercial air travel and convert over to private auto travel instead has been placed at significantly higher risk as a result of the excessive pursuit of safety.
It’s the reason the FAA allows lap infants under 2 years old. Not because that’s “safe” in absolute terms, but because it’s safer than the likely alternative.
On one hand, I understand your sentiment, on the other hand even with these bugs air travel is as safe as it’s ever been. We’ve reached a point where fewer people die in air travel per year than at any other point in the history of air travel, and that’s before you account for the number of miles travelled. It’s almost ridiculous how safe air travel is on average.
That was true until the 737 MAX, which statistically must have been one of the most dangerous planes (or jets at least) in history. Very few miles and 2 complete hull-loss incidents very close together. These bugs really do matter. You can have quite a lot of minor issues and get away with it, but when you hit a serious failure like the MAX had, even one triggered in only 1 in 10,000 flights ends up causing an awful lot of casualties.
MCAS was not a bug. The software behaved exactly as specified.
The issue was the specification itself, which assumed pilots would reliably catch the uncommanded trim down, diagnose it and disable the whole electric trim subsystem within seconds of the problem behavior arising.
That assumption turned out to be massively flawed.
Your comment implicitly - and probably unintentionally - appears to assign part of the blame to the pilots, which I think is a very bad thing to do in this particular case.
Even if my comment implies that there might be pilot error, pilot error doesn't mean pilot blame.
In this case, I'm very much of the opinion that the blame either belongs with the official Boeing training program, which didn't train any 737 pilots to correctly handle this scenario.
Or the blame belongs to the design specification that relied on the assumption pilots would be able to correctly handle this scenario without even testing that assumption. Or potentially both.
Even if, say, 10% of pilots could fluke into handling this scenario without the correct training, that doesn't mean the other 90% are to blame for not fluking into a correct solution.
I think specification here refers to the type specification of the aircraft. It's not putting the burden on the pilots but rather on the lack of pilot training due to Boeing and airlines not wanting to bear the cost of training pilots to a new aircraft type.
> airlines not wanting to bear the cost of training pilots to a new aircraft type.
This is a perfectly reasonable request by the airlines. Some airlines rely on the operational efficiency of a single aircraft type. It lets them interchange parts and people and not have to worry that the wrong airplane is in the wrong spot.
What is NOT reasonable was Boeing providing an aircraft that actually had MAJOR differences yet claiming it was the same.
And what makes it particularly stupid is no airline that relies on a single airplane type is going to switch from Boeing to Airbus because they would have to migrate their entire fleet en masse. So Boeing had plenty of time to certify the 737 MAX airframe properly.
"Indonesian investigators have determined that design and oversight lapses played a central role in the fatal crash of a Boeing 737 MAX jet in October, according to people familiar with the matter, in what is expected to be the first formal government finding of fault.
The draft conclusions, these people said, also identify a string of pilot errors and maintenance mistakes as causal factors in the fatal plunge of the Boeing Co. plane into the Java Sea, echoing a preliminary report from Indonesia last year."
The MAX problems weren't so much software bugs as specification bugs. The software did exactly what it was told to do by criminally-negligent engineering and management personnel.
Private planes and industrial planes still have an awful safety record.
Most stats also exclude 'unrelated' deaths which happen during a flight (even though there is a good chance the changes in air pressure, stress, lack of medical care, and cramped conditions at least contributed to the death).
Stats also often exclude terrorist or war shootdowns of commercial planes, which are starting to become significant.
> Private planes and industrial planes still have an awful safety record.
I don't know about industrial, but I assume "private" is a combination of 1) private pilots suck and 2) too much catering to client.
Kobe Bryant would be my unfortunate shining example of 2). The pilot either wanted to cater to Kobe or would get fired if he didn't, and so went up in weather that it was stupid to go up in.
As for 1), I've seen far too many sleep-deprived, hungover, drunk, or stoned private airplane pilots. And this is on top of the fact that they probably aren't the most experienced pilots to begin with. What is it about piloting that seems to attract frat boys who never grew up?
What about giving them a high potential salary but making the actual amount a function of approval rating? In order to maximize how much you make, you have to work to unify your constituents.
I don't think you'd want this to be linear either. My feeling is you'd want the pay to stay pretty low anywhere below a 50% support level and climb pretty steeply above that, which discourages the split-the-people-down-the-middle mess the US is in. You'd probably accept that a small portion of people are nutcases, so the salary would approach the max at around 85-90% support.
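The shape described, low pay below ~50% approval, a steep climb past that, and saturation around 85-90%, is roughly a logistic curve. A sketch, where every parameter value is made up purely for illustration:

```python
import math

def salary(approval, base=50_000, max_bonus=400_000,
           midpoint=0.70, steepness=15):
    """Hypothetical pay curve: stays near the base below ~50% approval,
    climbs steeply through the midpoint, and saturates near 85-90%.
    All parameters are illustrative, not a real proposal's numbers."""
    bonus = max_bonus / (1 + math.exp(-steepness * (approval - midpoint)))
    return base + bonus

# Pay stays near base below 50% and approaches base + max_bonus near 90%.
for a in (0.40, 0.50, 0.70, 0.90):
    print(f"{a:.0%} approval -> ${salary(a):,.0f}")
```

The `steepness` and `midpoint` knobs control where the steep climb happens, which is where the "exploits to address" would likely concentrate.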
There are some potential exploits that you'd have to try to address but it's a thought.
How do you determine approval rating? The reality is that only a small percentage of the public has any idea of whom any public servant is outside of their leader.
And at that point it's easily gamed, and leads to ineffective short termist planning by the public servant.
This is a terrible idea that would be hugely damaging to the public.
Just a minor annoyance except of course for those not rich enough to own two phones, but I suppose if you didn't want to be oppressed you should have thought of that before you decided to be born poor, right?