It does seem like the state of the art differs from popular understanding. Not only is mitochondrial DNA straightforward (although not especially useful for forensics, as it is maternally inherited), but with specialized extraction it is still possible to recover nuclear DNA; it is just exceedingly painful to do so.
The initial purpose of a microbiome is to be at least commensal: it is usually prohibitively expensive to maintain a sterile environment, so the odds of a true pathogen colonizing a system are greatly reduced if you simply have a crowded space of neutral participants.
Once that's true, it does seem there are a lot of host-microbiome interactions we've only begun to explore, but it shouldn't be surprising that co-evolution of the microbiome and host begins to take over as soon as you have one. One great example is short-chain fatty acid (SCFA)-producing bacteria in the human gut. [1] These seem to be essential, and if there were a general takeaway to improve health, it would be to eat your roughage so they can do their job.
This is also why high alpha-diversity (community richness in particular) is such a reliable marker of healthy vs diseased states. And frustratingly, that is often exactly where the story ends for a lot of observational studies.
Also, in case you are curious, germ-free mice (a type of gnotobiotic mouse, raised to be artificially sterile) tend to act differently than other mice, which is pretty odd to be honest, and is why the gut-brain axis is a plausible mechanism to research further. [2]
If anyone is interested in a more formal description of these control loops, with more testable mechanisms, check out the concept of reward-taxis. Here are two neat papers that I think are more closely related than might initially appear:
"Is Human Behavior Just Running and Tumbling?": https://osf.io/preprints/psyarxiv/wzvn9_v1
(This used to be a blog post, but it's down, so here's an essentially identical preprint.)
A scale-invariant control loop such as chemotaxis may still be the root algorithm we use, just adjusted for a dopamine gradient mediated by the prefrontal cortex.
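For a concrete picture of what that control loop looks like, here's a toy run-and-tumble sketch of my own (not code from either paper): the agent keeps running while the sensed signal is improving and tumbles to a random new heading when it isn't. It never needs a map of the field, only a "better or worse than a moment ago" comparison, which is what makes the loop scale-invariant.

```python
import math
import random

def reward(x, y):
    # Toy signal field: the "reward" peaks at the origin.
    return -math.hypot(x, y)

def run_and_tumble(steps=1000, step_size=0.1):
    x, y = 5.0, 5.0
    heading = random.uniform(0, 2 * math.pi)
    last = reward(x, y)
    for _ in range(steps):
        # Run: keep moving along the current heading.
        x += step_size * math.cos(heading)
        y += step_size * math.sin(heading)
        now = reward(x, y)
        # Tumble: if things stopped improving, pick a random new heading.
        if now <= last:
            heading = random.uniform(0, 2 * math.pi)
        last = now
    return x, y

print(run_and_tumble())  # tends to end up near (0, 0) using only local comparisons
```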
Not an expert, but I have a bit of formal training on Bayesian stuff which handles similar problems.
Usually Gibbs is used when there's no straightforward gradient (or when you are interested in reproducing the distribution itself rather than a point estimate), but you do have some marginal/conditional likelihoods that are simple to sample from.
Since each visible node depends on each hidden node and each hidden node affects all visible nodes, the gradient ends up being very messy, so it's much simpler to use Gibbs sampling to adjust based on the conditional likelihoods.
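To make that concrete, here's a rough sketch of the block Gibbs step for the bipartite visible/hidden setup described above (an RBM-style model; the variable names are mine): the joint gradient is messy, but each conditional is just a sigmoid of a weighted sum, so sampling from it is trivial.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_vis, b_hid):
    """One block-Gibbs sweep: sample all hidden units given the visibles,
    then all visible units given the hiddens."""
    p_h = sigmoid(b_hid + v @ W)                  # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h) * 1.0       # sample the hidden layer
    p_v = sigmoid(b_vis + h @ W.T)                # p(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v) * 1.0   # sample the visible layer
    return v_new, h

# Tiny example: 6 visible units, 3 hidden units, random weights.
W = rng.normal(size=(6, 3))
v = rng.integers(0, 2, size=6).astype(float)
for _ in range(100):
    v, h = gibbs_step(v, W, np.zeros(6), np.zeros(3))
```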
It's been a long time since I took a class like this, but I definitely had a similar experience to the author.
Ideas like fold and map were _never_ mentioned in lisp (to exaggerate, every function had to have the recursive implementation with 5 state variables and then a simpler form for the initial call); at no point did higher-order functions or closures make an appearance while rotating a list by 1 and then 2 positions.
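(To illustrate the complaint, in Python rather than Lisp and purely as my own toy example: the explicit-recursion style threads the accumulator by hand through every call, while the fold names that pattern once and leaves only the combining function to write.)

```python
from functools import reduce

# Explicit-recursion style: the accumulator is threaded by hand.
def total(xs, acc=0):
    if not xs:
        return acc
    return total(xs[1:], acc + xs[0])

# Same computation as a fold: the recursion pattern is abstracted away.
def total_fold(xs):
    return reduce(lambda acc, x: acc + x, xs, 0)

assert total([1, 2, 3, 4]) == total_fold([1, 2, 3, 4]) == 10
```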
The treatment of Prolog was somehow worse. Often the code only made any sense once you reversed what the lecturer was saying, realizing the arrow meant "X given Y" not "X implies Y", at which point, if you could imagine the variables ran "backwards" (unification was not explained) the outcome might start to seem _possible_. I expect the lecturer was as baffled by their presentation as we were.
In general, it left the rest of the class believing quite strongly that languages other than Java were impossible to use and generally a bad move. I may have been relatively bitter in the course evaluation by the end.
The irony is palpable. I had the (mis)fortune of only being (mis)taught procedural languages by professors who thought computers were big calculators that could never be understood, but could be bent to your will by writing more code and maybe by getting a weird grad student to help.
Patterns might appear to the enlightened on the zeroth or first instance, but even mere mortals must notice them after several goes. The magic of lisp is that if you notice yourself doing anything more than once, you can go ahead and abstract it out.
Not everything needs to be lifted to functional Valhalla, of course, but not factoring out e.g. map and filter requires (imho) a wilful ignorance of the sort that no teacher should countenance. I think it's bad professional practice, bad pedagogy, and a bad time overall. I will die on this hill.
Having been on the other side of this (teaching undergrads), I do get why courses would be structured like this. If you actually try explaining multiple things, lots of students freeze up and absorb nothing. Certainly there’s a few motivated and curious students who are three lectures ahead of you, but if you design the class for them, 60% of students will just fail.
So I get why a professor wouldn't jump in with maps and folds. First, you need to make students solve a simple problem, then another. At the third problem, they might start to notice a pattern - that's when you say "gee, you're right, there must be a better way to do this," and introduce maps and folds. The top 10% of the class will have been rolling their eyes the whole time, thinking "well duh, this is so boring." But most students seem to need their hand held through this whole process. And today, of course, most students are probably just having LLMs do their homework and REALLY learning nothing.
> I wish you luck with tracking down versions of software used when you're writing papers... especially if you're using multiple conda environments.
How would you do this otherwise? I find `conda list` to be terribly helpful.
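For what it's worth, the way I usually pin this down for a paper is just to export the environment alongside the analysis; these are all standard conda commands:

```sh
# List every package and version in the active environment
conda list

# Export the full environment so it can be recreated later
conda env export > environment.yml
conda env create -f environment.yml   # recreate it elsewhere
```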
As a tool developer for bioinformaticians, I can't imagine trying to work with OS package managers, so that would leave vendoring multiple languages and libraries in a home-grown scheme slightly worse and more brittle than conda.
I also don't think it's realistic to imagine that any single language (and thus language-specific build tools or package manager) is sufficient, since we're still using Fortran deep in the guts of many higher-level libraries (recent tensor stuff is disrupting this a bit, but it's not like OpenBLAS isn't still there as a default backend).
> home-grown scheme slightly worse and more brittle than conda
I think you might be surprised as to how long this has been going on (or maybe you already know...). When I started with HPC and bioinformatics, Modules were already well established as a mechanism for keeping track of versioning and multiple libraries and tools. And this was over 20 years ago.
The trick to all of this is to be meticulous in how data and programs are organized. If you're organized, then all of the tracking and trails are easy. It's just soooo easy to be disorganized. This is especially true with non-devs who are trying to use a Conda-installed tool. You certainly can be organized and use Conda, but more often than not, for me, tools published with Conda have been a $WORKSFORME situation. If it works, great. If it doesn't... well, good luck trying to figure out what went wrong.
I generally try to keep my dependency trees light and if I need to install a tool, I'll manually install the version I need. If I need multiple versions, modules are still a thing. I generally am hesitant to trust most academic code and pipelines, so blindly installing with Conda is usually my last resort.
I'm far more comfortable with Docker-ized pipelines though. At least then you know that when the dev says $WORKSFORME, it will also $WORKSFORYOU.
This is true of corals, and they are often considered "colonial" organisms rather than individuals.
That said, I don't think anyone who studies biology is particularly concerned with hard-line definitions, as nature tends to eschew them every chance it has.
I think treating Pando and corals as having "modular body plans/habits" is perhaps a more useful concept than individual or clone.
I don't know, that sounds like a complex kind of ingest which could be arbitrarily subtle and diverge over time for legal and bureaucratic reasons.
I would kind of appreciate having two importers, since what are the odds the two formats would change together? While there may never be a 3rd format, a DRY importer would imply that the source generating the data is also DRY.
Good point. This may be a case where domain knowledge is helpful.
One of the reasons they brought me in on this project is that besides knowing how to wrangle data, I'm also an experienced pilot. So I had a good intuitive sense of the meaning and purpose of the data.
The part of the data that was identical is the description of the airspace boundaries. Pilots will recognize this as the famous "upside down wedding cake". But it's not just simple circles like a wedding cake. There are all kinds of cutouts and special cases.
Stuff like "From point A, draw an arc to point B with its center at point C. Then track the centerline of the San Seriffe River using the following list of points. Finally, from point D draw a straight line back to point A."
The FAA would be very reluctant to change this, for at least two reasons:
1. Who will provide us the budget to make these changes?
2. Who will take the heat when we break every client of this data?
I see, so it's a procedural language that is well understood by those who fly (not just some semi-structured data or ontology). This is a great example of the advantage of domain experience. Thanks for sharing!
> a procedural language that is well understood by those who fly
That is a great way to describe it!
Of course it is all just rows in a CSV file, but yes, it is a set of instructions for how to generate a map.
In fact the pilot's maps were being drawn long before the computer era. Apparently the first FAA sectional chart was published in 1930! So the data format was derived from what must have been human-readable descriptions of what to plot on the map using a compass and straightedge.
I just remembered a quirk of the Australian airspace data. Sometimes they want you to draw a direct line from point F to point G, but there were two different kinds of straight lines. They may ask for a great circle, a straight path on the surface of the Earth. Or a rhumb line, which looks straight on a Mercator projection but is a curved path on the Earth.
You would often have some of each in the very same boundary description!
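If anyone wants to see the difference, here's a rough sketch of both interpolations using standard spherical formulas (the function names and example cities are just mine):

```python
import math

def great_circle_point(p1, p2, f):
    """Point a fraction f of the way from p1 to p2 along the great circle."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    v1 = (math.cos(lat1) * math.cos(lon1), math.cos(lat1) * math.sin(lon1), math.sin(lat1))
    v2 = (math.cos(lat2) * math.cos(lon2), math.cos(lat2) * math.sin(lon2), math.sin(lat2))
    omega = math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(v1, v2)))))
    s1 = math.sin((1 - f) * omega) / math.sin(omega)
    s2 = math.sin(f * omega) / math.sin(omega)
    x, y, z = (s1 * a + s2 * b for a, b in zip(v1, v2))
    return math.degrees(math.asin(z)), math.degrees(math.atan2(y, x))

def rhumb_point(p1, p2, f):
    """Point a fraction f of the way from p1 to p2 along the rhumb line,
    i.e. a straight line on a Mercator chart (assumes no antimeridian crossing)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    merc = lambda lat: math.log(math.tan(math.pi / 4 + lat / 2))
    y = merc(lat1) + f * (merc(lat2) - merc(lat1))
    lat = 2 * math.atan(math.exp(y)) - math.pi / 2
    lon = lon1 + f * (lon2 - lon1)
    return math.degrees(lat), math.degrees(lon)

# The two midpoints differ noticeably on a long east-west leg:
new_york, madrid = (40.71, -74.01), (40.42, -3.70)
print(great_circle_point(new_york, madrid, 0.5))
print(rhumb_point(new_york, madrid, 0.5))
```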
For anyone curious about this stuff, I recommend a visit to your local municipal airport: stop by the pilot shop and buy a sectional chart of your area.
Paper charts are great (they're fairly cheap and printed quite nicely in the USA at least) but you can get a good look at these boundaries through online charts.
In such case I think I'd go for an internal-DRYing + copy-on-write approach. That is, two identical classes or entry points, one for each format; internally, they'd share all the common code. Over time, if something changes in one format but not the other, that piece of code gets duplicated and then changed, so the other format retains the original code, which it now owns.
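Something like this rough sketch (class and method names are made up): both formats initially delegate everything to the shared internals, and a format only gets its own copy of a step once it actually diverges.

```python
class _CommonAirspaceParser:
    """Shared internals; neither format 'owns' this code."""

    def parse_row(self, row):
        return {"name": self.parse_name(row), "boundary": self.parse_boundary(row)}

    def parse_name(self, row):
        return row["name"].strip()

    def parse_boundary(self, row):
        return row["boundary"].split(";")


class FormatAParser(_CommonAirspaceParser):
    pass  # currently identical to the shared behaviour


class FormatBParser(_CommonAirspaceParser):
    # Format B diverged here, so this step was "copied on write" and changed;
    # Format A keeps the original implementation above, which it now owns.
    def parse_boundary(self, row):
        return [p.strip() for p in row["boundary"].split("|")]
```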
I believe this method is very common in games - you have similar logic for entities, but some have divergences that could occur in unknown ways after playtesting or future development.
Though if done haphazardly by someone inexperienced, you might end up with subtle divergences that look like they're meant to be copies, and debugging them in the future by another developer (without the history or knowledge) can get hard.
Then someone would wonder why there are these two very similar pieces of code, and mistakenly try to DRY it in the hopes of improving it, causing subtle mistakes to get introduced...
I prefer the FP approach of separating data and logic. You could end up with a box of functions (logic) that can be reused by the different "entities".
Last time I checked, the FP world is slowly producing the ECS frameworks that are needed to make games performant. They used to be nearly C++ (or OO) exclusive.
Is an entity component system really functional programming? I had the sense that functional programming was more about writing functions that are pure and referentially transparent, and keeping data immutable by default, which can make code simpler and more modular. It tends to use higher-order functions (recursion operators such as folds) more than direct recursion, because it's easier to verify correctness that way. Rather than imperative loops and mutation, the meat of the program consists of the composition of many different functions, both small and large.
Entity component systems are pretty cool, as is functional programming, but I don't see the relation.
In addition, object-oriented languages seem well-suited to making entity component systems, and there are tutorials on them in a number of different object-oriented programming languages.
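For anyone unfamiliar with the pattern, here's a deliberately tiny sketch of the ECS idea (my own toy example, not from any particular framework or tutorial): components are plain data keyed by entity id, and a "system" is just a function that runs over whichever entities happen to have the components it needs.

```python
from dataclasses import dataclass

@dataclass
class Position:
    x: float
    y: float

@dataclass
class Velocity:
    dx: float
    dy: float

# Component stores: entity id -> component data.
positions = {1: Position(0, 0), 2: Position(5, 5)}
velocities = {1: Velocity(1, 0)}  # entity 2 has no Velocity component

def movement_system(positions, velocities, dt):
    """Operates on every entity that has both a Position and a Velocity."""
    for eid in positions.keys() & velocities.keys():
        p, v = positions[eid], velocities[eid]
        p.x += v.dx * dt
        p.y += v.dy * dt

movement_system(positions, velocities, dt=1.0)
print(positions)  # entity 1 moved, entity 2 was untouched
```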
> In such case I think I'd go for an internal-DRYing + copy-on-write approach.
I agree. The primary risk presented by DRY is tightly coupling code which only bears similarities at a surface level. Starting off by explicitly keeping the external bits separate sounds like a good way to avoid the worst tradeoff.
Nevertheless, I still prefer the Write Everything Twice (WET) principle, which means mostly the same thing but follows a clearer guideline: postpone all de-duplication efforts until it's obvious there's shared code (semantics and implementation) in more than two occurrences, and always start by treating separate cases as independent cases.
Inheritance is only good for code reuse, and it’s a trick you only get to use once for each piece of code, so if you use it you need to be absolutely certain that the taxonomy you’re using it to leverage code across is the right one.
All ‘is-a so it gets this code’ models can be trivially modeled as ‘has-a so it gets this code’ patterns, which don’t have that single-use constraint… so the corollary to this rule tends towards ‘never use inheritance’.
Single use? No way, that's what multiple inheritance and mixins are for. Inheritance being only for code reuse is explicitly about not creating a taxonomy. No more is-a, just "I need this code here." Hey, this thing behaves like a mapping: inherit from MutableMapping and get all the usual mapping methods for free. Hey, this model needs created/updated_at: inherit from ChangeTracking and get those fields and helper methods for free.
Has-a doesn't make sense for that kind of literal code reuse. It makes sense for composition and encapsulation.
Edit: I'm now realizing that Python has one of the only sane multiple inheritance implementations. It's no wonder the rest of y'all hate it.
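For instance, roughly the ChangeTracking example above as a Python mixin (made-up class names, standard-library only): the mixin exists purely to hand out code, not to model an is-a taxonomy.

```python
from datetime import datetime, timezone

class ChangeTrackingMixin:
    """Mixin used purely for code reuse: adds created/updated timestamps."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # cooperate with other bases via the MRO
        self.created_at = datetime.now(timezone.utc)
        self.updated_at = self.created_at

    def touch(self):
        self.updated_at = datetime.now(timezone.utc)


class Invoice(ChangeTrackingMixin):
    def __init__(self, total):
        super().__init__()
        self.total = total


inv = Invoice(total=100)
inv.touch()  # updated_at refreshed; Invoice never had to fit into an "is-a" taxonomy
```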
I don't know. I've seen this approach go bad in projects before - people didn't want to DRY because the code might diverge. Except it never did. From the 3rd scenario onward, we abstracted.
But what basically ended up happening was we had 2 codebases: 1 for that non-DRY version, and then 1 for everything else. The non-DRY version limped along and no one ever wanted to work on it. The ways it did things were never updated. It was rarely improved. It was kinda left to rot.
> But what basically ended up happening was we had 2 codebases: 1 for that non-DRY version, and then 1 for everything else. The non-DRY version limped along and no one ever wanted to work on it. The ways it did things were never updated. It was rarely improved. It was kinda left to rot.
It sounds to me like you're trying to pin the blame for failing to maintain software on not following DRY, which makes no sense to me.
Advocating against mindlessly following DRY is not the same as advocating for not maintaining your software. Also, DRY does not magically earn you extra maintenance credits. In fact, it sounds to me that the bit of the code you called DRY ended up being easier to maintain because it wasn't forced to pile on abstractions needed to support the non-DRY code. If it was easy, you'd already have done it and you wouldn't be complaining about the special-purpose code you kept separated.
In my experience, once you copy code it's bound to diverge, intentionally or not. Bugs become features, and you can never put the cat back in the bag without a monumental amount of work.
Undoing an abstraction is way easier. Eventually, they all turn bad anyways.
Why wasn't the original implementation swapped for the new one? The unwillingness/inability to do that seems most likely to be the core of the issue here?
The majority of our business was through the 1st implementation. Because of that, it was the base we used to refactor into a more abstract solution for further scenarios. It was never deemed "worth it" to transition the 2nd, non-DRY version. Why refactor an existing implementation if it's working well enough and we could expand to new markets instead?
Yes, why do it? :p I mean, there are pros and cons - costs and benefits. And I can see both scenarios where it is better to spend the time on something else (that has better chance of bringing in money), and cases where it would be the right thing to do the cleanup (maybe original is just about to fall apart, or the new has straight up benefits to the business, or the act of doing it will greatly improve testing/QA in a critical area, etc).
Writing it DRY in the first place would also have had costs, including the opportunity costs. Would it have been better to take those on there and then?
I vaguely recall Fred Brooks, in The Mythical Man-Month, using a somewhat similar situation, but involving the various US states' income tax rules and their relationship to the federal tax, as an example in order to make some sort of point (that point being 'know your data', IIRC.)
In a situation where there is a base model with specific modifications - which is, I feel, how airspace regulation mostly works - then I suspect that a DRY approach would make it easier to inspect and test, so long as it stays that way.
https://www.sciencedirect.com/science/article/pii/B978032399...