Show HN: Visualization of Longevity and Mortality

xaa · on Oct 1, 2015

Very cool. I think if you added a few more ways of slicing and dicing this data, you would have a decent chance at getting it accepted to a bioinformatics journal. For example, Oxford Journals Bioinformatics has an "Application Note" type of submission for useful programs that aren't quite original research. I work with one of the editors and he was impressed with the interface and thought it would have a good shot at acceptance.

In particular, I think there should be an option to normalize the y-axis of the graph from raw counts to percentage of deaths within each age group, to control for the fact that the number of deaths per bin is not the same.
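
Roughly what I mean by that normalization, as a minimal Python sketch. The age groups, causes, and counts are made up for illustration, and `percent_within_age_group` is just a name I picked, not anything from the actual viz:

```python
from collections import defaultdict

# Hypothetical raw death counts keyed by (age_group, cause).
# These numbers are invented for illustration only.
counts = {
    ("45-54", "cancer"): 300,
    ("45-54", "heart disease"): 200,
    ("85+", "cancer"): 1000,
    ("85+", "heart disease"): 3000,
}

def percent_within_age_group(counts):
    """Convert raw counts to the percentage of deaths within each age group."""
    totals = defaultdict(int)
    for (age_group, _cause), n in counts.items():
        totals[age_group] += n
    return {
        # Report 0.0 for an empty bin rather than dividing by zero.
        (age_group, cause): (100.0 * n / totals[age_group]) if totals[age_group] else 0.0
        for (age_group, cause), n in counts.items()
    }

shares = percent_within_age_group(counts)
for key in sorted(shares):
    print(key, round(shares[key], 1))
```

With this, a small bin and a large bin can be compared on the same 0-100 scale instead of the large bin dominating the plot.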

It would also be nice to add a little more categorization to the causes of death, in a tree-like structure. For example, all vascular disease, with cerebrovascular disease, CVD, etc, as subtypes.
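
A sketch of how such a tree could be rolled up, in Python. The hierarchy below is purely illustrative (not an official classification), and `CAUSE_TREE` and `subtree_total` are hypothetical names:

```python
# Hypothetical cause-of-death hierarchy: each key maps to a dict of subtypes,
# and an empty dict marks a leaf. The grouping is illustrative only.
CAUSE_TREE = {
    "vascular disease": {
        "cerebrovascular disease": {},
        "heart disease": {},
    },
    "cancer": {
        "lung cancer": {},
        "colorectal cancer": {},
    },
}

def subtree_total(name, children, leaf_counts):
    """Return the death count for a node: its own count if it is a leaf,
    otherwise the sum over all of its subtypes."""
    if not children:
        return leaf_counts.get(name, 0)
    return sum(subtree_total(child, sub, leaf_counts) for child, sub in children.items())

# Made-up leaf-level counts.
leaf_counts = {"cerebrovascular disease": 10, "heart disease": 30, "lung cancer": 5}

for parent, children in CAUSE_TREE.items():
    print(parent, subtree_total(parent, children, leaf_counts))
```

Clicking a parent category in the UI would then filter on all of its leaves at once.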

Also it would be nice to be able to ask questions like, "which states have the largest (or smallest) fraction of deaths from, e.g., CVD"? Do some states have smaller or larger gender gaps in particular diseases?
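
Those two questions could be answered with something like the following Python sketch. The rows are invented placeholder data, the state names are deliberately fake, and `cause_fraction` is a hypothetical helper:

```python
from collections import defaultdict

# Hypothetical rows of (state, sex, cause, deaths); all values are made up.
rows = [
    ("State A", "F", "CVD", 30), ("State A", "F", "other", 70),
    ("State A", "M", "CVD", 50), ("State A", "M", "other", 50),
    ("State B", "F", "CVD", 20), ("State B", "F", "other", 80),
    ("State B", "M", "CVD", 25), ("State B", "M", "other", 75),
]

def cause_fraction(rows, cause, key):
    """Fraction of all deaths attributed to `cause`, grouped by key(state, sex)."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for state, sex, row_cause, deaths in rows:
        group = key(state, sex)
        total[group] += deaths
        if row_cause == cause:
            hits[group] += deaths
    return {g: hits[g] / total[g] for g in total if total[g]}

# Which states have the largest fraction of deaths from CVD?
by_state = cause_fraction(rows, "CVD", key=lambda state, sex: state)
ranked = sorted(by_state, key=by_state.get, reverse=True)

# Gender gap per state: male fraction minus female fraction.
by_state_sex = cause_fraction(rows, "CVD", key=lambda state, sex: (state, sex))
gender_gap = {s: by_state_sex[(s, "M")] - by_state_sex[(s, "F")] for s in by_state}

print(ranked)
print(gender_gap)
```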

subcosmos · on Oct 1, 2015

I DO have a much larger viz in the works! It lets you cut down by race, year (1999-2013), and a few other metrics. I wanted to put this out first to see how it performs on various devices. Stay tuned!

I come from an academic background. Made this in my PhD: amass-db.org

But my core passion these days is the project that this viz is hosted at (www.infino.me). I think I can make a much bigger impact with more consumer-facing nonprofit apps than by publishing academic articles.

xaa · on Oct 1, 2015

Cool, I am looking forward to an improved version.

I would be interested in your rationale for how the public would/could use this kind of data. infino.me/health seems to be an example, pointing out major risk factors behind cancer and CVD. But don't you think this is already common knowledge?

Or is the primary goal to get people to voluntarily share their health and genotype data to get a big dataset for analysis, and maybe eventually provide a sort of personalized risk assessment? I wonder how the FDA views that sort of thing.

subcosmos · on Oct 1, 2015

In short, I built a search engine for my genome. I'm letting the rest of the world use it ;)

I did my PhD work in diabetes genetics. It runs in my family, and I'm kinda pissed that it's killing off a large fraction of us. The eventual goal is to make this into some kind of communal science effort where people can contribute open source analysis pipelines. However, I need the right kind of organizational structure to keep the data safe and centralized but still allow open source research.

So, some kind of platform where algorithms can get in, results can get out, but raw data stays locked up. I want a world where this kind of research happens in the open, and not privately in biomedical corporations.

xaa · on Oct 1, 2015

It is a good idea in general, but the idea of infosec for genetic data is a minefield. I deal with similar limitations daily -- one of my areas is large-scale meta-analysis of expression data, which was dandy when that data was collected using microarrays. Now, it's RNA-seq, so a lot of that data indeed stays locked up and you have to apply for special permission to access individual datasets from dbGaP, making large-scale studies difficult/impossible.

But although "algorithms in/results out" sounds good in principle, I think it will be hard to implement in practice. You would have to make algorithms run without network access to prevent a bulk_send_data_to_ip() type of function from being written, but that would hamper complex programs requiring external data.

In general I think the only realistic way forward is to take the 1000 genomes approach of finding people who are willing to take the privacy risks of truly open-sourcing their data. But it sounds like an interesting idea and I hope I'm wrong and your approach turns out to be workable.

subcosmos · on Oct 1, 2015

I like your thoughts here! Indeed I have been hoping to mostly attract people who are willing to be fully open with their data. If I ever get big enough to implement this 'algorithms in/results out' approach I intend to re-engage the whole userbase and have people opt-in to crowdsourced scientific analysis. To prevent data leaking I figured we would start with full code review, and indeed air-gapped analysis.

It's a continually evolving thing. I imagine it will be years before I get to that stage. It depends on whether I find funding or university help.

Hit me up at info@infino.me if you'd like to chat more.

varelse · on Oct 1, 2015

Type 2 is somewhat an opt-in disease, is it not?

What pisses me off is how much can be done to prevent it and how little people are willing to do so, especially when their genes can tell them point blank that they really ought to be doing something and doing it right now before the very bad thing happens that will negatively impact them forever.

subcosmos · on Oct 1, 2015

It's more genetic than people realize. In my PhD I focused heavily on the key gene responsible. Your stem cells are literally predisposed to become adipose tissue instead of muscle from the start. It sucks.

varelse · on Oct 1, 2015

So what happens when such a person eats relatively healthily and exercises regularly? Does this do any significant good or should they just accept the inevitable?

subcosmos · on Oct 1, 2015

It always helps. Some people just need to try harder.

Much deeper, and more valuable, would be information on the optimal time of day to eat, exercise, etc, and more information on the specifics of the exercise needed (in terms of intensity or heart rate zones).

TLDR, exercise matters, so does genome, but what we really want is more specifics. This might inform better lifestyle adjustments.

varelse · on Oct 1, 2015

I'm all for optimizing the last 20%, but unless I'm mistaken, in America, ~80% of this is due to a sedentary lifestyle. And that bias arises from knowing people genetically predisposed to type 2 who have close relatives who died young from it, yet persuading them to exercise is like pulling teeth.

Which is to say while I think you could build a billion dollar company out of that last 20% (and you can correct me on the percentages here, really, please do so), it would have to not only provide such information, but also present it in such a way that the recipient got up and did something with it. And that's the hard part, no?

subcosmos · on Oct 1, 2015

As lifespan in the modern age continues to increase, it is interesting to dig into the leading causes of death for perspectives on where to put our research efforts.

Made using the dc.js library. It's interactive! You can click on any of the plots to refilter the data.

adrianN · on Oct 1, 2015

I think it's worthwhile not to put all our efforts into curing a particular cause of death. First, because this doesn't necessarily increase quality of life for the elderly, which I think is hugely important. Second, because after a certain age all kinds of things break down, and the current leading cause of death (e.g. cancer, heart disease, Alzheimer's) is likely just an effect of general accumulation of damage in the body.

subcosmos · on Oct 1, 2015

Agreed completely. My deepest passion is aging biology. What I learned in studying type II diabetes and obesity, however, is that the molecular mechanisms are hugely aligned. It's all the same core metabolic genes that the aging biologists focus on in worms and flies. Those are the genes that seem to be related to insulin signaling and diabetes.

My core vision of this project is to better understand this overlap.

deckar01 · on Oct 1, 2015

dc.js seems to support mobile devices[0], why doesn't this visualization?

[0] https://dc-js.github.io/dc.js/

subcosmos · on Oct 1, 2015

I have indeed been chasing an obscure bug. It's not within dc.js itself, but the underlying D3 library.

Sorry mobile users! The best I've been able to do is show a screenshot for many mobile browsers. Even so, the draw performance is terrible since there is so much data being loaded. While smaller dc visualizations can work on a mobile device, it's just too dang slow to show tens of thousands of records. A laptop, however, has no problem.

deckar01 · on Oct 1, 2015

Is this visualization's source code hosted anywhere?

subcosmos · on Oct 1, 2015

I will post it eventually. Need to clean things up first.

On the frontend, there really isn't anything new aside from what's already been featured in the many examples for dc.js: https://dc-js.github.io/dc.js/

What people might benefit from are the Python scripts I've used to process the CDC data so that it is nicely compressed.
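
Until the real scripts are posted, here is only a guess at the general shape, not the actual pipeline: collapse one-row-per-death records into one row per distinct combination plus a count. The field names and records are hypothetical:

```python
import json
from collections import Counter

# Hypothetical column names; the real CDC extract will differ.
FIELDS = ("age_group", "sex", "cause")

def aggregate(records, fields=FIELDS):
    """Collapse per-death records into [field values..., count] rows,
    one row per distinct combination of field values."""
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return [list(key) + [n] for key, n in sorted(counts.items())]

# Made-up records for illustration.
records = [
    {"age_group": "85+", "sex": "F", "cause": "heart disease"},
    {"age_group": "85+", "sex": "F", "cause": "heart disease"},
    {"age_group": "45-54", "sex": "M", "cause": "cancer"},
]

compact = aggregate(records)
# Compact separators shave a little more off the payload sent to the browser.
payload = json.dumps(compact, separators=(",", ":"))
print(payload)
```

The frontend then has to sum the count column instead of counting rows, but the number of rows shipped is bounded by the number of distinct combinations rather than the number of deaths.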

kbenson · on Oct 1, 2015

The age of death graph cuts off the values of the y axis on the left, so all I can see is a bunch of "00,000" values. I can only really make relative assessments to the age groups. :/

subcosmos · on Oct 1, 2015

I'm tempted just to turn the axis off and make a box showing total counts. Getting things to line up is always a pain. Yuck!

Thanks for reporting the bug.

varelse · on Oct 1, 2015

Not to sound snarky, but uploading my fitbit data IMO just solved the mystery of obesity (at least for me).

I burn ~3200 calories a day and I walk >10 miles a day. This is the 100th percentile of your data and that floors me.

So I have to say that step 1 is getting people to get off their asses and move. It'd be nice if they cut down on red meat and sugar intake while they were at it, but small steps, no?

subcosmos · on Oct 1, 2015

Well, so this project of mine is in beta, which makes you the 22nd Fitbit user. Not the best sample size :) I'm on the higher end too. Not sure how we stand up against the 20 million other Fitbit users out there.

varelse · on Oct 1, 2015

Cool, one thing though, you're not getting my genome. That's like giving you my fingerprints and we've just barely met. My fitbit data has probably already outed me to you given what I said previously. I'm OK with that because there's nothing but good news there. My genome? Well, it's a mixed bag like anyone else's hence the 20K+ steps per day as an ongoing service patch.

I do wish there were a way to anonymously calculate the very comparative statistics you're generating, but alas I don't see one. Am I missing something?

subcosmos · on Oct 1, 2015

No problemo!

Really what we need is some kind of homomorphic encryption process that would enable large-scale genome-wide association studies (GWAS) to be performed without actually divulging the underlying raw data. Until that happens though, scientists like myself have to contend with the privacy concerns of many.
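
To make the idea concrete, here is a toy sketch of additively homomorphic encryption using the textbook Paillier scheme in Python (3.8+ for the modular inverse via `pow`). The primes are tiny and completely insecure, the genotype values are made up, and a real GWAS needs far more than encrypted sums, so treat this as an illustration of the principle only:

```python
import math
import secrets

def keygen(p, q):
    """Build a Paillier key pair from two primes (toy sizes here)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of plaintexts."""
    return (c1 * c2) % (n * n)

n, priv = keygen(1009, 1013)  # toy primes, NOT secure

# Hypothetical per-person risk-allele counts (0, 1, or 2) at one variant.
genotypes = [0, 1, 2, 1, 2]
ciphertexts = [encrypt(n, g) for g in genotypes]

# An untrusted server can combine ciphertexts without seeing any genotype.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

total = decrypt(n, priv, encrypted_total)
print(total)
```

Only the holder of the private key ever sees the aggregate, and nobody sees an individual genotype in the clear. The catch is that sums are the easy part; the regressions and corrections a real association study needs are much harder to do this way.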