Hacker News | iten's comments

Small correction: BLAT is a local alignment tool Jim Kent also wrote. I think his assembler you're referring to is GigAssembler (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC311095/).


I also love that all of their genomic tools are released as a VirtualBox virtual machine, and their entire site can basically be run locally with personal genomic data. I think it's called Genome Browser in a Box.

And if I remember correctly, BLAT was so useful because it could run on machines with less CPU power by loading more data into memory… or was it the other way around?


The author makes a critical mistake when analyzing the BNT/Pfizer clinical trial. Over and over, when discussing the results, they mention calculating efficacy against "infections" using these data. But this specific clinical trial provided no data about infections. The endpoint measured was symptomatic COVID-19 disease.

This kind of mistake is understandable, but doesn't really inspire much confidence in a screed about the mistakes others are making in analyzing COVID-19 data.


I recently completed my PhD in vertebrate comparative genomics so this is fun to see.

The single most important factor that needs to be accounted for in analyses like these is the correlation between phylogenetic similarity and the trait in question. In short, closely related species will tend to have similar lifespans, and closely related species will tend to have similar CpG density in any fixed genomic region. So the fact that you can predict lifespan from CpG density with enough parameters is unsurprising. You could almost certainly predict lifespan fairly well from any feature measuring phylogenetic similarity -- I would have liked to see some evidence showing that CpG density in these promoters is somehow uniquely suited for the task.
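To make that concrete (toy numbers, invented purely for illustration): you can "predict" lifespan surprisingly well from phylogenetic distance alone, with no genomic feature in sight. A minimal leave-one-out nearest-neighbor sketch:

```python
# Toy data (invented for illustration): pairwise phylogenetic distances and
# lifespans for four species. Closely related pairs have similar lifespans.
species = ["mouse", "rat", "human", "chimp"]
lifespan = {"mouse": 3, "rat": 4, "human": 80, "chimp": 50}
dist = {
    ("mouse", "rat"): 0.1, ("mouse", "human"): 0.9, ("mouse", "chimp"): 0.9,
    ("rat", "human"): 0.9, ("rat", "chimp"): 0.9, ("human", "chimp"): 0.2,
}

def d(a, b):
    """Symmetric lookup into the pairwise distance table."""
    return 0.0 if a == b else dist.get((a, b), dist.get((b, a)))

def predict(sp):
    """Leave-one-out: predict lifespan as that of the phylogenetically
    nearest *other* species -- no genomic feature used at all."""
    nearest = min((s for s in species if s != sp), key=lambda s: d(sp, s))
    return lifespan[nearest]

for sp in species:
    print(sp, predict(sp))
```

With any real phylogenetic correlation in the trait, this baseline already lands in the right ballpark, which is why a CpG-density predictor needs to beat a phylogeny-only model before it means much.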


They did mention they were looking at a couple of long-lived examples, like rockfish that live 200 years vs. killifish that live for about a year, so this seems to be variable among fish at least. As for mechanism, they were pretty hand-wavy, but they cited reference 33 and mentioned that longer CpG tracts are thought to be protective against spontaneous methylation. Unmethylated CpG islands in promoters are associated with active transcription, especially in housekeeping genes, which by definition are transcribed constantly. Methylating these islands can silence the gene. Makes sense, but it needs an experiment to know for sure.


I wonder whether they also looked at body size. Larger species tend to live longer, so it could be that these regions are simply associated with growth.


They did not.


We use KT's in-memory database in a scientific computing application I work on (to store a graph 30-100GB in size accessed by 100-1000 workers). The performance is very impressive, and it's been reliable for our use case. But I don't think I'd recommend anyone use it in 2019 -- having an active community (and someone to actively maintain the software!) is too important. A small performance gain over alternatives like Redis is probably not worth the tradeoff of using software that is (sadly) abandoned.

edit: That's not to say it is really in need of much development -- it's pretty feature-complete. But it's undergone a bit of software rot: the Debian package, for example, ships header files which fail to compile under many recent gcc versions. And the network effect is just not there. If you run into some database slowness, searching the Web for "Redis performance problem" might get you some ideas. Searching for "kyototycoon performance problem" will get you nowhere.


Seems like it should be put up on GitHub where people can casually fix issues as needed.


The source is available, why does it need to be on GitHub specifically?

But hey, why not, done deal:

https://github.com/cloudflare/kyotocabinet


To be fair, to my knowledge, none of LASTZ, BLAST, or BLAT treats IUPAC ambiguous bases (K, Y, R, etc.) the way the OP was looking for. That's not to blame the tools, since they have a very good reason not to: they build an index on the target first, and treating ambiguous bases properly would increase the size of the index.

That said, I wonder if grep wouldn't be much faster, since this program is only looking for exact matches, which are easily transformed into a regex by replacing the ambiguous nucleotides with something like (A|T|C).
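Something like this sketch (the IUPAC table is the standard one; Python's `re` does the matching):

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def iupac_to_regex(query: str) -> str:
    """Turn an ambiguous nucleotide query into a plain regex."""
    return "".join(IUPAC[base] for base in query.upper())

pattern = re.compile(iupac_to_regex("GAKTC"))  # K = G or T -> "GA[GT]TC"
print(bool(pattern.search("CCGAGTCAA")))  # True: GAGTC matches GAKTC
```

Feed the resulting pattern to `grep -E` (or just scan in Python) and you get exact ambiguous-base matching with no index at all.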


What proportion of bases in your query were ambiguous? If it was a fairly low percentage (~10-20%?), I think you could probably get away with using LASTZ, treating all ambiguous bases as completely ambiguous (--ambiguous=iupac), then write a short script to go through the alignments afterward and filter out ones that don't have a reference base that matches the IUPAC character in the query.
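That post-filter really is only a few lines -- e.g. this sketch in Python, where `aligned_query`/`aligned_ref` are the gapped alignment strings you'd parse out of whatever LASTZ output format you chose:

```python
# Standard IUPAC codes -> the set of concrete bases each one permits.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def alignment_ok(aligned_query: str, aligned_ref: str) -> bool:
    """Keep an alignment only if every reference base is one that the
    corresponding IUPAC character in the query actually allows."""
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == "-" or r == "-":  # skip gap columns
            continue
        if r not in IUPAC_SETS.get(q, ""):
            return False
    return True

print(alignment_ok("GAKTC", "GAGTC"))  # True: K allows G
print(alignment_ok("GAKTC", "GACTC"))  # False: K does not allow C
```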

But that's the lazy approach :) Good article.


Thanks. :)

Sometimes reinventing a wheel (or at the very least a spoke or two) can be fun and provide some neat results.

