Hacker News

I worked a bit in computational chemistry (small molecule drug discovery) and currently do some early-stage biotech angel investing.

Chemistry is very much on the edge of what is possible with ML / AI because it requires training on hard data, first-principles QM/physics simulation, and finally actual new science that has to be tested whenever the edge of the data is reached.

Modern computational chemistry marries these techniques, effectively operating a search tree from least to most expensive. Picture a huge multi-dimensional game of Minesweeper where the board is the entire chemical space for the problem. And to boot, every step is a huge pain for its own reasons:

- Data is limited, given the (obvious) huge possibilities of chemical space

- Structure data such as the PDB still consists of one-off captures (X-ray crystallography and cryo-EM), and often don't even capture the molecule pose as it would appear in biology.

- Data is heavily siloed. Data is a big reason biotechs buy other biotechs.

- All your math and chemistry models may say that a sulfur will behave the way you expect at a given position, but legitimate, publishable new science happens a lot in the practice of discovery. Like a lot.
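The "least to most expensive" search above can be pictured as a screening funnel. Here is a minimal sketch, not anyone's production pipeline: the `Stage` class, `run_funnel`, the stage names, the costs, and the toy scorers are all hypothetical, made up purely to show the shape of the idea (score everything with the cheap method, pass only the top slice to the next, pricier method).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    cost_per_candidate: float      # arbitrary units (hypothetical)
    score: Callable[[str], float]  # higher is better; placeholder scorer
    keep_fraction: float           # share of candidates passed to the next stage


def run_funnel(candidates: Sequence[str], stages: List[Stage]) -> Tuple[List[str], float]:
    """Run stages cheapest-first, keeping only the top scorers after each one."""
    survivors = list(candidates)
    total_cost = 0.0
    for stage in sorted(stages, key=lambda s: s.cost_per_candidate):
        # Every surviving candidate pays this stage's cost.
        total_cost += stage.cost_per_candidate * len(survivors)
        ranked = sorted(survivors, key=stage.score, reverse=True)
        keep = max(1, int(len(ranked) * stage.keep_fraction))
        survivors = ranked[:keep]
    return survivors, total_cost


def idx(mol: str) -> int:
    """Toy helper: pull the integer out of a fake ID such as 'mol42'."""
    return int(mol[3:])


# Toy scorers standing in for an ML filter, docking, and a QM calculation.
stages = [
    Stage("qm_simulation", 100.0, lambda m: -abs(idx(m) - 522), 0.2),
    Stage("ml_filter", 0.001, lambda m: -abs(idx(m) - 500), 0.1),
    Stage("docking", 1.0, lambda m: -abs(idx(m) - 520), 0.1),
]

library = [f"mol{i}" for i in range(1000)]
survivors, total_cost = run_funnel(library, stages)
print(survivors, total_cost)
```

The point of the ordering is cost: in this toy setup the expensive stage only ever sees 10 of the 1000 candidates. The catch, as the bullets above say, is that each cheap stage can throw away something the expensive stage (or a wet lab) would have liked.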

For those of you interested, I would check out the work PyMOL, Schrödinger, Chemical Computing Group, and others put out when they have to problem-solve for a specific use case. You'd be surprised how much of it mirrors traditional software development when using AI (P/M fit, knowing your user, operational costs, etc). It's just that getting to the actual product is 10x more expensive, and sometimes you stumble on something genuinely undiscovered.



> and often don't even capture the molecule pose as it would appear in biology.

Mapping between ligands in the PDB and cognate ligands as annotated in UniProt is improving :) My UniProt curator colleagues are working hard on this, though a lot was made possible by re-annotating all cognate ligands with ChEBI.


Thank you for your work :). It's been a long time since I used the PDB directly but I remember being frustrated about how essential and sparse it could be.

I'm curious about your toolchain. Is it just a community going through and manually annotating, or do you have something that helps pick out obvious things that can be fixed using something computationally predictive? If you have links on the UniProt website I can also just read those. Thanks!


I just wanted to say that I use UniProt nearly every day and it's an absolutely invaluable resource. Didn't expect to find anyone from UniProt here.


I'm working on a startup in a related field and would love to get your input/advice. Your HN profile doesn't have any contact method listed. How can I best reach out to you?


>Structure data such as the PDB still consists of one-off captures (X-ray crystallography and cryo-EM), and often don't even capture the molecule pose as it would appear in biology.

Cryo-EM structures frequently capture dozens or more discrete states of a complex.


It may do so but there is still a very open question about whether the structure models represent biologically relevant information.


Not really. Unlike crystallography, the molecules are well solvated and flash-frozen. Thermodynamically, the range of possible conformations, once folded, is only so big, and structures can often be validated with some other technique. There really haven't been many, if any, structures proven to be irrelevant.


"Thermodynamically, the range of possible conformations, once folded, is only so big," <- this is not even remotely true, especially in the context of actual biology. Many proteins undergo constant small transitions between nearby substates; this is known to be important, and it doesn't occur in flash-frozen proteins (or is greatly reduced). And there are much, much larger conformational changes that can be unlocked under specific conditions, which are absolutely not going to happen once frozen. There's no real guarantee your sample population of frozen proteins is going to include the full biologically relevant set of conformations.

Like I said, this is an ongoing problem. Here is a recent paper addressing it: https://www.nature.com/articles/s42256-020-00290-y and another: https://www.nature.com/articles/s41592-020-0925-6

(my background was in structural biology and I worked next to folks who helped Wah Chiu roughly 20 years ago, but my experience in protein dynamics is fairly broad)


Ultimately, I think we're talking about different things. From my perspective, you're making a very different statement here than the one you made above. The earlier comment implied that the conformational dynamics seen in cryo-EM structures may not be biologically relevant. That is the claim I contested, and it is very different from the one you make in this comment (which I mostly agree with), which is that EM only captures a small number of biologically relevant states.

>"Thermodynamically, the range of possible conformations, once folded, is only so big," <- this is not even remotely true, especially in the context of actual biology.

There's a bit of a coastline paradox at work here. The range is quite small for the folded structure, vs the total possible sampling space. It seems now that you're talking about things at a much finer resolution (which is fine), but that didn't seem relevant to your initial suggestion that there's debate around whether EM models are biologically relevant.


Full disclosure: I am not an active researcher and haven't been in a wet lab in a while.

Not disagreeing with your statement. IMO Cryo-EM is a huge step above crystallography and captures way more biologically accurate structures on top of being way easier to do.

To get a bit more nuanced: I think there's still a significant gap between the captured structure and biological action, and often people believe that the structure is the be-all and end-all to these conclusions. A simple example would be the location of water molecules and how the hydrogen bonds interact with protein active sites. In many cases it's not hard to impute, but it can be tricky and often requires outside techniques.

I think the long-term solution would be to directly capture structure-level precision in motion, similar to looking at a slide from a mouse model or something. AFAIK we're not there yet, even though we can get pretty close by stitching together captured structures with predictive chemistry models.


We should probably use all available techniques to inform ourselves. The reason structural biology is often in the limelight is that it's a prerequisite to a number of other techniques. But methods like fluorescence polarization, hydrogen-deuterium exchange, and FRET all help fill in the picture as well.



