I can have multiple copies of the same document in the system. My notes are associated with documents via md5 hash, so each note gets linked to every copy that is present. At some point I'll get the script to automatically hardlink or symlink duplicates. To be honest though, trying to decide which location best fits a document is quite a good way to engage with the document content -- so even though the decisions are often suboptimal, the final hierarchy isn't really the product here; the product is what goes on in my brain while I'm trying to organize the papers. As far as the NLP libs question is concerned -- so far this is just an excuse for me to play with spaCy. I'm still at quite an early stage and haven't made much progress (my background is more machine vision than NLP, which I haven't touched since I was an undergrad 20 years ago).
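For anyone curious what the md5 part might look like, here is a minimal sketch in Python. It is not the actual script: the function names (`md5_of`, `group_by_md5`, `duplicates`, `replace_with_hardlink`) are made up for illustration, and it assumes all copies live under one root directory on the same filesystem, which hardlinks require.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_md5(root: Path) -> dict[str, list[Path]]:
    """Map each md5 digest to every regular file under root with that content."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; only real files are hashed.
        if path.is_file() and not path.is_symlink():
            groups[md5_of(path)].append(path)
    return dict(groups)


def duplicates(groups: dict[str, list[Path]]) -> dict[str, list[Path]]:
    """Keep only the digests that are shared by more than one file."""
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def replace_with_hardlink(keep: Path, dup: Path) -> None:
    """Replace dup with a hardlink to keep (hypothetical helper).

    The link is created under a temporary name first and then moved over
    dup, so dup is never missing if the link step fails.
    """
    tmp = dup.with_name(dup.name + ".tmp-link")
    os.link(keep, tmp)
    os.replace(tmp, dup)
```

A note keyed by digest can then be attached to every path in `group_by_md5(root)[digest]`, and `replace_with_hardlink` would only be called on paths from the same `duplicates` group. Matching md5 digests are treated as "same document" here, which is fine for a personal paper collection but is not a guarantee against deliberately crafted collisions.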
Very interesting, thanks for your answer. Good point too about using the rigidity as a forcing function for yourself.
I have quite a big system myself, using BibDesk as my interface to the filesystem, and searchability would be very nice indeed. At the moment I only use default system tools like Spotlight (macOS) or mdfind. As for a more custom NLP solution, your post inspires me to think harder about that.