Hacker News
NLP with spaCy (mlreference.com)
151 points by tylerneylon on March 22, 2018 | 6 comments


This site is super cool! Love the design.

If you make a pull request adding your examples as a test file, I'll be sure to notify you if we break anything. The trickiest part will be the outputs from the statistical models: if we release new models, the examples might give different outputs even though the API hasn't changed.

A quick clarification as well:

> As implied in the spacy docs for the Token class, the is_alpha, is_digit, is_upper, is_lower, is_title, and is_space attributes all delegate their operations to Python's built-in str methods with similar names (such as str.isalpha()).

This is generally true, but the values are computed once per word type and cached in the vocabulary. I'm just worried this could confuse someone.
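To make "cached per word type" concrete, here is a toy sketch in plain Python. `ToyVocab` and its `flags` method are hypothetical names, and this is not how spaCy implements it internally: it only shows the idea that the `str` methods run once per distinct string, and every later token with that same string reuses the stored result.

```python
class ToyVocab:
    """Hypothetical stand-in for a vocabulary that caches lexical flags per word type.

    Not spaCy's implementation: it only illustrates that flags are computed
    once per distinct string and then reused for every token of that string.
    """

    def __init__(self):
        self._flags = {}        # word string -> dict of cached flags
        self.computations = 0   # how many times the str methods actually ran

    def flags(self, word):
        if word not in self._flags:
            self.computations += 1
            self._flags[word] = {
                "is_alpha": word.isalpha(),
                "is_digit": word.isdigit(),
                "is_upper": word.isupper(),
                "is_lower": word.islower(),
                "is_title": word.istitle(),
                "is_space": word.isspace(),
            }
        return self._flags[word]


vocab = ToyVocab()
tokens = "The cat saw the cat".split()
token_flags = [vocab.flags(t) for t in tokens]
print(vocab.computations)  # 4: "The", "cat", "saw", "the" (case matters)
```

Five tokens but only four word types, so the flags for the second "cat" are the same cached object as for the first.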


Thanks! Clearly we are fans of spaCy. I appreciate the offer of a pull request and would like to take you up on it. Also, thanks for the clarification about the is_x attributes; I'll update the pages to make that more accurate and clear!


These look like nice code examples, but it's worth pointing out that spaCy's website has really well-written documentation with plenty of code. For example, their page on word vectors and similarity:

https://spacy.io/usage/vectors-similarity


Yep! Completely agree that spaCy's docs are great; they've clearly had a lot of attention.

Our motivation in making this is to build up the site as a destination where you can map a conceptual ML problem to copy-and-pasteable code that works immediately and is readable and easy to learn from.

For the two of us (Mike Sall and I) creating the site, it's basically something we've often wanted in our day-to-day machine learning and data science work.


Totally get it, I hope you didn't take my comment as a criticism.


I agree spaCy's docs are great, though I still managed to miss the point that the small models don't include real word vectors, which http://mlreference.com/word-vectors-spacy notes at the very top. I wasted some time confused about why nlp("cat").vector gave a nice-looking vector while len(nlp.vocab.vectors) was 0, etc. I'm guessing I wasn't the only one :-)
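A quick sanity check can catch this before it costs time. The sketch below assumes only what the comment itself uses: `nlp.vocab.vectors` (the static word-vector table) and `Doc.vector`. The helper name `vector_report` is hypothetical. As far as I understand spaCy's behavior, when the static table is empty the small models fall back to their context-sensitive tensors for `Doc.vector`, which is why a non-empty vector can appear anyway.

```python
def vector_report(nlp, text="cat"):
    """Hedged sketch: compare the static vector table with what nlp(text).vector returns.

    `vector_report` is a hypothetical helper; it only relies on
    `nlp.vocab.vectors` (the static word-vector table) and `Doc.vector`.
    """
    n_static = len(nlp.vocab.vectors)  # number of entries in the static table
    doc_vector = nlp(text).vector      # can be non-empty even if n_static == 0
    return {
        "static_vectors": n_static,
        "doc_vector_len": len(doc_vector),
        "has_real_word_vectors": n_static > 0,
    }


# Usage with a real pipeline (requires spaCy and the model to be installed):
# import spacy
# print(vector_report(spacy.load("en_core_web_sm")))
```

If `has_real_word_vectors` is False while `doc_vector_len` is non-zero, similarity results are not coming from static word vectors, and a model package that ships vectors is probably what you want.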



