
CSCI 3210: Modern text encoding and processing

Learn Unicode and UTF-8.

Unlearn the 1 char = 1 byte concept
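To make that concrete, here is a minimal Python sketch (the sample characters and variable names are my own picks, nothing official) that compares the number of code points in a string with the number of bytes it takes up in UTF-8:

```python
# Each sample is a single code point, written with escapes so the source
# file's own encoding cannot change the result.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, e-acute, euro sign, emoji

for ch in samples:
    # len(str) counts code points; len(bytes) counts encoded bytes.
    print(repr(ch), len(ch), len(ch.encode("utf-8")))

code_point_counts = [len(ch) for ch in samples]
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
```

Every sample is one code point, but the UTF-8 length runs from one byte to four.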

It's not only encoding and decoding that are different; searching and sorting are too. We may also cover font rendering, Unicode modifiers and emoji. They are so common and fundamental, but very few people understand them.
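A rough illustration of the searching and sorting point, using only Python's standard `unicodedata` module. The example strings are made up, and real language-aware sorting needs a collation library, which this sketch does not attempt:

```python
import unicodedata

# The same visible word, stored two different ways.
composed = "caf\u00e9"      # e-acute as one precomposed code point
decomposed = "cafe\u0301"   # plain "e" followed by a combining acute accent

# They look identical on screen but do not compare equal.
equal_raw = composed == decomposed

# A naive substring search for the precomposed character misses the
# decomposed form until both sides are normalized (NFC here).
found_raw = "\u00e9" in decomposed
found_normalized = "\u00e9" in unicodedata.normalize("NFC", decomposed)

# Default sorting compares code points, so the accented word lands after
# "zebra" instead of near "e" where a dictionary would put it.
words = ["zebra", "\u00e9clair", "apple"]
sorted_by_code_point = sorted(words)
```

Normalizing before comparing fixes the search case; the sort order needs locale-aware collation rules, which are a separate problem.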



Handling text is a minefield. UTF-8 is great, but when you get into graphemes, there's basically no way to handle them properly unless you write some code to generate grapheme recognition based on the spec, which is rather large and continuously updated.
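As a sketch of why the shortcut doesn't work: the hypothetical `naive_clusters` helper below just glues combining marks onto the preceding character. It is deliberately not the Unicode segmentation algorithm, only an illustration of where a hand-rolled rule breaks:

```python
import unicodedata

def naive_clusters(text):
    """Attach combining marks to the previous character.

    Hypothetical helper for illustration only; this is NOT the grapheme
    cluster algorithm from the Unicode spec.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "e" plus a combining acute accent: the naive rule gets this one right.
accented = "e\u0301"
accented_clusters = naive_clusters(accented)

# Man + ZWJ + woman + ZWJ + girl: renderers that support the sequence
# show a single family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
family_clusters = naive_clusters(family)

family_code_points = len(family)
family_utf8_bytes = len(family.encode("utf-8"))
```

The accent case comes out as one cluster, but the family emoji is split into five pieces, because the zero-width joiner is not a combining mark. Covering cases like that is what the spec-derived tables are for.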

Same for font rendering; there is a reason why HarfBuzz is used everywhere. Getting an 80% working renderer is easy, but the remaining 20% can take years.

Really, "handling text correctly" should be a master's degree, and I'd sign up in a heartbeat.



