More

shousper · on May 1, 2014

lol, try living in Australia.

seanmcdirmid · on May 2, 2014

Why can't we just create an underwater pneumatic pipeline for shipping easily to Australia?

shousper · on Jan 15, 2014

Who cares? If I'm on a train heading home, I don't want my conversation terminated because I get too far away. Just link the rooms to one or more relevant locations so people can find them initially, then let people favourite them or whatever so they can rejoin later without having to be in that location.

Yeah?

EDIT: typo.

shousper · on Jan 12, 2014

Anything is possible.

shousper · on Jan 1, 2014

I wonder how hard it'd be to get JavaScript/ECMAScript onto a better encoding.. Do we actually have a "better" encoding?

Scorponok · on Jan 1, 2014

Depends what you mean by "better". UTF-8 generally ends up using fewer bytes to represent the same string than UTF-16, unless you're using certain characters a lot (e.g. for asian languages), so it's a candidate, but it's not like you could just flip a switch and make all javascript use UTF-8.

oofabz · on Jan 1, 2014

I think the size issue is a red herring. UTF-8 wins some, UTF-16 wins others, but either encoding is acceptable. There is no clear winner here so we should look at other properties.

UTF-8 is more reliable, because mishandling variable-length characters is more obvious. In UTF-16 it's easy to write something that works with the BMP and call it good enough. Even worse, you may not even know it fails above the BMP, because those characters are so rare you might never test with them. But in UTF-8, if you screw up multi-byte characters, any non-ASCII character will trigger the bug, and you will fix your code more quickly.

Also, UTF-8 does not suffer from endianness issues like UTF-16 does. Few people use the BOM and no one likes it. And most importantly, UTF-8 is compatible with ASCII.

lucian1900 · on Jan 2, 2014

There is absolutely no situation in which UTF-16 wins over UTF-8, because of the surrogate pairs required. That makes both encodings variable length.

UTF-32 is probably what you're thinking of.

oofabz · on Jan 2, 2014

I know that both encodings are variable-length. That is the issue I am trying to address.

My point is that in UTF-16 it's too easy to ignore surrogate pairs. Lots of UTF-16 software fails to handle variable-length characters because they are so rare. But in UTF-8 you can't ignore multi-byte characters without obvious bugs. These bugs are noticed and fixed more quickly than UTF-16 surrogate pair bugs. This makes UTF-8 more reliable.

I am not sure why you think I am advocating UTF-16. I said almost nothing good about it.

millstone · on Jan 2, 2014

Bugs in UTF-8 handling of multibyte sequences need not be obvious. Google "CAPEC-80."

UTF-16 has an advantage in that there's fewer failure modes, and fewer ways for a string to be invalid.

edit: As for surrogate pairs, this is an issue, but I think it's overstated. A naïve program may accidentally split a UTF-16 surrogate pair, but that same program is just as liable to accidentally split a decomposed character sequence in UTF-8. You have to deal with those issues regardless of encoding.

lmm · on Jan 2, 2014

> A naïve program may accidentally split a UTF-16 surrogate pair, but that same program is just as liable to accidentally split a decomposed character sequence in UTF-8. You have to deal with those issues regardless of encoding.

The point is that using UTF-8 makes these issues more obvious. Most programmers these days think to test with non-ascii characters. Fewer think to test with astral characters.

Someone · on Jan 2, 2014

Anything in the range U+0800 to U+FFFF takes three bytes per character in UTF-8 and two in UTF-16 (http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings...:

"Therefore if there are more characters in the range U+0000 to U+007F than there are in the range U+0800 to U+FFFF then UTF-8 is more efficient, while if there are fewer then UTF-16 is more efficient. "

That same page also states: "A surprising result is that real-world documents written in languages that use characters only in the high range are still often shorter in UTF-8, due to the extensive use of spaces, digits, newlines, html markup, and embedded English words", but I think the "citation needed]" is added rightfully there (it may be close in many texts, though)

userbinator · on Jan 2, 2014

UTF-8 is variable length in that it can be anywhere from 1 to 4 bytes, while UTF-16 can either be 2 or 4. That makes a UTF-16 decoder/encoder half as complex as a UTF-8 one.

lucian1900 · on Jan 2, 2014

Surrogate pairs are way more complex than anything in UTF-8.

userbinator · on Jan 2, 2014

> Even worse, you may not even know it fails above the BMP, because those characters are so rare you might never test with them.

I don't think this is too relevant because anyone who claims to know UTF-16 should know about the surrogates. And if you are handling mostly Asian text (which is where UTF-16 is more likely to be chosen), then those high characters become a lot more common.

millstone · on Jan 2, 2014

UTF-8 has its own unique issues, like non-shortest forms and invalid code units, that you are even less likely to encounter in the wild. Bugs in handling of these have enabled security exploits in the past.

shousper · on Nov 26, 2013

Seconded. I've never gone to conferences though, but I've always felt I learn more via online sources and/or experience. I've been trying more Twitter recently, but honestly I think I'm giving more than I'm getting from it..

shousper · on Nov 12, 2013

This is probably abundantly clear, but someone has to say it. You need to refactor that beast and split it out into separate files for development. Add a build process to smash it all together for you. End of story.

shousper · on Oct 22, 2013

Here I was hoping for a new, cool and exciting open source facebook clone :(

shousper · on Sept 30, 2013

So, I found this from back in 2011: http://www.tomshardware.com/news/ibm-patent-gpu-accelerated-... However, I couldn't find any commercial or even (active) open source projects on this topic. It seems like something that would be valuable to businesses working with big data, so what's the hold up? Has nobody reached this scale yet? Is it still too expensive? I don't get it.. Maybe I'm overthinking it.

sendob · on Sept 30, 2013

http://wiki.postgresql.org/wiki/PGStrom

bobbles · on Sept 30, 2013

I know folding@home was taking big advantage of CUDA enabled nvidia GPUs

shousper · on Sept 17, 2013

What are blu-rays like for the rising 4K/UHD resolution? If that takes off, could we be looking at new format requirements? Higher bandwidth? Higher capacity? What about 3D? What about beyond that, holographic?

I don't know. This is all speculation, but I would grant Gabe the benefit of the doubt here. Over the next.. 5+ years, who knows what the entertainment world might be like?

shousper · on Sept 4, 2013

Or more simply, you can't help people who don't want to be helped =)