Nice work. It's an interesting problem but I don't think it's solved yet.
In the video, and from my own experience, I see a bunch of dead users spawning from a single point, making a 'pillar of death', and a few moving users that skip around, but none moving in any kind of fluid motion.
Perhaps have multiple spawns, offset by a small random amount.
Not sure how your system works, but perhaps put each area in a separate channel, so users only subscribe to updates from the area they're in?
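Roughly what I'm picturing, as a sketch with Node's 'ws' module (I don't know your actual server code, so the spawn points, cell size, and message format here are all made up):

    const WebSocket = require('ws');
    const wss = new WebSocket.Server({ port: 8080 });

    const SPAWNS = [{ x: 100, y: 100 }, { x: 400, y: 250 }]; // several spawn points instead of one
    const AREA = 256;        // the map is cut into AREA x AREA cells ("channels")
    const areas = new Map(); // areaKey -> Set of sockets subscribed to that cell
    let nextId = 1;

    function areaKey(pos) {
      return Math.floor(pos.x / AREA) + ',' + Math.floor(pos.y / AREA);
    }

    function subscribe(ws, key) {
      if (!areas.has(key)) areas.set(key, new Set());
      areas.get(key).add(ws);
      ws.areaKey = key;
    }

    // send an update only to sockets subscribed to the same cell
    function publish(key, msg) {
      const subs = areas.get(key);
      if (!subs) return;
      const json = JSON.stringify(msg);
      for (const client of subs) {
        if (client.readyState === WebSocket.OPEN) client.send(json);
      }
    }

    wss.on('connection', (ws) => {
      ws.id = nextId++;
      // pick a spawn point and jitter it so newcomers don't stack into a pillar
      const base = SPAWNS[Math.floor(Math.random() * SPAWNS.length)];
      ws.pos = { x: base.x + (Math.random() - 0.5) * 40,
                 y: base.y + (Math.random() - 0.5) * 40 };
      subscribe(ws, areaKey(ws.pos));
      ws.on('message', (data) => {
        ws.pos = JSON.parse(data);  // assume the client sends its position as JSON
        // (re-subscribing when a player crosses into a new cell is left out here)
        publish(ws.areaKey, { id: ws.id, pos: ws.pos });
      });
      ws.on('close', () => areas.get(ws.areaKey).delete(ws));
    });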
I'd rather see a few users in my area that can interact with me quickly than a flood of non-responsive ones.
JTxt: We had a similar problem (but for different reasons) with the original (1110.n01se.net) when the initial rush from HN happened. After a few minutes we moved the server to an AWS t1.micro instance (our previous hosting company didn't like CPU-intensive processes and was killing the server).
However, one of the problems with a t1.micro on AWS is that the hypervisor will detect heavy load and start throttling the VM using a heavy-handed approach that basically just pauses the VM for several seconds. This caused buffers to build up, and when the VM started running again you would get a burst of traffic.
The symptom was that you would see other avatars stop moving for a few seconds, then suddenly jump around, and then be back to smooth. All clients would still receive all the updates (WebSockets is reliable over TCP), but they would come in bursts whenever the VM was throttled. A few minutes after moving to the AWS server we realized what was happening (we use AWS for development and recognized the behavior), so we implemented a few easy fixes on the server (such as sending change deltas and serializing the data to JSON once instead of once per client) to bring CPU usage down to a level where it wouldn't get throttled. Combined with the 20-connection cap, that resulted in a very smooth, low-latency experience for the players who were able to connect.
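In case it helps anyone else, the "serialize once" fix is roughly this shape (a sketch with Node's 'ws' module, not our actual code; the tick rate and names are made up):

    const WebSocket = require('ws');
    const wss = new WebSocket.Server({ port: 8080 });

    const players = {};      // id -> last known state ({ x, y, ... })
    const dirty = new Set(); // ids that changed since the last broadcast

    // call this whenever a client reports a move
    function playerMoved(id, state) {
      players[id] = state;
      dirty.add(id);
    }

    // every 100ms: build the delta once, stringify once, reuse it for every client
    setInterval(() => {
      if (dirty.size === 0) return;            // nothing changed, send nothing
      const delta = {};
      for (const id of dirty) delta[id] = players[id];
      dirty.clear();
      const payload = JSON.stringify(delta);   // one stringify per tick, not one per client
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) client.send(payload);
      }
    }, 100);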
One thing we are considering for the original is what you suggested: players only get updates for things that are visible to them. However, that means more processing on the server, because each client gets a different data set that has to be generated separately. So it's a tradeoff between decreasing bandwidth and increasing CPU (surprisingly often the case in the real world). If the CPU increase causes throttling, we would start seeing burstiness again, so it might be worth it or it might not.
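Continuing the sketch above, the per-client filtering would look something like this, which is where the extra CPU comes from (the view radius is made up, and each socket is assumed to carry a pos set when the player joins):

    const VIEW_RADIUS = 500;  // made-up visibility radius in world units

    function visibleTo(viewer, other) {
      const dx = viewer.x - other.x, dy = viewer.y - other.y;
      return dx * dx + dy * dy <= VIEW_RADIUS * VIEW_RADIUS;
    }

    // Bandwidth goes down, but note the cost: one filter pass and one
    // JSON.stringify per client per tick instead of one shared payload.
    function broadcastVisible(wss, players) {
      for (const client of wss.clients) {
        if (client.readyState !== WebSocket.OPEN || !client.pos) continue;
        const subset = {};
        for (const [id, p] of Object.entries(players)) {
          if (visibleTo(client.pos, p)) subset[id] = p;
        }
        client.send(JSON.stringify(subset));  // per-client serialization = more CPU
      }
    }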
Anyway, if we have some free time we may play with some ideas. One of the first things is being able to simulate load on demand. Doing a mad dash to implement improvements while thousands of HN users are trying to connect (we didn't know that a friend had posted it there) is certainly an adrenaline rush, but not the ideal way to do development :-)
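For the load simulation, something as simple as this would probably do: a headless script that opens N fake players against a test instance and spams random moves (the message format is a guess; swap in whatever the real client sends):

    // load-test.js: open N fake players and send a random move every 100ms
    const WebSocket = require('ws');
    const N = 500;                          // how many simulated players
    const URL = 'ws://localhost:8080';      // point at a test instance, not production

    for (let i = 0; i < N; i++) {
      const ws = new WebSocket(URL);
      ws.on('open', () => {
        setInterval(() => {
          ws.send(JSON.stringify({ id: i, x: Math.random() * 1000, y: Math.random() * 1000 }));
        }, 100);
      });
      ws.on('error', () => {});             // ignore refusals when the connection cap kicks in
    }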
Another solution would be to use a larger AWS instance. Right now on the t1.micro with 20 users the CPU stays around 1-5% (which keeps us in the safe zone as far as throttling goes). Larger instances don't have the throttling problem, so in addition to more horsepower we would be able to use a much higher percentage of it. However, it's just a spare-time project for us, and I'm still trying to figure out how to explain to my wife why we are paying (mostly for bandwidth) for other people to play an online game. :-)
And surprisingly, AWS doesn't seem to have a way to accept donations or gift cards toward AWS costs. Seems like a logical way to encourage free and open source projects like this.
Thanks for getting this going and sharing your experience.
It's really fun when it's interactive.
I'd like to hope that having thousands of users on the same map, with fast updates and without it being a huge burden on any one person, is possible.
Perhaps run a master instance at 1110.n01se.net
Invite others to run a slave instance that is configured to connect to your master, report health, and serve areas as directed.
The master serves the client and assigns it a spawn point, pointing new clients to other interesting locations when load is high.
It also manages the list of slaves and the areas they cover, and pushes updates to clients as that list changes.
(But there's a potential bottleneck here. Not sure how to do this yet...
Also, how do you subdivide the map so high-traffic areas get smaller cells? I'm guessing quadtrees, so the subdivision data can stay very small; rough sketch below.
How well can clients be connected to multiple servers?
And is a master/slave relationship needed?)
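Roughly what I mean by the quadtree (just a sketch; the split threshold, minimum cell size, and how leaves would get handed to slave servers are all made up):

    // Split a cell when it holds too many players, so busy areas end up with
    // smaller cells -- each leaf could then be handed to a different slave server.
    const MAX_PER_NODE = 50;
    const MIN_SIZE = 64;

    function makeNode(x, y, size) {
      return { x, y, size, players: [], children: null, server: null };
    }

    function childFor(node, p) {
      const half = node.size / 2;
      const i = (p.x >= node.x + half ? 1 : 0) + (p.y >= node.y + half ? 2 : 0);
      return node.children[i];
    }

    function insert(node, player) {
      if (node.children) return insert(childFor(node, player), player);
      node.players.push(player);
      if (node.players.length > MAX_PER_NODE && node.size > MIN_SIZE) {
        const half = node.size / 2;
        node.children = [
          makeNode(node.x, node.y, half),        makeNode(node.x + half, node.y, half),
          makeNode(node.x, node.y + half, half), makeNode(node.x + half, node.y + half, half),
        ];
        const moving = node.players;
        node.players = [];
        for (const p of moving) insert(childFor(node, p), p);
      }
    }

The whole tree serializes to a few bytes per node, so the master could push the current subdivision (plus which slave owns each leaf) to clients cheaply.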
Well, I'll leave it there. It's an interesting problem, and I think it's part of what node.js is trying to become.