
Yes. I like that TDD - when I judiciously employ it - makes me think about my interfaces. It’s like a pre-PR that forces me to question whether the intent is clear. It makes me reference guidance from DDD and single-responsibility-ish idioms.


Especially true when the services are all stateless. If there isn’t a Conway-esque or scaling advantage to decoupling the deployment... don’t.

I had a fevered dream the other night where it turned out that the bulk of AWS’s electricity consumption was just marshalling and unmarshalling JSON, for no benefit.


I recently decided to benchmark some Azure services for... reasons.

Anyway, along this journey I discovered that it's surprisingly difficult to get an HTTPS JSON RPC call below 3 ms of latency, even on localhost! It's mind-boggling how inefficient it actually is to encode every call through a bunch of layers, stuff it into a network stream, undo that on the other end, and then repeat on the way back.

Meanwhile, if you tick the right checkboxes on the infrastructure configuration, then a binary protocol between two Azure VMs can easily achieve a latency as low as 50 microseconds.
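
For anyone who wants to poke at this locally, here's a minimal Python sketch (plain HTTP rather than HTTPS, so it understates the numbers above; the port and payload are arbitrary placeholders) that times JSON round trips against a local echo server:

    import json
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import Request, urlopen

    class EchoHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # decode the request, re-encode a reply: the marshalling round trip
            body = self.rfile.read(int(self.headers["Content-Length"]))
            reply = json.dumps({"echo": json.loads(body)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

        def log_message(self, *args):
            pass  # keep the timing output clean

    server = HTTPServer(("127.0.0.1", 8081), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    payload = json.dumps({"method": "ping", "params": [1, 2, 3]}).encode()
    samples = []
    for _ in range(200):
        start = time.perf_counter()
        req = Request("http://127.0.0.1:8081/", data=payload,
                      headers={"Content-Type": "application/json"})
        urlopen(req).read()
        samples.append(time.perf_counter() - start)

    samples.sort()
    print(f"median round trip: {samples[len(samples) // 2] * 1e3:.2f} ms")
    server.shutdown()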


A few years ago, a good friend of mine used to say that the two main duties of a financial quant library are string manipulation and memory allocation.


Exactly!


I used a propane grill and it was very, very effective. I removed the grates and (if I recall correctly) rested the pan on the heat deflection sheets. I left all burners on for about 2 1/2 hours. After cooling, the previous coating was reduced to a thin dust. After blowing it off, the pan was gunmetal grey.

I then used a cheap corded drill and inexpensive flap wheels and similar attachments from the hardware store to make it smooth, wearing an N95 mask to protect my lungs.


"Parquet files work well, but streaming is a tad more complex (you need to be able to seek to the end of the file to read the metadata before you can stream the contents)"

I didn't realize that all the metadata in Parquet was stored at the end. That is indeed unfortunate for streaming use cases. Especially sad because columnar dictionary formats can offer great compaction for some data. I've been achieving 20x+ size reductions by converting from CSV to Parquet.
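
For reference, that conversion is only a few lines with pyarrow (file names here are placeholders, and the 20x figure obviously depends on how repetitive the data is):

    import os
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    table = pv.read_csv("events.csv")  # placeholder input file
    # dictionary encoding is on by default; zstd adds general-purpose compression
    pq.write_table(table, "events.parquet", compression="zstd")

    ratio = os.path.getsize("events.csv") / os.path.getsize("events.parquet")
    print(f"size reduction: {ratio:.1f}x")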


Parquet is intended primarily as a file storage format. For streaming, I think the recommendation is to use Arrow, which is basically an in-memory Parquet. It supports putting the schema first and streaming an undefined number of rows.

https://arrow.apache.org/docs/python/ipc.html
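
A minimal sketch of what that looks like with pyarrow's IPC stream API (made-up schema and data): the schema is written once up front, then record batches follow one at a time.

    import pyarrow as pa
    import pyarrow.ipc as ipc

    schema = pa.schema([("id", pa.int64()), ("name", pa.string())])

    sink = pa.BufferOutputStream()
    with ipc.new_stream(sink, schema) as writer:  # schema goes out first
        for i in range(3):  # an open-ended number of batches can follow
            batch = pa.record_batch([[i], [f"row-{i}"]], schema=schema)
            writer.write_batch(batch)

    # A reader sees the schema immediately and consumes batches as they arrive.
    reader = ipc.open_stream(sink.getvalue())
    for batch in reader:
        print(batch.to_pydict())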


Can you think of any solutions for this other than batching? I'm working on something similar just now, and I buffer several million records in memory, then write the file to external storage.


Markup characters and namespaces aside, I think what makes JSON easier to read is that it is more explicit about whether a child object is a singleton or a list, whereas XML requires you to consult the schema for minOccurs and maxOccurs.
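
A contrived illustration (hypothetical documents): in JSON the brackets alone tell you whether "item" is a single value or a list, while the XML shape can't.

    import json
    import xml.etree.ElementTree as ET

    # JSON: the value's shape says whether "item" is one object or a list.
    one  = json.loads('{"order": {"item": {"sku": "A1"}}}')
    many = json.loads('{"order": {"item": [{"sku": "A1"}, {"sku": "B2"}]}}')

    # XML: a single <item> child could be "the item" or a one-element list;
    # only the schema's minOccurs/maxOccurs settles it.
    order = ET.fromstring("<order><item><sku>A1</sku></item></order>")
    print(len(order.findall("item")))  # 1, but scalar or repeated? The document can't say.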


https://gsuite.google.com/customers/

I don't think your comment is accurate, based on my first-hand knowledge of large companies that use gsuite, as well as their customer list in the above link.

