alfozan's comments

alfozan · on May 15, 2024

I've been using the Awair air quality monitor (https://getawair.com) for over 5 years and I'm really happy with it. I've built a simple dashboard to visualize the readings: https://github.com/alfozan/Air-Quality-Dashboard

alfozan · on Jan 30, 2024

Checkout Magic Prompts https://magicprompts.lyzr.ai/

alfozan · on May 15, 2020

This is a cool concept, though it appears to only care about the total number of people being below a set threshold regardless how people interact (i.e. maintain social distancing)

Shameless plug: two friends and I put together a small app that uses security cameras feed to calculate real-time density + estimate "safe" space capacity:

Demo video: https://www.youtube.com/watch?v=RTgxhptePxM Demo site: https://flou.fyi/ (takes a bit to load)

alfozan · on Dec 9, 2019

Who still uses Hadoop anyway?

https://spark.apache.org/ https://www.iguazio.com/data-science-post-hadoop/

buzzkillington · on Dec 9, 2019

Spark is worse because you need Scala as well as regular Java.

I've tried building it for my day job, I would rather have a colonoscopy without sedation.

It's more pleasant and dignified.

itg · on Dec 9, 2019

And then add PySpark on top of that. Couldn't leave my last job fast enough when they decided to use Hadoop/PySpark when the largest incoming files we received were at most a few GBs.

boxy310 · on Dec 9, 2019

I once had a consulting gig where the customer desperately wanted to build a Spark/Scala ML pipeline, for a dataset that was 10 MB. We spent 3 months hammering it together for a flat Python process that would've taken us 2 weeks.

snaky · on Dec 9, 2019

> This find xargs mawk pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation.

https://adamdrake.com/command-line-tools-can-be-235x-faster-...

buzzkillington · on Dec 9, 2019

If you'd sent it off to mechanical Turk it would have been done in an afternoon.

marcinzm · on Dec 9, 2019

I build it in a container for work and didn't find it that difficult to be honest. And Google has plenty of example Dockerfiles that show the steps needed.

The only real system dependencies are Java8, maven and texlive (and Python/R if you build for that). Then it's `make-distribution.sh` with the appropriate flags. Scala and everything else that is needed is downloaded by maven. The resulting directory is self-contained assuming you have java8 runtime on your target machine.

dcolkitt · on Dec 9, 2019

FWIW, the below linked Dockerfile will download, build and install Spark in a single step.

https://gist.github.com/Mister-Meeseeks/1ebf875b6e1262449cbc...

dig1 · on Dec 9, 2019

Sell talk and buzzwords. Either author has no idea that Hadoop is ecosystem and Spark depends on it or deliberately mix Hadoop and Kubernetes, which aren't much related.

And good luck running Spark without Hadoop ;)

smabie · on Dec 9, 2019

Spark doesn’t have a hard dependency on Hadoop. Spark doesn’t have a storage engine, but you don’t necessarily need one.

atomicity · on Dec 9, 2019

Spark still depends on Hadoop for a lot:

- Using Parquet files = parquet-mr which is tied to Hadoop MR https://github.com/apache/spark/tree/master/sql/core/src/mai...

- Using S3 instead of HDFS = Hadoop S3a connector

Even if you don't run HDFS and YARN, you aren't escaping Hadoop. And if some configuration goes wrong, and you'll probably need to look into the Hadoop conf files.

The original comment was about the mass of libraries that Hadoop brings in. Spark isn't a solution that allows you to leave the mess. If you try to dockerize spark, you'll still see that you have 300 MB size images full of JARs that came from wherever.

dig1 · on Dec 9, 2019

Yes, but my comment was about serious, production grade setup.

_ytji · on Dec 9, 2019

Spark and dependencies are just as much of a tire fire and you’ll often want Hadoop as well.

mixmastamyk · on Dec 9, 2019

Sounds like this is the root issue, and not significantly about package managers.

No one is willing to invest the time to untangle the build process and fix compat issues in the software. The project is slowly dying out.

alfozan · on Sept 19, 2019

Alexa rankings are meaningless for services that are mostly accessed via mobile: https://www.quora.com/Does-Alexa-include-app-usage-in-its-st... https://www.quora.com/Does-Alexa-track-traffic-from-iPhones-...

alfozan · on June 15, 2019

I have been using https://getawair.com/ (2nd edition) for over 6 months and I can't recommend it enough.

alfozan · on Aug 9, 2018

Air Quality Monitors Comparison: https://explorables.cmucreatelab.org/explorables/air-quality...

alfozan · on April 9, 2018

Very interesting experiments. I have been thinking for a long time that it's a good idea to do these atomic tests. About test 8, it is actually well known that if you want accuracy vs # of epochs, decrease the batch size -- however it causes time inefficiency. And I found number 7 quite surprising!

alfozan · on Nov 25, 2017

I use https://www.fakespot.com to filter out fake product reviews and I found to match my gut feelings most of the time.