More

izendejas · on Dec 2, 2022

DataGrail | Software Engineers, ML Engineers, Product Managers | Full-Time, Remote

Help us solve one of the most important problems of the decade: privacy.

Now more than ever individuals need to be in control of their privacy and identity--that is our vision.

We're a fast-growing, Series C startup (raised recently despite the downturn), working with some of the best brands who get this, and are built on trust.

For the ML role, you'll be working with me, DG co-founder, to classify data in a privacy-preserving way, amongst other cool challenges. The ideal candidate has strong software engineering + well-rounded ml chops, and can build solutions end-to-end. Bonus points if you're entrepreneurial. We've had a couple of people start companies, and more in the pipeline.

Reach out directly hnUsername.substring(0,2) at datagrail.io.

princevegeta89 · on Dec 2, 2022

hnUsername.substring(0,2)

No need for this nonsense here. Just paste the stuff directly.

izendejas · on Dec 2, 2022

Appreciate the feedback. I'm trying to avoid more spam, is all.

anand098 · on Dec 5, 2022

and that mail address (hn at ..) bounce as well.. If you intend to add username as well then the expression should have been different..

izendejas · on Aug 3, 2020

DataGrail | Senior Software Engineer (Frontend) | Onsite in San Francisco (once opened) or Fully Remote

We're building a data privacy platform that ensures data privacy doesn't suck.

As more regions implement privacy laws, most companies are challenged to comply. Our platform streamlines this entire process while enabling companies to give their users more control and transparency over their data.

We are currently powering privacy for many well-known orgs, and have continued with very healthy growth despite the times.

We're looking for someone with 5+ years of experience to help shape our frontend architecture and build great experiences.

Tech Stack: React, Styled Components, Jest, Ruby/Rails, Postgres, and more

If you'd like to learn more, email me at (let domain = "datagrail.io"; "iz"+"@"+domain)

izendejas · on March 23, 2020

DataGrail | Senior / Software Engineer | San Francisco, CA | Onsite / Remote

https://datagrail.io/careers

Help us build data privacy as a service.

Data privacy / data protection should be a no-brainer, but the reality is that many organizations were not built with privacy in mind (the HN crowd knows all too well), so we're looking to make this not suck. In doing so, we'll give consumers better control over their data without all the needless hurdles one currently confronts.

Our stack (and growing):

* aws (codedeploy, rds, athena, etc) * ruby/rails, react * a bit of python, go and likely to grow * integrations with redshift, mysql, oracle db, snowflake, s3 data lakes, and more.

If interested, email me (co-founder & cto): iz|at|datagrail.io

izendejas · on Feb 25, 2020

That's because, ML and operations-research problems can be simplified to set of optimization problems and the underlying math and statistics are all very similar if not identical in some cases.

And the input matters, a lot. So the differentiating factor isn't the models, it's the data and companies like Google figured it out a long time ago.

In short, find interesting problems, then the solutions -- not the other way around.

killjoywashere · on Feb 25, 2020

"The data" means more than pure computer science people want to admit. In any "advanced" application, that means annotators. Radiologists drawing circles around cancer, attorneys labeling contract clauses as unacceptable, drivers labeling stop signs, etc.

ML is a mining problem. Digitizers are the miners. Annotators are the refiners.

joe_the_user · on Feb 25, 2020

Basically, the system is massively ad-hoc and driven by this large scale annotation, training and testing.

The big question here is, what happens when the world changes next year? You rebuild the application. I know there are companies that advertise doing continuous updating of deep learning models but it seems like calculating total costs and total benefits is going to be hard here.

killjoywashere · on Feb 25, 2020

Sometimes the mine makes money, sometimes it doesn't make sense to run the mine.

moandcompany · on Feb 25, 2020

To extend the mining metaphor, and relate back to the original articles:

People and organizations are chasing what they believe, or are told to believe, is pay dirt.

Many unfamiliar investors have rushed in, possibly fearing missing out, and fund many of the prospectors, yet many of the prospectors and investors aren't really aware of the costs of running a mine, nor the practices required to run them efficiently.

It turns out that there's more aspects to the value creation process than dig/refine/polish (data/train/predict), especially when usefulness in application matters and there are finite resources available for digging.

Companies selling shovels are some of the primary beneficiaries of this, by selling shovels (i.e. renting compute) funded by the malinvestment.

Additional beneficiaries are the refiners (training experts) that are able to charge steep labor premiums, however organizations are starting to figure out that their refiners are expensive to keep idle and often operate the mines poorly in terms of throughput/cost-effectiveness/repeatability/application (see the various threads on "Data Engineers")

streetcat1 · on Feb 25, 2020

This is correct, however, the distinction between labeling and training is artificial, and probably arises from the fact that ML came from academia, where it was not part of the business process.

I.e. a modern ML system should just plug into the business process from day 0, where the ML task should be performed by human and recorded by the machine.

After a while, the machine would train on this recorded data, and start replacing the humans.

Rinse and repeat.

killjoywashere · on Feb 25, 2020

> a modern ML system should just plug into the business process from day 0, where the ML task should be performed by human and recorded by the machine.

Ah, this is a typical thing I hear people in the Valley say: just push it all ... somewhere. No.

If we digitized all microscopy slides, it would require YouTube-scale storage several times over. People think genomics is big. People think reconnaissance imaging is big. They're big, but there's only so much of them.

IF it were digitized, there would be far more pathology whole slide imaging being generated every day. I did some estimates at one point and had to throw a couple orders of magnitude into the genomics data to even make it competitive at enterprise scale.

And keep in mind, we're talking clinical medicine. We want the data now. We're looking at the slides while the glue is still wet. You don't have the bandwidth, no one has the bandwidth, to do some of this stuff they way you propose and maintain the current "business process" of clinical medicine.

Building models and iterating, the old fashioned way, is the only way it makes sense.

6510 · on Feb 25, 2020

Funny, we all thought computers were fast. Turns out its nowhere what we need.

Piskvorrr · on Feb 26, 2020

They're fast, sure. But not very efficient in certain problem domains, specificially where humans are efficient (for reasons that are IMHO historical, not innate).

divbzero · on Feb 25, 2020

This is spot on. Hence the open sourcing of ML code while keeping an iron grip on data.

Erlich_Bachman · on Feb 25, 2020

> And the input matters, a lot. So the differentiating factor isn't the models, it's the data and companies like Google figured it out a long time ago.

The models are likely also a differentiating factor in a sense that there are models that perform much better than others, to a point of completely new functionality. But also all of these models are basically open source currently... So they can't by definition be differentiating between different companies, because all of the companies generally have access to all of the algorithms. At leat to all of the types of algorithms.

izendejas · on Feb 4, 2020

DataGrail | Senior / Software Engineer | San Francisco, CA | $130k - $200k + equity | Onsite

Help us build a data privacy platform.

Data privacy and data protection should be a no-brainer, but the reality is that many organizations are ill-prepared to comply with privacy laws, so we're helping to streamline the entire process. In doing so, we will help give users more control over their personal data.

We integrate with many saas solutions, data lakes (s3 + json/parquet/orc/etc) and data warehouses (redshift, snowflake, etc) to easily access, delete and/or anonymize data.

We have very healthy growth.

We're currently looking to hire someone with 5+ years experience, and preferably a seasoned frontend engineer, or a backend engineer with experience on various backend systems.

Our stack (and growing):

* aws (codedeploy, rds, athena, etc) * ruby/rails, react, python, go * postgres, redis, s3

Experience with data engineering or data science a plus.

If interested, email me (co-founder & cto): iz|at|datagrail.io

izendejas · on Nov 6, 2019

I'd love to recommend pingdom, or a service like it. I'm in no way affiliated with them, just a very happy customer and one of those products where I'm jelly I didn't come up with the idea. It integrates very nicely with pagerduty and slack/sms, etc.

It's just extra redundancy in case something like cloudwatch (which you should use -- with ELBs) also goes down.

izendejas · on Sept 3, 2019

DataGrail | Senior / Software Engineer | San Francisco, CA | $130k - $180k + equity | Onsite

Help us build a data privacy platform.

Data privacy and data protection should be a no-brainer, but the reality is that many organizations are ill-prepared to give consumers more transparency and control over their personal data.

We're building a SaaS that integrates with other services, data lakes (s3 + json/parquet/orc/etc) and data warehouses (redshift, snowflake, etc) to easily access, delete and/or anonymize data.. and more.

We have paying customers and a very healthy sales pipeline.

Our stack (and growing): * aws (codepipeline, rds, kinesis, athena, etc)

* ruby/rails, react, python, go

* postgres, redis, s3

* experience with data engineering or data science a plus.

Join our stellar, diverse team including two engineers who found us here on HN -- email me (co-founder & cto): iz|at|datagrail.io

izendejas · on July 1, 2019

DataGrail | Senior / Software Engineer | San Francisco, CA | $130k - $180k + equity | Onsite

Help us build a data privacy platform.

Data privacy and data protection should be a no-brainer, but the reality is that many organizations are ill-prepared to comply with privacy laws, so we're helping to streamline the entire process. In doing so, we will help give users more control over their personal data.

We integrate with many saas solutions, data lakes (s3 + json/parquet/orc/etc) and data warehouses (redshift, snowflake, etc) to easily access, delete and/or anonymize data.

We have paying customers and have plenty of runway.

Our stack (and growing):

* aws (codedeploy, rds, athena, etc) * ruby/rails, react, python, go * postgres, redis, s3

Experience with data engineering or data science a plus.

If interested, email me (co-founder & cto): iz|at|datagrail.io

izendejas · on June 26, 2019

This. Everyone's missing the point of a search engine.

We're talking about billions of pages and if not ranked (authority is a good hueristic), filtered (de-ranked), etc then good luck finding valuable information because everyone is gaming the systems to improve their ranking.

I think this is part of the reason you get a lot of fake news on social media. It's a constant stream of information (a new dimension of time has been added to the ranking, basically) that needs to be ranked and with humans in the loop, there's no way to do this very easily without filtering for noise and outright malicious content.

basch · on June 26, 2019

i disagree that there isnt a way, just that nobodies tried a good one yet.

take reddit for example. it should be very easy to establish a few voters who make "good" decisions, and then extrapolate their good decisions based on people with similar voting patterns. it would combine a million monkeys with typewriters with expert meritocracy. you want different sorting, sort by different experts until you get the results you want. it seems every platform is too busy fighting noise to focus on amplifying signal, or are focused on teaching machines to do the entire task, instead of using machines to multiply the efficiency of people with taste who can make a good judgement call with regard to whether something is novel or or pseudo-intellectual. Not to pick on them, but I would suspect an expert to be better at deranking aeon/brainpickings type clickbait than an eruditelike ai, if only because humans can still more easily determine if someone is making an actual worthwhile point, vs repeating a platitude, conventional wisdom, or something hollow.

abathur · on June 26, 2019

It should, but if anyone knows who these kingmakers are, it's still probably just a matter of time before they accrue enough power for it to be worth someone's time to at least try to track them down and manipulate their decisions (bribe, blackmail, sponsor, send free trials, target with marketing/propaganda campaigns, etc.)

basch · on June 26, 2019

Who says it even has the same kingmakers every day? Slashdot solved that part of metamoderation two decades ago.

A person might be an expert in cars but not horses. A car expert might be superseded . The seed data creators could be a fluid thing.

cthaeh · on June 26, 2019

This is a technocracy. Noone wants this but Hacker News.

basch · on June 27, 2019

Let's say you have a subreddit like /r/cooking. You think exposing a control in the user agent (browser, app, ui) that let's you sort recipe results by lay democracy, professional chefs, or restaurant critics taste is a technocracy?

Are consumer reports and wirecutter less valuable than Walmarts best sellers? Is techmeme.com worse than Hackernews by virtue of being a small cabal of voters? Should I dismiss longform.org and aldaily as elitist because they aren't determining priority solely from the larger populations preferences. Is Facebooks news algorithm better because it uses my friends to suggest content?

Is it a technocracy that metacritic and rotten tomatoes show both user and critic score? I'm proposing an additional algorithm that compares critic score with user score to find like voters and extrapolate how a critic would score a movie they have never seen. I think that would be useful without diminishing the other true scores. I would find it useful to be able to choose my own set of favorite letterboxd or redef voters and see results it predicts they would recommend, despite them never having actually voted on a movie or article. Instead of seeding a movie recommendation algorithm with my thoughts, I could input others already well documented opinions to speed up the process.

This idea would work better if people voted without seeing each others votes until after they vote. It might be hard to extrapolate Roger Ebert's preferences if voters formed their opinions of movies based on his reviews. You'd end up with a false positive that mimics his past but poorly predicts his future.

luxuryballs · on June 26, 2019

The reverse is a problem too, Google filtering things out based on their political leanings in an attempt to shape public opinion.

Nasrudith · on June 26, 2019

I haven't seen any examples which were anything other than runaway persecution complexes of those who found their world view was less popular than they believed - which were greeted with exasperation by testifying engineers who had to explain how absurdly unscaleable it would be to do it manually.

aslaan · on June 27, 2019

https://gohmert.house.gov/news/documentsingle.aspx?DocumentI...

izendejas · on June 3, 2019

DataGrail | Senior / Software Engineer | San Francisco, CA | $130k - $180k + 0.1% - %0.5 | ONSITE

Help us build a data privacy platform.

Data privacy and data protection should be a no-brainer, but the reality is that many organizations are ill-prepared to comply with privacy laws, so we're helping to streamline the entire process. In doing so, we believe this will help give users more control over their personal data.

We integrate with many saas solutions, data lakes (s3 + json/parquet/orc/etc) and data warehouses (redshift, snowflake, etc) to easily access, delete and/or anonymize data.

We have paying customers and have raised over $4m (with plenty of runway).

Our stack (and growing):

* aws (codedeploy, rds, athena, etc)

* ruby/rails, react, go

* postgres, redis, s3

Experience with data engineering or data science a plus.

If interested, email me (co-founder & cto): iz|at|datagrail.io