swhan's comments | Hacker News

Tokenizer training doesn't scale as well as model training, so general practice is to train on a subset of the full corpus.


Why is that an issue? Training the tokenizer seems much more straightforward than training the model, since it's based on the statistics of the input data. I guess it may take a while for massive datasets, but is it really infeasible to compute the frequencies at a larger scale?


I’ve trained tokenizers on medium-sized datasets (5+ GB of text, though that could be considered small or large depending on who you ask) and have always found training quite fast. As in, it takes a couple of minutes.

Maybe if we’re talking terabytes it might not scale as well, but so far in my experience training tokenizers has never been an issue. It’s training models that takes ages.
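For a sense of scale, here's a minimal sketch with the HuggingFace `tokenizers` library (the file name and vocab size are placeholders); the work is essentially counting pair frequencies and learning merges, which is why a few GB finishes in minutes:

    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    # Byte-pair encoding model; [UNK] covers anything outside the learned vocab.
    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

    # Frequency counting and merge learning happen inside the trainer.
    trainer = trainers.BpeTrainer(vocab_size=32000, special_tokens=["[UNK]"])

    # "corpus_subset.txt" stands in for whatever sample of the corpus you train on.
    tokenizer.train(files=["corpus_subset.txt"], trainer=trainer)
    tokenizer.save("tokenizer.json")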


Question out of curiosity: did you happen to have some sort of USB device plugged in when you put it into suspend?

Both PIN and phone unlock are supported as of today.


No USB device was ever plugged into the device.

There were days when you needed to enter developer mode to turn on PIN unlock, and Bluetooth devices came only later. I remember the pain of entering my very long password on every boot. Horrible UX.

I recently (within the past couple of months) dealt with a ChromeOS Flex device: a regular old PC laptop belonging to a senior I know, whom I was helping set up with a basic computer. PIN unlock didn’t work on the first boot, and entering a password, especially a strong one, was a huge problem for him. I set up a PIN based on his birthday, but it never worked after a cold start: it worked only if you logged out of the account without turning off the computer; after a fresh boot, the PIN couldn’t be used at all. I ended up setting his Google password to his birthday with dots and some letters, e.g. Qw25.12.1935, which Google allowed and which worked for him. He enters his password letter by letter, and for him it’s worse UX than before, when he had Windows XP that just booted and had Chrome installed (outdated, with no option to upgrade). But I convinced him that this new way of doing things is better. At least it boots almost instantly, which he likes.

And on top of that, the built-in Bluetooth adapter on that very laptop doesn’t work for connecting his Android smartphone. The Bluetooth module itself works, but for some reason it doesn’t work with the software. Brief googling suggested it would be easier to buy a USB Bluetooth adapter and try that, which I did, but I haven’t been able to test it yet, as he lives quite far away from me. For now he manages with the laptop as it is.


Good point. Wasn't aware of the Family Link restrictions. Will see what can be done here.

Disclaimer: I work on ChromeOS.


VSCode + containers + the powerwash feature would enable kids to do STEM.

Are flatpaks out of the question? ChromeOS used to be "Gnome and Chrome" on ~Gentoo.

Shouldn't the ChromiumOS host be running SELinux, if ARC support requires extended filesystem attributes for `ls -alZ` and `ps auxfZ` to work?

Chromium and Chrome appear to be running unconfined? AppArmor for Firefox worked years ago?

https://www.google.com/search?q=chromium+selinux ; chrome_selinux ?

It seems foolish to have SELinux in a guest VM but not the host.
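If someone wants to poke at this, here's a rough Python sketch (assuming a Python interpreter is even available on the image, which it may not be) that only reads the generic SELinux interfaces under /sys and /proc, nothing ChromeOS-specific:

    import glob

    # "1" = enforcing, "0" = permissive; the file is absent if no policy is loaded.
    try:
        enforcing = open("/sys/fs/selinux/enforce").read().strip()
        print("SELinux enforcing:", enforcing == "1")
    except FileNotFoundError:
        print("No SELinux policy loaded on this host")

    # Each process exposes its security context (an SELinux label, or an
    # AppArmor profile / "unconfined") in /proc/<pid>/attr/current.
    for path in glob.glob("/proc/[0-9]*/attr/current"):
        pid = path.split("/")[2]
        try:
            comm = open(f"/proc/{pid}/comm").read().strip()
            ctx = open(path).read().rstrip("\x00\n")
        except OSError:
            continue  # process exited or access denied
        if "chrome" in comm:
            print(pid, comm, ctx or "(no label)")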


Task: "Reprovision" the default VMs and Containers after "Powerwash" `rm -rf`s everything

`adb shell pm list packages` and `adb install` a list of APKs and CRXs.
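A hedged sketch of that flow with plain `adb` via subprocess (the file and directory names are made up, and CRXs aren't covered, since extensions come back through Chrome sync rather than adb):

    import subprocess
    from pathlib import Path

    def save_package_list(out_file="packages.txt"):
        # `adb shell pm list packages` prints lines like "package:com.example.app".
        out = subprocess.run(
            ["adb", "shell", "pm", "list", "packages"],
            capture_output=True, text=True, check=True,
        ).stdout
        pkgs = [line.removeprefix("package:").strip() for line in out.splitlines() if line]
        Path(out_file).write_text("\n".join(pkgs) + "\n")

    def reinstall_apks(apk_dir="apks"):
        # After a powerwash, push locally archived APKs back into the ARC container.
        for apk in sorted(Path(apk_dir).glob("*.apk")):
            subprocess.run(["adb", "install", "-r", str(apk)], check=True)

    if __name__ == "__main__":
        save_package_list()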

Here's chromebook_ansible: https://github.com/seangreathouse/chromebook-ansible/blob/ma...

systemd-homed is portable. Still, someone has to "reprovision" the broken userspace for the user.

A local k8s like MicroShift that does container-selinux like RHEL / Fedora, plus GNOME and Waydroid, would be cool to have for the kids.

Podman-desktop (~Docker Desktop) does k8s now.

K8s defaults to blocking containers that run as root now, and there's no mounting the --privileged Docker socket with k8s either. Gitea + Drone CI / ACT / ci_runner with rootless containers. gVisor is considered good enough for shared server workloads.
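For the rootless point, this is roughly the securityContext the "restricted" Pod Security Standard expects, sketched with the official `kubernetes` Python client (the pod/image names and the gVisor RuntimeClass are illustrative, not something ChromeOS ships):

    from kubernetes import client

    # "Restricted" profile essentials: no root, no privilege escalation,
    # all capabilities dropped, default seccomp profile.
    security_context = client.V1SecurityContext(
        run_as_non_root=True,
        allow_privilege_escalation=False,
        capabilities=client.V1Capabilities(drop=["ALL"]),
        seccomp_profile=client.V1SeccompProfile(type="RuntimeDefault"),
    )

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="ci-runner"),  # illustrative name
        spec=client.V1PodSpec(
            runtime_class_name="gvisor",  # assumes a gVisor RuntimeClass exists
            containers=[
                client.V1Container(
                    name="runner",
                    image="docker.io/library/alpine:3.19",
                    command=["sleep", "infinity"],
                    security_context=security_context,
                )
            ],
        ),
    )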

Repo2docker + caching is probably close to "kid proof" or "reproducible".

VSCode has "devcontainer.json". How many GB do the SciPy stacks ( https://jupyter-docker-stacks.readthedocs.io/en/latest/using... ) and Kaggle/docker-python (Google) take to run locally for users under 13, for whom we can't afford cloud shells with SSH (Colab with SSH, JupyterHub (TLJH w/ k8s)) either?

Task: Learn automated testing, bash, git, and python (for Q12 K12CS STEM)


> It seems foolish to have SELinux in a guest VM but not the host.

- [ ] task manager: optionally show SELinux contexts like `ls -alZ`
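A minimal sketch of what that could read, using only the stdlib (the path is just an example); file labels are the same `security.selinux` xattr that `ls -alZ` prints:

    import os

    def file_selinux_context(path):
        # SELinux stores the file label in the "security.selinux" extended attribute.
        try:
            return os.getxattr(path, "security.selinux").decode().rstrip("\x00")
        except OSError:
            return "(no label)"

    print(file_selinux_context("/etc/hostname"))  # example path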

