
I've worked on a system where ULIDs (not UUIDv7, but similar) were used with a cursor to fetch data in chronological order and then—surprise!—one day records had to be backdated, meaning that either the IDs for those records had to be counterfeited (potentially violating invariants elsewhere) or the fetching had to be made smarter.

You can choose to never make use of that property. But it's tempting.


I made a service using something like a 64-bit-wide ULID, but there was never a presumption that data would be inserted or updated earlier than the most recent record.

If the domain is modeling something like external events (in my case), and that external timestamp is packed into your primary key, and you support receiving events out of chronological order, then it just follows that you might insert stuff earlier than your latest record.

You're gonna have problems "backdating" if you mix up time of insertion with when the event you model actually occurred. Like if you treat those as the same thing when they aren't.


Python managed to do this by not actually checking the types at runtime. If you declare a list[int] return type but you return a list[str] then nothing happens; you're expected to prevent that by running an offline typechecker.
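
A quick sketch of what "nothing happens" means in practice (plain CPython, nothing framework-specific assumed):

  def ids() -> list[int]:
      # The annotation claims list[int], but CPython never verifies it
      return ["not", "an", "int"]
  print(ids())  # ['not', 'an', 'int'] -- no error; mypy/pyright would flag it offline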

PHP chose to check types at runtime. To check that a value is really an array<int> the runtime could have to loop through the entire array. All the types PHP currently implements are simple and cheap to check. For more elaborate cases you need an offline checker like PHPStan and comment-based type annotations. (PHPStan catches 99% of issues before the runtime gets to it, so for my own code I'd prefer the Python approach with its cleaner syntax.)

The runtime checking seems the key difference, not so much the historical strength of the type system. Python's language implementation does very little typechecking itself and PHP's third-party offline typecheckers are respectably advanced.


Precisely. PHP has tools for this too, but lacks the syntax. Right now you need to do all typings in comments, and that's just as bad as JSDoc was in 2005.

This could be the way PHP goes: they just need the lexer to handle types, and not do any runtime checking at all.

But I guess that goes against what the PHP devs want. It sounds so wasteful, though, to typecheck the same code time after time even if it passed some sort of initial "compile time" step.


The current amount of typechecking might be a net efficiency improvement AFAIK. It provides a hard runtime guarantee that variables are certain types while Python has to check whether something is supported at the last possible moment. But I don't know how much use the optimizer makes of that.
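
For the Python half of that comparison, "the last possible moment" looks roughly like this (illustrative sketch only):

  def discard(x: int) -> None:
      pass
  def bump(x: int) -> int:
      return x + 1
  discard("oops")  # fine forever: the annotation is never enforced
  bump("oops")     # TypeError, but only once the + actually executes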


Python is not (usually) run like PHP. Python programs (like those in most other languages) "run", whereas in 99% of all cases PHP instead "executes" (run = the program keeps running for a long period of time; execute = run and die immediately).

This subtle difference has huge implications. You could in theory have a "compile step" in Python, but in PHP you really can't, as the program is never "running".

Python built syntax for types, generics, etc. It's actually quite a capable type system (I'm not a Python developer, but I use Python on occasion). Python then has tools for static typechecking that can be run outside execution.

This means that if Python did actual static typechecking at runtime, it would be nothing more than wasted CPU cycles.

That's why Python opted for the syntax only, as it's basically zero cost. In PHP land the typechecking is done on EVERY execution, even if the value goes unused (a void function that has an int param, gets passed a string, and just discards the parameter). Even worse, a type error in code that isn't executed won't be caught by any execution.

In short, PHP's type system is just runtime checks for primitives/classes and won't catch errors in code that isn't executed. It's like the worst of both worlds.


> Right now you need to do all typings in comments

What do you mean by this? The types of variables in PHP >8 are not in comments. Or did I misunderstand something?


I don't think this shows deep thought on his part.

By Stallman's own telling a free Objective-C frontend was an unexpected outcome. Until it came up in practice he thought a proprietary compiler frontend would be legal (https://gitlab.com/gnu-clisp/clisp/blob/dd313099db351c90431c...). So his stance in this email is a reaction to specific incidents, not careful forethought.

And the harms of permissive licensing for compiler frontends seem pretty underwhelming. After Apple moved to LLVM it largely kept releasing free compiler frontends. (But maybe I'd think differently if I e.g. understood GNAT's licensing better.)


I think you're getting fooled:

  >>> from io import StringIO
  >>> for line in StringIO("foo\x85bar\vquux\u2028zoot"): print(repr(line))
  ... 
  'foo\x85bar\x0bquux\u2028zoot'


rustc is only loosely tied to LLVM. Other code generation backends exist in various states of production-readiness. There are also two other compilers, mrustc and gccrs.

mrustc is a bootstrap Rust compiler that doesn't implement a borrow checker but can compile valid programs, so it's similar to your proposed subset. Rust minus verification is still a very large and complex language though, just like C++ is large and complex.

A core language that's as simple to implement as C would have to be very different and many people (I suspect most) would like it less than the Rust that exists.


RFC 3629 says surrogate codepoints are not valid in UTF-8. So if you're decoding/validating UTF-8 it's just another kind of invalid byte sequence like a 0xFF byte or an overlong encoding. AFAIK implementations tend to follow this. (You have to make a choice but you'd have to make that choice regardless for the other kinds of error.)
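
CPython, for one, reports all three of those cases as the same kind of decode error:

  for bad in (b"\xed\xa0\x80",   # "UTF-8" encoding of the surrogate U+D800
              b"\xff",           # stray 0xFF byte
              b"\xc0\xaf"):      # overlong encoding of "/"
      try:
          bad.decode("utf-8")
      except UnicodeDecodeError as e:
          print(bad, "->", e.reason)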

If you run into this when encoding to UTF-8 then your source data isn't valid Unicode, and the right fix depends on what the data actually is if not proper Unicode. If you can validate at other boundaries then you won't have to deal with it there.


> You have to make a choice but you'd have to make that choice regardless for the other kinds of error.

If you don't actively make a choice then decoding à la WTF-8 comes naturally. Anything else is going to need additional branches.
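
Python's "surrogatepass" error handler is a convenient way to see both behaviors side by side; it isn't WTF-8, but it shows the accept-the-surrogate path that needs no extra check:

  lone = b"\xed\xa0\x80"  # unpaired surrogate U+D800
  print(repr(lone.decode("utf-8", "surrogatepass")))  # '\ud800' -- waved through
  try:
      lone.decode("utf-8")  # the strict decoder carries the extra branch and rejects it
  except UnicodeDecodeError:
      print("strict UTF-8 says no")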


The big problem isn't invalid UTF-8 but invalid UTF-16 (on Windows et al). AIUI Go had nasty bugs around this (https://github.com/golang/go/issues/59971) until it recently adopted WTF-8, an encoding that was actually invented for Rust's OsStr.

WTF-8 has some inconvenient properties. Concatenating two strings requires special handling. Rust's opaque types can patch over this but I bet Go's WTF-8 handling exposes some unintuitive behavior.
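
Concretely, well-formed WTF-8 may not contain a surrogate pair encoded as two separate three-byte sequences, so concatenation has to notice when a trailing lead surrogate meets a leading trail surrogate and fuse them into one four-byte character. A Python analogy (using "surrogatepass"; not Rust's actual code) shows the mismatch that naive byte concatenation would create:

  print("\U0001f600".encode("utf-8"))  # b'\xf0\x9f\x98\x80' -- one 4-byte character
  lead, trail = "\ud83d", "\ude00"     # the same character split into surrogate halves
  print((lead + trail).encode("utf-8", "surrogatepass"))  # 6 bytes: b'\xed\xa0\xbd\xed\xb8\x80', not valid WTF-8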

There is a desire to add a normal string API to OsStr but the details aren't settled. For example: should it be possible to split an OsStr on an OsStr needle? This can be implemented but it'd require switching to OMG-WTF-8 (https://rust-lang.github.io/rfcs/2295-os-str-pattern.html), an encoding with even more special cases. (I've thrown my own hat into this ring with OsStr::slice_encoded_bytes().)

The current state is pretty sad yeah. If you're OK with losing portability you can use the OsStrExt extension traits.


Yeah, I avoided talking about Windows, which isn't UTF-16 but "int16 strings" the same way Unix filenames are int8 strings.

IMO the differences with Windows are such that I’m much more unhappy with WTF-8. There’s a lot that sucks about C++ but at least I can do something like

  #if _WIN32
  using pathchar = wchar_t;
  constexpr pathchar sep = L'\\';
  #else
  using pathchar = char;
  constexpr pathchar sep = '/';
  #endif
  using pathstring = std::basic_string<pathchar>;

Mind you this sucks for a lot of reasons, one big reason being that you're directly exposed to the differences between path representations on different operating systems. Despite all the ways that this (above) sucks, I still generally prefer it over the approaches of Go or Rust.


Rust has the clap_complete package for its most popular arg parsing library: https://crates.io/crates/clap_complete

ripgrep exposes its (bespoke) shell completion and man page generation through a --generate option: rg --generate=man, rg --generate=complete-bash, etcetera. In xh (clap-based) we provide the same but AFAIK we're the only one to copy that interface.

Symfony (for PHP) provides some kind of runtime completion generation but I don't know the details.


See also: https://internals.rust-lang.org/t/can-the-standard-library-s...

A file descriptor can't be -1 but it's not 100% clear whether POSIX bans other negative numbers. So Rust's stdlib only bans -1 (for a space optimization) while still allowing for e.g. -2.


It means that anything strange that happens next isn't a language bug.

Whether something is a bug or not is sometimes hard to pin down because there's no formal spec. Most of the time it's pretty clear though. Most software doesn't have a formal spec and manages to categorize bugs anyway.


> It means that anything strange that happens next isn't a language bug.

This is even more vague. The language is getting blamed regardless. This makes no sense.


No: the language defined that e.g. a NonZeroU8 can't contain 0, and the only way it could is via illegal means. You don't need a formal proof to describe that.

To try to characterise what any compiler, hypothetical or not, does if you nonetheless produce one (again, via means that aren't valid) isn't meaningful.

