
`char8_t` is probably one of the more baffling blunders of the standards committee.

there is no guarantee `char` is 8 bits, nor that it represents text, or even a particular encoding.

If your codebase has those guarantees, go ahead and use it.


> there is no guarantee `char` is 8 bits, nor that it represents text, or even a particular encoding.

True, but sizeof(char) is defined to be 1. In section 7.6.2.5:

"The result of sizeof applied to any of the narrow character types is 1"

In fact, char and associated types are the only types in the standard where the size is not implementation-defined.

So the only way that a C++ implementation can conform to the standard and have a char type that is not 8 bits is if the size of a byte is not 8 bits. There are historical systems that meet that constraint but no modern systems that I am aware of.

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n49...
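
To make the split concrete, here's a minimal C++20 sketch: the first two assertions are what the standard guarantees, while the CHAR_BIT line is only an assumption about the platform, not a guarantee.

    #include <climits>

    static_assert(sizeof(char) == 1, "guaranteed: sizeof of any narrow character type is 1");
    static_assert(sizeof(char8_t) == 1, "guaranteed: char8_t is also a narrow character type");
    static_assert(CHAR_BIT == 8, "NOT guaranteed: holds on mainstream platforms only");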


That would be any CPU with word-addressing only. Which, granted, is very exotic today, but they do still exist: https://www.analog.com/en/products/adsp1802.html

Don't some modern DSPs still have 32 bits as the minimum addressable unit? Or is that a thing of the past?

If you're on such a system, and you write code that uses char, then perhaps you deserve whatever mess that causes you.

char8_t also isn't guaranteed to be 8 bits, because sizeof(char8_t) == sizeof(unsigned char) == 1. On a platform where char is 16 bits, char8_t will be 16 bits as well

The C++ standard explicitly says that it has the same size, signedness, and alignment as unsigned char, but it's a distinct type. So it's pretty useless, and badly named
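
To illustrate the "same representation but distinct type" point, a minimal C++20 sketch (the function names here are made up for the example):

    void write(unsigned char) {}
    void write(char8_t) {}  // OK: a separate overload, not a redefinition

    static_assert(sizeof(char8_t) == sizeof(unsigned char));
    static_assert(alignof(char8_t) == alignof(unsigned char));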


Wouldn't it rather be the case that char8_t just wouldn't exist on that platform? At least that's the case with the uintN_t types; they are just not available everywhere. If you want something that is always available you need to use uint_leastN_t or uint_fastN_t.


It is pretty consistent. It is part of the C standard and a feature meant to make string handling better; it would be crazy if it wasn't a complete clusterfuck.

There's no guarantee char8_t is 8 bits either; it's only guaranteed to be at least 8 bits.

> There's no guarantee char8_t is 8 bits either; it's only guaranteed to be at least 8 bits.

Have you read the standard? It says: "The result of sizeof applied to any of the narrow character types is 1." Here, "narrow character types" includes char and char8_t. So technically they aren't guaranteed to be 8 bits, but they are guaranteed to be one byte.


Yes, but a byte is not guaranteed to be 8 bits, because on many ancient computers it wasn't.

The poster you replied to has read the standard correctly.


What platforms have char8_t as more than 8 bits?

Well, platforms with CHAR_BIT != 8. In C and C++, char (and therefore a byte) is at least 8 bits, not exactly 8 bits. POSIX does force CHAR_BIT == 8. I think the only place you see this is in embedded, and even there only on some DSPs or ASIC-like devices. So in practice most code will break on those platforms, and they are very rare. But they are still technically supported by the C and C++ standards, similar to how C still supported non-two's-complement architectures until 2023.

That's where the standard should come in and say something like "starting with C++26, char is always 1 byte and signed, and std::string is always UTF-8." Done, fixed Unicode in C++.

But instead we get this mess. I guess it's because there's too much Microsoft in the standard, and they are the only ones who don't have UTF-8 everywhere in Windows yet.


char is always 1 byte. What it's not always is 1 octet.

You're right. What I meant was that it should always be 8 bits, too.

std::string is not UTF-8 and can't be made UTF-8. It's encoding-agnostic; its API is in terms of bytes, not code points.

Of course it can be made UTF-8. Just add a codepoints_size() method and other helpers.

But it isn't really needed anyway: I'm using it for UTF-8 (with helper functions for the 1% of cases where I need code points) and it works fine. Starting with C++20, though, it's getting annoying because I have to reinterpret_cast to the useless u8 versions.
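
For illustration, the cast in question looks roughly like this; to_bytes is just a hypothetical helper name, not anything standard:

    #include <string>

    // C++20: u8"..." literals are char8_t-based, so getting them into a
    // plain std::string means casting the pointer (or copying byte by byte).
    std::string to_bytes(const std::u8string& s) {
        return std::string(reinterpret_cast<const char*>(s.data()), s.size());
    }

    std::string greeting = to_bytes(u8"héllo");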


First, because of existing constraints like mutability through direct buffer access, a hypothetical codepoints_size() would require recomputation each time, which would be prohibitively expensive, in particular because std::string is virtually unbounded.

Second, there is also no way to be able to guarantee that a string encodes valid UTF-8, it could just be whatever.

You can still just use std::string to store valid encoded UTF-8; you just have to be a little bit careful. And functions like codepoints_size() are pretty fringe -- unless you're doing specialized Unicode transformations, it's more common to just treat strings as opaque byte slices in a typical C++ application.
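
If you want a picture of what such a hypothetical codepoints_size() would have to do, here's a minimal sketch: an O(n) scan on every call, and it silently assumes the bytes really are valid UTF-8.

    #include <cstddef>
    #include <string>

    std::size_t codepoints_size(const std::string& s) {
        std::size_t n = 0;
        for (unsigned char c : s)
            if ((c & 0xC0) != 0x80)  // count every byte that is not a UTF-8 continuation byte
                ++n;
        return n;
    }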


Perfect is the enemy of good. Or do you think the current mess is better?

std::string _cannot_ be made "always UTF-8". Is that really so contentious?

You can still use it to contain UTF-8 data. It is commonly done.


I never said always. Just add some new methods for which it has to be UTF-8. All current functions that need an encoding (e.g. text I/O) would also switch to UTF-8. Of course you could still save arbitrary binary data in it.

How many non-8-bit-char platforms are there with char8_t support, and how many do we expect in the future?

Mostly DSPs

Is there a single esoteric DSP in active use that supports C++20? This is the umpteenth time I've seen DSPs brought up in casual conversations about C/C++ standards, so I did a little digging:

Texas Instruments' compiler seems to be celebrating C++14 support: https://www.ti.com/tool/C6000-CGT

CrossCore Embedded Studio apparently supports C++11 if you pass a switch in requesting it, though this FAQ answer suggests the underlying standard library is still C++03: https://ez.analog.com/dsp/software-and-development-tools/cce...

Everything I've found CodeWarrior related suggests that it is C++03-only: https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/...

Aside from that, from what I can tell, those esoteric architectures are being phased out in favor of running DSP workloads on Cortex-M, which is just ARM.

I'd love it if someone who was more familiar with DSP workloads would chime in, but it really does seem that trying to be the language for all possible and potential architectures might not be the right play for C++ in 202x.

Besides, it's not like those old standards or compilers are going anywhere.


Cadence DSPs have a C++17-compatible compiler and will be C++20 soon, as will the new CEVA cores (both are Clang-based). TI C7x is still C++14 (C6000 is an ancient core, yet it still got C++14 support as you mentioned). AFAIR the Cadence ASIP generator will give you a C++17 toolchain, and C++20 is on the roadmap, but I'm not 100% sure.

But for those devices you use a limited subset of language features, and you would be better off not linking the C++ stdlib or even the C stdlib at all (so junior developers don't have room to do stupid things ;))


Green Hills Software's compiler supports more recent versions of C++ (it uses the EDG frontend) and targets some DSPs.

Back when I worked in the embedded space, chips like ZSP were around that used 16-bit bytes. I am twenty years out of date on that space though.


How common is it to use Green Hills compilers for those DSP targets? I was under the impression that their bread was buttered by more-familiar-looking embedded targets, and more recently ARM Cortex.

Dunno! My last project there was to add support for one of the TI DSPs, but as I said, that's decades past now.

Anyway, I think there are two takeaways:

1. There probably do exist non-8-bit-byte architectures targeted by compilers that provide support for at-least-somewhat-recent C++ versions

2. Such cases are certainly rare

Where that leaves things, in terms of what the C++ standard should specify, I don't know. IIRC JF Bastien or one of the other Apple folks who've driven things like "two's complement is the only integer representation C++ supports" tried to push for "bytes are 8 bits" and got shot down?


> but it really does seem that trying to be the language for all possible and potential architectures might not be the right play for C++ in 202x.

Portability was always a selling point of C++. I'd personally advise those who find it uncomfortable to choose a different PL, perhaps Rust.


> Portability was always a selling point of C++.

Judging by the lack of modern C++ in these crufty embedded compilers, maybe modern C++ is throwing too much good effort after bad. C++03 isn't going away, and it's not like these compilers always stuck to the standard anyway in terms of runtime type information, exceptions, and full template support.

Besides, I would argue that the selling point of C++ wasn't portability per se, but the fact that it was largely compatible with existing C codebases. It was embrace, extend, extinguish in language form.


> Judging by the lack of modern C++ in these crufty embedded compilers,

Being conservative with features and deliberately not implementing them are two different things. Some embedded compilers go through certification so that they can be used to produce mission-critical software. Chasing features is prohibitively expensive, for no obvious benefit. I'd bet that in the 2030s most embedded compilers will support C++14 or even 17. Good enough for me.


> Being conservative with features and deliberately not implementing them are two different things.

There is no version of the C++ standard that lacks features like exceptions, RTTI, and fully functional templates.

If the compiler isn't implementing all of a particular standard then it's not standard C++. If an implementation has no interest in standard C++, why give those implementations a seat at the table in the first place? Those implementations can continue on with their C++ fork without mandating requirements to anyone else.


> If the compiler isn't implementing all of a particular standard then it's not standard C++.

C++ has historically been driven by practicalities, and has violated standards on a regular basis when it was deemed useful.

> Those implementations can continue on with their C++ fork without mandating requirements to anyone else.

Then they will diverge too much, as happened with countless other languages, like Lisp.


> Then they will diverge too much, as happened with countless other languages, like Lisp.

Forgive me if I am unconvinced that the existence of DSP-friendly dialects of C++ will cause the kinds of language fracturing that befell Lisp.

DSP workloads are relatively rare compared to the other kinds of workloads C++ is tasked with, and even in those instances a lot of DSP work is starting to be done on more traditional architectures like ARM Cortex-M.


Non-8-bit-char DSPs would have char8_t support? Definitely not something I expected, links would be cool.

Why not? Except that it is the same as `unsigned char` and can be larger than 8 bits.

ISO/IEC 9899:2024 section 7.30

> char8_t which is an unsigned integer type used for 8-bit characters and is the same type as unsigned char;


> Why not?

Because "it supports Unicode" is not an expected use case for a non-8-bit DSP?

Do you have a link to a single one that does support it?


The exact-width types are never present on platforms that don't support them.

TI C2000 is one example

Thank you. I assume you're correct, though for some reason I can't find references claiming C++20 being supported with some cursory searches.

char on Linux ARM is unsigned, which makes for fun surprises when you've only ever dealt with x86 and assumed char to be signed everywhere.
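
The classic way this bites (a sketch, not from any particular codebase): std::getchar() returns int, and EOF is -1. With a signed char (typical x86) the loop below stops at end of input; with an unsigned char (ARM Linux) c can never compare equal to -1, so it never stops.

    #include <cstdio>

    int count_bytes() {
        int n = 0;
        char c;
        while ((c = static_cast<char>(std::getchar())) != EOF)  // broken where char is unsigned
            ++n;
        return n;
    }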

This bit us in Chromium. We at least discussed forcing the compiler to use unsigned char on all platforms; I don't recall if that actually happened.

I recall that google3 switched to -funsigned-char for x86-64 a long time ago.

A cursory Chromium code search does not find anything outside third_party/ forcing either signed or unsigned char.

I suspect if I dug into the archives, I'd find a discussion on cxx@ with some comments about how doing this would result in some esoteric risk. If I was still on the Chrome team I'd go looking and see if it made sense to reraise the issue now; I know we had at least one stable branch security bug this caused.


Related: in C at least (C++ standards are tl;dr), type names like `int32_t` are not required to exist. Most uses, in portable code, should be `int_least32_t`, which is required.
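
The same distinction exists in C++ via <cstdint>, for what it's worth; a minimal sketch:

    #include <cstdint>

    std::int_least32_t counter = 0;  // mandatory: always present, at least 32 bits wide
    #ifdef INT32_MAX
    std::int32_t exact = 0;          // optional: only defined when an exact 32-bit type exists
    #endif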

Huh, I've been using Zed for a while now and never even noticed it until you mentioned it. Fortunately there's a setting to remove it.


Do you have plans for handling C FFI without "unsafe"? Will it require some sort of extension module written in C/C++/Rust?


No direct plans. For the immediate future, only the runtime is allowed to call into C.

If this ever becomes a production thing, then I can worry about FFI, and I'll probably just follow what managed languages do here.


FWIW, I really like the way C# has approached this need... most usage is exposed via an attribute declaration (DllImport) for P/Invoke. Contrast that with, say, JNI in Java or even the Go syntax. The only thing that might be a significant improvement would be an array/vector of lookup names for the library on the system, given how specific versions are often tagged on Linux vs Windows.


It's ok to be strange. It's ok to be bizarre. Be free.

I do not understand the desire for everybody else in the world to act exactly like you. Variety is the spice of life.


People can be strange or bizarre if they want to, but they have to understand it means some people won't like them, especially if their shtick is deliberately making people uncomfortable and being annoying.

> I do not understand the desire for everybody else in the world to act exactly like you. Variety is the spice of life.

I don't want people to act exactly like me. I greatly appreciate the existence of people different from me with differing points of view and differing nations with differing cultures. This doesn't mean I have to like one specific archetype that I feel acts obnoxiously.


Why do you feel uncomfortable? Why do you think anyone is trying to make you feel uncomfortable?


The author quite literally mentions that part of their motivation to do things is to make people want them to stop, not to mention the deliberate and conscious choice to write the article in lowercase.

It's also natural to be uncomfortable because of the various references to sexual fetishes throughout the article.


> to make people want them to stop

in the sense of "writing a brainfuck compiler in ed," not in making them so uncomfortable they beg for release. plus, "feminization" is not a fetish, at least in the sense of making rustc say "i love you;" that feels incredibly uncharitable.


You're being intentionally obtuse, you know what they meant when they wrote that and you're pretending not to.


i was being charitable, not obtuse. a great number of my closest friends are trans; no element of their experience as i observe it fetishizes the very concept of transition, and those who've spoken to me about it are quite opposed to the "pornification" (as opposed to even sexualization) of trans people (particularly women) by the community itself, and others. if you're at all curious, i thought [0] was pretty informative.

all that to say, trans people (or anyone) shouldn't need to qualify their position (or very lighthearted, energetic opinion piece) with some genericizing disclaimer as to their identity, intents, etc., on the very basis of their identity. live and let live (i.e. fuck off)

[0] https://tr4nbie.substack.com/p/cat-ears-skater-skirts-and-kn...


I don't see any references to "sexual fetishes" in the article.


That's impressive considering the article mentions "feminizing" things in big text the moment you load the page.


"feminizing" doesn't refer to a sexual fetish, it just means making something more feminine. Do you assume that something being feminine is automatically sexualized and fetishistic?


I knew you were going to say this.

"Feminizing" doesn't inherently refer to a sexual fetish but context matters. I invite you to examine the article more in depth, look at the chatroom conversations and then come to your own conclusion.


tbh i think you just hate trans people but you're afraid to say it directly


[flagged]


fuck off


No. I will continue noticing patterns.


It's beside the point of the article but...

> The hardware limitation is specifically TPM 2.0

Almost every even half-decent CPU made in the last decade does have TPM 2.0, albeit for some strange reason OEMs used to ship with it disabled. You may be able to turn it on in the BIOS.


My 7700K, a top-of-the-line CPU from 2017, doesn't support Windows 11 even though it has TPM 2.0. I had to install using Rufus.


For sure, there are other hardware requirements a 2017 CPU may fail.


This is a massive pet peeve of mine as well. As far as I'm aware, there's not a single consumer CPU on the Windows 11 compatibility list that doesn't have built-in TPM 2.0.


To be clear, Ubuntu did nothing. This is a third party implementation that Ubuntu decided to ship in their OS.


That study only says that most Americans think they interact with AI at least a few times a week (it doesn't say how, or whether it's intentional). And it also says the vast majority feel they have little or no control over whether AI is used in their lives.

For example, someone getting a google search result containing an AI response is technically interacting with AI but not necessarily making use of its response or even wanting to see it in the first place. Or perhaps someone suspects their insurance premiums were decided by AI (whether that's true or not). Or customer service that requires you go through a chat bot before you get real service.


Windows also has UUIDs. E.g.:

    \\.\Volume{3558506b-6ae4-11eb-8698-806e6f6e6963}\


Which can be trivially mapped to directories for aliasing. Just like Linux.
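
The Win32 call for that mapping is SetVolumeMountPointW; a rough sketch (the directory and volume GUID are just example values, and the mount point must be an empty directory on an NTFS volume):

    #include <windows.h>

    bool mount_volume_at_directory() {
        // Both paths must end with a trailing backslash.
        return SetVolumeMountPointW(
                   L"C:\\mnt\\data\\",
                   L"\\\\?\\Volume{3558506b-6ae4-11eb-8698-806e6f6e6963}\\") != 0;
    }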

Windows NT and UNIX are much more similar than many people realize; Windows NT just has a giant pile of DOS/Win9x compatibility baked on top, hiding how great the core kernel design actually is.

I think this article demonstrates that very well.


In the end, if you think about it, the Win32 subsystem running on top of NT OSes is pretty much the same concept as Wine running on Unix. That's why Wine is not an emulator. And neither is XP emulating old Win32 stuff when it runs Win9x binaries.


Yeah, NTFS is quite capable. I mostly blame the Windows UI for being a bit too dumbed down and not advertising the capabilities well.


They're using slide rule users as a stand-in for serious mathematicians, as opposed to people who use mathematics incidentally. It makes some sense in historical context but becomes a bit anachronistic after the invention of electronic calculators.


^_^ sucks when you actually need to talk about emoji though :/


Stating the Unicode code points as U+1F4A9 or (D syntax) \U0001F4A9 is a reasonable workaround.


We discourage posts that aren't relevant in some way to D programming.

One of the reasons I enjoy HackerNews is dang's enlightened and sensible moderation policy.


I think OP meant cases like, "I need to process a string with this emoji in D" etc


Would you ever need to talk about a specific emoji?


¯\_(ツ)_/¯

