Canada is the source of more than 80% of the US's potash, a key ingredient in fertilizer. Without it, the entire farming industry and its crops wither and die. It is not easily sourced and not easily replaced, and the crops would all die long before a substitute could be found. Your suggestion to close the border is absurd, jingoistic, empty, and meaningless.
OP, I love your passion, but as someone who's been in these trenches for a long time as a visibly queer person, you are a Titanic heading for an iceberg of legal issues in the US, UK, and a few other places. You are putting yourself and your users at an extremely high risk of outing, doxxing, and potential legal trouble. The fact that you haven't addressed these issues scares me, and it should scare you.
Yes, huge +1 to this, OP, especially if you're American, because you are heading for a big collision with FOSTA-SESTA and other legal issues that could put you and your users at risk of harm.
GP mentioned NCMEC; those guys don't care whether it's generated or not. Anything that aligns with their preferences gets labeled "nonconsensual child pornography," and they come after your hosting, DNS, payment processors, visas, and so on. Whether it's consensual or involves real children isn't their concern.
Ultimately they'll flashbang you on a Sunday morning, put you on a no-fly list, and circulate warrants, at least until you add heavy geoblocking and start framing them the right way, so that they end up looking like the socially awkward side. They won't ask you for a list of clients; they can look that up themselves. They're after content.
No, they're not; their roots are in the YMCA, the British Society for the Suppression of Vice, and earlier societies dating back to the 1700s. They originated when the church courts were shut down, and they are a feature of British evangelicalism that Americans turbocharged.
> Why didn't other people, other national archives, other commercial concerns or non-profits join in this work?
I'm very confused by this statement. I can't tell whether it comes from not working in library and information science, from your definition of an archives, or from your opinion on what an acquisition policy should be, but lots of national archives have archived, and continue to archive, the Web.
I think pointing just to Wikipedia ignores the growing adoption and massive impact of Wikidata. Perhaps I'm biased because of my field, but everything I see indicates the growing, not shrinking, power of it. I would categorize its effects as different from Wikipedia's, though.
I would happily bite on that; I mostly deal with archives, libraries, and museums and how they deal with people and communities. Because of that, there is a ton of nuance when it comes to identities (there is a lot of gradation in meaning between "African American" and "Black," or "gay" and "homosexual," for example). Things that seem simple are often very complicated, and I've spent a good deal of my PhD work on exactly that (the Homosaurus is the most popular example of this work). Lately I've been working on how to represent identities in a linked and changing way that can still be used by cultural heritage institutions (i.e., simple enough to be linked to SKOS and Wikidata). I feel pretty close on how to represent some aspects of this semantically, thanks to help from a brilliant ontologist friend.
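To make the linked-identity idea concrete, here is a minimal sketch of what such a record could look like. Everything here is my own illustrative assumption: the URIs are placeholders, the field names only mimic common SKOS properties (prefLabel, altLabel, broader, exactMatch), and this is not the actual Homosaurus or Wikidata schema.

```python
import json

# Hypothetical, simplified SKOS-style record for one identity term.
# The URIs below are placeholders, not real identifiers.
concept = {
    "@id": "https://example.org/identity/gay-men",
    "skos:prefLabel": {"en": "Gay men"},
    # Older or clinical usage kept as an alternate label, not erased.
    "skos:altLabel": [{"en": "Homosexual men"}],
    "skos:broader": ["https://example.org/identity/men"],
    # Link out to a shared vocabulary (placeholder Wikidata-style URI).
    "skos:exactMatch": ["https://www.wikidata.org/entity/Q_EXAMPLE"],
    # Identities shift over time; a note can carry that nuance.
    "usageNote": "Preferred community term varies by era and community.",
}

print(json.dumps(concept, indent=2))
```

The point of the sketch is that a flat "authority record" becomes a small graph: the preferred label can change while the identifier, the historical labels, and the outbound links stay stable.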
I don't think I'll live up to "brilliant ontologist", but here goes:
First of all, what's the problem? Computing human-written text.
What's the problem domain? Story. In other words: intentionally written text. By that, I mean text that was written to express some arbitrary meaning. This is smaller than the set of all possible written text, because no one intentionally writes anything that is exclusively nonsensical.
So what's my solution? I call it the Story Empathizer.
---
Every time someone writes text, they encode meaning into it. This even happens by accident: try to write something completely random, and there will always be a reason guiding your result. I call this the original Backstory. This original Backstory contains all of the information that is not written down. It's gone forever, lost to history. What if we could dig it up?
Backstory is a powerful tool. To see why, let's consider one of the most frustratingly powerful features of Story: ambiguity. In order to express a unique idea in Story, you don't need an equivalently unique expression! You can write a Story that literally already means some other specific thing, yet somehow your unique meaning still fits! Doesn't that break some mathematical law of compression? We do this all day every day, so there must be something that makes it possible. That thing is Backstory. We are full of them. In a sense, we are even made of them.
We can never get the original Backstory back, but we can do the next best thing: make a new one. How? By reading Story. When we successfully read a Story, we transform it into a new Backstory. That goes somewhere in the brain. We call it knowledge. We call it memory. We call it worldview. I call this process Empathy.
Empathy is a two-way street. We can use it to read, and we can use it to write. When two people communicate, they each create their own contextual Backstory. The goal is to make the two Backstories match.
---
So how do we do it with a computer? This is the tricky part. First, we need some fundamental Backstories to read with, and a program that uses Backstory to read. Then we should be able to put them to work, and recursively build something useful.
I envision a diverse library of Backstories. Once we have that, the hardest part will be choosing which Backstory to use, and why. Backstories provide utility, but they come with assumptions. Enough meta-reading, and we should be able to organize this library well enough. The simple ability to choose what assumptions we are computing with will be incredibly useful.
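The library-of-Backstories idea above can be sketched as a toy program. To be clear, everything here is my own illustrative framing, not your design: a Backstory is modeled as a bundle of assumptions (word senses), and "reading" a Story with a different Backstory yields a different interpretation of the same ambiguous text.

```python
from dataclasses import dataclass, field

@dataclass
class Backstory:
    """A named bundle of assumptions: word -> assumed meaning."""
    name: str
    senses: dict = field(default_factory=dict)

def empathize(story: str, backstory: Backstory) -> list:
    """Transform a Story into an interpretation using one Backstory.

    Words the Backstory has no assumption about pass through unchanged.
    """
    return [backstory.senses.get(word, word) for word in story.split()]

# A tiny "library" of Backstories; choosing one is choosing assumptions.
library = [
    Backstory("finance", {"bank": "financial institution"}),
    Backstory("geography", {"bank": "river edge"}),
]

story = "meet me at the bank"
for b in library:
    print(b.name, "->", " ".join(empathize(story, b)))
```

The hard part you describe, choosing which Backstory to use and why, shows up here as the unanswered question of which entry in `library` to apply; the sketch only demonstrates that once a Backstory is chosen, the same Story resolves differently.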
---
So that's all I've got so far. Every time I try to write a real program, my surroundings take over. Software engineering is fraught with assumptions. It's very difficult to set aside the canonical ways that software is made, and those are precisely what I'm trying to reinvent. I'm getting tripped up by the very problem I intend to solve, and the irony is not lost on me.
Any help or insight would be greatly appreciated. I know this idea is pretty out there, but if it works, it will solve NLP, and factor out all software incompatibility.