The "Fiction, culture, and history" section includes a bunch of works written in various languages, but apparently they're using the English translation. Surely the original is more representative of human cultures as a whole than its translation into English (which is more representative of anglophone cultures).
The core of the project is written in English, which is the current lingua franca of the world. If the purpose of the archive is for future scholars to have a corpus of knowledge from our time, then limiting it to one language maximizes the probability that the entirety of the information stored there will be recoverable.
Including texts in Finnish and Hungarian almost ensures that parts of the archive will be lost 1000 years from now. Even if those languages are alive in 1000 years, the likelihood that interpreting period Finnish and Hungarian from 2020 will be possible is far smaller than the likelihood that interpreting period English from 2020 will be possible.
Being trained as a historian, the idea of throwing away original texts in favour of translations because they're not in « the right language » is hurting my soul.
If a text is lost because it's only written in Hungarian, it means the Hungarian language is lost. And that means not enough texts written in it were kept.
See the problem?
Keeping as much linguistic data as possible is beneficial. Intentionally curtailing it is criminal.
>> Including texts in Finnish and Hungarian almost ensures that parts of the archive will be lost 1000 years from now. Even if those languages are alive in 1000 years, the likelihood that interpreting period Finnish and Hungarian from 2020 will be possible is far smaller than the likelihood that interpreting period English from 2020 will be possible.
> Being trained as a historian, the idea of throwing away original texts in favour of translations because they're not in « the right language » is hurting my soul.
> If a text is lost because it's only written in Hungarian, it means the Hungarian language is lost. And that means not enough texts written in it were kept.
Yeah, exactly. The GP has it bass ackwards. If you have concerns like the GP about intelligibility, then include both the original and the translation. That way even if Finnish and Hungarian go extinct and the archive is recovered, those parallel texts can be used to recover the Finnish and Hungarian languages themselves.
And I'm sure someone who is reading this is questioning the value of even preserving the Finnish and Hungarian languages when you've already captured the "knowledge" in English. All I have to say to that is future linguists will probably be very frustrated with losing two non-Indo-European languages to study, just like we're frustrated that we can't read Etruscan writings anymore.
I thought the point of this was to preserve technical knowledge. It's not poetry. You can translate it from language to language and it should all be isomorphic, because there is something measurable and concrete underlying both expressions.
If readability thousands of years from now is the goal, wouldn't you want to add as many (common) languages as possible? And for each book, add as many translations as possible? That way they will be able to read most or all of it even if just one language survives, irrespective of which one.
The only ones that maintain any significant population that can use them are those that are liturgical languages (e.g. Latin, Hebrew, Classical Greek, Classical Arabic, Sanskrit, ...).
That would be an interesting task for some scholars: "We have a bunch of technology literature here, could you translate it for us into Classical Greek?"
There is a finite amount of space in which to store data.
Even if diversity of language is a principle that you adopt while creating this, not every single item in the database can be a Rosetta Stone-style snapshot of the state of human language in 2020.
I think there's more to the story than just anglocentric oversight.
English translations are more likely to be unencumbered by copyright. US copyright terms are often shorter (or more specific) than those in other countries, and copyright can apply separately to translations. You can see this on https://babel.hathitrust.org/ where English translations are available but, say, Spanish editions are only available for search: you can't read them.
And because there are a lot more scholars using English, English translations tend to be more numerous, higher quality, and more available than original-language editions. You see this a lot on Project Gutenberg, where there's a super polished English translation and a non-existent or crappy original-language transcription.
For example, can you get a Spanish copy of "One Hundred Years Of Solitude" for this without legal issue? Maybe Harper Perennial was willing to cooperate with their English edition and others weren't? I don't think things are as obvious and straightforward as we like them to be.
I've run into all of this while "remastering" old Spanish works. English dominates culture, and not in a bad way. The only reason some new editions/transcriptions of old non-English works exist is that an English-speaking scholar was interested in them or a professor remastered an illegible edition to teach his Spanish language class. And now that's the only version of the work that's not behind a paywall. Otherwise you'll have to re-transcribe a messy scan of a book from the 17th century if you want a digital copy in the original language.
Anyways, has GitHub stated why it's English-only? Did they not have a good reason, or are we just guessing that they didn't know that other languages are important (like we, HNers, people of culture, know)?
I have no problem with it being mostly anglophone-centric and with English being used as common ground; otherwise it would complicate everything too much.
But I have a problem with the lack of representation of great books and cultural achievements/standards that are not anglo-centric at all.
I miss a lot of the great works of humankind.
This is so important that they should have specialized people to curate that list and not just get "the list of great books that are on the top of your head when you only have an average capacity to do so".
Where are Cervantes, James Joyce, Kafka, Rimbaud, Pessoa, Homer, Goethe, Proust, Shelley, Voltaire, etc.?
It doesn't need to be that inclusive, of course, but it would be cool if it were a small window onto the broader human soul instead of a subjective perspective that seems to be missing a lot of the common ground that helped shape civilization into what it is.