I’m not sure you understand normalization. A consistent schema is *the product o...

mulmen · on Aug 25, 2021

No2SQL is just physically implementing a normalized logical model on a NoSQL database instead of an RDBMS. That can be done with classes, methods, and attributes (OOP) providing RDBMS functionality like constraints and data types.

In DynamoDB you can use a single table model [1] to construct collections, which are similar to joins. The partition key is like a foreign key in an RDBMS. The sort key is like a combination of the normalized table identifier and primary key.

All of this can be as transparent to the business logic as it is in an RDBMS.

[1]: https://www.alexdebrie.com/posts/dynamodb-single-table/

gregjor · on Aug 25, 2021

That's an optimization known as denormalizing, often seen in BI systems intended for querying (no updates). It's not a different kind of normalization, it's explicitly denormalized to suite DynamoDB.

The relational model is independent of physical implementation (by definition). It's a logical model. Designing for or around the physical implementation (storage or performance characteristics) may be necessary sometimes, but it's also a red flag.

The article you linked repeats the canard about joins: "While convenient, SQL joins are also expensive. They require scanning large portions of multiple tables in your relational database, comparing different values, and returning a result set." Joins may be expensive, but in a well-designed relational schema joins are done on indexed columns, so there's no "scanning large portions of multiple tables" involved, the time complexity is O(log2 N), not exponential. The claim that SQL databases don't scale is rather obviously belied by the widespread (to the point of exclusive) use of large relational databases at scale. That doesn't mean relational databases address every business requirement, but they certainly do scale. When I hear this I assume the person saying it has never used Oracle or worked at a large company.

The problem with NoSQL databases isn't so much (lack of) schemas, it's lack of ACID guarantees. Those may not make much difference in some applications, but to big companies with important databases something like MongoDB with "eventual consistency" doesn't cut it.

I suppose someone could implement something that looks relational in OOP, but it's not a good fit. Imagine a class structure intended to implement customers and orders. How in the OO version do you do even simple things, like enforce uniqueness? How do you distribute this model, or replicate it? How do you accomplish joins without exponential time complexity? How do the objects persist (durability is the D in ACID)? Without ACID properties you don't even have a database, much less a relational database.

jeffbee · on Aug 25, 2021

There are loads of big companies with huge, important databases in eventually-consistent nosql databases. In fact I’d say the bigger the company the more likely it becomes that they’ve got the software development chops to exploit that kind of database.

zbentley · on Aug 25, 2021

I've observed the opposite: larger and higher-skill engineering departments are more likely to have the expertise and perspective to scale RDBMSes and use them in advanced ways.

It's smaller or lower-skill/fad-chasing orgs that end up doubling down on non relational systems for as the primary data store, lured in money-for-free promises of "no schema needed" and "scales automatically".

Nonrelational datastores have a place, of course, but they tend to be much more niche/purpose specific in functional engineering orgs in my experience--to the point that an engineering group's ideas around relational databases (what can they be used for? Should they be the default option for new use cases? How far can they scale?) are an effective proxy for that group's skill level.

jeffbee · on Aug 25, 2021

We all have our own experiences! Mine have been that a mediocre dev team will assume the RDBMS does things that it doesn’t actually do. A good dev team will build their application around its own real requirements and then the database features they need are reduced to durability alone.

gregjor · on Aug 25, 2021

I've seen non-relational databases used for things like logs, streaming data, time series, caching. I'm not saying they don't have uses. You probably know that prior to the release of Oracle in the mid-80s all databases were non-relational. Some of the "NoSQL" ideas are actually old ideas, database models that got replaced by relational databases decades ago.

Lots of big companies use Redis, MongoDB, Cassandra, DynamoDB, etc. when those tools address a business requirement better than a relational database. I have not seen NoSQL databases replacing important relational databases, I have seen NoSQL databases used for specific business requirements.

Of course big companies do a lot of software development, so you're more likely to see a variety of languages and tools in a larger organization. That may or may not validate the tools some people in the organization choose. Many large companies still use COBOL.

garethmcc · on Aug 25, 2021

Non-relational databases match one of the largest use cases that exist; large volume OLTP workloads. Non-relational databases typically can handle far more load on reads and writes than relational databases due to their intrinsic distributed structure, meaning that if you have any kind of transactional workload, for example, a frontend to a busy store (such as Amazon.com), non-relational databases are incredible. You can also back these frontend non-relational stores with relational ones but at much smaller scale in order to handle the OLAP requirement where relational databases shine, allowing for massively flexible queries when needing to pull data for reporting etc.

gregjor · on Aug 26, 2021

No disagreement from me. Relational doesn’t address every business requirement, nor do non-relational databases. They have enough overlap that you have to make an informed decision from experience.

The SQL vs. NoSQL debate got framed early on in terms of better and worse, old vs. new, which is unfortunate because it created ideological camps. Not every supposedly new thing is better than the old thing. Some NoSQL techniques predate relational databases, but no one younger than 50 has experience with what we did before Oracle and DB/2.

Choice of tools should match the requirements, not fads or anecdotes or personal preferences (or ignorance).

jd_mongodb · on Aug 25, 2021

"I have not seen NoSQL databases replacing important relational databases, I have seen NoSQL databases used for specific business requirements."

Look harder :-) Every single NoSQL vendor has examples of substantial "replace' projects.

gregjor · on Aug 26, 2021

I’m sure you’re right. I haven’t seen it. I have seen projects use NoSQL databases and fail when a relational database was a better fit for the requirements. The fault isn’t in the tools, it’s in the decision-making process. It seems common now for teams to choose a “stack” before they define what they plan to build.

coolgeek · on Aug 25, 2021

Every single NoSQL vendor pretends that there is no difference between what you can do and what you should do.

pjmlp · on Aug 25, 2021

I guess because in practice not everyone bothers to normalize, and many RDMS have native support for table inheritance.

jussij · on Aug 25, 2021

While that is very true, it doesn't deter from the fact good relational database design relies on the design being well normalized.

By skipping the normalization steps you end up with a design that is far from optimal.

pjmlp · on Aug 25, 2021

Which is why query optimization engines try their best to work around that, and most RDMS support JSON as native types.

gregjor · on Aug 25, 2021

JSON is a serialization format, not a database schema. We went through this exact same thing with XML, also widely supported as a type in relational databases. Some people didn't understand that XML is not a database and went ahead to write all kinds of nonsense around it. JSON is far less rigorous (and far less bloated) than XML, it's closer to a CSV file. It's a great format for a lot of things but it's not a database.

From the Wikipedia article on JSON, right at the top: "JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with a diverse range of functionality in data interchange including communication of web applications with servers."

Looks like JSON schemas are in the works, so we'll probably go through the whole XML as a database bullshit all over again.

themulticaster · on Aug 25, 2021

I think you're a little late to the party. JSON Schemas have been a thing for over 10 years now (source: https://datatracker.ietf.org/doc/html/draft-zyp-json-schema-..., see also https://json-schema.org/).

gregjor · on Aug 25, 2021

Maybe. The current draft is here:

https://json-schema.org/draft/2020-12/json-schema-core.html

Still a "draft" standard as far as I can tell.

I have never seen this implemented or used but I'm not deep into the JSON world. This looks to me like "XML as a database" all over again, but correct me if it has some other application. There's nothing wrong with XML or JSON schemas, but they are not equivalent to a relational schema -- they are more (properly) used for validation and standardization, i.e. for a data interchange format both sides need to agree on the schema. An interchange format is not a database, it's a representation of things that might be retrieved from or stored in a database.

jussij · on Aug 25, 2021

If choose to organize your data using a relational model and then choose not to normalize that relational design you're doing it wrong.

Some of the benefits you get by doing proper normalized include:

* Avoids repetitive entries and duplicate data.

* Helps reduce the storage space required.

* Prevents the restructuring of the database to accommodate new requirements.

* Prevents the need for coding changes to accommodate new requirements.

* Increases the speed and flexibility of queries.

With a poorly normalized relational design you are throwing away the essence of the 'relational' model and that 'relational' concept is right there in the name, RDMS.

gregjor · on Aug 26, 2021

Normalizing allows enforcing referential integrity and reduces/eliminates update anomalies. Those seem like theoretical concerns until your data gets corrupted.

jussij · on Aug 26, 2021

Surely since you agree normalization helps to 'reduces/eliminates update anomalies' then that also means it's going to reduce the chance of data corruption?

Also what type of software design and development is not based on theoretical concerns?

Software developers spend years studying theory in the hope they can use that knowledge in their own designs.

Now I don't disagree normalization is a 'theoretical exercise' but it is also a helpful tool/technique that will make your life easier.

gregjor · on Aug 26, 2021

I think you misunderstood my comment. I have heard developers dismiss normalization as a theoretical concern. It’s not. Normalizing prevents update anomalies, for one thing. That’s not the same as “data corruption.”

In my experience most software developers don’t spend years studying theory. They spend hours studying buzzwords and fads.