I’m not sure you understand normalization. A consistent schema is the product of proper normalization, not the other way around.
Normalization is a logical process, part of database design. It’s not a by-product of a schema or design pattern, nor is it an implementation detail.
OO is ultimately based on pointers, even when hidden as inheritance, or called something else (references, composition). That form of organizing data is not relational pretty much by definition. You can’t” just do that [normalization] with OO patterns.” I’m not even sure what that means.
No2SQL is just physically implementing a normalized logical model on a NoSQL database instead of an RDBMS. That can be done with classes, methods, and attributes (OOP) providing RDBMS functionality like constraints and data types.
In DynamoDB you can use a single table model [1] to construct collections, which are similar to joins. The partition key is like a foreign key in an RDBMS. The sort key is like a combination of the normalized table identifier and primary key.
All of this can be as transparent to the business logic as it is in an RDBMS.
That's an optimization known as denormalizing, often seen in BI systems intended for querying (no updates). It's not a different kind of normalization, it's explicitly denormalized to suite DynamoDB.
The relational model is independent of physical implementation (by definition). It's a logical model. Designing for or around the physical implementation (storage or performance characteristics) may be necessary sometimes, but it's also a red flag.
The article you linked repeats the canard about joins: "While convenient, SQL joins are also expensive. They require scanning large portions of multiple tables in your relational database, comparing different values, and returning a result set." Joins may be expensive, but in a well-designed relational schema joins are done on indexed columns, so there's no "scanning large portions of multiple tables" involved, the time complexity is O(log2 N), not exponential. The claim that SQL databases don't scale is rather obviously belied by the widespread (to the point of exclusive) use of large relational databases at scale. That doesn't mean relational databases address every business requirement, but they certainly do scale. When I hear this I assume the person saying it has never used Oracle or worked at a large company.
The problem with NoSQL databases isn't so much (lack of) schemas, it's lack of ACID guarantees. Those may not make much difference in some applications, but to big companies with important databases something like MongoDB with "eventual consistency" doesn't cut it.
I suppose someone could implement something that looks relational in OOP, but it's not a good fit. Imagine a class structure intended to implement customers and orders. How in the OO version do you do even simple things, like enforce uniqueness? How do you distribute this model, or replicate it? How do you accomplish joins without exponential time complexity? How do the objects persist (durability is the D in ACID)? Without ACID properties you don't even have a database, much less a relational database.
There are loads of big companies with huge, important databases in eventually-consistent nosql databases. In fact I’d say the bigger the company the more likely it becomes that they’ve got the software development chops to exploit that kind of database.
I've observed the opposite: larger and higher-skill engineering departments are more likely to have the expertise and perspective to scale RDBMSes and use them in advanced ways.
It's smaller or lower-skill/fad-chasing orgs that end up doubling down on non relational systems for as the primary data store, lured in money-for-free promises of "no schema needed" and "scales automatically".
Nonrelational datastores have a place, of course, but they tend to be much more niche/purpose specific in functional engineering orgs in my experience--to the point that an engineering group's ideas around relational databases (what can they be used for? Should they be the default option for new use cases? How far can they scale?) are an effective proxy for that group's skill level.
We all have our own experiences! Mine have been that a mediocre dev team will assume the RDBMS does things that it doesn’t actually do. A good dev team will build their application around its own real requirements and then the database features they need are reduced to durability alone.
I've seen non-relational databases used for things like logs, streaming data, time series, caching. I'm not saying they don't have uses. You probably know that prior to the release of Oracle in the mid-80s all databases were non-relational. Some of the "NoSQL" ideas are actually old ideas, database models that got replaced by relational databases decades ago.
Lots of big companies use Redis, MongoDB, Cassandra, DynamoDB, etc. when those tools address a business requirement better than a relational database. I have not seen NoSQL databases replacing important relational databases, I have seen NoSQL databases used for specific business requirements.
Of course big companies do a lot of software development, so you're more likely to see a variety of languages and tools in a larger organization. That may or may not validate the tools some people in the organization choose. Many large companies still use COBOL.
Non-relational databases match one of the largest use cases that exist; large volume OLTP workloads. Non-relational databases typically can handle far more load on reads and writes than relational databases due to their intrinsic distributed structure, meaning that if you have any kind of transactional workload, for example, a frontend to a busy store (such as Amazon.com), non-relational databases are incredible. You can also back these frontend non-relational stores with relational ones but at much smaller scale in order to handle the OLAP requirement where relational databases shine, allowing for massively flexible queries when needing to pull data for reporting etc.
No disagreement from me. Relational doesn’t address every business requirement, nor do non-relational databases. They have enough overlap that you have to make an informed decision from experience.
The SQL vs. NoSQL debate got framed early on in terms of better and worse, old vs. new, which is unfortunate because it created ideological camps. Not every supposedly new thing is better than the old thing. Some NoSQL techniques predate relational databases, but no one younger than 50 has experience with what we did before Oracle and DB/2.
Choice of tools should match the requirements, not fads or anecdotes or personal preferences (or ignorance).
I’m sure you’re right. I haven’t seen it. I have seen projects use NoSQL databases and fail when a relational database was a better fit for the requirements. The fault isn’t in the tools, it’s in the decision-making process. It seems common now for teams to choose a “stack” before they define what they plan to build.
JSON is a serialization format, not a database schema. We went through this exact same thing with XML, also widely supported as a type in relational databases. Some people didn't understand that XML is not a database and went ahead to write all kinds of nonsense around it. JSON is far less rigorous (and far less bloated) than XML, it's closer to a CSV file. It's a great format for a lot of things but it's not a database.
From the Wikipedia article on JSON, right at the top: "JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with a diverse range of functionality in data interchange including communication of web applications with servers."
Looks like JSON schemas are in the works, so we'll probably go through the whole XML as a database bullshit all over again.
I have never seen this implemented or used but I'm not deep into the JSON world. This looks to me like "XML as a database" all over again, but correct me if it has some other application. There's nothing wrong with XML or JSON schemas, but they are not equivalent to a relational schema -- they are more (properly) used for validation and standardization, i.e. for a data interchange format both sides need to agree on the schema. An interchange format is not a database, it's a representation of things that might be retrieved from or stored in a database.
If choose to organize your data using a relational model and then choose not to normalize that relational design you're doing it wrong.
Some of the benefits you get by doing proper normalized include:
* Avoids repetitive entries and duplicate data.
* Helps reduce the storage space required.
* Prevents the restructuring of the database to accommodate new requirements.
* Prevents the need for coding changes to accommodate new requirements.
* Increases the speed and flexibility of queries.
With a poorly normalized relational design you are throwing away the essence of the 'relational' model and
that 'relational' concept is right there in the name, RDMS.
Normalizing allows enforcing referential integrity and reduces/eliminates update anomalies. Those seem like theoretical concerns until your data gets corrupted.
Surely since you agree normalization helps to 'reduces/eliminates update anomalies' then that also means it's going to reduce the chance of data corruption?
Also what type of software design and development is not based on theoretical concerns?
Software developers spend years studying theory in the hope they can use that knowledge in their own designs.
Now I don't disagree normalization is a 'theoretical exercise' but it is also a helpful tool/technique that will make your life easier.
I think you misunderstood my comment. I have heard developers dismiss normalization as a theoretical concern. It’s not. Normalizing prevents update anomalies, for one thing. That’s not the same as “data corruption.”
In my experience most software developers don’t spend years studying theory. They spend hours studying buzzwords and fads.
Normalization is a logical process, part of database design. It’s not a by-product of a schema or design pattern, nor is it an implementation detail.
OO is ultimately based on pointers, even when hidden as inheritance, or called something else (references, composition). That form of organizing data is not relational pretty much by definition. You can’t” just do that [normalization] with OO patterns.” I’m not even sure what that means.