Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
SQL vs. NoSQL Is the Wrong Distinction (softwareatscale.dev)
108 points by todsacerdoti on Aug 25, 2021 | hide | past | favorite | 62 comments


> But these two camps of databases are slowly converging too, with JSON datatype support in MySQL and PostgreSQL, and pseudo-JOINs and transactions in MongoDB.

That's not enough to justify 'Oh I guess SQL vs NoSQL isn't about relational vs non-relational anymore'. It is 100% about relational vs non-relational. That SQL databases now support JSON is besides the point: nobody is saying to model your data as JSON documents just because SQL databases now support JSON. Similarly I don't believe that MongoDB is saying to model your data in a normalized relational fashion just because it supports 'pseudo joins' (are they even efficient?). The relational model is powerful because it stores data in a query agnostic way, and NoSQL databases are very much not that. If they were, they'd be SQL databases.

* yes, relational database != SQL. But SQL is the only widely used implementation for relation modelling, for better or worse. Show me a NoSQL databases that is built on relational modelling.


All you need for normalization is a consistent schema. SQL DDL on an RDBMS can do that but you could just do it with OO patterns. This is what my team does.

My prediction is that the next fad in database land will be normalized NoSQL. We can call it NoNoSQL.

e: Better idea. No2SQL.


I’m not sure you understand normalization. A consistent schema is the product of proper normalization, not the other way around.

Normalization is a logical process, part of database design. It’s not a by-product of a schema or design pattern, nor is it an implementation detail.

OO is ultimately based on pointers, even when hidden as inheritance, or called something else (references, composition). That form of organizing data is not relational pretty much by definition. You can’t” just do that [normalization] with OO patterns.” I’m not even sure what that means.


No2SQL is just physically implementing a normalized logical model on a NoSQL database instead of an RDBMS. That can be done with classes, methods, and attributes (OOP) providing RDBMS functionality like constraints and data types.

In DynamoDB you can use a single table model [1] to construct collections, which are similar to joins. The partition key is like a foreign key in an RDBMS. The sort key is like a combination of the normalized table identifier and primary key.

All of this can be as transparent to the business logic as it is in an RDBMS.

[1]: https://www.alexdebrie.com/posts/dynamodb-single-table/


That's an optimization known as denormalizing, often seen in BI systems intended for querying (no updates). It's not a different kind of normalization, it's explicitly denormalized to suite DynamoDB.

The relational model is independent of physical implementation (by definition). It's a logical model. Designing for or around the physical implementation (storage or performance characteristics) may be necessary sometimes, but it's also a red flag.

The article you linked repeats the canard about joins: "While convenient, SQL joins are also expensive. They require scanning large portions of multiple tables in your relational database, comparing different values, and returning a result set." Joins may be expensive, but in a well-designed relational schema joins are done on indexed columns, so there's no "scanning large portions of multiple tables" involved, the time complexity is O(log2 N), not exponential. The claim that SQL databases don't scale is rather obviously belied by the widespread (to the point of exclusive) use of large relational databases at scale. That doesn't mean relational databases address every business requirement, but they certainly do scale. When I hear this I assume the person saying it has never used Oracle or worked at a large company.

The problem with NoSQL databases isn't so much (lack of) schemas, it's lack of ACID guarantees. Those may not make much difference in some applications, but to big companies with important databases something like MongoDB with "eventual consistency" doesn't cut it.

I suppose someone could implement something that looks relational in OOP, but it's not a good fit. Imagine a class structure intended to implement customers and orders. How in the OO version do you do even simple things, like enforce uniqueness? How do you distribute this model, or replicate it? How do you accomplish joins without exponential time complexity? How do the objects persist (durability is the D in ACID)? Without ACID properties you don't even have a database, much less a relational database.


There are loads of big companies with huge, important databases in eventually-consistent nosql databases. In fact I’d say the bigger the company the more likely it becomes that they’ve got the software development chops to exploit that kind of database.


I've observed the opposite: larger and higher-skill engineering departments are more likely to have the expertise and perspective to scale RDBMSes and use them in advanced ways.

It's smaller or lower-skill/fad-chasing orgs that end up doubling down on non relational systems for as the primary data store, lured in money-for-free promises of "no schema needed" and "scales automatically".

Nonrelational datastores have a place, of course, but they tend to be much more niche/purpose specific in functional engineering orgs in my experience--to the point that an engineering group's ideas around relational databases (what can they be used for? Should they be the default option for new use cases? How far can they scale?) are an effective proxy for that group's skill level.


We all have our own experiences! Mine have been that a mediocre dev team will assume the RDBMS does things that it doesn’t actually do. A good dev team will build their application around its own real requirements and then the database features they need are reduced to durability alone.


I've seen non-relational databases used for things like logs, streaming data, time series, caching. I'm not saying they don't have uses. You probably know that prior to the release of Oracle in the mid-80s all databases were non-relational. Some of the "NoSQL" ideas are actually old ideas, database models that got replaced by relational databases decades ago.

Lots of big companies use Redis, MongoDB, Cassandra, DynamoDB, etc. when those tools address a business requirement better than a relational database. I have not seen NoSQL databases replacing important relational databases, I have seen NoSQL databases used for specific business requirements.

Of course big companies do a lot of software development, so you're more likely to see a variety of languages and tools in a larger organization. That may or may not validate the tools some people in the organization choose. Many large companies still use COBOL.


Non-relational databases match one of the largest use cases that exist; large volume OLTP workloads. Non-relational databases typically can handle far more load on reads and writes than relational databases due to their intrinsic distributed structure, meaning that if you have any kind of transactional workload, for example, a frontend to a busy store (such as Amazon.com), non-relational databases are incredible. You can also back these frontend non-relational stores with relational ones but at much smaller scale in order to handle the OLAP requirement where relational databases shine, allowing for massively flexible queries when needing to pull data for reporting etc.


No disagreement from me. Relational doesn’t address every business requirement, nor do non-relational databases. They have enough overlap that you have to make an informed decision from experience.

The SQL vs. NoSQL debate got framed early on in terms of better and worse, old vs. new, which is unfortunate because it created ideological camps. Not every supposedly new thing is better than the old thing. Some NoSQL techniques predate relational databases, but no one younger than 50 has experience with what we did before Oracle and DB/2.

Choice of tools should match the requirements, not fads or anecdotes or personal preferences (or ignorance).


"I have not seen NoSQL databases replacing important relational databases, I have seen NoSQL databases used for specific business requirements."

Look harder :-) Every single NoSQL vendor has examples of substantial "replace' projects.


I’m sure you’re right. I haven’t seen it. I have seen projects use NoSQL databases and fail when a relational database was a better fit for the requirements. The fault isn’t in the tools, it’s in the decision-making process. It seems common now for teams to choose a “stack” before they define what they plan to build.


Every single NoSQL vendor pretends that there is no difference between what you can do and what you should do.


I guess because in practice not everyone bothers to normalize, and many RDMS have native support for table inheritance.


While that is very true, it doesn't deter from the fact good relational database design relies on the design being well normalized.

By skipping the normalization steps you end up with a design that is far from optimal.


Which is why query optimization engines try their best to work around that, and most RDMS support JSON as native types.


JSON is a serialization format, not a database schema. We went through this exact same thing with XML, also widely supported as a type in relational databases. Some people didn't understand that XML is not a database and went ahead to write all kinds of nonsense around it. JSON is far less rigorous (and far less bloated) than XML, it's closer to a CSV file. It's a great format for a lot of things but it's not a database.

From the Wikipedia article on JSON, right at the top: "JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with a diverse range of functionality in data interchange including communication of web applications with servers."

Looks like JSON schemas are in the works, so we'll probably go through the whole XML as a database bullshit all over again.


I think you're a little late to the party. JSON Schemas have been a thing for over 10 years now (source: https://datatracker.ietf.org/doc/html/draft-zyp-json-schema-..., see also https://json-schema.org/).


Maybe. The current draft is here:

https://json-schema.org/draft/2020-12/json-schema-core.html

Still a "draft" standard as far as I can tell.

I have never seen this implemented or used but I'm not deep into the JSON world. This looks to me like "XML as a database" all over again, but correct me if it has some other application. There's nothing wrong with XML or JSON schemas, but they are not equivalent to a relational schema -- they are more (properly) used for validation and standardization, i.e. for a data interchange format both sides need to agree on the schema. An interchange format is not a database, it's a representation of things that might be retrieved from or stored in a database.


If choose to organize your data using a relational model and then choose not to normalize that relational design you're doing it wrong.

Some of the benefits you get by doing proper normalized include:

* Avoids repetitive entries and duplicate data.

* Helps reduce the storage space required.

* Prevents the restructuring of the database to accommodate new requirements.

* Prevents the need for coding changes to accommodate new requirements.

* Increases the speed and flexibility of queries.

With a poorly normalized relational design you are throwing away the essence of the 'relational' model and that 'relational' concept is right there in the name, RDMS.


Normalizing allows enforcing referential integrity and reduces/eliminates update anomalies. Those seem like theoretical concerns until your data gets corrupted.


Surely since you agree normalization helps to 'reduces/eliminates update anomalies' then that also means it's going to reduce the chance of data corruption?

Also what type of software design and development is not based on theoretical concerns?

Software developers spend years studying theory in the hope they can use that knowledge in their own designs.

Now I don't disagree normalization is a 'theoretical exercise' but it is also a helpful tool/technique that will make your life easier.


I think you misunderstood my comment. I have heard developers dismiss normalization as a theoretical concern. It’s not. Normalizing prevents update anomalies, for one thing. That’s not the same as “data corruption.”

In my experience most software developers don’t spend years studying theory. They spend hours studying buzzwords and fads.


Upvoted for the No2SQL pun.


Mongo has strong relational capabilities I believe



In the context of databases the terms “relational,” “schema,” and “normalized” have specific meanings. Relational databases have specific properties and guarantees that have nothing to do with SQL. Mongo doesn’t get “more relational” by supporting joins any more than Postgres gets less relational supporting JSON types.


And conversely leaving keys, constraints and foreign keys to goodwill and application layers makes traditional SQL-centered databases less relational, while draconian validation of JSON trees makes traditional "NoSQL" databases almost relational.


Sure. What you describe -- leaving keys, constraints, foreign keys to goodwill and application layers -- is fairly common. I'm not sure if that comes from ignorance or a fanatical devotion to OOP. When OO runs into the relational model (the so-called object/relational impedance mismatch) OO seems to "win" in the sense the RDBMS gets turned into dumb backing store.

I think it's obvious why this particular use (or misuse) of relational databases fails to scale, or gets hard to manage across multiple clients of the database possibly written in different programming languages. I have seen the same consistency guarantees implemented in Java and Python application code when the RDBMS could be doing that, and then hearing "SQL databases" blamed for the inconsistencies and scaling problems.


I think the main feature of relational databases is that

1. They store data as tables 2. The result of any query is also a table

THEREFORE you can apply queries to results of queries as long as you want to. You can compose queries out of smaller queries which makes them easier to reuse and understand. They are like LEGOs.


Not quite; relational refers not only to the storage of relations but also (more importantly) to the ability to request operations against those relations declaratively using a set-theory-esque set of predicates (usually SQL).

That's not a pedantic distinction; RDBs are a specific concept, and discussion in this area is rife with confusion about what is/isn't "relational", so it pays to use specific words.

I'd also argue that a critical feature of many relational databases (in practice; not inherently related to the relational model but facilitated by many implementations of it) is transactional guarantees and otherwise predictable management of concurrent data access.


> the ability to request operations against those relations declaratively

That is true but I think the main significance is not that you can "request operations ... declaratively" but that the results of those operations have the SAME FORM as the thing/relation/table they were performed on.

That is what makes relational algebra as a concept so powerful. It is like numbers. You can perform operations on numbers and you get ... numbers. You can perform operations on tables (a.k.a "relations") and you get ... tables.


> RDBs are a specific concept, and discussion in this area is rife with confusion about what is/isn't "relational", so it pays to use specific words.

This argument is getting a lot of are time in this thread. It might be worth considering that if you're arguing words have specific meanings...they no longer do. Databases are technology with decades of historical effort. Any specific terms you come up with will grow fuzzy as new entrants stretch and strain old ideas.


> if you're arguing words have specific meanings...they no longer do.

Some terms have a more precise definition than others. I think "relational model" has a very specific meaning, apart from products that claim to implement it or not.

https://en.wikipedia.org/wiki/Relational_model#:~:text=The%2....

> if you're arguing words have specific meanings...they no longer do.

More on that: Recently Zoom the company was fined 85 million because it used the term "End-to-End encryption" inappropriately in its marketing materials. They could have tried to defend themselves by saying "The word end-to-end-encryption" no longer has a specific meaning, therefore we can use it to mean anything we want".

But obviously that would not have worked I don't think.

https://arstechnica.com/tech-policy/2021/08/zoom-to-pay-85m-...


Maybe so. But a lot of people are drawing incorrect conclusions due to that fuzziness of language. What words do you think we should use instead when contrasting the qualities of various database systems?


Well, you could say that about key--value stores and document stores, too, couldn't you? Ex: 1. They store data as lists of documents, 2. The result of any query is also a list of documents.

If composition of queries is all you want, you can get it from way more things than just relational databases.


That is true. But there is more richness to tables/relations with their set-theoretic operations. What are the operations you can perform on a LIST? Well maybe the LISP like operations CAR CDR etc. But those are not very good set of DATABASE operations, like SELECT etc.

So relational means every operation produces a result to which all the same great set-theoretic database-operations are applicable. That is why it is a great basis for databases.


> Let’s try to distinguish these databases by strict vs. loose schema validation, instead of SQL vs. NoSQL, and things might become less confusing for the next generation of developers.

That along with denormalization is how I’ve always thought about it. I learned SQL first and I’ve always seen the queries as just different ways of expressing “give me all of the documents where x is true”.

To me, Mongo is just the one where you can accidentally have different schemas in the same table (collection), you can have lists and embedded documents in a document, and the data is already JSON. Make your data as relational or nonrelational (or both) as you want.


Schema on read versus schema on write?


I choose SQL every time. In fact Postgres.

I want a decent query language and I don't believe the hype about performance with NoSQL.

I certainly don't want to manage schema in code.

I was hoping the NoSQL thing would die a bit like XML. Perhaps it needs more time.


Many NoSQL options have schema management systems as well. That's not what makes NoSQL NoSQL.


That won't be enough to get me off Postgres. There would have to be a real benefit to switching.


> we could try to divide them up by relational vs. non-relational (perhaps that’s how most of us think of the distinction anyway). But these two camps of databases are slowly converging too, with JSON datatype support in MySQL and PostgreSQL, and pseudo-JOINs and transactions in MongoDB

They might be converging if you use non-relational to mean something pretty specifc (like "MongoDB and PostgreSQL JSON"). But unless we want to coin another term that means something else than the natural english meaning, it's problematic.

There are lots of databases that are not relational, but still have schema and structure in non table centric ways, like schemaful graph databases, object databases and things like Datomic and triple stores.


SQL is a dialect of a relational algebra. It assumes a relational model. It is used to query relations.


"Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users). The SQL table model is a bag (multiset), rather than a set."

from: https://en.wikipedia.org/wiki/Relational_algebra#Implementat...

I remember how this suprised me a lot after thinking that SQL and SQL databases are based on relations, thinking "how can two equal rows co-exist in a relation". SQL needs things like "hidden identifiers / row keys" and so on to be squeezed into being "relational".


I'd argue from practical experience that an SQL which universally enforced set-relations would introduce significant implementation overhead: DISTINCT on large relations is v. expensive!

I believe it would also add complexity on net for users, as they would have to re-introduce those "hidden identifiers / row keys" in the not-uncommon cases where bag semantics are desired.


Generally speaking, I'd argue that data should almost always have a strict schema. But often times you don't have this kind of luxury: APIs are inconsistent and you may not have the CPU cycles to spare for a transform step; you're quickly iterating on a proof of concept and aren't even sure what your final schema is going to look like; and so on.

From a practical perspective, SQL is much more idiomatic than mongoDB's confusing query "language." But, cards on the table, almost every single new project I start up, I start with MongoDB :)


I often hear mentions of "no schema" when people talk about NoSQL, but I think that's a lie; it's not possible for data to have no schema because an application that assumes its data has no structure can't do anything with that data. However, a schema can be dynamic.

Having a dynamic schema implies that you get no guarantees from the database, so the application has to discover and validate the schema at runtime to be able to do useful things.

I like data integrity so I prefer static schemas, but sometimes it can be useful to have a component in your tables where you do the extra work of dynamic inspection or just transport it as an opaque blob.


Even if you want a strict schema, schema migration and updates are still nicer outside of sql.


I tend to think of it as a representation of feature sets in products. I know I might want feature x or y, but I might not know how frequent the use case is or how it might fit into the scalability aspect of my schema and architecture. So the ability to slow roll something new into a well defined (SQL) schema by amending it with a JSONB field property is the perfect time/complexity trade off. Once I have a clear understanding of how a user might actually use it, and if it offers value, it becomes an official part of the schema.

There are other parts of the process to determine where a new property fits for me( for example it might actually need an entire state store if its own), but in general this is where I begin.


You can certainly query relational data without using SQL syntax i.e. datalog. You can also query documents with SQL these days. I think the distinction lies along the lines of: does the system process queries using a query solver based on relational algebra?


Relational vs non-relation is one of the biggest misconceptions about NoSQL

Data is usually relational, there's no getting away from that just because of the database you use. NoSQL can manage all kinds of relations, just with an extremely different approach.

If you're interested in learning how this works instead of repeating the tired meme of "mongodb lol" checkout Alex Debrie's book on DynamoDB and Rick Houlihans YouTube content.

I also wrote a brief thread about this on Twitter: https://twitter.com/thdxr/status/1394903426023272452?s=19


I've seen a very good article on this: https://www.trek10.com/blog/dynamodb-single-table-relational.... It's an insightful dive into how KV stores can model relational data. However I also feel that these 'NoSQL can do relational modelling!' proponents are kind of hijacking the term: a relational model is understood to be based on relational algebra, and DynamoDB's Everything is One Big Table design ain't it.

NoSQL can model relational data, but it is not a relational model (as understood by E.F. Codd).

Also, NoSQL data access patterns are pretty much set in stone, which is kind of what the relational model explicitly avoids.


Book link for the lazy: https://www.dynamodbbook.com/


Isn't this pretty trivial and well-understood? I don't get the point of the post or why it's popular.


There is also N1QL (https://docs.couchbase.com/server/current/getting-started/tr...) that lets you query JSON documents while using familiar SQL commands. It also supports ACID transactions (https://docs.couchbase.com/server/current/n1ql/n1ql-language...) (disclaimer: I work for Couchbase, the creator of N1QL)


imo the main difference between databases is some databases have an address for data and some don't, meaning, when you want to go update some value, some databases say, ok, the data is there, go update it, other databases say, i don't care where it was written before, just append it here. The later has huge performance advantages if you are willing to live with the design constraints that puts on you.


> Instead of dividing databases by SQL vs. NoSQL, we could try to divide them up by relational vs. non-relational

Yeah, that's pretty much how I translated that whole NoSQL thing. As far as I'm concerned, when the industry uses SQL, I assume they mean relational with schemas.


There are multi-paradigm databases, e. g. Intersystems Caché, which allow you to access and manipulate the same set of data in no-SQL (MUMPS language) and SQL (and also as objects).

IIRC the open source YottaDB also has a SQL access mode, in addition to MUMPS.


They are categorical duals of each other.


I always prefer to SQLite3 & MySQL(>=5.7) .




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: