And I'm arguing that's going to be a hell of a ride, one more likely to break their backs than yield success.
I read up on the link you posted elsewhere (thanks for that, by the way) and, well, it's just as bad as I thought.
Imagine, in the pre-2.0 world, there's a greedy reader, and a subsequent writer queued up on the lock. All subsequent readers are blocked until the writer finishes, and the writer can't finish before the greedy reader does. This is a nightmare. That was before 2.0; now the greedy reader will yield, the writer will finish, and all the pending readers will be unblocked. That's only an improvement if you compare it to the nightmarish previous situation. There are still two problems: 1) a single writer blocks all readers on a shard while it's in progress; 2) as soon as a new writer is queued up behind the reader lock, all subsequent readers are queued up again.
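To make problem #2 concrete, here's a minimal Python sketch of a writer-preferring readers/writer lock - not MongoDB's actual implementation, just the textbook queuing behavior being described - showing why a single queued writer stalls every reader that arrives after it:

    # Minimal sketch of a writer-preferring readers/writer lock.
    # Not MongoDB's code; it just illustrates the queuing behavior.
    import threading

    class WriterPreferringRWLock:
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0            # readers currently holding the lock
            self._writer_active = False
            self._writers_waiting = 0    # writers queued behind the readers

        def acquire_read(self):
            with self._cond:
                # New readers queue up as soon as ANY writer is waiting,
                # even though the current holder is just another reader.
                while self._writer_active or self._writers_waiting > 0:
                    self._cond.wait()
                self._readers += 1

        def release_read(self):
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

        def acquire_write(self):
            with self._cond:
                self._writers_waiting += 1
                # The writer can't proceed until the greedy reader is done.
                while self._writer_active or self._readers > 0:
                    self._cond.wait()
                self._writers_waiting -= 1
                self._writer_active = True

        def release_write(self):
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()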
Does this look like optimal resource use to you? It does not to me.
Let's contrast this with "legacy" engines:
Sybase: the reader/writer lock has a granularity of an 8 KB page. If you're not touching the same page someone else is writing, you're fine. (*) They might have moved on since the 1990s; I haven't looked.
Microsoft: the reader/writer lock has a granularity of a row. If you're not reading a row someone is writing, you're fine. That was in the 1990s; they have since moved on to snapshots, but have not yet made them the default option, I think.
PostgreSQL or Oracle: 1) readers read a snapshot and never block writers; 2) writers block each other, and the granularity of locking is a single row. If you're not writing the same row someone else is writing, you're fine (see the sketch after this list).
SQLite: readers do not block writers; there is a database-wide writer/writer lock. Note that this is a very lightweight, desktop-oriented database, not a cloud solution.
MongoDB: the reader/writer lock granularity is a shard, the part of the database apportioned to a single CPU core. If you happen to read data on the same shard someone is writing, or is planning to write, you're not fine at all. Their plan is "collection-level locking".
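To make the row-level case concrete, here's a hedged sketch with psycopg2 against a hypothetical accounts(id, balance) table (table and connection string are made up): two sessions write different rows without blocking each other, and a read of the locked row just sees the committed snapshot.

    # Hedged sketch: row-level locking plus snapshot reads in PostgreSQL.
    # Assumes a local database with a table accounts(id int primary key, balance int).
    import psycopg2

    conn_a = psycopg2.connect("dbname=test")   # session A
    conn_b = psycopg2.connect("dbname=test")   # session B
    cur_a = conn_a.cursor()
    cur_b = conn_b.cursor()

    # Session A updates row 1 and holds the transaction open (no commit yet).
    cur_a.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

    # Session B updates a *different* row: no contention, it proceeds immediately.
    cur_b.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

    # Session B reads the row A is writing: no blocking either, it just sees
    # the last committed version (the snapshot), not A's uncommitted change.
    cur_b.execute("SELECT balance FROM accounts WHERE id = 1")
    print(cur_b.fetchone())

    conn_b.commit()
    conn_a.commit()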
So I get it, you're saying they planned to add serious concurrency later. I agree on that - they planned. Where you and I disagree is whether they will succeed: I think they will likely fail, because retrofitting concurrency is exceptionally hard. I just can't believe that anyone who knows what he's getting into would actually agree to get into this.
I understand you need to compromise something when you start out, but I think concurrency is the worst possible choice.
In Microsoft SQL Server, row/extent/table locks have transactional semantics and are often turned off with the NOLOCK option. What really matters for concurrency is the page latch, which is per 8 KB page.
In SQLite, readers actually do block writers by default. Writing transactions are committed through lock escalation steps: first a shared lock is acquired, then reserved, then pending, and finally exclusive. Pending blocks new shared locks and waits until all in-flight shared locks are released. Again, this is the default rollback-journal behavior. As of 3.7, the write-ahead log allows readers to be concurrent with writers, but AFAIK it is still rarely used.
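A minimal sketch of the WAL case with Python's stdlib sqlite3 (the scratch file name is made up): while a write transaction is open, a second connection still reads the last committed snapshot without blocking.

    # Minimal sketch: WAL mode lets a reader run alongside an open writer.
    import sqlite3

    w = sqlite3.connect("demo.db")
    w.execute("PRAGMA journal_mode=WAL")
    w.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")
    w.execute("INSERT INTO t (v) VALUES ('before')")
    w.commit()

    w.execute("BEGIN IMMEDIATE")           # take the single write lock
    w.execute("UPDATE t SET v = 'after'")  # not committed yet

    r = sqlite3.connect("demo.db")         # independent reader connection
    print(r.execute("SELECT v FROM t").fetchall())  # sees 'before', no blocking

    w.commit()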
I recently found out that turning on NOLOCK is a horrible idea. It doesn't just take you out of transactions. That is, you won't just get uncommitted data; apparently you can also get completely inconsistent data as internal structures are updated. Even rows that are not part of a current transaction might not be seen if you use NOLOCK.
My experience with NOLOCK is that you may get inconsistent rows, with some fields from before and some from after an update. Or you may even see duplicated rows when the b-tree is rebalanced. But I have never seen single fields being partially updated. Per Microsoft, the page latch protects the atomicity of a single-field update. This is why NOLOCK was extremely useful for insert-only tables, and in our database design we had many of them.
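For reference, the hint goes on each table reference in the query. A hedged sketch with pyodbc (connection string and table name are made up), reading an insert-only table under NOLOCK:

    # Hedged sketch: reading with the NOLOCK hint (read uncommitted).
    # No shared locks are taken, so concurrent inserts aren't blocked,
    # at the price of possibly seeing uncommitted or duplicated rows.
    import pyodbc

    conn = pyodbc.connect("DSN=sqlserver;UID=app;PWD=secret")  # made-up DSN
    cur = conn.cursor()
    cur.execute(
        "SELECT TOP 100 event_id, payload "
        "FROM dbo.Events WITH (NOLOCK) "
        "ORDER BY event_id DESC"
    )
    for row in cur.fetchall():
        print(row.event_id, row.payload)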
My position on mongo is thus: Its goal is humongous data sets, hence the name. Until well proven, I'm not the type to use it for huge data, but will keep an eye out for case studies.
I have used Mongo on two projects with reasonably small data sets. My largest collection at the moment is 5 million documents, and that's basically a log. Other collections are less than 100,000. I've been running mongo 1.6 for a year on these two sites without so much as a hiccup. I do the normal, very simple things to protect myself: a cron job to dump the db and then copy it to a backup server. And that's it.
I enjoy using mongo for these projects because when I want to add a new feature to one of my domain models, I don't need to think much about retrofitting the data for all instances of that model. I just add an attribute where it's needed for the new use case, ensure I have basic checking in my ruby model object, and my system keeps incrementally improving.
I think the mongo folks are fantastic in their open dev process and maybe one day, some threshold will be crossed where I can say that for certain types of big data usage mongo is a clear solid choice.
'I don't need to think much about retrofitting the data for all instances of that model. I just add an attribute where it's needed for the new use case, ensure I have basic checking in my ruby model object and my system keeps incrementally improving.'
That's exactly the same as adding a new column to your DB with NULL as the default value.
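For comparison, here's that column version sketched with Python's stdlib sqlite3 (table and column names are made up; DROP COLUMN needs SQLite 3.35+):

    # Sketch: adding a column with a default, and dropping it to revert.
    import sqlite3

    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    # Add the column with a sensible default; existing rows pick it up.
    db.execute("ALTER TABLE users ADD COLUMN newsletter_opt_in INTEGER DEFAULT 0")

    # Reverting is just dropping the column again (SQLite 3.35+).
    db.execute("ALTER TABLE users DROP COLUMN newsletter_opt_in")
    db.commit()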
Mongo has the notion of undefined and null. You can just start putting the new field on new records without having to backfill. Also, you don't have to do the migration thing, which can get messy in big teams (in my experience).
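A hedged pymongo sketch of what that looks like (database, collection and field names are made up): new documents get the field, old documents are simply left without it, and nothing is backfilled.

    # Hedged sketch: start writing a new field only on new documents.
    from pymongo import MongoClient

    users = MongoClient()["app"]["users"]   # made-up db/collection names

    # New records carry the field; old records are untouched.
    users.insert_one({"name": "alice", "newsletter_opt_in": True})

    # No migration, no backfill: old documents simply don't have the field.
    print(users.count_documents({"newsletter_opt_in": {"$exists": False}}))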
Moving to a doc store from an RDBMS really does bring with it an odd sense of freedom when it comes to the schema.
You don't need to do any 'migration thing', you just add the column to the DB and choose a sensible default value? I don't see what you gain by having both 'undefined' and 'null'.
The 'odd sense of freedom' is not always a good thing either. It's like BASIC allowing you to use a new variable without declaring it. It may be convenient but nobody calls it a good idea.
"You don't need to do any 'migration thing', you just add the column to the DB and choose a sensible default value?"
Taking the team I worked with at the BBC as an example:
1) There were staging, integration and production environments. Staging and integration would often not be aligned with production, or even with one another, because we might find that a bit of code turned out not to be production-suitable/needed. If this happened we would have to drop the database back to a known, good state. You can't have columns with constraints left around when the code which might have satisfied those constraints is reverted. Doing it without migrations would have been idiotic, to say the least.
2) Developers work on different features in different branches, often collaborating. Different features apply new attributes to the db schema. It's important for a developer to know his DB is in the correct state when he starts hacking. You do that with migrations (a minimal sketch of the idea follows this list).
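To be concrete about what I mean by migrations, here's a minimal framework-free sketch of the idea (this is not Rails/ActiveRecord, and the example migration is made up): every schema change is a numbered up/down pair, and a version table records how far a given database has been rolled forward.

    # Minimal sketch of schema migrations: numbered up/down steps plus a
    # version table, so any environment can be rolled to a known state.
    import sqlite3

    MIGRATIONS = [
        # (version, up SQL, down SQL) -- the change itself is just an example
        (1, "ALTER TABLE users ADD COLUMN newsletter_opt_in INTEGER DEFAULT 0",
            "ALTER TABLE users DROP COLUMN newsletter_opt_in"),
    ]

    def current_version(db):
        db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
        return db.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0

    def migrate(db, target):
        version = current_version(db)
        if target >= version:
            for v, up, _ in MIGRATIONS:              # roll forward
                if version < v <= target:
                    db.execute(up)
                    db.execute("INSERT INTO schema_version VALUES (?)", (v,))
        else:
            for v, _, down in reversed(MIGRATIONS):  # roll back to a known good state
                if target < v <= version:
                    db.execute(down)
                    db.execute("DELETE FROM schema_version WHERE version = ?", (v,))
        db.commit()

Every developer and every environment runs the same migrate() to the agreed version and ends up with the same schema.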
Because you almost completely remove the need for schema definition (and what little of it you do need, you can do in app code), you simply don't need the migrations any more. Using mongo means you can pretty much just export your application's domain without having to coerce it into the relational model.
"I don't see what you gain by having both 'undefined' and 'null'."
They mean totally different things. Undefined means that the field has never been explicitly set; null means the field has been set. This means you know what's been backfilled and what hasn't - you can't tell without extra metadata in MySQL. Also, in MySQL, if you provide a null default then every row has to be updated.
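A hedged pymongo sketch of the distinction (collection and field names are made up): querying on null matches both cases, while $exists and $type tell them apart.

    # Hedged sketch: missing field vs. explicit null in MongoDB queries.
    from pymongo import MongoClient

    users = MongoClient()["app"]["users"]   # made-up db/collection names

    users.insert_one({"name": "old_user"})                       # field never set
    users.insert_one({"name": "new_user", "last_login": None})   # explicitly null

    missing = users.count_documents({"last_login": {"$exists": False}})   # old_user only
    explicit_null = users.count_documents({"last_login": {"$type": 10}})  # 10 = BSON null; new_user only
    either = users.count_documents({"last_login": None})                  # matches both

    print(missing, explicit_null, either)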
"It's like BASIC allowing you to use a new variable without declaring it. It may be convenient but nobody calls it a good idea."
I don't know BASIC but you can put Perl into a certain configuration that allows this. That makes for horrible scoping issues that aren't analogous or applicable to what we're talking about.
If you added a column and you want to revert back, you just drop the column again! What's so hard about that? No 'migration' needed.
In BASIC you can 'declare' a variable by simply using it. The compiler will not warn you if you use an undeclared variable. That's the analogous situation here.
"""My position on mongo is thus: Its goal is humongous data sets, hence the name. Until well proven, I'm not the type to use it for huge data, but will keep an eye out for case studies."""
Actually Mongo is bad for really humongous data sets.
It works well if the working data set (the data you commonly need) can fit in memory.
Of course this doesn't scale very well with, say, several terabytes of data, while there are Oracle databases that handle a lot more...
In the case of Mongo, you go to sharding etc. and things get complicated in your app's handling.