Thanks for the accurate title - this is not a full queueing system, but a unified API for hooking in bigger, badder queueing engines like Resque.
The point is to standardize the interface so other plugins/gems can simply make calls to Rails.queue rather than try to accommodate every queueing engine themselves.
Skimming through the code, this lets you register a Queue class to serialize your jobs. So, if you use something like Delayed Job, you register the (corresponding) DJ::Queue class that stores the jobs in whatever backend you desire and then process it later via your daemon of choice.
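To make the "register a Queue class" idea concrete, here is a rough sketch of what a serializing backend could look like. Everything in it is an assumption for illustration: App.queue stands in for Rails.queue, and SerializingQueue and WelcomeEmailJob are hypothetical names, not classes from Rails or Delayed Job. The only contract assumed is that a queue responds to push and a job responds to run.

```ruby
# Hypothetical job: anything that responds to #run.
class WelcomeEmailJob
  attr_reader :address

  def initialize(address)
    @address = address
  end

  def run
    "sent welcome email to #{address}"
  end
end

# Hypothetical backend that serializes jobs into a store for later
# processing, roughly the way an adapter might write to a database or Redis.
class SerializingQueue
  def initialize(store = [])
    @store = store
  end

  # Serialize the job and append it to the backing store.
  def push(job)
    @store << Marshal.dump(job)
    self
  end

  # Remove and deserialize the oldest job (a worker daemon would call this).
  def pop
    payload = @store.shift
    payload && Marshal.load(payload)
  end

  def size
    @store.size
  end
end

# Stand-in for the framework-level registration point (assumed, not Rails).
module App
  class << self
    attr_accessor :queue
  end
end

App.queue = SerializingQueue.new
App.queue.push(WelcomeEmailJob.new("user@example.com"))

# Later, in a worker process:
job = App.queue.pop
puts job.run
```

Swapping the backend then means assigning a different object to App.queue; the code that pushes jobs does not change.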
So far so peachy keen. This is alright, I can get behind this - it will make moving between queueing solutions more palatable which is not a feature I can complain about.
My question then is: how will this work by default? Will the default Queue have some sort of callback that executes after it returns the response? For stuff like sending emails, for small apps, this is actually palatable - I'm more concerned about user latency than sheer requests/second.
J2EE was an entire specification for application frameworks. This is just defining the bridging interface between Rails and any (existing or not) queueing framework. It's already been achieved nicely with Rails.cache.
In any event, the web shows it is actually possible to separate interfaces from implementations. J2EE had other issues.
Let's not, Celery is doing an absolutely fantastic job in this space, let's just stay out of their way and do what we can in terms of exposing APIs to make their job easier. There's a reason django-core didn't write celery in the first place, we didn't have the need or the expertise; there are other people with both, let's let them do it.
> do what we can in terms of exposing APIs to make their job easier.
Isn't that exactly what Simon is suggesting above?
He's not saying Django should provide its own implementation of a background queue, he's saying it should provide a base API which could be implemented by any number of backends, of which assuredly Celery would be one -- just as happens now with cache backends.
I agree with Simon and the commenter above, this would be a great addition to Django. I think it fits with the Django "batteries included" philosophy -- in this day and age, a background queue is practically a requirement for anything but the most basic web app. It also encourages standalone Django application developers to make use of background queuing without fear of forcing a specific implementation on users.
Yes, that's exactly what I meant. Like you say: today, a background queue should be part of the default stack for a web app (just like a template engine, database, session storage and a cache have been in the past - components which Django has provided since day one). No need to re-implement celery, but encouraging the Django ecosystem to embrace offline queues (and letting reusable apps know that they can push tasks into an abstract queue of some sort) would be very healthy.
I wasn't suggesting having a generic queue API in Django, I was suggesting any APIs needed to enable that to live outside Django entirely, whether that's better hooks into transactions (e.g. so I can wait until a DB transaction is committed to fire the task item) or something else.
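The "wait until the transaction is committed" hook can be sketched in a few lines. This is written in Ruby to match the rest of the thread, not Django, and DeferredEnqueue is a hypothetical helper: it models a transaction as a block and treats any exception as a rollback, which is an assumption rather than how any particular ORM behaves. Nested transactions are not handled.

```ruby
# Hypothetical wrapper: tasks enqueued inside a transaction block are held
# back and only handed to the real queue if the block finishes cleanly.
class DeferredEnqueue
  def initialize(queue)
    @queue = queue
    @pending = nil
  end

  def transaction
    @pending = []
    result = yield
    # Commit point: only now do the tasks become visible to workers.
    @pending.each { |task| @queue << task }
    result
  ensure
    # On an exception (treated as a rollback here) pending tasks are dropped.
    @pending = nil
  end

  def enqueue(task)
    if @pending
      @pending << task
    else
      @queue << task
    end
  end
end

queue = []
deferred = DeferredEnqueue.new(queue)

deferred.transaction { deferred.enqueue(:send_receipt) }

begin
  deferred.transaction do
    deferred.enqueue(:send_refund)
    raise "simulated rollback"
  end
rescue RuntimeError
  # The refund task was never pushed to the queue.
end
```

The point of the design is that a worker can never pick up a task that refers to rows which were rolled back or are not yet visible.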
I'd love to see these job queuing platforms have better support for high performance computing (HPC). Currently there are two paradigms of queuing systems. Things like PBS/Torque and Sun/Oracle/Univa Grid engine which work very well for small numbers of largish batch jobs, and things like Delayed Job, Background Job and Resque which work well for huge numbers of small jobs.
When you start dealing with large jobs, system resources start to become an issue. A job might take 48GB of memory, or it might take 1GB of memory, and the scheduler needs to be aware of this so that it isn't scheduling jobs on top of each other. Or you might have some low priority jobs that should only be run when the queue is mostly full so as not to compete with the high priority jobs. Or you might have jobs that depend on other jobs, and you want to enqueue them all and let the scheduler handle the dependencies. HPC schedulers deal with these requirements well.
On the other hand, you might be in a situation where you have 10s of thousands of jobs in the queue, and you need to add and remove jobs quickly. Things like resque and delayed job handle these situations well.
HPC schedulers were built for research purposes, and background job schedulers were built for the web applications. However there are more and more companies dealing with large data problems that span both worlds. They have some large jobs and tons of small jobs, and they don't want to manage two separate clusters with two schedulers to handle the tasks.
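As a rough illustration of the resource and dependency awareness described above, here is a toy selection step. BatchJob and schedulable are hypothetical names, the memory figures are made up, and real HPC schedulers do far more (backfilling, fair share, preemption); this only shows the idea of "fit in free memory, respect dependencies, prefer higher priority".

```ruby
# Hypothetical job description: memory need, priority, and prerequisites.
BatchJob = Struct.new(:name, :memory_gb, :priority, :depends_on)

# Pick the jobs that may start now: all dependencies finished, highest
# priority first, and only while the job still fits in the free memory.
def schedulable(jobs, finished, free_memory_gb)
  ready = jobs.select { |job| (job.depends_on - finished).empty? }
  picked = []
  ready.sort_by { |job| -job.priority }.each do |job|
    next if job.memory_gb > free_memory_gb
    picked << job
    free_memory_gb -= job.memory_gb
  end
  picked
end

jobs = [
  BatchJob.new("align",   48, 10, []),
  BatchJob.new("index",    1,  5, []),
  BatchJob.new("report",   1,  4, ["align"]),
  BatchJob.new("cleanup", 16,  1, [])
]

# On a 64 GB node with nothing finished yet, "report" waits for "align"
# and "cleanup" no longer fits once the higher-priority jobs are placed.
first_wave = schedulable(jobs, [], 64).map(&:name)
puts first_wave.inspect
```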
I plucked the relevant points of discussion that reveal the thought process.
Q: "I've heard for years that pagination should remain outside rails since it has to be lightweight, and now that !?"
homakov: good example, but "pagination" is a design-related thing(like decal on a car) but "queue" or delayed jobs(jquery-deferred for example) is deep engine built in feature. As cars vendor You shouldn't choose decals for driver but you should install the best and reliable stuff under its hood IMO
...
Q: What's the point?
josevalim: The point of the Queue is to be small and provide an API that more robust engines like resque and sidekiq can hook in. So you can easily start with an in memory queue (as you can see, the implementation does not even reach 100LOC) which is also easy to test and then easily swap to another one. Why this is good? By having an unified API, tools like Devise, Action Mailer can simply use Rails.queue.push() instead of worrying with compatibility for different plugins. So the goal here is provide an API for queueing and with a simple in memory implementation. It is not meant to be a robust queue system.
...
Q: Why not make it into a gem?
josevalim: The implementation today is less than 100LOC, so there is no reason to move it to an external gem. If the implementation actually grows a lot, which I highly doubt, we can surely consider moving it to a gem.
...
Q: Why include it in Rails at all?
DHH: This is really very simple: Do most full-size Rails applications, think Basecamp or Github, need to use a queue? If the answer is yes, and of course it is, this belongs in Rails proper.
...
Q: Then, and I'm not just trolling, should Rails provide an API for user authentication or authorization?
DHH: authentication, pagination, etc are all application-level concerns -- not infrastructure. Think Person model vs ActiveRecord model. Another way to think of it is, would two applications have materially different opinions on queue.push depending on what they're doing? The answer is no. That is not the case for authentication, pagination, and other application-level concerns where the usage is often very different depending on what the application is trying to do.
...
Q: Is Rails getting too big?
DHH: The size of Rails itself is not a first-order metric of neither progress nor decline. The right question is: Does Rails solve more common problems than before without making the earlier solutions convoluted? In other words, what are the externalities of progress? Will introducing a queue API make it harder to render templates? Or route requests? No. It's most direct influence will be on things like ActionMailer, so a fair question will be: Is it harder or easier to use ActionMailer in a best-practice way after we get this? That's a fair question, but I'm absolutely confident that this will make using idiomatic AM usage (queuing mail delivery outside of the request cycle) much easier. Thus, progress.
I am curious, under what circumstances would one use this, rather than something like Resque? And there is so much competition in this space, what exactly is the argument for having this as part of Rails?
Or, let me put my question a little differently. Github did an awesome job writing about their experiences, and the reasoning that led them to create Resque. I'm wondering if anyone on the Rails team has posted an essay with as much background info as what Github did here:
and the original article touches upon the issue that I'd like to ask about here:
"It's not that I hate you or anything, but you didn't get much attention lately. There're so many alternatives out there, and I think people have made their choice to use them than you. I think it's time for you to have a big rest, peacefully in this Git repository."
Can't something similar be said about job queues? "There're so many alternatives out there, and I think people have made their choice to use them than you."?
So why create a new job queue system, and make it an official part of Rails? I am not sure I understand the intent.
> I am curious, under what circumstances would one use this, rather than something like Resque? And there is so much competition in this space, what
The goal is not to replace the existing queue solutions, but to create a common API, so the rest of the gems can just treat all of them in a uniform way.
Quoting Jose Valim:
"The point of the Queue is to be small and provide an API that more robust engines like resque and sidekiq can hook in. So you can easily start with an in memory queue (as you can see, the implementation does not even reach 100LOC) which is also easy to test and then easily swap to another one.
Why this is good? By having an unified API, tools like Devise, Action Mailer can simply use Rails.queue.push() instead of worrying with compatibility for different plugins.
So the goal here is provide an API for queueing and with a simple in memory implementation. It is not meant to be a robust queue system. "
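For a sense of how small such an in-memory implementation can be, here is a minimal sketch. It is not the code from the commit: InMemoryQueue, drain, and PrintJob are hypothetical names, and the only assumptions carried over from the quote are that the queue exposes push and that a job is an object with a run method. It uses Ruby's built-in thread-safe Queue and a single worker thread.

```ruby
require "thread"

# Hypothetical in-memory queue: one background thread runs jobs in
# the order they were pushed.
class InMemoryQueue
  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      # A nil entry is the shutdown signal in this sketch.
      while (job = @jobs.pop)
        begin
          job.run
        rescue StandardError => e
          warn "job failed: #{e.message}"
        end
      end
    end
  end

  def push(job)
    @jobs << job
    self
  end

  # Ask the worker to stop after the jobs already queued, then wait for it.
  def drain
    @jobs << nil
    @worker.join
  end
end

# Hypothetical job used only for the demonstration.
PrintJob = Struct.new(:message, :sink) do
  def run
    sink << message
  end
end

results = []
queue = InMemoryQueue.new
queue.push(PrintJob.new("first", results)).push(PrintJob.new("second", results))
queue.drain
puts results.inspect
```

Jobs are lost if the process exits, which is consistent with the quote's point that this is a starting point and not a robust queue system.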
It looks like this is meant to be an interface with multiple backend implementations, so Resque would become one of the potential backends.
I see this as a similar thing to having an interface for caching which can then be backed by memcached, redis or the filesystem. It strikes me as an excellent idea - pretty much every web application should have an offline queue of some sort these days.
I don't believe the intent here is to replace Resque (Resque is awesome), but provide a slim API at Rails.queue that Resque/Delayed Job/BackgroundDRB/Torquebox/etc. could tie into, similar to how Rails.cache works now, in addition to adding a simplistic default implementation.
Considering Rails has always been about best practices--and background job queueing is definitely a best practice--I think this is a great move.
This will also allow other gems/plugins to have an easy way to push their own jobs into the queue rather than trying to support a bunch of different queue implementations.
As I understand the discussion (underneath the commit log, josevalim gives a comment), it's not about re-inventing a job queue but to offer an API for queues where you can hook in what you want. That way other services can use a queue (sending mails, processing frobnicates) through an advertised interface without having to rely on a specific implementation. You still can run resque behind it. (Caveat: I only read the discussion, this is not informed by interpreting the code)
GitHub doesn't actually use Resque directly (well, except some rare cases). defunkt built RockQueue to be our internal queue interface while he migrated the app from DelayedJob to Resque. This looks like the same concept.
A feature that may very well make me finally jump over to RoR. I've recently built quite a large site, and the only current bottleneck is when a few emails need to be sent off at the same time with attachments. Being able to add that into a queue and let the user continue browsing the site instead of being stuck on a loading page (if only for a few seconds) would make the current setup ideal.
Incidentally - if anyone has any way of doing this in PHP without having to set up cron jobs (and not using node or its derivatives), I'm really open to any ideas!
I've got great news then: you can make the jump to RoR today! :)
This news isn't about Rails implementing its own background queue, but rather creating a unified API for interacting with background queuing systems; of which there are many. Resque (crafted at GitHub [1]) is probably the most popular: https://github.com/defunkt/resque.
Although certainly not without its issues, the most popular solution for that platform is Gearman http://gearman.org . It's fairly ops-intensive, but the most friendly for PHP without having to resort to things like Stomp to interface with messaging (MQ) systems, which are not optimally designed for job enqueuing, per se.
With this commit, you can if you want. This code decouples Rails from external queue solutions. If your application needs to interact with a queue, you only have to write it once and you can use a standard API to do it. If your external queue solution (your DB queue code) conforms to the API, you can switch it out with another conforming solution when your needs call for it.
As someone pointed out in the OP comments, this is like Rack for queues.
A queue is FIFO oriented, a database is least-recently-used (LRU). It works, but is not going to be the most efficient tool.
Where a queue is really useful is converting from foreground to background, so that you can optimize for throughput, rather than having to leave free capacity for 'random arrivals' of your foreground servers. Think of it as the same problem as the bursty traffic that a bank machine gets, and why you always seem to have to line up.
I too rolled my own, and while trivial to create, it's always made me uneasy. If there's a bug, I won't know about it; Amazon SES will reject the emails if they're sent all at once, or perhaps the calls won't be made at all.
I ended up doing a little status page for my newsletter; I set it up to auto refresh in Opera, each of the refreshes sends 10 emails and prints their statuses/destinations/titles as they go (it's also rate limited in memcached). I chuck that on the laptop or a third monitor and leave it for a couple of hours, keeping an eye on it as it goes.
Using something off the shelf I could trust would be much nicer.
Incidentally, Amazon SES has limits on how many mails you can send a second - even after your account is confirmed by them. You can see this limit on your control panel. Mine shows around 5 mails per second.
So you will have to add some kind of throttling to make it work.
Yep, which was a big reason why I did it the way it was. I was paranoid that I'd make a slip up in the throttling code and send too many emails (I guess they actually check it over a 10 or 60 second window), and the rest of the batch wouldn't go through properly.
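For the throttling itself, a sliding-window limiter is one simple option. The sketch below is an assumption-laden illustration: Throttle is a hypothetical class, the limit of 5 per second is just the figure mentioned above, and it says nothing about how SES actually measures its window. The clock and sleep function are injectable so the demo can run instantly with a fake clock.

```ruby
# Hypothetical throttle: allows at most `limit` sends per `interval` seconds.
class Throttle
  def initialize(limit:, interval: 1.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @limit = limit
    @interval = interval
    @clock = clock
    @sleeper = sleeper
    @sent = [] # timestamps of sends inside the current window
  end

  # Blocks until another send is allowed, then records it.
  def wait_turn
    loop do
      now = @clock.call
      @sent.reject! { |t| now - t >= @interval }
      if @sent.size < @limit
        @sent << now
        return now
      end
      # Sleep until the oldest send in the window expires.
      @sleeper.call(@interval - (now - @sent.first))
    end
  end
end

# Demo with a fake clock so nothing really sleeps.
time = 0.0
slept = []
throttle = Throttle.new(
  limit: 5,
  clock: -> { time },
  sleeper: ->(seconds) { slept << seconds; time += seconds }
)

send_times = Array.new(12) { throttle.wait_turn }
puts send_times.inspect
```

In real use you would construct Throttle.new(limit: 5) with the default clock and call wait_turn before each send.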
Just move your code to a register_shutdown_function() call and it will execute after the output has been sent, but without having to deal with forking a background PHP process or running out-of-context.
I don't use Rails but I often look to it for good/simple design ideas. I'm interested in seeing how they implement simple, effective, reliable background queuing.
interesting - not sure if it's really needed though - I've used Redis and Resque before and found its performance was blisteringly fast. (Resque was made by Github https://github.com/blog/542-introducing-resque)
It isn't about speed or choice of queue, it's about a standardized API for working with queues so you can focus on developing your application domain. You will still be able to use Resque or Sidekiq or DJ or anything else, there will just be a standard API for all of them to use.
Also, it's an ecosystem feature. If a library or a component of rails (e.g. ActionMailer) wants to process something in a background queue, the choice doesn't have to be between a host of bad options:
* Forcing a dependency on a particular queue
* Writing a wrapper for all possible queues
* Falling back to queue-less behavior in the absence of a detected queue
They just use the Rails queue and it works on whatever real-world queue the user picks. Definitely good infrastructure IMO.
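From the library author's side, that looks something like the sketch below. All names are hypothetical (Framework.queue stands in for Rails.queue, and InlineQueue, RecordingQueue, and NewsletterMailer are made up); the only thing the library code relies on is that the configured queue responds to push and that jobs respond to run.

```ruby
# Stand-in for the framework's queue accessor (assumed, not Rails itself).
module Framework
  class << self
    attr_accessor :queue
  end
end

# Backend A: runs the job immediately in the caller (handy in tests).
class InlineQueue
  def push(job)
    job.run
    self
  end
end

# Backend B: only records jobs; a separate worker would run them later.
class RecordingQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(job)
    @jobs << job
    self
  end
end

# Library code: depends on the push contract only, not on any backend.
class NewsletterMailer
  DeliveryJob = Struct.new(:recipient, :outbox) do
    def run
      outbox << "newsletter for #{recipient}"
    end
  end

  def self.deliver_later(recipient, outbox)
    Framework.queue.push(DeliveryJob.new(recipient, outbox))
  end
end

outbox = []

Framework.queue = InlineQueue.new
NewsletterMailer.deliver_later("a@example.com", outbox)

Framework.queue = RecordingQueue.new
NewsletterMailer.deliver_later("b@example.com", outbox)

puts outbox.inspect
puts Framework.queue.jobs.size
```

The mailer never mentions a concrete backend, which is what removes the three bad options listed above.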
I think the point is to provide an abstraction layer, so that the community has some common feature set and protocol expectations when we're discussing different technical solutions to queuing.
Coupled with that, I would love to see Passenger support background workers with the same lifecycle as front-end workers (but last time I suggested that, it wasn't planned at all if I remember well).
We implemented something like this at the place I work.
We have a tiny ruby process, based on event machine, that subscribes to various queues (we happen to use RabbitMQ). When a message arrives, the process makes a request to the passenger instance passing along the message data and waits for a response. The process limits the number of requests it makes to prevent background requests from blocking out front-end requests (for example, 20% of passenger_max_pool_size). We're also simulating priority by using different prefetch values for different queues (for example, 10 messages for high queue and 5 messages for low queue).
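The request cap can be sketched as a small counting gate. This is not our actual EventMachine code: BackgroundGate is a hypothetical class, the 20% share is just the example figure above, and the per-queue prefetch values are left out entirely.

```ruby
# Hypothetical gate: lets background work use at most a fixed share of
# the front-end pool, so queue traffic cannot starve user requests.
class BackgroundGate
  attr_reader :limit

  def initialize(pool_size:, share_percent: 20)
    # Integer arithmetic avoids float rounding; always allow at least one.
    @limit = [pool_size * share_percent / 100, 1].max
    @in_flight = 0
    @mutex = Mutex.new
  end

  # Returns true and reserves a slot if one is free, false otherwise.
  def try_acquire
    @mutex.synchronize do
      return false if @in_flight >= @limit
      @in_flight += 1
      true
    end
  end

  # Gives a slot back once the background request has finished.
  def release
    @mutex.synchronize { @in_flight -= 1 if @in_flight > 0 }
  end
end

gate = BackgroundGate.new(pool_size: 10)
attempts = [gate.try_acquire, gate.try_acquire, gate.try_acquire]
gate.release
after_release = gate.try_acquire
puts attempts.inspect
```

A message that fails to acquire a slot would simply stay on the broker until a slot frees up.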
...why? This isn't supposed to be the one true rails queueing implementation, just a standard interface to code against. Without it third party libraries have to resort to all sorts of ugly workarounds to push work into the background ... or just not provide that feature. With this in place third party tools can push into the queue without knowing or caring what specific kind of queue you're using.
Great news. A built-in background job queue should reduce the rails learning curve - simpler to use a default option than research and test the various custom options that are available now.