I'm working on building an educational environment with Jupyter, and I'm interested in the multiple hubs.
A few basic questions:
Why multiple hubs? (was there some point of scale where you needed this)
Did multiple hubs allow you to have better migrations? (where you drain one and move it over to the other)
Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.
A few reasons for this, most of which are related to points you mentioned:
1. Having multiple hubs makes it much easier to do zero-downtime deploys.
2. Having multiple hubs makes us more resilient to transient machine failures.
3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).
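(For anyone curious about the single-hub path, a minimal jupyterhub_config.py sketch using the Kubernetes spawner looks roughly like the following. Treat it as illustrative only: the exact trait names, like image, vary a bit between kubespawner releases.)

# jupyterhub_config.py (illustrative sketch; `c` is provided by JupyterHub's config loader)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# Run each user's notebook server in its own pod from a stock image.
c.KubeSpawner.image = 'jupyter/scipy-notebook:latest'

# Per-user resource caps so one heavy notebook can't starve a node.
c.KubeSpawner.cpu_limit = 1
c.KubeSpawner.mem_limit = '2G'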
That's great, thanks! I was also curious whether you hit scaling issues on just one hub. I agree it's best practice not to have all your eggs in one basket. I'd love to see an HA hub where all of this is taken care of for me; hopefully that will exist by the time we go live.
Are the cell sharing extensions mentioned in the slides open sourced? (Sorry if the video says either way; I didn't get a chance to watch it in full yet.) The lack of sharing / collaboration extensions for Jupyter Notebooks / Lab is still a weak point, I think.
The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which is a custom Rails application.
I know that the jupyterhub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).
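(If you want to play with PGContents itself, wiring it in as the notebook storage backend is mostly a config change. The sketch below is from memory, so double-check the class path and trait names against the pgcontents README.)

# jupyter_notebook_config.py (from memory; verify names against the pgcontents README)
c = get_config()

# Store notebook files in Postgres instead of the local filesystem.
c.NotebookApp.contents_manager_class = 'pgcontents.PostgresContentsManager'

# SQLAlchemy-style URL for the database backing notebook storage.
c.PostgresContentsManager.db_url = 'postgresql://user@localhost/pgcontents'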
I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.
Come on. Quantopian doesn't have 100k users... maybe 5k?
EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.
I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.
There have been a couple attempts to add dashboarding to Jupyter:
https://github.com/jupyter/dashboards was/is a dashboard system built by a team at IBM. I think the project stalled somewhat after IBM stopped funding it.
Zipline dev here. Zipline happily works on minutely data (in fact, we recently dropped support for daily mode entirely on Quantopian, which is built on top of Zipline).
All the tutorials and examples for Zipline use daily data because there's no freely-available minutely data that we can distribute to our users.
I looked at it briefly over the weekend and then got distracted trying to make numpy.isfinite() work on datetimes :(. It's still in the queue though! Feel encouraged to gently bump it if I don't get back to you in the next day or two.
I'm not sure I always agree with this. David Beazley's 2015 PyCon talk on concurrency (https://www.youtube.com/watch?v=MCs5OvhV9S4) was one of my favorite talks of the conference, and it was almost all just live coding.
Part of what made that talk compelling was that it took a concept that lots of people find complex/intimidating (how the internals of an asynchronous IO library work) and in ~30 minutes created a full working example in front of a live audience. Writing the code live in front of the audience helps to nail down the central theme of "this stuff isn't actually as scary as it looks".
There are certainly talks that would be better off just presenting snippets of code, but I think there's a time and a place for live coding examples as well.
Being in the crowd during this talk was seriously like being at a rock concert.
Beazley was 'playing' the keyboard like an instrument. Every square inch of floor space had someone sitting or standing. The crowd was incredibly invested - nary an eye nor ear wavered. Even Guido looked on with a hawk eye.
I was in a small circle on the floor of people who had just smoked some amazing herb before the talk. I was hanging on his every word and every expression. I've rarely felt so engaged by a conference talk. I'll never forget this one.
He received a raucous standing ovation that is not evident from the conference video.
I asked a question at the end, and I was so giddy I had trouble getting it out. :-)
As a core contributor to an async framework, I felt that this talk gave me a lot more enthusiasm and confidence about my work which has lasted to this day. I think about it often. Definitely a track for the PyCon greatest hits album.
The people who really know what they are doing make the complicated stuff seem dirt simple. I had Dave as the instructor for my undergrad compilers and operating systems courses back in 2000-2001. His lectures then were every bit as enlightening as his PyCon talks today. Those courses were demanding but extremely fun.
I came to the comments on this one just to make sure this talk got mentioned as a counterpoint. Fantastic explanation of everything as he went along.
As I recall, he actually took the same conceptual problem and rewrote the solution in a handful of different concurrent styles.
And no, at least in this video I can not think faster than Dave Beazley can type. By the time I've just about figured out what nuance of concurrency he's showing off in his last example, he's already got his next example typed out!
Agree with this. In my second-year undergraduate Operating Systems course, one day (pretty soon after the start of the term) the teacher decided to write a small terminal emulator in C to show us what it really does, right there in the classroom. It took him two hours of coding, but it really changed my perspective on how things actually work in a UNIX-based system, and on always checking in depth whether something that sounds almost impossible really is.
> There are certainly talks that would be better off just presenting snippets of code, but I think there's a time and a place for live coding examples as well.
Step 1. be David Beazley. He really is such an engaging speaker, and I think his jokes and lightheartedness might make it look easy, but I don't think it is. Many probably think "I'll be just like David on stage," but they aren't.
I have seen nice demos where everything is set up and they just run a command that builds or launches a VM; that's fine. But building code from scratch, watching it compile, and dealing with off-by-one errors or some hidden bug that now everyone is debugging is usually painful to sit through.
Yeah, I remember that one; it's a great example and an awesome talk.
However, as far as presenters go, this guy is a bit of an outlier. He is also a teacher (he offers Python mastery classes in Chicago), so he is more practiced at explaining and working through example code.
I came here to say exactly this and link to that talk. If you can't code live then don't; Beazley apparently can. Python lends itself to these kinds of talks because of its brevity.
I agree with you, but I'm probably biased because I've used live coding in one of my talks. However, the intent of coding live was similar to the talk you mentioned - it was to show people that what I was trying to accomplish isn't as hard as people think it is. In fact, it's easy enough that I can do it in an hour while explaining out loud what's happening.
A common practice while working on a Haskell code base is to fill in not-yet-completed functions with `undefined`, which is a divergent expression. This lets you do things like write:
func = undefined -- TODO: implement func
and then you can use func in the rest of your code and as long as nothing strictly evaluates the results of func, you don't have to care about the fact that it's not yet defined. This lets you verify that the rest of your program still compiles properly, for example.
This means that preserving the lazy evaluation of undefined expressions matters. For example, it's perfectly OK to do this in ghci:
Prelude> let myfunc = undefined::(Integer -> Integer)
Prelude> let ints = [1, 2, 3]
Prelude> let results = map myfunc ints
Prelude> let square x = x * x
Prelude> let results_squared = map square results
All of this is fine because nothing has actually forced any computation to occur, but I still get the benefits of asserting that my expressions are well-typed. If I actually force evaluation by asking ghci to show a result, I'll get an error:
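(On recent GHC versions the failure looks something like the following; newer releases also tack on a CallStack trace pointing at the undefined.)

Prelude> results_squared
[*** Exception: Prelude.undefined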
If you're interested in this sort of thing, you might also check out https://github.com/llllllllll/codetransformer, which is a general-purpose library for doing these sorts of shenanigans in an almost-sane way. (Disclaimer: I'm one of the library authors).
As an example, I've got a PR open right now that lets you do things like:
@mutable_locals
def f():
    out = []
    x = 1
    out.append(x)
    locals().update({'x': 2, 'y': 3})
    out.append(x)
    out.append(y)
    return out

assert f() == [1, 2, 3]
which works using a combination of ctypes hackery and replacing all LOAD_FAST instructions with appropriately-resolved LOAD_NAME instructions.
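(If the LOAD_FAST / LOAD_NAME distinction is unfamiliar, a quick way to see it is to disassemble a tiny function with the standard library; nothing below is codetransformer-specific.)

import dis

def g():
    x = 1        # compiles to STORE_FAST: locals live in a fixed array of slots
    return x     # compiles to LOAD_FAST: reads that slot directly, ignoring locals()

dis.dis(g)
# LOAD_NAME, by contrast, looks the name up in the frame's namespace mapping at
# runtime, which is why swapping LOAD_FAST for LOAD_NAME (plus the ctypes trick
# to write locals back into that mapping) makes locals().update(...) visible.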
Heh, thanks. It's amazing how well-suited Python's for-loop is for emulating this behavior: `break` just does exactly what you want, and you can throw in an else-block where you'd use a default in another language.
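(For anyone who hasn't bumped into it, the pattern being described is Python's for/else: the else block runs only if the loop finishes without hitting break, which is exactly where the "default" case goes.)

def first_negative(numbers, default=None):
    for n in numbers:
        if n < 0:
            result = n
            break          # found one; the else block below is skipped
    else:
        # runs only if the loop never hit break; this is the "default" case
        result = default
    return result

assert first_negative([3, 1, -2, 5]) == -2
assert first_negative([3, 1, 2, 5]) is None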
This is really cool. The AST transformation stuff here is neat, but relatively well-trodden ground.
The more impressive new science here is the lazy_function decorator, which is implemented as a bytecode transformer on the code object that lives inside the decorated function. The author built his own library for the bytecode stuff, which lives here: https://github.com/llllllllll/codetransformer.
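(To get a feel for the general shape of that trick, here's a trivial runnable sketch of a decorator that derives a new code object and rebuilds the function around it. It is not how lazy_function or codetransformer actually do it, and CodeType.replace needs Python 3.8+.)

import types

def rename_code(func):
    # Derive a new code object from the original. A real transformer rewrites
    # the bytecode here; this sketch only changes the code object's name so it
    # stays trivially correct.
    new_code = func.__code__.replace(co_name=func.__code__.co_name + '_xformed')
    # Rebuild a function around the transformed code object.
    return types.FunctionType(
        new_code, func.__globals__, func.__name__, func.__defaults__, func.__closure__
    )

@rename_code
def add(a, b):
    return a + b

assert add(1, 2) == 3
assert add.__code__.co_name == 'add_xformed'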
Also, happy to answer any questions that people might have.