I'm working on building an educational environment with Jupyter, and I'm interested in the multiple hubs.
A few basic questions:
Why multiple hubs? (was there some point of scale where you needed this)
Did multiple hubs allow you to have better migrations? (where you drain one and move it over to the other)
Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.
A few reasons for this, most of which are related to points you mentioned:
1. Having multiple hubs makes it much easier to do zero-downtime deploys.
2. Having multiple hubs makes us more resilient to transient machine failures.
3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).
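(For anyone curious about the single-hub path, a minimal jupyterhub_config.py sketch using the Kubernetes spawner looks roughly like the following. Treat it as illustrative only: the exact trait names, like image, vary a bit between kubespawner releases.)

# jupyterhub_config.py (illustrative sketch; `c` is provided by JupyterHub's config loader)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# Run each user's notebook server in its own pod from a stock image.
c.KubeSpawner.image = 'jupyter/scipy-notebook:latest'

# Per-user resource caps so one heavy notebook can't starve a node.
c.KubeSpawner.cpu_limit = 1
c.KubeSpawner.mem_limit = '2G'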
That's great, thanks! I was also curious whether you hit scaling issues on just one hub. I agree it's best practice not to have all your eggs in one basket. I'd love to see an HA hub where all of this is taken care of for me; hopefully that will exist by the time we go live.
Are the cell sharing extensions mentioned in the slides open sourced? (Sorry if the video says either way; I didn't get a chance to watch it in full yet.) The lack of sharing / collaboration extensions for Jupyter Notebooks / Lab is still a weak point, I think.
The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which is a custom Rails application.
I know that the jupyterhub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).
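(If you want to play with PGContents itself, wiring it in as the notebook storage backend is mostly a config change. The sketch below is from memory, so double-check the class path and trait names against the pgcontents README.)

# jupyter_notebook_config.py (from memory; verify names against the pgcontents README)
c = get_config()

# Store notebook files in Postgres instead of the local filesystem.
c.NotebookApp.contents_manager_class = 'pgcontents.PostgresContentsManager'

# SQLAlchemy-style URL for the database backing notebook storage.
c.PostgresContentsManager.db_url = 'postgresql://user@localhost/pgcontents'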
I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.
Come on. Quantopian doesn't have 100k users... maybe 5k?
EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.
I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.
There have been a couple attempts to add dashboarding to Jupyter:
https://github.com/jupyter/dashboards was/is a dashboard system built by a team at IBM. I think the project stalled somewhat after IBM stopped funding it.
Zipline dev here. Zipline happily works on minutely data (in fact, we recently dropped support for daily mode entirely on Quantopian, which is built on top of Zipline).
All the tutorials and examples for Zipline use daily data because there's no freely-available minutely data that we can distribute to our users.
I looked at it briefly over the weekend and then got distracted trying to make numpy.isfinite() work on datetimes :(. It's still in the queue though! Feel encouraged to gently bump it if I don't get back to you in the next day or two.
I'm not sure I always agree with this. David Beazley's 2015 PyCon talk on concurrency (https://www.youtube.com/watch?v=MCs5OvhV9S4) was one of my favorite talks of the conference, and it was almost all just live coding.
Part of what made that talk compelling was that it took a concept that lots of people find complex/intimidating (how the internals of an asynchronous IO library work) and in ~30 minutes created a full working example in front of a live audience. Writing the code live in front of the audience helps to nail down the central theme of "this stuff isn't actually as scary as it looks".
There are certainly talks that would be better off just presenting snippets of code, but I think there's a time and a place for live coding examples as well.
Being in the crowd during this talk was seriously like being at a rock concert.
Beazley was 'playing' the keyboard like an instrument. Every square inch of floor space had someone sitting or standing. The crowd was incredibly invested - nary an eye nor ear wavered. Even Guido looked on with a hawk eye.
I was in a small circle on the floor of people who had just smoked some amazing herb before the talk. I was hanging on his every word and every expression. I've rarely felt so engaged by a conference talk. I'll never forget this one.
He received a raucous standing ovation that is not evident from the conference video.
I asked a question at the end, and I was so giddy I had trouble getting it out. :-)
As a core contributor to an async framework, I felt that this talk gave me a lot more enthusiasm and confidence about my work which has lasted to this day. I think about it often. Definitely a track for the PyCon greatest hits album.
The people who really know what they are doing make the complicated stuff seem dirt simple. I had Dave as the instructor for my undergrad compilers and operating systems courses back in 2000-2001. His lectures then were every bit as enlightening as his PyCon talks today. Those courses were demanding but extremely fun.
I came to the comments on this one just to make sure this talk got mentioned as a counterpoint. Fantastic explanation of everything as he went along.
As I recall, he actually took the same conceptual problem and rewrote the solution in a handful of different concurrent styles.
And no, at least in this video I can not think faster than Dave Beazley can type. By the time I've just about figured out what nuance of concurrency he's showing off in his last example, he's already got his next example typed out!
Agree with this. In my second-year undergraduate Operating Systems course, one day (pretty soon after the start of the term) the teacher decided to write a small terminal emulator in C to show us what it really does, right there in the classroom. It took him two hours of coding, but it really changed my perspective on how things actually work in a UNIX-based system, and on always checking in depth whether something that sounds almost impossible really is.
> There are certainly talks that would be better off just presenting snippets of code, but I think there's a time and a place for live coding examples as well.
Step 1. be David Beazley. He really is such an engaging speaker, and I think his jokes and lightheartedness might make it look easy, but I don't think it is. Many probably think "I'll be just like David on stage," but they aren't.
I have seen nice demos where everything is set up and they just run a command that builds or launches a VM; that's fine. But building code from scratch, watching it compile, and dealing with off-by-one errors or some hidden bug that now everyone is debugging is usually painful to sit through.
Yeah, I remember that one; it's a great example and an awesome talk.
However, as far as presenters go, this guy is a bit of an outlier. He is also a teacher (he offers Python mastery classes in Chicago), so he is more practiced at explaining and working through example code.
I came here to say exactly this and link to that talk. If you can't code live then don't; Beazley apparently can. Python lends itself to these kinds of talks because of its brevity.
I agree with you, but I'm probably biased because I've used live coding in one of my talks. However, the intent of coding live was similar to the talk you mentioned - it was to show people that what I was trying to accomplish isn't as hard as people think it is. In fact, it's easy enough that I can do it in an hour while explaining out loud what's happening.
A common practice while working on a Haskell code base is to fill in not-yet-completed functions with `undefined`, which is a divergent expression. This lets you do things like write:
func = undefined -- TODO: implement func
and then you can use func in the rest of your code and as long as nothing strictly evaluates the results of func, you don't have to care about the fact that it's not yet defined. This lets you verify that the rest of your program still compiles properly, for example.
This means that preserving the lazy evaluation of undefined expressions matters. For example, it's perfectly OK to do this in ghci:
Prelude> let myfunc = undefined::(Integer -> Integer)
Prelude> let ints = [1, 2, 3]
Prelude> let results = map myfunc ints
Prelude> let square x = x * x
Prelude> let results_squared = map square results
All of this is fine because nothing has actually forced any computation to occur, but I still get the benefits of asserting that my expressions are well-typed. If I actually force evaluation by asking ghci to show a result, I'll get an error:
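(On recent GHC versions the failure looks something like the following; newer releases also tack on a CallStack trace pointing at the undefined.)

Prelude> results_squared
[*** Exception: Prelude.undefined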
If you're interested in this sort of thing, you might also check out https://github.com/llllllllll/codetransformer, which is a general-purpose library for doing these sorts of shenanigans in an almost-sane way. (Disclaimer: I'm one of the library authors).
As an example, I've got a PR open right now that lets you do things like:
@mutable_locals
def f():
    out = []
    x = 1
    out.append(x)
    locals().update({'x': 2, 'y': 3})
    out.append(x)
    out.append(y)
    return out

assert f() == [1, 2, 3]
which works using a combination of ctypes hackery and replacing all LOAD_FAST instructions with appropriately-resolved LOAD_NAME instructions.
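(If the LOAD_FAST / LOAD_NAME distinction is unfamiliar, a quick way to see it is to disassemble a tiny function with the standard library; nothing below is codetransformer-specific.)

import dis

def g():
    x = 1        # compiles to STORE_FAST: locals live in a fixed array of slots
    return x     # compiles to LOAD_FAST: reads that slot directly, ignoring locals()

dis.dis(g)
# LOAD_NAME, by contrast, looks the name up in the frame's namespace mapping at
# runtime, which is why swapping LOAD_FAST for LOAD_NAME (plus the ctypes trick
# to write locals back into that mapping) makes locals().update(...) visible.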
Heh, thanks. It's amazing how well-suited Python's for-loop is for emulating this behavior: `break` just does exactly what you want, and you can throw in an else-block where you'd use a default in another language.
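(For anyone who hasn't bumped into it, the pattern being described is Python's for/else: the else block runs only if the loop finishes without hitting break, which is exactly where the "default" case goes.)

def first_negative(numbers, default=None):
    for n in numbers:
        if n < 0:
            result = n
            break          # found one; the else block below is skipped
    else:
        # runs only if the loop never hit break; this is the "default" case
        result = default
    return result

assert first_negative([3, 1, -2, 5]) == -2
assert first_negative([3, 1, 2, 5]) is None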
This is really cool. The AST transformation stuff here is neat, but relatively well-trodden ground.
The more impressive new science here is the lazy_function decorator, which is implemented as a bytecode transformer on the code object that lives inside the decorated function. The author built his own library for the bytecode stuff, which lives here: https://github.com/llllllllll/codetransformer.
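(To get a feel for the general shape of that trick, here's a trivial runnable sketch of a decorator that derives a new code object and rebuilds the function around it. It is not how lazy_function or codetransformer actually do it, and CodeType.replace needs Python 3.8+.)

import types

def rename_code(func):
    # Derive a new code object from the original. A real transformer rewrites
    # the bytecode here; this sketch only changes the code object's name so it
    # stays trivially correct.
    new_code = func.__code__.replace(co_name=func.__code__.co_name + '_xformed')
    # Rebuild a function around the transformed code object.
    return types.FunctionType(
        new_code, func.__globals__, func.__name__, func.__defaults__, func.__closure__
    )

@rename_code
def add(a, b):
    return a + b

assert add(1, 2) == 3
assert add.__code__.co_name == 'add_xformed'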
Also, happy to answer any questions that people might have.