Helium: Lighter Web Automation with Python

hugs · on Dec 12, 2024

Selenium project founder here. (Hi!) Thanks for all your work on this project. Lots of negativity around here these days, but just wanted to say thanks. The functional style of Helium's API reminds me a lot of Selenium's original API when it was 100% JavaScript (aka Selenium 1 aka Selenium Core) back in 2004.

(Functional style: "method(thing)" vs object oriented style: "thing.method()")

We mostly abandoned the functional style when we merged with the WebDriver project (aka Selenium 2), but that functional style still lives on in the Selenium IDE record/playback tool.

That is all to say, there are fans of many different styles for automation APIs. No single API will please everyone. (But I personally like the simpler, functional style, fwiw!)

Side-note: This is also why I'm a fan of the Nim programming language. "method(thing)" and "thing.method" are supported syntax for literally the same thing. For others new to the idea, the fancy term for this is "Uniform Function Call Syntax".
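For readers who want to see the two styles side by side, here is a minimal Python sketch. Python has no Uniform Function Call Syntax, so both forms have to be defined explicitly; the `Button` class and the `click` function are made up for illustration and are not taken from Helium or Selenium.

```python
class Button:
    """A stand-in for a page element (hypothetical, for illustration only)."""

    def __init__(self, label):
        self.label = label
        self.clicks = 0

    def click(self):
        # Object-oriented style: thing.method()
        self.clicks += 1


def click(thing):
    # Functional style: method(thing). Here it simply delegates to the method.
    thing.click()


button = Button("Download")
button.click()        # object-oriented call
click(button)         # functional call with the same effect
print(button.clicks)  # prints 2
```

In a language with UFCS, such as Nim, the second definition would not be needed, because both call forms resolve to the same procedure.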

lblume · on Dec 12, 2024

UFCS is great. I really wish more languages would support something similar, although pipe operators (thing |> function1 |> function2) and Rust's proposal for thing.(function) also seem to satisfy the syntactic ideal.

languagehacker · on Dec 11, 2024

Importing * is universally discouraged by most Python linters and best practice docs. You can always "import helium as h" if you're looking to type less.

This looks largely like common workarounds that most people will write using Python-based browser automation. Most of the time, we accept that those capabilities aren't there by default because they are not explicit enough and can result in bugs and undefined behavior even when the elements that we expect to be on the page are actually there.

Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability. When we get into the nitty-gritty of browser automation, it might just make it harder to debug than going straight to Selenium or Playwright.

mherrmann · on Dec 11, 2024

> Importing * is universally discouraged by most Python linters and best practice docs.

Yup, I would never do it in a .py file. But I do it all of the time in the interpreter, which is what the video shows.

> This looks largely like common workarounds that most people will write using Python-based browser automation. Most of the time, we accept that those capabilities aren't there by default because they are not explicit enough and can result in bugs and undefined behavior even when the elements that we expect to be on the page are actually there.

It sounds like you haven't tried Helium yet. I think you should, and see for yourself whether the trade-off you talk about actually exists.

> Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability.

You could make the same argument about using C / assembly instead of Python. I suggest you try Helium before making statements about the "trouble" it may create. I believe you will find that there is no trouble.

shepherdjerred · on Dec 12, 2024

It would be much more useful if you tried out the tool before criticizing it

Or, if you have tried it, if you could explain why you don’t think the tool makes the right tradeoffs

Adages like “explicit is better than implicit” are incredibly context dependent, otherwise we’d all be writing assembly

nkrisc · on Dec 11, 2024

Having done some ad-hoc, temporary automation with Selenium in the past (to help fellow, less technically-inclined designers) I wish I had this at the time.

Looks like a nice, almost natural language-like API around what is otherwise a quite cumbersome API.

wokwokwok · on Dec 11, 2024

How can a wrapper around selenium be lighter than it?

A wrapper around an API is by definition heavier (more code, more functions) than using the lower level api.

It’s not using less resources.

It’s not faster (it has implicit waiting).

It’s not less code; it’s literally a superset of selenium?

Feels like a “selenium framework” is more accurate than light weight web automation?

Anyway, there’s no fixing automation tests with fancy APIs.

No matter what you try to do, if people are only interested in writing quick dirty scripts, you’re doomed to a pile of stupid spaghetti no matter what system or framework you have.

If you want sustainable automation, you have to do Real Software Engineering and write actual composable modules; and you can do that in anything, even raw selenium.

So… I’d be more interested if this was pitched as “composable lego for building automation” …

…but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.

That’s nice for getting started; but getting started is not the problem with automation tests.

It’s maintaining them.

mherrmann · on Dec 11, 2024

Its use can be lighter. That is, the wrapper can be easier to use.

Helium helps with maintaining automation tests as well. click("Compose") is infinitely more maintainable than document.getElementById("eIu7Db").click(). (I just took this example from Gmail's web interface.)

n144q · on Dec 11, 2024

That's just some superficial changes that often lead to confusion and other negative consequences down the road, especially when not handled carefully.

I would much rather rely directly on Selenium's stable APIs than on someone else's wrapped APIs that are opinionated and could be incomplete, incorrect, outdated and potentially unmaintained someday. There are always many more resources put into Selenium than into these add-ons.

If I really want, I can choose a few APIs that I actually use and wrap them within my codebase. That's more reliable than this.

mherrmann · on Dec 11, 2024

You can freely mix Helium and Selenium API calls. You don't lose any of the power you are describing when you use Helium.

wokwokwok · on Dec 11, 2024

How do you compose low level operations like “click here” into composable modules like:

loginAsUser(user)

id = createBooking(user)

loginAsAdmin()

approveBooking(id)

?

Is it the same as selenium? Do whatever you want yourself?

That’s what I’m talking about. Unless you have high level composable modules that let you express high level test activities then your tests will always fall apart.

The syntax of the low level operations doesn’t matter because you will never ever care about a click(“compose”).

That’s not a test.

A test might be:

createEmail()

attachFile(…)

… whatever your bespoke business requirements are.

Having fancy wrappers?

Is it nicer? Sure.

Does it meaningfully improve the tests, maintaining tests?

Nope.

Because at the end of the day the low level operations will be bespoke, nasty, messy and different for each website; that’s why you wrap them up in functions and compose them.
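A rough sketch of what "wrap them up in functions and compose them" could look like in Python. Everything here is hypothetical: the `Browser` interface, the URL, the field labels and the booking ID format are invented for illustration. The low-level method names are loosely modeled on the `write(text, into=...)` and `click(label)` calls shown elsewhere in this thread, and a recording fake stands in for a real browser so the example runs without one.

```python
from typing import Protocol


class Browser(Protocol):
    """The low-level operations the workflows need (hypothetical interface)."""

    def go_to(self, url: str) -> None: ...
    def write(self, text: str, into: str) -> None: ...
    def click(self, label: str) -> None: ...


class RecordingBrowser:
    """A fake browser that records calls instead of driving a real one."""

    def __init__(self) -> None:
        self.log: list[tuple] = []

    def go_to(self, url: str) -> None:
        self.log.append(("go_to", url))

    def write(self, text: str, into: str) -> None:
        self.log.append(("write", text, into))

    def click(self, label: str) -> None:
        self.log.append(("click", label))


def login_as(browser: Browser, username: str, password: str) -> None:
    # Site-specific details (URL, field labels) live in exactly one place.
    browser.go_to("https://example.com/login")
    browser.write(username, into="Username")
    browser.write(password, into="Password")
    browser.click("Sign in")


def create_booking(browser: Browser, room: str) -> str:
    browser.click("New booking")
    browser.write(room, into="Room")
    browser.click("Save")
    # A real implementation would read the ID back from the page.
    return f"booking-for-{room}"


def approve_booking(browser: Browser, booking_id: str) -> None:
    browser.write(booking_id, into="Search")
    browser.click("Approve")


def booking_flow(browser: Browser) -> str:
    # The scenario reads as business steps; no selectors appear at this level.
    login_as(browser, "alice", "alice-password")
    booking_id = create_booking(browser, "Room 1")
    login_as(browser, "admin", "admin-password")
    approve_booking(browser, booking_id)
    return booking_id


browser = RecordingBrowser()
booking_id = booking_flow(browser)
```

The composition layer is the same whichever library performs the clicks; the choice of low-level API only changes how much code ends up inside functions like `login_as`.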

At least, in my experience; this looks a lot like cypress; a high level set of operations with sensible defaults for easy tasks.

…but, practically, I’m skeptical that hiding the low level nasty details actually makes them go away; it’s smoothing them over for the “happy path”; but automation tests are like 90% edge cases.

> Its use can be lighter

I don’t think that’s the generally accepted meaning of a light weight framework.

…but eh, fair enough. I understand what you mean.

mherrmann · on Dec 11, 2024

> I’m skeptical that hiding the low level nasty details actually makes them go away

It makes 90+% of them go away. That's a big win. Try it.

Zardoz84 · on Dec 11, 2024

what you are asking is GEB

pryelluw · on Dec 11, 2024

> but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.

“Lighter” may be used as an alternative adjective to the word easy or easier. Your post, which comes off as very rude, misses the point of how the project is marketed.

At least the OP did not call it Python automation for humans …

wokwokwok · on Dec 12, 2024

> “Lighter” may be used as an alternative adjective to the word easy or easier

That is, again, not common usage; there’s a word for easier to use: it’s “easier”. But whatever. It doesn’t matter; it’s just branding.

My point, however, is that making easy to use frameworks for test automation is fundamentally misguided, and responses like “try it, you’ll be amazed it makes all the problems go away” are the type of “drinking kool aid” that displays a deep lack of understanding of the problem space.

Doing easy things does not solve doing hard things; not here. Not in go. Not in rust. Not ever.

So, my point was (and is):

How does this address doing hard things? Because as someone who is familiar with this space and has tried it, I can’t see anything that helps with the hard things, and no one who is heavily invested in automation really cares about doing easy things.

We can already do easy things.

Another way of doing easy things is like using prettier or not; it’s a style preference.

So, is that what this is?

Selenium with a function calling style preference, or something that actually helps with building automation?

There’s nothing wrong with making tools that make superficial cosmetic changes to the way you do things.

…but, that’s not how the project is marketed; as, at least, I’ve understood it.

wslh · on Dec 11, 2024

How does it compare with the "usual suspects"? I mean Playwright, Selenium, Cypress, and Puppeteer.

mherrmann · on Dec 11, 2024

It's more high-level. Instead of saying "click element with ID xv9873", you can say "click Download".

Yossarrian22 · on Dec 11, 2024

That's how Playwright works too

mherrmann · on Dec 11, 2024

Doesn't work for logging into HN:

    from playwright.sync_api import sync_playwright
    playwright = sync_playwright().start()
    browser = playwright.chromium.launch()
    page = browser.new_page()
    page.goto('https://news.ycombinator.com/login?goto=news')
    page.get_by_label('username').fill('mherrmann')
    # playwright._impl._errors.TimeoutError: Locator.fill: Timeout 30000ms exceeded.

I suspect Playwright expects there to be a <label> for an <input> element.

It does work with Helium:

    from helium import *
    start_chrome('https://news.ycombinator.com/login?goto=news')
    write('mherrmann', into='username')

The two scripts are equivalent, except Helium's works and is half as long.

mdaniel · on Dec 11, 2024

Depending on when you tried that, there is no such label "username" in view-source:https://news.ycombinator.com/login?goto=news

    <table border="0"><tr><td>username:</td><td><input type="text" name="acct" size="20" autocorrect="off" spellcheck="false" autocapitalize="off" autofocus="true"></td></tr><tr><td>password:</td><td><input type="password" name="pw" size="20"></td></tr></table><br>

as they only put text username not <label> and the input is named "acct" (without even the common decency to include autocomplete=username)
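The claim is easy to check against the markup quoted above. Below is a small sketch using only Python's standard library `html.parser`; the `FormInspector` class is a made-up helper, and the HTML string is the snippet from this comment rather than a live fetch of the page.

```python
from html.parser import HTMLParser

# The login table exactly as quoted above.
LOGIN_FORM = (
    '<table border="0"><tr><td>username:</td><td><input type="text" name="acct" '
    'size="20" autocorrect="off" spellcheck="false" autocapitalize="off" autofocus="true">'
    '</td></tr><tr><td>password:</td><td><input type="password" name="pw" size="20">'
    '</td></tr></table><br>'
)


class FormInspector(HTMLParser):
    """Collects every start tag and the name attribute of each <input>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.input_names = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "input":
            self.input_names.append(dict(attrs).get("name"))


inspector = FormInspector()
inspector.feed(LOGIN_FORM)
inspector.close()

has_label = "label" in inspector.tags
print(has_label)              # prints False
print(inspector.input_names)  # prints ['acct', 'pw']
```

A locator that relies on `<label>` association has nothing to match here, while a selector on the `name` attribute (`acct`) would.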

So if your script really did write that string into what it thinks is 'username' then that's arguably one more thing to debug when its wizardry goes awry in some unknown way

mherrmann · on Dec 11, 2024

Yes, there is no such <label>. There is only such <td>. And Playwright isn't smart enough to understand that that is still the label for the <input> element. Helium is smart enough.

> So if your script really did write that string into what it thinks is 'username' then that's arguably one more thing to debug when its wizardry goes awry in some unknown way

I tested my script. It writes into the correct field. The logic is not hard: "Find an element to the right of the given label." If there are multiple, then Helium uses the one that's closest to the last element it interacted with. That's just how a human would do it. It works surprisingly well. In many years of using Helium, I barely recall this causing problems once.
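To make the described logic concrete, here is a minimal sketch of the heuristic working on bounding boxes. This is not Helium's actual implementation: the `Element` type, the coordinates, and the fallback of taking the topmost match when nothing has been interacted with yet are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Element:
    kind: str      # "text" for visible labels, "input" for form fields
    name: str      # visible text for labels, field name for inputs
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def is_right_of(candidate, label):
    # The candidate starts at or after the label's right edge and shares its row.
    starts_after = candidate.x >= label.x + label.width
    same_row = (candidate.y < label.y + label.height
                and label.y < candidate.y + candidate.height)
    return starts_after and same_row


def find_field(elements, label_text, last_used=None):
    labels = [e for e in elements
              if e.kind == "text" and e.name.rstrip(":").lower() == label_text.lower()]
    candidates = [e for e in elements
                  if e.kind == "input" and any(is_right_of(e, lab) for lab in labels)]
    if not candidates:
        raise LookupError(f"no input found to the right of {label_text!r}")
    if last_used is None:
        # Assumption: with no previous interaction, prefer the topmost match.
        return min(candidates, key=lambda e: (e.y, e.x))
    lx, ly = last_used.center
    # Several matches: take the one closest to the last element interacted with.
    return min(candidates, key=lambda e: hypot(e.center[0] - lx, e.center[1] - ly))


# A page with two forms that both use the labels "username:" and "password:".
login_user = Element("input", "acct", 90, 10, 150, 20)
login_pw = Element("input", "pw", 90, 40, 150, 20)
signup_user = Element("input", "acct", 90, 120, 150, 20)
signup_pw = Element("input", "pw", 90, 150, 150, 20)
elements = [
    Element("text", "username:", 0, 10, 80, 20), login_user,
    Element("text", "password:", 0, 40, 80, 20), login_pw,
    Element("text", "username:", 0, 120, 80, 20), signup_user,
    Element("text", "password:", 0, 150, 80, 20), signup_pw,
]

first = find_field(elements, "username")
next_field = find_field(elements, "password", last_used=signup_user)
```

With this layout, `first` is the login form's username field, and `next_field` is the sign-up form's password field, because it is the closest match to the field that was used last.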

Try it before judging. You will be surprised by how well it works.

mdaniel · on Dec 11, 2024

> And Playwright isn't smart enough to understand that that is still the label for the <input> element. Helium is smart enough.

I'm glad you like your project, and I'm sure there are others who will similarly enjoy that kind of magick. However, it's super disingenuous to write an example that asks a standards based API to find a non-existent element and then clutch pearls because it didn't find a non-existent element. page.get_by_label("I dunno, I didn't read, do what I am thinking").fill('lol') similarly would not work but that's not the awesome dunk you think it is

mherrmann · on Dec 11, 2024

It's not disingenuous. The HN example was literally the first one I tried. It's what happens in the real world. The real world doesn't adhere to standards, much of the time.

bdcravens · on Dec 11, 2024

Just like what you've done with Selenium, such a wrapper could be written for Playwright (I think that's what most developers end up doing anyways, just in a more domain-specific manner)

TimTheTinker · on Dec 11, 2024

This project was started long before Playwright existed.

It's an OSS tool that had very good reason to be made the way it was at that time, and continues to be useful (in my opinion).

bdcravens · on Dec 11, 2024

It is a useful tool, similar to others that accomplish the same task (WATIR and Capybara in Ruby, for example). My point was that the comparison of a wrapper to an underlying library is a bit apples to oranges, as a similar wrapper could be written for Playwright as well. I haven't looked at the code, but I assume Helium's API could be used to support Playwright (which itself was an evolution of Puppeteer).

vunderba · on Dec 11, 2024

I like what you're doing with Helium, and while you are technically correct that it's half as long - IMHO it's a bit disingenuous considering that in any meaningful web automation script, you'd only need to put in the initialization code a single time, e.g.:

    from playwright.sync_api import sync_playwright
    playwright = sync_playwright().start()
    browser = playwright.chromium.launch()
    page = browser.new_page()

fermigier · on Dec 11, 2024

"We shut down the company at the end of 2019 and I felt it would be a shame if Helium simply disappeared from the face of the earth."

I appreciate the effort. Thank you M. Herrmann.

giis · on Dec 11, 2024

Looks nice. Is it possible to call start_chrome() with a specific Chrome browser profile name, or to re-use an existing open Firefox/Chrome browser session and launch a new tab with a specific domain?

mherrmann · on Dec 11, 2024

I don't know. Please check if Selenium supports this and if yes, use Helium's set_driver(...) or options argument to start_chrome(...).

bilater · on Dec 11, 2024

Nice - I can see some cool agentic flows created using this. A thing I want to look into is creating a sandbox instance (Ubuntu?) and letting an agent do its thing. Could be collecting data or answering questions and I can pull up the window to check in from time to time. It'll be like having an assistant.

bryanrasmussen · on Dec 11, 2024

How easy is it to detect that this is automation as opposed to a real user? I suppose probably pretty easy, so not sure if it is useful if I want to automate the web for things I do every day as I would really be running the risk of turning off access to those things if they determined I am automating them.

bdcravens · on Dec 11, 2024

This is a wrapper on top of Selenium, so unless the library implements additional techniques to improve stealth, it's on par with Selenium's detectability (which as you pointed out can be detected easily enough)

edm0nd · on Dec 11, 2024

Very neat!

Roll in a captcha solving service like DeathByCaptcha or AntiCaptcha and you've got yourself a quick and easy script that can do anything on any website regardless of captchas.

quickvi · on Dec 11, 2024

for lightweight automation outside the browser:

https://github.com/elyase/screenium

oulipo · on Dec 11, 2024

Very cool! Could be a kind of open-source, text-based (eg recipes are .md with instructions) version of KeyboardMaestro!

I'd love to see such an "open automation" format (could even be more general than pure software, could also automate your IoT or whatever, through extensions)

eg you could have a file "Type my bank login password" for bank websites which don't let you use keyboard input but force you to click on stuff, like a self-documented script using .md with code

    # Type my bank login password
    
    ## Trigger
    ```trigger:hotkey
    key: cmd+l
    filter: frontmost-app=Chrome and chrome.tab.url=~mybank.com/login
    ```
    
    ## Deps
    ```ensure-deps
    shell-runner>=1.*
    screen-ocr>=1.*
    python-runner>=1.*
    ```
    Ensure that my system has the proper extensions for the framework, to run all tasks
    
    ## What it does
    This automation lets me input my password in a "click-only" input for my lousy bank UI
    
    ```run:shell /bin/sh:capture-output=password
    echo $(op --vault personal --site mybank)
    ```
    (the above runs the shell script and captures the output as a "password" variable I can use in other scripts below)
    
    ```run:screen-ocr:capture-output=ocr-result
    window:chrome
    ```
    
    ...go on scripting using typescript/python to locate the numbers in the ocr-result

okso · on Dec 11, 2024

macOS only (uses Apple Vision framework)

erikcw · on Dec 11, 2024

I've used SikuliX[0] in the past for similar purposes. Unfortunately the author hasn't had much time to maintain it recently.

[0] https://github.com/RaiMan/SikuliX1

Havoc · on Dec 12, 2024

That looks useful. How does it know which box is the user field? Just read label and assume the one below that or to the right of the label?

mherrmann · on Dec 12, 2024

Pretty much, yes. And if there are multiple, then it uses the matching element closest to the last one it interacted with. Much like a human.

slt2021 · on Dec 11, 2024

Thank you for sharing this project, this is really good

bg24 · on Dec 11, 2024

Nice work! I looked at the cheatsheet, and it is not obvious to me how to go through two factor authentication during login.

mherrmann · on Dec 11, 2024

Thanks! Helium only automates browsers. If the 2FA is happening in the browser, then you can use Helium to automate the flow. If it's outside, then that part cannot be handled by Helium.

__mharrison__ · on Dec 11, 2024

Thanks for posting. All this AI has gotten me interested in scraping personal sites.

mherrmann · on Dec 11, 2024

I have actually been wondering whether Helium's more high-level API lends itself well for use by AI.

grantc · on Dec 11, 2024

This. Seems like you could wedge this and a model into a scrappy version of computer use for browsers.

Fwiw, thanks for contributing this. It seems apt for a number of repetitive things I probably do dozens of times a week and don't even notice as cruft anymore.

I'm not sure why there were such hot takes on what this is or isn't. Maybe Big Selenium crisis actors? You made something cool, you shared it w/ world -- that should be the system prompt for people posting about it in my kinder world of things.

crazymoka · on Dec 11, 2024

Can it be headless?

mherrmann · on Dec 11, 2024

Yes: start_chrome(headless=True)

Byte64 · on Dec 11, 2024

This is so cool