Selenium project founder here. (Hi!) Thanks for all your work on this project. Lots of negativity around here these days, but just wanted to say thanks. The functional style of Helium's API reminds me a lot of Selenium's original API when it was 100% JavaScript (aka Selenium 1 aka Selenium Core) back in 2004.
(Functional style: "method(thing)"
vs
object oriented style: "thing.method()")
We mostly abandoned the functional style when we merged with the WebDriver project (aka Selenium 2), but that functional style still lives on in the Selenium IDE record/playback tool.
That is all to say, there are fans of many different styles for automation APIs. No single API will please everyone. (But I personally like the simpler, functional style, fwiw!)
Side-note: This is also why I'm a fan of the Nim programming language. "method(thing)" and "thing.method" are supported syntax for literally the same thing. For others new to the idea, the fancy term for this is "Uniform Function Call Syntax".
UFCS is great. I really wish more languages supported something similar, although pipe operators (thing |> function1 |> function2) and Rust's proposal for thing.(function) also seem to satisfy the syntactic ideal.
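For anyone who wants to see the two call styles side by side, here is a minimal Python sketch. `Button`, `click`, and `pipe` are hypothetical names made up for illustration; none of this is Helium's or Selenium's API. Python has no real UFCS or `|>` operator, so `pipe` is only a rough stand-in.

```python
from functools import reduce


class Button:
    """Hypothetical page element, used only for illustration."""

    def __init__(self, label):
        self.label = label
        self.clicks = 0

    def click(self):
        # Object-oriented style: thing.method()
        self.clicks += 1
        return self


def click(thing):
    # Functional style: method(thing). Delegates to the same behavior.
    return thing.click()


def pipe(value, *functions):
    # Rough stand-in for `value |> f1 |> f2`: apply each function in turn.
    return reduce(lambda acc, fn: fn(acc), functions, value)


button = Button("Compose")
button.click()               # thing.method()
click(button)                # method(thing)
pipe(button, click, click)   # thing |> click |> click
print(button.clicks)         # prints 4
```

All three spellings end up in the same `Button.click` code; only the call syntax differs, which is the point being made about style preference.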
Importing * is universally discouraged by most Python linters and best practice docs. You can always "import helium as h" if you're looking to type less.
This looks largely like common workarounds that most people will write using Python-based browser automation. Most of the time, we accept that those capabilities aren't there by default because they are not explicit enough and can result in bugs and undefined behavior even when the elements that we expect to be on the page are actually there.
Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability. When we get into the nitty-gritty of browser automation, it might just make it harder to debug than going straight to Selenium or Playwright.
> Importing * is universally discouraged by most Python linters and best practice docs.
Yup, I would never do it in a .py file. But I do it all of the time in the interpreter, which is what the video shows.
> This looks largely like common workarounds that most people will write using Python-based browser automation. Most of the time, we accept that those capabilities aren't there by default because they are not explicit enough and can result in bugs and undefined behavior even when the elements that we expect to be on the page are actually there.
It sounds like you haven't tried Helium yet. I think you should, and see for yourself whether the trade-off you talk about actually exists.
> Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability.
You could make the same argument about using C / assembly instead of Python. I suggest you try Helium before making statements about the "trouble" it may create. I believe you will find that there is no trouble.
Having done some ad-hoc, temporary automation with Selenium in the past (to help fellow, less technically-inclined designers) I wish I had this at the time.
Looks like a nice, almost natural language-like API around what is otherwise a quite cumbersome API.
How can a wrapper around selenium be lighter than it?
A wrapper around an API is by definition heavier (more code, more functions) than using the lower level api.
It’s not using fewer resources.
It’s not faster (it has implicit waiting).
It’s not less code; it’s literally a superset of selenium?
Feels like a “selenium framework” is more accurate than light weight web automation?
Anyway, there’s no fixing automation tests with fancy APIs.
No matter what you try to do, if people are only interested in writing quick dirty scripts, you’re doomed to a pile of stupid spaghetti no matter what system or framework you have.
If you want sustainable automation, you have to do Real Software Engineering and write actual composable modules; and you can do that in anything, even raw selenium.
So… I’d be more interested if this was pitched as “composable lego for building automation” …
…but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.
That’s nice for getting started; but getting started is not the problem with automation tests.
Its use can be lighter. That is, the wrapper can be easier to use.
Helium helps with maintaining automation tests as well. click("Compose") is infinitely more maintainable than document.getElementById("eIu7Db").click(). (I just took this example from Gmail's web interface.)
That's just some superficial changes that often lead to confusion and other negative consequences down the road, especially when not handled carefully.
I would much rather rely directly on Selenium's stable APIs than on someone else's wrapped APIs that are opinionated and could be incomplete, incorrect, outdated and potentially unmaintained someday. There are always many more resources put into Selenium than into these add-ons.
If I really want, I can choose a few APIs that I actually use and wrap them within my codebase. That's more reliable than this.
How do you compose low level operations like “click here” into composable modules like:
loginAsUser(user)
id = createBooking(user)
loginAsAdmin()
approveBooking(id)
?
Is it the same as selenium? Do whatever you want yourself?
That’s what I’m talking about. Unless you have high level composable modules that let you express high level test activities then your tests will always fall apart.
The syntax of the low level operations doesn’t matter because you will never ever care about a click(“compose”).
That’s not a test.
A test might be:
createEmail()
attachFile(…)
… whatever your bespoke business requirements are.
Having fancy wrappers?
Is it nicer? Sure.
Does it meaningfully improve the tests, maintaining tests?
Nope.
Because at the end of the day the low level operations will be bespoke, nasty, messy and different for each website; that’s why you wrap them up in functions and compose them.
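A rough sketch of the kind of composition described above, written in Python with snake_case versions of the function names. `FakeBrowser` and everything in it are assumptions for illustration; it only records actions, so the example runs without a real browser or any particular automation library.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Stand-in for a real driver; records actions instead of touching a page."""
    log: list = field(default_factory=list)

    def click(self, target):
        self.log.append(("click", target))

    def write(self, text, into):
        self.log.append(("write", into, text))


def login_as_user(browser, user):
    # The site-specific, messy details live here and nowhere else.
    browser.write(user, into="Username")
    browser.write("secret", into="Password")  # placeholder credential
    browser.click("Log in")


def create_booking(browser, user):
    browser.click("New booking")
    browser.click("Confirm")
    # A real implementation would read the id back from the page.
    return f"booking-for-{user}"


def login_as_admin(browser):
    login_as_user(browser, "admin")


def approve_booking(browser, booking_id):
    browser.click(booking_id)
    browser.click("Approve")


def booking_flow(browser):
    # The flow reads as business steps; no selectors appear at this level.
    login_as_user(browser, "alice")
    booking_id = create_booking(browser, "alice")
    login_as_admin(browser)
    approve_booking(browser, booking_id)
    return booking_id


browser = FakeBrowser()
booking_id = booking_flow(browser)
print(booking_id, len(browser.log))
```

Whether the low-level calls inside each function are raw Selenium, Playwright, or a wrapper is invisible to `booking_flow`, which is the argument being made here.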
At least, in my experience; this looks a lot like cypress; a high level set of operations with sensible defaults for easy tasks.
…but, practically, I’m skeptical that hiding the low level nasty details actually makes them go away; it’s smoothing them over for the “happy path”; but automation tests are like 90% edge cases.
> Its use can be lighter
I don’t think that’s the generally accepted meaning of a light weight framework.
> but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.
“Lighter” may be used as an alternative adjective to the word easy or easier. Your post, which comes off as very rude, misses the point of how the project is marketed.
At least the OP did not call it Python automation for humans …
> “Lighter” may be used as an alternative adjective to the word easy or easier
That is, again, not common usage; there’s a word for easier to use, and it’s “easier”. But whatever. It doesn’t matter; it’s just branding.
My point, however, is that making easy-to-use frameworks for test automation is fundamentally misguided, and responses like “try it, you’ll be amazed it makes all the problems go away” are the type of “drinking kool aid” that displays a deep lack of understanding of the problem space.
Doing easy things does not solve doing hard things; not here. Not in go. Not in rust. Not ever.
So, my point was (and is):
How does this address doing hard things? As someone who is familiar with this space and has tried it, I can’t see anything that helps with the hard things, and no one who is heavily invested in automation really cares about doing easy things.
We can already do easy things
Another way of doing easy things is like using prettier or not; it’s a style preference.
So, is that what this is?
Selenium with a function calling style preference, or something that actually helps with building automation?
There’s nothing wrong with making tools that make superficial cosmetic changes to the way you do things.
…but, that’s not how the project is marketed; as, at least, I’ve understood it.
as they only put text username not <label> and the input is named "acct" (without even the common decency to include autocomplete=username)
So if your script really did write that string into what it thinks is 'username' then that's arguably one more thing to debug when its wizardry goes awry in some unknown way
Yes, there is no such <label>. There is only such <td>. And Playwright isn't smart enough to understand that that is still the label for the <input> element. Helium is smart enough.
> So if your script really did write that string into what it thinks is 'username' then that's arguably one more thing to debug when its wizardry goes awry in some unknown way
I tested my script. It writes into the correct field. The logic is not hard: "Find an element to the right of the given label." If there are multiple, then Helium uses the one that's closest to the last element it interacted with. That's just how a human would do it. It works surprisingly well. In many years of using Helium, I barely recall this causing problems once.
Try it before judging. You will be surprised by how well it works.
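To make the logic described above concrete, here is a minimal sketch of a "field to the right of the label, nearest to the last interaction" heuristic. It is an illustration based only on that description, not Helium's actual implementation; `Element`, `find_field`, `row_tolerance`, and the coordinates are all made up.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Element:
    name: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels


def find_field(label_text, labels, fields, last_interacted=None, row_tolerance=10):
    matches = []
    for label in labels:
        if label.name != label_text:
            continue
        # Fields on roughly the same row as this label and to its right.
        same_row = [f for f in fields
                    if f.x > label.x and abs(f.y - label.y) <= row_tolerance]
        if same_row:
            # Keep the nearest field to the right of this label.
            matches.append(min(same_row, key=lambda f: f.x - label.x))
    if not matches:
        return None
    if last_interacted is None or len(matches) == 1:
        return matches[0]
    # Several labels matched: prefer the field closest to the last element used.
    return min(matches, key=lambda f: hypot(f.x - last_interacted.x,
                                            f.y - last_interacted.y))


# A page with two forms that both show a plain-text "username" label.
labels = [Element("username", 10, 50), Element("username", 10, 200)]
fields = [Element("login-acct", 90, 50),
          Element("login-pw", 90, 75),
          Element("create-acct", 90, 200)]

print(find_field("username", labels, fields).name)
last = Element("create-account-heading", 10, 170)
print(find_field("username", labels, fields, last_interacted=last).name)
```

With no history the sketch falls back to the first match in document order; with a recent interaction near the second form it picks that form's field instead.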
> And Playwright isn't smart enough to understand that that is still the label for the <input> element. Helium is smart enough.
I'm glad you like your project, and I'm sure there are others who will similarly enjoy that kind of magick. However, it's super disingenuous to write an example that asks a standards based API to find a non-existent element and then clutch pearls because it didn't find a non-existent element. page.get_by_label("I dunno, I didn't read, do what I am thinking").fill('lol') similarly would not work but that's not the awesome dunk you think it is
It's not disingenuous. The HN example was literally the first one I tried. It's what happens in the real world. The real world doesn't adhere to standards, much of the time.
Just like what you've done with Selenium, such a wrapper could be written for Playwright (I think that's what most developers end up doing anyways, just in a more domain-specific manner)
It is a useful tool, similar to others that accomplish the same task (WATIR and Capybara in Ruby, for example). My point was that the comparison of a wrapper to an underlying library is a bit apples to oranges, as a similar wrapper could be written for Playwright as well. I haven't looked at the code, but I assume Helium's API could be used to support Playwright (which itself was an evolution of Puppeteer).
I like what you're doing with Helium, and while you are technically correct that it's half as long - IMHO it's a bit disingenuous considering that in any meaningful web automation script, you'd only need to put in the initialization code a single time, e.g.:
Looks nice. Is it possible to call start_chrome() with a specific Chrome profile name, or to re-use an existing open Firefox/Chrome session and launch a new tab with a specific domain?
Nice - I can see some cool agentic flows created using this. A thing I want to look into is creating a sandbox instance (Ubuntu?) and letting an agent do its thing. Could be collecting data or answering questions and I can pull up the window to check in from time to time. It'll be like having an assistant.
How easy is it to detect that this is automation as opposed to a real user? I suppose probably pretty easy, so not sure if it is useful if I want to automate the web for things I do every day as I would really be running the risk of turning off access to those things if they determined I am automating them.
This is a wrapper on top of Selenium, so unless the library implements additional techniques to improve stealth, it's on par with Selenium's detectability (which as you pointed out can be detected easily enough)
Roll in a captcha solving service like DeathByCaptcha or AntiCaptcha and you've got yourself a quick and easy script that can do anything on any website regardless of captchas.
Very cool! Could be a kind of open-source, text-based (eg recipes are .md with instructions) version of KeyboardMaestro!
I'd love to see such an "open automation" format (could even be more general than pure software, could also automate your IoT or whatever, through extensions)
eg you could have a file "Type my bank login password" for bank websites that don't let you use keyboard input but force you to click on stuff, like a self-documented script using .md with code
# Type my bank login password
## Trigger
```trigger:hotkey
key: cmd+l
filter: frontmost-app=Chrome and chrome.tab.url=~mybank.com/login
```
## Deps
```ensure-deps
shell-runner>=1.*
screen-ocr>=1.*
python-runner>=1.*
```
Ensure that my system has the proper extensions for the framework, to run all tasks
## What it does
This automation lets me input my password in a "click-only" input for my lousy bank UI
```run:shell /bin/sh:capture-output=password
echo $(op --vault personal --site mybank)
```
(the above runs the shell script and captures the output as a "password" variable I can use in other scripts below)
```run:screen-ocr:capture-output=ocr-result
window:chrome
```
...go on scripting using typescript/python to locate the numbers in the ocr-result
Thanks! Helium only automates browsers. If the 2FA is happening in the browser, then you can use Helium to automate the flow. If it's outside, then that part cannot be handled by Helium.
This. Seems like you could wedge this and a model into a scrappy version of computer use for browsers.
Fwiw, thanks for contributing this. It seems apt for a number of repetitive things I probably do dozens of times a week and don't even notice as cruft anymore.
I'm not sure why there were such hot takes on what this is or isn't. Maybe Big Selenium crisis actors? You made something cool, you shared it w/ world -- that should be the system prompt for people posting about it in my kinder world of things.