Here's 12 Sysadmin/DevOps (they're synonyms now!) challenges, straight from the ...

cobertos · 2025-12-01T04:59:12 1764565152

Re: 6. ... Github Actions

Github Actions left a bad taste in my mouth after having it randomly removed authenticated workers from the pool, after their offline for ~5 days.

This was after setting up a relatively complex PR workflow (always on cheap server starts up very expensive build server with specific hardware) only to have it break randomly after a PR didn't come in for a few days. And no indication that this happens, and no workaround from GitHub.

There are better solutions for CI, GitHub 's is half baked.

paulddraper · 2025-12-01T15:23:45 1764602625

This is documented currently (supposed to be 14 days). [1]

That said, I have found runners to be unnecessarily difficult.

But Jenkins and its own quirks, and when I used GitLab, it used ancient docker-machine and outdated AMIs by default.

I think Buildkite has been the only one to make this easy and scalable. But it is meant for self hosted runners.

[1] https://docs.github.com/en/enterprise-cloud@latest/actions/h...

cnunciato · 2025-12-03T01:07:08 1764724028

Buildkite also has hosted runners (which they all agents): https://buildkite.com/docs/pipelines/hosted-agents

paulddraper · 2025-12-03T21:59:22 1764799162

It does, but that came second.

It was originally (and still usually) used by those who wanted to self-host runners.

swyx · 2025-12-01T05:11:37 1764565897

bugs happen to all of us. whats your better solution - gitlab?

shoo · 2025-12-01T07:44:24 1764575064

Roll 2d6, sum result. Your CI migration target is:

  2. migrate secret manager. Roll again
  3. cloud build
  4. gocd
  5. jenkins
  6. gitlab
  7. github actions
  8. bamboo
  9. codepipeline
  10. buildbot
  11. team foundation server
  12. migrate version control. Roll again

swyx · 2025-12-01T08:28:49 1764577729

somehow i am really liking the kind of people that comment in the comment sections of sysadmin posts. i wonder what personality type this is

n4bz0r · 2025-12-01T09:25:31 1764581131

Sysadmin.

speakspokespok · 2025-12-01T10:34:41 1764585281

SysEng

mroche · 2025-12-01T18:58:39 1764615519

Bump up to 2d10 and add:

    - Travis
    - CircleCI
    - Drone/Woodpecker
    - Tekton Pipelines
    - TeamCity
    - Zuul
    - Buildkite
    - Agola

flyer23 · 2025-12-01T23:56:05 1764633365

IBM ClearCase anyone? Noone? I AM old

esseph · 2025-12-01T06:35:20 1764570920

GitLab pipelines are really good.

Balinares · 2025-12-01T07:30:19 1764574219

Not in love with its insistence on recreating the container from scratch every step of the pipeline, among a bundle of other irksome quirks. There are certainly worse choices, though.

friendzis · 2025-12-01T13:02:38 1764594158

Opposite of Jenkins where you have shared workspaces and have to manually ensure workspace is clean or suffer from reproducibility issues with tainted workspaces.

maratc · 2025-12-01T19:11:45 1764616305

It's up to you whether you have a shared workspace or not. My machines/pods are destroyed and recreated after each job, so I never had this issue.

esseph · 2025-12-02T00:08:58 1764634138

You don't actually have to.

If you use the built in container registry and build artifacts, you can pass between steps.

Balinares · 2025-12-03T20:05:30 1764792330

I'm aware, but thank you. Unfortunately, given sufficiently large artifacts, the overhead of packaging, uploading, downloading and unpacking them at every step becomes prohibitive.

sharts · 2025-12-01T05:24:17 1764566657

honestly jenkins really isnt that bad

friendzis · 2025-12-01T14:42:26 1764600146

Hudson/Jenkins is just not architected for large, multi-project deployments, isolated environments and specialized nodes. It can work if you do not need these features, but otherwise it's fight against the environment.

You need a beefy master and it is your single point of failure. Untimely triggers of heavy jobs overwhelm controller? All projects are down. Jobs need to be carefully crafted to be resumable at all.

Heavy reliance on master means that even sending out webhooks on stage status changes is extremely error prone.

When your jobs require certain tools to be available you are expected to package those as part of agent deployment as Jenkins relies on host tools. In reality you end up rolling your own tool management system that every job has to call in some canonical manner.

There is no built in way to isolate environments. You can harden the system a bit with various ACLs, but in the end if you either have to trust projects or build up and maintain infrastructures for different projects isolated at host level.

In cases when time-wise significant processing happens externally, you have to block an executor.

bionsystem · 2025-12-01T08:13:43 1764576823

Yeah I was thinking of using it for us actually. Connects to everything, lots of plugins, etc. I wonder what the hate is from, they are all pretty bad aren't they ?

Will test forgejo's CI first as we'll use the repo anyway, but if it ain't for me, it's going to be jenkins I assume.

n4bz0r · 2025-12-01T09:42:36 1764582156

Cons:

  - DSL is harder to get into.
  - Hard to reproduce a setup unless builds are in DSL and Jenkins itself is in a fixed version container with everything stored in easily transferable bind volumes; config export/import isn't straightforward.
  - Builds tend to break in a really weird way when something (even external things like Gitea) updates.
  - I've had my setup broken once after updating Jenkins and not being able to update the plugins to match the newer Jenkins version.
  - Reliance on system packages instead of containerized build environment out of the box.
  - Heavier on resources than some of the alternatives.

Pros:

  - GUI is getting prettier lately for some reason.
  - Great extendability via plugins.
  - A known tool for many.
  - Can mostly be configured via GUI, including build jobs, which helps to get around things at first (but leads into the reproducibility trap later on).

Wouldn't say there is a lot of hate, but there are some pain points compared to managed Gitlab. Using managed Gitlab/Github is simply the easiest option.

Setting up your own Gitlab instance + Runners with rootless containers is not without quirks, too.

maratc · 2025-12-01T16:34:44 1764606884

CASC plugin + seed jobs keep all your jobs/configurations in files and update them as needed, and k8s + Helm charts can keep the rest of config (plugins, script approvals, nodes, ...) in a manageable file-based state as well.

We have our main node in a state that we can move it anywhere in a couple of minutes with almost no downtime.

I'll add another point to "Pros": Jenkins is FOSS and it costs $0 per developer per month.

bionsystem · 2025-12-01T11:01:46 1764586906

I have a previous experience with it. I agree with most points. Jobs can be downloaded as xml config and thus kept/versioned. But the rest is valid. I just don't want to manage gitlab, we already have it at corp level, just can't use it right now in preprod/prod and I need something which will be either throwaway or kept just for very specific tasks that shouldn't move much in the long run.

n4bz0r · 2025-12-01T11:54:52 1764590092

For a throwaway, I don't think Jenkins will be much of a problem. Or any other tool for that matter. My only suggestion would be to still put some extra effort into building your own Jenkins container on top of the official one [0]. Add all the packages and plugins you might need to your image, so you can easily move and modify the installation, as well as simply see what all the dependencies are. Did a throwaway, non-containerized Jenkins installation once which ended up not being a throwaway. Couldn't move it into containers (or anywhere for that matter) without really digging in.

Haven't spent a lot of time with it myself, but if Jenkins isn't of much appeal, Drone [1] seems to be another popular (and lightweight) alternative.

[0] https://hub.docker.com/_/jenkins/

[1] https://www.drone.io

jagged-chisel · 2025-12-01T04:01:26 1764561686

> … from Jenkins to GitHub Actions.

Oh, good lord why?

0xbadcafebee · 2025-12-01T16:07:36 1764605256

Many, many reasons... the most important of which is, Jenkins is a constant security nightmare and a maintenance headache. But also it's much harder to manage a bunch of random Jenkins servers than GHA. Authentication, authorization, access control, configuration, job execution, networking, etc. Then there's the configuration of things like env vars and secrets, environments, etc that can also scale better. I agree GHA kinda sucks as a user tool, but as a sysadmin Jenkins will suck the life out of you and sap your time and energy that can go towards more important [to the company] tasks.

maratc · 2025-12-01T17:47:31 1764611251

I really scratch my head when I read your comment, as nothing of this is a real issue in my Jenkins.

> bunch of random Jenkins servers

Either PXE boot from an image, or k8s from an image, have a machine or pod rebooted/destroyed after one job. Update your image once a month, or have a Jenkins job to do that for you.

> Authentication, authorization, access control

Either use LDAP or Login via Github, and Matrix security plugin. Put all "Devops" group into admins, the rest into users, never touch it again.

> configuration

CASC plugin and seed for jobs, and/or Helm for just about everything else.

> env vars and secrets

Pull everything from Vault with Vault plugin.

> as a sysadmin Jenkins will suck the life out of you

I spend about 1-2 hours a week managing Jenkins itself, and the rest of the week watching the jobs or developing new ones.

0xbadcafebee · 2025-12-03T01:29:26 1764725366

Well one issue is, CasC isn't enough. You often have to write JobDSL to get around some limitation in CasC, and sometimes Groovy for limitations in the other two. If you want to manage access control (and you choose the correct Auth plugin, and figure out how to configure it), often you need an admin to make changes in both the Jenkins server and your backend AuthNZ system. Then there's the "seed job vs not-seed-job" weirdness that doesn't exist with GHA. And building the (hopefully containerized) Jenkins server, Jenkins build agents, etc will depend on your infrastructure provider, but still usually requires you to get your hands dirty. There are many, many more layers to the onion with Jenkins, and it's just not worth all that overhead for what should be "git clone && build && deploy" - which GHA does much simpler, right where your code lives, without you needing to maintain anything.

And this is if you get to manage it! Often there's 5 different random Jenkins servers set up by different teams, all of which are EOL and rife with security holes, and they expect you to fix them when they break, nobody version controls their configs or backs them up (they haven't even heard of CasC and have no interest in using it), and your boss says you can't say no, and also you can't upgrade them/take them over. I've seen million-dollar products which are completely dependent on over a thousand Jenkins jobs on an out-of-date Jenkins server, so complex and intertwined it couldn't be replaced.

If it were up to me, I would replace most CI with Drone.io (or Woodpecker CI if it ever gets feature parity). Now that's a dead simple CI system.

maratc · 2025-12-03T11:06:42 1764760002

My issue with GHA and other "dead simple" systems is that my CI is complicated. Having a real programming language for stuff like "calculate what date it was a week ago" or "concatenate these three strings but only under some conditions" or "parse the output and build an object out of it" is really helpful while a bastardised YAML-based Jinja template simply can't hold up.

But yeah, if all there is to do is "git clone && build && deploy" then Jenkins is an overkill and it probably wasn't warranted in the first place.

0xbadcafebee · 2025-12-03T17:19:17 1764782357

For complex logic I don't rely on the CI system; I've been burned too many times. I shell out to an external program and have it return an output variable, and I just do "if $foo = y then blah" in the CI's DSL (and I keep those tests to a minimum; rather have more separate jobs than one complex job). Often I will put everything in a dedicated build tool (Make or similar) so I can run it from my laptop or CI, and any change to logic only happens in one place. It's adding an abstraction, but the end result is I write the CI job once and never touch it again. For flexibility I add parameters to the CI job.

maratc · 2025-12-04T10:29:51 1764844191

> For complex logic I don't rely on the CI system

I do that too, until my complex logic belongs to the CI system and nowhere else.

As an example:

    It's 2 am now and Jenkins needs to run some (but not all) nightly tests[0]. To figure out which, let's bring the source code and analyse the configuration file, disregarding anything that doesn't need to run at this hour. Once we have the plan for which tests to run, let's figure out what we need to build[1]. Also, let's see what is the status of the pool running the tests[2] so we can decide on a tests parallelisation strategy for this run. When we have a plan, let's build and test, keeping an eye on the triggered tests. When all these have finished, analyse the failures[3], create tickets for the failures[4] and prepare a report to be sent.

I wouldn't be able to express all this in YAML.

[0] other "nightly tests" run at 1, 3, 4 etc.

[1] this is mapped in the configuration file too.

[2] this is internal to Jenkins

[3] same

[4] this involves finding the "responsible person," so a lot of API calls

0xbadcafebee · 2025-12-04T19:51:06 1764877866

Sounds like you're using cron as a complex job queue! A lot of teams get there eventually, and either 1) keep hacking on cron/jenkins/etc to make this work, 2) invent their own queueing tool (NIH syndrome; been done many times before, there is nothing new to make here), or 3) use a purpose-built solution for this. Airflow is the old-and-busted solution; the new hotness is newer generations of the same concept (Prefect/Dagster, Luigi, Temporal). But often sticking to your existing thing is cheaper; depends how much custom engineering you want to invest.

Fwiw, I do believe you can do this in GHA, but you may need to call their API from your workflow. In addition, their replacement for Groovy is to run an action which lets you embed Javascript/Typescript and call their SDK. It sucks, but so does Groovy! ;-)

vachina · 2025-12-01T04:30:21 1764563421

Because sysadmim wants to outsource their responsibilities (and job).

n4bz0r · 2025-12-01T09:59:34 1764583174

> Sysadmin/DevOps (they're synonyms now!)

I've notified the authorities and social services.

betaby · 2025-12-01T04:23:20 1764563000

5. and 6. are a matter of taste (trade-offs), the rest is spot on!

daemonologist · 2025-12-01T04:57:03 1764565023

You get me the permissions to do half of this stuff, and I'll do whatever you want.

Waterluvian · 2025-12-01T17:58:04 1764611884

Here’s the first step to all of these that I often see sysadmins stumbling on: communicate in written, non-abstract terms why each of these matter.

Most are obvious to most people. None are obvious to everybody.

Nextgrid · 2025-12-01T11:20:39 1764588039

> Get a user to stop logging in as root.

It really depends if the machine is hosting anything that you don't want some users to access. If the machine is single-purpose and any user is already able to access everything valuable from it (DB with customer data, etc) or trivially elevate to root (via sudo, docker access, etc) then it's just pointless extra typing and security theatre.

panzagl · 2025-12-01T15:46:03 1764603963

I guess no one ever audits your servers.

f1shy · 2025-12-01T11:00:50 1764586850

>> Sysadmin/DevOps (they're synonyms now!)

Is this really like that? Isn't there any Unix/DBA anymore? I associate DevOps to what at my time we called "operations" and "development". We had 5 teams or so:

1) Developers, who would architect and write code, 2) Operations who would deploy, monitor and address customer complaints, 3) Unix (aka SYS) administrators, who would take care of housekeeping of well, the OS (and web servers/middleware), 4) DBA who would be monitoring and optimizing Oracle/Postgres, and 5) Network admins, who would take care of Load Balancers, Routers, Switches, Firewalls (well, there were 2 security experts for that also)

So I think DevOps would be a mix of 1&2, to avoid the daily wars that would constantly happen "THEY did it wrong!"

Can somebody clear my mind, please!? It seems I was out of it for too long?!

Wilya · 2025-12-01T11:45:03 1764589503

In full-cloud environments, in small/middle companies I've worked at:

Developers handle 1). Devops handle 2)/3)/5). Nobody does 4)

f1shy · 2025-12-01T13:03:45 1764594225

Thanks. That is an interesting insight into the current reality. I assume the developers take care of optimization of queries; set up indexes and development of schemas and DB backups is handled by devops.

I must say, again I thought (I read it somewhere?) DevOps should take care of the constant battle between Devs and Operations (I've seen enough of that in my times) by merging 1 and 2 together. But it seems just a name change, and if anything, seems worst, as a (IMHO) critical and central component, like the DB, now has totally distributed responsibilities. I would like to know what happens when e.g. a DB crashes because a filesystem is full, "because one developer made another index, because one from devops had a complaint because X was too slow".

Either the people are extremely more professional that in my times, or it must be a shitshow to look while eating pop-corn.

friendzis · 2025-12-01T14:49:29 1764600569

> DevOps should take care of the constant battle between Devs and Operations

In practice there is no way to relay "query fubar, fix" back, because we are much agile, very scrum: feature is done when the ticket is closed, new tickets are handled by product owners. Reality is antithesis of that double Ouroboros.

In practice developers write code, devops deploy "teh clouds" (writing yamls is the deving part) and we throw moar servers at some cloud db when performance becomes sub-par.

sgarland · 2025-12-01T12:34:50 1764592490

Nobody does 4 until they’ve had multiple large incidents involving DBs, or the spend gets hilariously out of control.

Then they hire DBREs because they think DBA sounds antiquated, who then enter a hellscape of knowing exactly what the root issues are (poorly-designed schemata, unperformant queries, and applications without proper backoff and graceful degradation), and being utterly unable to convince management of this (“what if we switched to $SOME_DBAAS? That would fix it, right?”).

avhception · 2025-12-01T12:04:40 1764590680

Can confirm: that's exactly what we do.

rtp4me · 2025-12-01T15:08:13 1764601693

For 4) - consider PGHero[1] and PGTuner[2] instead of a full-time DBA. We use both in production and they work very well to help track down performance issues with Postgres.

[1] https://github.com/ankane/pghero

[2] https://pgtune.leopard.in.ua/

Edit: For the record, I have worked at a few small companies as the "SysAdmin" guy who did the whole compliment of servers, OS, storage, networking, VMs, DB, perf tuning, etc.

technion · 2025-12-01T08:05:04 1764576304

I know its a common view that sysadmin/devops are the same these days, but witha current sysadmin role nothing youve mentioned sounds relevant. Let's give you my list:

1. Patch Microsoft exchange with only a three hour outage window 2. Train a user to use onedrive instead of emailing 50mb files and back and forth 3. Setup eight printers for six users. Deal with 9gb printer drivers. 4. Ask an exec if he would please let you add mfa to their mailbox. 5. Sit there calmly while that exec yells like a wwe wrestler about the ways he plans to ruin you in response 6. Debate the cost of a custom mouse pad for one person across three meetings 7. Deploy any standard windows app that expects everyone be an administrator without making everyone an administrator 8. Deploy an app that expects uac disabled without disabling uac 9. Debug some finance persons 9000 line excel function

hnlmorg · 2025-12-01T09:17:09 1764580629

That sounds more like Desktop Support than a SysAdmin role. My condolences if that's the job you landed when interviewing for a SysAdmin role

0xbadcafebee · 2025-12-01T16:25:28 1764606328

I used to have that job, but my title wasn't Sysadmin, it was IT Manager. For companies small enough that they don't have multiple roles, you do both... but for larger companies, the user-side stuff is done by IT, and the server-side stuff is done by a Sysadmin. (And my condolences; having done that combined role, it's not easy, and you don't get paid enough!)

hansmayer · 2025-12-01T08:53:55 1764579235

What you describe sounds more like a MS "Modern Workplace" / IT support in a corporate environment.

technion · 2025-12-01T10:15:42 1764584142

Are we arguing that corporate workers arent "real sysadmins"?

jagged-chisel · 2025-12-01T10:48:48 1764586128

Pretty sure they mean “general IT support isn’t sysadmin work.”

jabroni_salad · 2025-12-01T15:09:11 1764601751

HN culture as a whole doesnt really recognize the validity of business that buy software vs build software.

fragmede · 2025-12-02T13:05:44 1764680744

This HN?

> you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

https://news.ycombinator.com/item?id=8863

hansmayer · 2025-12-02T12:54:40 1764680080

Not really - there are lot of "real" sysadmins working with bought software such as RHEL and AAP...

hansmayer · 2025-12-02T12:52:28 1764679948

No. There are plenty of corporate sysadmins. I am arguing that MS Workplace Sysadmins are not the ones this advent was meant for.

Xiol · 2025-12-01T09:49:55 1764582595

i.e., Hell

hansmayer · 2025-12-02T12:53:08 1764679988

That is hands down the most concise description, yes.

dessimus · 2025-12-04T01:37:58 1764812278

>4. Ask an exec if he would please let you add mfa to their mailbox.

Ask?! This is where the org's cyber insurance is your friend. Just have the executive get the provider's clearance on him not having MFA. I'm sure that line item will change his mind, and if not, be sure to accidently mention those exemptions to those yearly auditors.

stackskipton · 2025-12-01T16:59:11 1764608351

Former Exchange Admin here: 1 is easy, I used to do 70k mailboxes in middle of the day only but it requires spare hardware or virtualization with headroom.

Deploy new Server(s), patch, install Exchange, Setup DAGs, migrate everyone mailbox, swing load balancer over to new servers, uninstall Exchange from old, remove old from Active Directory, delete servers.

BTW, Upgrades now suck because Office365 uses method above so upgrade system never gets good Q&A from them.

EvanAnderson · 2025-12-01T17:38:51 1764610731

Same feeling here re: migrations being easy if the Customer isn't a cheapass. Small business Customers who had the competing requirements of spending as little money as possible and having as much uptime as possible were the stressor.

alberth · 2025-12-01T04:50:38 1764564638

I’d be super interested to see solutions to each, just to learn from.

philipwhiuk · 2025-12-01T16:18:50 1764605930

You can deploy tooling (e.g. BeyondTrust / CyberArk for 1&2), but ultimately there's a conversation and a migration plan to be done for each.

athrowaway3z · 2025-12-01T09:53:18 1764582798

  9.  Get management to give you the authority to force users to rotate their AWS access keys which are 8 years old.

Saying "keys which are 8 years old" implies you're worried about the keys themselves, which is just wrong. (Their security state depends on monitoring)

You can definitely make a strong argument that the organization needs practice rotating, so I would advise reframing it as an org-survivability-planning challenge and not a key-security issue.

DoctorOW · 2025-12-01T15:33:08 1764603188

> Get a user to use configuration management rather than scp'ing config files from their laptop to the server.

Damn, this one I'm guilty of. Though, I'm not real Sysadmin/DevOps, I'm just throwing something together and deploying it on a LAN-only VM for security reasons (I don't trust the type of code I would write)

infogulch · 2025-12-01T15:15:35 1764602135

Q: 3. Get a user to upgrade their app's dependencies to versions newer than 2010.

A: Calculate the average age in years of all dependencies calculated by: (max(most recent version release date, date of most recent CVE on library) - used version release date). Sleep for that many seconds before the app starts.

JuniperMesos · 2025-12-01T06:29:39 1764570579

A lot of these problems seem pretty solveable, if you're the admin of the machine (or cloud system) and the user isn't.

If you don't want a user to log in as root, disable the root password (or change it to something only you know) and disable root ssh. If you want people to stop sharing the same login and password across all servers, there's several ways to do it but the most straightforward one seems like it would be to enforce the use of a hardware key (yubikey or similar) for login. If people aren't using configuration management software and are leaving machines in an inconsistent state, again there are several options but I'd look into this NixOS project: https://github.com/nix-community/impermanence + some policy of rebooting the machines regularly.

If you don't like how users are making use of AWS resources and secrets, then set up AWS permissions to force them to do so the correct way. In general if someone is using a system in a bad or insecure way, then after alerting them with some lead time, deliberately break their workflow and force them to come to you in order to make progress. If the thing you suggest is actually the correct course of action for your organization, then it will be worthwhile.

philipwhiuk · 2025-12-01T11:41:20 1764589280

None of them are technically hard. All of them are bureaucracy-hard.

If you just do any of this list without the proper migration plan/time, someone senior in the org will complain and you will lose.

jakeydus · 2025-12-01T12:36:33 1764592593

> If you just do any of this […], some senior in the org will complain and you will lose.

More accurate statement imo.

skywhopper · 2025-12-01T12:33:12 1764592392

It’s not as easy as “I can technically change this”. If you think it is, you don’t understand the job of a sysadmin.

AstroJetson · 2025-12-02T03:16:22 1764645382

I think the BOFH answer would be “They ride Elevator #2 to sub-basement 3.” Plot twist, there is only sub-basement 2.

Two pints of ale please!

UltraSane · 2025-12-01T13:13:04 1764594784

Best practice is to use IP-restricted keys.