Shell and SQL make you 10x more productive than any alternative.
Nothing even comes close. I've seen people scramble for an hour to write some data munging, then spend another hour running it through a thread pool to utilize those cores, while somebody comfortable in shell writes a parallelized one-liner, rips through GBs of data, and delivers the answer in 15 minutes.
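To make that concrete, here's the shape of the thing I mean; the file layout and search string are made up:

# count ERROR lines across many compressed logs, one worker per core
find logs/ -name '*.log.gz' -print0 |
  xargs -0 -P "$(nproc)" -n1 zgrep -c 'ERROR' |
  awk '{ total += $1 } END { print total }'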
What Python is to Java, Shell is to Python. It speeds you up several times. I've started using inline 'python -c' more often than the Python REPL now, since it stores the command in shell history, where it's then one fzf search away.
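For example, something like this one-liner (the JSON file and field here are just placeholders) lands in history and is trivially recallable:

python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])' < response.json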
While neither Shell nor SQL is perfect, there have been many ideas for improving them, and people surely can't wait for something new like Oil Shell to get production-ready and finally get the shell quoting hell right, or for somebody to fix up SQL, bringing old ideas from Datalog and QUEL into it, fixing the goddamn NULL joins, etc.
But honestly, nothing else even comes close to this 10x productivity increase over the next best alternative. No, thank you, I will not rewrite my 10 lines of sh into Python only to explode it into 50 lines of shuffling clunky objects around. I'll instead go and reread the man page on how to write an if expression in bash again.
Shameless plug incoming, but this has been a pain point for me too. I found the issue with quotes (in most languages, but particularly in Bash et al.) is that the same character is used to close the quote as is used to open it. So in my own shell I added support for using parentheses as quotes in addition to the single and double quotation ASCII symbols. This then allows you to nest quotation marks.
You also don’t need to worry about quoting variables, because each variable is expanded to a single argv[] item rather than expanded into the command line and then split on spaces into new argv[] entries (or in layman’s terms, variables behave like you’d expect variables to behave).
Though Ruby makes it confusing AF because there are two quoting types for both strings and symbols, and they're different (%Q %q %W %w %i). I can never remember which does which; the letter choice feels really arbitrary.
This means that you can even quote the delimiter in the string as long as it's balanced.
$X=q( foo() )
Should work if it's balanced. If you choose a different pair like [] or {}, then you can avoid collisions. It also means that you can trivially nest quotations.
I agree that this qualified quotation is really underutilized.
I’m not a fan of Python, however that’s down to personal preference rather than objective fact. If Python solves a problem for other people then who am I to judge :)
I really don't think so! If you have experience with any scripting, you can fully grok the fundamentals of awk in 1 hour. You might not memorize all the nuances, but you can establish the fundamentals to a degree that most things you would try to achieve would take just a few minutes of brushing up.
For those that haven't taken the time yet, I think this is a good place to start:
Of course, some people do very advanced things in awk and I absolutely agree that 1 hour of study isn't going to make you a ninja, but it's absolutely enough to learn the awk programming paradigm so that when the need arises you can quickly mobilize the solution you need.
For example: if you're quick on the draw, it can take less time to write an awk one-liner to calculate the average of a column in a CSV than it does to copy the CSV into Excel and highlight the column. It's a massive productivity booster.
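Something along these lines, assuming a comma-separated file and averaging the third column (the filename is a placeholder):

# sum the 3rd field of every row, then divide by the row count
awk -F, '{ sum += $3; n++ } END { if (n) print sum / n }' data.csv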
yeah that's exactly right. it may only take an hour to learn, but every time i need to use awk it seems like i have to spend an hour re-learning its goofy syntax.
alas, this is true. I never correctly recall the order of particular function args, as they seem fairly random. Still, it beats the alternative of having to continually internalize entire fragile ecosystems to achieve the same goal.
What? The awk manual is only 827 highly technical pages[1]. If you can't read and internalize that in an hour, I suspect you're a much worse programmer than the OP.
All you need to do is learn that cmd | awk '{ print $5 }' will print the 5th field, as delimited by one or more whitespace characters. Regexes can express this too, but they're cumbersome to write on the command line.
Doing that, maybe with some inline concatenation to make a new structure, plus the following, is about all I use:
Printing based on another field; this example gets UIDs >= 1000:
awk -F: '$3 >= 1000 {print $0}' /etc/passwd
It can do plenty of magic, but knowing how to pull fields, concatenate them together, and select rows based on them covers like 99% of the things I hope to do with it.
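For instance, the field-pulling and concatenation part might look like this, building user:shell pairs:

# pull fields 1 and 7 from /etc/passwd and glue them with a colon
awk -F: '{ print $1 ":" $7 }' /etc/passwd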
It takes a lot less time to learn to be fairly productive with awk than with, say, vi / vim.

Over time I've realized that gluing these text manipulation tools together is only an intermediate step toward learning how to write and use them in a manner that is maintainable across several generations of engineers as well as portable across many different environments, and that's still a mostly unsolved problem IMO, not just for shell scripts but for programming languages in general. For example, the same shell script that does something as seemingly simple as computing a sha256 checksum on macOS won't work on most Linux distributions.

So in the end one winds up writing a lot of utilities all over again in yet another language for the sake of portability, which ironically hurts maintainability and readability, because it's simply more code that can rot.
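A typical workaround for the checksum example looks something like this (macOS ships shasum while most Linux distributions ship sha256sum; $file is a placeholder for the path being checksummed):

# pick whichever sha256 tool this system provides
if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$file"
else
    shasum -a 256 "$file"
fi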
The only thing I use AWK for is getting at columns from output (possibly processing or conditionally doing something on each). What would be the next big use-case?
For simple scripting tasks, yes. I have had the opposite experience for more critical software engineering tasks (as in, coding integrated over time and people).
Language aside, the ecosystem and culture do not afford enough in the way of testing, dependency management, feature flags, static analysis, legibility, and so on. The reason people say to keep shell programs short is because of these problems: it needs to be possible to rewrite shell programs on a whim. At least then, you can A/B test and deploy at that scope.
awk is great for things that will be used over several decades (where the hardware / OS you started with no longer exists at the end of a multi-decade project, but data from start to finish still has to be usable).
What is used both in case/esac and in globbing are "shell patterns." They also show up in variable pattern removal, with ${X%...} and ${X#...}.
In "The Unix Programming Environment," Kernighan and Pike apologized for these close concepts that are easily mistaken for one another.
"Regular expressions are specified by giving special meaning to certain characters, just like the asterix, etc., used by the shell. There are a few more metacharacters, and, regrettably, differences in meanings." (page 102)
Bash implements both patterns and regexes, which makes discerning the difference even more critical. The POSIX shell is easier to keep in your head for this reason, among others.
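A minimal bash illustration of the same match written as a shell pattern, as pattern removal, and as a regex:

s="report.txt"
case $s in
    *.txt) echo "shell pattern matched" ;;   # glob: * matches any string
esac
echo "${s%.txt}"                             # pattern removal: prints "report"
[[ $s =~ \.txt$ ]] && echo "regex matched"   # regex: the dot must be escaped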
Can you expand on the parallelism features you use, and in what shell? In bash I've basically given up on managing background jobs because identifying and waiting for them properly is super clunky, and throttling them (a pool of workers) is impossible, so for that kind of thing I've had to use GNU parallel (which is its own abstruse mini-language and obviously nothing to do with shell). Ad-hoc but correct parallelism and first-class job management were among the things that got me to switch away from bash.
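For context, the GNU parallel incantation I mean looks like this, with gzip as a stand-in job:

# cap the pool at 4 concurrent jobs, one invocation per input file
parallel -j4 gzip ::: *.log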
I've just started installing ipython in pretty much every Python environment I set up on personal laptops, but there is REPL history even without ipython: https://stackoverflow.com/a/7008316/1170550
Python is a terrible comparison language here. Of course shell is better than Python for shell stuff; no one should suggest otherwise. Python is extremely verbose, it requires you to be precise with whitespace, and using regex has friction because it's not actually built into the language syntax (unless something has changed very recently).
The comparison should be to perl or Ruby, both of which will fare better than Python for typical shell-type tasks.
If I'm interactively composing something I do very much like pipes and shell commands, but if it's a thing I'm going to be running repeatedly then the improved maintainability of a python script, even if it does a lot of subprocess.run, is preferable to me. "Shuffling clunky objects around" seems more documented and organized than "everything is a bytestring".
I often find that downloading lots of data from S3 using `xargs aws s3 sync`, and then running xargs over some crunching pipeline, is much faster than a 100-core Spark cluster.
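Roughly this shape, with hypothetical bucket and prefix names (and assuming everything under data/ is grouped into prefixes):

# list top-level prefixes, then fan out one sync per prefix, 16 at a time
aws s3 ls s3://my-bucket/data/ | awk '{ print $2 }' |
  xargs -P16 -I{} aws s3 sync "s3://my-bucket/data/{}" "./data/{}"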
That's a hardware management question. The optimized binary used in my shell script, orchestrated across 100 machines, still runs orders of magnitude faster and cheaper than any Hadoop, Spark, Beam, Snowflake, Redshift, BigQuery or what have you.
That's not to say I'd do everything in shell. Most stuff fits well into SQL, but when it comes to optimizing processing at TB or PB scale, you won't beat shell + massive hardware orchestration.
I suppose the Python side is a strawman then - who would do that for a small dataset that fits on a machine? Or have I been using shell for too long :-)
# build the psql invocation once; this relies on word splitting, so none
# of the connection parameters may contain spaces
PSQL="psql postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$DATABASE_HOST:$DATABASE_PORT/$POSTGRES_DB -t -P pager=off -c"
OLD="CURRENT_DATE - INTERVAL '5 years'"

# note: the while loop runs in a subshell (right-hand side of a pipe),
# so variables set inside it do not survive past the loop
$PSQL "SELECT id FROM apt WHERE apt.created_on > $OLD ORDER BY apt.created_on ASC;" |
while read -r id; do
    if [[ $id != "" ]]; then
        printf '\n** Do something in the loop with id where newer than "%s" **\n' "$OLD"
        # ...
    fi
done
A WASM Gawk wrapper script in a web browser, given relevant information about the schema grammar / template file, would allow alternate display formats beyond CLI text (i.e. HTML, LaTeX, "database report output", CSV, etc.).