Shell and SQL make you 10x more productive than any alternative.
Nothing even comes close. I've seen people scramble for an hour to write some data munging, then spend another hour running it through a thread pool to utilize those cores, while somebody comfortable in shell writes a parallelized one-liner, rips through GBs of data, and delivers the answer in 15 minutes.
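To make that concrete, here's the shape of the thing I mean; the file layout and search string are made up:

# count ERROR lines across many compressed logs, one worker per core
find logs/ -name '*.log.gz' -print0 |
  xargs -0 -P "$(nproc)" -n1 zgrep -c 'ERROR' |
  awk '{ total += $1 } END { print total }'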
What Python is to Java, Shell is to Python. It speeds you up several times. I've started using inline 'python -c' more often than the Python REPL now, since it stores the command in shell history, where it's then one fzf search away.
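For example, something like this one-liner (the JSON file and field here are just placeholders) lands in history and is trivially recallable:

python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])' < response.json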
While neither Shell nor SQL is perfect, there have been many ideas for improving them, and people surely can't wait for something new like Oil Shell to get production-ready and finally get the shell quoting hell right, or for somebody to fix up SQL, bringing old ideas from Datalog and QUEL into it, fixing the goddamn NULL joins, etc.
But honestly, nothing else even comes close to this 10x productivity increase over the next best alternative. No, thank you, I will not rewrite my 10 lines of sh into Python only to explode it into 50 lines of shuffling clunky objects around. I'll instead go and reread the man page on how to write an if expression in bash again.
Shameless plug incoming, but this has been a pain point for me too. I found the issue with quotes (in most languages, but particularly in Bash et al.) is that the same character is used to close the quote as is used to open it. So in my own shell I added support for using parentheses as quotes in addition to the single and double quotation ASCII symbols. This then allows you to nest quotation marks.
You also don’t need to worry about quoting variables, because each variable is expanded to a single argv[] item rather than expanded into the command line and then split on spaces into new argv[] entries (or in layman’s terms, variables behave like you’d expect variables to behave).
Though Ruby makes it confusing AF because there are two quoting types for both strings and symbols, and they're different (%Q %q %W %w %i). I can never remember which does which; the letter choice feels really arbitrary.
This means that you can even quote the delimiter in the string as long as it's balanced.
$X=q( foo() )
Should work if it's balanced. If you choose a different pair like [] or {}, then you can avoid collisions. It also means that you can trivially nest quotations.
I agree that this qualified quotation is really underutilized.
I’m not a fan of Python, however that’s down to personal preference rather than objective fact. If Python solves a problem for other people then who am I to judge :)
I really don't think so! If you have experience with any scripting, you can fully grok the fundamentals of awk in 1 hour. You might not memorize all the nuances, but you can establish the fundamentals to a degree that most things you would try to achieve would take just a few minutes of brushing up.
For those that haven't taken the time yet, I think this is a good place to start:
Of course, some people do very advanced things in awk and I absolutely agree that 1 hour of study isn't going to make you a ninja, but it's absolutely enough to learn the awk programming paradigm so that when the need arises you can quickly mobilize the solution you need.
For example: if you're quick on the draw, it can take less time to write an awk one-liner to calculate the average of a column in a CSV than it does to copy the CSV into Excel and highlight the column. It's a massive productivity booster.
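Something along these lines, assuming a comma-separated file and averaging the third column (the filename is a placeholder):

# sum the 3rd field of every row, then divide by the row count
awk -F, '{ sum += $3; n++ } END { if (n) print sum / n }' data.csv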
yeah that's exactly right. it may only take an hour to learn, but every time i need to use awk it seems like i have to spend an hour re-learning its goofy syntax.
alas, this is true. I never correctly recall the order of particular function args, as they seem fairly random. Still, it beats the alternative of having to continually internalize entire fragile ecosystems to achieve the same goal.
What? The awk manual is only 827 highly technical pages[1]. If you can't read and internalize that in an hour, I suspect you're a much worse programmer than the OP.
All you need to do is learn that cmd | awk '{ print $5 }' will print the 5th field, as delimited by one or more whitespace characters. Regexes can express this too, but they're cumbersome to write on the command line.
Doing that, maybe with some inline concatenation to make a new structure, plus the following, is about all I use:
Printing based on another field; this example gets UIDs >= 1000:
awk -F: '$3 >= 1000 {print $0}' /etc/passwd
It can do plenty of magic, but knowing how to pull fields, concatenate them together, and select rows based on them covers like 99% of the things I hope to do with it.
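For instance, the field-pulling and concatenation part might look like this, building user:shell pairs:

# pull fields 1 and 7 from /etc/passwd and glue them with a colon
awk -F: '{ print $1 ":" $7 }' /etc/passwd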
It takes a lot less time to learn to be fairly productive with awk than with, say, vi / vim.

Over time I've realized that gluing these text manipulation tools together is only an intermediate step toward learning how to write and use them in a manner that is maintainable across several generations of engineers as well as portable across many different environments, and that's still a mostly unsolved problem IMO, not just for shell scripts but for programming languages in general. For example, the same shell script that does something as seemingly simple as computing a sha256 checksum on macOS won't work on most Linux distributions.

So in the end one winds up writing a lot of utilities all over again in yet another language for the sake of portability, which ironically hurts maintainability and readability, because it's simply more code that can rot.
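A typical workaround for the checksum example looks something like this (macOS ships shasum while most Linux distributions ship sha256sum; $file is a placeholder for the path being checksummed):

# pick whichever sha256 tool this system provides
if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$file"
else
    shasum -a 256 "$file"
fi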
The only thing I use AWK for is getting at columns from output (possibly processing or conditionally doing something on each). What would be the next big use-case?
For simple scripting tasks, yes. I have had the opposite experience for more critical software engineering tasks (as in, coding integrated over time and people).
Language aside, the ecosystem and culture do not afford enough in the way of testing, dependency management, feature flags, static analysis, legibility, and so on. The reason people say to keep shell programs short is because of these problems: it needs to be possible to rewrite shell programs on a whim. At least then, you can A/B test and deploy at that scope.
awk is great for things that will be used over several decades (where the hardware / OS you started with no longer exists at the end of a multi-decade project, but data from start to finish still has to be usable).
What is used both in case/esac and in globbing are "shell patterns." They also show up in variable pattern removal, with ${X%...} and ${X#...}.
In "The Unix Programming Environment," Kernighan and Pike apologized for these close concepts that are easily mistaken for one another.
"Regular expressions are specified by giving special meaning to certain characters, just like the asterix, etc., used by the shell. There are a few more metacharacters, and, regrettably, differences in meanings." (page 102)
Bash implements both patterns and regexes, which makes discerning the difference even more critical. The POSIX shell is easier to keep in your head for this reason, among others.
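A minimal bash illustration of the same match written as a shell pattern, as pattern removal, and as a regex:

s="report.txt"
case $s in
    *.txt) echo "shell pattern matched" ;;   # glob: * matches any string
esac
echo "${s%.txt}"                             # pattern removal: prints "report"
[[ $s =~ \.txt$ ]] && echo "regex matched"   # regex: the dot must be escaped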
Can you expand on the parallelism features you use, and in what shell? In bash I've basically given up on managing background jobs because identifying and waiting for them properly is super clunky, and throttling them (a pool of workers) is impossible, so for that kind of thing I've had to use GNU parallel (which is its own abstruse mini-language and obviously nothing to do with shell). Ad-hoc but correct parallelism and first-class job management were among the things that got me to switch away from bash.
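For context, the GNU parallel incantation I mean looks like this, with gzip as a stand-in job:

# cap the pool at 4 concurrent jobs, one invocation per input file
parallel -j4 gzip ::: *.log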
I've just started installing ipython in pretty much every Python environment I set up on personal laptops, but there is REPL history even without ipython: https://stackoverflow.com/a/7008316/1170550
Python is a terrible comparison language here. Of course shell is better than Python for shell stuff; no one should suggest otherwise. Python is extremely verbose, it requires you to be precise with whitespace, and using regex has friction because it's not actually built into the language syntax (unless something has changed very recently).
The comparison should be to perl or Ruby, both of which will fare better than Python for typical shell-type tasks.
If I'm interactively composing something I do very much like pipes and shell commands, but if it's a thing I'm going to be running repeatedly then the improved maintainability of a python script, even if it does a lot of subprocess.run, is preferable to me. "Shuffling clunky objects around" seems more documented and organized than "everything is a bytestring".
I often find that downloading lots of data from S3 using `xargs aws s3 sync`, and then running xargs over some crunching pipeline, is much faster than a 100-core Spark cluster.
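Roughly this shape, with hypothetical bucket and prefix names (and assuming everything under data/ is grouped into prefixes):

# list top-level prefixes, then fan out one sync per prefix, 16 at a time
aws s3 ls s3://my-bucket/data/ | awk '{ print $2 }' |
  xargs -P16 -I{} aws s3 sync "s3://my-bucket/data/{}" "./data/{}"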
That's a hardware management question. The optimized binary used in my shell script, orchestrated across 100 machines, still runs orders of magnitude faster and cheaper than any Hadoop, Spark, Beam, Snowflake, Redshift, BigQuery or what have you.
That's not to say I'd do everything in shell. Most stuff fits well into SQL, but when it comes to optimizing processing at TB or PB scale, you won't beat shell + massive hardware orchestration.
I suppose the Python side is a strawman then - who would do that for a small dataset that fits on a machine? Or have I been using shell for too long :-)
# build the psql invocation once; this relies on word splitting, so none
# of the connection parameters may contain spaces
PSQL="psql postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$DATABASE_HOST:$DATABASE_PORT/$POSTGRES_DB -t -P pager=off -c"
OLD="CURRENT_DATE - INTERVAL '5 years'"

# note: the while loop runs in a subshell (right-hand side of a pipe),
# so variables set inside it do not survive past the loop
$PSQL "SELECT id FROM apt WHERE apt.created_on > $OLD ORDER BY apt.created_on ASC;" |
while read -r id; do
    if [[ $id != "" ]]; then
        printf '\n** Do something in the loop with id where newer than "%s" **\n' "$OLD"
        # ...
    fi
done
A WASM Gawk wrapper script in a web browser, given relevant information about the schema grammar / template file, would allow alternate display formats beyond CLI text (i.e. HTML, LaTeX, "database report output", CSV, etc.).