
Shell and SQL make you 10x productive over any alternative. Nothing even comes close. I've seen people scrambling for an hour to write some data munging, then spending another hour to run it through a thread pool to utilize those cores, while somebody comfortable in shell writes a parallelized one-liner, rips through GBs of data, and delivers the answer in 15 minutes.
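
For flavor, here's a made-up example of the kind of one-liner I mean (the logs/ directory and the ERROR pattern are placeholders): count matching lines across a pile of compressed logs, eight files at a time:

    find logs -name '*.gz' -print0 |
      xargs -0 -P8 -n1 zgrep -c ERROR |
      awk '{ s += $1 } END { print s }'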

What Python is to Java, Shell is to Python. It speeds you up several times. I started using inline 'python -c' more often than the python repl now as it stores the command in shell history and it is then one fzf search away.
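
A typical case, with a made-up file and column number: summing a numeric column without ever leaving the prompt:

    cut -d, -f3 data.csv | python -c 'import sys; print(sum(float(x) for x in sys.stdin))'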

While neither Shell nor SQL is perfect, there have been many ideas to improve them, and people can't wait for something new like Oil Shell to get production ready, getting the shell quoting hell right, or for somebody to fix up SQL, bringing old ideas from Datalog and QUEL into it, fixing the goddamn NULL joins, etc.

But honestly, nothing else even comes close to this 10x productivity increase over the next best alternative. No, thank you, I will not rewrite my 10 lines of sh into Python to explode it into 50 lines of shuffling clunky objects around. I'll instead go and re-read that man page on how to write an if expression in bash again.



> getting the shell quoting hell right

Shameless plug coming, but this has been a pain point for me too. I found the issue with quotes (in most languages, but particularly in Bash et al) is that the same character is used to close the quote as is used to open it. So in my own shell I added support for using parentheses as quotes in addition to the single and double quotation ASCII symbols. This then allows you to nest quotation marks.

https://murex.rocks/docs/parser/brace-quote.html

You also don’t need to worry about quoting variables as variables are expanded to an argv[] item rather than expanded out to a command line and then any spaces converted into new argv[]s (or in layman’s terms, variables behave like you’d expect variables to behave).

https://github.com/lmorg/murex


One of my favorite Perl features that has been disappointingly under-appropriated by other languages is quoting with q(...).
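
For anyone who hasn't seen it, a rough illustration (the string is made up): q(...) is a single-quoted literal, and since the delimiters are a distinct pair, embedded quotes need no escaping:

    perl -e 'print q(she said "hi" and nothing needs escaping), "\n"'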


This is one of my favorite features of Ruby!

Though Ruby makes it confusing AF because there are two quoting types for both strings and symbols, and they're different. (%Q %q %W %w %i) I can never remember which does which.... the letter choice feels really arbitrary.
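
For reference (a throwaway example, not from the parent): %q is a single-quoted string, %Q an interpolating string, %w an array of strings, %W its interpolating cousin, and %i an array of symbols:

    ruby -e 'n = "x"; p %q(no #{n}), %Q(yes #{n}), %w(a b), %i(a b)'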


Elixir has something like this too, but even more powerful (you can define your own):

https://elixir-lang.org/getting-started/sigils.html#strings-...


Ruby and Elixir both have features like this. Very sweet.

Elixir has sigils, which are useful for defining all kinds of literals easier, not just strings:

https://elixir-lang.org/getting-started/sigils.html#strings-...

You can also define your own. It's pretty great.


This means that you can even quote the delimiter in the string as long as it's balanced.

    $X=q( foo() )
Should work if it's balanced. If you choose a different pair like [] or {} then you can avoid collisions. It also means that you can trivially nest quotations.

I agree that this qualified quotation is really underutilized.


Off topic. What's your opinion on Python?

I also write shell scripts, but I'm just curious what you would think about a comparison.


I’m not a fan of Python, however that’s down to personal preference rather than objective fact. If Python solves a problem for other people then who am I to judge :)


I noticed that I became so much quicker after taking 1 hour to properly learn awk. Yes, it literally takes about 1 hour.


Awk is awesome, but saying it literally takes 1 hour to properly learn is overselling it a bit.


I really don't think so! If you have experience with any scripting, you can fully grok the fundamentals of awk in 1 hour. You might not memorize all the nuances, but you can establish the fundamentals to a degree that most things you would try to achieve would take just a few minutes of brushing up.

For those that haven't taken the time yet, I think this is a good place to start:

https://learnxinyminutes.com/docs/awk/

Of course, some people do very advanced things in awk and I absolutely agree that 1 hour of study isn't going to make you a ninja, but it's absolutely enough to learn the awk programming paradigm so that when the need arises you can quickly mobilize the solution you need.

For example: if you're quick on the draw, it can take less time to write an awk one-liner to calculate the average of a column in a CSV than it does to copy the CSV into Excel and highlight the column. It's a massive productivity booster.
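
Something like this, assuming no header row and with the column number picked arbitrarily:

    awk -F, '{ sum += $3 } END { if (NR) print sum / NR }' data.csv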


Brian Kernighan covers the entire [new] awk language in 40 pages - chapter 2.

There are people who have asked me scripting questions for over a decade, who will not read this for some reason.

It could be read in an hour, but not fully retained.

https://archive.org/download/pdfy-MgN0H1joIoDVoIC7/


I feel like I do this every three years then proceed to never use it. Then I read a post on hn and think about how great it could be; rinse and repeat


yeah that's exactly right. it may only take an hour to learn, but every time i need to use awk it seems like i have to spend an hour to re-learn its goofy syntax.


alas this is true, I never correctly recall the order of particular function args as they are fairly random, but it still beats the alternative of having to continually internalize entire fragile ecosystems to achieve the same goal.


yeah you're definitely right. i'm sure if it was something i had to use more consistently i'd be able to commit it to memory. maybe...


What? The awk manual is only 827 highly technical pages[1]. If you can't read and internalize that in an hour, I suspect you're a much worse programmer than the OP.

[1] https://www.gnu.org/software/gawk/manual/gawk.html

For the sarcasm impaired among us: everything above this, and possibly including this sentence, is sarcasm.


Think the more relevant script equivalent of 'Everything in this statement is false.' is 'All output must return false to have true side effects.'

The quick one ~ true ~ fix was ! or #! without the 1024k copyright.

s-expression notation avoids the issue with (."contents")

MS Windows interpretation is much more terse & colorful.


Awk is an amazingly effective tool for getting things done quickly.

Submitted yesterday:

Learn to use Awk with hundreds of examples

https://github.com/learnbyexample/Command-line-text-processi...

https://news.ycombinator.com/item?id=33349930


All you need to do is learn that cmd | awk '{ print $5 }' will print out the 5th word as delimited by one or more whitespace characters. Regexes can get you the same thing, but are cumbersome to write on the command line.


Doing that, maybe with some inline concatenation to make a new structure, plus the following, is about all I use:

Printing based on another field, example gets UIDs >= 1000:

    awk -F: '$3 >= 1000 {print $0}' /etc/passwd
It can do plenty of magic, but knowing how to pull fields, concat them together, and select based on them covers like 99% of the things I hope to do with it.


And don't forget the invisible $0 field in awk...


And $NF
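
A quick illustration of both, with a throwaway input line:

    echo "alpha beta gamma" | awk '{ print $0 }'    # whole record: alpha beta gamma
    echo "alpha beta gamma" | awk '{ print $NF }'   # last field: gamma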


It takes a lot less time to learn to be fairly productive with awk than with, say, vi/vim. Over time I've realized that gluing these text manipulation tools together is only an intermediate step toward learning how to write and use them in a manner that is maintainable across several generations of engineers as well as portable across many different environments, and that's still a mostly unsolved problem IMO, not just for shell scripts but for programming languages in general.

For example, the same shell script that does something as seemingly simple as performing a SHA-256 checksum on macOS won't work on most Linux distributions. So in the end one winds up writing a lot of utilities all over again in yet another language for the sake of portability, which ironically hurts maintainability and readability because it's simply more code that can rot.
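
A sketch of the kind of branching this forces (the file name is made up; macOS ships shasum by default while most Linux distros ship GNU sha256sum):

    f="release.tar.gz"
    if command -v sha256sum >/dev/null 2>&1; then
      sha256sum "$f"
    else
      shasum -a 256 "$f"
    fi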


The only thing I use AWK for is getting at columns from output (possibly processing or conditionally doing something with each). What would be the next big use case?


I use it frequently to calculate some basic statistics on log file data.

Here's a nice example of something similar: https://drewdevault.com/dynlib
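
A made-up example of what I mean, assuming the last field of each log line is a response time:

    awk '{ sum += $NF; if ($NF > max) max = $NF }
         END { if (NR) printf "n=%d avg=%.1f max=%d\n", NR, sum/NR, max }' access.log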


awk automata theory and oop the results. add unicode for extra tics!

Scripted Chomsky grammar ( https://en.wikipedia.org/wiki/Universal_grammar ) to unleash the power of regular expressions.


I have used it to extract a table to restore from a MySQL database dump.


For simple scripting tasks, yes. I have had the opposite experience for more critical software engineering tasks (as in, coding integrated over time and people).

Language aside, the ecosystem and culture do not afford enough in the way of testing, dependency management, feature flags, static analysis, legibility, and so on. The reason people say to keep shell programs short is because of these problems: it needs to be possible to rewrite shell programs on a whim. At least then, you can A/B test and deploy at that scope.


awk is great for things that will be used over several decades (where the hardware/OS you started with no longer exists at the end of a multi-decade project, but the data from start to finish still has to be used).


I feel like the reasons for this are:

* Shell scripts force you to think in a more scalable way (data streams)

* Shell scripts compose rich programs rather than simplistic functions

* Shells encourage you to program with a rich, extensible feature set (ad-hoc I/O redirection, files)

The only times I don’t like shell scripts are when dealing with regex and dealing with parallelism


The POSIX shell does not implement regex.

What is used both in case/esac and globbing are "shell patterns." They are also found in variable pattern removal with ${X%...} and ${X#...}.

In "The Unix Programming Environment," Kernighan and Pike apologized for these close concepts that are easily mistaken for one another.

"Regular expressions are specified by giving special meaning to certain characters, just like the asterix, etc., used by the shell. There are a few more metacharacters, and, regrettably, differences in meanings." (page 102)

Bash does implement both patterns and regex, which means discerning their difference becomes even more critical. The POSIX shell is easier to keep in your head for this reason, and others.
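
A minimal side-by-side in bash, to show the two syntaxes (the file name is arbitrary):

    f="notes.txt"
    case "$f" in *.txt) echo "shell pattern match";; esac   # glob-style pattern
    [[ "$f" =~ \.txt$ ]] && echo "regex match"               # [[ =~ ]] uses ERE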

http://files.catwell.info/misc/mirror/


> The only times I don’t like shell scripts are when dealing with regex and dealing with parallelism

Wow, for me parallelism is one of the best features of a unix shell and I find it vastly superior to most other programming languages.


Can you expand on the parallelism features you use, and in which shell? In bash I've basically given up managing background jobs because identifying and waiting for them properly is super clunky; throttling them (a pool of workers) is impossible, so for that kind of thing I've had to use GNU parallel (which is its own abstruse mini-language and has nothing to do with shell). Ad-hoc but correct parallelism and first-class job management was one of the things that got me to switch away from bash.


GNU parallel
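
For example (file names are placeholders), this compresses every .log file, at most eight jobs at a time:

    parallel -j8 gzip ::: *.log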


It's great for embarrassingly parallel data processing, but not good for concurrent/async tasks.


I'd add a working knowledge of regex to that. With a decent text editor + some fairly basic regex skills you can go a long way.


> I started using inline 'python -c' more often than the python repl now as it stores the command in shell history and it is then one fzf search away.

Do you not have a ~/.python_history? The exact same search functions are available on the REPL. Ctrl-R, type your bit, bam.


Exact same - can I use fzf history search using Ctrl+R like I can in shell?


I've just started installing ipython on pretty much every python environment I set up on personal laptops, but there is repl history even without ipython: https://stackoverflow.com/a/7008316/1170550


I expect nushell to massively change how I work:

https://www.nushell.sh/

It's a shell that is actually built for structured data, taking lessons learned from PowerShell and others.
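
Roughly this kind of thing, going from memory of the nushell docs (column names and size units may differ between versions):

    ls | where size > 1mb | sort-by modified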


> getting the shell quoting hell right

Running `parallel --shellquote --shellquote --shellquote` and pasting in the line you want to quote thrice may alleviate some of the pain.

By no means ideal, though.


Python is a terrible comparison language here. Of course shell is better than Python for shell stuff; no one should suggest otherwise. Python is extremely verbose, it requires you to be precise with whitespace, and using regex has friction because it's not actually built into the language syntax (unless something has changed very recently).

The comparison should be to perl or Ruby, both of which will fare better than Python for typical shell-type tasks.


If I'm interactively composing something I do very much like pipes and shell commands, but if it's a thing I'm going to be running repeatedly then the improved maintainability of a python script, even if it does a lot of subprocess.run, is preferable to me. "Shuffling clunky objects around" seems more documented and organized than "everything is a bytestring".

But different strokes and all that.


> while somebody comfortable in shell writes a parallelized one-liner, rips through GBs of data, and delivers the answer in 15 minutes.

This also works up to a point where those GBs turn into hundreds of GBs, or even PBs, and a proper distributed setup can return results in seconds.


I often find that downloading lots of data from S3 using `xargs aws s3 sync`, and then xargs on some crunching pipeline, is much faster than a 100-core Spark cluster.
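
Something along these lines, with the bucket and prefix names made up:

    cat prefixes.txt | xargs -P16 -I{} aws s3 sync "s3://my-bucket/{}" "data/{}"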


That's a hardware management question. The optimized binary used in my shell script still runs orders of magnitude faster and cheaper if you orchestrate 100 machines for it than any Hadoop, Spark, Beam, Snowflake, Redshift, BigQuery or what have you.

That's not to say I'd do everything in shell. Most stuff fits well into SQL, but when it comes to optimizing processing at TB or PB scale, you won't beat shell plus massive hardware orchestration.


usually you use specific frameworks for that, not pure Python.


I suppose the Python side is a strawman then - who would do that for a small dataset that fits on a machine? Or have I been using shell for too long :-)


I thought the above comment was about datasets that do not fit on ones machine?


As far as Ctrl-R command history searching goes, I'm really enjoying McFly: https://github.com/cantino/mcfly


> while somebody comfortable in shell writes a parallelized one-liner

Do you have an example of this? I didn’t even know you could make SQL calls in scripts.


I don’t have an example, but this article comes to mind and you may be able to find an example in it

https://adamdrake.com/command-line-tools-can-be-235x-faster-...


  PSQL="psql postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@$DATABASE_HOST:$DATABASE_PORT/$POSTGRES_DB -t -P pager=off -c "
  
  OLD="CURRENT_DATE - INTERVAL '5 years'"

  $PSQL "SELECT id from apt WHERE apt.created_on > $OLD order by apt.created_on asc;" | while 
  read -r id; do
    if [[ $id != "" ]]; then
      printf "\n\*\* Do something in the loop with id where newer than \"$OLD\" \*\*\*\n"
      # ...
    fi
  done


mysql, psql, etc. let you issue SQL from the command line

I don't do much SQL in bash scripts, but I do keep some wrapper scripts that let me run queries from stdin against databases in my environment
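
The wrappers are nothing fancy; a sketch (the connection string variable and script name are hypothetical):

    #!/bin/sh
    # ~/bin/proddb: read SQL from stdin, print tab-separated rows
    exec psql "$PROD_DB_URL" -X -q -A -t -F "$(printf '\t')" -f -

Then it's just `echo "select count(*) from apt;" | proddb` from anywhere.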


A WASM Gawk wrapper script in a web browser, with relevant information about the schema grammar / template file, would allow for alternate display formats beyond CLI text (e.g. HTML, LaTeX, "database report output", CSV, etc.)



