This rant misplaces its frustration. This is not a problem with unix filesystems; it is a problem with Bourne Shell scripts and with UNIX argument-parsing semantics.
Bourne shell is notorious for its problematic quoting, of filesystem data and of data from any other source. Every example in which he described a problem with a filename parameter could just as well be a problem with a non-filename parameter. The correct solution is to not program complicated scripts in Bourne Shell, and instead use a language which does not implement variable access by interpolating strings and then re-tokenizing and re-evaluating them. Examples of satisfactory languages include Perl, Python, and Ruby.
Regarding UNIX arguments and the dash, it is an unfortunate aspect of the flat argc/argv/envp calling convention for unix programs. Some other operating systems provide more structure in their calling conventions, explicitly separating different types of parameters from one another. That structure is both a strength and a weakness, as it results in a uniform yet inflexible systems interface. One of the greatest strengths of UNIX is that its calling convention is so flexible. The semantics used today are quite different from the semantics used 40 years ago -- yet execve() remains unchanged. I would encourage anyone interested to do a bit of historical digging here and see how those more rigid system APIs fared over time.
Anyway, the solution to his initial question of using `ls` is the -- argument, which signifies that option parsing should be disabled for the remainder of argv: `ls -- *`
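The same idea carries over to the "satisfactory" languages; a minimal Python sketch (the filenames here are made up for illustration):

```python
import subprocess

# Filenames gathered from anywhere, including ones that look like options.
files = ["--help", "-rf", "normal.txt"]

# Each list element becomes exactly one argv entry; "--" tells ls that
# everything after it is a filename, never an option.
subprocess.run(["ls", "--", *files])
```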
The correct answer to his dotfile/glob question is: "glob() and the Bourne shell do not have the semantics you're after. Do not use them, use readdir()."
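A sketch of the readdir()-style approach in Python, where os.scandir() is a thin wrapper over readdir():

```python
import os

# Every entry is returned verbatim (minus "." and ".."), dotfiles
# included, with no pattern expansion and no re-tokenization anywhere.
def dotfiles(path="."):
    return [e.name for e in os.scandir(path) if e.name.startswith(".")]

print(dotfiles())
```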
The correct answer to his find -print question is: Yes, -print's use of the newline as a separator was a mistake (filenames may themselves contain newlines), and it is a mistake repeated continually throughout the land of shell scripting and the accompanying standard UNIX utilities. As he notes, it is why -print0 was introduced. Making -print0 standard is far easier than reworking filesystem semantics (and reworking userland in this manner is a more complete solution, as it addresses data integrity issues from non-filesystem inputs as well). If you want reliable, correct programs, do not write them in shell.
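Consuming -print0 output is straightforward in a real language; a minimal Python sketch:

```python
import subprocess

# NUL is the one byte (besides "/") that can never appear in a filename,
# so splitting -print0 output on b"\0" is unambiguous even for names
# containing newlines, tabs, or leading dashes.
out = subprocess.run(["find", ".", "-type", "f", "-print0"],
                     capture_output=True).stdout
paths = [p for p in out.split(b"\0") if p]
for p in paths:
    print(p.decode("utf-8", "surrogateescape"))
```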
No, even "satisfactory" languages can suffer problems. For example a couple of days ago I discovered a nice exploit in the qemu-img program, and using any language to parse the output wouldn't help you:
Your link above shows an author who claims JSON output, yet the output is clearly invalid JSON (the toplevel is not a [] or {}, the quoting is improper, etc.). It appears that instead of using JSON serialization, the author merely printed key/value pairs separated by the string ": ". The problems with this approach are obvious.
This is why using a proper serialization format is important.
If the author had done this correctly and used a proper JSON library to produce this output, the following, completely safe result would have occurred:
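Something along these lines -- a Python sketch, with field names that are illustrative rather than qemu-img's actual schema:

```python
import json

info = {
    "filename": "image.qcow2\nbacking file: /etc/passwd",  # hostile name
    "format": "qcow2",
    "virtual-size": 1073741824,
}

# json.dumps escapes the embedded newline as \n, so the hostile filename
# cannot forge an extra key/value line for a naive consumer to trust.
print(json.dumps(info, indent=2))
```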
The author probably would have been best served by YAML, which is more easily readable -- and which, like JSON, provides mechanisms to properly represent arbitrary data.
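For example, with PyYAML (assuming it is available), a hostile filename is quoted and round-trips intact:

```python
import yaml  # assumes the PyYAML package is installed

info = {"filename": "evil\nname.qcow2", "format": "qcow2"}

# safe_dump emits the newline as an escape inside a quoted scalar,
# so the value survives serialization unchanged.
text = yaml.safe_dump(info)
print(text)
assert yaml.safe_load(text) == info
```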
In any event, the discussion is severely confused. Ad-hoc buggy formats cannot be compared with well-formed JSON or YAML. This has nothing at all to do with the language.
You should probably read the link more closely. I'm advocating using JSON so that programs are able to safely parse the output of 'qemu-img'. At the moment there are many programs that parse the (current text) output, and they almost all have security holes as a result.
Yes it does - qemu-img is written in C. The two programs we found exploitable were written in Python and C. They are written in "satisfactory" languages. Bash is not involved. Yet both suffer exploits because of \n (and other) characters in filenames.
The issue you refer to is in a poorly formed, ad-hoc serialization format. It has nothing to do with representation of variables at runtime. It has nothing to do with the language.
It is a programming error, not an inherent flaw in the language.
That's incorrect. As was already pointed out, this issue has nothing to do with reading data from the filesystem or manipulating variables internal to the program, and everything to do with poor choices made when using printf.
In other words, those files aren't causing the QEMU program internals to re-interpolate one variable as two values. They're merely messing up a poorly written data exchange format.
Other languages, such as those I listed above, simply do not have this issue. The C program did not misinterpret a variable as two separate values because it contained spaces. That is the nature of the danger with shell -- any unquoted reference to a variable in Bourne involves string interpolation and re-tokenization. This simply does not happen in C.
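To make the contrast concrete, a small Python sketch -- the hostile name stays a single argv entry unless you deliberately route it through a shell:

```python
import subprocess

name = "two words; rm -rf ~"  # a hostile "filename"

# Dangerous: the string is handed to /bin/sh, which re-tokenizes it,
# splits on the spaces, and executes the command after the semicolon.
# subprocess.run("ls " + name, shell=True)

# Safe: one list element is one argv entry; no interpolation, no
# re-tokenization, and "--" blocks option injection for good measure.
subprocess.run(["ls", "--", name])
```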
> The correct solution is to not program complicated scripts in Bourne Shell, and instead use a language which does not implement variable access by interpolating strings and then re-tokenizing and re-evaluating them.
The author also mentions a problem with the filesystem: what is the filesystem encoding? Do you treat filenames as blobs, or encoded strings? What do you do if you think the filesystem stores UTF-8 but there's a filename which has a byte sequence which is invalid UTF-8?
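Python 3's answer, for what it's worth, is PEP 383: treat names as bytes at the edges and smuggle undecodable bytes through as lone surrogates. A minimal sketch:

```python
import os

raw = b"caf\xe9.txt"  # Latin-1 bytes: not valid UTF-8

# With "surrogateescape", undecodable bytes become lone surrogates, so
# the name survives a decode/encode round trip even on a UTF-8 system.
name = os.fsdecode(raw)
assert os.fsencode(name) == raw
```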
I was thinking about this some more. You are right, but both you and I missed the point.
I think the author is saying that the problem is that the "unix filesystem" is actually a filesystem that doesn't match the rest of unix, where 'unix' includes the sh/csh/bash shells and command-line arguments which start with '-'.
If the filesystem wasn't as broad in what it accepted ... and the author is trying to convince us that POSIX allows that ... then it would be a unix filesystem which was a better match to unix.
Yes, Bourne Shell's variable access scheme is a bit ghetto, but to me the problem is that the shell is doing globbing at all. Why not have the shell pass "*" through to the program, and have the program itself perform globbing? Then filenames would have no impact on how the command-line is parsed.
Because that's how MS-DOS used to work, and it was dumb. It means every program has to do its own globbing (and, often, programs simply didn't). In any case, bash does get this right: `ls *` will pass the correct filenames to the ls program no matter what the filenames contain. Also, quotes around variable expansions can cope with any characters.
So what? If the primary API used by command-line applications to open files does the globbing, then programs will have to go out of their way to not glob. And you'll get the added benefit that globs will only be applied to arguments that are actually meant to specify filenames. There would be none of this escaping "*" when you pass it to "find."
> In any case, bash does get this right: `ls *` will pass the correct filenames to the ls program no matter what the filenames contain.
That doesn't solve the problem; your filename could be called "--help."
bash isn't interpreting '--help' at all; it is just passed on to the program being executed, and most GNU CLI programs conventionally interpret '--help' as a special option.
If your filename is indeed --help, the convention is to use '--' as the separator between your command line options and filenames. Anything after -- is not interpreted as a command-line option.
Another way would be to use a more qualified filename form ('./--help').
Because, as the author points out, different users may want different globbing behavior. Globbing is not performed identically between shells.
If the author so wished, he might trivially create his own shell and allow * to match dotfiles, with absolutely no disruption to the rest of his system. Or one could write a shell which uses a regex instead of a glob. Or the SQL LIKE query syntax. The possibilities are endless. Anyone is free to do this.
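For example, a toy glob with the semantics he wants -- '*' matching dotfiles -- takes a few lines of Python (myglob is a hypothetical helper, not anything standard):

```python
import os
import re

# A glob with different semantics: "*" matches dotfiles too. The pattern
# is translated to a regex and tested against raw readdir()-style names.
def myglob(pattern, path="."):
    rx = re.compile(re.escape(pattern).replace(r"\*", ".*") + r"\Z")
    return [name for name in os.listdir(path) if rx.match(name)]

print(myglob("*"))       # includes .profile, .bashrc, and friends
print(myglob("*.txt"))
```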
The fact is, the current globbing behavior in unix shells strikes a good balance between pedantic correctness and "what I really want." The author's frustration is due to his attempting to use a command line interface as a structured programming language.
A better, and universal, solution to the problem of filenames starting with - is to prefix all relative paths with ./. A path like ./-blah will never be misinterpreted as a command line option, regardless of the tool, and doesn't depend on the -- convention which is only inconsistently present.
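A trivial sketch of the idea (deoptionize is a hypothetical name):

```python
# Any relative path gets a ./ prefix, so a name like "-rf" can never be
# parsed as an option by the receiving tool.
def deoptionize(path):
    return path if path.startswith("/") else "./" + path

assert deoptionize("-rf") == "./-rf"
assert deoptionize("/etc/passwd") == "/etc/passwd"
```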