Mmmm, but there's more than one way to skin a cat. Take just the point about safe and unsafe strings: you could do that with naming conventions, sure, but then you're relying on peer review to catch the mistakes. It helps, but it's not great. So the next thing (in a language which lets you use types in this way) is to create a type such as UnsafeString or SafeString. Your language doesn't even have to be that great to let you do this; a simple type system is enough. Now you change your write method so it doesn't take a String anymore; it takes a SafeString, and only a SafeString. And you make it so that a SafeString can only be constructed by encoding.
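To make that concrete, here is a minimal sketch in Haskell. The names `encode`, `rendered` and `write` are made up for illustration, and the escaping table is an assumption that covers only the usual five HTML characters.

```haskell
-- In a real project this would sit in its own module that exports the
-- SafeString type without its constructor, so encode is the only way in.

-- Text that has already been HTML-encoded.
newtype SafeString = SafeString String

-- The only intended way to build a SafeString: encode a raw String.
encode :: String -> SafeString
encode = SafeString . concatMap escapeChar
  where
    escapeChar '&'  = "&amp;"
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = [c]

-- The encoded text, for whatever finally sends bytes to the client.
rendered :: SafeString -> String
rendered (SafeString s) = s

-- Hypothetical write method: it accepts a SafeString and nothing else,
-- so passing a plain String is a compile-time type error.
write :: SafeString -> IO ()
write = putStr . rendered
```

With this in place `write (encode userInput)` compiles and `write userInput` does not, which is the whole point: the check moves from the reviewer to the compiler.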
Now you have something which can still look clean (how clean may depend on your language, type inference, etc.) without bringing in messy pseudo-Hungarian-ness, which stops me doing the most important thing with my code - reading it easily.
Obviously it's only a silly and trivial example, but I'm not sure this article has dated that well in some regards (or perhaps only really applies to certain types of languages - Visual Basic, for example, pretty much forces you to do something like Hungarian to maintain any semblance of sanity long term).
The downside of that approach though is that you end up wrapping the bulk of the API functions you're using to make them conformant. You've got a huge amount of extra work creating and maintaining these components, and a major training workload for any new hires so they know why they can't use the standard options they're used to and what the internal replacements are.
Personally, I'd rather go Hungarian.
Edit - well, yes, you may well be able to do this sort of thing nice and easily in Haskell, but I'm not sure that going with Haskell to avoid the training and maintenance overhead of complicating your Java (or whatever) is a net reduction in workload and hiring difficulty...
It depends; if all you want to do is lift normal functions to the domain of unsafe operation (where an unsafe input implies unsafe output), then sure:
    newtype Unsafe a = Unsafe a

    instance Functor Unsafe where
      fmap f (Unsafe k) = Unsafe . f $ k

    addTwo :: Int -> Int
    addTwo = (+2)

    unsafeAddTwo :: Unsafe Int -> Unsafe Int
    unsafeAddTwo = fmap addTwo
But really, I'm not sure this is the right approach. Even values generated inside my program need to be quoted for inclusion on an HTML page. What you want to avoid is double-quoting, so what you need is simpler:
    data Content = Quoted String | Unquoted String

    -- quote is the escaping function, assumed to be defined elsewhere
    output :: [Content] -> Content
    output = Quoted . concatMap f
      where f (Unquoted x) = quote x
            f (Quoted x)   = x
Now the type system ensures that output is idempotent (output [output xs] == output xs), which is what you really want to ensure. Tainted data, I think, is a separate concern. And the solution, in that case, doesn't involve a functor; it involves making sure your library tags everything as Unsafe and that your data validation functions remove that annotation:
    type Params a = Map String (Unsafe a) -- keys may also be unsafe, YMMV

    readHtmlForm :: Request -> Params String

    validateField :: Validatable a => Unsafe a -> a

    main = output . Unquoted . validateField . get "foo" . readHtmlForm <$> fakeHttpRequest
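For anyone who wants something they can actually load into GHCi, here is a rough, self-contained variant of the tagging idea. `readForm`, `validateUsername` and `username` are hypothetical stand-ins, not the functions sketched above, and the "alphanumeric only" rule is just an assumption chosen to keep the example short.

```haskell
import qualified Data.Map as Map
import Data.Char (isAlphaNum)

-- Tag for values that came from outside the program.
newtype Unsafe a = Unsafe a

type Params = Map.Map String (Unsafe String)

-- Stand-in for parsing a request: every value comes back tagged.
readForm :: [(String, String)] -> Params
readForm = Map.fromList . map (\(k, v) -> (k, Unsafe v))

-- Validation is the only place the tag comes off; Nothing means rejected.
validateUsername :: Unsafe String -> Maybe String
validateUsername (Unsafe s)
  | not (null s) && all isAlphaNum s = Just s
  | otherwise                        = Nothing

-- Look up the "user" field and validate it in one step.
username :: Params -> Maybe String
username params = Map.lookup "user" params >>= validateUsername
```

Unlike the `validateField` signature above, this one returns a `Maybe`, so a failed validation has somewhere to go. That is a design choice of the sketch, not something the original code requires.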
This comment illustrates the problem with comments about problems with Haskell. It doesn't quite understand what it's complaining about.
Functor vs. Monad in this case is a way of talking about how exactly this construct should work --- it's not a meaningless distinction. As another response to my comment mentioned, either a functor or a monad is applicable. The question is whether two instances of this data type (I'm using non-Haskell-y terms for clarity) have influence on each other (in rough terms). So it's not that I was complaining that the parent was wrong. I was complaining that the semantics he was imposing on his quoted strings were too restrictive. Another poster instead mentioned ways in which mine were too loose. So my post was part of a constructive debate on what exactly we want quoted strings to do. That Haskell provides a vocabulary for communicating precisely and tersely is not its fault; it is one of its strengths.
As siblings point out, how much work you have to do depends on your language and the available facilities. In Java, this might be a huge pain - Haskell, etc. less so. It's more likely that this is done as part of the framework or libraries generally - how many people are, to continue this example, writing their own web framework? In .NET for example, ASP.NET MVC includes an HtmlString class for similar purposes. Other languages can approach it in different ways. You may find as well that even with something like .NET, not the most expressive of type systems, you could still make life fairly easy by providing appropriate implicit type conversions in one direction.
In most languages that support this sort of thing, can't you use SafeString in places where String is expected, as long as you declare SafeString as a sort of String?
I wouldn't want SafeString is-a String, I would want a conversion from SafeString to String that removes whatever encoding or escaping was applied. Otherwise you end up re-encoding values that unbeknownst to you don't need it, which is why there are thousands of terrible PHP bulletin boards out there which won't let you use quotes without mangling them.
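Roughly what I have in mind, as a sketch: the type and function names (`Escaped`, `escape`, `unescape`, `encoded`) are invented for illustration, and only four entities are handled.

```haskell
import Data.List (stripPrefix)

-- Escaped text, kept distinct from String so the two can't be mixed up.
newtype Escaped = Escaped String

-- Apply HTML escaping to a raw String.
escape :: String -> Escaped
escape = Escaped . concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = [c]

-- The escaped form, for output.
encoded :: Escaped -> String
encoded (Escaped s) = s

-- Explicit conversion back to a plain String: undoes what escape did.
unescape :: Escaped -> String
unescape (Escaped s) = go s
  where
    go [] = []
    go ('&' : rest)
      | Just r <- stripPrefix "amp;"  rest = '&' : go r
      | Just r <- stripPrefix "lt;"   rest = '<' : go r
      | Just r <- stripPrefix "gt;"   rest = '>' : go r
      | Just r <- stripPrefix "quot;" rest = '"' : go r
    -- Anything else, including an unrecognised '&', is copied through.
    go (c : cs) = c : go cs
```

Because getting a String back always goes through `unescape`, there is no way to end up holding escaped text that merely looks like a raw String, which is where the double-encoding comes from.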
I'd just like to mention that perl, which is often underrated, has a "tainted" mode that checks at compile time that you sanitized all inputs, and refuses to run if it isn't the case:
I used to think that was pretty cool, until I realized it was probably developed well after Haskell. :) It can be of help if you're passing a lot of input to system() etc; it's not really general enough to help with web programming.
I gave up on perl's taint mode when I discovered this bug http://bugs.debian.org/411786 , in which perl randomly sets the taint flag due to a utf8 bug.
Anyway, I'm not sure why it would be less cool even if it were inspired by another language (which I'm sure it was; Perl, like English, elevates borrowing to an art form).
> a "tainted" mode that checks at compile time that you sanitized all inputs
Considering how dynamic perl is, and that you can mix tainted and non-tainted values in a single collection (for instance), I don't see how a perl program could be statically analyzed for taintedness misuses.
The program dies at runtime if the runtime detects a misuse of tainted data.
The reality is that nobody uses taint mode, though, for whatever reason. If you look at my comment up the page, the problem that people have is not managing the safety of data, it's making sure that they present the right "view" of that data to the right component. HTML needs to be escaped, but not if it's already been escaped, and so on.
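The "escape it, but not if it's already been escaped" rule is easy to check once the state lives in the type. A small sketch, with a deliberately minimal `quote` that is an assumption for illustration only:

```haskell
data Content = Quoted String | Unquoted String
  deriving (Eq, Show)

-- Minimal escaping; real HTML needs more cases than these three.
quote :: String -> String
quote = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- Escape only the pieces that have not been escaped yet, then join them.
output :: [Content] -> Content
output = Quoted . concatMap f
  where
    f (Unquoted x) = quote x
    f (Quoted x)   = x
```

Feeding the result back in, as in `output [output xs]`, gives the same value as `output xs`, so a component that receives already-escaped content can't mangle it by escaping again.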
Your point about language mattering is spot on. If your language has the right features (even just `typedef`), I don't think you need any flavor of Hungarian. In the right context, though, it can be useful.
For example, I do a lot of Excel+VBA programming. In VBA, the native data type is `Variant`. And if you pass something from Excel to a VBA function, that's what you get. But you also know that legal Excel values are just a subset of what VBA allows in a `Variant`. So I find it very useful to stick a clue to myself in the variable name, so say, `f(parm)` becomes `f(xParm)`.
I have libraries that work with the `x` subset of `Variant` (and subsets of that - i.e. `xs` denotes a simple value that can go in a single Excel cell). The prefixes provide useful clues that the language can't reasonably help me with.