Mmmm, but there's more than one way to skin a cat. To take just the point about ...

eftpotrm · on Aug 22, 2011

The downside of that approach though is that you end up wrapping the bulk of the API functions you're using to make them conformant. You've got a huge amount of extra work creating and maintaining these components, and a major training workload for any new hires so they know why they can't use the standard options they're used to and what the internal replacements are.

Personally, I'd rather go Hungarian.

Edit - well, yes, you may well be able to do this sort of thing nice and easily in Haskell, but I'm not sure that going with Haskell to avoid the training and maintenance overhead of complicating your Java (or whatever) is a net reduction in workload and hiring difficulty...

yummyfajitas · on Aug 22, 2011

This is a Java/Python/etc problem, not a fundamental one. In Haskell, the wrapping process is merely an application of liftM:

    newtype UnsafeUserGenerated a = UnsafeUserGenerated a
    instance Monad UnsafeUserGenerated where
        ... (monadic boilerplate skipped)...

    type UnsafeString = UnsafeUserGenerated String

    apiFunction :: String -> String

    funcOfUnsafe :: UnsafeString -> UnsafeString
    funcOfUnsafe unsafe = liftM apiFunction unsafe
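For readers who want to compile the idea, here is a minimal sketch of what the skipped boilerplate might look like under a current GHC, where a `Monad` instance also needs `Functor` and `Applicative` instances. The body of `apiFunction` (`reverse`) is a hypothetical stand-in for an ordinary library function, and the `deriving` clause is added only so results can be compared and printed.

```haskell
import Control.Monad (liftM, ap)

-- Wrapper marking a value as user-generated and not yet sanitized.
newtype UnsafeUserGenerated a = UnsafeUserGenerated a
  deriving (Eq, Show)

-- Current GHC versions require Functor and Applicative before Monad.
instance Functor UnsafeUserGenerated where
  fmap = liftM

instance Applicative UnsafeUserGenerated where
  pure  = UnsafeUserGenerated
  (<*>) = ap

instance Monad UnsafeUserGenerated where
  UnsafeUserGenerated x >>= f = f x

type UnsafeString = UnsafeUserGenerated String

-- Hypothetical stand-in for an ordinary String -> String API function.
apiFunction :: String -> String
apiFunction = reverse

-- The wrapped version is just the lifted API function.
funcOfUnsafe :: UnsafeString -> UnsafeString
funcOfUnsafe = liftM apiFunction
```

With these definitions, `funcOfUnsafe (UnsafeUserGenerated "abc")` evaluates to `UnsafeUserGenerated "cba"`: the API function runs on the payload and the result keeps its unsafe tag.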

pavpanchekha · on Aug 22, 2011

Does it really have to be a monad? Isn't a functor what we're really looking for?

jrockway · on Aug 22, 2011

It depends; if all you want to do is lift normal functions to the domain of unsafe operation (where an unsafe input implies unsafe output), then sure:

    newtype Unsafe a = Unsafe a
    
    instance Functor Unsafe where
        fmap f (Unsafe k) = Unsafe . f $ k

    addTwo :: Int -> Int
    addTwo = (+2)

    unsafeAddTwo :: Unsafe Int -> Unsafe Int
    unsafeAddTwo = fmap addTwo

But really, I'm not sure this is the right approach. Even values generated inside my program need to be quoted for inclusion on an HTML page. What you want to avoid is double-quoting, so what you need is simpler:

    data Content = Quoted String | Unquoted String

    output :: Content -> Content
    output (Unquoted x) = Quoted (quote x)
    output (Quoted   x) = Quoted x
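A self-contained sketch of the same double-quoting guard, assuming `output` maps a single `Content` to a `Content`. The `quote` function here is a hypothetical escaper that handles only four HTML metacharacters; a real application would use its framework's escaping routine instead.

```haskell
-- Text that is either already HTML-escaped or still raw.
data Content = Quoted String | Unquoted String
  deriving (Eq, Show)

-- Hypothetical escaper covering only the most common HTML metacharacters.
quote :: String -> String
quote = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]

-- Escapes raw content once; already-quoted content passes through unchanged.
output :: Content -> Content
output (Unquoted x) = Quoted (quote x)
output quoted       = quoted
```

Both `output (Unquoted "<b>")` and `output (output (Unquoted "<b>"))` evaluate to `Quoted "&lt;b&gt;"`, because the second call sees the `Quoted` tag and leaves the string alone.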

Now the type system ensures that (output . output) == output, which is what you really want to ensure. Tainted data, I think, is a separate concern. And the solution, in that case, doesn't involve a functor, it involves making sure your library tags everything as Unsafe and that your data validation functions remove that annotation:

    type Params a = Map String (Unsafe a) -- keys may also be unsafe, YMMV

    readHtmlForm :: Request -> Params String
    validateField :: Validatable a => Unsafe a -> a

    main = output . Unquoted . validateField . get "foo" . readHtmlForm <$> fakeHttpRequest
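The tagging-and-validation idea can be sketched in a form that compiles on its own. Several things here are assumptions made for illustration rather than part of the comment above: an association list stands in for the `Map` so the sketch depends only on `base`, the request is modelled as a plain list of key/value pairs, `validateField` returns a `Maybe` so that rejection is explicit, and the validation rule (non-empty, alphanumeric only) is invented.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (isAlphaNum)

-- Marks data that arrived from outside and has not been validated.
-- In a real library the constructor would stay unexported, so that
-- validation is the only way to get the value out.
newtype Unsafe a = Unsafe a

-- Association list standing in for a Map of form fields.
type Params a = [(String, Unsafe a)]

-- Hypothetical class: types that know how to check themselves.
class Validatable a where
  isValid :: a -> Bool

-- Assumed rule for illustration: non-empty and alphanumeric only.
instance Validatable String where
  isValid s = not (null s) && all isAlphaNum s

-- Stand-in for parsing a real request: every value read is tagged Unsafe.
readHtmlForm :: [(String, String)] -> Params String
readHtmlForm = map (\(k, v) -> (k, Unsafe v))

-- The only function that removes the Unsafe tag; Nothing means rejected.
validateField :: Validatable a => Unsafe a -> Maybe a
validateField (Unsafe x)
  | isValid x = Just x
  | otherwise = Nothing

-- Look up a field and validate it in one step.
getValidated :: Validatable a => String -> Params a -> Maybe a
getValidated key params = lookup key params >>= validateField
```

Under these assumptions, `getValidated "foo" (readHtmlForm [("foo", "bar42")])` yields `Just "bar42"`, while a value such as `"<script>"` yields `Nothing`. No functor instance is involved; the tag is simply added at the boundary and removed by validation.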

Bootvis · on Aug 22, 2011

That comment illustrates the problem with Haskell. Yes, it's nice and logical, but for some reason also very hard.

pavpanchekha · on Aug 23, 2011

This comment illustrates the problem with comments about problems with Haskell. It doesn't quite understand what it's complaining about.

Functor vs. Monad in this case is a way of talking about how exactly this construct should work --- it's not a meaningless distinction. As another response to my comment mentioned, either a functor or a monad is applicable. The question is whether two instances of this data type (I'm using non-Haskell-y terms for clarity) have influence on each other (in rough terms). So it's not that I was complaining that the parent was wrong. I was complaining that the semantics he was imposing on his quoted strings were too restrictive. Another poster instead mentioned ways in which mine were too loose. So my post was part of a constructive debate on what exactly we want quoted strings to do. That Haskell provides a vocabulary for communicating precisely and tersely is not its fault, it is one of its strengths.

jrockway · on Aug 22, 2011

Any monad is also a functor, so they're both right!

eru · on Aug 22, 2011

Actually, he probably wants an applicative functor, which sits between functors and monads.

Or perhaps an arrow or a co-monad is better? ;o)
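To make the applicative option concrete, here is a rough sketch of an `Applicative` instance for an `Unsafe` wrapper, which is what lets a function of several arguments be applied to several tainted values at once. The `greeting` function is a hypothetical example, and defining `pure` as `Unsafe` is a deliberately conservative assumption: anything lifted into the wrapper is treated as tainted.

```haskell
newtype Unsafe a = Unsafe a
  deriving (Eq, Show)

instance Functor Unsafe where
  fmap f (Unsafe x) = Unsafe (f x)

-- Applicative lets two (or more) wrapped values be combined.
instance Applicative Unsafe where
  pure = Unsafe
  Unsafe f <*> Unsafe x = Unsafe (f x)

-- Hypothetical example: joining two tainted form fields stays tainted.
greeting :: Unsafe String -> Unsafe String -> Unsafe String
greeting first lastName = (\f l -> f ++ " " ++ l) <$> first <*> lastName
```

Here `greeting (Unsafe "Ada") (Unsafe "Lovelace")` evaluates to `Unsafe "Ada Lovelace"`. With only `fmap`, a two-argument function could not be applied across both wrapped values without unwrapping one of them by hand.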

eru · on Aug 22, 2011

> [...] but for some reason also very hard.

Yes. But that's a feature, not a bug.

gmartres · on Aug 23, 2011

It's not hard, it's different.

yummyfajitas · on Aug 23, 2011

The way I envisioned using it, yes. The use of a Monad over a Functor is merely because I'm still very much a Haskell newbie.

kolektiv · on Aug 22, 2011

As siblings point out, how much work you have to do depends on your language and the available facilities. In Java, this might be a huge pain - Haskell, etc. less so. It's more likely that this is done as part of the framework or libraries generally - how many people are, to continue this example, writing their own web framework? In .NET for example, ASP.NET MVC includes an HtmlString class for similar purposes. Other languages can approach it in different ways. You may find as well that even with something like .NET, not the most expressive of type systems, you could still make life fairly easy by providing appropriate implicit type conversions in one direction.

Symmetry · on Aug 22, 2011

In most languages that support this sort of thing, can't you use SafeString in places where String is expected, as long as you declare SafeString as a sort of String?

prodigal_erik · on Aug 22, 2011

I wouldn't want SafeString is-a String, I would want a conversion from SafeString to String that removes whatever encoding or escaping was applied. Otherwise you end up re-encoding values that unbeknownst to you don't need it, which is why there are thousands of terrible PHP bulletin boards out there which won't let you use quotes without mangling them.
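One way to picture this, sketched in Haskell rather than Java or PHP: `SafeString` is a distinct type, not a kind of `String`, and getting a plain `String` back requires an explicit conversion that undoes the escaping. The names `escape`, `unescape`, and `entities` are hypothetical, and the entity table covers only four characters.

```haskell
import Data.List (stripPrefix)

-- Escaped text; deliberately not interchangeable with String.
newtype SafeString = SafeString String
  deriving (Eq, Show)

-- Small, illustrative entity table (not a complete HTML escaper).
entities :: [(Char, String)]
entities = [('&', "&amp;"), ('<', "&lt;"), ('>', "&gt;"), ('"', "&quot;")]

-- Raw text in, escaped text out.
escape :: String -> SafeString
escape = SafeString . concatMap (\c -> maybe [c] id (lookup c entities))

-- The explicit conversion back: removes the escaping that escape applied.
unescape :: SafeString -> String
unescape (SafeString s) = go s
  where
    go [] = []
    go str@(c:rest) =
      case [ (ch, rest') | (ch, ent) <- entities
                         , Just rest' <- [stripPrefix ent str] ] of
        ((ch, rest'):_) -> ch : go rest'   -- decode one entity
        []              -> c : go rest     -- ordinary character
```

Because `escape` only accepts a `String`, passing it a `SafeString` is a type error, which is the re-encoding mistake described above. `unescape (escape "a<b")` returns `"a<b"`.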

mseebach · on Aug 22, 2011

In Java, String is a final class.

wazoox · on Aug 22, 2011

I'd just like to mention that perl, which is often underrated, has a "tainted" mode that checks at compile time that you sanitized all inputs, and refuses to run if it isn't the case:

http://perldoc.perl.org/perlsec.html#SECURITY-MECHANISMS-AND...

joeyh · on Aug 22, 2011

I used to think that was pretty cool, until I realized it was probably developed well after Haskell. :) It can be of help if you're passing a lot of input to system() etc; it's not really general enough to help with web programming.

I gave up on perl's taint mode when I discovered this bug http://bugs.debian.org/411786 , in which perl randomly sets the taint flag due to a utf8 bug.

JadeNB · on Aug 24, 2011

According to Wikipedia, Haskell 1.0 was defined in 1990 (http://en.wikipedia.org/wiki/Haskell_%28programming_language...), but Perl has had taint mode since v3 in 1989 (http://en.wikipedia.org/wiki/Taint_checking#History).

Anyway, I'm not sure why it would be less cool even if it were inspired by another language (which I'm sure it was; Perl, like English, elevates borrowing to an art form).

wazoox · on Aug 22, 2011

Well, it's a bug. It's not supposed to happen, and AFAIK it occurs only for some (one?) specific version...

masklinn · on Aug 22, 2011

> a "tainted" mode that checks at compile time that you sanitized all inputs

Considering how dynamic perl is, and that you can mix tainted and non-tainted values in a single collection (for instance), I don't see how a perl program could be statically analyzed for taintedness misuses.

jrockway · on Aug 22, 2011

The program dies at runtime if the runtime detects a misuse of tainted data.

The reality is that nobody uses taint mode, though, for whatever reason. If you look at my comment up the page, the problem that people have is not managing the safety of data, it's making sure that they present the right "view" of that data to the right component. HTML needs to be escaped, but not if it's already been escaped, and so on.

wazoox · on Aug 22, 2011

> The reality is that nobody uses taint mode, though, for whatever reason.

Uh? I do use it; it's extremely efficient. I'm probably not the only one :)

masklinn · on Aug 22, 2011

> The program dies at runtime if the runtime detects a misuse of tainted data.

Right, so it's not at compile time. Thank you.

jtolle · on Aug 22, 2011

Your point about language mattering is spot on. If your language has the right features (even just `typedef`), I don't think you need any flavor of Hungarian. In the right context, though, it can be useful.

For example, I do a lot of Excel+VBA programming. In VBA, the native data type is `Variant`. And if you pass something from Excel to a VBA function, that's what you get. But you also know that legal Excel values are just a subset of what VBA allows in a `Variant`. So I find it very useful to stick a clue to myself in the variable name, so say, `f(parm)` becomes `f(xParm)`.

I have libraries that work with the `x` subset of `Variant` (and subsets of that - i.e. `xs` denotes a simple value that can go in a single Excel cell). The prefixes provide useful clues that the language can't reasonably help me with.