
As an illustration of why C kind of sucks for web apps, that code is obviously insecure† (it's reflected XSS). To get around that while preserving natural syntax, you want:

  char *hsafe(const char *input);
But where does hsafe get the memory for the string from? It can't use input (the filtered result is larger than the input). Does it malloc? Now you have to free the result. Does it do the inet_ntoa() thing with the static variable? Now you can't chain it (or use threads, but you wouldn't want to do that anyways).

Maybe you can do an arena for each connection, so it's:

  char *hsafe(request_t *r, char *str)
But that's still sort of painful.
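For concreteness, here is a minimal sketch of what the per-connection arena version might look like. The layout of `request_t`, the `request_init`/`request_destroy` helpers, the fixed-size bump arena, and the exact set of escaped characters are all assumptions made for illustration; the signature also takes `const char *` rather than `char *`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request arena: one fixed block, bump-allocated. */
typedef struct request {
    char  *arena;
    size_t used;
    size_t cap;
} request_t;

/* Returns 0 on success, -1 if the arena could not be allocated. */
int request_init(request_t *r, size_t cap) {
    r->arena = malloc(cap);
    r->used = 0;
    r->cap = r->arena ? cap : 0;
    return r->arena ? 0 : -1;
}

/* Releases every string handed out for this request in one call. */
void request_destroy(request_t *r) {
    free(r->arena);
    r->arena = NULL;
    r->used = r->cap = 0;
}

/* Replacement text for characters that are unsafe in HTML, else NULL. */
static const char *entity(char c) {
    switch (c) {
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    case '"':  return "&quot;";
    case '\'': return "&#39;";
    default:   return NULL;
    }
}

/* HTML-escapes str into arena memory; returns NULL if the arena is full. */
char *hsafe(request_t *r, const char *str) {
    size_t n = 1;                              /* room for the trailing NUL */
    for (const char *s = str; *s; s++) {
        const char *e = entity(*s);
        n += e ? strlen(e) : 1;
    }
    if (n > r->cap - r->used) return NULL;

    char *out = r->arena + r->used;
    char *d = out;
    r->used += n;
    for (const char *s = str; *s; s++) {
        const char *e = entity(*s);
        if (e) {
            size_t len = strlen(e);
            memcpy(d, e, len);
            d += len;
        } else {
            *d++ = *s;
        }
    }
    *d = '\0';
    return out;
}
```

Results can be chained freely within one request, and nothing is freed individually; the cost is that every call site has to carry `r` around and check for NULL.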

† In an admittedly artificial way.



I think part (far from all!) of the issue is that you (and parent) are not using the right abstraction: The response body shouldn't be a string; it should be a stream:

   write_quot(response,"this < should be safe >");
   
Not perfect (you still need to deal with allocating temporaries if you want to inspect the contents before sending to the client), but: it (a) matches what's actually going on under the hood, (b) makes the simple cases safe, (c) provides a decent interface for safely extending the available formatters. (They would write their output to the stream, then free all temporary resources themselves before returning.)
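A rough sketch of the stream idea, assuming the response is a thin wrapper around a stdio `FILE *`. The `response_t` type and the choice of escaped characters are invented here for illustration; only the name `write_quot` comes from the comment above. Escaping happens character by character as the data is written, so no temporary buffer is needed.

```c
#include <stdio.h>

/* Hypothetical response object: just a wrapper around an output stream. */
typedef struct response {
    FILE *out;
} response_t;

/* Writes s to the response with HTML metacharacters escaped.
 * Returns 0 on success, -1 on a write error. */
int write_quot(response_t *res, const char *s) {
    for (; *s; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        }
        int rc = rep ? fputs(rep, res->out)
                     : fputc((unsigned char)*s, res->out);
        if (rc == EOF) return -1;
    }
    return 0;
}
```

With this shape, `write_quot(&res, "this < should be safe >")` sends `this &lt; should be safe &gt;` to the client and there is nothing for the caller to free.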

(Further, I've had enough influence from statically-typed-land that I'd personally want to create tainted wrapper structs so that the compiler helps prevent user data from being passed to an unquoted write... but that's just me.)
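One way the tainted-wrapper idea could look in plain C is sketched below. Two structs with identical layout are still distinct types, so the compiler rejects a tainted value passed to the unquoted writer. All names here (`tainted_t`, `trusted_t`, `taint`, `trust`, `write_raw`, `write_escaped`) are hypothetical.

```c
#include <stdio.h>

/* Same layout, different types: the compiler will not convert between them. */
typedef struct { const char *s; } tainted_t;   /* raw user input */
typedef struct { const char *s; } trusted_t;   /* markup written by the developer */

static tainted_t taint(const char *s) { tainted_t t = { s }; return t; }
static trusted_t trust(const char *s) { trusted_t t = { s }; return t; }

/* Unquoted write: only accepts trusted_t. */
int write_raw(FILE *out, trusted_t t) {
    return fputs(t.s, out) == EOF ? -1 : 0;
}

/* Quoted write: the only way to get tainted data onto the stream. */
int write_escaped(FILE *out, tainted_t t) {
    for (const char *p = t.s; *p; p++) {
        const char *rep = NULL;
        switch (*p) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        }
        int rc = rep ? fputs(rep, out) : fputc((unsigned char)*p, out);
        if (rc == EOF) return -1;
    }
    return 0;
}

/* write_raw(out, taint(user_input)) does not compile:
 * tainted_t and trusted_t are incompatible struct types. */
```

The protection only holds as long as framework code wraps every request parameter with `taint()` and nobody calls `trust()` on user data.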


I'm not seeing this as a problem with C, but perhaps I misunderstand. Your goal is to avoid including anything executable in the page. The filtered result is only larger than the input if you need the ability to faithfully quote the potentially malicious code. In this particular contrived example, you could "just" strip it out and make it shorter, or even error out if you see anything suspicious.

Which is to say it's only a problem if you are doing it by hand on a per field basis. Which leads to your next point:

  >  Maybe you can do an arena for each connection, so it's:
  >  char *hsafe(request_t *r, char *str)
  >  But that's still sort of painful.
This is only painful if you are doing it by hand every time you use the parameter. If the framework handles it for you, so that you always get the sanitized result, it strikes me as no different than any other language.

  r->param("input");      // presanitized, lazily created, pooled in r
  r->param_raw("input");  // if you want to live dangerously

Why is this worse in straight C than in something like Python that is doing the same thing but with an interpreted layer between you and the C?


You need to convert < into &lt;. That's the price of entry. You can't redefine the problem to "web framework that simply strips < out of inputs". Your framework would then be immediately inferior to every other framework which does output quoting.


Yes, if it was done in the framework, you'd want to have the sanitizing and allocation happen seamlessly, and you'd want the memory management to be simple as well. My confusion was why this strikes you as a significant difficulty for a framework author to set up.

Certainly it's harder if you decide that you are going to work from the ground up in straight ANSI C, but there are lots of good pool memory allocators out there. I'd either use one I had lying around, or just link in the one from the Apache Portable Runtime: http://apr.apache.org/docs/apr/1.4/group__apr__pools.html

The framework author could easily hide this behind the scenes, so that the user would find the string creation and destruction just as seamless as in Perl or Python. Use it and forget it, and the pool would be freed along with the Request. Thus my question, and my confusion, was why you finished with "But that's still sort of painful."


Hm good point. Built-in C string handling is much too painful for me to even consider doing a web application in C, and you have to worry about buffer overflows and memory leaks all the time. I'm so happy I don't have to do that in Python.

Then again, if you have a sane string handling library in C instead of messing around with strcpy/strcat, malloc/free and fixed-sized buffers, I guess it could be made more convenient.

  >  so that the user would find the string creation and destruction just as seamless as in Perl or Python

If you manage to do that, please send me the link :)


You could just have a linked list of memory allocations that get freed at the end of the request.
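A minimal sketch of that approach, assuming a hypothetical `request_t` that holds the head of the list and hypothetical `req_alloc`/`req_free_all` helpers. Each allocation carries a small header that links it to the previous one; the header is a union with `max_align_t` (C11) so the payload after it stays suitably aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every allocation made for a request. */
typedef union alloc_node {
    union alloc_node *next;
    max_align_t       align;   /* pads the header so the payload is aligned */
} alloc_node;

/* Hypothetical request object: only the allocation list is shown. */
typedef struct request {
    alloc_node *allocs;
} request_t;

/* Allocates n bytes owned by the request; returns NULL on failure. */
void *req_alloc(request_t *r, size_t n) {
    if (n > SIZE_MAX - sizeof(alloc_node)) return NULL;
    alloc_node *node = malloc(sizeof *node + n);
    if (!node) return NULL;
    node->next = r->allocs;    /* push onto the front of the list */
    r->allocs = node;
    return node + 1;           /* payload starts right after the header */
}

/* Called once by the framework when the request is finished. */
void req_free_all(request_t *r) {
    alloc_node *node = r->allocs;
    while (node) {
        alloc_node *next = node->next;
        free(node);
        node = next;
    }
    r->allocs = NULL;
}
```

Compared with a bump arena this costs one `malloc` per allocation, but it never runs out of a fixed-size block.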


If you're going for clean syntax, you could store an arena pointer in a global variable/thread local storage (depending on the programming model). Just make sure the scaffolding switches/creates/destroys the arenas. I think you'd still want a hierarchical allocator, though.
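Roughly, the thread-local variant might look like the sketch below. It assumes C11 `_Thread_local` and a flat bump arena; the names `arena_t`, `request_begin`, `request_end`, `ralloc`, and `rstrdup` are made up for illustration, and the hierarchical part is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct arena {
    char  *base;
    size_t used;
    size_t cap;
} arena_t;

/* One "current" arena per thread; only the scaffolding touches it. */
static _Thread_local arena_t *current_arena;

/* Scaffolding: called before the handler runs. Returns 0 or -1. */
int request_begin(arena_t *a, size_t cap) {
    a->base = malloc(cap);
    if (!a->base) return -1;
    a->used = 0;
    a->cap = cap;
    current_arena = a;
    return 0;
}

/* Scaffolding: called after the handler returns; frees everything at once. */
void request_end(arena_t *a) {
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
    if (current_arena == a) current_arena = NULL;
}

/* Handler-facing allocator: no arena argument at the call site. */
void *ralloc(size_t n) {
    arena_t *a = current_arena;
    size_t align = _Alignof(max_align_t);
    if (!a || n > SIZE_MAX - (align - 1)) return NULL;
    n = (n + align - 1) & ~(align - 1);   /* keep later allocations aligned */
    if (n > a->cap - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Example helper built on ralloc: copies s into the current arena. */
char *rstrdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = ralloc(n);
    if (p) memcpy(p, s, n);
    return p;
}
```

Handler code then calls `rstrdup(x)` with no request argument, which is the clean syntax being asked for; the hidden global is the price, and it only works if the scaffolding reliably brackets every handler with `request_begin`/`request_end`.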



