Show HN: Symbolica – Try our symbolic code executor in the browser
45 points by Choc13 on Sept 9, 2021 | 35 comments
We're a couple of software engineers who believe that to build great software you need to write good tests, but we also sympathise when engineers say things like:

- "Writing tests was too time consuming on my tight schedule", or

- "Unit tests don't catch enough bugs, so they're useless", or

- "I've inherited a legacy code base without tests and have no idea where to start"

To tackle this we're building Symbolica (https://www.symbolica.dev), a symbolic code executor [1] that lets you run your code for all possible inputs. This means you can do things like:

- Assert properties about your code and check that they hold for every conceivable input.

- Check that two implementations of the same function/method/program are equivalent, which is really useful if you're refactoring a legacy codebase without tests.

- Find out if your code will hit any undefined behaviour, e.g. division by zero or an out-of-bounds array access.

We're still really early in the development of this product, but we're excited to have built a working prototype of the symbolic executor for C programs. We wanted to get some feedback from potential early adopters so we've put up a code playground (https://www.symbolica.dev/playground) where you can try out Symbolica on C programs in the browser for free. We'd love people to give it a go and give us their thoughts.

Our plan is to build this out into a hosted cloud service that you can integrate into your DevOps pipeline (e.g. GitHub actions) so that you can run these symbolic tests on every CI build.

Further down the line we plan to add support for other languages too. We've currently got proof of concept implementations for Lisp and Python and will be looking into C++, Rust and .NET after. Of course we're always willing to prioritise a particular language if there's strong demand.

If you're interested in what we're building then please either message us at dev@symbolica.dev or join the alpha waiting list if you want to get first access to our full offering once we launch that.

For those curious about how the executor works, the core part of it is open source on GitHub (https://github.com/SymbolicaDev/Symbolica).

[1] https://en.wikipedia.org/wiki/Symbolic_execution



Can you explain how your offering compares with open-source alternatives, such as KLEE [1] and DeepState [2]?

P.S. your logo is surprisingly similar to that of ForAllSecure [3], which also provides a symbolic execution engine called Mayhem [4].

[1] https://klee.github.io/ (online example: http://klee.doc.ic.ac.uk/)

[2] https://github.com/trailofbits/deepstate#readme

[3] https://forallsecure.com/

[4] https://users.ece.cmu.edu/~aavgerin/papers/mayhem-oakland-12...


Yeah sure. We initially wanted to build something to compare the equivalence of two programs written in different languages, and for this we started out using KLEE. Unfortunately, we found it didn't quite satisfy our needs, and the changes we wanted to make didn't really fit with their architectural model. We also felt like it wasn't as amenable to parallelisation as we would have liked. The core technology is using an SMT solver just like KLEE, but our hope is that Symbolica will be more scalable and easier to use for larger problems.

DeepState looks interesting; we had seen it before along with other stuff from Trail of Bits, but admittedly we haven't used it ourselves. Our impression is that it's more of a frontend to various symbolic execution / fuzzing backends. So maybe it's something we should consider integrating with too.

We hadn't come across ForAllSecure before, so thanks for pointing them out. They do appear to be similar, but their focus seems to be on large commercial/enterprise projects in sectors like defence and aerospace. So maybe I'm wrong, but I don't think they have an offering for individuals / smaller teams.

On the logo point, as another commenter pointed out it's the mathematical forall symbol, so I guess we just both had the same thought when it came to coming up with a logo.


Their logo is just the \forall quantifier, which is pretty much what those products do, so I wouldn't be that surprised.

https://en.wikipedia.org/wiki/Universal_quantification


I think the problem this product is trying to solve, making it easier to implement rigorous testing, is an important one. When I talk to junior developers I compare enterprise software development to painting: when you paint a room, most of your time is actually spent taping out all of the edges and boundaries, so the verb "painting" is almost a misnomer. Similarly, it isn't fast or easy for devs to see all of the boundary conditions in inherited code bases and write meaningful tests for them.

You mention in the description testing code for all possible inputs. Are you all actually doing this behind the scenes with an SMT solver? Are you using some other proof assistant (Coq, HOL, etc.) to be able to use proof-by-induction techniques and thus manage the path explosion problem?

I tried out an example comparing two multiplication algorithms (see below) and the tool said it couldn't find any issues in about a minute. Curious whether it will scale to larger problem sizes.

    // implementation 1
    int mult(int x, int y) {
        return x * y;
    }

    // implementation 2
    int slow_mult(int x, int y) {
        int result = 0;
        if (x < 0) {
            for (int i = x; i < 0; i++) {
                result -= y;
            }
        } else {
            for (int i = 0; i < x; i++) {
                result += y;
            }
        }
        return result;
    }


Yeah I think that's a great analogy. We noticed the same thing when we used to work together at our last jobs, which is what motivated us to start this venture.

On the implementation side, yes we're using Z3, you can have a poke around at the core symbolic executor at https://github.com/SymbolicaDev/Symbolica

In terms of managing the path explosion problem, we have a few techniques that we've prototyped locally. For instance we believe we can do quite a bit by canonicalising and caching past results. We've also made our executor amenable to parallelisation from the start. We hope to start using these techniques in our hosted version soon.

With your example, maybe I'm missing something here, but they look like they are functionally equivalent to me. When I just tried running some values by hand they even appear to overflow in the same way. What was the assertion that you were making and what was the result you were expecting?


I think the surprise comes from how long it takes to get the answer, not that it's incorrect.

By comparison, gcc is able to reduce those two implementations to the same thing and prove them equal in a fraction of the time (https://godbolt.org/z/6oMEd8ebY). If it takes Symbolica a minute for the same tiny example, how does it work on real codebases?


Ah right, makes sense, thanks. Most of the time on the playground is spent booting a build server (we're using GitHub actions behind the scenes just for this part of the playground) and then building the code. I just ran this example myself and the build part took around 50 seconds before it actually sent the bitcode off to be analysed.

We need to add some more detailed diagnostic outputs to the actual execution part so that we can see the timings more clearly, but as a rough approximation from looking at the logs the execution API took about 10 seconds to run and some of that time would be in waiting for the message to be dequeued and for the function to wake up and process the job. So it's taking maybe around 5 seconds right now to analyse this one.

I know that's still quite slow, but we've not yet spent any real time on optimisations to the execution part so we're quite confident we can make this bit faster. As for the constant build overhead, this is something that's only really applicable to our playground, as users would likely build their code locally or have already built it as part of a CI run. Given all the feedback on this slowness though we are planning to make some improvements to this build part of the playground too to help reduce this constant overhead, or at least make it more obvious that this is what is causing the initial part of the waiting time.


This is interesting, it seems like it may in fact be testing by looping, given it took a minute and didn't conclude. For hardware design there are logic equivalence checking tools that work very differently: they essentially work by reducing a function (piece of hardware) to a minimised boolean equation. The method of reduction ensures the same logic, no matter how coded, results in the same boolean equation, and then comparing the equations is direct and does not require testing all input values. Hardware verification also has a higher bar than software, where code coverage analysis can require 100% on line, branch, value, condition, expression, functional, etc. It's a lot of work, but does give very high confidence there are zero bugs.


Hey, the reason for the long times on the playground is actually that we’re having to wait for build servers to boot up and then build the code before we can finally ship it off to our backend for analysis. We’ve had quite a bit of feedback on this slowness and we hope to eliminate some of this constant factor boot up and compile time soon, and for what remains make it clear that this is what’s happening in the status box.

So although it might seem like it, we’re not actually doing any looping, we’re doing constraint solving as you’ve mentioned.


Cool. I have a background in IT security and am sometimes doing symbolic (or concolic) execution on binaries instead of source code. So I may not be the target audience.

Nevertheless I think you should provide a comprehensive list of what your tool can and can't detect, instead of just some examples. From the sample programs I learned that it can check two programs for semantic equivalence and can detect undefined behaviour. I did not get the example with the memory violation, as symbolica.h is not included in the code. Additionally, your tool could create test cases that trigger the error, AFAIK.

Regarding the feedback for the user, as multiple others have mentioned, the running time is very long and thus a simple timer may not be sufficient. But I do not have an idea how to improve that.

I am not sure if you are working on the target program directly, or on the compiled binary. In the example with the division by zero, there is a print statement in order to stop the compiler from removing dead code. But I do not understand why you need a compiler if you are working directly on C. On the other hand, if you are working on the compiled binary, then your tool is very similar to other symbolic execution engines, such as angr.


Hey, thanks for the feedback. A more comprehensive list of features is on our todo list, we’ll update the site shortly.

The memory violation one doesn’t include symbolica.h because we don’t need to symbolize any variables for that one. When we run the compiled code through our solver it adds additional constraints such as, “don’t access memory that’s out of bounds”, so we can detect that automatically.

On your final point, we’re working on the LLVM bitcode so we do have to compile the code, hence the print statement. You’re right that this is similar to other symbolic executors. We built this one because we found others were quite difficult to get started with, were hard to extend with the features we personally wanted, and didn’t support the languages we wanted either. Obviously at this point in time we’ve not achieved these goals yet, but we think we’ve built the foundations to be able to do so.


Thank you for the answer and a big kudos for the project.

Regarding the memory violation, could symbolica deal with symbolic memory? Can it deal with symbolic files as input? How about syscalls with symbolic inputs? These are the main problems I had when I worked on my toy symbolic execution engine. If yes, you should definitely market these features.


Yeah it can deal with symbolic memory. As long as the memory allocation is fixed size the contents can be fully or partially symbolic. We also support symbolic addresses for allocations to ensure that pointer arithmetic is fully tested.

We're currently simulating a lot of the underlying system at the C standard library level. For a number of reasons we'd like to lower this to the raw syscall and assembly level. This would allow any libc implementation to be tested along with the application code, and we may even be able to simulate threading and the file system. Syscalls could be made symbolic along with files by treating the entire system symbolically, but obviously this is a lot of work, so it's something that we're gradually building towards.


Most code nowadays consists of calling third party libraries and your system's other modules. How do you build the models of those components for symbolic execution?


Great question. This obviously isn't exposed through the playground right now as that's just a single file demo.

Our backend software does integrate with larger build systems. So the way you would handle third party deps is to link them during the build and then analyse the whole built module. We plan to make it easier to integrate with toolchains for specific languages in the future.


Link: https://www.symbolica.dev

I tried the try for free button.

The run took longer than my attention span. Since I'm not a potential customer, that's not a data point.

On the other hand, adding some text to the code editor page quantifying in minutes and seconds how long running example code will take would be a low effort way to manage user expectations.

Setting the context as a proof of concept is ok. It's honest. Letting people know it will be slow is good because it is honest, too.

Give people evidence that you can be trusted. Javascript that runs on back-button mouse over is not where to put your efforts. Good luck.


Thanks for the comments, they’re good points. I’ll update the page as you suggested as yeah right now at the PoC stage it’s on the slow side.

Some of the time lag is due to us waiting for the build stage to complete, which isn’t actually the core part of the product and is only required for the playground site. I’ll try and make this more explicit too so that people can get a sense for how long the actual symbolic execution part takes, which is the actual delta that will be added if they were to use it in their DevOps pipeline.

We also have some speed improvements to the executor that we’ll be shipping soon. We’ve tested them locally but just need to port them to the cloud version.


I’m not quite sure what you meant by this statement

“Javascript that runs on back-button mouse over is not where to put your efforts.”

Was this something you observed on our site, or just a general point about prioritising the core product over fancy UI effects on the website? Would you mind clarifying?


I might be wrong, but I think he meant that JS runs and a new HTTP request is made when you hover over the logo. (Which is close to the back button, so it seems like JS is running when you hover over the back button.)


Ah right, yeah, that makes sense. We built the website using Next.js so we kinda got this for free with their Image component.


There are three "example programs" available on your website, but none of them are very interesting.

I'd love to see better examples of subtle problems that your tool can detect: the kind of thing where an engineer says "oof, yes, that's obvious now that you've pointed it out" but wouldn't have noticed it right away on their own.


Yeah that's a good point, we'll work on some more realistic examples.


> We've currently got proof of concept implementations for Lisp and Python

So what would be the benefit over something like http://hypothesis.readthedocs.io ?

From the look of it (the mention of "free" on the website), you have some monetization in mind. What form would that take?


I’ve not used Hypothesis, but it says it’s inspired by QuickCheck, and I’m a regular user of FsCheck, which is the F# equivalent. Those tools have a similar goal in mind to ours, but whereas they generate randomised test data (FsCheck creates 100 test cases by default, IIRC), we effectively solve the equation, so we can actually say that a property holds for all inputs. The key difference is that we don’t actually invoke the code with concrete values, but rather treat it like a set of constraints that should always be satisfied. We then check these constraints and if they fail we can reverse engineer a concrete value that would cause the violation. So we effectively run it for all inputs, without actually running it for any inputs.


On the monetisation point we do plan to have a paid version of the product too. We think that one of the limiting factors of using symbolic execution to date has been in the path explosion problem when analysing large programs. A large part of our reasoning behind building a new symbolic executor from scratch was that we wanted to be able to parallelise it to help overcome this issue. We intend to provide a cloud hosted version of Symbolica that makes use of this so that users can get access to this compute power on a pay per use basis.

So our rough plans at the moment are to charge on a consumption basis to use this service.


Thank you.

> The key difference is that we don’t actually invoke the code with concrete values, but rather treat it like a set of constraints that should always be satisfied

How does that work with a dynamic language such as Python?


The static type information isn't actually too important to us; what our tool cares about is being able to interpret branching conditions in the code in order to turn them into constraints. We currently do this at the LLVM IR level, so as long as we can compile the code down into this form then we can analyse it. With Python, a lot of the work we had to do in order to get a prototype working was in being able to handle all of the system calls that the Python interpreter needs to make when booting up. These aren't always straightforward to treat symbolically because they interact with the file system etc., but it's possible to provide mock implementations that don't affect the analysis of the user's code.


I doubt that any paying user will be interacting with your system through a browser, but the browser-based demo could do a better job with status. You do show some activity updates, but it would be nice to know where in the overall process the task is. I was just about to bail when my one line C program finished. It took about 90 seconds. Yeah, I am impatient.

Your Lisp implementation would be interesting to me, but I don't know what the market size for a Lisp focused product would be. I actually don't know if the market size for C is great either. We did a huge toolchain refactor at DreamWorks a decade ago and there was a lot of C code, so maybe it would be relevant.


We do have some upcoming changes that will make the status clearer on the website and will be removing some constant time overhead that we're paying on each playground run. I agree that the playground is on the slow side at the moment whilst we're still in the early stages of developing this product.

On the Lisp point, it's not really been developed for commercial reasons, as like you say it's not got the largest market share, but we wanted to see if we could get it to work with languages other than C and Lisp seemed like a nice one to try.


I should add that we believe it’s possible to support other langs too, especially if they compile down to LLVM IR, or have an interpreter/runtime that does. So we plan to eventually work on support for others too like C++, .NET, Java, JavaScript and Rust.


Does this work on FF? I tried to check out some samples and couldn't see anything.

Works on chrome, but ran long enough that I bounced. Not sure I'm the target customer, though, so NBD there.

Cool stuff, thanks for posting!


Hmm, yeah, sorry about the Firefox thing and thanks for reporting it. I just checked and it seems like the code editor div isn't respecting the height: 100% style, so it's shrunk to being a few px high. I'll fix it today.

Thanks for giving it another shot in Chrome.


It took a minute and a half for me using FF on a one line C program.


The "Join our mailing list" form in the footer has a UI problem. The text color is white when users type in it.


Ah sorry about that and thanks for pointing it out. Will fix!



