Ideally, with a substantial dataset of obfuscated JavaScript and corresponding raw code, a language model could potentially make good predictions. The first key difficulty, however, is collecting a large-scale dataset and setting up a system for automatic compilation and segment out the binary-source pairs.