BLOG

Reverse Engineering JS by Example

Published January 02, 2019

flatmap-stream payload A

In November, the npm package event-stream was exploited via a malicious dependency, flatmap-stream. The whole ordeal was written up here and the focus of this post is to use it as a case study for reverse engineering JavaScript. The 3 payloads associated with flatmap-stream are simple enough to be easy to write about and complex enough to be interesting. While it is not critical to understand the backstory of this incident in order to understand this post, I will be making assumptions that might not be obvious if you aren’t somewhat familiar with the details.

Reverse engineering most JavaScript is more straightforward than binary executables you may run on your desktop OS – after all, the source is right in front of you – but JavaScript code that is designed to be difficult to understand often goes through a few passes of obfuscation in order to obscure its intent. Some of this obfuscation comes from what is called “minification” which is the process of reducing the overall bytecount of your source as much as possible for space saving purposes. This involves shortening of variables to single character identifiers and translating expressions like true to something shorter but equivalent like !0. Minification is mostly unique to JavaScript’s ecosystem because of its web browser origins and is occasionally seen in node packages due to a reuse of tools and is not intended to be a security measure. For basic reversal of common minification and obfuscation techniques, check out Shape’s unminify tool. Dedicated obfuscation passes may come from tools designed to obfuscate or are performed manually by the developer

The first step is to get your hand on the isolated source for analysis. The flatmap-stream package was crafted specifically to look innocent except for a malicious payload included in only one version of the package, version 0.1.1. You can quickly see the changes to the source by diffing version 0.1.2 and version 0.1.1 or even just alternating between the urls in two tabs. For the rest of the post we’ll be referring to the appended source as payload A. Below is the formatted source of payload A.

Reverse Engineering by Example Image 1

First things first: NEVER RUN MALICIOUS CODE (except in insulated environments). I’ve written my own tools to help me refactor code dynamically using the Shift suite of parsers and JavaScript transformers but you can use an IDE like Visual Studio Code for the purposes of following along with this post.

When reverse engineering JavaScript it is valuable to keep the mental juggling to a minimum. This means getting rid of any expressions or statements that don’t add immediate value and also reversing the DRYness of any code that has been optimized automatically or manually. Since we’re statically analyzing the JavaScript and tracking execution in our heads, the deeper your mental stack grows the more likely it is you’ll get lost.

One of the simplest things you can do is unminify variables that are being assigned global properties like require and process, like on lines 3 and 4.

 

Reverse Engineering by Example Image 2

 

You can do this with any IDE that offers refactoring capabilities (usually by pressing “F2” over an identifier you want to rename). After that, we see a function definition, e, which appears to simply decode a hex string.

 

Reverse Engineering by Example Image 3

 

The first interesting line of code appears to import a file which comes from the result of the function e decoding the string "2e2f746573742f64617461"

 

Reverse Engineering by Example Image 4

It is extremely common for deliberately obfuscated JavaScript to obscure any literal string value so that anyone who takes a passing glance won’t get alerted by particularly ominous strings or properties in clear view. Most developers recognize this is a very low hurdle so you’ll often find trivially undoable encoding in place and that’s no different here. The e function simply reverses hex strings and you can do that manually via an online tool or with your own convenience function. Even if you’re confident that you understand that the e function is doing, it’s still a good idea to not run it (even if you extract it) with input found in a malicious file because you have no guarantees that the attacker hasn’t found a security vulnerability which is triggered by the data.

After reversing that string we see that the script is including a data file, './test/data' which is located in the distributed npm package.

 

Reverse Engineering by Example Image 5

After renaming n to data and deobfuscating calls to e(n[2]) to e(n[9]) we start to see a better picture of what we’re dealing with here.

 

Reverse Engineering by Example Image 6

It’s also easy to see why these strings were hidden, finding any references to decryption in a simple flatmap library would be a dead giveaway that something is very wrong.

From here we see the script is importing node.js’s “crypto” library and, after looking up the APIs, we find that the second argument to createDecipher, o here, is the password used to decrypt. Now we can rename that argument and the following return values to sensible names based on the API. Every time we find a new piece of the puzzle it’s important to immortalize it via a refactor or a comment, even if it’s a renamed variable that seems trivial. It’s very common when diving through foreign code for hours that you lose your place, get distracted, or need to backtrack because of some erroneous refactor. Using git to save checkpoints during a refactor is valuable as well but I’ll leave that decision to you. The code now looks as follows, with the e function deleted because it is no longer used along with the statement if (!o) {... because it doesn’t add value to the analysis.

 

Reverse Engineering by Example Image 7

You’ll also notice I’ve renamed f to newModuleInstance. With code this short it’s not critical but with code that might be hundreds of lines long it’s important for everything to be as clear as possible.

Now payload A is largely deobfuscated and we can walk through it to understand what it does.

Line 3 imports our external data.

 

Reverse Engineering by Example Image 8

Line 4 grabs a password out of the environment. process.env allows you to access variables from within a node script and npm_package_description is a variables that npm, node’s package manager, sets when you run scripts defined in a package.json file.

 

Reverse Engineering by Example Image 9

Line 5 creates a decipher instance with the value from npm_package_description as the password. This means that the encrypted payload can only be decrypted when this script is executed via npm and is being executed for a particular project that has, in its package.json, a specific description field. That’s going to be tough.

 

Reverse Engineering by Example Image 10

Lines 6 and 7 decrypt the first element in our external file and store it in the variable “decrypted“

 

Reverse Engineering by Example Image 11

 

Lines 8-11 create a new module and then feeds the decrypted data into the undocumented method _compile. This module then exports the second element of our external data file. module.exports is node’s mechanism of exposing data from one module to another, so newModuleInstance.exports(data[1]) is exposing a second encrypted payload found in our external data file.

 

Reverse Engineering by Example Image 12

 

At this point we have encrypted data that is only decryptable with a password found in a package.json somewhere and whose decrypted data gets fed into the _compile method. Now we are left with a problem: how do you decrypt data where the password is unknown? This is a non-trivial question, if it were easy to brute force aes256 encryption then we’d have more problems than an npm package being taken over. Luckily we’re not dealing with a completely unknown set of possible passwords, just any string that happened to be entered into a package.json somewhere. package.json files originated as the file format for npm package metadata so we may as well start at the official npm registry. Luckily there’s an npm package that gives us a stream of all package metadata.

Reverse Engineering by Example Image 13

There’s no guarantee our target file is located in an npm package, many non-npm projects use package.json to store configuration for node-based tools, and package.json descriptions can change from version to version but it’s a good place to start. It is possible to decrypt this payload with multiple keys resulting in garbled gibberish so we need some way of validating our decrypted payload during this brute forcing process. Since we’re dealing something that is fed to Module.prototype._compile which feeds to vm.runInThisContext we can reasonably assume that the output is JavaScript and we can use any number of JavaScript parsers to validate the data. If our password fails or if it succeeds but our parser throws an error then we need to move to the next package.json. Conveniently, Shape Security has built its own set of JavaScript parsers for use in JavaScript and Java environments. The brute force script used is here:

 

Reverse Engineering by Example Image 14

After running this for 92.1 seconds and processing 740543 packages we come up with our password – “A Secure Bitcoin Wallet” – which successfully decodes the payload included below:

 

Reverse Engineering by Example Image 15

This was lucky. What could have been a monstrous brute forcing problem ended up needing less than a million iterations. The affected package with the key in question ended up being the bitcoin wallet Copay’s client application. The next two payloads dive deeper into the application itself and, given the target application is centered around storing bitcoins, you can probably guess where this might be going.

If you find topics like this interesting and want to read an analysis for the other two payloads or future attacks, then be sure to “like” this post or let me know on twitter at @jsoverson.