JavaScript obfuscation using PHP

For a project of mine I needed a way to compress JavaScript code. There are numerous programs that do this very well, but I thought it would be a fun project to make one myself.

Obfuscation is most often used to stop users from viewing your code and simply copying and pasting it. It is, however, also a great way of reducing file size: I was able to reduce a file to less than half its original size simply by rewriting the original code.

If you don't want all the technical stuff just check out the demo and see for yourself.

Start the javascript obfuscation demo

The approach I took to JavaScript obfuscation relies on regular expressions. It works in three steps:

  1. Analysis phase, where the code is searched for variable and function names
  2. Replacement phase, where function and variable names are replaced with shorter random names
  3. Compact phase, where whitespace, comments and linebreaks are removed

In most cases these steps result in fully functional code, but errors can still occur. More on this later.

Analyzing

To search for variable, function and function argument names I use two different regular expression patterns. First the one for variables:

/(var)*\s+([\w]+)\s*=/

The most crucial bit is the = sign. Everything before it is either an object or a variable. Object properties won't get matched here: because we're only searching for strings of word characters (\w), everything with a dot in it is ignored. The var bit at the beginning is optional, since variables don't have to be declared explicitly.
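As a rough sketch of how this pattern could be applied, the snippet below runs it over a small sample with preg_match_all. The sample source and the $variableNames array are made up for illustration and are not taken from the actual script.

```php
<?php
// Sample input; the variable names are invented for this example.
$source = <<<'JS'
var counter = 0;
var total = 10;
obj.prop = 5;
label = "x";
JS;

// The variable pattern from the text.
$pattern = '/(var)*\s+([\w]+)\s*=/';

preg_match_all($pattern, $source, $matches);

// Group 2 holds the captured names; drop duplicates and reindex.
$variableNames = array_values(array_unique($matches[2]));

// For this sample: counter, total, label (obj.prop is skipped).
print_r($variableNames);
```

Note that without var the pattern still requires whitespace in front of the name, so an undeclared variable at the very start of the input would be missed.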

Finding function and argument names is a little more obvious since functions are always declared starting with the word function.

/function\s+(\w+)\(((\w+,)*\w+)?\)/

Argument names are the tricky bit in this. Since I found no easy way of selecting each individual name I select the complete argument string and parse it using PHP.
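Splitting the argument string in PHP could look roughly like the sketch below. The sample source and the $names array are assumptions for illustration. The pattern as written allows no whitespace inside the argument list, so the sample avoids it.

```php
<?php
// Sample input; function and argument names are invented for this example.
$source = 'function add(first,second){return first+second;} function noop(){}';

// The function pattern from the text.
$pattern = '/function\s+(\w+)\(((\w+,)*\w+)?\)/';

// PREG_SET_ORDER gives one array per matched function declaration.
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

$names = [];
foreach ($matches as $match) {
    // Group 1 is the function name.
    $names[] = $match[1];

    // Group 2 is the complete argument string, e.g. "first,second".
    // It is absent when the function takes no arguments.
    if (!empty($match[2])) {
        foreach (explode(',', $match[2]) as $argument) {
            $names[] = $argument;
        }
    }
}

// For this sample: add, first, second, noop
print_r($names);
```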

Replacing

All names that have been found are stored in an array. This array is then looped through, and for each entry a search pattern is generated that looks something like this:

/(?![\'"])\bXXX\b(?![\'"])/m

Here XXX is the name of the variable. The crucial parts are the word boundaries (\b), which should prevent partial strings from being matched. The negative lookahead patterns before and after the word boundaries are my incomplete attempt at preventing matches inside JavaScript strings.

Each match found is replaced by a randomly generated unique name, which looks something like Ae, KsyF, PMw, etc.
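A minimal sketch of the replacement loop might look like this. The randomName helper, the $used bookkeeping array and the sample source are hypothetical and only illustrate the idea. The sketch does not check whether a generated name already occurs elsewhere in the code, which a real tool would have to do.

```php
<?php
// Hypothetical helper: returns a random 2-4 letter name not handed out before.
function randomName(array &$used): string
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $all   = $upper . 'abcdefghijklmnopqrstuvwxyz';
    do {
        // JavaScript keywords are all lowercase, so a leading capital avoids them.
        $name  = $upper[random_int(0, 25)];
        $extra = random_int(1, 3);
        for ($i = 0; $i < $extra; $i++) {
            $name .= $all[random_int(0, 51)];
        }
    } while (isset($used[$name]));
    $used[$name] = true;
    return $name;
}

// Sample input; "encounter" must stay untouched thanks to the word boundaries.
$source = 'var counter = 0; counter = counter + 1; encounter = 2;';
$names  = ['counter'];
$used   = [];

foreach ($names as $name) {
    // Build the search pattern from the text, with XXX filled in.
    $pattern = '/(?![\'"])\b' . preg_quote($name, '/') . '\b(?![\'"])/m';
    $source  = preg_replace($pattern, randomName($used), $source);
}

echo $source, "\n";
```

Every occurrence of counter receives the same new name because the name is generated once per entry, before preg_replace runs over the whole source.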

Stripping comments and whitespace

The final stage is to remove all unwanted text, like whitespace and comments. For the comments we need two patterns: one for single-line comments and one for multiline comments. The single-line pattern looks like this:

/\/\/.*$/m

and the multiline:

/\/\*.*?\*\//sm
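Applied with preg_replace, the two comment patterns could be used roughly as follows. The sample source is made up for illustration. Be aware that the single-line pattern also strips anything after // inside a string, such as a URL, so this is a sketch and not a safe general solution.

```php
<?php
// Sample input with one single-line and one multiline comment.
$source = "var a = 1; // counter\n/* block\n   comment */\nvar b = 2;";

// Single-line comments: from // up to the end of the line.
$source = preg_replace('/\/\/.*$/m', '', $source);

// Multiline comments: the s modifier lets . match newlines,
// and .*? stops at the first closing */.
$source = preg_replace('/\/\*.*?\*\//sm', '', $source);

echo $source, "\n";
```

The leftover spaces and empty lines are dealt with in the whitespace step.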

The whitespace is a little more tricky. Since some of it has meaning in the JavaScript code, we can't just remove every whitespace character we find.

I made a pattern that should remove all whitespace wherever it isn't strictly necessary.

/\s*([,=!<>:;\-%\?\*\+\|\]\[\(\)\{\}]{1,2})\s*/m
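To show what this pattern does, here is a small sketch that replaces each match with just the captured operator or bracket. The sample source and the $compact variable are assumptions for illustration.

```php
<?php
// Sample input with spaces, indentation and line breaks.
$source = "if ( a == b ) {\n    total = total + 1;\n}";

// The whitespace pattern from the text: optional whitespace around
// one or two operator or bracket characters.
$pattern = '/\s*([,=!<>:;\-%\?\*\+\|\]\[\(\)\{\}]{1,2})\s*/m';

// Keep only the captured characters, dropping the surrounding whitespace.
$compact = preg_replace($pattern, '$1', $source);

echo $compact, "\n"; // if(a==b){total=total+1;}
```

Whitespace between two words, as in var total, is left alone because no listed character sits next to it. Whitespace around these characters inside string literals is removed as well, which is one of the ways errors can creep in.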

Things that can be improved

This script isn't perfect and there's still room for improvement. Currently the following issues can be improved upon:

For now I will keep the code for this app private, since it needs a little more tinkering. When I feel the major problems have been resolved I will post the source here. In the meantime, try the demo.

Categories: javascript PHP