PHP Formatter
The best part about the holidays is the free time to write fun code. I’ve wanted to write a PHP formatter since I first used Go’s gofmt, so this week it was a pretty easy choice.
The code is available on GitHub: PHP Formatter, and licensed under the MIT License like most of my code.
It uses PHP’s built-in token_get_all function to tokenize the PHP into operators, keywords, strings, HTML, and whitespace, and then steps through these tokens, focusing mostly on rearranging the whitespace. All whitespace (outside of strings and HTML, of course) is stripped except for multiple sequential newlines which are reduced to a single newline. After that, “correct” whitespace is added back into the code.
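The whitespace pass described above can be sketched in a few lines. This is only an illustration in Python, not the formatter's actual PHP code: the `(kind, text)` token tuples and the kind names `"ws"`, `"code"`, and `"string"` are assumptions standing in for whatever token_get_all returns, and the step that adds the "correct" whitespace back is left out.

```python
def collapse_whitespace(tokens):
    """Drop whitespace tokens, keeping one newline where a run had several.

    `tokens` is a list of (kind, text) tuples. The kinds used here are
    hypothetical; strings and HTML are non-"ws" tokens, so they pass
    through untouched.
    """
    out = []
    for kind, text in tokens:
        if kind != "ws":
            # Code, strings, and HTML are never modified in this pass.
            out.append((kind, text))
        elif text.count("\n") >= 2:
            # Multiple sequential newlines are reduced to a single newline.
            out.append(("ws", "\n"))
        # Any other whitespace is stripped entirely.
    return out
```

Whitespace inside a string survives because it is part of a string token rather than a whitespace token.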
To avoid creating an unruly mess, it then goes through and ensures that no line is longer than 97 characters unless absolutely necessary (for example, it doesn’t automatically break strings up yet). For consistency’s sake, it also rewraps runs of sequential single-line comments to the same line length.
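As a rough illustration of the comment handling, the following Python sketch joins each run of `//` comments and rewraps it to a fixed width. It is an assumption-laden simplification, not the formatter's code: the function name `rewrap_comments` is made up, indentation is ignored, only `//` comments are recognized, and a run of empty comments is simply dropped.

```python
import textwrap


def rewrap_comments(lines, width=97):
    """Rewrap each run of consecutive `//` comment lines to `width` columns."""
    out, run = [], []

    def flush():
        # Join the collected comment text and wrap it as one paragraph.
        if run:
            text = " ".join(part for part in run if part)
            out.extend(textwrap.wrap(
                text,
                width=width,
                initial_indent="// ",
                subsequent_indent="// ",
                break_long_words=False,  # never split a word to fit the width
                break_on_hyphens=False,
            ))
            run.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("//"):
            run.append(stripped[2:].strip())
        else:
            flush()
            out.append(line)  # non-comment lines pass through unchanged
    flush()
    return out
```

With `break_long_words=False`, a single word longer than the limit stays on its own overlong line, which mirrors the "unless absolutely necessary" caveat.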
Additionally, PHP Formatter will fix the indentation of inline HTML, although the HTML formatting is much more permissive, with whitespace for the most part being left untouched. The indenting rule here is that each line that has an opening tag indents the following lines once (regardless of how many tags are opened on that line), and likewise each line with a closing tag is unindented. For this to work, closing tags are rearranged so that tags which were opened together on one line are also closed together on one line. If a closing tag occurs out of order, any tags left unclosed will be closed to help promote valid markup.
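The one-level-per-line rule is easier to see in code. The Python sketch below is a guess at the idea rather than the formatter's implementation: the regular expressions, the short list of void tags, and the choice to compare opening and closing counts on lines that contain both are all assumptions, and the rearranging and auto-closing of tags is not attempted.

```python
import re

# Hypothetical, deliberately small set of tags that never take a closing tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}
OPEN_TAG = re.compile(r"<([a-zA-Z][\w-]*)[^<>]*?(/?)>")
CLOSE_TAG = re.compile(r"</[a-zA-Z][\w-]*\s*>")


def indent_html(lines, unit="    "):
    """Re-indent HTML lines, changing depth by at most one level per line."""
    depth, out = 0, []
    for line in lines:
        text = line.strip()
        opens = sum(
            1 for m in OPEN_TAG.finditer(text)
            if m.group(1).lower() not in VOID and not m.group(2)
        )
        closes = len(CLOSE_TAG.findall(text))
        if closes > opens:
            # A closing line is unindented once, however many tags it closes.
            depth = max(depth - 1, 0)
        out.append(unit * depth + text if text else "")
        if opens > closes:
            # An opening line indents what follows once, however many it opens.
            depth += 1
    return out
```

A line such as `<ul><li>` therefore adds a single level, and the matching `</li></ul>` line removes it again, which is why closing tags need to stay grouped the same way their opening tags were.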
As an added perk, inline JavaScript (inside script tags) will be formatted using the JS-Beautify PHP port if it’s available alongside the PHP Formatter. I forked the port in order to make it treat PHP tags inside the JS as though they were just comments, since PHP isn’t valid JavaScript, although that’s only necessary if you mix PHP and JS.
All in all, this took me a fair bit longer than I thought it would; I was expecting to get a lot more done today and yesterday outside of this. It turns out that nearly everything about the way we format code is an edge case, with the word “exception” (or worse, “the one exception”) appearing all too often in the comments. Nonetheless, save for a few quirks, it effectively formats code the way I’ve been trying to, and does so with minimal effort on the developer’s part.