Creating coding standards for PHP_CodeSniffer

When our project is supervised by a continuous integration platform, we are (hopefully) using static code analysis tools. One of the best for analysing PHP code
is PHP_CodeSniffer, which integrates well into systems like PhpUnderControl, Hudson or Bamboo. But in some cases the pre-installed coding standards like PEAR or Zend might not be sufficient for our
current project, or we want to deviate from them. This is the moment when we want to create a custom standard that fits our special needs. In this article I want to share my first experiences
of creating a custom coding standard for PHP_CodeSniffer.

What’s a standard in PHP_CodeSniffer?

A coding standard is nothing more than a set of rules. Each rule, called a "sniff", is a class
that checks the code for a particular requirement. One sniff could verify that spaces are used instead of tabs.
Another could check whether variable names are written in correct camel case, whereas a third could check that there is only one class in each file.
When we locate the installation path of CodeSniffer and browse to the subfolder Standards,
we find a structure like the following.

A standard (PEAR, for example) is a subfolder located in the Standards folder.
It needs a base class file named after the standard followed by CodingStandard.php. Sniffs go into the subfolder Sniffs.
Grouping sniffs into subfolders is best practice but not necessary. CodeSniffer grabs all
sniffs from all subfolders anyway.

When running PHP_CodeSniffer the standard’s name is referenced like this:

phpcs --standard=MyStandard

Of course phpcs comes with some pre-installed standards.
The option -i (phpcs -i) tells us which ones are installed right now and can therefore be used to analyse our code.

Setting up the file structure for a custom standard

Now we are ready to create a structure for our custom standard: a folder MyStandard inside the Standards folder, containing the file MyStandardCodingStandard.php and a subfolder Sniffs.

The file MyStandardCodingStandard.php contains:

<?php
// Fail early if the base class we want to extend is not available.
if (class_exists('PHP_CodeSniffer_Standards_CodingStandard', true) === false) {
    throw new PHP_CodeSniffer_Exception('Class PHP_CodeSniffer_Standards_CodingStandard not found');
}

class PHP_CodeSniffer_Standards_MyStandard_MyStandardCodingStandard extends PHP_CodeSniffer_Standards_CodingStandard
{
    /**
     * Returns the external sniffs that this standard reuses.
     *
     * @return array
     */
    public function getIncludedSniffs()
    {
        return array(
                'Generic/Sniffs/WhiteSpace/DisallowTabIndentSniff.php',
               );
    }
}

What’s happening there?
First we make sure that the class PHP_CodeSniffer_Standards_CodingStandard exists because we want to extend it. Afterwards we define our custom coding standard class. The method getIncludedSniffs()
defines which external sniffs should be used in this standard, which lets us reuse sniffs from other standards.
That’s great because we don’t want to reinvent the wheel again and again. If another standard already has a sniff we’d like to use in our
standard, we can include it by just adding it to this array.
PHP_CodeSniffer scans the Sniffs folder for our own sniffs automatically. If you add them to the array nonetheless, they will be executed twice.

That’s it! Our standard is ready.
OK, it doesn’t do much more than execute the included external sniffs, but it’s already working. That means: if we want to create our own collection of sniffs taken from the Generic, PEAR,
Squiz or Zend standards, we only need to glue them together in our custom standard. I want to encourage you to delve deeper into the existing sniffs. A lot of sniffs (and I really mean
a lot) are already done.

Coding a sniff

Before we can start to build our sniff we have to understand how CodeSniffer works and how things are processed. First it splits the file to be analysed into tokens. Our sniff needs to register the types of tokens for which it wants to be
invoked. The list of available token types can be found in Tokens.php and looks like this:

define('T_NONE', 0);
define('T_OPEN_CURLY_BRACKET', 1000);
define('T_CLOSE_CURLY_BRACKET', 1001);
define('T_OPEN_SQUARE_BRACKET', 1002);
define('T_CLOSE_SQUARE_BRACKET', 1003);
define('T_OPEN_PARENTHESIS', 1004);
define('T_CLOSE_PARENTHESIS', 1005);
define('T_COLON', 1006);
define('T_STRING_CONCAT', 1007);
define('T_INLINE_THEN', 1008);
define('T_NULL', 1009);
define('T_FALSE', 1010);
define('T_TRUE', 1011);
define('T_SEMICOLON', 1012);
...

PHP_CodeSniffer does all the dirty work for us. It splits the file into tokens, categorizes them and calls our sniff automatically if we registered it for that kind of token.
Now that we know what tokens we can deal with, we are ready to code our first sniff.
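
To get a feeling for what "splitting a file into tokens" means, here is a minimal sketch that uses PHP's built-in tokenizer (token_get_all() and token_name()) rather than PHP_CodeSniffer itself. CodeSniffer's own token list is richer, so treat this only as an approximation; describeTokens() is a hypothetical helper name invented for this illustration.

```php
<?php
// Sketch only: uses PHP's built-in tokenizer, not PHP_CodeSniffer itself.

/**
 * Returns one "NAME: text" line per token of the given PHP source.
 * (describeTokens() is a hypothetical helper for this illustration.)
 */
function describeTokens($source)
{
    $lines = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token) === true) {
            // Array tokens have the form array(id, text, line number).
            $lines[] = token_name($token[0]).': '.json_encode($token[1]);
        } else {
            // Single characters such as "{" are returned as plain strings.
            $lines[] = 'CHAR: '.json_encode($token);
        }
    }

    return $lines;
}

echo implode("\n", describeTokens("<?php\nclass checkMe\n{\n}\n")), "\n";
```

In this listing the class keyword shows up as T_CLASS and the class name as T_STRING, while the curly braces come back as plain characters. That gap is presumably why CodeSniffer defines extra token types such as T_OPEN_CURLY_BRACKET.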

Coding the first Sniff

For demonstration purposes our first sniff will be a very basic but working one.
We simply want to make sure that each class name starts with a capital letter – that’s our requirement.
First we create a file we want phpcs to check and call it checkMe.php.
Of course we write the class name in lower case to check if our sniff detects this correctly:

<?php
class checkMe
{
}

Now we build our sniff, name it ValidClassNameSniff.php and save it in our MyStandard/Sniffs
folder:

<?php
class MyStandard_Sniffs_ValidClassNameSniff implements PHP_CodeSniffer_Sniff
{
    /**
     * Returns the token types this sniff wants to be invoked for.
     *
     * @return array
     */
    public function register()
    {
        return array(
                T_CLASS,
                T_INTERFACE,
               );
    }

    /**
     * Processes this test, when one of its tokens is encountered.
     *
     * @param PHP_CodeSniffer_File $phpcsFile The current file being processed.
     * @param int                  $stackPtr  The position of the current token
     *                                        in the stack passed in $tokens.
     *
     * @return void
     */
    public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
    {
        $tokens = $phpcsFile->getTokens();

        $className = $phpcsFile->findNext(T_STRING, $stackPtr);
        $name      = trim($tokens[$className]['content']);

        // Make sure the first letter is a capital.
        if (preg_match('|^[A-Z]|', $name) === 0) {
            $error = ucfirst($tokens[$stackPtr]['content']).' \''.
                $name.'\' must begin with a capital letter';
            $phpcsFile->addError($error, $stackPtr);
        }
    }//end process()
}//end class
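
How register() and process() work together can be tried out without CodeSniffer installed. The following is a toy model built on PHP's own tokenizer, not CodeSniffer's real implementation: ToySniff, ClassKeywordSniff and runSniffs() are hypothetical names made up for this sketch, and the real process() receives a PHP_CodeSniffer_File instead of a raw token array.

```php
<?php
// Toy model of the register()/process() handshake. ToySniff and runSniffs()
// are hypothetical names for this sketch, not part of PHP_CodeSniffer.

interface ToySniff
{
    public function register();
    public function process(array $tokens, $stackPtr);
}

class ClassKeywordSniff implements ToySniff
{
    public $seen = array();

    public function register()
    {
        // We only want to be called for the "class" keyword.
        return array(T_CLASS);
    }

    public function process(array $tokens, $stackPtr)
    {
        // Remember the position at which we were invoked.
        $this->seen[] = $stackPtr;
    }
}

function runSniffs($source, array $sniffs)
{
    $tokens = token_get_all($source);
    foreach ($tokens as $stackPtr => $token) {
        // Plain one-character tokens have no token id; skip them here.
        if (is_array($token) === false) {
            continue;
        }

        foreach ($sniffs as $sniff) {
            if (in_array($token[0], $sniff->register(), true) === true) {
                $sniff->process($tokens, $stackPtr);
            }
        }
    }
}

$sniff = new ClassKeywordSniff();
runSniffs("<?php\nclass checkMe\n{\n}\n", array($sniff));
print_r($sniff->seen);
```

The toy sniff is invoked exactly once, at index 1, because the class keyword directly follows the opening tag.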

The register() method tells CodeSniffer which tokens we are interested in.
Each time a token of a registered type is detected, CodeSniffer invokes our process()
method and hands over a PHP_CodeSniffer_File instance and an integer pointing to the offset
of the detected token. With

$tokens = $phpcsFile->getTokens();

we get the complete list of all tokens of this file. Let us add a quick and dirty

echo "\nStackPointer: ".$stackPtr."\n";
print_r($tokens);

to get an idea what we are dealing with. The output gives us important information:

StackPointer: 1
Array
(
    [0] => Array
        (
            [content] => <?php
            [type] => T_OPEN_TAG
            [code] => 368
            [line] => 1
            [column] => 1
            [level] => 0
            [conditions] => Array
                (
                )

        )

    [1] => Array           <--- the token which invoked the process() method of our sniff
        (
            [code] => 353
            [content] => class
            [type] => T_CLASS
            [line] => 2
            [scope_condition] => 1
            [scope_opener] => 5
            [scope_closer] => 7
            [column] => 1
            [level] => 0
            [conditions] => Array
                (
                )

        )

    [2] => Array
        (
            [code] => 371
            [content] =>
            [type] => T_WHITESPACE
            [line] => 2
            [column] => 6
            [level] => 0
            [conditions] => Array
                (
                )

        )

    [3] => Array           <--- the token we are looking for
        (
            [type] => T_STRING
            [code] => 307
            [content] => checkMe
            [line] => 2
            [column] => 7
            [level] => 0
            [conditions] => Array
                (
                )

        )

    [4] => Array
        (
           ...
)

The pointer points to index 1, whose type entry is the token type we registered for (T_CLASS).
But that’s not the class name we are looking for!
We have to move the pointer to the token containing the class name and grab it from the content entry.

Of course we could increase the pointer by 2 because there is exactly one whitespace token between
the invoking token and the class name. This would work for this file.
But what if there is more whitespace, or if the class looks like this?

class /*I am a class comment*/ checkMe

Our static strategy to simply add 2 to the invoking token index would fail.

Luckily the class PHP_CodeSniffer_File offers the method
findNext($tokenType, $stackPtr) to find the next token of the given type.
As we can see in our debug output, our class name is a token of the type T_STRING.
So we can use the method to look for the next T_STRING and can therefore skip all
whitespace and comments easily.

$className = $phpcsFile->findNext(T_STRING, $stackPtr);
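
The idea behind findNext() can be sketched in a few lines of plain PHP. findNextToken() below is a hypothetical stand-in written against PHP's built-in tokenizer, not CodeSniffer's actual method. It only shows why searching by token type copes with the comment example above, where a fixed offset of 2 would not.

```php
<?php
// findNextToken() is a hypothetical stand-in for findNext(), written against
// PHP's built-in tokenizer for illustration only.

/**
 * Returns the index of the first token of type $type at or after $start,
 * or false if there is none.
 */
function findNextToken(array $tokens, $type, $start)
{
    for ($i = $start; $i < count($tokens); $i++) {
        if (is_array($tokens[$i]) === true && $tokens[$i][0] === $type) {
            return $i;
        }
    }

    return false;
}

$tokens = token_get_all("<?php\nclass /*I am a class comment*/ checkMe\n{\n}\n");

// The T_CLASS token sits at index 1, right after the open tag.
$namePtr = findNextToken($tokens, T_STRING, 1);
echo $tokens[$namePtr][1], "\n";
```

Here the class name ends up at index 5 instead of 3, and the search still finds it.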

By the way: when coding sniffs it is a good idea to look at the methods CodeSniffer
already offers. Read the DocBlocks; they contain many hints.
After we have found the class name it is easy to analyse it. Here we use a preg_match()
pattern that checks for a leading uppercase letter and call the addError() method to let CodeSniffer
do its complaining.
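
The capital-letter test can be tried in isolation. This small sketch reuses the pattern from the sniff; startsWithCapital() is a hypothetical helper name that does not appear in CodeSniffer.

```php
<?php
// startsWithCapital() is a hypothetical helper mirroring the sniff's test.
function startsWithCapital($name)
{
    // preg_match() returns 1 on a match and 0 when the pattern does not match.
    return preg_match('|^[A-Z]|', $name) === 1;
}

var_dump(startsWithCapital('checkMe')); // bool(false)
var_dump(startsWithCapital('CheckMe')); // bool(true)
```

Note that the pattern only accepts the ASCII letters A to Z, so a class name starting with an underscore or a digit is reported as well.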
When we write other sniffs and want to distinguish between errors and warnings,
there is also a method addWarning().
When we now run phpcs with our standard against checkMe.php, CodeSniffer automatically adds the line number, the kind of error and our output text to the report. Our sniff is ready.
Of course CodeSniffer can check not only a single file but also directories with subfolders.

Now you should be able to write your own sniffs. At least I hope you get the idea.
Sometimes it can get a little tricky to get the tokens you are interested in,
but I’m sure you will find a way.

Feel free to add notes to this article using the comment function.
Depending on the public interest I am willing to do a follow-up with a use case
from our project in progress. At least I hope I could catch your interest in building a custom coding standard. It’s only magic until you have tried to do it yourself. ;)

Best regards,
Daniel Schlichtholz

Comments

9 responses to "Creating coding standards for PHP_CodeSniffer"

Bas Simons

27 February 2011

Hi Daniel,

What I really enjoy in the latest version of phpcs (1.3.*) is that you can specify your own ruleset XML to alter an existing standard. For example, if you like the PEAR sniffs but want your line length to be a bit longer than 80 characters:

<?xml version="1.0"?>
<ruleset name="My PEAR">

    <rule ref="PEAR"/>

    <rule ref="Generic.Files.LineLength">
        <properties>
            <property name="lineLimit" value="120"/>
        </properties>
    </rule>

</ruleset>

You can then call phpcs like this:

phpcs --standard=/path/to/custom_ruleset.xml /path/to/code

1. Daniel Schlichtholz
  
  27 February 2011
  
  Thanks for this addition.
  Interested readers can read more here: http://pear.php.net/manual/en/package.php.php-codesniffer.annotated-ruleset.php
  
2. Yatin
  
  6 November 2012
  
  Really nice. I tried this myself and ran into a problem. I included PEAR and the KernighanRitchie sniff as in your explanation, and I excluded the BSDAllmanSniff. So I should have the PEAR standard with a changed brace convention. But I still get the error that the braces should be on a new line, so the ExcludedSniffs do not seem to be honoured by CodeSniffer. Is there some trick I didn’t get? Regards
  
  1. DSB
    
    6 November 2012
    
    Hi, without seeing the code you used, no one can say why your excluded sniffs are not taken into account. I know that pointing to the documentation is no real help at first sight, but in this case I suggest you take a look at the examples in the documentation located here: https://pear.php.net/manual/en/package.php.php-codesniffer.annotated-ruleset.php
    
Amit

19 November 2012

Suppose I want to create a rule to find a particular string, say "hello_world", in the source code. Which token do I have to register, and how do I find "hello_world" throughout my source code?

1. Daniel Schlichtholz
  
  21 December 2012
  
  If you are looking for a string, the token type T_STRING is what you are looking for.
  
Kalaiselvan

14 May 2014

Great..:)
