Actions

Code quality guide: Difference between revisions

From LimeSurvey Manual

Line 124: Line 124:
* Randomization
* Randomization


Why does this matter? Because side-effects make testing harder! '''If you have side-effects, you cannot create unit tests''', ''unless'' those side-effects can be mocked. If not, you need integrity tests with database setup or more.  
Why does this matter? Because side-effects make testing harder! '''If you have side-effects, you cannot create unit tests''', ''unless'' those side-effects can be mocked. If not, you need integrity tests with database setup or more. The more side-effects you have in a function, the more mocking is required to unit test it.


So, one solution to this is to move side-effects higher up in the stack trace, that is, move them to the calling code.
So, one solution to this is to move side-effects higher up in the stack trace, that is, move them to the calling code.

Revision as of 01:38, 3 April 2021

DRAFT

Prologue

  1. Be risk aware. Too perfect code can be a business risk (slow to write, maybe over-designed). Too sloppy code can also be a business risk (hard to maintain and understand). You have to find a balance that is adapted to the current situation and risk analysis, in which code quality becomes a risk mitigation technique.
  2. Be humble. LimeSurvey was made by developers from all over the world, with different age, education and experience. Your code might be read by a completely different team ten years from now, in the same way you are now reading code by developers that no longer work with us, but which work pays our rent.
  3. Performance matters sometimes, and shouldn't be disregarded completely. In particular, database queries using the ORM and ActiveRecord can be problematic. Some surveys have thousands of questions and hundreds of thousands of responses. Fast response time is also important for a fluid user experience.
  4. It's harder to read code than to write code. Don't choose patterns that are easy or fast to write, but that are easy to read.
  5. It's easy to forget cross-cutting concerns like translation and security. Keep a mental note.
  6. Stress affects code quality. If your stress level is too high to write code with proper quality, take a step back and discuss it with your boss.

Quality

What is quality? Which aspects of quality can be measured?

It's usually easier to get along what is "bad" code than what's "best" code. Blacklist instead of whitelist?

Quality attributes related to code quality:

  • Readability
  • Testability
  • Maintainability
  • Performance
  • Security

It's better to avoid emotional language when describing code quality, like "clean" or "dirty". Be precise.

Idiomatic code is more readable than non-idiomatic code. What's idiomatic depends on which context or domain you work in. We work in PHP and web application development, and have other idioms than in, say, functional or hardware-close programming.

Some code are "hot spots" - changed often. Those parts should require higher quality than other code.

Risk

todo

Modules

There's no inherit module functionality in PHP, but Yii has a module system.

Classes

todo

Different categories of classes:

  • Builders
  • Fetchers
  • Data-transfer object
  • ...

Injection, dependencies

Functions

Functions are one of the fundamental building blocks of programming, so it's important to get it right.

Functions that are part of a class are called "methods", if you want to be formal.

Different types of functions in PHP:

  • Functions (not attached to a class)
  • Methods (attached to class)
  • Anonymous functions (does not capture scope automatically)
  • Short functions (does capture scope automatically)

I'm using "function" as a word for all types of functions in PHP below.


Size

  • A function should be short enough to be completely readable on one screen, approx 60 lines
  • A function should not have too many arguments; if you have five or more, consider making it a class instead
  • If a function is growing too big, it might be better to create a class instead or split it into several functions (if both functions have common state, a class might be better, unless you want to pass around explicit state)

Contract

  • A function has a contract with its surrounding environment:
    • Pre-condition: What needs to be true before the function is executed
    • Post-condition: What will be true after the function has returned
  • The function contract can be checked with assertions (internal invariants) or exceptions (outside world); see below
  • The first line in the docblock should be one sentence describing the relation between function arguments and its output
    Example
    /**
     * Creates a random password of length $length (defaults to 12)
     * @param int $length
     * @return string
     */
    function createPassword($lenght = 12)
    
    /**
     * Creates an HTML language drop down for survey with id $surveyId, marking $selectedLanguage as selected
     * @param int $surveyId
     * @param string $selectedLanguage
     * @return string <select> HTML
     */
    function createLanguageDropdown($surveyId, $selectedLanguage)
    

Push side-effects up in the stack trace

This one is a bit abstract.

First, what is a side-effect? In short, it's anything that happens in a function that is not described by the relation between function arguments and output.

Another way to phrase it is, that a side-effect is anything that is not always the same for the given input to a function; a function without any side-effects will always return the same result if it gets the same input.

Examples:

// gT accesses the file system and will return different result depending on what's in the translation files
function gT($sToTranslate, $sEscapeMode = 'html', $sLanguage = null);

// If $sText and $sEscapeMode is always the same, quoteText() will always return the same string.
function quoteText($sText, $sEscapeMode = 'html');

A function with no side-effects is called pure. A function that is guaranteed to always return (no exceptions, no errors, no warnings, no division by zero) is called total. Another name for purity is referential transparency.

Examples of side-effects:

  • Database or file access
  • Echo, or write to stdout or stderr
  • Randomization

Why does this matter? Because side-effects make testing harder! If you have side-effects, you cannot create unit tests, unless those side-effects can be mocked. If not, you need integrity tests with database setup or more. The more side-effects you have in a function, the more mocking is required to unit test it.

So, one solution to this is to move side-effects higher up in the stack trace, that is, move them to the calling code.

Example:

function getSurveyInfo(int $surveyid, string $languagecode = '', boolean $force = false): array;

Since this function takes $surveyid, we know it will probably query the database. Quering the database is a side-effect, and thus it requires a database scaffold for the test. One simple solution is to life the side-effect out and pass in a Survey object instead:

function getSurveyInfo(Survey $survey, string $languagecode = '', boolean $force = false): array;

Much better! Now we can either mock a Survey object, or make an imitating stub.

Reading the function body, there is in fact another database access, but since this is a relation to the Survey model, it could easily be moved out too. We've just made a function testable! Hooray!

Function name

  • The function name should tell you what it does; docs should tell you why (if this information is not obvious). How is described by the code itself (hopefully, but can in some cases be relevant to document).
  • Function name can often be verb + noun, like "createFieldmap" or "quoteText"
  • Prefix with "is" for boolean functions

todo: difference between create, get, fetch, generate, ...

Good examples:

  • gT() - An exception to descriptive function names; since this is used everywhere, the minimal name is motivated
  • createPassword() - One verb, one noun - clear
  • isCaptchaEnabled() - Using "is" signifies it returns boolean; clear purpose
  • getFullResponseTable() - Verb, adjective, noun - but is "full" needed here? Is there a non-full version of the same function?
  • replaceExpressionCodes() - Pretty good, but it's obvious that more information is needed in the docblock
  • fixCKeditorText() - "fix" is a bit vague, but maybe best there is? Pray that docblock contains information!

Bad examples:

  • languageDropdown() - No verb, doesn't tell you what it does
  • rmdirr() - Two r's, looks like a typo, but the last "r" actually means "recursive"!
  • CSVUnquote() - Is CSV really the namespace? Why is the verb the second word?
  • incompleteAnsFilterState() - One verb, but in the middle, or is "filter state" a noun?; why abbreviate "answer" to "ans"?
  • reverseTranslateFieldNames() - Two verbs - did author mean "reverseTranslatedFieldNames"? Or "reverse and translate"? If so, it should be two functions, one for reverse and one for translate

Assertions

Assertions can be used for mental checks. They are only used for internal logic - failed assertions are not supposed to be seen by the end user. For interaction with the outside world, like database or browser, exceptions should be used instead.

Assertions can be disabled in production mode and have then no runtime cost.

Example:

/**
 * Translation helper function
 *
 * @param string $sToTranslate
 * @param string $sEscapeMode Valid values are html (this is the default, js and unescaped)
 * @param string $sLanguage
 * @return string
 */
function gT($sToTranslate, $sEscapeMode = 'html', $sLanguage = null)
{
    assert(is_string($sToTranslate));
    assert($sEscapeMode === 'html' || $sEscapeMode === 'js' || $sEscapeMode === 'unescaped');
    assert(strlen($sLanguage) === 2 || $sLanguage === null);
    return quoteText(Yii::t('', $sToTranslate, array(), null, $sLanguage), $sEscapeMode);
}

Exceptions

todo

If-statements

If-statements should not be too long or hard to understand. If they are, consider factor out the checks into separate functions.

Variables

Descriptive naming.

  • Variables or ActiveRecord classes can have same name as the class, like "$survey" or "$question"
  • Add -s for plural when it's an array: "$surveys", "$questions"

Not so good:

$result = LabelSet::model()->findAll();

Better with model name and -s for plural:

$labelSets = LabelSet::model()->findAll();

Looping

The bodies of foreach-loops are often a good candidate to factor out into separate functions.

Example:

foreach ($questions as $question) {
  processQuestion($question);
}

Loop indexing should be named i, j, k etc.

Example:

for ($i = 0; $i < $length; $i++) {
    for ($j = 0; $j < $otherLength; $j++) {
        // ...
    }
}

Cases

/**
* This function generates an array containing the fieldcode, and matching data in the same order as the activate script
*
* @param Survey $survey Survey ActiveRecord model
* @param string $style 'short' (default) or 'full' - full creates extra information like default values
* @param boolean $force_refresh - Forces to really refresh the array, not just take the session copy
* @param bool|int $questionid Limit to a certain qid only (for question preview) - default is false
* @param string $sLanguage The language to use
* @param array $aDuplicateQIDs
* @return array
*/
function createFieldMap($survey, $style = 'short', $force_refresh = false, $questionid = false, $sLanguage = '', &$aDuplicateQIDs = array())

Good:

  • Name is verb and noun
  • Each argument is documented (almost)

Bad:

  • Returns "array", which can be anything. A separate class for "Fieldmap" would be better.
  • Not clear if it should be "createFieldMap" or "createFieldmap" - is "fieldmap" one word or two?
  • Very many boolean arguments. Should probably be a class with methods like $fieldmapCreator->setForceRefresh(true); instead.
  • Too long.
  • Doesn't elaborate on what "fieldmap" really is, and does not link to any documentation about it.
  • Access super-global, which makes it hard to test. Should instead return a data-transfer object, like:
    $_SESSION['fieldmap'] = createFieldMap(...);
    
  • Has no test
  • Mixes Hungarian notation with non-Hungarian notation in the argument list

Defensive programming and error handling

todo

Documentation

todo

OOP design

todo

Object reification.

Hierarchy depth.

The PHP of yesterday

Old idioms and habits that should be abandoned.

Assoc arrays vs DTO. No performance overhead of objects compared to arrays.

Objects are already passed by reference; arrays are value types

Security

todo

XSS

Database injection

Permissions and roles

Performance

todo

Testing

todo

QA

PHPUnit

TDD

Mocks, can't mock functions, static methods, mocking as a DSL, PHPUnit mocking tools

Pure functions, referential transparency, side-effects

Unit, functional, integration

Don't test the framework; don't test PHP

Static analysis

php -l, CodeSniffer, Mess Detector, Psalm

Resources

todo

Books, links, videos