Neovim: dev_style.txt

dev_style.txt          Nvim


                            NVIM REFERENCE MANUAL


Nvim style guide                                        dev-style

Style guidelines for developers working Nvim's source code.

License: CC-By 3.0 https://creativecommons.org/licenses/by/3.0/

                                      Type gO to see the table of contents.

==============================================================================
Background

One way in which we keep the code base manageable is by enforcing consistency.
It is very important that any programmer be able to look at another's code and
quickly understand it.

Maintaining a uniform style and following conventions means that we can more
easily use "pattern-matching" to infer what various symbols are and what
invariants are true about them. Creating common, required idioms and patterns
makes code much easier to understand.

In some cases there might be good arguments for changing certain style rules,
but we nonetheless keep things as they are in order to preserve consistency.


==============================================================================
Header Files                                            dev-style-header


Header guard 

All header files should start with `#pragma once` to prevent multiple inclusion.

    In foo/bar.h:
>c
        #pragma once
<

Headers system 

Nvim uses two types of headers. There are "normal" headers and "defs" headers.
Typically, each normal header will have a corresponding defs header, e.g.
fileio.h and fileio_defs.h. This distinction is done to minimize
recompilation on change. The reason for this is because adding a function or
modifying a function's signature happens more frequently than changing a type
The goal is to achieve the following:

- All headers (defs and normal) must include only defs headers, system
  headers, and generated declarations. In other words, headers must not
  include normal headers.

- Source (.c) files may include all headers, but should only include normal
  headers if they need symbols and not types.

Use the following guideline to determine what to put where:

Symbols:
  - regular function declarations
  - extern variables (including the EXTERN macro)

Non-symbols:
  - macros, i.e. #define
  - static inline functions with the FUNC_ATTR_ALWAYS_INLINE attribute
  - typedefs
  - structs
  - enums

- All symbols must be moved to normal headers.

- Non-symbols used by multiple headers should be moved to defs headers. This
  is to ensure headers only include defs headers. Conversely, non-symbols used
  by only a single header should be moved to that header.

- EXCEPTION: if the macro calls a function, then it must be moved to a normal
  header.

==============================================================================
Scoping                                                 dev-style-scope

Local Variables 

Place a function's variables in the narrowest scope possible, and initialize
variables in the declaration.

C99 allows you to declare variables anywhere in a function. Declare them in as
local a scope as possible, and as close to the first use as possible. This
makes it easier for the reader to find the declaration and see what type the
variable is and what it was initialized to. In particular, initialization
should be used instead of declaration and assignment, e.g. >c

    int i;
    i = f();      // ❌: initialization separate from declaration.

    int j = g();  // ✅: declaration has initialization.


Initialization 

Multiple declarations can be defined in one line if they aren't initialized,
but each initialization should be done on a separate line.

>c
    int i;
    int j;              // ✅
    int i, j;           // ✅: multiple declarations, no initialization.
    int i = 0;
    int j = 0;          // ✅: one initialization per line.

    int i = 0, j;       // ❌: multiple declarations with initialization.
    int i = 0, j = 0;   // ❌: multiple declarations with initialization.

==============================================================================
Nvim-Specific Magic

clint 

Use clint.py to detect style errors.

src/clint.py is a Python script that reads a source file and identifies
style errors. It is not perfect, and has both false positives and false
negatives, but it is still a valuable tool. False positives can be ignored by
putting `// NOLINT` at the end of the line.

uncrustify 

src/uncrustify.cfg is the authority for expected code formatting, for cases
not covered by clint.py.  We remove checks in clint.py if they are covered by
uncrustify rules.

==============================================================================
Other C Features                                        dev-style-features


Variable-Length Arrays and alloca() 

We do not allow variable-length arrays or alloca().

Variable-length arrays can cause hard to detect stack overflows.


Postincrement and Postdecrement 

Use postfix form (`i++`) in statements. >c

    for (int i = 0; i < 3; i++) { }
    int j = ++i;  // ✅: ++i is used as an expression.

    for (int i = 0; i < 3; ++i) { }
    ++i;  // ❌: ++i is used as a statement.


Use of const 

Use const pointers whenever possible. Avoid const on non-pointer parameter definitions.

    Where to put the const 

    Some people favor the form `int const *foo` to `const int *foo` . They
    argue that this is more readable because it's more consistent: it keeps
    the rule that const always follows the object it's describing. However,
    this consistency argument doesn't apply in codebases with few
    deeply-nested pointer expressions since most const expressions have only
    one const, and it applies to the underlying value. In such cases, there's
    no consistency to maintain. Putting the const first is arguably more
    readable, since it follows English in putting the "adjective" (`const`)
    before the "noun" (`int`).

    That said, while we encourage putting const first, we do not require it.
    But be consistent with the code around you! >c

    void foo(const char *p, int i);
    }

    int foo(const int a, const bool b) {
    }

    int foo(int *const p) {
    }


Integer Types 

Of the built-in integer types only use char, int, uint8_t, int8_t,
uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t,
uintmax_t, intmax_t, size_t, ssize_t, uintptr_t, intptr_t, and
ptrdiff_t.

Use int for error codes and local, trivial variables only.

Use care when converting integer types. Integer conversions and promotions can
cause non-intuitive behavior. Note that the signedness of char is
implementation defined.

Public facing types must have fixed width (`uint8_t`, etc.)

There are no convenient printf format placeholders for fixed width types.
Cast to uintmax_t or intmax_t if you have to format fixed width integers.

Type		unsigned    signed
char  		%hhu  	    %hhd
int  		n/a	    %d
(u)intmax_t  	%ju  	    %jd
(s)size_t  	%zu  	    %zd
ptrdiff_t  	%tu  	    %td


Booleans 

Use bool to represent boolean values. >c

    int loaded = 1;  // ❌: loaded should have type bool.


Conditions 

Don't use "yoda-conditions". Use at most one assignment per condition. >c

    if (1 == x) {

    if (x == 1) {  //use this order

    if ((x = f()) && (y = g())) {


Function declarations 

Every function must not have a separate declaration.

Function declarations are created by the gen_declarations.lua script. >c

    static void f(void);

    static void f(void)
    {
      ...
    }


General translation unit layout 

The definitions of public functions precede the definitions of static
functions. >c

    <HEADER>

    <PUBLIC FUNCTION DEFINITIONS>

    <STATIC FUNCTION DEFINITIONS>


Integration with declarations generator 

Every C file must contain #include of the generated header file, guarded by
#ifdef INCLUDE_GENERATED_DECLARATIONS.

Include must go after other #includes and typedefs in .c files and after
everything else in header files. It is allowed to omit #include in a .c file
if .c file does not contain any static functions.

Included file name consists of the .c file name without extension, preceded by
the directory name relative to src/nvim. Name of the file containing static
functions declarations ends with .c.generated.h, *.h.generated.h files
contain only non-static function declarations. >c

    // src/nvim/foo.c file
    #include <stddef.h>

    typedef int FooType;

    #ifdef INCLUDE_GENERATED_DECLARATIONS
    # include "foo.c.generated.h"
    #endif

    …


    // src/nvim/foo.h file
    #pragma once

    …

    #ifdef INCLUDE_GENERATED_DECLARATIONS
    # include "foo.h.generated.h"
    #endif


64-bit Portability 

Code should be 64-bit and 32-bit friendly. Bear in mind problems of printing,
comparisons, and structure alignment.

- Remember that `sizeof(void *)` != sizeof(int). Use intptr_t if you want
  a pointer-sized integer.

- You may need to be careful with structure alignments, particularly for
  structures being stored on disk. Any class/structure with a
  int64_t/uint64_t member will by default end up being 8-byte aligned on a
  64-bit system. If you have such structures being shared on disk between
  32-bit and 64-bit code, you will need to ensure that they are packed the
  same on both architectures. Most compilers offer a way to alter structure
  alignment. For gcc, you can use __attribute__((packed)). MSVC offers
  `#pragma pack()` and __declspec(align()).

- Use the LL or ULL suffixes as needed to create 64-bit constants. For
  example: >c

    int64_t my_value = 0x123456789LL;
    uint64_t my_mask = 3ULL << 48;


sizeof 

Prefer sizeof(varname) to sizeof(type).

Use sizeof(varname) when you take the size of a particular variable.
sizeof(varname) will update appropriately if someone changes the variable
type either now or later. You may use sizeof(type) for code unrelated to any
particular variable, such as code that manages an external or internal data
format where a variable of an appropriate C type is not convenient. >c

    Struct data;
    memset(&data, 0, sizeof(data));

    memset(&data, 0, sizeof(Struct));

    if (raw_size < sizeof(int)) {
      fprintf(stderr, "compressed record not big enough for count: %ju", raw_size);
      return false;
    }


==============================================================================
Naming                                                  dev-style-naming

The most important consistency rules are those that govern naming. The style
of a name immediately informs us what sort of thing the named entity is: a
type, a variable, a function, a constant, a macro, etc., without requiring us
to search for the declaration of that entity. The pattern-matching engine in
our brains relies a great deal on these naming rules.

Naming rules are pretty arbitrary, but we feel that consistency is more
important than individual preferences in this area, so regardless of whether
you find them sensible or not, the rules are the rules.


General Naming Rules 

Function names, variable names, and filenames should be descriptive; eschew
abbreviation.

Give as descriptive a name as possible, within reason. Do not worry about
saving horizontal space as it is far more important to make your code
immediately understandable by a new reader. Do not use abbreviations that are
ambiguous or unfamiliar to readers outside your project, and do not abbreviate
by deleting letters within a word. >c

    int price_count_reader;    // No abbreviation.
    int num_errors;            // "num" is a widespread convention.
    int num_dns_connections;   // Most people know what "DNS" stands for.

    int n;                     // Meaningless.
    int nerr;                  // Ambiguous abbreviation.
    int n_comp_conns;          // Ambiguous abbreviation.
    int wgc_connections;       // Only your group knows what this stands for.
    int pc_reader;             // Lots of things can be abbreviated "pc".
    int cstmr_id;              // Deletes internal letters.


File Names 

Filenames should be all lowercase and can include underscores (`_`).

Use underscores to separate words. Examples of acceptable file names: 

    my_useful_file.c
    getline_fix.c  // ✅: getline refers to the glibc function.

C files should end in .c and header files should end in .h.

Do not use filenames that already exist in /usr/include, such as db.h.

In general, make your filenames very specific. For example, use
http_server_logs.h rather than logs.h.


Type Names 

Typedef-ed structs and enums start with a capital letter and have a capital
letter for each new word, with no underscores: MyExcitingStruct.

Non-Typedef-ed structs and enums are all lowercase with underscores between
words: `struct my_exciting_struct` . >c

    struct my_struct {
      ...
    };
    typedef struct my_struct MyAwesomeStruct;


Variable Names 

Variable names are all lowercase, with underscores between words. For
instance: my_exciting_local_variable.

    Common Variable names 

    For example: >c

        string table_name;  // ✅: uses underscore.
        string tablename;   // ✅: all lowercase.

        string tableName;   // ❌: mixed case.
<

    Struct Variables 

    Data members in structs should be named like regular variables. >c

        struct url_table_properties {
          string name;
          int num_entries;
        }
<

    Global Variables 

    Don't use global variables unless absolutely necessary. Prefix global
    variables with g_.


Constant Names 

Use a k followed by mixed case: kDaysInAWeek.

All compile-time constants, whether they are declared locally or globally,
follow a slightly different naming convention from other variables. Use a k
followed by words with uppercase first letters: >c

    const int kDaysInAWeek = 7;

Function Names 

Function names are all lowercase, with underscores between words. For
instance: my_exceptional_function(). All functions in the same header file
should have a common prefix.

In os_unix.h: >c

    void unix_open(const char *path);
    void unix_user_id(void);

If your function crashes upon an error, you should append or_die to the
function name. This only applies to functions which could be used by
production code and to errors that are reasonably likely to occur during
normal operation.


Enumerator Names 

Enumerators should be named like constants: kEnumName. >c

    enum url_table_errors {
      kOK = 0,
      kErrorOutOfMemory,
      kErrorMalformedInput,
    };


Macro Names 

They're like this: MY_MACRO_THAT_SCARES_CPP_DEVELOPERS. >c

    #define ROUND(x) ...
    #define PI_ROUNDED 5.0


==============================================================================
Comments                                                dev-style-comments

Comments are vital to keeping our code readable. The following rules describe
what you should comment and where. But remember: while comments are very
important, the best code is self-documenting.

When writing your comments, write for your audience: the next contributor who
will need to understand your code. Be generous — the next one may be you!

Nvim uses Doxygen comments.


Comment Style 

Use the //-style syntax only. >c

    // This is a comment spanning
    // multiple lines
    f();


File Comments 

Start each file with a description of its contents.

    Legal Notice 

    We have no such thing. These things are in LICENSE and only there.

    File Contents 

    Every file should have a comment at the top describing its contents.

    Generally a .h file will describe the variables and functions that are
    declared in the file with an overview of what they are for and how they
    are used. A .c file should contain more information about implementation
    details or discussions of tricky algorithms. If you feel the
    implementation details or a discussion of the algorithms would be useful
    for someone reading the .h, feel free to put it there instead, but
    mention in the .c that the documentation is in the .h file.

    Do not duplicate comments in both the .h and the .c. Duplicated
    comments diverge. >c

    /// A brief description of this file.
    ///
    /// A longer description of this file.
    /// Be very generous here.


Struct Comments 

Every struct definition should have accompanying comments that describes what
it is for and how it should be used. >c

    /// Window info stored with a buffer.
    ///
    /// Two types of info are kept for a buffer which are associated with a
    /// specific window:
    /// 1. Each window can have a different line number associated with a
    /// buffer.
    /// 2. The window-local options for a buffer work in a similar way.
    /// The window-info is kept in a list at g_wininfo.  It is kept in
    /// most-recently-used order.
    struct win_info {
      /// Next entry or NULL for last entry.
      WinInfo *wi_next;
      /// Previous entry or NULL for first entry.
      WinInfo *wi_prev;
      /// Pointer to window that did the wi_fpos.
      Win *wi_win;
      ...
    };

If the field comments are short, you can also put them next to the field. But
be consistent within one struct, and follow the necessary doxygen style. >c

    struct wininfo_S {
      WinInfo *wi_next;  ///< Next entry or NULL for last entry.
      WinInfo *wi_prev;  ///< Previous entry or NULL for first entry.
      Win *wi_win;       ///< Pointer to window that did the wi_fpos.
      ...
    };

If you have already described a struct in detail in the comments at the top of
your file feel free to simply state "See comment at top of file for a complete
description", but be sure to have some sort of comment.

Document the synchronization assumptions the struct makes, if any. If an
instance of the struct can be accessed by multiple threads, take extra care to
document the rules and invariants surrounding multithreaded use.


Function Comments 

Declaration comments describe use of the function; comments at the definition
of a function describe operation.

    Function Declarations 

    Every function declaration should have comments immediately preceding it
    that describe what the function does and how to use it. These comments
    should be descriptive ("Opens the file") rather than imperative ("Open the
    file"); the comment describes the function, it does not tell the function
    what to do. In general, these comments do not describe how the function
    performs its task. Instead, that should be left to comments in the
    function definition.

    Types of things to mention in comments at the function declaration:

    - If the function allocates memory that the caller must free.
    - Whether any of the arguments can be a null pointer.
    - If there are any performance implications of how a function is used.
    - If the function is re-entrant. What are its synchronization assumptions? >c
    /// Brief description of the function.
    ///
    /// Detailed description.
    /// May span multiple paragraphs.
    ///
    /// @param arg1 Description of arg1
    /// @param arg2 Description of arg2. May span
    ///        multiple lines.
    ///
    /// @return Description of the return value.
    Iterator *get_iterator(void *arg1, void *arg2);
<

    Function Definitions 

    If there is anything tricky about how a function does its job, the
    function definition should have an explanatory comment. For example, in
    the definition comment you might describe any coding tricks you use, give
    an overview of the steps you go through, or explain why you chose to
    implement the function in the way you did rather than using a viable
    alternative. For instance, you might mention why it must acquire a lock
    for the first half of the function but why it is not needed for the second
    half.

    Note you should not just repeat the comments given with the function
    declaration, in the .h file or wherever. It's okay to recapitulate
    briefly what the function does, but the focus of the comments should be on
    how it does it. >c

    // Note that we don't use Doxygen comments here.
    Iterator *get_iterator(void *arg1, void *arg2)
    {
      ...
    }


Variable Comments 

In general the actual name of the variable should be descriptive enough to
give a good idea of what the variable is used for. In certain cases, more
comments are required.

    Global Variables 

    All global variables should have a comment describing what they are and
    what they are used for. For example: >c

        /// The total number of tests cases that we run
        /// through in this regression test.
        const int kNumTestCases = 6;


Implementation Comments 

In your implementation you should have comments in tricky, non-obvious,
interesting, or important parts of your code.

    Line Comments 

    Also, lines that are non-obvious should get a comment at the end of the
    line. These end-of-line comments should be separated from the code by 2
    spaces. Example: >c

        // If we have enough memory, mmap the data portion too.
        mmap_budget = max<int64>(0, mmap_budget - index_->length());
        if (mmap_budget >= data_size_ && !MmapData(mmap_chunk_bytes, mlock)) {
          return;  // Error already logged.
        }
<
    Note that there are both comments that describe what the code is doing,
    and comments that mention that an error has already been logged when the
    function returns.

    If you have several comments on subsequent lines, it can often be more
    readable to line them up: >c

        do_something();                      // Comment here so the comments line up.
        do_something_else_that_is_longer();  // Comment here so there are two spaces between
                                             // the code and the comment.
        { // One space before comment when opening a new scope is allowed,
          // thus the comment lines up with the following comments and code.
          do_something_else();  // Two spaces before line comments normally.
        }
<

    NULL, true/false, 1, 2, 3... 

    When you pass in a null pointer, boolean, or literal integer values to
    functions, you should consider adding a comment about what they are, or
    make your code self-documenting by using constants. For example, compare:
    >c

        bool success = calculate_something(interesting_value,
                                           10,
                                           false,
                                           NULL);  // What are these arguments??
<

    versus: >c

        bool success = calculate_something(interesting_value,
                                           10,     // Default base value.
                                           false,  // Not the first time we're calling this.
                                           NULL);  // No callback.
<

    Or alternatively, constants or self-describing variables: >c

        const int kDefaultBaseValue = 10;
        const bool kFirstTimeCalling = false;
        Callback *null_callback = NULL;
        bool success = calculate_something(interesting_value,
                                           kDefaultBaseValue,
                                           kFirstTimeCalling,
                                           null_callback);
<

    Don'ts 

    Note that you should never describe the code itself. Assume that the
    person reading the code knows C better than you do, even though he or she
    does not know what you are trying to do: >c

        // Now go through the b array and make sure that if i occurs,
        // the next element is i+1.
        ...        // Geez.  What a useless comment.


Punctuation, Spelling and Grammar 

Pay attention to punctuation, spelling, and grammar; it is easier to read
well-written comments than badly written ones.

Comments should be as readable as narrative text, with proper capitalization
and punctuation. In many cases, complete sentences are more readable than
sentence fragments. Shorter comments, such as comments at the end of a line of
code, can sometimes be less formal, but you should be consistent with your
style.

Although it can be frustrating to have a code reviewer point out that you are
using a comma when you should be using a semicolon, it is very important that
source code maintain a high level of clarity and readability. Proper
punctuation, spelling, and grammar help with that goal.


TODO Comments 

Use TODO comments for code that is temporary, a short-term solution, or
good-enough but not perfect.

TODOs should include the string TODO in all caps, followed by the name,
email address, or other identifier of the person who can best provide context
about the problem referenced by the TODO. The main purpose is to have a
consistent TODO format that can be searched to find the person who can
provide more details upon request. A TODO is not a commitment that the
person referenced will fix the problem. Thus when you create a TODO, it is
almost always your name that is given. >c

    // TODO(kl@gmail.com): Use a "*" here for concatenation operator.
    // TODO(Zeke): change this to use relations.

If your TODO is of the form "At a future date do something" make sure that
you either include a very specific date ("Fix by November 2005") or a very
specific event ("Remove this code when all clients can handle XML
responses.").


Deprecation Comments 

Mark deprecated interface points with @deprecated docstring token.

You can mark an interface as deprecated by writing a comment containing the
word @deprecated in all caps. The comment goes either before the declaration
of the interface or on the same line as the declaration.

After @deprecated, write your name, email, or other identifier in
parentheses.

A deprecation comment must include simple, clear directions for people to fix
their callsites. In C, you can implement a deprecated function as an inline
function that calls the new interface point.

Marking an interface point DEPRECATED will not magically cause any callsites
to change. If you want people to actually stop using the deprecated facility,
you will have to fix the callsites yourself or recruit a crew to help you.

New code should not contain calls to deprecated interface points. Use the new
interface point instead. If you cannot understand the directions, find the
person who created the deprecation and ask them for help using the new
interface point.


==============================================================================
Formatting                                              dev-style-format

Coding style and formatting are pretty arbitrary, but a project is much easier
to follow if everyone uses the same style. Individuals may not agree with
every aspect of the formatting rules, and some of the rules may take some
getting used to, but it is important that all project contributors follow the
style rules so that they can all read and understand everyone's code easily.


Non-ASCII Characters 

Non-ASCII characters should be rare, and must use UTF-8 formatting.

You shouldn't hard-code user-facing text in source (OR SHOULD YOU?), even
English, so use of non-ASCII characters should be rare. However, in certain
cases it is appropriate to include such words in your code. For example, if
your code parses data files from foreign sources, it may be appropriate to
hard-code the non-ASCII string(s) used in those data files as delimiters. More
commonly, unittest code (which does not need to be localized) might contain
non-ASCII strings. In such cases, you should use UTF-8, since that is an
encoding understood by most tools able to handle more than just ASCII.

Hex encoding is also OK, and encouraged where it enhances readability — for
example, "\uFEFF", is the Unicode zero-width no-break space character, which
would be invisible if included in the source as straight UTF-8.


Braced Initializer Lists 

Format a braced list exactly like you would format a function call in its
place but with one space after the { and one space before the }

If the braced list follows a name (e.g. a type or variable name), format as if
the {} were the parentheses of a function call with that name. If there is
no name, assume a zero-length name. >c

    struct my_struct m = {  // Here, you could also break before {.
        superlongvariablename1,
        superlongvariablename2,
        { short, interior, list },
        { interiorwrappinglist,
          interiorwrappinglist2 } };


Loops and Switch Statements 

Annotate non-trivial fall-through between cases.

If not conditional on an enumerated value, switch statements should always
have a default case (in the case of an enumerated value, the compiler will
warn you if any values are not handled). If the default case should never
execute, simply use abort(): >c

    switch (var) {
      case 0:
        ...
        break;
      case 1:
        ...
        break;
      default:
        abort();
    }

Switch statements that are conditional on an enumerated value should not have
a default case if it is exhaustive. Explicit case labels are preferred over
default, even if it leads to multiple case labels for the same code. For
example, instead of: >c

    case A:
      ...
    case B:
      ...
    case C:
      ...
    default:
      ...

You should use: >c

    case A:
      ...
    case B:
      ...
    case C:
      ...
    case D:
    case E:
    case F:
      ...

Certain compilers do not recognize an exhaustive enum switch statement as
exhaustive, which causes compiler warnings when there is a return statement in
every case of a switch statement, but no catch-all return statement. To fix
these spurious errors, you are advised to use UNREACHABLE after the switch
statement to explicitly tell the compiler that the switch statement always
returns and any code after it is unreachable. For example: >c

    enum { A, B, C } var;
    ...
    switch (var) {
      case A:
        return 1;
      case B:
        return 2;
      case C:
        return 3;
    }
    UNREACHABLE;

Return Values 

Do not needlessly surround the return expression with parentheses.

Use parentheses in `return expr`; only where you would use them in `x =
expr;`. >c

    return result;
    return (some_long_condition && another_condition);

    return (value);  // You wouldn't write var = (value);
    return(result);  // return is not a function!


Horizontal Whitespace 

Use of horizontal whitespace depends on location.

    Variables 
>c
        int long_variable = 0;  // Don't align assignments.
        int i             = 1;

        struct my_struct {  // Exception: struct arrays.
          const char *boy;
          const char *girl;
          int pos;
        } my_variable[] = {
          { "Mia",       "Michael", 8  },
          { "Elizabeth", "Aiden",   10 },
          { "Emma",      "Mason",   2  },
        };
<

==============================================================================
Parting Words

The style guide is intended to make the code more readable. If you think you
must violate its rules for the sake of clarity, do it! But please add a note
to your pull request explaining your reasoning.


 vim:tw=78:ts=8:et:ft=help:norl: