diff --git a/spec-0012/index.md b/spec-0012/index.md index 0db0b00b..c4697513 100644 --- a/spec-0012/index.md +++ b/spec-0012/index.md @@ -4,122 +4,182 @@ number: 12 date: 2024-06-06 author: - "Pamphile Roy " + - "Matt Haberland " discussion: https://discuss.scientific-python.org/t/spec-12-formatting-mathematical-expressions endorsed-by: --- ## Description -It is known that the PEP8 and other established styling documents are missing -guidelines about mathematical expressions. This leads to people coming with -their own interpretation and style. Standardizing the way we represent maths -would lead to the same benefits seen with "normal" code. It brings consistency -in the ecosystem improving the collaborative efforts. +[PEP 8](https://peps.python.org/pep-0008) +and other established styling documents do not include guidelines about +styling mathematical expressions. This leads to individual interpretation and +styles which may conflict with those of others. We seek to standardizing the +way we represent mathematics for the same reason we standardize other code: +it brings consistency to the ecosystem and allows collaborators to focus on +more important aspects of the code. This SPEC standardize the formatting of mathematical expressions. -## Implementation - -The following rules must be followed. -These rules respect and complement the PEP8 (relevant sections includes -[id20](https://www.python.org/dev/peps/pep-0008/#id20) and -[id20](https://www.python.org/dev/peps/pep-0008/#id28)). - -We define a _group_ as a collection of operators having the same priority. -e.g. `a + b + c` is a single group, `a + b * c` is composed of two groups `a` -and `b * c`. A group is also a collection delimited with parenthesis. -`(a + b * c)` is a group. And the whole expression by itself is a -group. - -- There a space before and after `-` and `+`. Except if - the operator is used to define the sign of the number; - - ``` - a + b - -a - ``` - -- Within a group, if operators with different priorities are used, add whitespace around the operators with the lowest priority(ies). - - ``` - a + b*c - ``` - -- There is no space before and after `**`. - - ``` - a**b - ``` - -- There is no space before and after operators `*` and `/`. Only exception is if the expression consist of a single operator linking two groups with more than one - element. +## Terminology - ``` - a*b - (a*b) * (c*d) - ``` - -- Operators within a group are ordered from the lowest to the highest priority. - If this is technically an issue (e.g. restriction on the AST), add - parenthesis or spaces. - - ``` - a/d*b**c - a*(b**c)/d - a*b**c / d - a * b**c / d - ``` - -- When splitting an equation, new lines should start with the operator linking - the previous and next logical block. Single digit on a line are forbidden. - - ``` - ( - a/b - + c*d - ) - ``` - -### Examples - -```python -# good -i = i + 1 -submitted += 1 -x = x*2 - 1 -hypot2 = x*x + y*y -c = (a + b) * (a - b) -dfdx = sign*(-2*x + 2*y + 2) -result = 2*x**2 + 3*x**(2/3) -y = 4*x**2 + 2*x + 1 -c_i1j = ( - 1./n**2. - *np.prod( - 0.5*(2. + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1 - ) -) -``` - -```python -# bad -i = i + 1 -submitted += 1 -x = x * 2 - 1 -hypot2 = x * x + y * y -c = (a + b) * (a - b) -dfdx = sign * (-2 * x + 2 * y + 2) -result = 2 * x ** 2 + 3 * x ** (2 / 3) -y = 4 * x ** 2 + 2 * x + 1 -c_i1j = ( - 1.0 - / n ** 2.0 - * np.prod( - 0.5 * (2.0 + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1 - ) -) +An "explicit" expression is a code expression enclosed within parentheses or +otherwise syntactically separated from other expressions (i.e. by code other +than operators, whitespace, literals, or variables). For example, in the list +comprehension: +```python3 +for j in range(1, i + 1) ``` +The output expression `j` is one explicit expression and the input sequence +`range(1, i + 1)` is another. + +A "subexpression" is subset of an expression that is either explicit or could +be made explicit (i.e. with parentheses) without affecting the order of +operations. In the example above, `j` and `range(1, i + 1)` can also be +referred to as explicit subexpressions of the whole expression, and `1` and +`i + 1` are explicit subexpressions of the expression `range(1, i + 1)`. `i` and +`1` are "implicit" subexpressions of `i + 1`: they could be written as explicit +subexpressions `(i)` and `(1)` without affecting the order of operations, but they +are not explicit as written. + +As another example, in `x + y*z`, `y*z` is a subexpression because it could be made +explicit as in `x + (y*z)` without changing the order of operations. However, `x + y` +would not be a subexpression because `(x + y)*z` would change the order of operations. +Note that `x + y*z` as a whole may also be referred to as a "subexpression" rather than +an "expression" even though `(x + y*z)` is not a proper subset of the whole. + +A "simple" expression is an expression involving only one operator priority level +without considering the operators within explicit subexpressions. +A "compound" expression is an expression involving more than one operator +priority level without considering the contents of explicit subexpressions. +For example, +- `x + y - z` is a simple expression because `+` and `-` have the +same priority level. There are no explicit subexpressions to be ignored. +- `x * (y + z)` is also a simple expression because there is only one operator +between `x` and the explicit subexpression `(y + z)`; we ignore the contents - and +especially the operator - within the explicit subexpression; conceptually, it may +regarded as `(...)`. +- `x * y + z` is a compound expression; there are two operators and no explicit +subexpressions that can be ignored. + +The acronym PEMDAS commonly refers to "parentheses", "exponentiation", "multiplication", +"division", "addition", and "subtraction". Herein, we will consider these operators +to be "PEMDAS operators", and we will also include the unary `+`, `-`, and `~` in +this category for convenience. The order of operations of PEMDAS operators is typically +taught in primary school and reinforced throughout a programmer's training and +experience, so it is assumed that most programmers are comfortable relying on the +implicit order of operations of expressions involving a few PEMDAS operations. Implicit +order of operations becomes less obvious as the number of distinct operator priority +levels increases and when multiple non-PEMDAS operators are involved. Portions of this +acronym, namely MD and AS, will be used below to refer to the corresponding operators. -## Notes +## Implementation -These formatting rules do not make any consideration in terms of performances -nor precision. The scope is limited to styling. +These rules are intended to respect and +complement the [PEP 8 standards](https://peps.python.org/pep-0008), such as using +[implied line continuation](https://peps.python.org/pep-0008/#maximum-line-length) and +and [breaking lines before binary operators](https://peps.python.org/pep-0008/#should-a-line-break-before-or-after-a-binary-operator). +Although examples do not show the use of hanging indent, any of the indentation styles +allowed by [PEP 8 Indentation](https://peps.python.org/pep-0008/#indentation) are +permitted by this SPEC. + +0. Unless otherwise specified, rely on the implicit order of operations; + i.e., do not add extraneous parentheses. For example, prefer `u**v + y**z` + over `(u**v) + (y**z)`, and prefer `x + y + z` over `(x + y) + z`. A full + list of implicit operator priority levels is given by + [Operator Precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence) +1. Always use the `**` operator and unary `+`, `-`, and `~` operators *without* + surrounding whitespace. For example, prefer `-x**4` over `- (x ** 4)`. +2. Always surround non-PEMDAS operators with whitespace, and always make the priority of + non-PEMDAS operators explicit. For example, prefer `(x == y) or (w == t)` over + `x==y or w==t`.[^1] +3. Always surround AS operators with whitespace. +4. Typically, surround MD operators with whitespace, except in the following situations. + - When there are lower-priority operators (namely AS) within the same compound + expression. For example, prefer `z = -x * y**t` over `z = -x*y**t`, but + prefer `z = w + x*y**t` over `z = w + x * y**t` due to the presence of the + lower-priority addition operator. + - The division operation would be written mathematically as a fraction with a + horizontal bar. For example, prefer `z = t/v * x/y` over `z = t / v * x / y` + if this would be written mathematically as the product of two fractions, + e.g. $\frac{t}{v} \cdot \frac{x}{y}. +5. Considering the previous rules, only `**`, `*`, `/`, and the unary `+`, `-`, and `~` + operators can appear in implicit subexpressions without spaces. In such expressions, + - Use at most one unary operator, and if used, ensure that it is the leftmost operator. + - Use at most one `**` operator, and if used, ensure that it is the rightmost operator. + + To achieve these goals, simplification or the addition of parentheses may be required. + For example: + - The expressions `--x` and `-~x` would be implicit subexpressions without spaces + containing more than one unary operator. The former can be simplified to `+x` or + simply `x`, and the latter requires explicit parentheses, i.e. `-(~x)`. + - The expression `x**y**z` would be an implicit subexpression without spaces + containing more than one `**` operator. This code would be executed as `x**(y**z)` + following the implicit order, but the explicit parentheses should be included for + clarity. + - In the expression `t**v*x**y + z`, no spaces are used around the multiplication + operator due to the presence of the lower-priority addition operator. However, + this would lead to `t**v*x**y` being an implicit subexpression without spaces + containing more than one `**` operator. This code would be executed as + `(t**v)*(x**y) + z`, but the explicit parentheses should be included for clarity. + - In the expression `z + x**y/w`, no spaces are used around the division operator + due to the presence of the lower-priority addition operator. However, this would + lead to `x**y/w` being an implicit subexpression without spaces containing `**` + to the left of another operator. Options for refactoring include the addition of + parentheses (e.g. `z + (x**y)/w`) or pre-multiplying the exponential by a + fraction (i.e. `x + 1/w*x**y`). +6. Simplify combinations of unary and binary `+` and `-` operators when possible. + For example, + - prefer `x + y` over `x + +y`, + - prefer `x + y` over `x - -y`, + - prefer `x - y` over `x - +y`, and + - prefer `x - y` over `x + -y`. +7. If required to satisfy other style requirements, include line breaks before + the outermost explicit subexpression possible. For example, if + `t + (w + (x + (y + z))))` must be broken, prefer + ```python3 + (t + + (w + (x + (y + z))))) + ``` + over + ```python3 + (t + (w + (x + (y + + z))))) + ``` + If there are multiple candidates, include the break at the first opportunity. +8. If line breaks must occur within a compound subexpression, the break should + be placed before the operator with lowest priority. For example, if + (x + y*z) must be broken, prefer + ```python3 + (x + + y*z) + ``` + over + ```python3 + (x + y + * z) + ``` + If there are multiple candidates, include the break at the first opportunity. +9. Any of the preceeding rules may be broken if there is a clear reason to do so. + - *Conflict with other style rules*. For example, there is not supposed to be + whitepace surrounding the `**` operator, but one can imagine a chain of `**` + operations that exhausts the character limit of a line. + - *Domain knowledge*. For instance, in the expression + `t = (x + y) - z`, it may be important to emphasize that the addition should be + performed first for numerical reasons or because `(x + y)` is a conceptually + important quantity. In such cases, consider adding a comment, e.g. + ```python3 + t = (x + y) - z # perform `x + y` first for precision + ``` + or breaking the expressions into separate logical lines, e.g. + ```python3 + w = x + y + t = w - z + ``` + +[^1]: There is a case for simply eliminating spaces to reinforce the implicit order + of operations, as in `x==y or w==t`. However, if this were the rule, following + the rule would require users to remember the full order of operations hierarchy + and apply it without mistakes. Use of explicit parentheses with non-PEMDAS + operators leads to simpler rules, is more explicit, and is not uncommon in + existing code.