Attempt to make SPEC 12 complete and unambiguous #1
Changes from all commits
f46a93e
5f4b20a
34aa825
63d45e6
f2a96a6
efa2ea8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,122 +4,182 @@ number: 12 | |
| date: 2024-06-06 | ||
| author: | ||
| - "Pamphile Roy <roy.pamphile@gmail.com>" | ||
| - "Matt Haberland <mhaberla@calpoly.edu>" | ||
| discussion: https://discuss.scientific-python.org/t/spec-12-formatting-mathematical-expressions | ||
| endorsed-by: | ||
| --- | ||
|
|
||
| ## Description | ||
|
|
||
| It is known that the PEP8 and other established styling documents are missing | ||
| guidelines about mathematical expressions. This leads to people coming with | ||
| their own interpretation and style. Standardizing the way we represent maths | ||
| would lead to the same benefits seen with "normal" code. It brings consistency | ||
| in the ecosystem improving the collaborative efforts. | ||
| [PEP 8](https://peps.python.org/pep-0008) | ||
| and other established styling documents do not include guidelines about | ||
| styling mathematical expressions. This leads to individual interpretation and | ||
| styles which may conflict with those of others. We seek to standardize the | ||
| way we represent mathematics for the same reason we standardize other code: | ||
| it brings consistency to the ecosystem and allows collaborators to focus on | ||
| more important aspects of the code. | ||
|
|
||
| This SPEC standardizes the formatting of mathematical expressions. | ||
|
|
||
| ## Implementation | ||
|
|
||
| The following rules must be followed. | ||
| These rules respect and complement the PEP8 (relevant sections includes | ||
| [id20](https://www.python.org/dev/peps/pep-0008/#id20) and | ||
| [id20](https://www.python.org/dev/peps/pep-0008/#id28)). | ||
|
|
||
| We define a _group_ as a collection of operators having the same priority. | ||
| e.g. `a + b + c` is a single group, `a + b * c` is composed of two groups `a` | ||
| and `b * c`. A group is also a collection delimited with parenthesis. | ||
| `(a + b * c)` is a group. And the whole expression by itself is a | ||
| group. | ||
|
|
||
| - There a space before and after `-` and `+`. Except if | ||
| the operator is used to define the sign of the number; | ||
|
|
||
| ``` | ||
| a + b | ||
| -a | ||
| ``` | ||
|
|
||
| - Within a group, if operators with different priorities are used, add whitespace around the operators with the lowest priority(ies). | ||
|
|
||
| ``` | ||
| a + b*c | ||
| ``` | ||
|
|
||
| - There is no space before and after `**`. | ||
|
|
||
| ``` | ||
| a**b | ||
| ``` | ||
|
|
||
| - There is no space before and after operators `*` and `/`. Only exception is if the expression consist of a single operator linking two groups with more than one | ||
| element. | ||
| ## Terminology | ||
|
|
||
| ``` | ||
| a*b | ||
| (a*b) * (c*d) | ||
| ``` | ||
|
|
||
| - Operators within a group are ordered from the lowest to the highest priority. | ||
| If this is technically an issue (e.g. restriction on the AST), add | ||
| parenthesis or spaces. | ||
|
|
||
| ``` | ||
| a/d*b**c | ||
| a*(b**c)/d | ||
| a*b**c / d | ||
| a * b**c / d | ||
| ``` | ||
|
|
||
| - When splitting an equation, new lines should start with the operator linking | ||
| the previous and next logical block. Single digit on a line are forbidden. | ||
|
|
||
| ``` | ||
| ( | ||
| a/b | ||
| + c*d | ||
| ) | ||
| ``` | ||
|
|
||
| ### Examples | ||
|
|
||
| ```python | ||
| # good | ||
| i = i + 1 | ||
| submitted += 1 | ||
| x = x*2 - 1 | ||
| hypot2 = x*x + y*y | ||
| c = (a + b) * (a - b) | ||
| dfdx = sign*(-2*x + 2*y + 2) | ||
| result = 2*x**2 + 3*x**(2/3) | ||
| y = 4*x**2 + 2*x + 1 | ||
| c_i1j = ( | ||
| 1./n**2. | ||
| *np.prod( | ||
| 0.5*(2. + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1 | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| ```python | ||
| # bad | ||
| i = i + 1 | ||
| submitted += 1 | ||
| x = x * 2 - 1 | ||
| hypot2 = x * x + y * y | ||
| c = (a + b) * (a - b) | ||
| dfdx = sign * (-2 * x + 2 * y + 2) | ||
| result = 2 * x ** 2 + 3 * x ** (2 / 3) | ||
| y = 4 * x ** 2 + 2 * x + 1 | ||
| c_i1j = ( | ||
| 1.0 | ||
| / n ** 2.0 | ||
| * np.prod( | ||
| 0.5 * (2.0 + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1 | ||
| ) | ||
| ) | ||
| An "explicit" expression is a code expression enclosed within parentheses or | ||
| otherwise syntactically separated from other expressions (i.e. by code other | ||
| than operators, whitespace, literals, or variables). For example, in the list | ||
| comprehension: | ||
| ```python3 | ||
| [j for j in range(1, i + 1)] | ||
| ``` | ||
| The output expression `j` is one explicit expression and the input sequence | ||
| `range(1, i + 1)` is another. | ||
|
|
||
| A "subexpression" is a subset of an expression that is either explicit or could | ||
| be made explicit (i.e. with parentheses) without affecting the order of | ||
| operations. In the example above, `j` and `range(1, i + 1)` can also be | ||
| referred to as explicit subexpressions of the whole expression, and `1` and | ||
| `i + 1` are explicit subexpressions of the expression `range(1, i + 1)`. `i` and | ||
| `1` are "implicit" subexpressions of `i + 1`: they could be written as explicit | ||
| subexpressions `(i)` and `(1)` without affecting the order of operations, but they | ||
| are not explicit as written. | ||
|
Author
I think this concept might still need refinement to make the rules unambiguous, but let's see if others find ambiguities before adding unnecessarily.
Owner
I was not sure about how detailed this should be, as I would imagine it could depend on how formatters operate internally.
Author
To some extent, I intended to write this to communicate the rules with the developers of Black and ruff. If it does not need to be as precise for readers, the formal definitions can be moved to the postscript for the interested reader. |
||
|
|
||
| As another example, in `x + y*z`, `y*z` is a subexpression because it could be made | ||
| explicit as in `x + (y*z)` without changing the order of operations. However, `x + y` | ||
| would not be a subexpression because `(x + y)*z` would change the order of operations. | ||
| Note that `x + y*z` as a whole may also be referred to as a "subexpression" rather than | ||
| an "expression" even though `(x + y*z)` is not a proper subset of the whole. | ||
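The subexpression test can be checked mechanically. The following is a minimal sketch, not part of the SPEC: Python's `ast` module does not record redundant parentheses, so two spellings parse to the same tree exactly when the added parentheses leave the order of operations unchanged. The helper name `same_order` is hypothetical.

```python
import ast


def same_order(a, b):
    """Return True if expressions `a` and `b` parse to the same syntax tree."""
    tree_a = ast.dump(ast.parse(a, mode="eval"))
    tree_b = ast.dump(ast.parse(b, mode="eval"))
    return tree_a == tree_b


# `y*z` can be made explicit without changing the order of operations,
# so it is a subexpression of `x + y*z`.
print(same_order("x + y*z", "x + (y*z)"))  # True

# Grouping `x + y` changes the order of operations,
# so `x + y` is not a subexpression of `x + y*z`.
print(same_order("x + y*z", "(x + y)*z"))  # False
```

The same check confirms that `i` and `1` are implicit subexpressions of `i + 1`, since `(i) + (1)` parses to the same tree.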
|
|
||
| A "simple" expression is an expression involving only one operator priority level | ||
| without considering the operators within explicit subexpressions. | ||
| A "compound" expression is an expression involving more than one operator | ||
| priority level without considering the contents of explicit subexpressions. | ||
| For example, | ||
| - `x + y - z` is a simple expression because `+` and `-` have the | ||
| same priority level. There are no explicit subexpressions to be ignored. | ||
| - `x * (y + z)` is also a simple expression because there is only one operator | ||
| between `x` and the explicit subexpression `(y + z)`; we ignore the contents - and | ||
| especially the operator - within the explicit subexpression; conceptually, it may | ||
| be regarded as `(...)`. | ||
| - `x * y + z` is a compound expression; there are two operators and no explicit | ||
| subexpressions that can be ignored. | ||
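The simple/compound distinction could be approximated with the standard `tokenize` module. This is a rough sketch under stated assumptions, not a reference implementation: the function name `classify` and the `PRIORITY` table are hypothetical, only a handful of binary operators are covered, unary operators are ignored, and anything inside brackets is treated as the contents of an explicit subexpression.

```python
import io
import tokenize

# Assumed priority levels for a few binary operators (higher binds tighter).
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2, "//": 2, "%": 2, "@": 2, "**": 3}


def classify(source):
    """Classify a one-line arithmetic expression as "simple" or "compound"."""
    depth = 0              # current bracket nesting depth
    levels = set()         # priority levels seen outside explicit subexpressions
    after_operand = False  # True if the previous token could end an operand
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            continue
        if tok.type != tokenize.OP:
            after_operand = True  # a name, number, or string
        elif tok.string in ("(", "[", "{"):
            depth += 1
            after_operand = False
        elif tok.string in (")", "]", "}"):
            depth -= 1
            after_operand = True
        else:
            # An operator that follows an operand is binary; otherwise it is
            # unary and is skipped in this sketch.
            if depth == 0 and after_operand and tok.string in PRIORITY:
                levels.add(PRIORITY[tok.string])
            after_operand = False
    return "compound" if len(levels) > 1 else "simple"


for expr in ("x + y - z", "x * (y + z)", "x * y + z"):
    print(expr, "->", classify(expr))
```

Counting only operators at bracket depth zero mirrors the "without considering the contents of explicit subexpressions" clause in the definitions above.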
|
|
||
| The acronym PEMDAS commonly refers to "parentheses", "exponentiation", "multiplication", | ||
| "division", "addition", and "subtraction". Herein, we will consider these operators | ||
| to be "PEMDAS operators", and we will also include the unary `+`, `-`, and `~` in | ||
| this category for convenience. The order of operations of PEMDAS operators is typically | ||
| taught in primary school and reinforced throughout a programmer's training and | ||
| experience, so it is assumed that most programmers are comfortable relying on the | ||
| implicit order of operations of expressions involving a few PEMDAS operations. Implicit | ||
| order of operations becomes less obvious as the number of distinct operator priority | ||
| levels increases and when multiple non-PEMDAS operators are involved. Portions of this | ||
| acronym, namely MD and AS, will be used below to refer to the corresponding operators. | ||
|
|
||
| ## Notes | ||
| ## Implementation | ||
|
|
||
| These formatting rules do not make any consideration in terms of performances | ||
| nor precision. The scope is limited to styling. | ||
| These rules are intended to respect and | ||
| complement the [PEP 8 standards](https://peps.python.org/pep-0008), such as using | ||
| [implied line continuation](https://peps.python.org/pep-0008/#maximum-line-length) and | ||
| [breaking lines before binary operators](https://peps.python.org/pep-0008/#should-a-line-break-before-or-after-a-binary-operator). | ||
|
mdhaber marked this conversation as resolved.
|
||
| Although examples do not show the use of hanging indent, any of the indentation styles | ||
| allowed by [PEP 8 Indentation](https://peps.python.org/pep-0008/#indentation) are | ||
| permitted by this SPEC. | ||
|
|
||
| 0. Unless otherwise specified, rely on the implicit order of operations; | ||
| i.e., do not add extraneous parentheses. For example, prefer `u**v + y**z` | ||
| over `(u**v) + (y**z)`, and prefer `x + y + z` over `(x + y) + z`. A full | ||
| list of implicit operator priority levels is given by | ||
| [Operator Precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence). | ||
| 1. Always use the `**` operator and unary `+`, `-`, and `~` operators *without* | ||
| surrounding whitespace. For example, prefer `-x**4` over `- (x ** 4)`. | ||
|
Owner
With such a document, the norm is to use something like RFC 2119 (https://datatracker.ietf.org/doc/html/rfc2119), here and below, for words like "should". As written, this is too ambiguous to me as well. There should be one and only one way to do things. That's the goal of this doc: that it can be used by any formatter and give the same output consistently.
Author
We can change "prefer... over" to "use... instead of"? I read RFC-2119 to mean that "should" is the preferred use because these are not absolute requirements. There are exceptions as defined by rule 10.
Author
@lucascolley what do you think on this point? |
||
| 2. Always surround non-PEMDAS operators with whitespace, and always make the priority of | ||
| non-PEMDAS operators explicit. For example, prefer `(x == y) or (w == t)` over | ||
| `x==y or w==t`.[^1] | ||
| 3. Always surround AS operators with whitespace. | ||
| 4. Typically, surround MD operators with whitespace, except in the following situations. | ||
| - When there are lower-priority operators (namely AS) within the same compound | ||
| expression. For example, prefer `z = -x * y**t` over `z = -x*y**t`, but | ||
| prefer `z = w + x*y**t` over `z = w + x * y**t` due to the presence of the | ||
|
As below, I think there should be parentheses here to distinguish between
Also, how would the rules format I think my preference is:
But I would accept:
Author
Here, the presence of the lower priority operator This is intended to be reminiscent of mathematics in which we would write All operators with priority lower than PEMDAS operators need parentheses. But I didn't require parentheses in expressions that involve, say, multiplication and addition because I think people are familiar with the order of operations of such operators from elementary school and it is natural to them to omit the parentheses. I posit that it is just as natural to write The rules would be a bit simpler to express, actually, if we did require those parentheses like you want. But I'm not sure if that's very common in real code. I had ChatGPT write me a script to extract long math operations from the codebase.. maybe I'll resurrect that to see.
What about
Author
We need a stronger rationale than what makes one of us happy to change that : ) Would you really write And you'll say no, but the superscript stands out so it's readable. Then I say ok, well I think the Is what you're suggesting much more common in code already? That would be a good reason to change.
I'm not sure what's more common, but indeed the problem for me is that it doesn't stand out. Especially for longer expressions such as
so we just disagree on
But yeah, I'm happy to be outvoted on this one. Just for the record!
Author
And yeah, I think that's what we need is more opinions. Maybe we could add a limit on the number of chained operators in subexpressions without spaces, because I agree that |
||
| lower-priority addition operator. | ||
| - The division operation would be written mathematically as a fraction with a | ||
| horizontal bar. For example, prefer `z = t/v * x/y` over `z = t / v * x / y` | ||
| if this would be written mathematically as the product of two fractions, | ||
| e.g. $\frac{t}{v} \cdot \frac{x}{y}$. | ||
| 5. Considering the previous rules, only `**`, `*`, `/`, and the unary `+`, `-`, and `~` | ||
|
Owner
Linked to my comment saying that we should say if rules are to be applied in a specific order or not.
Owner
I also don't get the "only". To me it looks like all operators are listed.
Author
Note the distinction between unary and binary |
||
| operators can appear in implicit subexpressions without spaces. In such expressions, | ||
| - Use at most one unary operator, and if used, ensure that it is the leftmost operator. | ||
| - Use at most one `**` operator, and if used, ensure that it is the rightmost operator. | ||
|
|
||
| To achieve these goals, simplification or the addition of parentheses may be required. | ||
| For example: | ||
| - The expressions `--x` and `-~x` would be implicit subexpressions without spaces | ||
| containing more than one unary operator. The former can be simplified to `+x` or | ||
| simply `x`, and the latter requires explicit parentheses, i.e. `-(~x)`. | ||
|
mdhaber marked this conversation as resolved.
Comment on lines
+113
to
+115
Owner
We can leave this for now and see what formatting folks tell us about the feasibility of such a thing. If we go down that route, why not also recommend some operations over others for precision reasons? All that could seem a bit too far for a formatter though.
Author
Sounds good. Note that these are all just examples of the implications of the two simple rules above that I thought were worth pointing out specifically. |
||
| - The expression `x**y**z` would be an implicit subexpression without spaces | ||
| containing more than one `**` operator. This code would be executed as `x**(y**z)` | ||
| following the implicit order, but the explicit parentheses should be included for | ||
| clarity. | ||
| - In the expression `t**v*x**y + z`, no spaces are used around the multiplication | ||
| operator due to the presence of the lower-priority addition operator. However, | ||
| this would lead to `t**v*x**y` being an implicit subexpression without spaces | ||
| containing more than one `**` operator. This code would be executed as | ||
| `(t**v)*(x**y) + z`, but the explicit parentheses should be included for clarity. | ||
|
👍 |
||
| - In the expression `z + x**y/w`, no spaces are used around the division operator | ||
| due to the presence of the lower-priority addition operator. However, this would | ||
| lead to `x**y/w` being an implicit subexpression without spaces containing `**` | ||
| to the left of another operator. Options for refactoring include the addition of | ||
| parentheses (e.g. `z + (x**y)/w`) or pre-multiplying the exponential by a | ||
| fraction (i.e. `z + 1/w*x**y`). | ||
| 6. Simplify combinations of unary and binary `+` and `-` operators when possible. | ||
| For example, | ||
| - prefer `x + y` over `x + +y`, | ||
| - prefer `x + y` over `x - -y`, | ||
| - prefer `x - y` over `x - +y`, and | ||
| - prefer `x - y` over `x + -y`. | ||
| 7. If required to satisfy other style requirements, include line breaks before | ||
| the outermost explicit subexpression possible. For example, if | ||
| `t + (w + (x + (y + z)))` must be broken, prefer | ||
| ```python3 | ||
| (t | ||
| + (w + (x + (y + z)))) | ||
|
Comment on lines
+141
to
+142
Owner
We could clarify that we don't specify how parentheses should be handled and that it's only about where to break on mathematical operations.
Author
See #1 (comment) |
||
| ``` | ||
| over | ||
| ```python3 | ||
| (t + (w + (x + (y | ||
| + z)))) | ||
| ``` | ||
| If there are multiple candidates, include the break at the first opportunity. | ||
| 8. If line breaks must occur within a compound subexpression, the break should | ||
| be placed before the operator with lowest priority. For example, if | ||
| `(x + y*z)` must be broken, prefer | ||
| ```python3 | ||
| (x | ||
| + y*z) | ||
| ``` | ||
| over | ||
| ```python3 | ||
| (x + y | ||
| * z) | ||
| ``` | ||
| If there are multiple candidates, include the break at the first opportunity. | ||
| 9. Any of the preceding rules may be broken if there is a clear reason to do so. | ||
| - *Conflict with other style rules*. For example, there is not supposed to be | ||
| whitespace surrounding the `**` operator, but one can imagine a chain of `**` | ||
| operations that exhausts the character limit of a line. | ||
| - *Domain knowledge*. For instance, in the expression | ||
| `t = (x + y) - z`, it may be important to emphasize that the addition should be | ||
| performed first for numerical reasons or because `(x + y)` is a conceptually | ||
| important quantity. In such cases, consider adding a comment, e.g. | ||
| ```python3 | ||
| t = (x + y) - z # perform `x + y` first for precision | ||
| ``` | ||
| or breaking the expressions into separate logical lines, e.g. | ||
| ```python3 | ||
| w = x + y | ||
| t = w - z | ||
| ``` | ||
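Several of the rules above state how an expression "would be executed". As a hedged illustration, not part of the SPEC, the sketch below uses the standard `ast` module to confirm that the spellings named in rules 1, 2, 5, and 7 parse to the same tree as their explicitly parenthesized or line-broken forms. The helper name `same_order` and the `pairs` list are hypothetical.

```python
import ast


def same_order(a, b):
    """Return True if expressions `a` and `b` parse to the same syntax tree."""
    tree_a = ast.dump(ast.parse(a, mode="eval"))
    tree_b = ast.dump(ast.parse(b, mode="eval"))
    return tree_a == tree_b


# Each pair holds two spellings that the rules treat as equivalent in meaning.
pairs = [
    ("-x**4", "-(x**4)"),                           # rule 1
    ("x == y or w == t", "(x == y) or (w == t)"),   # rule 2
    ("x**y**z", "x**(y**z)"),                       # rule 5
    ("t**v*x**y + z", "(t**v)*(x**y) + z"),         # rule 5
    ("z + x**y/w", "z + (x**y)/w"),                 # rule 5
    ("t + (w + (x + (y + z)))",
     "(t\n + (w + (x + (y + z))))"),                # rule 7, with a line break
]

for plain, explicit in pairs:
    print(same_order(plain, explicit), "|", plain)
```

A check of this kind only establishes that a restyling preserves the parse; it says nothing about which spelling the SPEC prefers, and refactorings such as `z + 1/w*x**y` in rule 5 intentionally produce a different tree.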
|
|
||
| [^1]: There is a case for simply eliminating spaces to reinforce the implicit order | ||
| of operations, as in `x==y or w==t`. However, if this were the rule, following | ||
| the rule would require users to remember the full order of operations hierarchy | ||
| and apply it without mistakes. Use of explicit parentheses with non-PEMDAS | ||
| operators leads to simpler rules, is more explicit, and is not uncommon in | ||
| existing code. | ||
I'm not sure if this list is complete.