Skip to content

Attempt to make SPEC 12 complete and unambiguous#1

Merged
tupui merged 6 commits intotupui:spec_12from
mdhaber:spec_12
Aug 11, 2024
Merged

Attempt to make SPEC 12 complete and unambiguous#1
tupui merged 6 commits intotupui:spec_12from
mdhaber:spec_12

Conversation

@mdhaber
Copy link
Copy Markdown

@mdhaber mdhaber commented Jul 14, 2024

I think the only differences with the examples you have shown are:

dfdx = sign*(-2*x + 2*y + 2)

becomes

dfdx = sign * (-2*x + 2*y + 2)

and

c_i1j = (
    1./n**2.
    *np.prod(
        0.5*(2. + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1
    )
)

becomes

c_i1j = (1./n**2. * np.prod(0.5 * (2. + abs(z_ij[i1, :]) 
                                   + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1))

(although I haven't said anything that would contradict Black's rules about parentheses and line breaks, I don't think, so there is some flexibility to include the surrounding parentheses on separate lines.)

Copy link
Copy Markdown
Author

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some self review.

Please also look for examples in which the rules are ambiguous or lead to unfortunate formatting!

Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md
)
An "explicit" expression is a code expression enclosed within parentheses or
otherwise syntactically separated from other expressions (i.e. by code other
than operators, whitespace, literals, or variables). For example, in the list
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this list is complete.

Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md
Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md
`i + 1` are explicit subexpressions of the expression `range(1, i + 1)`. `i` and
`1` are "implicit" subexpressions of `i + 1`: they could be written as explicit
subexpressions `(i)` and `(1)` without affecting the order of operations, but they
are not explicit as written.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this concept might still need refinement to make the rules unambiguous, but let's see if others find ambiguities before adding unnecessarily.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure about how detailed this should be as I would imagine it could depend on how formatter operate internally.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To some extent, I intended to write this to communicate the rules with the developers of Black and ruff. If it does not need to be as precise for readers, the formal definitions can be moved to the postscript for the interested reader.

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
@mdhaber
Copy link
Copy Markdown
Author

mdhaber commented Jul 14, 2024

Should probably also add a table of all operators with priority level. I forgot about several of them before looking it up. Update: done.

Copy link
Copy Markdown
Owner

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matt!

Comment thread spec-0012/index.md
`i + 1` are explicit subexpressions of the expression `range(1, i + 1)`. `i` and
`1` are "implicit" subexpressions of `i + 1`: they could be written as explicit
subexpressions `(i)` and `(1)` without affecting the order of operations, but they
are not explicit as written.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure about how detailed this should be as I would imagine it could depend on how formatter operate internally.

Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md Outdated
Comment on lines +80 to +82
SPEC endeavor to use them where practical. These rules are intended to respect and
complement the PEP8 rules (relevant sections includes [id20](https://www.python.org/dev/peps/pep-0008/#id20)
and [id28](https://www.python.org/dev/peps/pep-0008/#id28)).
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is missing from this intro is if the rules are to be applied sequentially or if they are independent.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither? All are supposed to be satisfied simultaneously.

Comment thread spec-0012/index.md
i.e., do not add extraneous parentheses. For example, prefer `u**v + y**z`
over `(u**v) + (y**z)`, and prefer `x + y + z` over `(x + y) + z`.
1. Always use the `**` operator and unary `+`, `-`, and `~` operators *without*
surrounding whitespace. For example, prefer `-x**4` over `- (x ** 4)`.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer: I think we should be more direct, these are rules and there should be no ambiguity. And you have rule 10 to break things if needed.

With such document, the norm is to use something like the RFC-2119 https://datatracker.ietf.org/doc/html/rfc2119

Here and bellow with things like "should". This makes it too ambiguous to me as well. There should be one and only one was to do things. That's the goal of this doc, that it can be used by any formatter and it would give the same output consistently.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change "prefer... over" to "use... instead of"?

I read RFC-2119 to mean that "should" is the prefered use because these are not absolute requirements. There are exceptions as defined by rule 10.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucascolley what do you think on this point?

Comment thread spec-0012/index.md
horizontal bar. For example, prefer `z = t/v * x/y` over `z = t / v * x / y`
if this would be written mathematically as the product of two fractions,
e.g. $\frac{t}{v} \cdot \frac{x}{y}.
5. Considering the previous rules, only `**`, `*`, `/`, and the unary `+`, `-`, and `~`
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the previous rules

Linked to my comment saying that we should say if rules are to be applied in a specific order or not.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't get the "only". To me it looks like all operators are listed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the distinction between unary and binary +/- and the existence of many other operators.

Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md
Comment on lines +112 to +114
- The expressions `--x` and `-~x` would be implicit subexpressions without spaces
containing more than one unary operator. The former can be simplified to `+x` or
simply `x`, and the latter requires explicit parentheses, i.e. `-(~x)`.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave this for now and see what formatting folks tell us about the feasibility of such thing. Like if we go down that route, why not also recommend some operations vs other for precision issue? All that could seem a bit too far for a formatter though.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Note that these are all just examples of the implications of the two simple rules above that I thought were worth pointing out specifically.

Comment thread spec-0012/index.md
Comment on lines +140 to +141
(t
+ (w + (x + (y + z)))))
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could clarify that we don't specify how parenthesis should be handled and it's only about where to break on mathematical operations.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread spec-0012/index.md Outdated
* z)
```
If there are multiple candidates, include the break at the first opportunity.
9. The packages or subpackages from which mathematical functions and constants are used
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is out of scope to me. Also linter/formatters have different rules for imports and typically would not allow to import * and would complain about shadowing, etc.

In general this is covered by the famous Namespaces are one honking great idea -- let's do more of those!. And long live import scipy as sp (which is getting loads of upvote on SO 😜 )

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a last minute addition. I thought it might be worth broadening the scope a bit in an effort to make mathematical code more clear. Can we see what others think?

Note that it doesn't specifically disallow from _ import * or specify how namespaces should be used in general. It only requires that the namespace be unambiguous (if there are mathematical functions from multiple packages used in the same file) and show one way of achieving that as an example.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this guidance is useful, however it does seem to be a bit of a digression from the main topic of the SPEC. Could it become a separate proposal, or something like an appendix?

Comment thread spec-0012/index.md
`i + 1` are explicit subexpressions of the expression `range(1, i + 1)`. `i` and
`1` are "implicit" subexpressions of `i + 1`: they could be written as explicit
subexpressions `(i)` and `(1)` without affecting the order of operations, but they
are not explicit as written.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To some extent, I intended to write this to communicate the rules with the developers of Black and ruff. If it does not need to be as precise for readers, the formal definitions can be moved to the postscript for the interested reader.

Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md
i.e., do not add extraneous parentheses. For example, prefer `u**v + y**z`
over `(u**v) + (y**z)`, and prefer `x + y + z` over `(x + y) + z`.
1. Always use the `**` operator and unary `+`, `-`, and `~` operators *without*
surrounding whitespace. For example, prefer `-x**4` over `- (x ** 4)`.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change "prefer... over" to "use... instead of"?

I read RFC-2119 to mean that "should" is the prefered use because these are not absolute requirements. There are exceptions as defined by rule 10.

Comment thread spec-0012/index.md
horizontal bar. For example, prefer `z = t/v * x/y` over `z = t / v * x / y`
if this would be written mathematically as the product of two fractions,
e.g. $\frac{t}{v} \cdot \frac{x}{y}.
5. Considering the previous rules, only `**`, `*`, `/`, and the unary `+`, `-`, and `~`
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the distinction between unary and binary +/- and the existence of many other operators.

Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md Outdated
Comment thread spec-0012/index.md
Comment on lines +112 to +114
- The expressions `--x` and `-~x` would be implicit subexpressions without spaces
containing more than one unary operator. The former can be simplified to `+x` or
simply `x`, and the latter requires explicit parentheses, i.e. `-(~x)`.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Note that these are all just examples of the implications of the two simple rules above that I thought were worth pointing out specifically.

Comment thread spec-0012/index.md
Comment thread spec-0012/index.md Outdated
* z)
```
If there are multiple candidates, include the break at the first opportunity.
9. The packages or subpackages from which mathematical functions and constants are used
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a last minute addition. I thought it might be worth broadening the scope a bit in an effort to make mathematical code more clear. Can we see what others think?

Note that it doesn't specifically disallow from _ import * or specify how namespaces should be used in general. It only requires that the namespace be unambiguous (if there are mathematical functions from multiple packages used in the same file) and show one way of achieving that as an example.

@mdhaber
Copy link
Copy Markdown
Author

mdhaber commented Aug 11, 2024

@lucascolley I saw your post on the forum and I thought I'd draw your attention to this. It looks like you'll have some different opinions based on what you wrote - since we're trying to come up with a complete specification, please think about how they fit into the whole.

Copy link
Copy Markdown

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM! I haven't checked that I would agree with every possible instance of these rules, but I would certainly be happy with them being implemented, modulo my inline disagreement.

Comment thread spec-0012/index.md
operator due to the presence of the lower-priority addition operator. However,
this would lead to `t**v*x**y` being an implicit subexpression without spaces
containing more than one `**` operator. This code would be executed as
`(t**v)*(x**y) + z`, but the explicit parentheses should be included for clarity.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread spec-0012/index.md
4. Typically, surround MD operators with whitespace, except in the following situations.
- When there are lower-priority operators (namely AS) within the same compound
expression. For example, prefer `z = -x * y**t` over `z = -x*y**t`, but
prefer `z = w + x*y**t` over `z = w + x * y**t` due to the presence of the
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As below, I think there should be parentheses here to distinguish between * and **:

z = w + x*(y**t)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how would the rules format w * x * y**z + u ** v * q?

I think my preference is:

(w * x * y**z) + (u**v * q)

But I would accept:

(w*x*(y**z)) + ((u**v)*q)

Copy link
Copy Markdown
Author

@mdhaber mdhaber Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the presence of the lower priority operator + in the expression means that there will be no space around the multiplication operators. Also, ** is supposed to appear at the end of "implicit subexpression without spaces". So it would be w*x*y**z + q*u**v.

This is intended to be reminiscent of mathematics in which we would write
$wxy^z + qu^v$.

All operators with priority lower than PEMDAS operators need parentheses. But I didn't require parentheses in expressions that involve, say, multiplication and addition because I think people are familiar with the order of operations of such operators from elementary school and it is natural to them to omit the parentheses . I posit that it is just as natural to write w*x + y*z instead of (w*x) + (y*z) as it is to write w**x * y**z instead of (w**x) * (y**z). Similarly, I think putting ** at the end of subexpressions without spaces is enough to distinguish it visually from *.

The rules would be a bit simpler to express, actually, if we did require those parentheses like you want. But I'm not sure if that's very common in real code. I had ChatGPT write me a script to extract long math operations from the codebase.. maybe I'll resurrect that to see.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about w*x*(y**z) + q*(u**v)? I would be happy with that.

Copy link
Copy Markdown
Author

@mdhaber mdhaber Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a stronger rationale than what makes one of us happy to change that : )

Would you really write $qu^v$ as $q(u^v)$?

And you'll say no, but the superscript stands out so it's readable. Then I say ok, well I think the ** at the end stands out so it's readable.

Is what you're suggesting much more common in code already? That would be a good reason to change.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's more common, but indeed the problem for me is that it doesn't stand out. Especially for longer expressions such as

a + b*c*d*e**f

so we just disagree on

I think putting ** at the end of subexpressions without spaces is enough to distinguish it visually from *.

But yeah, I'm happy to be outvoted on this one. Just for the record!

Copy link
Copy Markdown
Author

@mdhaber mdhaber Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yeah, I think that's what we need is more opinions. Maybe we could add a limit on the number of chained operators in subexpressions without spaces, because I agree that b*c*d*e**f is not great style. But then maybe that's one of those cases where the last rule says - here, we can break a rule that has stopped making sense in this extreme situation.

Comment thread spec-0012/index.md Outdated
* z)
```
If there are multiple candidates, include the break at the first opportunity.
9. The packages or subpackages from which mathematical functions and constants are used
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this guidance is useful, however it does seem to be a bit of a digression from the main topic of the SPEC. Could it become a separate proposal, or something like an appendix?

@mdhaber
Copy link
Copy Markdown
Author

mdhaber commented Aug 11, 2024

@tupui I can take out the import guidance. Perhaps there can be a separate SPEC about that.

Are there other things here that we don't need other opinions about before changing?

@tupui
Copy link
Copy Markdown
Owner

tupui commented Aug 11, 2024

On my side this can go in if you're both happy. Next step then to me would be to make another wide announcement and give some time for feedback. At that point I would also ping Charlie and Lukas for their inputs since in the end they are the ones who will probably have to put this in Ruff and Black.

@mdhaber
Copy link
Copy Markdown
Author

mdhaber commented Aug 11, 2024

Let me make the fixes I've indicated here before merge. But I would start with Charlie and Lukas to see if these are precise enough to be implementable before announcing.

@tupui
Copy link
Copy Markdown
Owner

tupui commented Aug 11, 2024

Sounds good 👍

@mdhaber
Copy link
Copy Markdown
Author

mdhaber commented Aug 11, 2024

OK, I committed the inline comments and removed Rule 9 about imports. Let's see whether the Ruff/Black developers think these are close to being implementable.

@tupui
Copy link
Copy Markdown
Owner

tupui commented Aug 11, 2024

Ok thanks 🙏 merging and will ping them on the real MR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants