Explain bounds proof failure caused by free variables in inferred/declared bounds#903
Explain bounds proof failure caused by free variables in inferred/declared bounds#903
Conversation
* Simplify logic and improve comments
…der declared bounds as well
clang/lib/Sema/SemaBounds.cpp
Outdated
| SrcRange.Dump(llvm::outs()); | ||
| #endif | ||
| ProofResult R = SrcRange.InRange(DeclaredRange, Cause, EquivExprs, Facts); | ||
| // For static bounds cast, we do not check free variables. |
There was a problem hiding this comment.
This comment is somewhat confusing since it's not clear that checking static bounds cast will always pass a nullptr EquivExprs parameter. I suggest moving this logic into the InRangeWithFreeVars function, so that in InRangeWithFreeVars, if EquivExprs is null, simply return InRange without checking any free variables.
| _Array_ptr<int> f15(unsigned num1, unsigned num2){ | ||
| _Array_ptr<int> p : byte_count(num1) = test_f13(num2); // expected-warning {{cannot prove declared bounds for 'p' are valid after initialization}} \ | ||
| _Array_ptr<int> p : byte_count(num1) = test_f13(num2); // expected-error {{it is not possible to prove that the inferred bounds of 'p' imply the declared bounds of 'p' after initialization}} \ | ||
| // expected-note {{the declared upper bounds use the value of the variable 'num1', and there is no relational information between 'num1' and any of the variables used by the inferred upper bounds}} \ |
There was a problem hiding this comment.
The notes about the bounds should be displayed before the note about lack of relational information. This will be confusing for a programmer to follow. This can be addressed in follow-up work.
clang/lib/Sema/SemaBounds.cpp
Outdated
| PartialOverlap = 0x80 // There was only partial overlap of the destination bounds with | ||
| PartialOverlap = 0x80, // There was only partial overlap of the destination bounds with | ||
| // the source bounds. | ||
| HasFreeVariables = 0x200, // Source or destination has free variables. |
There was a problem hiding this comment.
Nit: change this value to 0x100 and remove trailing comma.
… for the declared bounds (#913)
dtarditi
left a comment
There was a problem hiding this comment.
The algorithm seems to have subtle complications that I didn't realize when we first discussed the idea. The basic idea is just to look at sets of variables, but there are subtleties involving expressions using those variables. Expressions that use those variables may have relational information between each other and may appear in equivalent expression sets. This relational information is being lost when we boil things down to just sets of variables.
@kkjeer and @dopelsunce, it may be worth discussing next steps before making changes. I think this may address some of the error messages that we're seeing be generated now.
clang/lib/Sema/SemaBounds.cpp
Outdated
| FreeVariablePosition Pos2, | ||
| EquivExprSets *EquivExprs, | ||
| FreeVariableListTy &FreeVars) { | ||
| // If E1 or E2 accesses memory via poiter, we skip because we cannot |
There was a problem hiding this comment.
Typo in comment: poit->pointer.
clang/lib/Sema/SemaBounds.cpp
Outdated
| return ProofResult::Maybe; | ||
| } | ||
|
|
||
| // InRangeWithFreeVars is the same as InRange, but gathers free variables |
There was a problem hiding this comment.
I think this function needs a better comment. This function is an extension of InRange. It tries to prove that R is within this range. If R is provably not within this range, it sets Cause to explain reasons why. This function takes into account variables that are free in one of the lower/upper bounds expressions that are being compared.
InRange compares the lower bounds from the two ranges and then the upper bounds from the two ragnes. Given a pair of lower (or upper) bounds expressions, a variable is free if it appears in one expression but does not appear in the other expression, and there is no known relation between that variable and variables in the other expression. In that situation, it is impossible to prove that the comparison is true, given the facts that the compiler has.
clang/lib/Sema/SemaBounds.cpp
Outdated
| EquivExprSets *EquivExprs, | ||
| std::pair<ComparisonSet, ComparisonSet> &Facts, | ||
| FreeVariableListTy &FreeVariables) { | ||
| // If we do not have equivalence relational information (EquivExprs |
There was a problem hiding this comment.
I'd reword this comment:
If there is no relational information at all, then the compiler
didn't attempt to gather any. To avoid confusing
programmers, don't try to take free variables into account.
There may be some simple fact the compiler doesn't know that would cause
confusion about why the compiler is claiming that no fact is known.
| // CHECK: `-IntegerLiteral {{0x[0-9a-f]+}} 'int' 1 | ||
|
|
||
| _Array_ptr<int> r : count(1) = &p[0]; | ||
| _Array_ptr<int> r : count(1) = &p[0]; // expected-error {{it is not possible to prove that the inferred bounds of 'r' imply the declared bounds of 'r' after initialization}} |
There was a problem hiding this comment.
This seems unexpected to me that we can no longer prove this. We seem to be missing the fact that r equals p.
There was a problem hiding this comment.
The set EquivExprs contains { (_Array_ptr<int>)(&p[0]), r }. Without the changes in this PR, there would be a warning: "cannot prove declared bounds for 'r' are valid after initialization" since the bases p and r of the inferred bounds (p, p + 1) and declared bounds (r, r + 1) are not equal.
clang/lib/Sema/SemaBounds.cpp
Outdated
| return HasFreeVariables; | ||
| } | ||
|
|
||
| // AddFreeVariables maps each free variable in SrcVars w.r.t. DstVars |
There was a problem hiding this comment.
This comment isn't very clear. Given two sets of variables SrcVars and DstVars, for each variable v in SrcVars, it determines whether v occurs in DstVars. There are also some side conditions about v with respect to EquivExprs.
clang/lib/Sema/SemaBounds.cpp
Outdated
| return false; | ||
|
|
||
| bool HasFreeVariables = false; | ||
| EqualExprTy Vars1 = CollectVariableSet(S, E1); |
There was a problem hiding this comment.
This algorithm isn't correct because of its handling of equivalent expression sets. With equivalent expression sets, you may have two expressions that are known to be equal, such as "a.f == b.f", even though there is no equality of the variables used in the expressions. We don't know that a == b, for example.
We may need to generalize the notion of a variable v having relational information to another variable y. We need to check that there no expressions constructed from v or y that have relational information either.
clang/lib/Sema/SemaBounds.cpp
Outdated
|
|
||
| // EquivVars holds sets of DeclRefExpr and IntegerLiteral filtered from | ||
| // EquivExprs. | ||
| EquivExprSets EquivVars; |
There was a problem hiding this comment.
An implication of my comment is that we shouldn't be filtering EquivVars. We could be removing expressions that have equality relationships and that use variables.
clang/lib/Sema/SemaBounds.cpp
Outdated
| // EquivExprs that contains both V and a constant (i.e. an integer-typed | ||
| // value); and | ||
| // 2. for each variable U in DstVars, there is not set in EquivExprs | ||
| // that contains both V and U. |
There was a problem hiding this comment.
I think condition 2 needs to be amended to work in terms of uses of variables. There is not a set in EquivExprs that contains two different expressions, where one expression uses V and the other expression uses U.
| def note_free_variable_decl_or_inferred : Note< | ||
| "the %select{declared|inferred}0 %select{lower |upper |}1bounds use the value " | ||
| "of the variable '%2', and there is no relational information between '%2' " | ||
| "and any of the variables used by the %select{inferred|declared}0 " |
There was a problem hiding this comment.
This may need to be reworded. There is no relational information involving v and any of the expressions used by the other bounds. Perhaps something along the lines of
"the %select{declared|inferred}0 %slect{lower | upper |}bound use the variable %2' and there is no relational information involving '%2' and any of expressions used by the %select{inferred|declared}0 "
* Change IsEqualToConstant to search for an rvalue cast of a variable * Add FindVarRelationship method to search for an indirect relationship between variables * Remove EquivVars set that filters out everything except DeclRefExprs and IntegerLiterals * Remove expected free variables errors where there are indirect variable relationships * Update note messages to mention lack of relational information between a free variable and expressions * Remove comma from expected notes * Fix comments * Make free variable-related methods non-static * Move top-level free variable method CheckFreeVarInExprs above other free variable methods * Fix comment typo * Update CheckFreeVarsInExprs comment Co-authored-by: Katherine Kjeer <kkjeer@wellesley.edu>
This test was originally in the checkedc/tests/typechecking directory but was moved to checkedc-clang since it tests compiler implementation details (diagnostic messages).
dtarditi
left a comment
There was a problem hiding this comment.
There are a few suggestions around comments that I'd like to see resolved. Also, there is a merge conflict according to GitHub. Once both of these are done, I'll approve the merge.
clang/lib/Sema/SemaBounds.cpp
Outdated
| EquivExprSets *EquivExprs, | ||
| std::pair<ComparisonSet, ComparisonSet> &Facts, | ||
| FreeVariableListTy &FreeVariables) { | ||
| // If we do not have equivalence relational information (EquivExprs |
clang/lib/Sema/SemaBounds.cpp
Outdated
| return ProofResult::Maybe; | ||
| } | ||
|
|
||
| // InRangeWithFreeVars is the same as InRange, but gathers free variables |
| const EqualExprTy &GetVariableList() const { return VariableList; } | ||
|
|
||
| bool VisitDeclRefExpr(DeclRefExpr *E) { | ||
| if (!EqualExprsContainsExpr(SemaRef, VariableList, E, nullptr)) { |
There was a problem hiding this comment.
@kkjeer, is my comment still accurate? If so, could you leave a comment in the code about the algorithm being quadratic. I don't like to leave quadratic algorithms around in a compiler without calling them out.
…into non-equiv-exprs-in-bounds # Conflicts: # clang/lib/Sema/SemaBounds.cpp
We don't think it is important to the test, and since checkedc#903, it is causing a compiler error (which was previously a warning).
We don't think it is important to the test, and since checkedc#903, it is causing a compiler error (which was previously a warning).
This PR enables the compiler to emit an error when a free variable (defined below) is found in the declared/inferred bounds on bounds assignments, and passing arguments. A note is printed for each free variable found.
Free variables w.r.t. expressions
Assume the helper function
VarSet(E)returns the set of variables in expressionE.Given two expressions
SrcExprandDstExpr, a set of equivalent expression setsEQ. A variableVinVarSet(SrcExpr)is free if:Vdoes not equal to some constant)Vis not an integer type, or there is no set inEQthat contains a constant andV; and,Vdoes not have an alias inVarSet(DstExpr)) For eachUinVarSet(DstExpr), there is not set inEQthat contains bothVandU.Free variables w.r.t. declared/inferred bounds
We extend the above definition for free variables with respect to the lower (or upper bounds) of the declared bounds (or inferred bounds). In other words, a variable used by the declared lower bounds may be free even if it is used in the inferred upper bounds.
Affected cases
Free variable is detected in the following cases:
Note: we do not check free variables in static bounds cast.
Note for each free variable
We use the template: "note: the declared/inferred bounds use the value of the variable 'q', and there is no relational information between 'q' and any of the variables used by the inferred/declared bounds".