The is_CNF method has (at least) two uses:
- Figuring out whether a sentence is in CNF as described by the KCM, where variables can't occur more than once in a clause
- Figuring out whether a sentence can be fed into a SAT solver
These are at odds with each other. Near-CNF sentences with duplicate variables in a clause are safe for SAT solvers, but because we use is_CNF() internally, they're not used for those.
I'm working on caching sentence properties, and marking sentences as having properties in advance. The sentences produced by nnf.tseitin can safely be marked as CNF, but the sentences produced by dimacs.load (in CNF mode) can't be.
So a looser alternative to the current is_CNF would be useful, if only for internal purposes. (It would also be faster even without caching.)
The reason for the strict definition in the KCM seems to be to make DNF a subset of DNNF and keep CNF symmetrical with DNF. But outside the KCM, the looser definition of CNF (and DNF) seems to be the common one, e.g. Wikipedia, MathWorld.
I think people would expect is_CNF to use the looser definition after all, at least by default. Could the looser behavior give problems?
Would it be a good idea to give is_CNF and is_DNF a strict=False argument so you need to opt in to the KCM definition?
The
is_CNFmethod has (at least) two uses:These are at odds with each other. Near-CNF sentences with duplicate variables in a clause are safe for SAT solvers, but because we use
is_CNF()internally, they're not used for those.I'm working on caching sentence properties, and marking sentences as having properties in advance. The sentences produced by
nnf.tseitincan safely be marked as CNF, but the sentences produced bydimacs.load(in CNF mode) can't be.So a looser alternative to the current
is_CNFwould be useful, if only for internal purposes. (It would also be faster even without caching.)The reason for the strict definition in the KCM seems to be to make DNF a subset of DNNF and keep CNF symmetrical with DNF. But outside the KCM, the looser definition of CNF (and DNF) seems to be the common one, e.g. Wikipedia, MathWorld.
I think people would expect
is_CNFto use the looser definition after all, at least by default. Could the looser behavior give problems?Would it be a good idea to give
is_CNFandis_DNFastrict=Falseargument so you need to opt in to the KCM definition?