Walkthrough

Adds a directed-graph implementation and related container primitives (EFSet, PackedMap, defaults), graph ingestion from a DB, sampling utilities, SCC/component-graph construction, Katz and closeness-like centralities (serial and parallel), and dependency updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant Algo as algo.ClosenessParallel
    participant DG as container.DirectedGraph
    participant Worker as Worker Goroutine
    Caller->>Algo: ClosenessForDirectedUnweightedGraphParallel(digraph, direction, sampleFunc, n)
    Algo->>DG: sampleIDs := sampleFunc(digraph, n)
    Algo->>Worker: spawn NumCPU workers
    loop enqueue sampled nodes
        Algo->>Worker: send nodeID on jobs chan
    end
    par workers consume jobs
        Worker->>DG: BFSTree(nodeID, direction)
        Worker->>Worker: accumulate distanceSum
        alt distanceSum > 0
            Worker->>Algo: mutex-protected update scores[nodeID]
        else
            Worker-->>Worker: skip
        end
    and
        Worker-->>Algo: worker done
    end
    Algo-->>Caller: return scores map
```
```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Fetch as container.FetchDirectedGraph
    participant DB as graph.Database
    participant Tx as ReadTx
    participant Ch as edge chan
    participant DG as DirectedGraph
    Client->>Fetch: FetchDirectedGraph(ctx, db, filter)
    Fetch->>DG: dg := NewDirectedGraph()
    Fetch->>Ch: start consumer goroutine (reads chan, dg.AddEdge)
    Fetch->>DB: begin read transaction
    DB->>Tx: tx result set
    loop each row
        Tx-->>Fetch: scan startID,endID
        Fetch->>Ch: enqueue (startID,endID)
    end
    Fetch->>Ch: close channel
    Fetch->>DG: wait consumer to finish
    Fetch-->>Client: return dg or error
```
```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant SCC as algo.StronglyConnectedComponents
    participant DG as container.DirectedGraph
    participant CG as ComponentDirectedGraph
    Caller->>SCC: StronglyConnectedComponents(digraph, direction)
    SCC->>DG: non-recursive Tarjan-like traversal (stack, on-stack bitmap)
    SCC-->>Caller: components bitmaps, node->component map
    Caller->>CG: NewComponentDirectedGraph(digraph, direction)
    CG->>SCC: reuse components mapping
    CG->>DG: add edges between component nodes based on original edges
    CG-->>Caller: ComponentDirectedGraph
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~80 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 18
🧹 Nitpick comments (13)
cypher/models/pgsql/translate/projection.go (1)
458-482: Potential index mismatch when deriving GROUP BY from built projections.
Using projection[i] assumes a 1:1 mapping with currentPart.projections.Items. buildExternalProjection can return multiple SelectItems per logical item in the future, which would desynchronize indexes and produce incorrect GROUP BY columns. Consider deriving GROUP BY from the finalized projection slice directly, filtering out aggregates, instead of index-based lookup. Example approach (illustrative):
```go
// After building 'projection' and detecting hasAggregates:
if hasAggregates {
	nonAggregateExprs = nonAggregateExprs[:0]

	for _, projExpr := range projection {
		base := projExpr

		if aliased, ok := projExpr.(*pgsql.AliasedExpression); ok {
			base = aliased.Expression
		}

		// Skip aggregate function calls
		if fc, ok := unwrapParenthetical(base).(pgsql.FunctionCall); ok {
			if syms, err := GetAggregatedFunctionParameterSymbols(fc); err != nil {
				return err
			} else if !syms.IsEmpty() {
				continue
			}
		}

		nonAggregateExprs = append(nonAggregateExprs, base)
	}

	singlePartQuerySelect.GroupBy = nonAggregateExprs
}
```

go.mod (1)
7-7: Two xxhash libs in the graph; prefer one to reduce footprint.
You require both github.com/OneOfOne/xxhash and github.com/cespare/xxhash/v2. Unless both are needed for API compatibility, consider standardizing on one. If both remain, document the rationale (performance, existing call sites).
container/defaults.go (1)
6-11: Defaults are extremely large; guard against accidental massive allocations.
DefaultNumMaxEntries=8e9 and DefaultNumBuckets=131072 can lead to outsized preallocations if used directly.
- Reduce defaults to sane dev values (e.g., 1e6 entries) and require explicit overrides for larger deployments.
- Ensure constructors interpret these as logical limits, not immediate allocation sizes. Add doc comments to clarify semantics.
```diff
- DefaultNumMaxEntries = 8_000_000_000
- DefaultNumBuckets    = 1024 * 128
+ // Tuned for dev; override in prod explicitly
+ DefaultNumMaxEntries = 1_000_000
+ DefaultNumBuckets    = 1 << 14 // 16384
```

container/ef.go (3)
93-111: Bit indexing mixes 1-based offsets; standardize to 0-based to avoid off-by-ones.
The +1 adjustments in Set/Test increase cognitive load and risk errors.

```diff
- s.upperBits.Set(upper + s.len + 1)
+ s.upperBits.Set(upper + s.len)
  ...
- s.lowerBits.SetTo(lowerOffset+bitIdx+1, lowerBitValue > 0)
+ s.lowerBits.SetTo(lowerOffset+bitIdx, lowerBitValue > 0)
```

If you keep a 1-based layout for the EF math, document it prominently.
112-120: Append order check allows duplicates; confirm intended.
value < s.last permits equal values; EFSet typically stores non-decreasing sequences. If strictly increasing input is required, change the comparison to <=.

```diff
- if value < s.last {
+ if value <= s.last {
  	return ErrUnorderedAppend
  }
```
180-200: Iterator lacks termination signaling; consider ok return.
Next() returns a value even after exhaustion; typical iterators return (val, ok). Without a size check, callers can read garbage after the end.

```diff
 type EFSetIterator struct {
 	vec      *EFSet
 	upperIdx uint
 	lowerIdx uint
 }
 ...
-func (s *EFSetIterator) Next() uint64 {
+func (s *EFSetIterator) Next() (uint64, bool) {
+	if s.lowerIdx >= s.vec.len {
+		return 0, false
+	}
 	...
-	return value
+	return value, true
 }
```
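The (value, ok) idiom the review asks for can be sketched in isolation. This is a hypothetical stand-in for EFSetIterator, backed by a plain sorted slice rather than the Elias–Fano bit vectors; the names are illustrative, not the package's API.

```go
package main

import "fmt"

// sliceIterator is a minimal stand-in for an EF-style iterator.
type sliceIterator struct {
	values []uint64
	idx    int
}

// Next returns the next value and false once the iterator is exhausted,
// so callers never read past the end without noticing.
func (s *sliceIterator) Next() (uint64, bool) {
	if s.idx >= len(s.values) {
		return 0, false
	}
	v := s.values[s.idx]
	s.idx++
	return v, true
}

func main() {
	it := &sliceIterator{values: []uint64{2, 3, 5}}
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v)
	}
	// Further calls keep returning (0, false) instead of garbage.
	if _, ok := it.Next(); !ok {
		fmt.Println("exhausted")
	}
}
```

Callers then drive the loop with the boolean, which removes the need for a separate length check before each call.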
93-100: Buffer compaction jobs to reduce sender blocking.

Use a buffered channel sized to the number of allocated buckets.
Apply this diff:
```diff
 func (s *PackedMap) CompactParallel(workers int) {
 	var (
-		jobC = make(chan *packedBucket)
+		jobC = make(chan *packedBucket, len(s.allocatedBuckets))
 		wg   = &sync.WaitGroup{}
 	)
```
3-10: Remove the fmt import after eliminating the debug print.

Apply this diff:
```diff
 import (
-	"fmt"
 	"math/rand"
 	"sort"
```
37-38: Buffer the work channel to avoid sender blocking.

A buffer of nSamples reduces contention and tail latency.
Apply this diff:
```diff
- nodeC = make(chan uint64)
+ nodeC = make(chan uint64, nSamples)
```

Confirm that BFSTree is safe for concurrent reads on the chosen DirectedGraph implementation.
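The shape of the parallel closeness path — a buffered job channel, NumCPU workers, and a mutex-guarded score map — can be sketched without the graph machinery. The per-node BFS is replaced here by a trivial squaring function; everything else (names included) is an illustrative assumption, not the package's code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelScores sketches the worker-pool pattern: buffer the job
// channel to the input size so senders never block, fan out to
// NumCPU workers, and protect the shared result map with a mutex.
func parallelScores(nodes []uint64) map[uint64]uint64 {
	var (
		jobs   = make(chan uint64, len(nodes)) // buffered: enqueue never blocks
		scores = make(map[uint64]uint64, len(nodes))
		mu     sync.Mutex
		wg     sync.WaitGroup
	)

	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for node := range jobs {
				result := node * node // stand-in for the per-node BFS work

				mu.Lock()
				scores[node] = result
				mu.Unlock()
			}
		}()
	}

	for _, node := range nodes {
		jobs <- node
	}
	close(jobs)
	wg.Wait()

	return scores
}

func main() {
	fmt.Println(parallelScores([]uint64{1, 2, 3, 4}))
}
```

Closing the channel after enqueueing lets each worker's range loop terminate naturally, and wg.Wait() guarantees all map writes happen-before the return.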
container/digraph.go (1)
191-197: Expose read-only node iteration instead of returning the mutable set.

Returning the internal Duplex allows external mutation of graph nodes, breaking invariants. Prefer a read-only view or only iteration APIs.
Consider removing Nodes() from the interface or returning an immutable wrapper; rely on EachNode for traversal.
algo/scc.go (3)
18-30: Use int for allocation hints (portability) and avoid negative surprises.

initialAlloc is declared int64, but make's capacity argument is an int. Cast to int to avoid implicit conversions and potential overflow on 32-bit platforms.
Apply this diff:
```diff
- initialAlloc = int64(math.Sqrt(float64(numNodes)))
+ initialAlloc = int(math.Sqrt(float64(numNodes)))
```

And ensure subsequent make calls use initialAlloc (type int).
124-129: Minor: consider documenting direction semantics for SCC construction.

SCC sets are invariant to edge reversal, but the component graph edges depend on traversal direction; a short comment would help maintainers.
130-147: Avoid redundant component edges (optional).

If multiple original edges induce the same component edge, you may want to de-duplicate by checking existence before AddEdge or by using a per-component bitmap to track emitted edges.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
- algo/closeness.go (1 hunks)
- algo/katz.go (1 hunks)
- algo/sample.go (1 hunks)
- algo/scc.go (1 hunks)
- algo/weight.go (1 hunks)
- container/defaults.go (1 hunks)
- container/digraph.go (1 hunks)
- container/ef.go (1 hunks)
- container/fetch.go (1 hunks)
- container/pacmap.go (1 hunks)
- cypher/models/pgsql/translate/projection.go (1 hunks)
- go.mod (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-14T01:22:17.837Z
Learnt from: zinic
PR: SpecterOps/DAWGS#10
File: database/v1compat/traversal/traversal.go:227-227
Timestamp: 2025-08-14T01:22:17.837Z
Learning: In database/v1compat/traversal/traversal.go, the fetchDirection variable should remain consistent during expansion. The fetchDirection represents the query perspective for retrieving relationships and must be the same for both current and next expansion steps to maintain proper traversal semantics.
Applied to files:
container/fetch.go
🧬 Code graph analysis (7)
algo/sample.go (2)
- container/digraph.go (1): DirectedGraph (11-21)
- graph/graph.go (1): Direction (26-26)

algo/katz.go (3)
- container/digraph.go (2): DirectedGraph (11-21), Weight (8-8)
- algo/weight.go (1): Weight (3-3)
- graph/graph.go (1): DirectionBoth (18-18)

container/fetch.go (3)
- graph/graph.go (3): Database (367-406), ID (95-95), Transaction (309-347)
- graph/query.go (1): Criteria (70-70)
- container/digraph.go (2): DirectedGraph (11-21), NewDirectedGraph (29-35)

algo/scc.go (4)
- container/digraph.go (2): DirectedGraph (11-21), NewDirectedGraph (29-35)
- graph/graph.go (4): Direction (26-26), ID (95-95), DirectionInbound (16-16), DirectionOutbound (17-17)
- cardinality/cardinality.go (1): Duplex (46-58)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)

container/pacmap.go (2)
- container/defaults.go (4): DefaultMaxValue (8-8), DefaultNumBuckets (10-10), DefaultNumMaxEntries (9-9), EntrySizeBytes (7-7)
- container/ef.go (2): EFSet (14-24), CompressToEFSet (150-158)

algo/closeness.go (4)
- container/digraph.go (2): DirectedGraph (11-21), Weight (8-8)
- graph/graph.go (1): Direction (26-26)
- algo/sample.go (1): SampleFunc (12-12)
- algo/weight.go (1): Weight (3-3)

container/digraph.go (4)
- cardinality/cardinality.go (1): Duplex (46-58)
- graph/graph.go (5): ID (95-95), Direction (26-26), DirectionOutbound (17-17), DirectionInbound (16-16), DirectionBoth (18-18)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)
- algo/weight.go (1): Weight (3-3)
🔇 Additional comments (8)
algo/weight.go (1)
3-3: LGTM: shared Weight alias is clear and appropriate.
Consistent type aliasing will simplify API signatures across algo packages.

cypher/models/pgsql/translate/projection.go (1)
442-444: Nit: variable alignment only; no behavior change.
No concerns with the formatting tweak.

go.mod (3)
8-8: Confirm compatibility with RoaringBitmap v2.10.0
No direct usage of any changed generics or serialization helpers detected in roaring.New() or .Each(…); run the full test suite and verify downstream code compiles and serializes correctly.
11-11: No changes needed: github.com/bits-and-blooms/bitset v1.24.0 is a valid release tag and includes the BitSet.Select method.
3-3: Verify CI Go version
No CI config files detected; ensure your CI/build toolchain uses Go ≥1.23.0 to avoid module or stdlib mismatches.

container/ef.go (1)
172-179: Verify NextSet’s signature and handle its bool return
advanceHigherBitsPosition discards the second return from NextSet; ensure the bitset you’re using exposes NextSet(uint) (uint, bool) and explicitly check the bool flag to catch missing set bits.algo/scc.go (1)
algo/scc.go (1)

11-116: Iterative Tarjan adaptation is non-trivial; please verify correctness with tests. Edge cases: multi-branch back-edges, deep nesting, and multiple roots. Recommend tests for:
- Single node w/ and w/o self-loop
- Two cycles connected by one-direction edge
- Tree (no cycles)
- Random DAG
I can generate Go unit tests covering these scenarios if helpful.
container/digraph.go (1)
152-170: The Or method is part of the Provider interface, so Clone().Or(...) is valid and needs no fallback.

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
♻️ Duplicate comments (8)
container/pacmap.go (2)
33-42: Restore pre-compaction lookups and guard nil state
Find returns nothing before the first compaction because s.keys is nil, so we silently miss every append. We also risk panicking if s.keys.Find returns a rank larger than the current s.values length. Please short-circuit when s.keys is nil and fall back to scanning pendingKeys (while reading the matching value) so buckets behave correctly prior to compaction. The previous review highlighted this already.

```diff
 func (s *packedBucket) Find(key uint64) (uint64, bool) {
-	if s.keys != nil {
-		keyRank, found := s.keys.Find(key)
-
-		if found {
-			return s.values[keyRank], true
-		}
-	}
-
-	return 0, false
+	if s.keys != nil {
+		if keyRank, found := s.keys.Find(key); found && int(keyRank) < len(s.values) {
+			return s.values[keyRank], true
+		}
+	}
+
+	for i := range s.pendingKeys {
+		if s.pendingKeys[i] == key {
+			return s.pendingValueAt(i), true
+		}
+	}
+
+	return 0, false
 }
```

Note:
pendingValueAt (or equivalent) needs to read the value aligned with the pending key; if you keep sharing s.values for both settled and pending entries, ensure you index the correct tail segment instead of overwriting historic slots.
50-68: Compact currently corrupts data (mutating comparator + dropping settled keys)

Two independent bugs here:
- The sort.Slice comparator swaps s.values entries. less must be pure; mutating within it violates the contract and produces undefined ordering.
- CompressToEFSet(s.pendingKeys) rebuilds the index from only the pending keys. Any keys that were previously compacted disappear, and ranks no longer line up with s.values (e.g., the first re-compaction returns the wrong value for the newest key and forgets earlier ones).

You need to merge existing keys/values with the pending batch, sort using paired key/value structs (or parallel slices manipulated outside the comparator), and deduplicate so EFSet receives a strictly increasing sequence. Please rework
Compact accordingly; the prior review already requested this.

```diff
 func (s *packedBucket) Compact() {
 	if len(s.pendingKeys) == 0 {
 		return
 	}

-	// Sort by key and swap value positions so the two remain associated by their position
-	sort.Slice(s.pendingKeys, func(i, j int) bool {
-		if s.pendingKeys[i] < s.pendingKeys[j] {
-			s.values[i], s.values[j] = s.values[j], s.values[i]
-			return true
-		}
-
-		return false
-	})
-
-	s.keys = CompressToEFSet(s.pendingKeys)
-	s.pendingKeys = nil
-	s.values = s.values[0:len(s.values):len(s.values)]
+	type kv struct {
+		k uint64
+		v uint64
+	}
+
+	combined := make([]kv, 0, s.keyCount()+len(s.pendingKeys))
+	combined = compactExisting(combined, s.keys, s.values)
+	for _, pending := range pendingPairs(s.pendingKeys, s.values) {
+		combined = append(combined, pending)
+	}
+
+	sort.Slice(combined, func(i, j int) bool { return combined[i].k < combined[j].k })
+
+	keys := make([]uint64, 0, len(combined))
+	vals := make([]uint64, 0, len(combined))
+	for _, pair := range dedupeKV(combined) {
+		keys = append(keys, pair.k)
+		vals = append(vals, pair.v)
+	}
+
+	s.keys = CompressToEFSet(keys)
+	s.values = vals
+	s.pendingKeys = s.pendingKeys[:0]
 }
```

Fill in
keyCount, compactExisting, pendingPairs, and dedupeKV (or inline their logic) so that previously compacted entries survive and values stay aligned.

container/fetch.go (1)
33-57: Still need to join the merge goroutine before returning

When
ReadTransaction fails we now close edgeC, but we still return before mergeWG.Wait(). The goroutine draining edgeC can keep running (and still be mutating digraph) after the function exits, which is the original leak/race reported earlier. Please capture the transaction error, always close the channel and wait for the goroutine to finish, then return the error if set.

```diff
-	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
+	var readErr error
+	if readErr = graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
 		return tx.Relationships().Filter(relationshipFilter).Query(func(results graph.Result) error {
 			…
-	}); err != nil {
-		// Ensure that the edge channel is closed to prevent goroutine leaks
-		close(edgeC)
-		return nil, err
-	}
-
-	close(edgeC)
-	mergeWG.Wait()
-
-	return digraph, nil
+	}); readErr != nil {
+		// fall through so we still close + wait before returning the error
+	}
+
+	close(edgeC)
+	mergeWG.Wait()
+
+	if readErr != nil {
+		return nil, readErr
+	}
+
+	return digraph, nil
```

algo/scc.go (1)
4-79: Compilation fails: min is undefined

Lines 56 and 78 call
min, which is a language builtin only since Go 1.21 and is not defined in this file. If the module targets Go ≥1.21 (the go.mod review above notes Go ≥1.23.0), this compiles as-is; if older toolchains must be supported, introduce a small integer imin helper and use it in both places.

```diff
 import (
 	"math"
 	@@
 )

+func imin(a, b int) int {
+	if a < b {
+		return a
+	}
+	return b
+}
+
 @@
-	lowLinks[nextCursor.id] = min(lowLinks[nextCursor.id], lowLinks[lastSearchedNodeID])
+	lowLinks[nextCursor.id] = imin(lowLinks[nextCursor.id], lowLinks[lastSearchedNodeID])
 @@
-	lowLinks[nextCursor.id] = min(lowLinks[nextCursor.id], visitedIndex[nextBranchID])
+	lowLinks[nextCursor.id] = imin(lowLinks[nextCursor.id], visitedIndex[nextBranchID])
```

container/digraph.go (3)
55-89: BFSTree is still depth-first and yields wrong distances

The traversal keeps a LIFO stack, so it is performing DFS. That can overestimate shortest-path distances and revisit nodes out of order. Please switch to a FIFO queue and mark the root as visited before enqueuing, as suggested earlier.
```diff
 func (s *directedGraph) BFSTree(nodeID uint64, direction graph.Direction) []ShortestPathTerminal {
 	var (
 		visited   = cardinality.NewBitmap64()
-		stack     []ShortestPathTerminal
+		queue     []ShortestPathTerminal
 		terminals []ShortestPathTerminal
 	)

-	stack = append(stack, ShortestPathTerminal{
-		NodeID:   nodeID,
-		Distance: 0,
-	})
+	visited.CheckedAdd(nodeID)
+	queue = append(queue, ShortestPathTerminal{NodeID: nodeID, Distance: 0})

-	for len(stack) > 0 {
-		nextCursor := stack[len(stack)-1]
-		stack = stack[:len(stack)-1]
+	for len(queue) > 0 {
+		nextCursor := queue[0]
+		queue = queue[1:]

 		s.EachAdjacent(nextCursor.NodeID, direction, func(adjacentNodeID uint64) bool {
 			if visited.CheckedAdd(adjacentNodeID) {
 				terminalCursor := ShortestPathTerminal{
 					NodeID:   adjacentNodeID,
 					Distance: nextCursor.Distance + 1,
 				}

-				// If not visited, descend into this node next
-				stack = append(stack, terminalCursor)
+				queue = append(queue, terminalCursor)

 				// This reachable node represents one of the shortest path terminals
 				terminals = append(terminals, terminalCursor)
 			}

 			return true
 		})
 	}

 	return terminals
 }
```
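The queue-based traversal the review calls for can be shown in a minimal standalone sketch. The adjacency map and function names here are hypothetical stand-ins for the package's EachAdjacent machinery.

```go
package main

import "fmt"

// bfsDistances computes unweighted shortest-path distances from source
// using a FIFO queue, marking nodes visited on enqueue so each node is
// processed exactly once at its minimum distance.
func bfsDistances(adj map[uint64][]uint64, source uint64) map[uint64]int {
	dist := map[uint64]int{source: 0}
	queue := []uint64{source}

	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:] // FIFO pop: this is what makes the traversal breadth-first

		for _, next := range adj[current] {
			if _, visited := dist[next]; !visited {
				dist[next] = dist[current] + 1
				queue = append(queue, next)
			}
		}
	}

	return dist
}

func main() {
	// 1 -> 2 -> 3 plus a shortcut 1 -> 3: BFS must report distance 1 for node 3.
	adj := map[uint64][]uint64{1: {2, 3}, 2: {3}}
	fmt.Println(bfsDistances(adj, 1))
}
```

The shortcut edge is exactly the case where a stack-based (DFS) version can report 2 instead of 1, which is why the distinction matters for closeness scores.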
149-151: NumEdges returns node count, not edge count

This should sum the outgoing adjacency sizes; returning the node count breaks any consumers expecting actual edge totals.
```diff
 func (s *directedGraph) NumEdges() uint64 {
-	return s.nodes.Cardinality()
+	var edges uint64
+	for _, adjacent := range s.outbound {
+		edges += adjacent.Cardinality()
+	}
+	return edges
 }
```
218-242: AddEdge still calls the non-existent ID.Uint64() and skips CheckedAdd
graph.ID is a uint64 type; it has no Uint64() method, so this will not compile. Additionally, use CheckedAdd to stay within the Duplex interface contract when inserting into the bitmaps.

```diff
 func (s *directedGraph) AddEdge(start, end graph.ID) {
-	var (
-		startUint64 = start.Uint64()
-		endUint64   = end.Uint64()
-	)
-
-	if edgeBitmap, exists := s.outbound[startUint64]; exists {
-		edgeBitmap.Add(endUint64)
-	} else {
-		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(endUint64)
-
-		s.outbound[startUint64] = edgeBitmap
-	}
-
-	if edgeBitmap, exists := s.inbound[endUint64]; exists {
-		edgeBitmap.Add(startUint64)
-	} else {
-		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(startUint64)
-
-		s.inbound[endUint64] = edgeBitmap
-	}
-
-	s.nodes.Add(startUint64, endUint64)
+	startUint64 := uint64(start)
+	endUint64 := uint64(end)
+
+	outboundAdj, exists := s.outbound[startUint64]
+	if !exists {
+		outboundAdj = cardinality.NewBitmap64()
+		s.outbound[startUint64] = outboundAdj
+	}
+	outboundAdj.CheckedAdd(endUint64)
+
+	inboundAdj, exists := s.inbound[endUint64]
+	if !exists {
+		inboundAdj = cardinality.NewBitmap64()
+		s.inbound[endUint64] = inboundAdj
+	}
+	inboundAdj.CheckedAdd(startUint64)
+
+	s.nodes.CheckedAdd(startUint64)
+	s.nodes.CheckedAdd(endUint64)
 }
```
55-79: Stride sampler degenerates to deterministic picks

With many common inputs (
NumNodes()/nSamples == 1) the stride becomes 1, so every node is selected in iteration order and the "random" sample is entirely deterministic. That undermines algorithms relying on unbiased sampling. Switching to standard reservoir sampling avoids the bias and keeps complexity linear.

```diff
 func SampleRandom(digraph container.DirectedGraph, nSamples int) []uint64 {
-	if numNodes := int(digraph.Nodes().Cardinality()); nSamples <= 0 || numNodes == 0 {
-		return nil
-	} else if nSamples > numNodes {
-		nSamples = numNodes
-	}
-
-	var (
-		samples   = make([]uint64, 0, nSamples)
-		stride    = digraph.Nodes().Cardinality() / uint64(nSamples)
-		counter   = (rand.Uint64() % stride) + 1
-		remainder = stride - counter
-	)
-
-	digraph.Nodes().Each(func(value uint64) bool {
-		if counter -= 1; counter == 0 {
-			samples = append(samples, value)
-			counter = (rand.Uint64() % stride) + 1 + remainder
-			remainder = stride - (counter - remainder)
-		}
-
-		return len(samples) < nSamples
-	})
-
-	return samples
+	total := int(digraph.Nodes().Cardinality())
+	if nSamples <= 0 || total == 0 {
+		return nil
+	}
+	if nSamples > total {
+		nSamples = total
+	}
+
+	samples := make([]uint64, 0, nSamples)
+	i := 0
+	digraph.EachNode(func(node uint64) bool {
+		if i < nSamples {
+			samples = append(samples, node)
+		} else {
+			j := rand.Intn(i + 1)
+			if j < nSamples {
+				samples[j] = node
+			}
+		}
+		i++
+		return true
+	})
+
+	return samples
 }
```
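Reservoir sampling (Algorithm R) can be demonstrated over a plain slice, independent of the graph types. This is an illustrative sketch of the technique the review recommends, not the package's SampleRandom: each element ends up in the sample with probability k/n regardless of stream order.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoirSample returns k items chosen uniformly at random from the
// stream in a single pass. The first k items fill the reservoir; every
// later item i replaces a random slot with probability k/(i+1).
func reservoirSample(stream []uint64, k int) []uint64 {
	if k <= 0 || len(stream) == 0 {
		return nil
	}
	if k > len(stream) {
		k = len(stream)
	}

	sample := make([]uint64, k)
	copy(sample, stream[:k])

	for i := k; i < len(stream); i++ {
		// Keep element i with probability k/(i+1) by overwriting a random slot.
		if j := rand.Intn(i + 1); j < k {
			sample[j] = stream[i]
		}
	}

	return sample
}

func main() {
	stream := []uint64{10, 20, 30, 40, 50, 60}
	fmt.Println(reservoirSample(stream, 3))
}
```

Unlike the stride approach, the output depends on the RNG at every position past k, so repeated runs over the same node ordering produce different samples.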
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
- algo/closeness.go (1 hunks)
- algo/katz.go (1 hunks)
- algo/sample.go (1 hunks)
- algo/scc.go (1 hunks)
- algo/weight.go (1 hunks)
- container/defaults.go (1 hunks)
- container/digraph.go (1 hunks)
- container/ef.go (1 hunks)
- container/fetch.go (1 hunks)
- container/pacmap.go (1 hunks)
- go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- container/defaults.go
- container/ef.go
- algo/weight.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not add components with no incident edges as vertices in the component digraph.
Applied to files:
algo/scc.go
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not handle `graph.DirectionBoth` when building the component graph.
Applied to files:
algo/scc.go
🧬 Code graph analysis (7)
algo/scc.go (4)
- container/digraph.go (2): DirectedGraph (11-22), NewDirectedGraph (30-36)
- graph/graph.go (4): Direction (26-26), ID (95-95), DirectionInbound (16-16), DirectionOutbound (17-17)
- cardinality/cardinality.go (1): Duplex (46-58)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)

algo/sample.go (2)
- container/digraph.go (1): DirectedGraph (11-22)
- graph/graph.go (1): Direction (26-26)

container/fetch.go (3)
- graph/graph.go (3): Database (367-406), ID (95-95), Transaction (309-347)
- graph/query.go (1): Criteria (70-70)
- container/digraph.go (2): DirectedGraph (11-22), NewDirectedGraph (30-36)

container/digraph.go (3)
- cardinality/cardinality.go (1): Duplex (46-58)
- graph/graph.go (5): ID (95-95), Direction (26-26), DirectionOutbound (17-17), DirectionInbound (16-16), DirectionBoth (18-18)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)

container/pacmap.go (2)
- container/defaults.go (3): DefaultMaxValue (8-8), DefaultNumBuckets (10-10), DefaultNumMaxEntries (9-9)
- container/ef.go (2): EFSet (14-24), CompressToEFSet (150-158)

algo/closeness.go (4)
- container/digraph.go (2): DirectedGraph (11-22), Weight (8-8)
- graph/graph.go (1): Direction (26-26)
- algo/sample.go (1): SampleFunc (11-11)
- algo/weight.go (1): Weight (3-3)

algo/katz.go (3)
- container/digraph.go (2): DirectedGraph (11-22), Weight (8-8)
- algo/weight.go (1): Weight (3-3)
- graph/graph.go (1): DirectionBoth (18-18)
Actionable comments posted: 2
♻️ Duplicate comments (10)
container/ef.go (5)
28-45: Use the universe width when deriving Elias–Fano bit sizes
The entry/upper/lower bit widths are still derived from universeMax instead of the actual range [universeMin, universeMax], so any non-zero minimum mis-sizes the partitions and corrupts lookups. Recompute the widths from universeMax - universeMin + 1 (with small-range guards) before producing lowerBitsMask. Apply this diff:
```diff
-	logN            = math.Ceil(math.Log2(float64(maxValues + 2)))
-	entryLength     = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-	upperBitsLength = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-	lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
+	n               = math.Max(float64(maxValues), 1)
+	uRange          = math.Max(float64(universeMax-universeMin)+1, 2)
+	logN            = math.Ceil(math.Log2(n + 1))
+	entryLength     = math.Ceil(math.Log2(uRange))
+	upperBitsLength = math.Ceil(math.Log2(n))
+	lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
```
52-71: Correct allocation math to use the universe range
AllocateEFSet repeats the same absolute-universeMax sizing error and also shifts universeMax directly when computing upperBitsetLength, which explodes once universeMin > 0. Recalculate everything from the inclusive range and shift universeMax - universeMin. Apply this diff:
```diff
-	logN              = math.Ceil(math.Log2(float64(maxValues + 2)))
-	entryLength       = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-	upperBitsLength   = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-	lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
-	lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
-	upperBitsetLength = maxValues + uint(universeMax>>lowerBitsLength) + 2
+	n                 = math.Max(float64(maxValues), 1)
+	uRange            = math.Max(float64(universeMax-universeMin)+1, 2)
+	logN              = math.Ceil(math.Log2(n + 1))
+	entryLength       = math.Ceil(math.Log2(uRange))
+	upperBitsLength   = math.Ceil(math.Log2(n))
+	lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
+	lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
+	universeSpan      = universeMax - universeMin
+	upperBitsetLength = maxValues + uint(universeSpan>>lowerBitsLength) + 2
```
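Why the range (not the absolute maximum) must drive the sizing can be seen with the textbook Elias–Fano parameter choice. This is a simplified numeric sketch, not the formulas used in container/ef.go: for n values drawn from an inclusive range of width u, each value keeps its low floor(log2(u/n)) bits verbatim, and the remaining high bits go into a unary-coded upper bitmap.

```go
package main

import (
	"fmt"
	"math"
)

// efLowerBits returns the textbook Elias–Fano lower-bit width for n
// values in the inclusive range [lo, hi]: floor(log2(u/n)) with
// u = hi - lo + 1, clamped to 0 when the range is dense.
func efLowerBits(lo, hi uint64, n uint) uint {
	u := float64(hi-lo) + 1
	ratio := u / float64(n)
	if ratio < 2 {
		return 0 // dense range: all bits live in the upper bitmap
	}
	return uint(math.Floor(math.Log2(ratio)))
}

func main() {
	// 16 values in [0, 1023]: u/n = 64, so 6 low bits per value.
	fmt.Println(efLowerBits(0, 1023, 16))
	// The same 1024-wide universe shifted to start at 1_000_000 must
	// give the identical answer, because only the range width matters.
	fmt.Println(efLowerBits(1_000_000, 1_001_023, 16))
}
```

Sizing from the absolute maximum instead would inflate the second case's widths enormously, which is exactly the mis-sizing the review describes.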
75-91: Replace the nonexistent .Select with a real nth-set helper
bitset.BitSet does not expose Select, so this code will not compile. We need to walk the set bits with NextSet (or similar) and compute upper from the position of the (index+1)-th set bit. Apply this diff (and add the helper):
```diff
-func (s *EFSet) Lookup(index uint) uint64 {
-	var (
-		upper             = s.upperBits.Select(index) - index - 1
-		lower      uint64 = 0
-		lowerIndex        = index * s.lowerBitsLength
-	)
+func (s *EFSet) Lookup(index uint) uint64 {
+	if index >= s.len {
+		panic("efset: index out of range")
+	}
+
+	upperPos, ok := s.selectNthSet(index + 1)
+	if !ok {
+		panic("efset: upper bits exhausted")
+	}
+
+	var (
+		upper             = upperPos - (index + 1)
+		lower      uint64 = 0
+		lowerIndex        = index * s.lowerBitsLength
+	)
 	@@
 	return (uint64(upper<<s.lowerBitsLength) | lower) + s.universeMin
 }
+
+func (s *EFSet) selectNthSet(n uint) (uint, bool) {
+	if n == 0 {
+		return 0, false
+	}
+
+	pos := uint(0)
+	remaining := n
+
+	for {
+		idx, ok := s.upperBits.NextSet(pos)
+		if !ok {
+			return 0, false
+		}
+		if remaining == 1 {
+			return idx, true
+		}
+		remaining--
+		pos = idx + 1
+	}
+}
```
129-148: Handle empty sets before computing highEntry
highEntry := s.len - 1 underflows to math.MaxUint when the set is empty, which cascades into out-of-bounds lookups. Guard s.len == 0 up front and bail out before the binary search. Apply this diff:
```diff
 func (s *EFSet) Find(target uint64) (uint, bool) {
-	var (
-		lowEntry  uint = 0
-		highEntry      = s.len - 1
-	)
+	if s.len == 0 {
+		return 0, false
+	}
+
+	var (
+		lowEntry  uint = 0
+		highEntry      = s.len - 1
+	)
```
150-156: Protect against empty input in CompressToEFSet
Dereferencing values[0] / values[len(values)-1] panics when asked to compress an empty slice. Return an empty set instead. Apply this diff:
```diff
 func CompressToEFSet(values []uint64) *EFSet {
+	if len(values) == 0 {
+		return AllocateEFSet(0, 0, 0)
+	}
+
 	compactedVec := AllocateEFSet(values[0], values[len(values)-1], uint(len(values)))
```

container/pacmap.go (1)
55-63: Keep the sort comparator side-effect free
The comparator swaps entries inside sort.Slice, which violates the contract and leaves s.values in an undefined order; the sorter may call your predicate repeatedly on the same indices. Pair the keys with their values, sort that slice, then rebuild aligned slices. Apply this diff:
```diff
-	// Sort by key and swap value positions so the two remain associated by their position
-	sort.Slice(s.pendingKeys, func(i, j int) bool {
-		if s.pendingKeys[i] < s.pendingKeys[j] {
-			s.values[i], s.values[j] = s.values[j], s.values[i]
-			return true
-		}
-
-		return false
-	})
-
-	s.keys = CompressToEFSet(s.pendingKeys)
-	s.pendingKeys = nil
-	s.values = s.values[0:len(s.values):len(s.values)]
+	type kv struct {
+		key uint64
+		val uint64
+	}
+
+	n := len(s.pendingKeys)
+	pairs := make([]kv, n)
+	for i := 0; i < n; i++ {
+		pairs[i] = kv{key: s.pendingKeys[i], val: s.values[i]}
+	}
+
+	sort.Slice(pairs, func(i, j int) bool { return pairs[i].key < pairs[j].key })
+
+	s.pendingKeys = s.pendingKeys[:0]
+	for i := 0; i < n; i++ {
+		s.pendingKeys = append(s.pendingKeys, pairs[i].key)
+		s.values[i] = pairs[i].val
+	}
+	s.values = s.values[:n:n]
+	s.keys = CompressToEFSet(s.pendingKeys)
+	s.pendingKeys = nil
```
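The pure-comparator pattern is easy to show in isolation. This is an illustrative sketch of the technique (sort pairs, then split back into aligned slices), not the bucket code itself:

```go
package main

import (
	"fmt"
	"sort"
)

// sortPaired sorts keys while keeping values aligned by sorting a slice
// of pairs with a read-only comparator, instead of mutating a second
// slice inside the sort.Slice less function (which violates its contract).
func sortPaired(keys, values []uint64) ([]uint64, []uint64) {
	type kv struct {
		key uint64
		val uint64
	}

	pairs := make([]kv, len(keys))
	for i := range keys {
		pairs[i] = kv{key: keys[i], val: values[i]}
	}

	// The comparator only reads; sort.Slice may call it in any order.
	sort.Slice(pairs, func(i, j int) bool { return pairs[i].key < pairs[j].key })

	outKeys := make([]uint64, len(pairs))
	outVals := make([]uint64, len(pairs))
	for i, p := range pairs {
		outKeys[i] = p.key
		outVals[i] = p.val
	}
	return outKeys, outVals
}

func main() {
	keys, vals := sortPaired([]uint64{30, 10, 20}, []uint64{3, 1, 2})
	fmt.Println(keys, vals)
}
```

Because the sorter gives no guarantees about how often or in what order less is invoked, any side effect inside it can scramble the companion slice even when the keys end up sorted.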
61-95: Implement a real breadth-first traversal
BFSTree is still using a LIFO stack and only marks nodes visited after dequeuing neighbors, so distances are depth-first and the source can be re-enqueued. Switch to a FIFO queue and mark the source visited up front so distances remain shortest. Suggested patch:
```diff
 func (s *directedGraph) BFSTree(nodeID uint64, direction graph.Direction) []ShortestPathTerminal {
 	var (
 		visited   = cardinality.NewBitmap64()
-		stack     []ShortestPathTerminal
+		queue     []ShortestPathTerminal
 		terminals []ShortestPathTerminal
 	)

-	stack = append(stack, ShortestPathTerminal{
+	visited.CheckedAdd(nodeID)
+	queue = append(queue, ShortestPathTerminal{
 		NodeID:   nodeID,
 		Distance: 0,
 	})

-	for len(stack) > 0 {
-		nextCursor := stack[len(stack)-1]
-		stack = stack[:len(stack)-1]
+	for len(queue) > 0 {
+		nextCursor := queue[0]
+		queue = queue[1:]

 		s.EachAdjacent(nextCursor.NodeID, direction, func(adjacentNodeID uint64) bool {
 			if visited.CheckedAdd(adjacentNodeID) {
 				terminalCursor := ShortestPathTerminal{
 					NodeID:   adjacentNodeID,
 					Distance: nextCursor.Distance + 1,
 				}

-				// If not visited, descend into this node next
-				stack = append(stack, terminalCursor)
+				queue = append(queue, terminalCursor)

 				// This reachable node represents one of the shortest path terminals
 				terminals = append(terminals, terminalCursor)
 			}
```
155-157: Return the actual edge count
NumEdges currently reports the number of nodes, breaking any metrics that rely on edge totals. Sum the outbound adjacency cardinalities instead. Suggested diff:
```diff
 func (s *directedGraph) NumEdges() uint64 {
-	return s.nodes.Cardinality()
+	var count uint64
+	for _, adjacent := range s.outbound {
+		count += adjacent.Cardinality()
+	}
+	return count
 }
```
224-248: Fix AddEdge ID conversion and use safe bitmap inserts
graph.ID is an alias of uint64; calling .Uint64() will not compile, and Add on the roaring bitmap is not part of the Duplex contract. Convert with uint64(...) and stick to CheckedAdd for both adjacency bitmaps and s.nodes. Apply this diff:
```diff
 func (s *directedGraph) AddEdge(start, end graph.ID) {
 	var (
-		startUint64 = start.Uint64()
-		endUint64   = end.Uint64()
+		startUint64 = uint64(start)
+		endUint64   = uint64(end)
 	)

 	if edgeBitmap, exists := s.outbound[startUint64]; exists {
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)

 		s.outbound[startUint64] = edgeBitmap
 	}

 	if edgeBitmap, exists := s.inbound[endUint64]; exists {
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)

 		s.inbound[endUint64] = edgeBitmap
 	}

-	s.nodes.Add(startUint64, endUint64)
+	s.nodes.CheckedAdd(startUint64)
+	s.nodes.CheckedAdd(endUint64)
 }
```
33-60: Always wait for the merge worker before returning
On the error path you close `edgeC` but return without waiting, so the consumer goroutine can outlive the call (and keep mutating `digraph`). Wrap the close+wait in a deferred cleanup so both success and failure paths rendezvous cleanly. Example fix:

```diff
 	var (
 		edgeC   = make(chan anonymousEdge, 4096)
 		mergeWG = &sync.WaitGroup{}
 		digraph = NewDirectedGraph()
 	)

 	mergeWG.Add(1)
-
 	go func() {
 		defer mergeWG.Done()
@@
-	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
+	defer func() {
+		close(edgeC)
+		mergeWG.Wait()
+	}()
+
+	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
@@
-	}); err != nil {
-		// Ensure that the edge channel is closed to prevent goroutine leaks
-		close(edgeC)
-		return nil, err
-	}
-
-	close(edgeC)
-	mergeWG.Wait()
-
-	return digraph, nil
+	}); err != nil {
+		return nil, err
+	}
+
+	return digraph, nil
```
🧹 Nitpick comments (1)
algo/closeness.go (1)
15-26: Consider extracting the shared BFS accumulation into a helper.

Both functions duplicate the BFSTree traversal and score computation. Hoisting that into a small helper (e.g., `computeCloseness(...) (Weight, bool)`) keeps the serial and parallel paths in lockstep and makes future tweaks to the scoring rule (normalization, guards, etc.) a one-touch change.

Also applies to: 47-62
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (11)
- `algo/closeness.go` (1 hunks)
- `algo/katz.go` (1 hunks)
- `algo/sample.go` (1 hunks)
- `algo/scc.go` (1 hunks)
- `algo/weight.go` (1 hunks)
- `container/defaults.go` (1 hunks)
- `container/digraph.go` (1 hunks)
- `container/ef.go` (1 hunks)
- `container/fetch.go` (1 hunks)
- `container/pacmap.go` (1 hunks)
- `go.mod` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- algo/weight.go
- container/defaults.go
- algo/sample.go
- go.mod
- algo/katz.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not add components with no incident edges as vertices in the component digraph.
Applied to files:
algo/scc.go
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not handle `graph.DirectionBoth` when building the component graph.
Applied to files:
algo/scc.go
🧬 Code graph analysis (5)
algo/scc.go (4)
- container/digraph.go (2): DirectedGraph (11-22), NewDirectedGraph (30-36)
- graph/graph.go (4): Direction (26-26), ID (95-95), DirectionInbound (16-16), DirectionOutbound (17-17)
- cardinality/cardinality.go (1): Duplex (46-58)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)
algo/closeness.go (4)
- container/digraph.go (2): DirectedGraph (11-22), Weight (8-8)
- graph/graph.go (1): Direction (26-26)
- algo/sample.go (1): SampleFunc (11-11)
- algo/weight.go (1): Weight (3-3)
container/digraph.go (3)
- cardinality/cardinality.go (1): Duplex (46-58)
- graph/graph.go (5): ID (95-95), Direction (26-26), DirectionOutbound (17-17), DirectionInbound (16-16), DirectionBoth (18-18)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)
container/pacmap.go (2)
- container/defaults.go (3): DefaultMaxValue (8-8), DefaultNumBuckets (10-10), DefaultNumMaxEntries (9-9)
- container/ef.go (2): EFSet (14-24), CompressToEFSet (150-158)
container/fetch.go (3)
- graph/graph.go (3): Database (367-406), ID (95-95), Transaction (309-347)
- graph/query.go (1): Criteria (70-70)
- container/digraph.go (2): DirectedGraph (11-22), NewDirectedGraph (30-36)
Actionable comments posted: 2
♻️ Duplicate comments (9)
container/ef.go (5)
28-37: Elias–Fano bit-widths must use universe range, not absolute `universeMax`

`entryLength`/`upperBitsLength` should derive from U = (universeMax - universeMin + 1); the current math mis-sizes the structure when universeMin != 0.
Apply:
```diff
 func NewEFSet(universeMin, universeMax uint64, maxValues uint) *EFSet {
-	var (
-		logN            = math.Ceil(math.Log2(float64(maxValues + 2)))
-		entryLength     = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-		upperBitsLength = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-		lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
-		lowerBitsMask   = (uint64(1) << lowerBitsLength) - 1
+	var (
+		urange          = (universeMax - universeMin) + 1
+		nf              = math.Max(float64(maxValues), 1)
+		logN            = math.Ceil(math.Log2(nf + 1))
+		entryLength     = math.Ceil(math.Log2(math.Max(float64(urange), 2)))
+		upperBitsLength = math.Ceil(math.Log2(nf))
+		lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
+		lowerBitsMask   = (uint64(1) << lowerBitsLength) - 1
 		upperBitset     = bitset.New(0)
 		lowerBitset     = bitset.New(0)
 	)
```
52-63: Preallocation must also use universe range; fix upperBitsetLength and sizingUse U = universeMax - universeMin in upperBitsetLength; also preallocate lowerBits for 1‑based indexing (+1) you use when Set/Test.
Apply:
```diff
 func AllocateEFSet(universeMin, universeMax uint64, maxValues uint) *EFSet {
-	var (
-		logN              = math.Ceil(math.Log2(float64(maxValues + 2)))
-		entryLength       = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-		upperBitsLength   = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-		lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
-		lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
-		upperBitsetLength = maxValues + uint(universeMax>>lowerBitsLength) + 2
-		upperBitset       = bitset.New(upperBitsetLength)
-		lowerBitset       = bitset.New(maxValues * lowerBitsLength)
-	)
+	var (
+		u                 = universeMax - universeMin
+		urange            = u + 1
+		nf                = math.Max(float64(maxValues), 1)
+		logN              = math.Ceil(math.Log2(nf + 1))
+		entryLength       = math.Ceil(math.Log2(math.Max(float64(urange), 2)))
+		upperBitsLength   = math.Ceil(math.Log2(nf))
+		lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
+		lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
+		upperBitsetLength = maxValues + uint(u>>lowerBitsLength) + 2
+		upperBitset       = bitset.New(upperBitsetLength)
+		lowerBitset       = bitset.New(maxValues*lowerBitsLength + 1)
+	)
```

Optional: guard oversized allocations to avoid overflow on maxValues*lowerBitsLength.
93-109: Replace nonexistent `BitSet.Select` with safe nth-set-bit selection

`bitset.BitSet` has `NextSet`, not `Select`; the current code won't compile. Apply:

```diff
 func (s *EFSet) Lookup(index uint) uint64 {
 	var (
-		upper             = s.upperBits.Select(index) - index - 1
+		upperPos          = s.selectNthSet(index + 1)
+		upper             = upperPos - (index + 1)
 		lower      uint64 = 0
 		lowerIndex        = index * s.lowerBitsLength
 	)
```

Add this method in the file:
```go
// selectNthSet returns 0-based position of the n-th (1-based) set bit.
func (s *EFSet) selectNthSet(n uint) uint {
	if n == 0 {
		return 0
	}

	pos := uint(0)

	for {
		idx, ok := s.upperBits.NextSet(pos)
		if !ok {
			return pos
		}

		n--
		if n == 0 {
			return idx
		}

		pos = idx + 1
	}
}
```

```bash
#!/bin/bash
# Confirm no remaining .Select calls on BitSet
rg -nP --type=go '\.Select\s*\(' -C2
```
155-160: Fix underflow when EFSet is empty

`highEntry := s.len - 1` underflows when `len == 0`. Apply:

```diff
 func (s *EFSet) Find(target uint64) (uint, bool) {
-	var (
-		lowEntry  uint = 0
-		highEntry      = s.len - 1
-	)
+	if s.len == 0 {
+		return 0, false
+	}
+
+	lowEntry := uint(0)
+	highEntry := s.len - 1
```
176-184: Guard empty input in CompressToEFSet (panics today)

`values[0]` and `values[len(values)-1]` panic when `len(values) == 0`. Apply:

```diff
 func CompressToEFSet(values []uint64) *EFSet {
-	compactedVec := AllocateEFSet(values[0], values[len(values)-1], uint(len(values)))
+	if len(values) == 0 {
+		return AllocateEFSet(0, 0, 0)
+	}
+
+	compactedVec := AllocateEFSet(values[0], values[len(values)-1], uint(len(values)))
```

container/fetch.go (1)
33-63: Always close channel and wait before returning (avoid leak/race on error)

On error you `close(edgeC)` but don't `Wait`; the merge goroutine may still be running. Capture the error, close and wait in all paths, then return. Apply:

```diff
-	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
+	var readErr error
+	if readErr = graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
 		return tx.Relationships().Filter(relationshipFilter).Query(func(results graph.Result) error {
@@
-	}); err != nil {
-		// Ensure that the edge channel is closed to prevent goroutine leaks
-		close(edgeC)
-		return nil, err
-	}
-
-	close(edgeC)
-	mergeWG.Wait()
-
-	return digraph, nil
+	}); readErr != nil {
+		// fall through to close and wait
+	}
+	close(edgeC)
+	mergeWG.Wait()
+	if readErr != nil {
+		return nil, readErr
+	}
+	return digraph, nil
```

container/digraph.go (3)
61-96: BFS uses LIFO stack (DFS semantics) and doesn't mark root visited.

Lines 74-75 pop from the end of `stack`, implementing DFS rather than BFS, which produces non-shortest distances. Additionally, the root isn't marked visited before traversal, allowing it to be enqueued as a terminal. Apply this diff:
```diff
 func (s *directedGraph) BFSTree(nodeID uint64, direction graph.Direction) []ShortestPathTerminal {
 	var (
 		visited   = cardinality.NewBitmap64()
-		stack     []ShortestPathTerminal
+		queue     []ShortestPathTerminal
 		terminals []ShortestPathTerminal
 	)

-	stack = append(stack, ShortestPathTerminal{
+	// Mark root visited and enqueue
+	visited.CheckedAdd(nodeID)
+	queue = append(queue, ShortestPathTerminal{
 		NodeID:   nodeID,
 		Distance: 0,
 	})

-	for len(stack) > 0 {
-		nextCursor := stack[len(stack)-1]
-		stack = stack[:len(stack)-1]
+	for len(queue) > 0 {
+		nextCursor := queue[0]
+		queue = queue[1:]

 		s.EachAdjacent(nextCursor.NodeID, direction, func(adjacentNodeID uint64) bool {
 			if visited.CheckedAdd(adjacentNodeID) {
 				terminalCursor := ShortestPathTerminal{
 					NodeID:   adjacentNodeID,
 					Distance: nextCursor.Distance + 1,
 				}

-				// If not visited, descend into this node next
-				stack = append(stack, terminalCursor)
+				// Enqueue for BFS
+				queue = append(queue, terminalCursor)

 				// This reachable node represents one of the shortest path terminals
 				terminals = append(terminals, terminalCursor)
 			}

 			return true
 		})
 	}

 	return terminals
 }
```
155-157: NumEdges returns node count instead of edge count.

This method incorrectly returns `s.nodes.Cardinality()`, which counts nodes. It should sum the cardinalities of all outbound adjacency sets. Apply this diff:

```diff
 func (s *directedGraph) NumEdges() uint64 {
-	return s.nodes.Cardinality()
+	var edges uint64
+	for _, adj := range s.outbound {
+		edges += adj.Cardinality()
+	}
+	return edges
 }
```
224-249: Compilation error: graph.ID is uint64, not a struct with a `.Uint64()` method.

Lines 226-227 call `.Uint64()` on `graph.ID`, but `graph.ID` is defined as `type ID uint64`, so this will not compile. Additionally, prefer `CheckedAdd` for consistency with the Duplex interface, and add nodes individually rather than using a variadic call. Apply this diff:

```diff
 func (s *directedGraph) AddEdge(start, end graph.ID) {
 	var (
-		startUint64 = start.Uint64()
-		endUint64   = end.Uint64()
+		startUint64 = uint64(start)
+		endUint64   = uint64(end)
 	)

 	if edgeBitmap, exists := s.outbound[startUint64]; exists {
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)

 		s.outbound[startUint64] = edgeBitmap
 	}

 	if edgeBitmap, exists := s.inbound[endUint64]; exists {
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)

 		s.inbound[endUint64] = edgeBitmap
 	}

-	s.nodes.Add(startUint64, endUint64)
+	s.nodes.CheckedAdd(startUint64)
+	s.nodes.CheckedAdd(endUint64)
 }
```
🧹 Nitpick comments (7)
container/ef.go (1)
61-62: Avoid hidden reallocations of lowerBits due to 1-based indexing

You write/read `lowerBits` at indices offset by +1; preallocate `maxValues*lowerBitsLength + 1` to prevent growth.
container/pacmap_test.go (1)
10-25: LGTM

Covers the Put → Compact → Get happy path. Consider adding a pre-Compact Get case and a duplicate-key overwrite case in a follow-up.
algo/scc.go (1)
130-134: Intentional choice: include all components as vertices

Noting this differs from an alternative design that omits isolated components; if intentional, consider a brief comment for future readers.
container/pacmap.go (4)
121-137: Compact() with no existing keys: consider dedup to enforce "last write wins"

The current path stores duplicates as-is; `EFSet.Find` will return one rank arbitrarily. Recommend sorting (already done) and coalescing duplicates so the final value reflects the last pending write.
Apply:
```diff
 	for idx, nextKVPair := range s.pending {
-		s.keys.Append(nextKVPair.key)
-		s.values[idx] = nextKVPair.value
+		if idx > 0 && nextKVPair.key == s.pending[idx-1].key {
+			s.values[len(s.values)-1] = nextKVPair.value
+		} else {
+			s.keys.MustAppend(nextKVPair.key)
+			s.values[idx] = nextKVPair.value
+		}
 	}
```
38-48: Optional: support lookups before first compaction

`Find` returns false when `s.keys` is nil. If desired, also scan `s.pending` for the key.
161-185: CompactParallel: guard workers <= 0

Avoid starting zero goroutines (which deadlocks on send).
Apply:
```diff
 func (s *PackedMap) CompactParallel(workers int) {
+	if workers <= 0 {
+		workers = 1
+	}
```
206-212: Get allocates buckets on miss

`getBucket` creates a bucket on reads; consider a non-allocating fast path for `Get`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (12)
- `algo/closeness.go` (1 hunks)
- `algo/katz.go` (1 hunks)
- `algo/sample.go` (1 hunks)
- `algo/scc.go` (1 hunks)
- `algo/weight.go` (1 hunks)
- `container/defaults.go` (1 hunks)
- `container/digraph.go` (1 hunks)
- `container/ef.go` (1 hunks)
- `container/fetch.go` (1 hunks)
- `container/pacmap.go` (1 hunks)
- `container/pacmap_test.go` (1 hunks)
- `go.mod` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- algo/weight.go
- algo/katz.go
- algo/closeness.go
- go.mod
- container/defaults.go
- algo/sample.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-06T17:49:51.771Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.771Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not add components with no incident edges as vertices in the component digraph.
Applied to files:
algo/scc.go
📚 Learning: 2025-10-06T17:49:51.771Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.771Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not handle `graph.DirectionBoth` when building the component graph.
Applied to files:
algo/scc.go
🧬 Code graph analysis (6)
container/pacmap_test.go (1)
- container/pacmap.go (2): NewPacMap (148-153), DefaultParameters (15-21)
algo/scc.go (4)
- container/digraph.go (2): DirectedGraph (11-22), NewDirectedGraph (30-36)
- graph/graph.go (4): Direction (26-26), ID (95-95), DirectionInbound (16-16), DirectionOutbound (17-17)
- cardinality/cardinality.go (1): Duplex (46-58)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)
container/digraph.go (3)
- cardinality/cardinality.go (1): Duplex (46-58)
- graph/graph.go (5): ID (95-95), Direction (26-26), DirectionOutbound (17-17), DirectionInbound (16-16), DirectionBoth (18-18)
- cardinality/roaring64.go (1): NewBitmap64 (23-27)
container/fetch.go (2)
- graph/graph.go (3): Database (367-406), ID (95-95), Transaction (309-347)
- container/digraph.go (2): DirectedGraph (11-22), NewDirectedGraph (30-36)
container/pacmap.go (2)
- container/defaults.go (3): DefaultMaxValue (8-8), DefaultNumBuckets (10-10), DefaultNumMaxEntries (9-9)
- container/ef.go (2): EFSet (16-26), NewEFSet (28-50)
container/ef.go (1)
- cardinality/cardinality.go (1): Iterator (39-42)
🔇 Additional comments (7)
container/digraph.go (7)
8-9: LGTM! Type aliases provide clear semantic meaning for weights and adjacency structures.

11-22: LGTM! The interface provides a clean, comprehensive API for directed graph operations.

24-36: LGTM! Constructor properly initializes the graph structure with empty adjacency maps and node bitmap.

38-41: LGTM! Structure clearly represents a terminal node with its shortest-path distance.

43-59: LGTM! The method now correctly respects the direction parameter by using `getAdjacent` for each node, addressing the previous concern.

98-149: LGTM! The normalization logic correctly reindexes nodes into a dense range and remaps all edges, returning the reverse mapping for translation.

159-222: LGTM! Helper methods correctly handle direction-based adjacency queries and iteration. The `getAdjacent` method properly merges inbound/outbound sets for bidirectional queries.
Add approximate Katz and Closeness centrality implementations. Add an implementation of Tarjan's SCC algorithm.
Summary by CodeRabbit
New Features
Performance
Tests
Style
Chores