
Datastructure and Algorithm WIP #15

Merged
zinic merged 2 commits into main from assoc-exp
Oct 7, 2025

Conversation

Contributor

@zinic zinic commented Sep 25, 2025

Summary by CodeRabbit

  • New Features

    • Added directed-graph analytics: closeness (single-threaded & parallel), Katz centrality, strongly connected components and a component-graph view.
    • Added node sampling utilities and graph construction/normalization/fetching from a database.
    • Introduced space-efficient data structures for sets and packed maps.
  • Performance

    • Parallelized closeness computation and parallel compaction for packed maps.
  • Tests

    • Added tests exercising the packed map.
  • Style

    • Minor whitespace cleanup in query translation.
  • Chores

    • Dependency updates (RoaringBitmap, bitset).


coderabbitai bot commented Sep 25, 2025

Walkthrough

Adds a directed-graph implementation and related container primitives (EFSet, PackedMap, defaults), graph ingestion from a DB, sampling utilities, SCC/component graph construction, Katz and closeness-like centralities (serial and parallel), and dependency updates.
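For orientation, the power-iteration scheme behind the Katz addition can be sketched in a few lines. The adjacency representation, parameter values, and function names below are illustrative assumptions, not the package's actual API:

```go
package main

import (
	"fmt"
	"math"
)

// katz runs a fixed number of power-iteration steps on a small adjacency
// list: score(v) = beta + alpha * sum(score(u) for each in-neighbor u of v).
func katz(inEdges map[uint64][]uint64, nodes []uint64, alpha, beta float64, iters int) map[uint64]float64 {
	scores := make(map[uint64]float64, len(nodes))
	for _, n := range nodes {
		scores[n] = beta
	}
	for i := 0; i < iters; i++ {
		next := make(map[uint64]float64, len(nodes))
		for _, n := range nodes {
			sum := 0.0
			for _, u := range inEdges[n] {
				sum += scores[u]
			}
			next[n] = beta + alpha*sum
		}
		scores = next
	}
	return scores
}

func main() {
	// Edges 1 -> 3 and 2 -> 3: node 3 should score highest.
	inEdges := map[uint64][]uint64{3: {1, 2}}
	scores := katz(inEdges, []uint64{1, 2, 3}, 0.1, 1.0, 20)
	fmt.Println(scores[3] > scores[1], math.Abs(scores[1]-scores[2]) < 1e-12) // prints: true true
}
```

For convergence, alpha must stay below the reciprocal of the graph's largest eigenvalue; the fixed iteration count here stands in for a real convergence check.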

Changes

  • Algo: centrality, sampling, SCC (algo/closeness.go, algo/katz.go, algo/sample.go, algo/scc.go, algo/weight.go): New exported algorithms: closeness-like scores for directed unweighted graphs (serial and parallel), Katz centrality (power iteration), sampling helpers (SampleHighestDegrees, SampleRandom, SampleFunc), SCC computation and component-graph builder, and a Weight alias.
  • Container: directed graph core + fetch (container/digraph.go, container/fetch.go): New DirectedGraph implementation and API (adjacency, degrees, BFSTree, Normalize, Dimensions, etc.) and FetchDirectedGraph to stream edges from a graph DB into a DirectedGraph via a producer/consumer pattern.
  • Container: compact sets & maps (container/ef.go, container/pacmap.go, container/pacmap_test.go, container/defaults.go): Adds EFSet compressed set with iterator and helpers; PackedMap (bucketed, batched append, compaction with parallel workers) and tests; default container sizing constants.
  • SQL translation, formatting only (cypher/models/pgsql/translate/projection.go): Whitespace reformat of variable declarations; no semantic change.
  • Module dependencies (go.mod): Upgrades github.com/RoaringBitmap/roaring/v2; adds the github.com/bits-and-blooms/bitset dependency and adjusts indirects.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Algo as algo.ClosenessParallel
  participant DG as container.DirectedGraph
  participant Worker as Worker Goroutine
  Caller->>Algo: ClosenessForDirectedUnweightedGraphParallel(digraph, direction, sampleFunc, n)
  Algo->>DG: sampleIDs := sampleFunc(digraph, n)
  Algo->>Worker: spawn NumCPU workers
  loop enqueue sampled nodes
    Algo->>Worker: send nodeID on jobs chan
  end
  par workers consume jobs
    Worker->>DG: BFSTree(nodeID, direction)
    Worker->>Worker: accumulate distanceSum
    alt distanceSum > 0
      Worker->>Algo: mutex-protected update scores[nodeID]
    else
      Worker-->>Worker: skip
    end
  and
    Worker-->>Algo: worker done
  end
  Algo-->>Caller: return scores map
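The fan-out in the diagram above can be sketched as a plain worker pool. Here bfsDistanceSum and the 1/distanceSum scoring are placeholders for BFSTree and the real closeness math, not the package's implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// bfsDistanceSum stands in for a BFSTree traversal; it is deterministic so
// the example has a checkable result.
func bfsDistanceSum(nodeID uint64) uint64 { return nodeID * 2 }

// scoreParallel fans sampled node IDs out to NumCPU workers and collects
// per-node scores under a mutex, mirroring the flow in the diagram.
func scoreParallel(sampleIDs []uint64) map[uint64]float64 {
	var (
		jobs   = make(chan uint64)
		scores = make(map[uint64]float64, len(sampleIDs))
		mu     sync.Mutex
		wg     sync.WaitGroup
	)
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for nodeID := range jobs {
				if distanceSum := bfsDistanceSum(nodeID); distanceSum > 0 {
					mu.Lock()
					scores[nodeID] = 1.0 / float64(distanceSum)
					mu.Unlock()
				}
			}
		}()
	}
	for _, id := range sampleIDs {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return scores
}

func main() {
	fmt.Println(scoreParallel([]uint64{1, 2, 4})[4]) // 0.125
}
```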
sequenceDiagram
  autonumber
  actor Client
  participant Fetch as container.FetchDirectedGraph
  participant DB as graph.Database
  participant Tx as ReadTx
  participant Ch as edge chan
  participant DG as DirectedGraph

  Client->>Fetch: FetchDirectedGraph(ctx, db, filter)
  Fetch->>DG: dg := NewDirectedGraph()
  Fetch->>Ch: start consumer goroutine (reads chan, dg.AddEdge)
  Fetch->>DB: begin read transaction
  DB->>Tx: tx result set
  loop each row
    Tx-->>Fetch: scan startID,endID
    Fetch->>Ch: enqueue (startID,endID)
  end
  Fetch->>Ch: close channel
  Fetch->>DG: wait consumer to finish
  Fetch-->>Client: return dg or error
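The producer/consumer pattern in the fetch diagram reduces to a channel, a consumer goroutine, and a close-then-wait shutdown. This is a minimal sketch with a plain slice standing in for the transaction's result set and an adjacency map standing in for DirectedGraph:

```go
package main

import (
	"fmt"
	"sync"
)

type edge struct{ start, end uint64 }

// fetchGraph mimics the FetchDirectedGraph flow: a consumer goroutine
// drains the edge channel into an adjacency map while the producer emits
// rows. The channel is always closed and the consumer joined before
// returning, so no goroutine outlives the call.
func fetchGraph(rows []edge) map[uint64][]uint64 {
	var (
		edgeC = make(chan edge, len(rows))
		adj   = map[uint64][]uint64{}
		wg    sync.WaitGroup
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		for e := range edgeC {
			adj[e.start] = append(adj[e.start], e.end)
		}
	}()
	for _, row := range rows {
		edgeC <- row
	}
	close(edgeC)
	wg.Wait()
	return adj
}

func main() {
	adj := fetchGraph([]edge{{1, 2}, {1, 3}})
	fmt.Println(len(adj[1])) // 2
}
```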
sequenceDiagram
  autonumber
  actor Caller
  participant SCC as algo.StronglyConnectedComponents
  participant DG as container.DirectedGraph
  participant CG as ComponentDirectedGraph

  Caller->>SCC: StronglyConnectedComponents(digraph, direction)
  SCC->>DG: non-recursive Tarjan-like traversal (stack, on-stack bitmap)
  SCC-->>Caller: components bitmaps, node->component map
  Caller->>CG: NewComponentDirectedGraph(digraph, direction)
  CG->>SCC: reuse components mapping
  CG->>DG: add edges between component nodes based on original edges
  CG-->>Caller: ComponentDirectedGraph

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~80 minutes

Poem

I thump through nodes and edges tonight,
I hop on SCCs till dawn's first light.
Katz hums softly, closeness speeds the race,
EF sets nestle in a tidy place.
Packed maps crunch—my whiskers twitch with glee. 🐇🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 5.56%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title Check (❓ Inconclusive): The title "Datastructure and Algorithm WIP" is overly generic and marked as work-in-progress, conveying none of the specific changes such as the new graph algorithms and container data structures. Update the title to summarize the main changes, for example "Add directed graph algorithms and container data structures".
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

🧹 Nitpick comments (13)
cypher/models/pgsql/translate/projection.go (1)

458-482: Potential index mismatch when deriving GROUP BY from built projections.
Using projection[i] assumes a 1:1 mapping with currentPart.projections.Items. buildExternalProjection can return multiple SelectItems per logical item in the future, which would desynchronize indexes and produce incorrect GROUP BY columns.

Consider deriving GROUP BY from the finalized projection slice directly, filtering out aggregates, instead of index-based lookup. Example approach (illustrative):

// After building 'projection' and detecting hasAggregates:
if hasAggregates {
    nonAggregateExprs = nonAggregateExprs[:0]
    for _, projExpr := range projection {
        base := projExpr
        if aliased, ok := projExpr.(*pgsql.AliasedExpression); ok {
            base = aliased.Expression
        }
        // Skip aggregate function calls
        if fc, ok := unwrapParenthetical(base).(pgsql.FunctionCall); ok {
            if syms, err := GetAggregatedFunctionParameterSymbols(fc); err != nil {
                return err
            } else if !syms.IsEmpty() {
                continue
            }
        }
        nonAggregateExprs = append(nonAggregateExprs, base)
    }
    singlePartQuerySelect.GroupBy = nonAggregateExprs
}
go.mod (1)

7-7: Two xxhash libraries in the dependency graph; prefer one to reduce footprint.
You require both github.com/OneOfOne/xxhash and github.com/cespare/xxhash/v2. Unless both are needed for API compatibility, consider standardizing on one.

If both remain, document rationale (performance, existing call sites).

container/defaults.go (1)

6-11: Defaults are extremely large; guard against accidental massive allocations.
DefaultNumMaxEntries=8e9 and DefaultNumBuckets=131072 can lead to outsized preallocations if used directly.

  • Reduce defaults to sane dev values (e.g., 1e6 entries) and require explicit overrides for larger deployments.
  • Ensure constructors interpret these as logical limits, not immediate allocation sizes. Add doc comments to clarify semantics.
- DefaultNumMaxEntries = 8_000_000_000
- DefaultNumBuckets    = 1024 * 128
+ // Tuned for dev; override in prod explicitly
+ DefaultNumMaxEntries = 1_000_000
+ DefaultNumBuckets    = 1 << 14 // 16384
container/ef.go (3)

93-111: Bit indexing mixes 1-based offsets; standardize to 0-based to avoid off-by-ones.
The +1 adjustments in Set/Test increase cognitive load and risk errors.

- s.upperBits.Set(upper + s.len + 1)
+ s.upperBits.Set(upper + s.len)
...
- s.lowerBits.SetTo(lowerOffset+bitIdx+1, lowerBitValue > 0)
+ s.lowerBits.SetTo(lowerOffset+bitIdx, lowerBitValue > 0)

If you keep 1-based layout for EF math, document it prominently.


112-120: Append order check allows duplicates; confirm intended.
value < s.last permits equal values; EFSet typically stores non-decreasing sequences. If strict increasing is required, change to <=.

- if value < s.last {
+ if value <= s.last {
     return ErrUnorderedAppend
 }

180-200: Iterator lacks termination signaling; consider ok return.
Next() returns a value even after exhaustion; typical iterators return (val, ok). Without a size check, callers can read garbage after end.

-type EFSetIterator struct {
+type EFSetIterator struct {
     vec      *EFSet
     upperIdx uint
     lowerIdx uint
 }
...
-func (s *EFSetIterator) Next() uint64 {
+func (s *EFSetIterator) Next() (uint64, bool) {
+    if s.lowerIdx >= s.vec.len {
+        return 0, false
+    }
     ...
-    return value
+    return value, true
}
container/pacmap.go (1)

93-100: Buffer compaction jobs to reduce sender blocking.

Use a buffered channel sized to the number of allocated buckets.

Apply this diff:

 func (s *PackedMap) CompactParallel(workers int) {
 	var (
-		jobC = make(chan *packedBucket)
+		jobC = make(chan *packedBucket, len(s.allocatedBuckets))
 		wg   = &sync.WaitGroup{}
 	)
algo/sample.go (1)

3-10: Remove fmt import after eliminating debug print.

Apply this diff:

 import (
-	"fmt"
 	"math/rand"
 	"sort"
algo/closeness.go (1)

37-38: Buffer the work channel to avoid sender blocking.

A buffer of nSamples reduces contention and tail latency.

Apply this diff:

-		nodeC      = make(chan uint64)
+		nodeC      = make(chan uint64, nSamples)

Confirm that BFSTree is safe for concurrent reads on the chosen DirectedGraph implementation.

container/digraph.go (1)

191-197: Expose read-only node iteration instead of returning the mutable set.

Returning the internal Duplex allows external mutation of graph nodes, breaking invariants. Prefer a read-only view or only iteration APIs.

Consider removing Nodes() from the interface or returning an immutable wrapper; rely on EachNode for traversal.
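For illustration, an iteration-only interface is one way to expose nodes without handing out the mutable set. The type names here are illustrative stand-ins, not the repo's actual Duplex/DirectedGraph types:

```go
package main

import "fmt"

// NodeView exposes iteration and cardinality but no mutating methods;
// returning this instead of the underlying set keeps graph invariants safe.
type NodeView interface {
	Each(func(nodeID uint64) bool)
	Cardinality() uint64
}

// nodeSet is a toy backing store implementing NodeView.
type nodeSet map[uint64]struct{}

func (s nodeSet) Each(f func(uint64) bool) {
	for id := range s {
		if !f(id) {
			return
		}
	}
}

func (s nodeSet) Cardinality() uint64 { return uint64(len(s)) }

func main() {
	var view NodeView = nodeSet{1: {}, 2: {}, 3: {}}
	fmt.Println(view.Cardinality()) // 3
}
```

Callers can still enumerate and count, but have no handle through which to add or remove nodes.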

algo/scc.go (3)

18-30: Use int for allocation hints (portability) and avoid negative surprises.

initialAlloc is int64; make cap args generally use int. Cast to int to avoid implicit conversions and potential overflow on 32-bit.

Apply this diff:

-		initialAlloc                = int64(math.Sqrt(float64(numNodes)))
+		initialAlloc                = int(math.Sqrt(float64(numNodes)))

And ensure subsequent make calls use initialAlloc (type int).


124-129: Minor: consider documenting direction semantics for SCC construction.

SCC sets are invariant to edge reversal, but the component graph edges depend on traversal direction; a short comment would help maintainers.


130-147: Avoid redundant component edges (optional).

If multiple original edges induce the same component edge, you may want to de-duplicate by checking existence before AddEdge or by using a per-component bitmap to track emitted edges.
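A seen-set over component pairs is enough to emit each component edge once. The nodeToComponent map and edge-list shape below are illustrative, not the repo's types:

```go
package main

import "fmt"

// addComponentEdges maps each original edge's endpoints to their components,
// skips intra-component edges, and uses a seen-set so every distinct
// component edge is emitted exactly once.
func addComponentEdges(edges [][2]uint64, nodeToComponent map[uint64]uint64) [][2]uint64 {
	var (
		seen      = map[[2]uint64]struct{}{}
		compEdges [][2]uint64
	)
	for _, e := range edges {
		compEdge := [2]uint64{nodeToComponent[e[0]], nodeToComponent[e[1]]}
		if compEdge[0] == compEdge[1] {
			continue // intra-component edge contributes nothing
		}
		if _, dup := seen[compEdge]; !dup {
			seen[compEdge] = struct{}{}
			compEdges = append(compEdges, compEdge)
		}
	}
	return compEdges
}

func main() {
	// Nodes 1,2 form component 0; node 3 is component 1. Two parallel
	// original edges into node 3 collapse to one component edge.
	edges := [][2]uint64{{1, 3}, {2, 3}, {1, 2}}
	comps := map[uint64]uint64{1: 0, 2: 0, 3: 1}
	fmt.Println(len(addComponentEdges(edges, comps))) // 1
}
```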

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e3947c and 140af5b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • algo/closeness.go (1 hunks)
  • algo/katz.go (1 hunks)
  • algo/sample.go (1 hunks)
  • algo/scc.go (1 hunks)
  • algo/weight.go (1 hunks)
  • container/defaults.go (1 hunks)
  • container/digraph.go (1 hunks)
  • container/ef.go (1 hunks)
  • container/fetch.go (1 hunks)
  • container/pacmap.go (1 hunks)
  • cypher/models/pgsql/translate/projection.go (1 hunks)
  • go.mod (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-14T01:22:17.837Z
Learnt from: zinic
PR: SpecterOps/DAWGS#10
File: database/v1compat/traversal/traversal.go:227-227
Timestamp: 2025-08-14T01:22:17.837Z
Learning: In database/v1compat/traversal/traversal.go, the fetchDirection variable should remain consistent during expansion. The fetchDirection represents the query perspective for retrieving relationships and must be the same for both current and next expansion steps to maintain proper traversal semantics.

Applied to files:

  • container/fetch.go
🧬 Code graph analysis (7)
algo/sample.go (2)
container/digraph.go (1)
  • DirectedGraph (11-21)
graph/graph.go (1)
  • Direction (26-26)
algo/katz.go (3)
container/digraph.go (2)
  • DirectedGraph (11-21)
  • Weight (8-8)
algo/weight.go (1)
  • Weight (3-3)
graph/graph.go (1)
  • DirectionBoth (18-18)
container/fetch.go (3)
graph/graph.go (3)
  • Database (367-406)
  • ID (95-95)
  • Transaction (309-347)
graph/query.go (1)
  • Criteria (70-70)
container/digraph.go (2)
  • DirectedGraph (11-21)
  • NewDirectedGraph (29-35)
algo/scc.go (4)
container/digraph.go (2)
  • DirectedGraph (11-21)
  • NewDirectedGraph (29-35)
graph/graph.go (4)
  • Direction (26-26)
  • ID (95-95)
  • DirectionInbound (16-16)
  • DirectionOutbound (17-17)
cardinality/cardinality.go (1)
  • Duplex (46-58)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
container/pacmap.go (2)
container/defaults.go (4)
  • DefaultMaxValue (8-8)
  • DefaultNumBuckets (10-10)
  • DefaultNumMaxEntries (9-9)
  • EntrySizeBytes (7-7)
container/ef.go (2)
  • EFSet (14-24)
  • CompressToEFSet (150-158)
algo/closeness.go (4)
container/digraph.go (2)
  • DirectedGraph (11-21)
  • Weight (8-8)
graph/graph.go (1)
  • Direction (26-26)
algo/sample.go (1)
  • SampleFunc (12-12)
algo/weight.go (1)
  • Weight (3-3)
container/digraph.go (4)
cardinality/cardinality.go (1)
  • Duplex (46-58)
graph/graph.go (5)
  • ID (95-95)
  • Direction (26-26)
  • DirectionOutbound (17-17)
  • DirectionInbound (16-16)
  • DirectionBoth (18-18)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
algo/weight.go (1)
  • Weight (3-3)
🔇 Additional comments (8)
algo/weight.go (1)

3-3: LGTM: shared Weight alias is clear and appropriate.
Consistent type aliasing will simplify API signatures across algo packages.

cypher/models/pgsql/translate/projection.go (1)

442-444: Nit: variable alignment only; no behavior change.
No concerns with the formatting tweak.

go.mod (3)

8-8: Confirm compatibility with RoaringBitmap v2.10.0
No direct usage of any changed generics or serialization helpers detected in roaring.New() or .Each(…); run the full test suite and verify downstream code compiles and serializes correctly.


11-11: No changes needed: github.com/bits-and-blooms/bitset v1.24.0 is a valid release tag and includes the BitSet.Select method.


3-3: Verify CI Go version
No CI config files detected; ensure your CI/build toolchain uses Go ≥1.23.0 to avoid module or stdlib mismatches.

container/ef.go (1)

172-179: Verify NextSet’s signature and handle its bool return
advanceHigherBitsPosition discards the second return from NextSet; ensure the bitset you’re using exposes NextSet(uint) (uint, bool) and explicitly check the bool flag to catch missing set bits.
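The (value, ok) pattern being asked for looks like the following sketch; nextSet here is a local word-slice stand-in for the bitset library's method, defined only so the example is self-contained:

```go
package main

import "fmt"

// nextSet finds the first set bit at or after pos in a little word-backed
// bitset, returning (index, true) or (0, false) when the bits run out --
// the same (uint, bool) shape the library's NextSet uses.
func nextSet(words []uint64, pos uint) (uint, bool) {
	for i := pos; i < uint(len(words))*64; i++ {
		if words[i/64]&(1<<(i%64)) != 0 {
			return i, true
		}
	}
	return 0, false
}

func main() {
	words := []uint64{0b1010} // bits 1 and 3 set
	if idx, ok := nextSet(words, 2); ok {
		fmt.Println(idx) // 3
	} else {
		fmt.Println("no set bit found") // checking ok avoids treating 0 as a hit
	}
}
```

Discarding the bool makes an exhausted search indistinguishable from a set bit at position 0, which is exactly the failure mode the comment above warns about.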

algo/scc.go (1)

11-116: Iterative Tarjan adaptation is non-trivial; please verify correctness with tests.

Edge cases: multi-branch back-edges, deep nesting, and multiple roots. Recommend tests for:

  • Single node w/ and w/o self-loop
  • Two cycles connected by one-direction edge
  • Tree (no cycles)
  • Random DAG

I can generate Go unit tests covering these scenarios if helpful.
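As a sketch of what such tests could check, a small recursive Tarjan serves as a reference oracle for the iterative version. Everything here (sccCount, the adjacency map shape) is illustrative, not the repo's API:

```go
package main

import "fmt"

// sccCount runs a textbook recursive Tarjan over an adjacency map and
// returns the number of strongly connected components.
func sccCount(adj map[int][]int, nodes []int) int {
	var (
		index    = map[int]int{}
		lowlink  = map[int]int{}
		onStack  = map[int]bool{}
		stack    []int
		counter  int
		numSCCs  int
		strongly func(v int)
	)
	strongly = func(v int) {
		index[v], lowlink[v] = counter, counter
		counter++
		stack = append(stack, v)
		onStack[v] = true
		for _, w := range adj[v] {
			if _, seen := index[w]; !seen {
				strongly(w)
				if lowlink[w] < lowlink[v] {
					lowlink[v] = lowlink[w]
				}
			} else if onStack[w] && index[w] < lowlink[v] {
				lowlink[v] = index[w]
			}
		}
		if lowlink[v] == index[v] { // v is a component root; pop its members
			numSCCs++
			for {
				w := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[w] = false
				if w == v {
					break
				}
			}
		}
	}
	for _, n := range nodes {
		if _, seen := index[n]; !seen {
			strongly(n)
		}
	}
	return numSCCs
}

func main() {
	// Single node with a self-loop is one component.
	fmt.Println(sccCount(map[int][]int{1: {1}}, []int{1})) // 1
	// Two 2-cycles joined by a one-way edge: two components.
	twoCycles := map[int][]int{1: {2}, 2: {1, 3}, 3: {4}, 4: {3}}
	fmt.Println(sccCount(twoCycles, []int{1, 2, 3, 4})) // 2
	// A tree has one component per node.
	fmt.Println(sccCount(map[int][]int{1: {2, 3}}, []int{1, 2, 3})) // 3
}
```

Comparing the iterative implementation's component count and membership against an oracle like this on the cases listed above would catch most low-link bookkeeping mistakes.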

container/digraph.go (1)

152-170: Or method is part of the Provider interface, so Clone().Or(...) is valid and needs no fallback.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (8)
container/pacmap.go (2)

33-42: Restore pre-compaction lookups and guard nil state

Find returns nothing before the first compaction because s.keys is nil, so we silently miss every append. We also risk panicking if s.keys.Find returns a rank larger than the current s.values length. Please short-circuit when s.keys is nil and fall back to scanning pendingKeys (while reading the matching value) so buckets behave correctly prior to compaction. The previous review highlighted this already.

 func (s *packedBucket) Find(key uint64) (uint64, bool) {
-	if s.keys != nil {
-		keyRank, found := s.keys.Find(key)
-
-		if found {
-			return s.values[keyRank], true
-		}
-	}
-
-	return 0, false
+	if s.keys != nil {
+		if keyRank, found := s.keys.Find(key); found && int(keyRank) < len(s.values) {
+			return s.values[keyRank], true
+		}
+	}
+	for i := range s.pendingKeys {
+		if s.pendingKeys[i] == key {
+			return s.pendingValueAt(i), true
+		}
+	}
+	return 0, false
 }

Note: pendingValueAt (or equivalent) needs to read the value aligned with the pending key; if you keep sharing s.values for both settled and pending entries, ensure you index the correct tail segment instead of overwriting historic slots.


50-68: Compact currently corrupts data (mutating comparator + dropping settled keys)

Two independent bugs here:

  1. The sort.Slice comparator swaps s.values entries. less must be pure; mutating within it violates the contract and produces undefined ordering.
  2. CompressToEFSet(s.pendingKeys) rebuilds the index from only the pending keys. Any keys that were previously compacted disappear, and ranks no longer line up with s.values (e.g., the first re-compaction returns the wrong value for the newest key and forgets earlier ones).

You need to merge existing keys/values with the pending batch, sort using paired key/value structs (or parallel slices manipulated outside the comparator), and deduplicate so EFSet receives a strictly increasing sequence. Please rework Compact accordingly; the prior review already requested this.

 func (s *packedBucket) Compact() {
 	if len(s.pendingKeys) == 0 {
 		return
 	}
 
-	// Sort by key and swap value positions so the two remain associated by their position
-	sort.Slice(s.pendingKeys, func(i, j int) bool {
-		if s.pendingKeys[i] < s.pendingKeys[j] {
-			s.values[i], s.values[j] = s.values[j], s.values[i]
-			return true
-		}
-
-		return false
-	})
-
-	s.keys = CompressToEFSet(s.pendingKeys)
-	s.pendingKeys = nil
-	s.values = s.values[0:len(s.values):len(s.values)]
+	type kv struct {
+		k uint64
+		v uint64
+	}
+
+	combined := make([]kv, 0, s.keyCount()+len(s.pendingKeys))
+	combined = append(combined, compactExisting(s.keys, s.values)...)
+	for _, pending := range pendingPairs(s.pendingKeys, s.values) {
+		combined = append(combined, pending)
+	}
+
+	sort.Slice(combined, func(i, j int) bool { return combined[i].k < combined[j].k })
+
+	keys := make([]uint64, 0, len(combined))
+	vals := make([]uint64, 0, len(combined))
+	for _, pair := range dedupeKV(combined) {
+		keys = append(keys, pair.k)
+		vals = append(vals, pair.v)
+	}
+
+	s.keys = CompressToEFSet(keys)
+	s.values = vals
+	s.pendingKeys = s.pendingKeys[:0]
 }

Fill in keyCount, compactExisting, pendingPairs, and dedupeKV (or inline their logic) so that previously compacted entries survive and values stay aligned.

container/fetch.go (1)

33-57: Still need to join the merge goroutine before returning

When ReadTransaction fails we now close edgeC, but we still return before mergeWG.Wait(). The goroutine draining edgeC can keep running (and still be mutating digraph) after the function exits, which is the original leak/race reported earlier. Please capture the transaction error, always close the channel and wait for the goroutine to finish, then return the error if set.

-	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
+	var readErr error
+	if readErr = graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
 		return tx.Relationships().Filter(relationshipFilter).Query(func(results graph.Result) error {
 			…
-	}); err != nil {
-		// Ensure that the edge channel is closed to prevent goroutine leaks
-		close(edgeC)
-		return nil, err
-	}
-
-	close(edgeC)
-	mergeWG.Wait()
-
-	return digraph, nil
+	}); readErr != nil {
+		// fall through so we still close + wait before returning the error
+	}
+	close(edgeC)
+	mergeWG.Wait()
+	if readErr != nil {
+		return nil, readErr
+	}
+	return digraph, nil
algo/scc.go (1)

4-79: min relies on the Go 1.21+ built-in

Lines 56 and 78 call min. Go has shipped a built-in min for ordered types since 1.21, so this compiles on the module's Go 1.23 toolchain; only if older toolchains must be supported should you introduce a small integer imin helper and use it in both places:

 import (
 	"math"
@@
 )
 
+func imin(a, b int) int {
+	if a < b {
+		return a
+	}
+	return b
+}
+
@@
-				lowLinks[nextCursor.id] = min(lowLinks[nextCursor.id], lowLinks[lastSearchedNodeID])
+				lowLinks[nextCursor.id] = imin(lowLinks[nextCursor.id], lowLinks[lastSearchedNodeID])
@@
-					lowLinks[nextCursor.id] = min(lowLinks[nextCursor.id], visitedIndex[nextBranchID])
+					lowLinks[nextCursor.id] = imin(lowLinks[nextCursor.id], visitedIndex[nextBranchID])
container/digraph.go (3)

55-89: BFSTree is still depth-first and yields wrong distances

The traversal keeps a LIFO stack, so it’s performing DFS. That can overestimate shortest-path distances and revisit nodes out of order. Please switch to a FIFO queue and mark the root as visited before enqueuing, as suggested earlier.

 func (s *directedGraph) BFSTree(nodeID uint64, direction graph.Direction) []ShortestPathTerminal {
 	var (
 		visited   = cardinality.NewBitmap64()
-		stack     []ShortestPathTerminal
+		queue     []ShortestPathTerminal
 		terminals []ShortestPathTerminal
 	)
 
-	stack = append(stack, ShortestPathTerminal{
-		NodeID:   nodeID,
-		Distance: 0,
-	})
+	visited.CheckedAdd(nodeID)
+	queue = append(queue, ShortestPathTerminal{NodeID: nodeID, Distance: 0})
 
-	for len(stack) > 0 {
-		nextCursor := stack[len(stack)-1]
-		stack = stack[:len(stack)-1]
+	for len(queue) > 0 {
+		nextCursor := queue[0]
+		queue = queue[1:]
 
 		s.EachAdjacent(nextCursor.NodeID, direction, func(adjacentNodeID uint64) bool {
 			if visited.CheckedAdd(adjacentNodeID) {
 				terminalCursor := ShortestPathTerminal{
 					NodeID:   adjacentNodeID,
 					Distance: nextCursor.Distance + 1,
 				}
 
-				// If not visited, descend into this node next
-				stack = append(stack, terminalCursor)
+				queue = append(queue, terminalCursor)
 
 				// This reachable node represents one of the shortest path terminals
 				terminals = append(terminals, terminalCursor)
 			}
 
 			return true
 		})
 	}
 
 	return terminals
 }

149-151: NumEdges returns node count, not edge count

This should sum the outgoing adjacency sizes; returning the node count breaks any consumers expecting actual edge totals.

 func (s *directedGraph) NumEdges() uint64 {
-	return s.nodes.Cardinality()
+	var edges uint64
+	for _, adjacent := range s.outbound {
+		edges += adjacent.Cardinality()
+	}
+	return edges
 }

218-242: AddEdge still calls non-existent ID.Uint64() and skips CheckedAdd

graph.ID is a uint64 type; it has no Uint64() method, so this will not compile. Additionally, use CheckedAdd to stay within the Duplex interface contract when inserting into the bitmaps.

 func (s *directedGraph) AddEdge(start, end graph.ID) {
-	var (
-		startUint64 = start.Uint64()
-		endUint64   = end.Uint64()
-	)
-
-	if edgeBitmap, exists := s.outbound[startUint64]; exists {
-		edgeBitmap.Add(endUint64)
-	} else {
-		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(endUint64)
-
-		s.outbound[startUint64] = edgeBitmap
-	}
-
-	if edgeBitmap, exists := s.inbound[endUint64]; exists {
-		edgeBitmap.Add(startUint64)
-	} else {
-		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(startUint64)
-
-		s.inbound[endUint64] = edgeBitmap
-	}
-
-	s.nodes.Add(startUint64, endUint64)
+	startUint64 := uint64(start)
+	endUint64 := uint64(end)
+
+	outboundAdj, exists := s.outbound[startUint64]
+	if !exists {
+		outboundAdj = cardinality.NewBitmap64()
+		s.outbound[startUint64] = outboundAdj
+	}
+	outboundAdj.CheckedAdd(endUint64)
+
+	inboundAdj, exists := s.inbound[endUint64]
+	if !exists {
+		inboundAdj = cardinality.NewBitmap64()
+		s.inbound[endUint64] = inboundAdj
+	}
+	inboundAdj.CheckedAdd(startUint64)
+
+	s.nodes.CheckedAdd(startUint64)
+	s.nodes.CheckedAdd(endUint64)
 }
algo/sample.go (1)

55-79: Stride sampler degenerates to deterministic picks

With many common inputs (NumNodes()/nSamples == 1) the stride becomes 1, so every node is selected in iteration order and the “random” sample is entirely deterministic. That undermines algorithms relying on unbiased sampling. Switching to standard reservoir sampling avoids the bias and keeps complexity linear.

 func SampleRandom(digraph container.DirectedGraph, nSamples int) []uint64 {
-	if numNodes := int(digraph.Nodes().Cardinality()); nSamples <= 0 || numNodes == 0 {
-		return nil
-	} else if nSamples > numNodes {
-		nSamples = numNodes
-	}
-
-	var (
-		samples   = make([]uint64, 0, nSamples)
-		stride    = digraph.Nodes().Cardinality() / uint64(nSamples)
-		counter   = (rand.Uint64() % stride) + 1
-		remainder = stride - counter
-	)
-
-	digraph.Nodes().Each(func(value uint64) bool {
-		if counter -= 1; counter == 0 {
-			samples = append(samples, value)
-			counter = (rand.Uint64() % stride) + 1 + remainder
-			remainder = stride - (counter - remainder)
-		}
-
-		return len(samples) < nSamples
-	})
-
-	return samples
+	total := int(digraph.Nodes().Cardinality())
+	if nSamples <= 0 || total == 0 {
+		return nil
+	}
+	if nSamples > total {
+		nSamples = total
+	}
+
+	samples := make([]uint64, 0, nSamples)
+	i := 0
+	digraph.EachNode(func(node uint64) bool {
+		if i < nSamples {
+			samples = append(samples, node)
+		} else {
+			j := rand.Intn(i + 1)
+			if j < nSamples {
+				samples[j] = node
+			}
+		}
+		i++
+		return true
+	})
+
+	return samples
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 140af5b and 9adea34.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • algo/closeness.go (1 hunks)
  • algo/katz.go (1 hunks)
  • algo/sample.go (1 hunks)
  • algo/scc.go (1 hunks)
  • algo/weight.go (1 hunks)
  • container/defaults.go (1 hunks)
  • container/digraph.go (1 hunks)
  • container/ef.go (1 hunks)
  • container/fetch.go (1 hunks)
  • container/pacmap.go (1 hunks)
  • go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • container/defaults.go
  • container/ef.go
  • algo/weight.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not add components with no incident edges as vertices in the component digraph.

Applied to files:

  • algo/scc.go
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not handle `graph.DirectionBoth` when building the component graph.

Applied to files:

  • algo/scc.go
🧬 Code graph analysis (7)
algo/scc.go (4)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • NewDirectedGraph (30-36)
graph/graph.go (4)
  • Direction (26-26)
  • ID (95-95)
  • DirectionInbound (16-16)
  • DirectionOutbound (17-17)
cardinality/cardinality.go (1)
  • Duplex (46-58)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
algo/sample.go (2)
container/digraph.go (1)
  • DirectedGraph (11-22)
graph/graph.go (1)
  • Direction (26-26)
container/fetch.go (3)
graph/graph.go (3)
  • Database (367-406)
  • ID (95-95)
  • Transaction (309-347)
graph/query.go (1)
  • Criteria (70-70)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • NewDirectedGraph (30-36)
container/digraph.go (3)
cardinality/cardinality.go (1)
  • Duplex (46-58)
graph/graph.go (5)
  • ID (95-95)
  • Direction (26-26)
  • DirectionOutbound (17-17)
  • DirectionInbound (16-16)
  • DirectionBoth (18-18)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
container/pacmap.go (2)
container/defaults.go (3)
  • DefaultMaxValue (8-8)
  • DefaultNumBuckets (10-10)
  • DefaultNumMaxEntries (9-9)
container/ef.go (2)
  • EFSet (14-24)
  • CompressToEFSet (150-158)
algo/closeness.go (4)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • Weight (8-8)
graph/graph.go (1)
  • Direction (26-26)
algo/sample.go (1)
  • SampleFunc (11-11)
algo/weight.go (1)
  • Weight (3-3)
algo/katz.go (3)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • Weight (8-8)
algo/weight.go (1)
  • Weight (3-3)
graph/graph.go (1)
  • DirectionBoth (18-18)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (10)
container/ef.go (5)

28-45: Use the universe width when deriving Elias–Fano bit sizes
The entry/upper/lower bit widths are still derived from universeMax instead of the actual range [universeMin, universeMax], so any non-zero minimum mis-sizes the partitions and corrupts lookups. Recompute the widths from universeMax - universeMin + 1 (with small-range guards) before producing lowerBitsMask.

Apply this diff:

-		logN            = math.Ceil(math.Log2(float64(maxValues + 2)))
-		entryLength     = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-		upperBitsLength = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-		lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
+		n               = math.Max(float64(maxValues), 1)
+		uRange          = math.Max(float64(universeMax-universeMin)+1, 2)
+		logN            = math.Ceil(math.Log2(n + 1))
+		entryLength     = math.Ceil(math.Log2(uRange))
+		upperBitsLength = math.Ceil(math.Log2(n))
+		lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)

52-71: Correct allocation math to use the universe range
AllocateEFSet repeats the same absolute-universeMax sizing error and also shifts universeMax directly when computing upperBitsetLength, which explodes once universeMin > 0. Recalculate everything from the inclusive range and shift universeMax-universeMin.

Apply this diff:

-		logN              = math.Ceil(math.Log2(float64(maxValues + 2)))
-		entryLength       = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-		upperBitsLength   = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-		lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
-		lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
-		upperBitsetLength = maxValues + uint(universeMax>>lowerBitsLength) + 2
+		n                 = math.Max(float64(maxValues), 1)
+		uRange            = math.Max(float64(universeMax-universeMin)+1, 2)
+		logN              = math.Ceil(math.Log2(n + 1))
+		entryLength       = math.Ceil(math.Log2(uRange))
+		upperBitsLength   = math.Ceil(math.Log2(n))
+		lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
+		lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
+		universeSpan      = universeMax - universeMin
+		upperBitsetLength = maxValues + uint(universeSpan>>lowerBitsLength) + 2

75-91: Replace the nonexistent .Select with a real nth-set helper
bitset.BitSet does not expose Select, so this code will not compile. We need to walk the set bits with NextSet (or similar) and compute upper from the position of the (index+1)‑th set bit.

Apply this diff (and add the helper):

-func (s *EFSet) Lookup(index uint) uint64 {
-	var (
-		upper             = s.upperBits.Select(index) - index - 1
-		lower      uint64 = 0
-		lowerIndex        = index * s.lowerBitsLength
-	)
+func (s *EFSet) Lookup(index uint) uint64 {
+	if index >= s.len {
+		panic("efset: index out of range")
+	}
+
+	upperPos, ok := s.selectNthSet(index + 1)
+	if !ok {
+		panic("efset: upper bits exhausted")
+	}
+
+	var (
+		upper             = upperPos - (index + 1)
+		lower      uint64 = 0
+		lowerIndex        = index * s.lowerBitsLength
+	)
@@
 	return (uint64(upper<<s.lowerBitsLength) | lower) + s.universeMin
 }
+
+func (s *EFSet) selectNthSet(n uint) (uint, bool) {
+	if n == 0 {
+		return 0, false
+	}
+
+	pos := uint(0)
+	remaining := n
+
+	for {
+		idx, ok := s.upperBits.NextSet(pos)
+		if !ok {
+			return 0, false
+		}
+		if remaining == 1 {
+			return idx, true
+		}
+		remaining--
+		pos = idx + 1
+	}
+}
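The same nth-set-bit scan can be expressed without the bitset library at all. A library-free sketch over raw 64-bit words using `math/bits` (the `selectNth` name is hypothetical, illustrative only):

```go
package main

import (
	"fmt"
	"math/bits"
)

// selectNth returns the 0-based position of the n-th (1-based) set bit in
// words, mirroring the NextSet-style scan suggested for the EFSet helper.
func selectNth(words []uint64, n uint) (uint, bool) {
	if n == 0 {
		return 0, false
	}
	for wi, w := range words {
		pop := uint(bits.OnesCount64(w))
		if n > pop {
			n -= pop
			continue
		}
		// The answer is inside this word: peel off n-1 lower set bits.
		for i := uint(1); i < n; i++ {
			w &= w - 1 // clear lowest set bit
		}
		return uint(wi)*64 + uint(bits.TrailingZeros64(w)), true
	}
	return 0, false
}

func main() {
	words := []uint64{0b1011, 0b0110} // set bits at 0, 1, 3 and 65, 66
	pos, ok := selectNth(words, 4)
	fmt.Println(pos, ok)
}
```

A word-skipping scan like this is also how one would speed up the linear `NextSet` loop if select performance ever matters.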

129-148: Handle empty sets before computing highEntry
highEntry := s.len - 1 underflows to math.MaxUint when the set is empty, which cascades into out-of-bounds lookups. Guard s.len == 0 up front and bail before the binary search.

Apply this diff:

 func (s *EFSet) Find(target uint64) (uint, bool) {
-	var (
-		lowEntry  uint = 0
-		highEntry      = s.len - 1
-	)
+	if s.len == 0 {
+		return 0, false
+	}
+
+	var (
+		lowEntry  uint = 0
+		highEntry      = s.len - 1
+	)

150-156: Protect against empty input in CompressToEFSet
Dereferencing values[0]/values[len(values)-1] panics when asked to compress an empty slice. Return an empty set instead.

Apply this diff:

 func CompressToEFSet(values []uint64) *EFSet {
+	if len(values) == 0 {
+		return AllocateEFSet(0, 0, 0)
+	}
+
 	compactedVec := AllocateEFSet(values[0], values[len(values)-1], uint(len(values)))
container/pacmap.go (1)

55-63: Keep the sort comparator side-effect free
The comparator swaps entries inside sort.Slice, which violates the comparator contract and leaves s.values in an undefined order — the sorter may call the predicate repeatedly on the same indices, and each call mutates state. Pair the keys with their values, sort that slice, then rebuild the aligned slices.

Apply this diff:

-	// Sort by key and swap value positions so the two remain associated by their position
-	sort.Slice(s.pendingKeys, func(i, j int) bool {
-		if s.pendingKeys[i] < s.pendingKeys[j] {
-			s.values[i], s.values[j] = s.values[j], s.values[i]
-			return true
-		}
-
-		return false
-	})
-
-	s.keys = CompressToEFSet(s.pendingKeys)
-	s.pendingKeys = nil
-	s.values = s.values[0:len(s.values):len(s.values)]
+	type kv struct {
+		key uint64
+		val uint64
+	}
+
+	n := len(s.pendingKeys)
+	pairs := make([]kv, n)
+	for i := 0; i < n; i++ {
+		pairs[i] = kv{key: s.pendingKeys[i], val: s.values[i]}
+	}
+
+	sort.Slice(pairs, func(i, j int) bool { return pairs[i].key < pairs[j].key })
+
+	s.pendingKeys = s.pendingKeys[:0]
+	for i := 0; i < n; i++ {
+		s.pendingKeys = append(s.pendingKeys, pairs[i].key)
+		s.values[i] = pairs[i].val
+	}
+	s.values = s.values[:n:n]
+	s.keys = CompressToEFSet(s.pendingKeys)
+	s.pendingKeys = nil
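The keep-the-comparator-pure pattern generalizes beyond this file. A standalone sketch (hypothetical `sortAligned` helper) that sorts a key slice while keeping a parallel value slice aligned:

```go
package main

import (
	"fmt"
	"sort"
)

// sortAligned sorts keys ascending while keeping values aligned by position,
// without mutating state inside the comparator (sort.Slice may invoke the
// predicate on the same index pair any number of times).
func sortAligned(keys, values []uint64) {
	type kv struct{ key, val uint64 }

	pairs := make([]kv, len(keys))
	for i := range keys {
		pairs[i] = kv{keys[i], values[i]}
	}

	sort.Slice(pairs, func(i, j int) bool { return pairs[i].key < pairs[j].key })

	for i, p := range pairs {
		keys[i], values[i] = p.key, p.val
	}
}

func main() {
	keys := []uint64{30, 10, 20}
	values := []uint64{300, 100, 200}
	sortAligned(keys, values)
	fmt.Println(keys, values) // keys sorted, values still track their keys
}
```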
container/digraph.go (3)

61-95: Implement a real breadth-first traversal
BFSTree is still using a LIFO stack and only marks nodes visited after dequeuing neighbors, so distances are depth-first and the source can be re-enqueued. Switch to a FIFO queue and mark the source visited up front so distances remain shortest.

Suggested patch:

 func (s *directedGraph) BFSTree(nodeID uint64, direction graph.Direction) []ShortestPathTerminal {
 	var (
 		visited   = cardinality.NewBitmap64()
-		stack     []ShortestPathTerminal
+		queue     []ShortestPathTerminal
 		terminals []ShortestPathTerminal
 	)
 
-	stack = append(stack, ShortestPathTerminal{
+	visited.CheckedAdd(nodeID)
+	queue = append(queue, ShortestPathTerminal{
 		NodeID:   nodeID,
 		Distance: 0,
 	})
 
-	for len(stack) > 0 {
-		nextCursor := stack[len(stack)-1]
-		stack = stack[:len(stack)-1]
+	for len(queue) > 0 {
+		nextCursor := queue[0]
+		queue = queue[1:]
 
 		s.EachAdjacent(nextCursor.NodeID, direction, func(adjacentNodeID uint64) bool {
 			if visited.CheckedAdd(adjacentNodeID) {
 				terminalCursor := ShortestPathTerminal{
 					NodeID:   adjacentNodeID,
 					Distance: nextCursor.Distance + 1,
 				}
 
-				// If not visited, descend into this node next
-				stack = append(stack, terminalCursor)
+				queue = append(queue, terminalCursor)
 
 				// This reachable node represents one of the shortest path terminals
 				terminals = append(terminals, terminalCursor)
 			}
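Stripped of the project types, the FIFO fix looks like this. A minimal sketch over a plain adjacency map (hypothetical `bfsDistances`, illustrative only):

```go
package main

import "fmt"

// bfsDistances runs a queue-based BFS over an adjacency map, marking the
// source visited before traversal so it is never re-enqueued, and returns
// shortest hop counts for every reachable node.
func bfsDistances(adj map[uint64][]uint64, source uint64) map[uint64]uint64 {
	dist := map[uint64]uint64{source: 0}
	queue := []uint64{source}

	for len(queue) > 0 {
		next := queue[0]
		queue = queue[1:]

		for _, neighbor := range adj[next] {
			if _, visited := dist[neighbor]; !visited {
				dist[neighbor] = dist[next] + 1
				queue = append(queue, neighbor)
			}
		}
	}
	return dist
}

func main() {
	adj := map[uint64][]uint64{1: {2, 3}, 2: {4}, 3: {4}, 4: {1}}
	fmt.Println(bfsDistances(adj, 1)[4]) // 4 is two hops from 1, not three
}
```

A LIFO stack on the same graph can record node 4 at depth 2 or 3 depending on pop order; the FIFO queue guarantees the shortest distance.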

155-157: Return the actual edge count
NumEdges currently reports the number of nodes, breaking any metrics that rely on edge totals. Sum the outbound adjacency cardinalities instead.

Suggested diff:

 func (s *directedGraph) NumEdges() uint64 {
-	return s.nodes.Cardinality()
+	var count uint64
+	for _, adjacent := range s.outbound {
+		count += adjacent.Cardinality()
+	}
+	return count
 }

224-248: Fix AddEdge ID conversion and use safe bitmap inserts
graph.ID is an alias of uint64; calling .Uint64() will not compile, and Add on the roaring bitmap isn’t part of the Duplex contract. Convert with uint64(...) and stick to CheckedAdd for both adjacency bitmaps and s.nodes.

Apply this diff:

 func (s *directedGraph) AddEdge(start, end graph.ID) {
 	var (
-		startUint64 = start.Uint64()
-		endUint64   = end.Uint64()
+		startUint64 = uint64(start)
+		endUint64   = uint64(end)
 	)
 
 	if edgeBitmap, exists := s.outbound[startUint64]; exists {
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)
 
 		s.outbound[startUint64] = edgeBitmap
 	}
 
 	if edgeBitmap, exists := s.inbound[endUint64]; exists {
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)
 
 		s.inbound[endUint64] = edgeBitmap
 	}
 
-	s.nodes.Add(startUint64, endUint64)
+	s.nodes.CheckedAdd(startUint64)
+	s.nodes.CheckedAdd(endUint64)
 }
container/fetch.go (1)

33-60: Always wait for the merge worker before returning
On the error path you close edgeC but return without waiting, so the consumer goroutine can outlive the call (and keep mutating digraph). Wrap the close+wait in a deferred cleanup so both success and failure paths rendezvous cleanly.

Example fix:

 	var (
 		edgeC   = make(chan anonymousEdge, 4096)
 		mergeWG = &sync.WaitGroup{}
 		digraph = NewDirectedGraph()
 	)
 
 	mergeWG.Add(1)
-
-	go func() {
+	go func() {
 		defer mergeWG.Done()
@@
-	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
+	defer func() {
+		close(edgeC)
+		mergeWG.Wait()
+	}()
+
+	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
@@
-	}); err != nil {
-		// Ensure that the edge channel is closed to prevent goroutine leaks
-		close(edgeC)
-		return nil, err
-	}
-
-	close(edgeC)
-	mergeWG.Wait()
-
-	return digraph, nil
+	}); err != nil {
+		return nil, err
+	}
+
+	return digraph, nil
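The close-and-wait discipline is easy to demonstrate in isolation. Below is a sketch of the pattern with a summing consumer (`consumeAll` is hypothetical; the deferred cleanup guarantees the goroutine is joined on both the success and error paths):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// consumeAll shows the deferred close-and-wait pattern: the channel is
// closed and the consumer goroutine joined on every exit path, so the
// worker can never outlive the call.
func consumeAll(items []int, failAt int) (sum int, err error) {
	var (
		itemC = make(chan int, 16)
		wg    sync.WaitGroup
		total int
	)

	wg.Add(1)
	go func() {
		defer wg.Done()
		for item := range itemC {
			total += item
		}
	}()

	// Rendezvous with the consumer no matter how we return.
	defer func() {
		close(itemC)
		wg.Wait()
		if err == nil {
			sum = total // safe to read only after Wait
		}
	}()

	for idx, item := range items {
		if idx == failAt {
			return 0, errors.New("producer failed mid-stream")
		}
		itemC <- item
	}
	return 0, nil
}

func main() {
	sum, err := consumeAll([]int{1, 2, 3, 4}, -1)
	fmt.Println(sum, err)
}
```

The named returns let the deferred function publish the total after the rendezvous, avoiding the race that reading a shared accumulator before `Wait` would introduce.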
🧹 Nitpick comments (1)
algo/closeness.go (1)

15-26: Consider extracting the shared BFS accumulation into a helper.

Both functions duplicate the BFSTree traversal and score computation. Hoisting that into a small helper (e.g., computeCloseness(...) (Weight, bool)) keeps the serial and parallel paths in lockstep and makes future tweaks to the scoring rule (normalization, guards, etc.) a one-touch change.

Also applies to: 47-62
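One plausible shape for such a helper, sketched against a stand-in for `ShortestPathTerminal` (names and scoring rule hypothetical, not the PR's code):

```go
package main

import "fmt"

// pathTerminal mirrors the ShortestPathTerminal shape used in the review:
// a reachable node and its shortest-path distance from the source.
type pathTerminal struct {
	NodeID   uint64
	Distance uint64
}

// computeCloseness folds a BFS frontier into a closeness-style score
// (reachable count divided by summed distances), returning false when
// nothing is reachable so callers can skip isolated nodes.
func computeCloseness(terminals []pathTerminal) (float64, bool) {
	var totalDistance uint64
	for _, t := range terminals {
		totalDistance += t.Distance
	}
	if totalDistance == 0 {
		return 0, false
	}
	return float64(len(terminals)) / float64(totalDistance), true
}

func main() {
	score, ok := computeCloseness([]pathTerminal{{2, 1}, {3, 1}, {4, 2}})
	fmt.Println(score, ok)
}
```

With both the serial and parallel paths calling one helper like this, changes to normalization or guards only need to land in one place.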

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9adea34 and 6c7d3f8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • algo/closeness.go (1 hunks)
  • algo/katz.go (1 hunks)
  • algo/sample.go (1 hunks)
  • algo/scc.go (1 hunks)
  • algo/weight.go (1 hunks)
  • container/defaults.go (1 hunks)
  • container/digraph.go (1 hunks)
  • container/ef.go (1 hunks)
  • container/fetch.go (1 hunks)
  • container/pacmap.go (1 hunks)
  • go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • algo/weight.go
  • container/defaults.go
  • algo/sample.go
  • go.mod
  • algo/katz.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not add components with no incident edges as vertices in the component digraph.

Applied to files:

  • algo/scc.go
📚 Learning: 2025-10-06T17:49:51.741Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.741Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not handle `graph.DirectionBoth` when building the component graph.

Applied to files:

  • algo/scc.go
🧬 Code graph analysis (5)
algo/scc.go (4)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • NewDirectedGraph (30-36)
graph/graph.go (4)
  • Direction (26-26)
  • ID (95-95)
  • DirectionInbound (16-16)
  • DirectionOutbound (17-17)
cardinality/cardinality.go (1)
  • Duplex (46-58)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
algo/closeness.go (4)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • Weight (8-8)
graph/graph.go (1)
  • Direction (26-26)
algo/sample.go (1)
  • SampleFunc (11-11)
algo/weight.go (1)
  • Weight (3-3)
container/digraph.go (3)
cardinality/cardinality.go (1)
  • Duplex (46-58)
graph/graph.go (5)
  • ID (95-95)
  • Direction (26-26)
  • DirectionOutbound (17-17)
  • DirectionInbound (16-16)
  • DirectionBoth (18-18)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
container/pacmap.go (2)
container/defaults.go (3)
  • DefaultMaxValue (8-8)
  • DefaultNumBuckets (10-10)
  • DefaultNumMaxEntries (9-9)
container/ef.go (2)
  • EFSet (14-24)
  • CompressToEFSet (150-158)
container/fetch.go (3)
graph/graph.go (3)
  • Database (367-406)
  • ID (95-95)
  • Transaction (309-347)
graph/query.go (1)
  • Criteria (70-70)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • NewDirectedGraph (30-36)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (9)
container/ef.go (5)

28-37: Elias–Fano bit‑widths must use universe range, not absolute universeMax

entryLength/upperBitsLength should derive from U = (universeMax - universeMin + 1). Current math mis-sizes when universeMin != 0.

Apply:

 func NewEFSet(universeMin, universeMax uint64, maxValues uint) *EFSet {
-	var (
-		logN            = math.Ceil(math.Log2(float64(maxValues + 2)))
-		entryLength     = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-		upperBitsLength = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-		lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
-		lowerBitsMask   = (uint64(1) << lowerBitsLength) - 1
+	var (
+		urange          = (universeMax - universeMin) + 1
+		nf              = math.Max(float64(maxValues), 1)
+		logN            = math.Ceil(math.Log2(nf + 1))
+		entryLength     = math.Ceil(math.Log2(math.Max(float64(urange), 2)))
+		upperBitsLength = math.Ceil(math.Log2(nf))
+		lowerBitsLength = uint(math.Max(entryLength, logN) - upperBitsLength)
+		lowerBitsMask   = (uint64(1) << lowerBitsLength) - 1
 		upperBitset     = bitset.New(0)
 		lowerBitset     = bitset.New(0)
 	)

52-63: Preallocation must also use universe range; fix upperBitsetLength and sizing

Use U = universeMax - universeMin in upperBitsetLength; also give lowerBits the extra +1 slot required by the 1-based indexing used in Set/Test.

Apply:

 func AllocateEFSet(universeMin, universeMax uint64, maxValues uint) *EFSet {
-	var (
-		logN              = math.Ceil(math.Log2(float64(maxValues + 2)))
-		entryLength       = math.Ceil(math.Log2(math.Max(float64(universeMax), 2)))
-		upperBitsLength   = math.Ceil(math.Log2(math.Max(float64(maxValues), 2)))
-		lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
-		lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
-		upperBitsetLength = maxValues + uint(universeMax>>lowerBitsLength) + 2
-		upperBitset       = bitset.New(upperBitsetLength)
-		lowerBitset       = bitset.New(maxValues * lowerBitsLength)
-	)
+	var (
+		u                 = universeMax - universeMin
+		urange            = u + 1
+		nf                = math.Max(float64(maxValues), 1)
+		logN              = math.Ceil(math.Log2(nf + 1))
+		entryLength       = math.Ceil(math.Log2(math.Max(float64(urange), 2)))
+		upperBitsLength   = math.Ceil(math.Log2(nf))
+		lowerBitsLength   = uint(math.Max(entryLength, logN) - upperBitsLength)
+		lowerBitsMask     = (uint64(1) << lowerBitsLength) - 1
+		upperBitsetLength = maxValues + uint(u>>lowerBitsLength) + 2
+		upperBitset       = bitset.New(upperBitsetLength)
+		lowerBitset       = bitset.New(maxValues*lowerBitsLength + 1)
+	)

Optional: guard oversized allocations to avoid overflow on maxValues*lowerBitsLength.


93-109: Replace nonexistent BitSet.Select with safe nth‑set‑bit selection

bitset.BitSet has NextSet, not Select; current code won’t compile.

Apply:

 func (s *EFSet) Lookup(index uint) uint64 {
 	var (
-		upper             = s.upperBits.Select(index) - index - 1
+		upperPos          = s.selectNthSet(index + 1)
+		upper             = upperPos - (index + 1)
 		lower      uint64 = 0
 		lowerIndex        = index * s.lowerBitsLength
 	)

Add this method in the file:

// selectNthSet returns 0-based position of the n-th (1-based) set bit.
func (s *EFSet) selectNthSet(n uint) uint {
	if n == 0 {
		return 0
	}
	pos := uint(0)
	for {
		idx, ok := s.upperBits.NextSet(pos)
		if !ok {
			return pos
		}
		n--
		if n == 0 {
			return idx
		}
		pos = idx + 1
	}
}
#!/bin/bash
# Confirm no remaining .Select calls on BitSet
rg -nP --type=go '\.Select\s*\(' -C2

155-160: Fix underflow when EFSet is empty

highEntry := s.len - 1 underflows for len==0.

Apply:

 func (s *EFSet) Find(target uint64) (uint, bool) {
-	var (
-		lowEntry  uint = 0
-		highEntry      = s.len - 1
-	)
+	if s.len == 0 {
+		return 0, false
+	}
+	lowEntry := uint(0)
+	highEntry := s.len - 1

176-184: Guard empty input in CompressToEFSet (panic today)

values[0] and values[len-1] panic on len==0.

Apply:

 func CompressToEFSet(values []uint64) *EFSet {
-	compactedVec := AllocateEFSet(values[0], values[len(values)-1], uint(len(values)))
+	if len(values) == 0 {
+		return AllocateEFSet(0, 0, 0)
+	}
+	compactedVec := AllocateEFSet(values[0], values[len(values)-1], uint(len(values)))
container/fetch.go (1)

33-63: Always close channel and wait before returning (avoid leak/race on error)

On error you close(edgeC) but don’t Wait; the merge goroutine may still be running. Capture the error, close and wait in all paths, then return.

Apply:

-	if err := graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
+	var readErr error
+	if readErr = graphDB.ReadTransaction(ctx, func(tx graph.Transaction) error {
 		return tx.Relationships().Filter(relationshipFilter).Query(func(results graph.Result) error {
@@
-	}); err != nil {
-		// Ensure that the edge channel is closed to prevent goroutine leaks
-		close(edgeC)
-		return nil, err
-	}
-
-	close(edgeC)
-	mergeWG.Wait()
-
-	return digraph, nil
+	}); readErr != nil {
+		// fall through to close and wait
+	}
+	close(edgeC)
+	mergeWG.Wait()
+	if readErr != nil {
+		return nil, readErr
+	}
+	return digraph, nil
container/digraph.go (3)

61-96: BFS uses LIFO stack (DFS semantics) and doesn't mark root visited.

Lines 74-75 pop from the end of stack, implementing DFS rather than BFS. This produces non-shortest distances. Additionally, the root isn't marked visited before traversal, allowing it to be enqueued as a terminal.

Apply this diff:

 func (s *directedGraph) BFSTree(nodeID uint64, direction graph.Direction) []ShortestPathTerminal {
 	var (
 		visited   = cardinality.NewBitmap64()
-		stack     []ShortestPathTerminal
+		queue     []ShortestPathTerminal
 		terminals []ShortestPathTerminal
 	)
 
-	stack = append(stack, ShortestPathTerminal{
+	// Mark root visited and enqueue
+	visited.CheckedAdd(nodeID)
+	queue = append(queue, ShortestPathTerminal{
 		NodeID:   nodeID,
 		Distance: 0,
 	})
 
-	for len(stack) > 0 {
-		nextCursor := stack[len(stack)-1]
-		stack = stack[:len(stack)-1]
+	for len(queue) > 0 {
+		nextCursor := queue[0]
+		queue = queue[1:]
 
 		s.EachAdjacent(nextCursor.NodeID, direction, func(adjacentNodeID uint64) bool {
 			if visited.CheckedAdd(adjacentNodeID) {
 				terminalCursor := ShortestPathTerminal{
 					NodeID:   adjacentNodeID,
 					Distance: nextCursor.Distance + 1,
 				}
 
-				// If not visited, descend into this node next
-				stack = append(stack, terminalCursor)
+				// Enqueue for BFS
+				queue = append(queue, terminalCursor)
 
 				// This reachable node represents one of the shortest path terminals
 				terminals = append(terminals, terminalCursor)
 			}
 
 			return true
 		})
 	}
 
 	return terminals
 }

155-157: NumEdges returns node count instead of edge count.

This method incorrectly returns s.nodes.Cardinality(), which counts nodes. It should sum the cardinalities of all outbound adjacency sets.

Apply this diff:

 func (s *directedGraph) NumEdges() uint64 {
-	return s.nodes.Cardinality()
+	var edges uint64
+	for _, adj := range s.outbound {
+		edges += adj.Cardinality()
+	}
+	return edges
 }

224-249: Compilation error: graph.ID is uint64, not a struct with .Uint64() method.

Lines 226-227 call .Uint64() on graph.ID, but graph.ID is defined as type ID uint64 (see graph/graph.go), so this will not compile. Additionally, prefer CheckedAdd for consistency with the Duplex interface, and add nodes individually rather than using a variadic call.

Apply this diff:

 func (s *directedGraph) AddEdge(start, end graph.ID) {
 	var (
-		startUint64 = start.Uint64()
-		endUint64   = end.Uint64()
+		startUint64 = uint64(start)
+		endUint64   = uint64(end)
 	)
 
 	if edgeBitmap, exists := s.outbound[startUint64]; exists {
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(endUint64)
+		edgeBitmap.CheckedAdd(endUint64)
 
 		s.outbound[startUint64] = edgeBitmap
 	}
 
 	if edgeBitmap, exists := s.inbound[endUint64]; exists {
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)
 	} else {
 		edgeBitmap = cardinality.NewBitmap64()
-		edgeBitmap.Add(startUint64)
+		edgeBitmap.CheckedAdd(startUint64)
 
 		s.inbound[endUint64] = edgeBitmap
 	}
 
-	s.nodes.Add(startUint64, endUint64)
+	s.nodes.CheckedAdd(startUint64)
+	s.nodes.CheckedAdd(endUint64)
 }
🧹 Nitpick comments (7)
container/ef.go (1)

61-62: Avoid hidden reallocations of lowerBits due to 1‑based indexing

You write/read lowerBits at indices +1; preallocate maxValues*lowerBitsLength + 1 to prevent growth.

container/pacmap_test.go (1)

10-25: LGTM

Covers Put → Compact → Get happy path. Consider adding a pre‑Compact Get case and a duplicate‑key overwrite case in a follow‑up.

algo/scc.go (1)

130-134: Intentional choice: include all components as vertices

This differs from an alternative design that omits isolated components; if the inclusion is intentional, a brief comment would help future readers.

container/pacmap.go (4)

121-137: Compact() with no existing keys: consider dedup to enforce “last write wins”

Current path stores duplicates as-is; EFSet.Find will return one rank arbitrarily. Recommend sorting (already done) and coalescing duplicates so the final value reflects the last pending write.

Apply:

+	writeIdx := 0
 	for idx, nextKVPair := range s.pending {
-		s.keys.Append(nextKVPair.key)
-		s.values[idx] = nextKVPair.value
+		if idx > 0 && nextKVPair.key == s.pending[idx-1].key {
+			// Later pending write to the same key wins
+			s.values[writeIdx-1] = nextKVPair.value
+		} else {
+			s.keys.Append(nextKVPair.key)
+			s.values[writeIdx] = nextKVPair.value
+			writeIdx++
+		}
 	}
+	s.values = s.values[:writeIdx]
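Stand-alone, the last-write-wins coalescing looks like this (hypothetical `coalesce` over a generic key/value pair, illustrative only — a stable sort preserves insertion order among equal keys so the final occurrence wins):

```go
package main

import (
	"fmt"
	"sort"
)

type kvPair struct{ key, val uint64 }

// coalesce stably sorts pending writes by key and collapses duplicates so
// the last write to each key wins.
func coalesce(pending []kvPair) []kvPair {
	sort.SliceStable(pending, func(i, j int) bool { return pending[i].key < pending[j].key })

	out := pending[:0] // reuse backing array; write index never passes read index
	for _, p := range pending {
		if len(out) > 0 && out[len(out)-1].key == p.key {
			out[len(out)-1].val = p.val // later write overwrites earlier one
		} else {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pending := []kvPair{{5, 50}, {3, 30}, {5, 55}, {1, 10}}
	fmt.Println(coalesce(pending))
}
```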

38-48: Optional: support lookups before first compaction

Find returns false when s.keys is nil. If desired, also scan s.pending for the key.


161-185: CompactParallel: guard workers <= 0

Avoid starting zero goroutines (deadlock on send).

Apply:

 func (s *PackedMap) CompactParallel(workers int) {
+	if workers <= 0 {
+		workers = 1
+	}

206-212: Get allocates buckets on miss

getBucket creates a bucket on reads; consider a non-allocating fast path for Get.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7d3f8 and 51bd464.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (12)
  • algo/closeness.go (1 hunks)
  • algo/katz.go (1 hunks)
  • algo/sample.go (1 hunks)
  • algo/scc.go (1 hunks)
  • algo/weight.go (1 hunks)
  • container/defaults.go (1 hunks)
  • container/digraph.go (1 hunks)
  • container/ef.go (1 hunks)
  • container/fetch.go (1 hunks)
  • container/pacmap.go (1 hunks)
  • container/pacmap_test.go (1 hunks)
  • go.mod (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • algo/weight.go
  • algo/katz.go
  • algo/closeness.go
  • go.mod
  • container/defaults.go
  • algo/sample.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-06T17:49:51.771Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.771Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not add components with no incident edges as vertices in the component digraph.

Applied to files:

  • algo/scc.go
📚 Learning: 2025-10-06T17:49:51.771Z
Learnt from: zinic
PR: SpecterOps/DAWGS#15
File: algo/scc.go:124-159
Timestamp: 2025-10-06T17:49:51.771Z
Learning: In `algo/scc.go`, the `NewComponentDirectedGraph` function intentionally does not handle `graph.DirectionBoth` when building the component graph.

Applied to files:

  • algo/scc.go
🧬 Code graph analysis (6)
container/pacmap_test.go (1)
container/pacmap.go (2)
  • NewPacMap (148-153)
  • DefaultParameters (15-21)
algo/scc.go (4)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • NewDirectedGraph (30-36)
graph/graph.go (4)
  • Direction (26-26)
  • ID (95-95)
  • DirectionInbound (16-16)
  • DirectionOutbound (17-17)
cardinality/cardinality.go (1)
  • Duplex (46-58)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
container/digraph.go (3)
cardinality/cardinality.go (1)
  • Duplex (46-58)
graph/graph.go (5)
  • ID (95-95)
  • Direction (26-26)
  • DirectionOutbound (17-17)
  • DirectionInbound (16-16)
  • DirectionBoth (18-18)
cardinality/roaring64.go (1)
  • NewBitmap64 (23-27)
container/fetch.go (2)
graph/graph.go (3)
  • Database (367-406)
  • ID (95-95)
  • Transaction (309-347)
container/digraph.go (2)
  • DirectedGraph (11-22)
  • NewDirectedGraph (30-36)
container/pacmap.go (2)
container/defaults.go (3)
  • DefaultMaxValue (8-8)
  • DefaultNumBuckets (10-10)
  • DefaultNumMaxEntries (9-9)
container/ef.go (2)
  • EFSet (16-26)
  • NewEFSet (28-50)
container/ef.go (1)
cardinality/cardinality.go (1)
  • Iterator (39-42)
🔇 Additional comments (7)
container/digraph.go (7)

8-9: LGTM!

Type aliases provide clear semantic meaning for weights and adjacency structures.


11-22: LGTM!

The interface provides a clean, comprehensive API for directed graph operations.


24-36: LGTM!

Constructor properly initializes the graph structure with empty adjacency maps and node bitmap.


38-41: LGTM!

Structure clearly represents a terminal node with its shortest-path distance.


43-59: LGTM!

The method now correctly respects the direction parameter by using getAdjacent for each node, addressing the previous concern.


98-149: LGTM!

The normalization logic correctly reindexes nodes into a dense range and remaps all edges, returning the reverse mapping for translation.


159-222: LGTM!

Helper methods correctly handle direction-based adjacency queries and iteration. The getAdjacent method properly merges inbound/outbound sets for bidirectional queries.

Add approximate Katz and Closeness centrality implementations.
Add an implementation of Tarjan's SCC algorithm.
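For reference, a compact recursive sketch of Tarjan's algorithm over a plain adjacency map (illustrative only; not the PR's implementation, which may be iterative):

```go
package main

import "fmt"

// tarjanSCC implements Tarjan's strongly-connected-components algorithm:
// index/lowlink bookkeeping plus an explicit stack, emitting one component
// whenever a root (index == lowlink) unwinds.
func tarjanSCC(adj map[int][]int) [][]int {
	var (
		index    = map[int]int{}
		lowlink  = map[int]int{}
		onStack  = map[int]bool{}
		stack    []int
		counter  int
		comps    [][]int
		strongly func(v int)
	)

	strongly = func(v int) {
		index[v], lowlink[v] = counter, counter
		counter++
		stack = append(stack, v)
		onStack[v] = true

		for _, w := range adj[v] {
			if _, seen := index[w]; !seen {
				strongly(w)
				if lowlink[w] < lowlink[v] {
					lowlink[v] = lowlink[w]
				}
			} else if onStack[w] && index[w] < lowlink[v] {
				lowlink[v] = index[w]
			}
		}

		if lowlink[v] == index[v] { // v roots a component: pop it off
			var comp []int
			for {
				w := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[w] = false
				comp = append(comp, w)
				if w == v {
					break
				}
			}
			comps = append(comps, comp)
		}
	}

	for v := range adj {
		if _, seen := index[v]; !seen {
			strongly(v)
		}
	}
	return comps
}

func main() {
	adj := map[int][]int{1: {2}, 2: {3}, 3: {1}, 4: {3}}
	fmt.Println(len(tarjanSCC(adj))) // the cycle {1,2,3} and the singleton {4}
}
```

The recursive form is the textbook presentation; production code on large graphs typically converts the recursion to an explicit stack to avoid call-depth limits.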