Conversation
Code Review
This pull request introduces a new envinfo package to log system environment details (CPU, memory, disk usage, and cgroup limits) and an iometrics package to track and summarize IO throughput during push, pull, and fetch operations. The review identified several areas for improvement: tryV2 and tryV1 in cgroup_linux.go should be refactored so that memory limits are detected even when CPU limits are inaccessible; the cpu.Percent call in envinfo.go may return unhelpful data at startup; the isVirtualFS function appears to be dead code; and the formatBytes utility is duplicated and should be consolidated or replaced with the existing go-humanize dependency.
pkg/envinfo/cgroup_linux.go
```go
cpuMax, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
if err != nil {
	return false
}

parts := strings.Fields(strings.TrimSpace(string(cpuMax)))
if len(parts) == 2 && parts[0] != "max" {
	quota, err1 := strconv.ParseFloat(parts[0], 64)
	period, err2 := strconv.ParseFloat(parts[1], 64)
	if err1 == nil && err2 == nil && period > 0 {
		limits.CPUQuota = quota / period
		limits.InCgroup = true
	}
}

memMax, err := os.ReadFile("/sys/fs/cgroup/memory.max")
if err != nil {
	return limits.InCgroup
}

memStr := strings.TrimSpace(string(memMax))
if memStr != "max" {
	memLimit, err := strconv.ParseUint(memStr, 10, 64)
	if err == nil {
		limits.MemLimit = memLimit
		limits.InCgroup = true
	}
}

return limits.InCgroup
```
The tryV2 function returns early if it fails to read the CPU quota file. This prevents the function from checking for memory limits, which might still be present even if CPU limits are not configured or accessible. It's better to attempt reading both independently to ensure comprehensive limit detection.
```go
if cpuMax, err := os.ReadFile("/sys/fs/cgroup/cpu.max"); err == nil {
	parts := strings.Fields(strings.TrimSpace(string(cpuMax)))
	if len(parts) == 2 && parts[0] != "max" {
		quota, err1 := strconv.ParseFloat(parts[0], 64)
		period, err2 := strconv.ParseFloat(parts[1], 64)
		if err1 == nil && err2 == nil && period > 0 {
			limits.CPUQuota = quota / period
			limits.InCgroup = true
		}
	}
}
if memMax, err := os.ReadFile("/sys/fs/cgroup/memory.max"); err == nil {
	memStr := strings.TrimSpace(string(memMax))
	if memStr != "max" {
		if memLimit, err := strconv.ParseUint(memStr, 10, 64); err == nil {
			limits.MemLimit = memLimit
			limits.InCgroup = true
		}
	}
}
```
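The cpu.max parsing in the suggestion lends itself to a small pure helper that can be unit-tested without a cgroup filesystem. A sketch of that idea (`parseCPUMax` is a hypothetical name, not part of the PR):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax parses the contents of cgroup v2's cpu.max file, which holds
// "<quota> <period>" in microseconds, or "max <period>" when unlimited.
// It returns the effective CPU count (quota/period) and whether a limit is set.
func parseCPUMax(s string) (float64, bool) {
	parts := strings.Fields(strings.TrimSpace(s))
	if len(parts) != 2 || parts[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(parts[0], 64)
	period, err2 := strconv.ParseFloat(parts[1], 64)
	if err1 != nil || err2 != nil || period <= 0 {
		return 0, false
	}
	return quota / period, true
}

func main() {
	q, ok := parseCPUMax("200000 100000\n")
	fmt.Println(q, ok) // 2 true
	_, ok = parseCPUMax("max 100000\n")
	fmt.Println(ok) // false
}
```

Keeping the parsing separate from the file reads also makes the "detect each limit independently" behavior trivial to verify in a unit test.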
pkg/envinfo/cgroup_linux.go
```go
quotaBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
if err != nil {
	return
}

quota, err := strconv.ParseFloat(strings.TrimSpace(string(quotaBytes)), 64)
if err != nil || quota <= 0 {
	// -1 means no limit.
	return
}

periodBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
if err != nil {
	return
}

period, err := strconv.ParseFloat(strings.TrimSpace(string(periodBytes)), 64)
if err != nil || period <= 0 {
	return
}

limits.CPUQuota = quota / period
limits.InCgroup = true

// cgroup v1: Memory limit.
memBytes, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
if err != nil {
	return
}

memLimit, err := strconv.ParseUint(strings.TrimSpace(string(memBytes)), 10, 64)
if err != nil {
	return
}

// Very large values (like 2^63) indicate no limit.
const noLimitThreshold = 1 << 62
if memLimit < noLimitThreshold {
	limits.MemLimit = memLimit
}
```
In tryV1, the function returns early if any CPU-related file is missing or if the quota is not set, which skips the memory limit detection. Additionally, limits.InCgroup should be set to true if a memory limit is successfully detected, even if no CPU limit is found.
```go
if quotaBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"); err == nil {
	if quota, err := strconv.ParseFloat(strings.TrimSpace(string(quotaBytes)), 64); err == nil && quota > 0 {
		if periodBytes, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.cfs_period_us"); err == nil {
			if period, err := strconv.ParseFloat(strings.TrimSpace(string(periodBytes)), 64); err == nil && period > 0 {
				limits.CPUQuota = quota / period
				limits.InCgroup = true
			}
		}
	}
}
// cgroup v1: Memory limit.
if memBytes, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes"); err == nil {
	if memLimit, err := strconv.ParseUint(strings.TrimSpace(string(memBytes)), 10, 64); err == nil {
		// Very large values (like 2^63) indicate no limit.
		const noLimitThreshold = 1 << 62
		if memLimit < noLimitThreshold {
			limits.MemLimit = memLimit
			limits.InCgroup = true
		}
	}
}
```
pkg/envinfo/envinfo.go
```go
	fields["model"] = infos[0].ModelName
}

percents, err := cpu.Percent(0, false)
```
cpu.Percent(0, false) returns the CPU usage since the last call. When called for the first time at process startup with a 0 interval, it typically returns 0.0% or an error, which doesn't provide useful information about the system's current load. Consider using a small interval (e.g., 100ms) to get a meaningful snapshot, or omit this field if startup delay is a concern.
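To illustrate why a single zero-interval sample is uninformative, here is a hedged sketch that derives utilization from two `/proc/stat` samples taken 100 ms apart. This is Linux-only illustration code, not the PR's implementation (which uses gopsutil's `cpu.Percent`):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// cpuPercent computes utilization between two (busy, total) jiffy samples.
// A single sample carries no rate information, which is why a zero-interval
// cpu.Percent call is meaningless on its very first invocation.
func cpuPercent(busy1, total1, busy2, total2 uint64) float64 {
	if total2 <= total1 {
		return 0
	}
	return 100 * float64(busy2-busy1) / float64(total2-total1)
}

// sample reads the aggregate "cpu" line of /proc/stat (Linux only).
func sample() (busy, total uint64) {
	data, err := os.ReadFile("/proc/stat")
	if err != nil {
		return 0, 0
	}
	for i, f := range strings.Fields(strings.SplitN(string(data), "\n", 2)[0])[1:] {
		v, _ := strconv.ParseUint(f, 10, 64)
		total += v
		if i != 3 && i != 4 { // fields 4 and 5 are idle and iowait
			busy += v
		}
	}
	return busy, total
}

func main() {
	b1, t1 := sample()
	time.Sleep(100 * time.Millisecond)
	b2, t2 := sample()
	fmt.Printf("cpu usage over 100ms: %.1f%%\n", cpuPercent(b1, t1, b2, t2))
}
```

The same trade-off the comment describes applies here: a meaningful number costs a 100 ms startup delay, so dropping the field entirely is a reasonable alternative.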
pkg/envinfo/envinfo.go
```go
func isVirtualFS(fstype, device string) bool {
	fstypeLower := strings.ToLower(fstype)

	virtualFSTypes := []string{
		"fuse", "nfs", "cifs", "smb", "smbfs",
		"tmpfs", "ramfs", "devtmpfs",
		"overlay", "aufs",
		"sshfs", "s3fs", "gcsfuse", "ossfs",
		"9p", "virtiofs",
	}

	for _, vfs := range virtualFSTypes {
		if fstypeLower == vfs || strings.HasPrefix(fstypeLower, vfs+".") {
			return true
		}
	}

	// Devices not under /dev/ are generally not block devices.
	// e.g., "s3fs", "sshfs#user@host:", "server:/export"
	if device != "" && !strings.HasPrefix(device, "/dev/") {
		return true
	}

	return false
}
```
pkg/envinfo/envinfo.go
```go
func formatBytes(b uint64) string {
	const (
		KB = 1024
		MB = KB * 1024
		GB = MB * 1024
		TB = GB * 1024
	)

	switch {
	case b >= TB:
		return fmt.Sprintf("%.2f TB", float64(b)/float64(TB))
	case b >= GB:
		return fmt.Sprintf("%.2f GB", float64(b)/float64(GB))
	case b >= MB:
		return fmt.Sprintf("%.2f MB", float64(b)/float64(MB))
	case b >= KB:
		return fmt.Sprintf("%.2f KB", float64(b)/float64(KB))
	default:
		return fmt.Sprintf("%d B", b)
	}
}
```
The formatBytes function is implemented twice (here and in pkg/iometrics/format.go). Furthermore, the project already depends on github.com/dustin/go-humanize, which provides robust and well-tested functions for formatting byte sizes (e.g., humanize.Bytes). Consider consolidating this logic or using the existing dependency.
Add envinfo package that logs build version, runtime, CPU, memory, and disk usage information at startup. On Linux, cgroup CPU/memory limits are also logged to help diagnose container resource constraints.

Signed-off-by: Zhao Chen <zhaochen.zju@gmail.com>
Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Add pkg/iometrics package with a Tracker that wraps the source io.Reader using atomic counters to measure bytes and cumulative Read() call duration. TrackTransfer() measures per-goroutine wall-clock time. After each operation completes, a summary is output to both the terminal (stderr) and the log file, showing effective throughput, source read throughput, and readFraction (sourceReadTime / transferTime) to help identify whether the bottleneck is in source reading or sink writing.

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Replace generic "source/effective" labels with operation-aware labels (push: disk read / network write, pull: network read / disk write). Derive sink throughput from transferTime - sourceReadTime. Mark the slower side as bottleneck. Add overall effective throughput to output.

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
- Fix cgroup v1/v2: detect CPU and memory limits independently so one missing limit doesn't prevent detecting the other
- Remove cpu.Percent(0) from startup log — unreliable at process start
- Remove unused isVirtualFS function (dead code after IO removal)
- Replace duplicated formatBytes with humanize.IBytes from go-humanize

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
- Read /proc/self/cgroup to find the process's actual cgroup path before reading limit files, so containers in k8s/Docker sub-cgroups get the correct CPU/memory limits instead of root cgroup values
- Wrap manifest bytes.NewReader with tracker.WrapReader so push summaries include manifest bytes in totalBytes

Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
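Locating the process's own cgroup path boils down to parsing /proc/self/cgroup, where v2 lines look like `0::/path` and v1 lines like `4:memory:/docker/<id>`. A sketch of that parsing (`cgroupPath` is a hypothetical helper, not the PR's code):

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupPath extracts this process's cgroup path for a controller from the
// content of /proc/self/cgroup. Each line is "<id>:<controllers>:<path>";
// the v2 unified hierarchy has an empty controller list.
func cgroupPath(content, controller string) string {
	for _, line := range strings.Split(content, "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue
		}
		if parts[1] == "" { // cgroup v2 unified hierarchy
			return parts[2]
		}
		for _, c := range strings.Split(parts[1], ",") {
			if c == controller {
				return parts[2]
			}
		}
	}
	return ""
}

func main() {
	v1 := "4:memory:/docker/abc123\n2:cpu,cpuacct:/docker/abc123\n"
	fmt.Println(cgroupPath(v1, "memory")) // /docker/abc123
	fmt.Println(cgroupPath("0::/kubepods/pod1/ctr\n", "")) // /kubepods/pod1/ctr
}
```

The returned path is then joined under the controller mount (e.g. `/sys/fs/cgroup/memory` for v1, `/sys/fs/cgroup` for v2) before reading the limit files, which is why reading the root-level files alone misreports limits inside k8s/Docker sub-cgroups.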
Summary
Add diagnostic logging to help identify performance bottlenecks in push/pull/fetch operations:
Example Output
Push (100MB to registry, tested on Linux x86_64):
Pull (50MB from registry):
Log file (structured, for automated analysis):
How It Works
Wraps the source `io.Reader` with a `countingReader` using `sync/atomic` counters. Each `Read()` call records bytes and duration. `TrackTransfer()` measures per-goroutine wall-clock time. After all goroutines complete, a summary is printed with the slower side marked "← bottleneck".

Labels are operation-aware: push shows `disk read`/`network write`, pull/fetch shows `network read`/`disk write`.

Changes
New packages:

- `pkg/envinfo/` — CPU, memory, disk usage, cgroup v1/v2 limits (Linux), virtual FS detection
- `pkg/iometrics/` — `Tracker`, `countingReader`, throughput formatting

Modified files:

- `cmd/root.go` — call `envinfo.LogEnvironment()` at startup
- `cmd/build.go`, `cmd/pull.go` — log disk info for work directories
- `pkg/backend/push.go`, `pull.go`, `fetch.go` — integrate `Tracker`

Test Results
Unit tests (macOS, 17 tests):
Covers: concurrent aggregation, timing accuracy, bottleneck detection, error propagation, zero-value safety, data integrity, virtual FS detection.
Remote machine test (Linux x86_64, Intel Xeon E5-2640 v3):
Both correctly identified the bottleneck side.