Arm Telemetry Solution provides a standardized framework for system-level performance analysis across Arm platforms, including CPUs and interconnects such as CMN. It includes telemetry specifications, a data framework, a top-down performance analysis methodology, command-line tools, and validation workloads.
The solution leverages telemetry data from Arm IP to identify performance bottlenecks and improve execution efficiency across the full system stack.
This repository is organized into the following components:
- `data`: Contains the telemetry specification JSON for all supported Arm products, including CPU and CMN interconnect.
- `tools`: Contains telemetry tools and utilities for telemetry data collection, analysis, and visualization.
- `benchmarks`: Contains validation test suites such as `ustress` and `systress`.
The Arm Telemetry Solution enables a unified performance analysis workflow:
- Telemetry Specifications (JSON) define PMU events, metrics, and methodology for CPU and CMN
- Topdown Tool consumes these specifications to collect telemetry data, compute metrics, and apply the Topdown methodology
- Benchmarks (UStress / Systress) validate telemetry metrics and stress specific system components
This enables consistent, methodology-driven analysis across compute and system components.
Arm Topdown Methodology specifies a set of metrics and a performance analysis methodology based on hardware PMU events, helping to identify processor and system bottlenecks during workload execution. The methodology applies across compute and system components, enabling hierarchical analysis from CPU pipeline inefficiencies to interconnect and memory subsystem bottlenecks.
Arm Topdown methodology can be conducted in two stages:
- **Stage 1: Topdown Analysis**: hot-spot analysis using stall-related metrics to locate pipeline bottlenecks.
- **Stage 2: Micro-architecture Exploration**: deeper analysis of the bottlenecked resources, using micro-architecture resource effectiveness metric groups and metrics.
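As a minimal sketch of Stage 1, stall-related metrics are simple ratios of PMU event counts used to pick a bottleneck category. `STALL_FRONTEND` and `STALL_BACKEND` are standard Arm PMU events, but the decision logic and percentages here are illustrative assumptions; real thresholds and metric definitions come from each CPU's telemetry specification.

```python
# Illustrative Stage 1 sketch: locate the dominant pipeline bottleneck from
# stall-related PMU counts. The comparison logic is an assumption made for
# illustration, not the methodology from any specific telemetry specification.
def stage1_hotspot(counts):
    cycles = counts["CPU_CYCLES"]
    frontend = 100.0 * counts["STALL_FRONTEND"] / cycles  # % cycles stalled in frontend
    backend = 100.0 * counts["STALL_BACKEND"] / cycles    # % cycles stalled in backend
    if frontend >= backend:
        return "frontend_bound", frontend
    return "backend_bound", backend

# Example counts from a hypothetical run
label, pct = stage1_hotspot(
    {"CPU_CYCLES": 1_000_000, "STALL_FRONTEND": 150_000, "STALL_BACKEND": 420_000}
)
print(label, round(pct, 1))  # backend_bound 42.0
```

Stage 2 would then drill into the winning category with the matching resource effectiveness metric group.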
With support for both CPU and CMN telemetry, the solution enables cross-component analysis, correlating CPU behavior with interconnect and memory system activity.
The Arm CPU Telemetry Solution enables collection, analysis, and representation of CPU telemetry data on Arm platforms.
- Each supported CPU provides a Telemetry Specification defining PMU events and a metric-driven hierarchical decision tree for hotspot detection. This decision tree is Arm's implementation of the Topdown Methodology for performance analysis.
- Telemetry data is structured in the Arm Telemetry Framework, which standardizes events and metrics into a machine-readable specification (MRS) in JSON. This supports large-scale data collection, processing, and integration with profiling tools.
- The solution includes the Arm Top-Down tool, a simple CLI for profiling applications. It parses the MRS to collect telemetry data and deliver performance insights. The tool is supported on Linux and Windows.
For more information about the Arm CPU Telemetry Solution, see Arm® Telemetry on Arm Developer and the Arm CPU Telemetry Solution Topdown Methodology Specification.
Key chapters from the solution architecture specification are listed below:
| Chapter | Content |
|---|---|
| Arm Topdown Methodology | Topdown methodology and stages for performance analysis (Stage 1 and Stage 2). |
| Arm Telemetry Framework for CPUs | Arm telemetry framework and data model standardization. |
| Arm Telemetry Specification and Profiling Tools | Details on how telemetry specification is enabled for Linux and Windows perf tools. |
| Arm Top-Down tool Example | Arm Top-Down tool data collection example. |
| Linux perf data collection | Linux perf tool data collection example. |
| Windows perf data collection | Windows perf tool data collection example. |
Refer to the Arm Neoverse V1 Performance Analysis Methodology whitepaper for an example Arm Topdown methodology supported by the Neoverse V1 processor, with example case studies.
Key chapters from this whitepaper are listed below:
| Chapter | Content |
|---|---|
| 2 | PMU event and metric cheat sheets for performance analysis |
| 3 | Arm topdown performance analysis methodology (Neoverse V1). This chapter describes the methodology in detail with all metrics. |
| 4 | An example case study to demonstrate how to use our methodology for a code-tuning exercise. |
| Appendix B | Telemetry Specification: PMU events with concise descriptions |
| Appendix C | Telemetry Specification: Metrics and metric groups for performance analysis, derived using PMU events |
Note:
The Arm CPU Telemetry Solution is supported across all Neoverse and Lumex CPUs, with PMU events, metrics, and methodology defined and upstreamed in Linux perf. Support for additional Arm CPUs will be available soon.
The Arm CMN Telemetry Solution extends the telemetry framework to Arm Coherent Mesh Network (CMN) interconnects, enabling system-level performance analysis beyond CPU cores.
- CMN telemetry specifications define PMU events and derived metrics for key interconnect components such as RN-F, HN-F, SN-F, and mesh links.
- These metrics enable visibility into bandwidth utilization, congestion, latency, and traffic distribution across the mesh.
- CMN telemetry integrates with the Arm Topdown methodology and tooling, enabling correlated CPU + interconnect analysis.
- CMN specifications follow the same JSON-based telemetry schema, enabling seamless integration with existing tools and workflows.
This support enables users to:
- Identify system bottlenecks caused by memory and interconnect pressure
- Correlate CPU stalls with fabric-level behavior
- Perform end-to-end performance analysis across compute and data movement
Refer to the Arm Neoverse CMN-700 Performance Analysis Methodology white paper for an example Neoverse CMN Topdown methodology with example case studies.
Key chapters from this whitepaper are listed below:
| Chapter | Content |
|---|---|
| 2 | Overview of Neoverse CMN architecture, telemetry capabilities, and framework |
| 3 | Arm Topdown methodology for Neoverse CMNs, including Stage 1 analysis and metric groups |
| 4 | CMN telemetry specifications and Topdown tool support |
| 5 | Validation of CMN Topdown methodology using the Systress benchmark suite |
| 6 | Case studies demonstrating system-level performance analysis using CMN telemetry |
| Appendix A/B | Data collection using Linux perf and example usage of the Topdown tool |
The building blocks of the Telemetry Framework are as follows.
- **Events**: hardware PMU events that count micro-architectural activity.
- **Metrics**: mathematical relations between events that correlate raw event counts for analyzing the system.
- **Metric Groups**: sets of metrics that can be analysed together for a use case. Metric groups can be components of a methodology.
- **Methodology**: performance analysis approaches common among software consumers and performance analysts.
Arm provides a standardized JSON schema to describe PMU events, derived metrics, and methodology for supported IP blocks (e.g., CPU and CMN) in a single file, enabling seamless integration with tooling.
High level schema structure is as follows:
```json
{
  "events": {},                   // PMU events supported by the CPU
  "metrics": {},                  // Derived metrics supported by the CPU
  "groups": {                     // Grouping of events and metrics
    "function": {},               // Event groups by CPU function
    "metrics": {}                 // Metric groups for analysis/methodology
  },
  "methodologies": {
    "topdown_methodology": {}     // Stages and decision tree for Topdown analysis
  }
}
```
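As a sketch of how tooling can consume such a specification, the snippet below loads a spec fragment, looks up a metric, and evaluates its formula against collected event counts. The spec fragment, metric name, and formula syntax are assumptions made for illustration, not content from a real product file.

```python
import json

# Hypothetical telemetry specification fragment following the schema above.
# The metric definition is illustrative, not from a shipped specification.
SPEC = json.loads("""
{
  "events": {
    "CPU_CYCLES": {"code": "0x11", "title": "Cycle"},
    "INST_RETIRED": {"code": "0x08", "title": "Instruction architecturally executed"}
  },
  "metrics": {
    "ipc": {
      "title": "Instructions Per Cycle",
      "formula": "INST_RETIRED / CPU_CYCLES",
      "units": "per cycle",
      "events": ["INST_RETIRED", "CPU_CYCLES"]
    }
  }
}
""")

def evaluate_metric(spec, name, counts):
    """Evaluate a metric formula by substituting collected event counts."""
    metric = spec["metrics"][name]
    # Restrict evaluation to the events the metric declares; no builtins.
    scope = {event: counts[event] for event in metric["events"]}
    return eval(metric["formula"], {"__builtins__": {}}, scope)

ipc = evaluate_metric(SPEC, "ipc", {"CPU_CYCLES": 2_000_000, "INST_RETIRED": 3_000_000})
print(ipc)  # 1.5
```

The key point is that events carry the hardware codes to program, while metrics are pure expressions over event names, so a tool can plan counter scheduling from `events` and post-process with `formula`.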
Each entry in `events` has the following fields:

| Field | Definition |
|---|---|
| `code` | Event register code for counting |
| `title` | Title of the event |
| `description` | Description of what is being counted for the event |
| `accesses` | Access interface (PMU/ETM) |
| `architecture_defined` | Architecturally defined event, included in the Arm Architecture Reference Manual |
| `product_defined` | Micro-architecture implementation-specific event, specified by the product architecture |
Each entry in `metrics` has the following fields:

| Field | Definition |
|---|---|
| `title` | Title of the metric |
| `formula` | Formula to compute the metric |
| `description` | Description of the metric |
| `units` | Metric unit |
| `events` | Events needed to calculate the metric |
| `sample_events` | Events for sampling if a bottleneck is detected with this metric |
Each methodology entry (such as `topdown_methodology`) has the following fields:

| Field | Definition |
|---|---|
| `title` | Title |
| `description` | Description |
| `metric_grouping` | Metric groups used for each stage of the methodology, added as lists |
| `decision_tree` | Stage 1 Topdown analysis tree with root_nodes and child metrics |
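The decision tree drives the hierarchical drill-down: a node's children are only examined when the node's own metric indicates a bottleneck. The sketch below conveys this idea; the node layout and the `threshold` field are assumptions about the schema, and the metric names and values are invented for illustration.

```python
# Illustrative walk of a Stage 1 decision tree with root_nodes and child
# metrics. Structure, thresholds, and metric names are assumptions made
# for illustration only.
TREE = {
    "root_nodes": [
        {"metric": "frontend_bound", "threshold": 20.0,
         "children": [{"metric": "itlb_walk_ratio", "threshold": 1.0, "children": []}]},
        {"metric": "backend_bound", "threshold": 20.0,
         "children": [{"metric": "l2_cache_miss_ratio", "threshold": 5.0, "children": []}]},
    ]
}

def drill_down(nodes, values, path=()):
    """Follow nodes whose metric value exceeds its threshold; return hot paths."""
    hot = []
    for node in nodes:
        if values.get(node["metric"], 0.0) > node["threshold"]:
            here = path + (node["metric"],)
            hot.append(here)
            hot.extend(drill_down(node["children"], values, here))
    return hot

values = {"frontend_bound": 12.0, "backend_bound": 45.0, "l2_cache_miss_ratio": 9.0}
print(drill_down(TREE["root_nodes"], values))
# [('backend_bound',), ('backend_bound', 'l2_cache_miss_ratio')]
```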
CMN telemetry specifications have two layers:
- top-level data describing the CMN version and revision, plus global system events, metrics, and metric groups
- per-component data under `components`, describing CMN internal devices and port components, with their events, watchpoints, metrics, filters, and groups
The specification contains static information for a CMN revision. It must be completed with mesh topology discovery to know which CMN instances, XP IDs, node IDs, and ports exist on a given system.
The high-level schema structure is described below.
At the top level, `product_configuration` identifies the CMN version and revision described by the file. On Linux, this corresponds to the CMN perf device identifier. Global events and metrics define helper inputs and top-level derived metrics. The `components` section contains the per-component content used for analysis.
Within `components`, CMN internal devices are identified by `product_configuration.device_id`. Port components such as RNF, SNF, and CCG are resolved from the component name against the port device types defined for the system. Not every component has every optional section.
| Field | Definition |
|---|---|
| `code` | Event selector value for a CMN internal device event |
| `title` | Human-readable event name |
| `description` | Description of what the event counts |
| `system` | Optional flag marking a system/global helper event rather than a normal component-local PMU event |
Events are the basic PMU inputs used by CMN metrics. They apply to CMN internal device components, such as HN-F or HN-I. The specification also includes some global helper inputs, such as `SYS_FREQUENCY` and `SYS_CMN_CYCLES`, which are used by metrics but are not ordinary component-local event definitions.
| Field | Definition |
|---|---|
| `description` | Description of what the watchpoint is intended to count |
| `wp_val` | Value programmed into the watchpoint match |
| `wp_mask` | Mask that selects which bits participate in the match |
| `mesh_flit_dir` | CHI flit direction to monitor |
| `wp_chn_sel` | CHI channel to monitor |
| `wp_grp` | CMN watchpoint group to use |
| `field_name` / `field_value` | Decoded interpretation of the mask/value pair |
Watchpoints count protocol traffic patterns at the port level. This is important because a single port can contain multiple devices. They are especially important for port components such as RNF, SNF, and CCG, and are also used by some internal-device metrics.
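A minimal sketch of the value/mask idea: a flit matches when the bits selected by the mask agree with `wp_val`. The bit-width and the convention that a set mask bit means "don't care" are assumptions made here for illustration; consult the CMN TRM for the exact hardware encoding.

```python
# Sketch of watchpoint matching: a flit field matches when its unmasked bits
# equal wp_val. We ASSUME a set bit in wp_mask means "don't care"; this is an
# illustrative convention, not a statement of the hardware encoding.
U64 = 0xFFFF_FFFF_FFFF_FFFF

def watchpoint_matches(flit_bits, wp_val, wp_mask):
    care = ~wp_mask & U64  # bits that participate in the match
    return (flit_bits & care) == (wp_val & care)

# Match any flit whose low nibble is 0x3, ignoring all other bits.
mask = ~0xF & U64
print(watchpoint_matches(0xABC3, wp_val=0x3, wp_mask=mask))  # True
print(watchpoint_matches(0xABC4, wp_val=0x3, wp_mask=mask))  # False
```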
| Field | Definition |
|---|---|
| `title` | Human-readable metric name |
| `formula` | Expression used to compute the metric |
| `description` | Description of what the metric means |
| `units` | Unit for the computed value |
| `events` | Raw PMU event inputs needed to calculate the metric |
| `watchpoints` | Watchpoint inputs needed to calculate the metric |
| `metrics` | Dependent metrics that must be resolved first |
| `filters` | Optional filter settings applied to specific metric events |
Metrics are the user-facing derived values reported by the tooling. A metric can combine several kinds of inputs: raw PMU events, watchpoints, or other metrics. In the shipped CMN files, the RNF, SNF, and CCG metrics are typically watchpoint-based metrics normalized by `SYS_CMN_CYCLES`.
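Because a metric may reference other metrics through the `metrics` field, a tool has to resolve dependencies before evaluating a formula. The sketch below shows one way to do that recursively; the metric names, formulas, and counts are invented for illustration and do not come from a shipped CMN file.

```python
# Sketch of resolving CMN metric dependencies: dependent metrics are computed
# first, then the formula is evaluated over events, watchpoints, and resolved
# metrics. All names and formulas here are illustrative assumptions.
def resolve(metrics, name, inputs, cache=None):
    """Recursively compute a metric, resolving dependent metrics first."""
    cache = {} if cache is None else cache
    if name in cache:
        return cache[name]
    spec = metrics[name]
    scope = dict(inputs)  # raw event and watchpoint counts
    for dep in spec.get("metrics", []):
        scope[dep] = resolve(metrics, dep, inputs, cache)
    cache[name] = eval(spec["formula"], {"__builtins__": {}}, scope)
    return cache[name]

METRICS = {
    "rnf_read_bandwidth": {"formula": "rnf_read_flits * 32 / SYS_CMN_CYCLES",
                           "events": ["SYS_CMN_CYCLES"],
                           "watchpoints": ["rnf_read_flits"]},
    "rnf_total_bandwidth": {"formula": "rnf_read_bandwidth * 2",
                            "metrics": ["rnf_read_bandwidth"]},
}
inputs = {"rnf_read_flits": 1_000, "SYS_CMN_CYCLES": 4_000}
print(resolve(METRICS, "rnf_total_bandwidth", inputs))  # 16.0
```

The cache ensures a shared dependency is only computed once even if several metrics refer to it.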
| Field | Definition |
|---|---|
| `encodings` | Mapping from symbolic filter names to numeric encoding values |
| `access.register` | Register name where the filter is defined |
| `access.field` | Register field controlled by the filter |
`filter_specification` is the component-local catalog of filters that metrics can refer to. The main operational information is the `encodings` map, which lets a tool resolve a symbolic filter name to the numeric selector value needed for that metric.
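A small sketch of that resolution step follows. The filter name, register name, field name, and encoding values are all invented for illustration; a real catalog comes from the component's `filter_specification` section.

```python
# Sketch of resolving a symbolic filter name via a component-local
# filter_specification. Every identifier below is a hypothetical example.
FILTER_SPEC = {
    "opcode_filter": {
        "encodings": {"read": 0x1, "write": 0x2, "atomic": 0x6},
        "access": {"register": "example_filter_ctl", "field": "opcode_sel"},
    }
}

def resolve_filter(spec, filter_name, symbolic_value):
    """Return (register, field, numeric encoding) to program for a filter."""
    entry = spec[filter_name]
    return (entry["access"]["register"],
            entry["access"]["field"],
            entry["encodings"][symbolic_value])

print(resolve_filter(FILTER_SPEC, "opcode_filter", "write"))
# ('example_filter_ctl', 'opcode_sel', 2)
```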
| Field | Definition |
|---|---|
| `groups.function` | Groups of related raw events for a functional area |
| `groups.metrics` | Groups of related derived metrics |
Groups are convenience bundles that help users and tools select a meaningful set of events or metrics without listing them one by one. They do not change the underlying event or watchpoint definitions.
The tooling stack enables collection, parsing, and analysis of telemetry data. The Arm Top-Down tool serves as the primary entry point for methodology-driven performance analysis across CPU and CMN telemetry.
| Name | Description | Folder |
|---|---|---|
| Arm Top-Down tool | Primary CLI tool implementing the Arm Topdown methodology across CPU and CMN telemetry. It consumes telemetry specifications (JSON) to collect PMU and hardware telemetry data, compute metrics, and apply the Topdown methodology for quick analysis. | tools/topdown_tool |
| Perf JSON Generator | Tool to generate JSON files for Linux perf tool which enable and document Arm PMU events and metrics. | tools/perf_json_generator |
| SPE Parser | Tool to parse SPE raw data and generate a Parquet or CSV file for further processing and analysis. | tools/spe_parser |
| UStress Charts | Visualization tooling for metrics generated from the ustress suite workloads. | tools/ustress_charts |
The benchmarks folder contains validation test suites used to stress CPU and system resources (including interconnect and memory subsystem) and validate the telemetry solution.
| Name | Description | Folder |
|---|---|---|
| Ustress Suite | Validation workload suite to stress test major CPU resources. | benchmarks/ustress |
| Systress Suite | System-level stress and validation suite targeting CMN and memory subsystem behavior. | benchmarks/systress |
| Matrix Multiplication Kernels | Dense matmul variants (naïve, loop-reordered, blocked) for locality and cache reuse studies. | benchmarks/matmul |
| Random Pointer Access | Pointer-chasing microbenchmark with optional software prefetch tuning. | benchmarks/random_pointer_access |
| SysStress Suite | Validation workload suite to stress multicore CPUs. | benchmarks/sysstress |
For feedback, collaboration or support, contact telemetry-solution@arm.com.
This project is licensed under Apache-2.0. See LICENSE.md for more details.
The high-level CMN telemetry specification schema structure is as follows:

```json
{
  "document": {},                   // Document metadata
  "product_configuration": {},      // CMN version and revision for the file
  "events": {},                     // Global/system helper inputs such as SYS_FREQUENCY
  "metrics": {},                    // Top-level composite metrics
  "groups": {
    "metrics": {}                   // Top-level metric groups
  },
  "components": {
    "<component_name>": {           // Example: HNF, HNS, RNF, SNF, CCG
      "product_configuration": {
        "device_id": 5              // Internal CMN device identifier when applicable
      },
      "filter_specification": {},   // Optional component-local filter catalog
      "events": {},                 // Component-local PMU events
      "watchpoints": {},            // Component-local watchpoint definitions
      "metrics": {},                // Component-local derived metrics
      "groups": {
        "function": {},             // Functional groups of related events
        "metrics": {}               // Groups of related derived metrics
      }
    }
  }
}
```