
Migrate metrics from OpenCensus to OpenTelemetry#3332

Open
infernus01 wants to merge 4 commits into tektoncd:main from infernus01:migration-otel

Conversation

@infernus01
Member

@infernus01 infernus01 commented Apr 6, 2026

Changes

Migrates Operator metrics from the deprecated OpenCensus library to OpenTelemetry.

  • Replace OpenCensus imports (go.opencensus.io/stats, stats/view, tag) with OpenTelemetry (go.opentelemetry.io/otel, otel/attribute, otel/metric) in tektonpipeline/metrics.go, tektontrigger/metrics.go, tektonchain/metrics.go, and tektonresult/metrics.go
  • Convert counter metrics to metric.Int64Counter and gauge metric to metric.Float64Gauge
  • Rewrite tektonresult/metrics_test.go to use sdkmetric.ManualReader for asserting actual collected metric values
  • Update config-observability.yaml to use OTel configuration keys (metrics-protocol, tracing-protocol) and remove legacy OpenCensus keys
  • Bump knative.dev/pkg to v0.0.0-20260318013857-98d5a706d4fd (OTel observability stack)
  • Regenerate client code via ./hack/update-codegen.sh

/kind feature

Closes: #3307
Fixes: #3331

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Release Notes

Migrated operator metrics from OpenCensus to OpenTelemetry.

ACTION REQUIRED:

1. Configuration key change
   Replace metrics.backend-destination with metrics-protocol in your tekton-config-observability ConfigMap.

2. Infrastructure metric renaming

   ┌──────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────┐
   │ Old Metric Name (OpenCensus)                                     │ New Metric Name (OpenTelemetry)                    │
   ├──────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────┤
   │ tekton_operator_lifecycle_workqueue_adds_total                   │ kn_workqueue_adds_total                            │
   │ tekton_operator_lifecycle_workqueue_depth                        │ kn_workqueue_depth                                 │
   │ tekton_operator_lifecycle_workqueue_queue_latency_seconds        │ kn_workqueue_queue_duration_seconds                │
   │ tekton_operator_lifecycle_workqueue_work_duration_seconds        │ kn_workqueue_process_duration_seconds              │
   │ tekton_operator_lifecycle_workqueue_unfinished_work_seconds      │ kn_workqueue_unfinished_work_seconds               │
   │ tekton_operator_lifecycle_client_latency                         │ http_client_request_duration_seconds               │
   │ tekton_operator_lifecycle_client_results                         │ kn_k8s_client_http_response_status_code_total      │
   │ tekton_operator_lifecycle_go_*                                   │ go_*                                               │
   │ tekton_operator_lifecycle_reconcile_count                        │ removed (use kn_workqueue_adds_total)              │
   │ tekton_operator_lifecycle_reconcile_latency                      │ removed (use kn_workqueue_process_duration_seconds)│
   └──────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────┘
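
As a migration aid, the table above can be applied mechanically to existing dashboard or alert queries. The sketch below is one possible approach using only the Go standard library; the function and variable names are hypothetical, and it assumes plain textual substitution is good enough for your queries (label names and histogram suffixes are left untouched).

```go
package main

import (
	"fmt"
	"strings"
)

// renamed holds old/new name pairs taken from the table above. The last pair
// implements the go_* wildcard row as a prefix rewrite.
var renamed = []string{
	"tekton_operator_lifecycle_workqueue_adds_total", "kn_workqueue_adds_total",
	"tekton_operator_lifecycle_workqueue_depth", "kn_workqueue_depth",
	"tekton_operator_lifecycle_workqueue_queue_latency_seconds", "kn_workqueue_queue_duration_seconds",
	"tekton_operator_lifecycle_workqueue_work_duration_seconds", "kn_workqueue_process_duration_seconds",
	"tekton_operator_lifecycle_workqueue_unfinished_work_seconds", "kn_workqueue_unfinished_work_seconds",
	"tekton_operator_lifecycle_client_latency", "http_client_request_duration_seconds",
	"tekton_operator_lifecycle_client_results", "kn_k8s_client_http_response_status_code_total",
	"tekton_operator_lifecycle_go_", "go_",
}

// removed lists the metrics the table marks as removed. They are reported
// rather than rewritten, because the suggested substitutes measure something
// slightly different and need a human decision.
var removed = []string{
	"tekton_operator_lifecycle_reconcile_count",
	"tekton_operator_lifecycle_reconcile_latency",
}

var replacer = strings.NewReplacer(renamed...)

// rewriteQuery returns the query with renamed metrics substituted, plus the
// removed metrics that the query still references.
func rewriteQuery(query string) (string, []string) {
	var gone []string
	for _, name := range removed {
		if strings.Contains(query, name) {
			gone = append(gone, name)
		}
	}
	return replacer.Replace(query), gone
}

func main() {
	out, gone := rewriteQuery(`rate(tekton_operator_lifecycle_workqueue_adds_total[5m])`)
	fmt.Println(out)  // rate(kn_workqueue_adds_total[5m])
	fmt.Println(gone) // []
}
```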

@tekton-robot tekton-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. labels Apr 6, 2026
@tekton-robot tekton-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 6, 2026
@infernus01 infernus01 force-pushed the migration-otel branch 3 times, most recently from 0dc71d3 to f61fbf6 Compare April 7, 2026 07:49
Comment thread go.mod
k8s.io/client-go => k8s.io/client-go v0.32.4
k8s.io/code-generator => k8s.io/code-generator v0.32.4
k8s.io/kube-openapi => k8s.io/kube-openapi v0.0.0-20250627150254-e9823e99808e
knative.dev/eventing => knative.dev/eventing v0.30.3
Member Author

@infernus01 infernus01 Apr 7, 2026

I don't see the eventing package being consumed anywhere in the code.

@infernus01 infernus01 force-pushed the migration-otel branch 2 times, most recently from 86e5edc to d7b90d1 Compare April 7, 2026 08:53
@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 7, 2026
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2026
Contributor

@khrm khrm left a comment

I think we should remove metrics from the operator. They aren't used by customers or users. They were added for telemetry, but they don't work properly for that either.

If I get time, I will write more on that.

Comment thread pkg/reconciler/kubernetes/tektonchain/metrics.go Outdated
Comment thread pkg/reconciler/kubernetes/tektonchain/metrics.go Outdated
Comment thread pkg/reconciler/kubernetes/tektonchain/metrics.go Outdated
Comment thread pkg/reconciler/kubernetes/tektonpipeline/metrics.go Outdated
Comment thread pkg/reconciler/kubernetes/tektonresult/metrics.go Outdated
Comment thread pkg/reconciler/kubernetes/tektontrigger/metrics.go Outdated
@jkhelil
Member

jkhelil commented Apr 9, 2026

I think we should remove metrics from the operator. They aren't used by customers or users. They were added for telemetry, but they don't work properly for that either.

If I get time, I will write more on that.

We have stories to enhance the operator's overall metrics and to move to OLM level 3 and then level 4. I haven't dug into the details of these levels, but let's not remove metrics for now, until we have properly tackled those stories, please.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2026
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2026
@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2026
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 20, 2026
@jkhelil
Member

jkhelil commented Apr 20, 2026

@khrm I would like to move forward with this. Do you LGTM, or do you still think we need another approach that removes metrics from the operator?

@jkhelil
Member

jkhelil commented Apr 20, 2026

@anithapriyanatarajan given your work on the OpenTelemetry assessment, I would like your review on this one, please. I would like to move forward on it.

@khrm
Contributor

khrm commented Apr 20, 2026

We have stories to enhance the operator's overall metrics and to move to OLM level 3 and then level 4. I haven't dug into the details of these levels, but let's not remove metrics for now, until we have properly tackled those stories, please.

I was talking about removing the custom metrics that we added to the operator.
Other metrics from Knative would surface anyway. We don't use the former custom metrics anywhere, and they aren't that usable. They were added for telemetry purposes, but they couldn't fulfil that either. We never documented those metrics, so documenting them now as part of the release notes doesn't make sense.

Even if we don't remove them, why should we surface them in the release notes? It will confuse users.

@infernus01
Member Author

@khrm For now I have updated the release notes section to not include those custom metrics.
Do you want me to remove those metrics from the operator in this PR, or should we raise a follow-up issue to do so?

@jkhelil
Member

jkhelil commented Apr 21, 2026

@khrm understood now. @infernus01 would you be able to update the PR then? I think it makes sense to remove those, to start with a clean metrics list.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2026
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2026
@infernus01 infernus01 requested review from jkhelil and khrm April 22, 2026 06:48
Comment thread config/base/config-observability.yaml
Contributor

@anithapriyanatarajan anithapriyanatarajan left a comment

@infernus01 Please consider mentioning the metrics that were removed in the release notes. Also remove the metrics.go file if no metrics are recorded.

@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign puneetpunamiya after the PR has been reviewed.
You can assign the PR to them by writing /assign @puneetpunamiya in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Update knative.dev/pkg to OTel-based observability stack,
bump k8s.io/* to v0.35.2, openshift/* to k8s-1.35-compatible
versions, knative.dev/eventing, tektoncd/pipeline to v1.11.0,
and go.opentelemetry.io/otel to v1.42.0. This removes
knative.dev/pkg/metrics (OpenCensus) from the vendor tree and
brings in knative.dev/pkg/observability (OpenTelemetry).

Signed-off-by: Shubham Bhardwaj <shubbhar@redhat.com>

Run ./hack/update-codegen.sh to regenerate client code,
informers, and injection reconcilers after the knative.dev/pkg
dependency bump.

Signed-off-by: Shubham Bhardwaj <shubbhar@redhat.com>

Replace OpenCensus imports (go.opencensus.io/stats, stats/view, tag)
with OpenTelemetry (go.opentelemetry.io/otel, otel/attribute,
otel/metric).

Also updates config-observability.yaml to use OTel configuration
keys (metrics-protocol, tracing-protocol) and removes legacy
OpenCensus keys.

Signed-off-by: Shubham Bhardwaj <shubbhar@redhat.com>

@infernus01
Member Author

@infernus01 Please consider mentioning the metrics that were removed in the release notes. Also remove the metrics.go file if no metrics are recorded.

Wouldn't that conflict with @khrm's comment that, since these metrics were never documented, we shouldn't mention them in the release notes?

The custom metrics (pipeline_reconcile_count, trigger_reconcile_count,
chains_reconciled, results_reconciled) were never documented, never
consumed by any dashboard or alert.

Infrastructure metrics from Knative (workqueue, HTTP client, Go runtime)
continue to be emitted.

Signed-off-by: Shubham Bhardwaj <shubbhar@redhat.com>

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
release-note-action-required: Denotes a PR that introduces potentially breaking changes that require user action.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate metrics from OpenCensus to OpenTelemetry

5 participants