At GitLab, we've been running prometheus-client-mmap to record and report Prometheus metrics. It's an early fork of this project and uses a C extension with mmap() to optimize both the read and write paths. We'd like to switch to DirectFileStore, but we see a number of issues:
File descriptor usage: I don't believe DirectFileStore's approach of recording one metric per file will work at scale. With thousands of metrics, that's a lot of file descriptors that may have to be opened to read and write metrics. In prometheus-client-mmap we use one file per metric type (counter, histogram, gauge). I see there was an attempt to use one file per process in One file per process instead of per metric/process for DirectFileStore [DOESNT WORK] #161.
Aggregating metrics in a separate process: The metrics stored by prometheus-client-mmap can be aggregated by a separate process since the metric type is in the filename. With DirectFileStore, I believe the types are registered in the registry, so there's no way an outside process can determine the metric type just by scanning the .bin files.
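To make this concrete, here is a minimal sketch of how an outside process can recover the metric type from a file path alone, with no access to the Ruby registry. The naming scheme below (`<type>_<pid>.bin`) is hypothetical, chosen only to illustrate the idea, and is not prometheus-client-mmap's exact format:

```ruby
# Hypothetical on-disk layout: one file per metric type per process,
# e.g. counter_1234.bin, histogram_1234.bin, gauge_1234.bin.
TYPES = %w[counter gauge histogram summary].freeze

# Derive the metric type from the filename, without consulting the registry.
def metric_type_from_path(path)
  name = File.basename(path, '.bin')
  type, _pid = name.split('_', 2)
  TYPES.include?(type) ? type.to_sym : nil
end

metric_type_from_path('/tmp/metrics/counter_1234.bin')   # => :counter
metric_type_from_path('/tmp/metrics/histogram_1234.bin') # => :histogram
```

With DirectFileStore's current layout, nothing equivalent is possible: the `.bin` filename carries the metric name but not its type.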
I would like to propose a path forward:
Switch DirectFileStore to use one file per process, or one file per metric type. The latter is simpler from our experience. If we do the former, we'd probably want to encode the metric type in the file.
Add an optional Rust extension for reading metrics generated by DirectFileStore. We can easily adapt the work in our Rust port for DirectFileStore.
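For the one-file-per-process option, encoding the metric type in the file could be as simple as a one-byte tag in front of each entry. A rough sketch of what that might look like; the entry format here (tag, length-prefixed key, float64 value, all little-endian) is invented for illustration and is not DirectFileStore's actual serialization:

```ruby
# Hypothetical entry format for a one-file-per-process store: a one-byte
# type tag lets a reader tell counters from gauges without the registry.
TYPE_TAGS = { counter: 0, gauge: 1, histogram: 2, summary: 3 }.freeze

def encode_entry(type, key, value)
  payload = key.b
  # uint8 tag, uint32 little-endian key length, key bytes,
  # then a float64 little-endian value.
  [TYPE_TAGS.fetch(type), payload.bytesize].pack('CV') + payload + [value].pack('E')
end

def decode_entry(bytes)
  tag, len = bytes.unpack('CV')
  key = bytes[5, len]
  value = bytes[5 + len, 8].unpack1('E')
  [TYPE_TAGS.key(tag), key, value]
end

decode_entry(encode_entry(:counter, 'http_requests_total', 42.0))
# => [:counter, "http_requests_total", 42.0]
```

The per-metric-type-file option avoids this extra framing entirely, which is part of why it's simpler.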
A side point: I believe the metrics are always written in the native endian format. I propose that we enforce little endian (which is what x86 and ARM64 use) to avoid cross-platform confusion.
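To illustrate the endianness point: Ruby's `Array#pack` distinguishes native byte order from explicit byte order, so enforcing little endian is essentially a one-directive change. This is a sketch of the pack directives involved, not DirectFileStore's actual serialization code:

```ruby
value = 1.5

native = [value].pack('d') # float64, native byte order: varies by platform
little = [value].pack('E') # float64, little-endian: identical everywhere
big    = [value].pack('G') # float64, big-endian, for comparison

# On x86/ARM64 hosts, native and little agree, but a reader on a
# big-endian host would mis-decode natively written bytes. For an
# 8-byte double, the big-endian form is simply the reversed bytes:
little.unpack1('E')   # => 1.5
big == little.reverse # => true
```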
One more issue worth noting, on metric aggregation performance (DirectFileStore creates too many files for long running applications #143, slow response on metrics http endpoint #194): aggregating thousands of metrics in Ruby is pretty CPU and memory intensive. We found this was difficult to do efficiently even with a Go exporter that reads the prometheus-client-mmap metrics, since garbage collection becomes an issue due to memory allocations. We have a prototype Rust port that handles reading of metrics and is faster than the C implementation.
@dmagliola What do you think?