feat: add diskann index #369
… feat/diskann_index
const std::vector<float> &b) const {
  if (a.size() != b.size()) return false;
  for (size_t i = 0; i < a.size(); ++i)
    if (std::fabs(a[i] - b[i]) >= 1e-4f) return false;
I relaxed the tolerance while tracking down a bug during testing; I have changed it back.
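For reference, the tolerance check in the hunk above can be written as a standalone helper. This is only a sketch: the name `approx_equal` and the `tol` parameter are assumptions for illustration, not identifiers from the PR, and the default of `1e-4f` simply mirrors the constant in the diff. Unlike the `>=` form in the diff, this variant also treats NaN as a mismatch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): element-wise comparison of two
// float vectors using an absolute tolerance.
inline bool approx_equal(const std::vector<float> &a,
                         const std::vector<float> &b,
                         float tol = 1e-4f) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Written as !(diff < tol) so that a NaN difference is rejected:
    // every ordered comparison involving NaN evaluates to false.
    if (!(std::fabs(a[i] - b[i]) < tol)) return false;
  }
  return true;
}
```

Keeping the tolerance as a parameter makes it harder for a temporarily loosened value to slip into the committed test.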
-Wl,--whole-archive
$<TARGET_FILE:core_knn_flat_static>
$<TARGET_FILE:core_knn_flat_sparse_static>
$<TARGET_FILE:core_knn_hnsw_static>
run: |
  sudo apt-get update
  sudo apt-get install -y --no-install-recommends \
    libaio-dev
What happens if the user's environment does not have libaio-dev installed?
Currently the default build requires libaio to be installed; this could be made configurable. Qwen's suggestion is to install the libaio library through the Linux package manager:
Installation
zvec requires the libaio system library on Linux.
On Ubuntu/Debian:
sudo apt-get install libaio1 libaio-dev
pip install zvec
What happens if it is not installed? The expected behavior is that if the user does not install aio, every feature other than diskann keeps working.
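One way to get the behavior requested above is to gate only the DiskANN code path on a build-time flag, so that the remaining index types never depend on libaio. The sketch below is an assumption-laden illustration: the macro `ZVEC_HAVE_LIBAIO`, the enum `IndexKind`, and both functions are hypothetical names, not part of zvec.

```cpp
// Hypothetical index kinds; the real zvec type names may differ.
enum class IndexKind { kFlat, kHnsw, kDiskAnn };

// ZVEC_HAVE_LIBAIO is an assumed macro that the build system would define
// only when the libaio headers and library were found.
constexpr bool diskann_supported() {
#ifdef ZVEC_HAVE_LIBAIO
  return true;
#else
  return false;
#endif
}

// Every index kind except DiskANN is available regardless of libaio.
inline bool index_kind_available(IndexKind kind) {
  return kind != IndexKind::kDiskAnn || diskann_supported();
}
```

With a check like this, requesting a DiskANN index on a build without libaio could return a clear error instead of failing at load time, while flat and HNSW indexes are unaffected.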
  pytest \
  scikit-build-core \
  setuptools_scm
shell: bash
Please add bash back for consistency. Also, if this later becomes a multi-line command, it may break in environments where bash is not the default shell.
}

auto &pool = ctx->expanded_nodes();
for (uint32_t i = 0; i < pool.size(); i++) {
You could use std::remove_if + erase here; it is more efficient.
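The erase-remove idiom suggested above can be sketched as follows. The element type `ExpandedNode` and the helper `prune_nodes` are assumptions made for illustration; the PR's actual pool element type and removal condition are not visible in this hunk.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical element type standing in for the entries of the pool.
struct ExpandedNode {
  std::uint32_t id;
  float distance;
};

// Removes every element for which should_drop returns true in a single
// linear pass, preserving the relative order of the elements that remain.
template <typename Pred>
void prune_nodes(std::vector<ExpandedNode> &pool, Pred should_drop) {
  pool.erase(std::remove_if(pool.begin(), pool.end(), should_drop),
             pool.end());
}
```

Compared with erasing inside an index-based loop, which shifts the tail of the vector on every removal, this moves each surviving element at most once.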
virtual ~DiskAnnQueryParams() = default;

int list_size() const {
}

for (size_t i = 0; i < dimension; i++) {
  centroid_data_ptr[i] /= entity_.doc_cnt();
Can entity_.doc_cnt() be 0?
Added an up-front check:
if (ailego_unlikely(holder->count() == 0)) {
  LOG_ERROR("Holder is empty");
  return IndexError_Runtime;
}
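To show why the guard matters, here is a minimal, self-contained sketch of a centroid computation that refuses to divide by a zero document count. The function `compute_centroid` and its signature are hypothetical; the PR operates on its own holder and entity types rather than on nested vectors.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: averages the input vectors into centroid.
// Returns false and leaves centroid untouched when the input is empty
// or when any vector does not have the expected dimension.
inline bool compute_centroid(const std::vector<std::vector<float>> &docs,
                             std::size_t dimension,
                             std::vector<float> &centroid) {
  if (docs.empty()) return false;  // avoids dividing by a zero count
  std::vector<float> sum(dimension, 0.0f);
  for (const auto &v : docs) {
    if (v.size() != dimension) return false;
    for (std::size_t i = 0; i < dimension; ++i) sum[i] += v[i];
  }
  const float inv = 1.0f / static_cast<float>(docs.size());
  for (float &x : sum) x *= inv;
  centroid = std::move(sum);
  return true;
}
```

Without the early return, dividing float components by a count of zero would yield NaN or infinity, which would then corrupt the medoid selection that follows.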
(*entity_.mutable_medoid()) = medoid_id;

LOG_INFO("Medroid Calculation Done. ID: %zu", (size_t)medoid_id);
sector_internal_id_++;
if (sector_internal_id_ >= sector_vec_num_) {
  std::vector<uint8_t> padding_(padding_size_, 0);
There is no need to allocate a temporary std::vector here, is there?
std::memset(data_ptr + data_size_, 0, padding_size_);
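The in-place zero fill suggested above might look like the sketch below. The helper `zero_pad_sector` and its parameters are assumptions for illustration; in the PR the buffer pointer and sizes are members such as `data_ptr`, `data_size_`, and `padding_size_`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zero-fills the unused tail of a sector buffer in
// place, so no temporary padding vector has to be allocated and copied.
inline void zero_pad_sector(std::uint8_t *sector, std::size_t sector_size,
                            std::size_t data_size) {
  assert(data_size <= sector_size);  // payload must fit in the sector
  std::memset(sector + data_size, 0, sector_size - data_size);
}
```

This assumes the sector is already staged in a writable buffer; if the padding is written straight to a stream instead, a reusable zero buffer would be the closer equivalent.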
float *centroid_data_{nullptr};

diskann_id_t medoid_;
std::vector<diskann_id_t> entrypints_;
##
## Copyright (C) The Software Authors. All rights reserved.
##
## \file CMakeLists.txt
Please remove this and replace it with the zvec copyright.
int list_size() const {
  return list_size_;
}
The parameters exposed here (query_params/index_params) seem to differ from those of other similar products. What is the reasoning?
This stays consistent with diskann, which uses list size.
Adds a DiskANN index to zvec to lower memory usage in vector search, as described in #325.