MDEV-37070 per-index adaptive hash index parameters#4507
Conversation
|
|
|
|
|
Yes, the combinations I'm tuning the test. diff --git a/mysql-test/suite/innodb/t/index_ahi_option.test b/mysql-test/suite/innodb/t/index_ahi_option.test
index 5b86c2a250d..7c92eab581e 100644
--- a/mysql-test/suite/innodb/t/index_ahi_option.test
+++ b/mysql-test/suite/innodb/t/index_ahi_option.test
@@ -46,8 +46,8 @@ INSERT INTO t1 VALUES (50001, 50, 1, 1), (50002, 50, 2, 2),
--echo #
SET GLOBAL innodb_monitor_enable = module_adaptive_hash;
-let $warm_up_rounds= 200;
-let $query_rounds= 10;
+let $warm_up_rounds= 0;
+let $query_rounds= 140;
--echo # Warming up AHI
let $i = $warm_up_rounds;
@@ -194,7 +194,7 @@ let $val0 = $val1;
ALTER TABLE t1 adaptive_hash_index='ON';
--echo # Warming up AHI
-let $i = 200;
+let $i = $warm_up_rounds;
while ($i)
{
--disable_query_logI'll run again the tests with |
dr-m
left a comment
There was a problem hiding this comment.
It’s a great idea to combine several fields into a single 32-bit atomic field. But, some "pretty" code can actually result in some quite ugly machine code (with conditional branches).
| - bit 23: fields valid | ||
| - bit 24: bytes valid | ||
| - bit 25: left valid | ||
| - bits [26, 31]: spare (6 bits) |
There was a problem hiding this comment.
It could be better to allocate something useful (such as part of the enabled/disabled preference) at the most significant bit. For example, on x86, the test instruction would allow the most significant bit to be copied to a flag.
There was a problem hiding this comment.
I kept this as is in the last version, since I did not want to change too much the new inline bool is_enabled(const dict_index_t *index) const noexcept logic. Is this acceptable?
e445a83 to
769793f
Compare
769793f to
fbc8869
Compare
fbc8869 to
972c1fd
Compare
|
All the commits have been squashed into one with description updated accordingly. @Thirunarayanan I believe review can start anyway since there's a lot to cover, I'll address the TODO and ping you when it's done if you agree. Thanks. |
fbdc7b9 to
67a57fc
Compare
|
@Thirunarayanan the TODO has been addressed by adding a comment in |
Thirunarayanan
left a comment
There was a problem hiding this comment.
Few test case scenarios are missing
- ALTER TABLE AHI=0 with concurrent AHI Access
- Buffer pool resize with ahi setting change
- Crash recovery with ahi just to see how it retains for the table metadata
- AHI in partition table
- Verify altering ahi is instant DDL
- Boundary check for complete_fields, bytes_from_incomplete_fields
|
|
||
| innobase_copy_frm_flags_from_table_share( | ||
| ctx->new_table, altered_table->s); | ||
| innobase_copy_frm_flags_from_table( |
There was a problem hiding this comment.
IIUC, alter table ahi=0 is no rebuild operation. When do we drop ahi entries of the indexes if we disable ahi for the table? Do we do it lazily somewhere? Since it is commit phase, there will be no other thread will read from ahi at this point
There was a problem hiding this comment.
Current behavior will keep the entries in the buffer pool until the pages are evicted.
In btr_search_update_hash_on_insert(), the stale entries will still be updated, unless we add check that btr_search.is_enabled(index) in the early return path:
if (!index)
return;
Currently, the tests run ALTER TABLE with ALGORITHM=INSTANT, LOCK=NONE, for example:
ALTER TABLE t1 adaptive_hash_index=OFF, ALGORITHM=INSTANT, LOCK=NONE;
If we drop the AHI entries on ALTER TABLE, it may delay the DDL execution, but will clean earlier the buffer pool and AHI partition.
I suppose the change in btr_search_update_hash_on_insert() may be implemented anyway.
What is the preferred way to proceed here?
There was a problem hiding this comment.
4f12d3c added early return in btr_search_update_hash_on_insert() if the AHI index is disabled for the index, preventing maintenance for indices that won't use the AHI.
| @param innodb_table InnoDB table definition | ||
| @param option_struct MariaDB table options | ||
| @param table MariaDB table handle */ | ||
| static void innodb_ahi_enable(dict_table_t *innodb_table, |
There was a problem hiding this comment.
Are we missing the boundary check like
CREATE TABLE t1 (
id INT PRIMARY KEY,
col1 VARCHAR(100),
INDEX idx3 (col1)
complete_fields=0
bytes_from_incomplete_field=0
) ENGINE=InnoDB;
Why complete_fields and bytes_from_incomplete_field is 0?
CREATE TABLE t2 (
id INT PRIMARY KEY,
col1 INT,
INDEX idx2 (col1)
complete_fields=5
) ENGINE=InnoDB;
Even though index has 1 field. We should also check whether bytes_from_incomplete_field is used when complete_field + 1 exist. Does it make sense to use bytes_from_incomplete_field for integer? (should we restrict bytes_from_incomplete_field to text, varchar, blob, varbinary?)
There was a problem hiding this comment.
There is a test in index_ahi_option_debug.test that the complete_fields=0 bytes_from_incomplete_field=0 results in no AHI usage:
ALTER TABLE t1_ahi_yes ADD INDEX idx_6 (col2, col1) complete_fields=0 bytes_from_incomplete_field=0 for_equal_hash_point_to_last_record=0;
ALTER TABLE t1_ahi_yes ADD INDEX idx_7 (col3, col2) complete_fields=0 bytes_from_incomplete_field=0 for_equal_hash_point_to_last_record=1;
...
CALL run_and_check_idx(240, 6, 't1_ahi_yes');
result_msg
# No AHI used in SELECT (idx_6)
CALL run_and_check_idx(240, 7, 't1_ahi_yes');
result_msg
# No AHI used in SELECT (idx_7)
However, you are right that we can disable the AHI earlier to avoid useless runtime overhead. It's a useless combination.
In the same test, combinations the exceed the valid boundaries are tested as well:
ALTER TABLE t1_ahi_yes ADD INDEX idx_4 (col2) complete_fields=2;
ALTER TABLE t1_ahi_yes ADD INDEX idx_5 (col3) bytes_from_incomplete_field=5;
...
CALL run_and_check_idx(240, 4, 't1_ahi_yes');
result_msg
# No AHI used in SELECT (idx_4)
CALL run_and_check_idx(240, 5, 't1_ahi_yes');
result_msg
# No AHI used in SELECT (idx_5)
...
It seems those result in no AHI usage as well.
innodb_ahi_enable() is void, it may be complex to propagate a failure to the caller.
What could be done, is adjusting the values internally in innodb_ahi_enable(), to not provide "extra" fields/bytes nor useless combination complete_fields=0 bytes_from_incomplete_field=0 to btr_search_info_update_hash(), I'll check it.
I think that using bytes_from_incomplete_field for integer may be interesting if one only wants to hash the high bytes (big endian storage), which is a strange use case, but legit.
There was a problem hiding this comment.
4f12d3c adds self-correction of the options in innodb_ahi_enable(). The only option which is not corrected is bytes_from_incomplete_field when referring to a valid field exceeding its width, since rec_fold() is already clamping the n_bytes value to the maximum allowed.
|
Please note that there are some mroonga related failures introduced by this PR that need to be addressed before merging. |
|
mroonga related failures were not fixed by the rebase. It seems the |
| static_assert(uint32_t(true) == 1); | ||
| static_assert(uint32_t(false) == 0); | ||
| const uint32_t val= (uint32_t(enabled & enabled_mask) << enabled_shift) | | ||
| (uint32_t(fields & fields_mask) << fields_shift) | |
There was a problem hiding this comment.
Precompute these value as well.
static constexpr uint32_t enabled_full_mask = enabled_mask << enabled_shift;
static constexpr uint32_t fields_full_mask = fields_mask << fields_shift;
There was a problem hiding this comment.
I prefer to keep the "mask then shift" pattern for encoding the parameters into ahi_enabled_fixed_mask, so here *_full_mask are not needed, I hope that's acceptable
| static_assert(is_bytes_set_mask == 1); | ||
| static_assert(is_left_set_mask == 1); | ||
| static_assert((0 - uint32_t(1)) == 0xFFFFFFFF); | ||
| mask= ((0 - is_fields_set) & 0x0000FFFF) | |
There was a problem hiding this comment.
For cleaner code, add constexpr for constants.
FIELDS_MASK_FULL= 0x0000FFFF;
BYTES_MASK_FULL, LEFT_MASK_FULL
There was a problem hiding this comment.
Right, but the pre-existing usages of buf_block_t::left_bytes_fields (which defined structure of fixed and mask here) are mostly "hardcoded" (shifts/masks), so I think it would be better to refactor it altogether and leave this as is for now
ha_mroonga ha_index_option_struct is different from InnoDB's ha_index_option_struct. |
c4dc88f to
65b7f36
Compare
dr-m
left a comment
There was a problem hiding this comment.
This is OK to me after reverting the sizeof(ahi) increase.
Can we rebase this to the latest main branch and squash to as few commits as possible? I guess we want to keep the original implementation in a separate commit.
To whoever who wonders why I seem to be reviewing my own work: This was joint work, and I mainly contributed ideas, not actual implementation.
| info.left_bytes_fields= left_bytes_fields= buf_block_t::LEFT_SIDE | 1; | ||
| left_bytes_fields= buf_block_t::LEFT_SIDE | 1; | ||
| /* Override with fixed values */ | ||
| ut_ad(!fixed_mask_set); | ||
| ahi_enabled_fixed_mask= info.get_ahi_enabled_fixed_mask(); | ||
| info.get_ahi_fixed_mask(ahi_enabled_fixed_mask, fixed, mask); | ||
| ut_d(fixed_mask_set= true); | ||
| left_bytes_fields= (left_bytes_fields & ~mask) | (fixed & mask); | ||
| info.left_bytes_fields= left_bytes_fields; | ||
| info.hash_analysis_reset(); |
There was a problem hiding this comment.
Can we assert and assign the fixed_mask_set next to each other? That would make the logic a little easier to follow.
On AMD64, 16-bit immediate constants can be encoded efficiently. In plain x86 or i386, instructions with a small immediate constant, such as and $0x3ff,%cx, can be encoded in 5 bytes. A 32-bit constant, such as mov $0x80000001,%edx or or $0x80000000,%edx, would be 5 or 6 bytes. The REX prefix, which is needed for using the additional AMD64 registers r8 through r15, would add 1 more byte. I don’t see any signficant room for improvement with regard to this. The generated non-debug code for these lines (the info.hash_analysis_reset() is being emitted out of order as the penultimate instruction in the output below) looks like this:
0x000000000103bc7b <+635>: mov 0x6c(%rax),%ecx
0x000000000103bc7e <+638>: mov %ecx,%edx
0x000000000103bc80 <+640>: shr $0x18,%edx
0x000000000103bc83 <+643>: mov %ecx,%esi
0x000000000103bc85 <+645>: shr $0x19,%esi
0x000000000103bc88 <+648>: and $0x1,%edx
0x000000000103bc8b <+651>: and $0x1,%esi
0x000000000103bc8e <+654>: mov %ecx,%r8d
0x000000000103bc91 <+657>: shr $0x17,%r8d
0x000000000103bc95 <+661>: neg %edx
0x000000000103bc97 <+663>: neg %esi
0x000000000103bc99 <+665>: and $0x7fff0000,%esi
0x000000000103bc9f <+671>: movzwl %dx,%edx
0x000000000103bca2 <+674>: or %esi,%edx
0x000000000103bca4 <+676>: mov %ecx,%esi
0x000000000103bca6 <+678>: shr $0x1a,%esi
0x000000000103bca9 <+681>: shl $0x1f,%esi
0x000000000103bcac <+684>: or %esi,%edx
0x000000000103bcae <+686>: mov %ecx,%esi
0x000000000103bcb0 <+688>: shr $0x2,%esi
0x000000000103bcb3 <+691>: shl $0x1f,%r8d
0x000000000103bcb7 <+695>: shl $0x7,%ecx
0x000000000103bcba <+698>: and $0x7f,%esi
0x000000000103bcbd <+701>: and $0x3fff0000,%ecx
0x000000000103bcc3 <+707>: or %r8d,%esi
0x000000000103bcc6 <+710>: or %esi,%ecx
0x000000000103bcc8 <+712>: xor $0x80000001,%ecx
0x000000000103bcce <+718>: and %ecx,%edx
0x000000000103bcd0 <+720>: xor $0x80000001,%edx
0x000000000103bcd6 <+726>: mov %edx,0x70(%rax)
0x000000000103bcd9 <+729>: movb $0x0,0x68(%rax)
0x000000000103bcdd <+733>: movb $0x1,0x69(%rax)The main thing is that there are no conditional branches here. But there is some back-and-forth arithmetic. I was trying to optimize this, but got 4 bytes more code size:
@@ -1108,7 +1123,7 @@ struct dict_index_t {
static constexpr uint32_t is_bytes_set_shift=
is_fields_set_shift + is_fields_set_bits;
static constexpr uint32_t is_left_set_shift=
- is_bytes_set_shift + is_bytes_set_bits; /* Last */
+ 31; /* Last */
static_assert(is_left_set_shift + is_left_set_bits <=
sizeof(ahi_enabled_fixed_mask) * 8,
@@ -1208,15 +1212,15 @@ struct dict_index_t {
const uint32_t left= (val >> left_shift) & left_mask;
const uint32_t bytes= (val >> bytes_shift) & bytes_mask;
const uint32_t fields= (val >> fields_shift) & fields_mask;
- const uint32_t is_left_set=
- (val >> is_left_set_shift) & is_left_set_mask;
+ static_assert(is_left_set_shift == 31);
+ mask= ~val & 1U << is_left_set_shift;
const uint32_t is_bytes_set=
(val >> is_bytes_set_shift) & is_bytes_set_mask;
const uint32_t is_fields_set=
(val >> is_fields_set_shift) & is_fields_set_mask;
ut_ad(!is_fields_set || fields <= fields_mask);
ut_ad(!is_bytes_set || bytes <= bytes_mask);
- ut_ad(!is_left_set || left <= left_mask);
+ ut_ad(mask || left <= left_mask);
/* See buf_block_t::left_bytes_fields */
static_assert(left_mask < (1U << 1));
static_assert(bytes_mask < (1U << 15));
@@ -1225,11 +1229,9 @@ struct dict_index_t {
/* Just to highlight arithmetic used below */
static_assert(is_fields_set_mask == 1);
static_assert(is_bytes_set_mask == 1);
- static_assert(is_left_set_mask == 1);
static_assert((0 - uint32_t(1)) == 0xFFFFFFFF);
- mask= ((0 - is_fields_set) & 0x0000FFFF) |
- ((0 - is_bytes_set) & 0x7FFF0000) |
- ((0 - is_left_set) & 0x80000000);
+ mask|= ((0 - is_fields_set) & 0x0000FFFF) |
+ ((0 - is_bytes_set) & 0x7FFF0000);
}
# ifdef UNIV_SEARCH_PERF_STATThe only optimization that I could came up with is the following, to use a single 16-bit store instead of two 8-bit stores, for the info.hash_analysis=0 and info.n_hash_potential=1 that the compiler emitted for this code branch. But, this code should be executed rarely, given that the code path is guarded by hash_analysis_useful(),
There was a problem hiding this comment.
Debug assertion and assignment are now next to each other.
| /* Since enabled comes from the same atomic variable, it's coherent */ | ||
| const uint8_t enabled= info.get_ahi_enabled(ahi_enabled_fixed_mask); |
There was a problem hiding this comment.
Can we replace the fixed_ with something more descriptive, such as dict_? After all, this mask is persisted in the data dictionary.
There was a problem hiding this comment.
Fixed here stands for "fixed values, as by user recommendation, for (left, bytes, fields) parameters of the AHI", so dict_ would not be an alternative. I added more documentation in the dict_index_t::ahi struct though so it may be easier to read it that way now.
Added ADAPTIVE_HASH_INDEX=DEFAULT|YES|NO table and index option to InnoDB.
The table and index options only have an effect if InnoDB adaptive hash
index feature is enabled.
- Having the ADAPTIVE_HASH_INDEX TABLE option set to NO will disable
adaptive hash index for all indexes in the table that does not have
the index option adaptive_hash_index=yes.
- Having the ADAPTIVE_HASH_INDEX TABLE option set to YES will enable the
adaptive hash index for all indexes in the table that does not have
the index option adaptive_hash_index=no.
- Using adaptive_hash_index=default deletes the old setting.
- One can also use OFF/ON as the options. This is to make it work similar
as other existing options.
- innodb.adaptive_hash_index has been changed from a bool to an enum with
values OFF, ON and IF_SPECIFIED. If IF_SPECIFIED is used, adaptive
hash index are only used for tables and indexes that specifies
adaptive_hash_index=on.
- The following new options can be used to further optimize adaptive hash
index for an index (default is unset/auto for all of them):
- complete_fields:
- 0 to the number of columns the key is defined on (max 64)
- bytes_from_incomplete_field:
- This is only usable for memcmp() comparable index fields, such as
VARBINARY or INT. For example, a 3-byte prefix on an INT will
return an identical hash value for 0‥255, another one for 256‥511,
and so on.
- Range is min 0 max 16383.
- for_equal_hash_point_to_last_record
- Default is unset/auto, NO points to the first record, known as
left_side in the code; YES points to the last record.
Example: we have an INT column with the values 1,4,10 and bytes=3,
will that hash value point to the record 1 or the record 10?
Note: all values will necessarily have the same hash value
computed on the big endian byte prefix 0x800000, for all of the
values 0x80000001, 0x80000004, 0x8000000a. InnoDB inverts the
sign bit in order to have memcmp() compatible comparison
Example:
CREATE TABLE t1 (a int primary key, b varchar(100), c int,
index (b) adaptive_hash_index=no, index (c))
engine=innodb, adaptive_hash_index=yes;
Notable changes in InnoDB
- btr_search.enabled was changed from a bool to a ulong to be
able to handle options OFF, ON as IF_ENABLED. ulong is needed
to compile with MariaDB enum variables.
- To be able to find all instances where btr_search.enabled was used
I changed all code to use btr_search.get_enabled() when accessing
the value and used btr_search.is_enabled(index) to test if AHI is
enabled for the index.
- btr_search.enable() was changed to always take two parameters,
resize and value of enabled. This was needed as enabled can now
have values 0, 1, and 2.
- store all AHI related options in per-index `dict_index_t::ahi`
bit-packed 32-bit atomic field `ahi_enabled_fixed_mask`
- static assertions and debug assertions ensure that all options fit
into the 32-bit field
- packing details:
- `enabled`, `adaptive_hash_index` (first 2 bits)
- `fields`, `complete_fields` (7 bit)
- `bytes`, `bytes_from_incomplete_field` (14 bits)
- `left`, `~for_equal_hash_point_to_last_record` (1 bit)
- `is_fields_set`, `fields` set flag (1 bit)
- `is_bytes_set`, `bytes` set flag (1 bit)
- `is_left_set`, `left` set flag (last 1 bit)
- 5 bits spare after `is_left_set`
- manipulation of the bit-packed field avoids usage of branches or
conditional instructions to minimize the performance impact of
the new options
- in `btr_search_update_hash_ref` apply the per-index AHI options
using bit-masking to override internal heuristic values with user
preferences
- add `innodb.index_ahi_option` test:
- test a combination of per-table and per-index AHI options
- use a stored procedure which checks if AHI is used during a burst of
index lookups checking delta in `adaptive_hash_searches` InnoDB
monitor variable
- test that the maximum number of fields per (secondary) index is 64
(32+32)
- add `innodb.index_ahi_option_debug` test:
- test debug builds with `index_ahi_option_debug_check` debug variable
enabled to verify that the proper per-index AHI options are applied
during index lookups
- test that illegal per-index AHI are non-destructive and just lead to
no AHI usage
Visible user changes:
- select @@global.adaptive_hash_index will now return a string instead
of 0 or 1.
Other notable changes:
- In `sql/create_options.cc`:
- In function `parse_engine_part_options` do allocate table options
in share root to avoid MSAN/ASAN errors due to use after free of
`option_struct` in test `parts.partition_special_innodb`.
- In function `set_one_value` avoid reading after the end of the
current string.
Co-authored-by: Monty <monty@mariadb.org>
Co-authored-by: Marko Mäkelä <marko.makela@mariadb.com>
Co-authored-by: Alessandro Vetere <iminelink@gmail.com>
Co-authored-by: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
|
@dr-m I squashed toghether the commits before the review, rebased on latest main (no conflicts) and pushed a fixup commit to address your review points. You might want to review this latest commit. Thank you. |
Description
This is based on work by @montywi. Some InnoDB changes will be needed, as noted in MDEV-37070.
Release Notes
TBD
How can this PR be tested?
Basing the PR against the correct MariaDB version
mainbranch.PR quality check