ARROW-012-015: Remaining Arrow 56.0 Breaking Changes
Problem
Several smaller but critical breaking changes in Arrow 56.0 need linter coverage for complete migration support.
API Changes Details
ARROW-012: Int96::to_i64() Removal (Parquet)
-pub fn parquet::data_type::Int96::to_i64(&self) -> i64
ARROW-013: Parquet Level Enum Removal
-pub enum parquet::column::writer::Level
-pub parquet::column::writer::Level::Column
-pub parquet::column::writer::Level::Page
ARROW-014: Field Dictionary Behavior Changes
Behavioral breaking change in commit ed02131:
Field::try_merge() no longer checks dict_id equality
Field::contains() no longer compares dict_id fields
This can cause subtle bugs in schema merging logic.
ARROW-015: New Decimal Types (Enhancement Detection)
+pub arrow_schema::DataType::Decimal32(u8, i8)
+pub arrow_schema::DataType::Decimal64(u8, i8)
Implementation Task
Create src/rules/arrow_012_015_misc_breaking_changes.rs with these specifications:
Rule Implementation
use regex::Regex;
use std::path::Path;
use crate::output::{Issue, Severity};
use crate::rules::Rule;
/// ARROW-012-015: Detect miscellaneous breaking changes in Arrow 56.0
///
/// Covers Int96::to_i64 removal, Level enum removal, Field behavior changes,
/// and opportunities to use new Decimal32/64 types.
pub struct Arrow012015Rule;
impl Rule for Arrow012015Rule {
fn rule_id(&self) -> &'static str {
"ARROW-012-015"
}
fn check_rust_source(&self, file_path: &Path, content: &str) -> Result<Vec<Issue>, Box<dyn std::error::Error>> {
let mut issues = Vec::new();
for (line_num, line) in content.lines().enumerate() {
// ARROW-012: Int96::to_i64() removal
if line.contains(".to_i64()") && (line.contains("Int96") || line.contains("int96")) {
if let Some(mat) = Regex::new(r"\.to_i64\s*\(\s*\)")?.find(line) {
issues.push(Issue {
rule_id: "ARROW-012".to_string(),
severity: Severity::Error,
message: "Int96::to_i64() method was removed in Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some("Use manual conversion: (int96.data[0] as i64) | ((int96.data[1] as i64) << 32). Note: This loses precision if the value exceeds i64 range.".to_string()),
auto_fixable: false,
deprecated_since: Some("56.0.0".to_string()),
changelog_url: Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#int96-to-i64".to_string()),
});
}
}
// ARROW-013: Level enum usage
if let Some(mat) = Regex::new(r"Level::(Column|Page)\b")?.find(line) {
issues.push(Issue {
rule_id: "ARROW-013".to_string(),
severity: Severity::Error,
message: "parquet::column::writer::Level enum was removed in Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some("The Level enum and its variants (Column, Page) were removed. Use the appropriate writer configuration methods directly.".to_string()),
auto_fixable: false,
deprecated_since: Some("56.0.0".to_string()),
changelog_url: Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#level-enum-removal".to_string()),
});
}
// ARROW-014: Field merge/contains behavior change warnings
if line.contains("try_merge") || line.contains("contains") {
if line.contains("Field") || line.contains("field") {
if let Some(mat) = Regex::new(r"\.(try_merge|contains)\s*\(")?.find(line) {
issues.push(Issue {
rule_id: "ARROW-014".to_string(),
severity: Severity::Warning,
message: "Field::try_merge() and Field::contains() behavior changed in Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some("These methods no longer check dict_id equality. If you depend on dict_id comparison, add explicit checks: field1.dict_id == field2.dict_id".to_string()),
auto_fixable: false,
deprecated_since: Some("56.0.0 (behavior change)".to_string()),
changelog_url: Some("https://github.com/apache/arrow-rs/pull/7968".to_string()),
migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#field-dict-id-behavior".to_string()),
});
}
}
}
// ARROW-015: Suggest new Decimal32/64 types for small precision decimals
if let Some(mat) = Regex::new(r"DataType::Decimal128\s*\(\s*([1-9]|1[0-8]),")?.find(line) {
if let Some(captures) = Regex::new(r"DataType::Decimal128\s*\(\s*(\d+),")?.captures(line) {
if let Ok(precision) = captures[1].parse::<u8>() {
let suggestion = if precision <= 9 {
"Consider using DataType::Decimal32 for better performance with precision ≤9"
} else if precision <= 18 {
"Consider using DataType::Decimal64 for better performance with precision ≤18"
} else {
continue;
};
issues.push(Issue {
rule_id: "ARROW-015".to_string(),
severity: Severity::Info,
message: "New smaller decimal types available in Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some(suggestion.to_string()),
auto_fixable: false,
deprecated_since: None,
changelog_url: Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#new-decimal-types".to_string()),
});
}
}
}
}
Ok(issues)
}
fn check_cargo_toml(&self, _file_path: &Path, _content: &str) -> Result<Vec<Issue>, Box<dyn std::error::Error>> {
Ok(Vec::new())
}
}
Tests Required
#[cfg(test)]
mod tests {
use super::*;
use std::path::PathBuf;
#[test]
fn test_detects_int96_to_i64() {
let rule = Arrow012015Rule;
let content = r#"
use parquet::data_type::Int96;
fn convert_int96(value: Int96) -> i64 {
value.to_i64() // Should trigger ARROW-012
}
"#;
let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-012"));
}
#[test]
fn test_detects_level_enum() {
let rule = Arrow012015Rule;
let content = "let level = Level::Column;";
let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-013"));
}
#[test]
fn test_warns_field_behavior_change() {
let rule = Arrow012015Rule;
let content = r#"
if field1.contains(&field2) {
field1.try_merge(&field2)?;
}
"#;
let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
assert_eq\!(issues.iter().filter(|i| i.rule_id == "ARROW-014").count(), 2);
assert\!(issues.iter().any(|i| matches\!(i.severity, Severity::Warning)));
}
#[test]
fn test_suggests_decimal32() {
let rule = Arrow012015Rule;
let content = "DataType::Decimal128(5, 2)";
let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015" && matches\!(i.severity, Severity::Info)));
}
#[test]
fn test_suggests_decimal64() {
let rule = Arrow012015Rule;
let content = "DataType::Decimal128(15, 4)";
let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015"));
}
#[test]
fn test_no_suggestion_for_large_precision() {
let rule = Arrow012015Rule;
let content = "DataType::Decimal128(25, 8)"; // Precision > 18
let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
assert\!(\!issues.iter().any(|i| i.rule_id == "ARROW-015"));
}
}
Integration Steps
-
Add to src/rules/mod.rs:
pub mod arrow_012_015_misc_breaking_changes;
// In RuleEngine::new():
rules.push(Box::new(arrow_012_015_misc_breaking_changes::Arrow012015Rule));
-
Test patterns:
int96_val.to_i64() // ARROW-012 Error
Level::Column // ARROW-013 Error
field.try_merge(&other) // ARROW-014 Warning
DataType::Decimal128(7, 2) // ARROW-015 Info
Acceptance Criteria
- ✅ Detects Int96::to_i64() usage (Error)
- ✅ Detects Level enum variants (Error)
- ✅ Warns about Field behavior changes (Warning)
- ✅ Suggests better decimal types (Info)
- ✅ Appropriate severity levels for each issue type
- ✅ Comprehensive test coverage
Migration Summary
- ARROW-012: Manual Int96 conversion required
- ARROW-013: Level enum completely removed
- ARROW-014: Behavior change in Field methods
- ARROW-015: Performance improvement opportunity
ARROW-012-015: Remaining Arrow 56.0 Breaking Changes
Problem
Several smaller but critical breaking changes in Arrow 56.0 need linter coverage for complete migration support.
API Changes Details
ARROW-012: Int96::to_i64() Removal (Parquet)
ARROW-013: Parquet Level Enum Removal
ARROW-014: Field Dictionary Behavior Changes
Behavioral breaking change in commit ed02131:
Field::try_merge()no longer checksdict_idequalityField::contains()no longer comparesdict_idfieldsThis can cause subtle bugs in schema merging logic.
ARROW-015: New Decimal Types (Enhancement Detection)
Implementation Task
Create
src/rules/arrow_012_015_misc_breaking_changes.rswith these specifications:Rule Implementation
Tests Required
Integration Steps
Add to
src/rules/mod.rs:Test patterns:
Acceptance Criteria
Migration Summary