Skip to content

ARROW-012-015: Add linter rules for remaining Arrow 56.0 breaking changes #8120

@ryanrussell

Description

@ryanrussell

ARROW-012-015: Remaining Arrow 56.0 Breaking Changes

Problem

Several smaller but critical breaking changes in Arrow 56.0 need linter coverage for complete migration support.

API Changes Details

ARROW-012: Int96::to_i64() Removal (Parquet)

-pub fn parquet::data_type::Int96::to_i64(&self) -> i64

ARROW-013: Parquet Level Enum Removal

-pub enum parquet::column::writer::Level
-pub parquet::column::writer::Level::Column  
-pub parquet::column::writer::Level::Page

ARROW-014: Field Dictionary Behavior Changes

Behavioral breaking change in commit ed02131:

  • Field::try_merge() no longer checks dict_id equality
  • Field::contains() no longer compares dict_id fields
    This can cause subtle bugs in schema merging logic.

ARROW-015: New Decimal Types (Enhancement Detection)

+pub arrow_schema::DataType::Decimal32(u8, i8)
+pub arrow_schema::DataType::Decimal64(u8, i8)

Implementation Task

Create src/rules/arrow_012_015_misc_breaking_changes.rs with these specifications:

Rule Implementation

use regex::Regex;
use std::path::Path;
use crate::output::{Issue, Severity};
use crate::rules::Rule;

/// ARROW-012-015: Detect miscellaneous breaking changes in Arrow 56.0
/// 
/// Covers Int96::to_i64 removal, Level enum removal, Field behavior changes,
/// and opportunities to use new Decimal32/64 types.
pub struct Arrow012015Rule;

impl Rule for Arrow012015Rule {
    fn rule_id(&self) -> &'static str {
        "ARROW-012-015"
    }

    fn check_rust_source(&self, file_path: &Path, content: &str) -> Result<Vec<Issue>, Box<dyn std::error::Error>> {
        let mut issues = Vec::new();
        
        for (line_num, line) in content.lines().enumerate() {
            // ARROW-012: Int96::to_i64() removal
            if line.contains(".to_i64()") && (line.contains("Int96") || line.contains("int96")) {
                if let Some(mat) = Regex::new(r"\.to_i64\s*\(\s*\)")?.find(line) {
                    issues.push(Issue {
                        rule_id: "ARROW-012".to_string(),
                        severity: Severity::Error,
                        message: "Int96::to_i64() method was removed in Arrow 56.0".to_string(),
                        file_path: file_path.to_path_buf(),
                        line: line_num + 1,
                        column: mat.start() + 1,
                        suggestion: Some("Use manual conversion: (int96.data[0] as i64) | ((int96.data[1] as i64) << 32). Note: This loses precision if the value exceeds i64 range.".to_string()),
                        auto_fixable: false,
                        deprecated_since: Some("56.0.0".to_string()),
                        changelog_url: Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
                        migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#int96-to-i64".to_string()),
                    });
                }
            }
            
            // ARROW-013: Level enum usage
            if let Some(mat) = Regex::new(r"Level::(Column|Page)\b")?.find(line) {
                issues.push(Issue {
                    rule_id: "ARROW-013".to_string(),
                    severity: Severity::Error,
                    message: "parquet::column::writer::Level enum was removed in Arrow 56.0".to_string(),
                    file_path: file_path.to_path_buf(),
                    line: line_num + 1,
                    column: mat.start() + 1,
                    suggestion: Some("The Level enum and its variants (Column, Page) were removed. Use the appropriate writer configuration methods directly.".to_string()),
                    auto_fixable: false,
                    deprecated_since: Some("56.0.0".to_string()),
                    changelog_url: Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
                    migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#level-enum-removal".to_string()),
                });
            }
            
            // ARROW-014: Field merge/contains behavior change warnings
            if line.contains("try_merge") || line.contains("contains") {
                if line.contains("Field") || line.contains("field") {
                    if let Some(mat) = Regex::new(r"\.(try_merge|contains)\s*\(")?.find(line) {
                        issues.push(Issue {
                            rule_id: "ARROW-014".to_string(),
                            severity: Severity::Warning,
                            message: "Field::try_merge() and Field::contains() behavior changed in Arrow 56.0".to_string(),
                            file_path: file_path.to_path_buf(),
                            line: line_num + 1,
                            column: mat.start() + 1,
                            suggestion: Some("These methods no longer check dict_id equality. If you depend on dict_id comparison, add explicit checks: field1.dict_id == field2.dict_id".to_string()),
                            auto_fixable: false,
                            deprecated_since: Some("56.0.0 (behavior change)".to_string()),
                            changelog_url: Some("https://github.com/apache/arrow-rs/pull/7968".to_string()),
                            migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#field-dict-id-behavior".to_string()),
                        });
                    }
                }
            }
            
            // ARROW-015: Suggest new Decimal32/64 types for small precision decimals
            if let Some(mat) = Regex::new(r"DataType::Decimal128\s*\(\s*([1-9]|1[0-8]),")?.find(line) {
                if let Some(captures) = Regex::new(r"DataType::Decimal128\s*\(\s*(\d+),")?.captures(line) {
                    if let Ok(precision) = captures[1].parse::<u8>() {
                        let suggestion = if precision <= 9 {
                            "Consider using DataType::Decimal32 for better performance with precision ≤9"
                        } else if precision <= 18 {
                            "Consider using DataType::Decimal64 for better performance with precision ≤18"  
                        } else {
                            continue;
                        };
                        
                        issues.push(Issue {
                            rule_id: "ARROW-015".to_string(),
                            severity: Severity::Info,
                            message: "New smaller decimal types available in Arrow 56.0".to_string(),
                            file_path: file_path.to_path_buf(),
                            line: line_num + 1,
                            column: mat.start() + 1,
                            suggestion: Some(suggestion.to_string()),
                            auto_fixable: false,
                            deprecated_since: None,
                            changelog_url: Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
                            migration_guide_url: Some("https://arrow.apache.org/docs/rust/migration_guide.html#new-decimal-types".to_string()),
                        });
                    }
                }
            }
        }
        
        Ok(issues)
    }

    fn check_cargo_toml(&self, _file_path: &Path, _content: &str) -> Result<Vec<Issue>, Box<dyn std::error::Error>> {
        Ok(Vec::new())
    }
}

Tests Required

#[cfg(test)]
mod tests {
    use super::*;
    use std::path::PathBuf;

    #[test]
    fn test_detects_int96_to_i64() {
        let rule = Arrow012015Rule;
        let content = r#"
use parquet::data_type::Int96;

fn convert_int96(value: Int96) -> i64 {
    value.to_i64() // Should trigger ARROW-012
}
"#;
        let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
        assert\!(issues.iter().any(|i| i.rule_id == "ARROW-012"));
    }

    #[test]
    fn test_detects_level_enum() {
        let rule = Arrow012015Rule;
        let content = "let level = Level::Column;";
        let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
        assert\!(issues.iter().any(|i| i.rule_id == "ARROW-013"));
    }

    #[test]
    fn test_warns_field_behavior_change() {
        let rule = Arrow012015Rule;
        let content = r#"
if field1.contains(&field2) {
    field1.try_merge(&field2)?;
}
"#;
        let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
        assert_eq\!(issues.iter().filter(|i| i.rule_id == "ARROW-014").count(), 2);
        assert\!(issues.iter().any(|i| matches\!(i.severity, Severity::Warning)));
    }

    #[test]
    fn test_suggests_decimal32() {
        let rule = Arrow012015Rule;
        let content = "DataType::Decimal128(5, 2)";
        let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
        assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015" && matches\!(i.severity, Severity::Info)));
    }

    #[test]
    fn test_suggests_decimal64() {
        let rule = Arrow012015Rule;
        let content = "DataType::Decimal128(15, 4)";
        let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
        assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015"));
    }

    #[test]
    fn test_no_suggestion_for_large_precision() {
        let rule = Arrow012015Rule;
        let content = "DataType::Decimal128(25, 8)"; // Precision > 18
        let issues = rule.check_rust_source(&PathBuf::from("test.rs"), content).unwrap();
        assert\!(\!issues.iter().any(|i| i.rule_id == "ARROW-015"));
    }
}

Integration Steps

  1. Add to src/rules/mod.rs:

    pub mod arrow_012_015_misc_breaking_changes;
    // In RuleEngine::new():
    rules.push(Box::new(arrow_012_015_misc_breaking_changes::Arrow012015Rule));
  2. Test patterns:

    int96_val.to_i64()                    // ARROW-012 Error
    Level::Column                         // ARROW-013 Error  
    field.try_merge(&other)              // ARROW-014 Warning
    DataType::Decimal128(7, 2)           // ARROW-015 Info

Acceptance Criteria

  • ✅ Detects Int96::to_i64() usage (Error)
  • ✅ Detects Level enum variants (Error)
  • ✅ Warns about Field behavior changes (Warning)
  • ✅ Suggests better decimal types (Info)
  • ✅ Appropriate severity levels for each issue type
  • ✅ Comprehensive test coverage

Migration Summary

  • ARROW-012: Manual Int96 conversion required
  • ARROW-013: Level enum completely removed
  • ARROW-014: Behavior change in Field methods
  • ARROW-015: Performance improvement opportunity

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions