Add StreamReader module for efficient file reading #144

@hbisneto

Description

Implement a new StreamReader module in the filesystem package to provide efficient, scalable, and memory-safe file reading.


Motivation

Currently, filesystem.file does not provide a structured or scalable way to read files, especially large ones.

Reading entire files into memory (read_all) can lead to:

  • High memory usage
  • Application crashes with large files
  • Poor performance in data processing scenarios

The StreamReader module will address these limitations by enabling streaming-based file access.


Proposed Features

  • Context manager support (with statement)
  • Line-by-line iteration (memory efficient)
  • Chunk-based reading
  • Optional full file read (explicit and with caution)
  • Encoding support (including future auto-detection)

Proposed API

from filesystem.streamreader import StreamReader

with StreamReader("file.txt", encoding="utf-8") as sr:
    for line in sr:
        print(line)

Additional methods:

sr.read()         # Read the entire file into memory (use with caution)
sr.readline()     # Read a single line
sr.readlines()    # Read all lines into a list (eager)
sr.iterlines()    # Lazily yield lines (generator, streaming)
sr.read_chunks()  # Lazily yield fixed-size chunks (generator)
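Chunk-based reading can lean on the built-in file object's read(size) loop. A minimal sketch of what read_chunks() could wrap internally; the chunk_size parameter name and its default are suggestions, not settled API:

```python
def read_chunks(path, chunk_size=8192, encoding="utf-8"):
    """Lazily yield fixed-size chunks so peak memory stays bounded."""
    with open(path, encoding=encoding) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty string signals EOF
                return
            yield chunk
```

Because this is a generator, nothing is read until the caller iterates, which keeps the method's return type consistent (always a generator, never sometimes a string).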

Suggested Structure

filesystem/
├── file/
│   └── __init__.py
├── streamreader/
│   └── __init__.py


Implementation Notes

  • Use __enter__ and __exit__ for context management
  • Implement __iter__ to yield lines lazily, so direct iteration never loads the whole file
  • Avoid loading the entire file into memory unless explicitly requested (read())
  • Ensure file is always properly closed
  • Prepare for future integration with encoding detection module
  • Design the class to be extendable for binary reading in the future
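Taken together, the notes above suggest a shape like the following minimal sketch. This is a hypothetical implementation for discussion, not the final design; internal names such as _file are assumptions:

```python
class StreamReader:
    """Sketch of the proposed reader: context-managed, lazily iterable."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding
        self._file = None  # opened lazily in __enter__

    def __enter__(self):
        self._file = open(self.path, encoding=self.encoding)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close, even if the with-body raised.
        if self._file is not None:
            self._file.close()
            self._file = None
        return False  # never swallow exceptions

    def __iter__(self):
        # Delegate to the underlying file object, which already yields
        # one line at a time without buffering the whole file.
        yield from self._file

    def read(self):
        # Explicit full read: the only place the whole file is loaded.
        return self._file.read()

    def readline(self):
        return self._file.readline()

    def readlines(self):
        return list(self._file)  # eager: always returns a list

    def iterlines(self):
        return iter(self)  # lazy: always returns a generator

    def read_chunks(self, chunk_size=8192):
        while True:
            chunk = self._file.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

Note how each method has exactly one return type (list vs generator), matching the "do not mix return types" consideration below, and how binary support could later be added via a mode parameter without changing this surface.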

Considerations

  • Do not mix return types (string vs generator) in the same method
  • Keep API explicit to avoid misuse
  • Maintain consistency with Python naming conventions
  • Avoid implicit behavior based on file size (no automatic switching between full read and streaming)

Warning

Using read() on very large files may lead to high memory usage; prefer iteration (for line in sr) for large files.
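The same discipline already applies to the plain built-in files that StreamReader would wrap: streaming composes with generator pipelines, so aggregates can be computed without ever materializing the file. For example (count_non_empty_lines is an illustrative helper, not proposed API):

```python
def count_non_empty_lines(path, encoding="utf-8"):
    # Streams line by line; peak memory is one line, not the whole file.
    with open(path, encoding=encoding) as f:
        return sum(1 for line in f if line.strip())
```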


Goal

Provide a robust, scalable, and Pythonic alternative to naive file reading methods, enabling FileSystemPro to handle both small and very large files efficiently.

Labels: enhancement (New feature or request)