Implement a new module StreamReader in the filesystem package to provide efficient, scalable, and memory-safe file reading capabilities.
Motivation
Currently, `filesystem.file` does not provide a structured or scalable way to read files, especially large ones.
Reading entire files into memory (`read_all`) can lead to:
- High memory usage
- Application crashes with large files
- Poor performance in data processing scenarios
The StreamReader module will address these limitations by enabling streaming-based file access.
Proposed Features
- Context manager support (`with` statement)
- Line-by-line iteration (memory efficient)
- Chunk-based reading
- Optional full file read (explicit and with caution)
- Encoding support (including future auto-detection)
Proposed API
```python
from filesystem.streamreader import StreamReader

with StreamReader("file.txt", encoding="utf-8") as sr:
    for line in sr:
        print(line)
```
Additional methods:
```python
sr.read()         # Read entire file (use with caution)
sr.readline()     # Read a single line
sr.readlines()    # Read all lines into a list
sr.iterlines()    # Generator yielding lines (streaming)
sr.read_chunks()  # Generator yielding fixed-size chunks
```
Suggested Structure
```
filesystem/
├── file/
│   └── __init__.py
└── streamreader/
    └── __init__.py
```
Implementation Notes
- Use `__enter__` and `__exit__` for context management
- Implement `__iter__` to allow direct iteration over lines; `__iter__` should yield lines lazily without loading the entire file into memory
- Avoid loading entire file into memory unless explicitly requested
- Ensure file is always properly closed
- Prepare for future integration with encoding detection module
- Design the class to be extendable for binary reading in the future
Considerations
- Do not mix return types (string vs generator) in the same method
- Keep API explicit to avoid misuse
- Maintain consistency with Python naming conventions
- Avoid implicit behavior based on file size (no automatic switching between full read and streaming)
Warning
Using `read()` on very large files may lead to high memory usage.
Prefer iteration (`for line in sr:`) for large files.
Goal
Provide a robust, scalable, and Pythonic alternative to naive file reading methods, enabling FileSystemPro to handle both small and very large files efficiently.