Here’s a concise roadmap of three approaches to implement a tail‑like function in Python.
- Full‑file read: Read the entire file into memory and slice the last `n` lines; simple, but uses O(file_size) memory.
- Fixed‑size queue: Stream lines one by one, retaining only the last `n` in a `collections.deque(maxlen=n)`, so memory is bounded by O(n).
- Backward block buffer: Seek from the file’s end in binary mode, read fixed‑size blocks backward, count newlines until enough are found, then decode and print; uses minimal memory even for very large files.
1. Naïve Full‑File Read
Code Example
```python
def tail_simple(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for line in lines[-num_lines:]:
        print(line.rstrip("\n"))
```

Nuances
- Memory usage: `f.readlines()` loads every line into a list, so memory scales with file size.
- Performance: Fast for small to medium files, but will crash or swap if the file is huge.
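As a quick usage sketch, the naïve version can be exercised end to end against a throwaway file (the sample lines and temp-file path here are invented for illustration; the function is repeated so the snippet runs on its own):

```python
import io
import os
import tempfile
from contextlib import redirect_stdout

def tail_simple(num_lines: int, file_path: str = "./file.log") -> None:
    # Read every line into memory, then slice off the last num_lines.
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for line in lines[-num_lines:]:
        print(line.rstrip("\n"))

# Create a small temporary log file with four known lines.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("first\nsecond\nthird\nfourth\n")
    path = tmp.name

# Capture stdout so the printed tail can be inspected.
out = io.StringIO()
with redirect_stdout(out):
    tail_simple(2, path)
os.unlink(path)
print(out.getvalue())  # "third\nfourth\n"
```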
2. Bounded Memory via a Queue
Code Example
```python
from collections import deque

def tail_queue(num_lines: int, file_path: str = "./file.log") -> None:
    buffer = deque(maxlen=num_lines)
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            buffer.append(line.rstrip("\n"))
    for line in buffer:
        print(line)
```

Nuances
- Memory bound: Uses at most n lines of memory, regardless of file size.
- Streaming: Reads line by line; useful for very large files or when you don’t know file size in advance.
- Overhead: Per‑line iteration in Python carries interpreter overhead; slightly slower than block reads but generally fine.
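The bounded‑memory behavior comes entirely from `deque(maxlen=n)`: once full, appending a new item silently discards the oldest. A minimal sketch (using an in‑memory stream with made‑up sample lines in place of a real log file):

```python
import io
from collections import deque

# A file-like stream standing in for a real log file.
stream = io.StringIO("alpha\nbeta\ngamma\ndelta\n")

# Feeding the whole stream into a deque with maxlen=2 keeps only
# the last two lines; earlier ones are evicted as new ones arrive.
last_two = [line.rstrip("\n") for line in deque(stream, maxlen=2)]
print(last_two)  # ['gamma', 'delta']
```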
3. Block‑Buffered Reverse Read
When num_lines is small relative to file size, reading backwards in chunks can be far more efficient.
```python
import os

def tail(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        buffer = bytearray()
        lines_found = 0
        block_size = 1024
        # Read fixed-size blocks from the end, prepending each to the
        # buffer, until enough newlines are found or the file is exhausted.
        while file_size > 0 and lines_found <= num_lines:
            read_size = min(block_size, file_size)
            f.seek(file_size - read_size)
            data = f.read(read_size)
            buffer[:0] = data
            lines_found = buffer.count(b"\n")
            file_size -= read_size
    lines = buffer.splitlines()[-num_lines:]
    for line in lines:
        print(line.decode("utf-8", errors="replace"))
```

Nuances
- Binary mode avoids issues with variable‑length encodings when seeking.
- Block size: Choose based on typical line length; too small and you incur many seeks, too large and you waste memory.
- Byte vs. text splitting: Counts `b"\n"` and splits on raw bytes, then decodes only the lines that are kept.
- Edge cases: Handles files smaller than `block_size`, and files with fewer than `num_lines` lines, gracefully.
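The byte‑level splitting step can be seen in isolation: `splitlines()` works directly on the raw buffer, so decoding only happens for the lines that are actually kept. A small sketch with a hypothetical buffer standing in for what the backward reader accumulates:

```python
# Hypothetical buffer as the backward block reads would have built it.
buffer = bytearray(b"alpha\nbeta\ngamma\n")

# Split on raw bytes first, slice off the last two lines,
# then decode only those survivors to text.
last_two = [chunk.decode("utf-8") for chunk in buffer.splitlines()[-2:]]
print(last_two)  # ['beta', 'gamma']
```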
Conclusion
- Full‑file read is easiest to implement but not scalable.
- Deque‑based offers an O(n) memory bound and simple streaming.
- Block‑buffered reverse read provides near‑constant memory overhead and is ideal for very large files.
Choose the approach that best fits your file sizes and memory constraints.