Here’s a concise roadmap of three approaches to implementing a tail‑like function in Python.
- Full‑file read: Read the entire file into memory and slice the last `n` lines; simple, but memory usage is `O(file_size)`.
- Fixed‑size queue: Stream lines one by one, retaining only the last `n` in a `collections.deque(maxlen=n)`, so memory is bounded by `O(n)`.
- Backward block buffer: Seek from the file’s end in binary mode, read fixed‑size blocks backward, count newlines until enough are found, then decode and print; uses minimal memory even for very large files.
1. Naïve Full‑File Read
Code Example
```python
def tail_simple(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for line in lines[-num_lines:]:
        print(line.rstrip("\n"))
```
Nuances
- Memory usage: `f.readlines()` loads every line into a list, so memory scales with file size.
- Performance: Fast for small to medium files, but will crash or swap if the file is huge (see the guarded sketch below).
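If you want a cheap safeguard before committing to the full‑file read, checking the size up front is one option. A minimal sketch; the `tail_simple_guarded` name and the 10 MB threshold are my own assumptions, not part of the original code:

```python
import os

def tail_simple_guarded(num_lines: int, file_path: str = "./file.log",
                        max_bytes: int = 10 * 1024 * 1024) -> None:
    # Hypothetical guard: refuse the O(file_size) read for files over max_bytes.
    if os.path.getsize(file_path) > max_bytes:
        raise ValueError(f"{file_path} is larger than {max_bytes} bytes; "
                         "use a streaming approach instead")
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f.readlines()[-num_lines:]:
            print(line.rstrip("\n"))
```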
2. Bounded Memory via a Queue
Code Example
```python
from collections import deque

def tail_queue(num_lines: int, file_path: str = "./file.log") -> None:
    buffer = deque(maxlen=num_lines)
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            buffer.append(line.rstrip("\n"))
    for line in buffer:
        print(line)
```
Nuances
- Memory bound: Uses at most `n` lines of memory, regardless of file size.
- Streaming: Reads line by line; useful for very large files or when you don’t know the file size in advance.
- Overhead: Iterating a text file incurs per‑line Python overhead, so this is slightly slower than block reads but generally fine (a terser variant follows below).
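As a side note, `deque` can consume the file iterator directly, which collapses the read loop into a single call. A minimal sketch of that variant; the `tail_queue_terse` name is my own:

```python
from collections import deque

def tail_queue_terse(num_lines: int, file_path: str = "./file.log") -> None:
    # deque(iterable, maxlen=n) drains the iterator and keeps only the
    # last n items, so this is the same bounded-memory scan in one call.
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in deque(f, maxlen=num_lines):
            print(line.rstrip("\n"))
```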
3. Block‑Buffered Reverse Read
When num_lines is small relative to file size, reading backwards in chunks can be far more efficient.
Code Example
```python
import os

def tail(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        buffer = bytearray()
        lines_found = 0
        block_size = 1024
        # Read fixed-size blocks from the end of the file, prepending each
        # to the buffer, until it holds enough newlines (or the whole file).
        while file_size > 0 and lines_found <= num_lines:
            read_size = min(block_size, file_size)
            f.seek(file_size - read_size)
            data = f.read(read_size)
            buffer[:0] = data
            lines_found = buffer.count(b"\n")
            file_size -= read_size
    lines = buffer.splitlines()[-num_lines:]
    for line in lines:
        print(line.decode("utf-8", errors="replace"))
```
Nuances
- Binary mode avoids issues with variable‑length encodings when seeking.
- Block size: Choose based on typical line length; too small and you incur many seeks, too large and you waste memory.
- Byte vs. text splitting: Counts `b"\n"` and splits on raw bytes, then decodes.
- Edge cases: Handles files smaller than `block_size`, and files with fewer than `num_lines` lines, gracefully; see the sanity check after this list.
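To build confidence that the three variants agree, a quick check like the following can help. A minimal sketch, assuming `tail_simple`, `tail_queue`, and `tail` are defined in the same module; the `_capture` helper is my own:

```python
import io
import os
import tempfile
from contextlib import redirect_stdout

def _capture(func, *args) -> str:
    # Run a print-based tail function and return what it wrote to stdout.
    out = io.StringIO()
    with redirect_stdout(out):
        func(*args)
    return out.getvalue()

# Write a small sample file, then confirm all three implementations agree.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(100)))
    path = tmp.name

try:
    expected = "".join(f"line {i}\n" for i in range(95, 100))
    for fn in (tail_simple, tail_queue, tail):
        assert _capture(fn, 5, path) == expected, fn.__name__
    print("all three implementations agree")
finally:
    os.remove(path)
```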
Conclusion
- Full‑file read is easiest to implement but not scalable.
- Deque‑based offers an `O(n)` memory bound and simple streaming.
- Block‑buffered reverse read keeps memory proportional to the requested tail rather than the file size, making it ideal for very large files.
Choose the approach that best fits your file sizes and memory constraints.