Logo
Overview
Implementing a Tail-like Function

Implementing a Tail-like Function

Jun 10, 2025
3 min read

Here’s a concise roadmap of three approaches to implement a tail‑like function in Python.

  • Full‑file read: Read the entire file into memory and slice the last n lines; simple but uses O(file_size) memory.
  • Fixed‑size queue: Stream lines one by one, retaining only the last n in a collections.deque(maxlen=n), so memory is bounded by O(n).
  • Backward block buffer: Seek from the file’s end in binary mode, read fixed‑size blocks backward, count newlines until enough are found, then decode and print - uses minimal memory even for very large files.

1. Naïve Full‑File Read

Code Example

def tail_simple(num_lines: int, file_path: str = "./file.log") -> None:
with open(file_path, "r", encoding="utf-8", errors="replace") as f:
lines = f.readlines()
for line in lines[-num_lines:]:
print(line.rstrip("\n"))

Nuances

  • Memory usage: f.readlines() loads every line into a list, so memory scales with file size.
  • Performance: Fast for small to medium files, but will crash or swap if the file is huge.

2. Bounded Memory via a Queue

Code Example

from collections import deque
def tail_queue(num_lines: int, file_path: str = "./file.log") -> None:
buffer = deque(maxlen=num_lines)
with open(file_path, "r", encoding="utf-8", errors="replace") as f:
for line in f:
buffer.append(line.rstrip("\n"))
for line in buffer:
print(line)

Nuances

  • Memory bound: Uses at most n lines of memory, regardless of file size.
  • Streaming: Reads line by line; useful for very large files or when you don’t know file size in advance.
  • Overhead: Python’s generator overhead per line; slightly slower than block reads but generally fine.

3. Block‑Buffered Reverse Read

When num_lines is small relative to file size, reading backwards in chunks can be far more efficient.

import os
def tail(num_lines: int, file_path: str = "./file.log") -> None:
with open(file_path, "rb") as f:
f.seek(0, os.SEEK_END)
file_size = f.tell()
buffer = bytearray()
lines_found = 0
block_size = 1024
while file_size > 0 and lines_found <= num_lines:
read_size = min(block_size, file_size)
f.seek(file_size - read_size)
data = f.read(read_size)
buffer[:0] = data
lines_found = buffer.count(b"\n")
file_size -= read_size
lines = buffer.splitlines()[-num_lines:]
for line in lines:
print(line.decode("utf-8"))

Nuances

  • Binary mode avoids issues with variable‑length encodings when seeking.
  • Block size: Choose based on typical line length; too small and you incur many seeks, too large and you waste memory.
  • Byte vs. text splitting: Counts b"\n" and splits on raw bytes, then decodes.
  • Edge cases: Handles files smaller than block_size, and files with fewer than num_lines lines gracefully.

Conclusion

  • Full‑file read is easiest to implement but not scalable.
  • Deque‑based offers O(n) memory bound and simple streaming.
  • Block‑buffered reverse read provides near‑constant memory overhead and is ideal for very large files.

Choose the approach that best fits your file sizes and memory constraints.