Implementing a Tail-like Function

Jun 10, 2025
2 min read

Here’s a concise roadmap of three approaches to implement a tail‑like function in Python.

  • Full‑file read: Read the entire file into memory and slice the last n lines; simple but uses O(file_size) memory.
  • Fixed‑size queue: Stream lines one by one, retaining only the last n in a collections.deque(maxlen=n), so memory is bounded by O(n).
  • Backward block buffer: Seek from the file’s end in binary mode, read fixed‑size blocks backward, count newlines until enough are found, then decode and print—uses minimal memory even for very large files.

1. Naïve Full‑File Read

Code Example

def tail_simple(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for line in lines[-num_lines:]:
        print(line.rstrip("\n"))

Nuances

  • Memory usage: f.readlines() loads every line into a list, so memory scales with file size.
  • Performance: Fast for small to medium files, but will crash or swap if the file is huge.
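As a quick sanity check, here is a self-contained sketch of the same idea, rewritten to return the lines so the result is easy to inspect (the helper name, temporary file, and its contents are illustrative, not part of the original example):

```python
import os
import tempfile

def tail_simple_lines(num_lines: int, file_path: str) -> list[str]:
    # Same full-file read as above, but returning instead of printing.
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    return [line.rstrip("\n") for line in lines[-num_lines:]]

# Write ten sample lines, then take the last three.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(1, 11)))
    path = tmp.name
try:
    print(tail_simple_lines(3, path))  # ['line 8', 'line 9', 'line 10']
finally:
    os.remove(path)
```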

2. Bounded Memory via a Queue

Code Example

from collections import deque

def tail_queue(num_lines: int, file_path: str = "./file.log") -> None:
    buffer = deque(maxlen=num_lines)
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            buffer.append(line.rstrip("\n"))
    for line in buffer:
        print(line)

Nuances

  • Memory bound: Uses at most n lines of memory, regardless of file size.
  • Streaming: Reads line by line; useful for very large files or when you don’t know file size in advance.
  • Overhead: Per-line iteration in Python carries interpreter overhead, so this is slightly slower than block reads, but generally fine.
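Because deque's constructor accepts any iterable, the read loop can be collapsed: passing the file object straight to deque(f, maxlen=n) keeps only the last n lines it consumed. A compact variant (the function name and demo file are illustrative):

```python
import os
import tempfile
from collections import deque

def tail_queue_compact(num_lines: int, file_path: str) -> list[str]:
    # deque(iterable, maxlen=n) consumes the iterable and retains
    # only the last n items it saw.
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=num_lines)]

# Demo on a four-line file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\ndelta\n")
    path = tmp.name
try:
    print(tail_queue_compact(2, path))  # ['gamma', 'delta']
finally:
    os.remove(path)
```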

3. Block‑Buffered Reverse Read

When num_lines is small relative to file size, reading backwards in chunks can be far more efficient.

Code Example

import os

def tail(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        buffer = bytearray()
        lines_found = 0
        block_size = 1024
        # Read fixed-size blocks backward until enough newlines are
        # seen or the start of the file is reached.
        while file_size > 0 and lines_found <= num_lines:
            read_size = min(block_size, file_size)
            f.seek(file_size - read_size)
            data = f.read(read_size)
            buffer[:0] = data  # prepend, keeping bytes in file order
            lines_found = buffer.count(b"\n")
            file_size -= read_size
    lines = buffer.splitlines()[-num_lines:]
    for line in lines:
        print(line.decode("utf-8", errors="replace"))

Nuances

  • Binary mode: Seeking by byte offset can land in the middle of a multi-byte character in variable-length encodings such as UTF-8; reading raw bytes and decoding only at the end sidesteps this.
  • Block size: Choose based on typical line length; too small and you incur many seeks, too large and you waste memory.
  • Byte vs. text splitting: Counts b"\n" and splits on raw bytes, then decodes.
  • Edge cases: Handles files smaller than block_size and files with fewer than num_lines lines gracefully.
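To gain confidence in that boundary handling, the reverse reader can be cross-checked against the naïve slice on a file that spans many blocks. A self-contained sketch, with the section-3 logic rewritten to return lines (the function name, temporary file, and line counts are arbitrary):

```python
import os
import tempfile

def tail_bytes(num_lines: int, file_path: str) -> list[str]:
    """Reverse block reader from section 3, returning decoded lines."""
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        buffer = bytearray()
        lines_found = 0
        block_size = 1024
        while file_size > 0 and lines_found <= num_lines:
            read_size = min(block_size, file_size)
            f.seek(file_size - read_size)
            buffer[:0] = f.read(read_size)  # prepend in file order
            lines_found = buffer.count(b"\n")
            file_size -= read_size
    return [line.decode("utf-8", errors="replace")
            for line in buffer.splitlines()[-num_lines:]]

# Cross-check against a plain slice on a file spanning several blocks.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join(f"entry {i}\n" for i in range(5000)))
    path = tmp.name
try:
    with open(path, "r", encoding="utf-8") as f:
        expected = [l.rstrip("\n") for l in f.readlines()[-7:]]
    assert tail_bytes(7, path) == expected
    print("ok")
finally:
    os.remove(path)
```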

Conclusion

  • Full‑file read is easiest to implement but not scalable.
  • Deque‑based streaming bounds memory at O(n) lines and stays simple to implement.
  • Block‑buffered reverse read provides near‑constant memory overhead and is ideal for very large files.

Choose the approach that best fits your file sizes and memory constraints.