Here’s a concise roadmap of three approaches to implementing a tail‑like function in Python.
- Full‑file read: Read the entire file into memory and slice the last `n` lines; simple, but memory usage is `O(file_size)`.
- Fixed‑size queue: Stream lines one by one, retaining only the last `n` in a `collections.deque(maxlen=n)`, so memory is bounded by `O(n)`.
- Backward block buffer: Seek from the file’s end in binary mode, read fixed‑size blocks backward, count newlines until enough are found, then decode and print; uses minimal memory even for very large files.
1. Naïve Full‑File Read
Code Example
```python
def tail_simple(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for line in lines[-num_lines:]:
        print(line.rstrip("\n"))
```
Nuances
- Memory usage: `f.readlines()` loads every line into a list, so memory scales with file size.
- Performance: Fast for small to medium files, but will crash or swap if the file is huge (see the guarded sketch below).
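If you want a cheap safeguard before committing to the full‑file read, checking the size up front is one option. A minimal sketch; the `tail_simple_guarded` name and the 10 MB threshold are my own assumptions, not part of the original code:

```python
import os

def tail_simple_guarded(num_lines: int, file_path: str = "./file.log",
                        max_bytes: int = 10 * 1024 * 1024) -> None:
    # Hypothetical guard: refuse the O(file_size) read for files over max_bytes.
    if os.path.getsize(file_path) > max_bytes:
        raise ValueError(f"{file_path} is larger than {max_bytes} bytes; "
                         "use a streaming approach instead")
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f.readlines()[-num_lines:]:
            print(line.rstrip("\n"))
```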
2. Bounded Memory via a Queue
Code Example
```python
from collections import deque

def tail_queue(num_lines: int, file_path: str = "./file.log") -> None:
    buffer = deque(maxlen=num_lines)
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            buffer.append(line.rstrip("\n"))
    for line in buffer:
        print(line)
```
Nuances
- Memory bound: Uses at most `n` lines of memory, regardless of file size.
- Streaming: Reads line by line; useful for very large files or when you don’t know the file size in advance.
- Overhead: Iterating a text file incurs per‑line Python overhead, so this is slightly slower than block reads but generally fine (a terser variant follows below).
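As a side note, `deque` can consume the file iterator directly, which collapses the read loop into a single call. A minimal sketch of that variant; the `tail_queue_terse` name is my own:

```python
from collections import deque

def tail_queue_terse(num_lines: int, file_path: str = "./file.log") -> None:
    # deque(iterable, maxlen=n) drains the iterator and keeps only the
    # last n items, so this is the same bounded-memory scan in one call.
    with open(file_path, "r", encoding="utf-8", errors="replace") as f:
        for line in deque(f, maxlen=num_lines):
            print(line.rstrip("\n"))
```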
3. Block‑Buffered Reverse Read
When num_lines is small relative to file size, reading backwards in chunks can be far more efficient.
Code Example
```python
import os

def tail(num_lines: int, file_path: str = "./file.log") -> None:
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        buffer = bytearray()
        lines_found = 0
        block_size = 1024
        # Read fixed-size blocks from the end of the file, prepending each
        # to the buffer, until it holds enough newlines (or the whole file).
        while file_size > 0 and lines_found <= num_lines:
            read_size = min(block_size, file_size)
            f.seek(file_size - read_size)
            data = f.read(read_size)
            buffer[:0] = data
            lines_found = buffer.count(b"\n")
            file_size -= read_size
    lines = buffer.splitlines()[-num_lines:]
    for line in lines:
        print(line.decode("utf-8", errors="replace"))
```
Nuances
- Binary mode avoids issues with variable‑length encodings when seeking.
- Block size: Choose based on typical line length; too small and you incur many seeks, too large and you waste memory.
- Byte vs. text splitting: Counts `b"\n"` and splits on raw bytes, then decodes.
- Edge cases: Handles files smaller than `block_size`, and files with fewer than `num_lines` lines, gracefully; see the sanity check after this list.
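To build confidence that the three variants agree, a quick check like the following can help. A minimal sketch, assuming `tail_simple`, `tail_queue`, and `tail` are defined in the same module; the `_capture` helper is my own:

```python
import io
import os
import tempfile
from contextlib import redirect_stdout

def _capture(func, *args) -> str:
    # Run a print-based tail function and return what it wrote to stdout.
    out = io.StringIO()
    with redirect_stdout(out):
        func(*args)
    return out.getvalue()

# Write a small sample file, then confirm all three implementations agree.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(100)))
    path = tmp.name

try:
    expected = "".join(f"line {i}\n" for i in range(95, 100))
    for fn in (tail_simple, tail_queue, tail):
        assert _capture(fn, 5, path) == expected, fn.__name__
    print("all three implementations agree")
finally:
    os.remove(path)
```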
Conclusion
- Full‑file read is easiest to implement but not scalable.
- Deque‑based offers an `O(n)` memory bound and simple streaming.
- Block‑buffered reverse read keeps memory proportional to the requested tail rather than the file size, making it ideal for very large files.
Choose the approach that best fits your file sizes and memory constraints.