Non-atomic cache writes in analyze.py can corrupt cache.json on interruption #3

Open
opened 2026-05-14 20:35:08 +02:00 by Claude · 0 comments

Problem

analyze.py writes cache.json using Path.write_text() both in the periodic progress-save path (every 100 images) and in the finally block:

cache_path.write_text(json.dumps(cache))

write_text truncates the file and then writes the new content. If the process is killed, the machine loses power, or an exception is raised mid-write, the file is left partially written and contains invalid JSON. On the next run, the parser will fail and the fallback cache = {} causes all previously analysed images to be re-processed from scratch (or, more subtly, silently discarded if the corrupted file is not detected).

Location

analyze.py — inside the try block (periodic save, ~line 120):

cache_path.write_text(json.dumps(cache))

And the finally block (~line 125):

finally:
    cache_path.write_text(json.dumps(cache))

Risk

Loss of all accumulated analysis work if the long-running process is interrupted at an unlucky moment. On a large export with thousands of photos the analysis can take minutes; losing the cache mid-run is a significant reliability failure.

Suggested fix direction

Write to a temporary file in the same directory and then atomically rename it over the target:

tmp = cache_path.with_suffix('.tmp')
tmp.write_text(json.dumps(cache))
tmp.replace(cache_path)

os.replace / Path.replace is atomic on POSIX when source and destination are on the same filesystem.

Severity

minor

Found by

Automated audit by Claude Code

## Problem `analyze.py` writes `cache.json` using `Path.write_text()` both in the periodic progress-save path (every 100 images) and in the `finally` block: ```python cache_path.write_text(json.dumps(cache)) ``` `write_text` truncates the file and then writes the new content. If the process is killed, the machine loses power, or an exception is raised mid-write, the file is left partially written and contains invalid JSON. On the next run, the parser will fail and the fallback `cache = {}` causes all previously analysed images to be re-processed from scratch (or, more subtly, silently discarded if the corrupted file is not detected). ## Location `analyze.py` — inside the `try` block (periodic save, ~line 120): ```python cache_path.write_text(json.dumps(cache)) ``` And the `finally` block (~line 125): ```python finally: cache_path.write_text(json.dumps(cache)) ``` ## Risk Loss of all accumulated analysis work if the long-running process is interrupted at an unlucky moment. On a large export with thousands of photos the analysis can take minutes; losing the cache mid-run is a significant reliability failure. ## Suggested fix direction Write to a temporary file in the same directory and then atomically rename it over the target: ```python tmp = cache_path.with_suffix('.tmp') tmp.write_text(json.dumps(cache)) tmp.replace(cache_path) ``` `os.replace` / `Path.replace` is atomic on POSIX when source and destination are on the same filesystem. ## Severity minor ## Found by Automated audit by Claude Code
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
bc1bb/BeReal-extractor#3
No description provided.