Add pause/resume tasks
This commit is contained in:
parent
2786f5ba72
commit
6e90ab2673
108
airflow/README.md
Normal file
@@ -0,0 +1,108 @@
# Airflow Cluster for YT-DLP Operations

This directory contains the configuration and deployment files for an Apache Airflow cluster designed to manage distributed YouTube video downloading tasks using the `ytdlp-ops` service.

## Overview

The cluster consists of:

- **Master Node:** Runs the Airflow webserver, scheduler, and Flower (Celery monitoring). It also hosts shared services such as Redis (broker/backend) and MinIO (artifact storage).
- **Worker Nodes:** Run Celery workers that execute download tasks. Each worker node also runs the `ytdlp-ops-service` (Thrift API server), Envoy proxy (load balancer for Thrift traffic), and Camoufox (remote browser instances for token generation).

## Key Components

### Airflow DAGs

- `ytdlp_ops_dispatcher.py`: The "Sensor" half of a Sensor/Worker pattern. It monitors a Redis queue for URLs to process and triggers a `ytdlp_ops_worker_per_url` DAG run for each URL.
- `ytdlp_ops_worker_per_url.py`: The "Worker" DAG. It processes a single URL passed via the DAG run configuration. It implements worker affinity (all tasks for a URL run on the same machine) and handles account management (retrying with different accounts, banning failing accounts based on sliding-window checks).

### Configuration Files

- `airflow.cfg`: Main Airflow configuration file.
- `config/airflow_local_settings.py`: Contains the `task_instance_mutation_hook`, which implements worker affinity by dynamically assigning tasks to queues based on the worker node's hostname.
- `config/custom_task_hooks.py`: Contains a copy of the `task_instance_mutation_hook` (the one in `airflow_local_settings.py` is the active one).
- `config/redis_default_conn.json.j2`: Jinja2 template for the Airflow Redis connection configuration.
- `config/minio_default_conn.json.j2`: Jinja2 template for the Airflow MinIO connection configuration.

### Docker & Compose

- `Dockerfile`: Defines the Airflow image, including necessary dependencies such as `yt-dlp`, `ffmpeg`, and Python packages.
- `Dockerfile.caddy`: Defines a Caddy image used as a reverse proxy for serving Airflow static assets.
- `configs/docker-compose-master.yaml.j2`: Jinja2 template for the Docker Compose configuration on the Airflow master node.
- `configs/docker-compose-dl.yaml.j2`: Jinja2 template for the Docker Compose configuration on the Airflow worker nodes.
- `configs/docker-compose-ytdlp-ops.yaml.j2`: Jinja2 template for the Docker Compose configuration of the `ytdlp-ops` services (Thrift API, Envoy, Camoufox) on both the master (management role) and the worker nodes.
- `configs/docker-compose.camoufox.yaml.j2`: Jinja2 template (auto-generated by `generate_envoy_config.py`) for the Camoufox browser service definitions.
- `configs/docker-compose.config-generate.yaml`: Docker Compose file used to run the `generate_envoy_config.py` script in a container to create the final service configuration files.
- `generate_envoy_config.py`: Script that generates `envoy.yaml`, `docker-compose.camoufox.yaml`, and `camoufox_endpoints.json` based on environment variables (see the sketch after this list).
- `configs/envoy.yaml.j2`: Jinja2 template (used by `generate_envoy_config.py`) for the Envoy proxy configuration.
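
For orientation, here is a minimal Python sketch of what a generator like `generate_envoy_config.py` produces. The consecutive port numbering from 9090, the Camoufox WebSocket ports, the `camoufox_endpoints.json` layout, and the template variable names are assumptions; the real script also renders `docker-compose.camoufox.yaml`, which is omitted here.

```python
"""Illustrative sketch only; the real generate_envoy_config.py may differ."""
import json
import os

from jinja2 import Environment, FileSystemLoader

YTDLP_WORKERS = int(os.environ.get("YTDLP_WORKERS", "3"))
BASE_THRIFT_PORT = 9090        # assumed first internal Thrift port
BASE_CAMOUFOX_PORT = 12345     # hypothetical base port for Camoufox WebSocket endpoints

# One internal Thrift endpoint and one Camoufox browser per worker slot.
thrift_ports = [BASE_THRIFT_PORT + i for i in range(YTDLP_WORKERS)]
camoufox_endpoints = {
    f"camoufox-{i}": f"ws://127.0.0.1:{BASE_CAMOUFOX_PORT + i}" for i in range(YTDLP_WORKERS)
}

# camoufox_endpoints.json is later read by ytdlp-ops-service for token generation.
with open("camoufox_endpoints.json", "w") as fh:
    json.dump(camoufox_endpoints, fh, indent=2)

# Render the Envoy config so its cluster load-balances across the internal ports.
env = Environment(loader=FileSystemLoader("configs"))
envoy_yaml = env.get_template("envoy.yaml.j2").render(
    listen_port=9080,          # Envoy's public Thrift port
    upstream_ports=thrift_ports,
)
with open("envoy.yaml", "w") as fh:
    fh.write(envoy_yaml)
```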

### Camoufox (Remote Browsers)

- `camoufox/`: Directory containing the Camoufox browser setup.
  - `Dockerfile`: Defines the Camoufox image.
  - `requirements.txt`: Python dependencies for the Camoufox server.
  - `camoufox_server.py`: The core server logic for managing remote browser instances.
  - `start_camoufox.sh`: Wrapper script to start the Camoufox server with Xvfb and VNC.
  - `*.xpi`: Browser extensions used by Camoufox.

## Deployment Process

Deployment is managed by Ansible playbooks located in the `ansible/` directory.

1. **Inventory Generation:** The `tools/generate-inventory.py` script reads `cluster.yml` and generates `ansible/inventory.ini`, `ansible/host_vars/`, and `ansible/group_vars/all/generated_vars.yml`.
2. **Full Deployment:** `ansible-playbook playbook-full.yml` is the main command. It:
   - Installs prerequisites (Docker, pipx, Glances).
   - Ensures the `airflow_proxynet` Docker network exists.
   - Imports and runs `playbook-master.yml` for the master node.
   - Imports and runs `playbook-worker.yml` for the worker nodes.
3. **Master Deployment (`playbook-master.yml`):**
   - Sets system configuration (timezone, NTP, swap, sysctl).
   - Calls the `airflow-master` role:
     - Syncs files to `/srv/airflow_master/`.
     - Templates `configs/docker-compose-master.yaml`.
     - Builds the Airflow image.
     - Extracts static assets and builds the Caddy image.
     - Starts services using `docker compose`.
   - Calls the `ytdlp-master` role:
     - Syncs `generate_envoy_config.py` and templates.
     - Creates the `.env` file.
     - Runs `generate_envoy_config.py` to create service configs.
     - Creates a dummy `docker-compose.camoufox.yaml`.
     - Starts the `ytdlp-ops` management services using `docker compose`.
4. **Worker Deployment (`playbook-worker.yml`):**
   - Sets system configuration (timezone, NTP, swap, sysctl, system limits).
   - Calls the `ytdlp-worker` role:
     - Syncs files (including the `camoufox/` directory) to `/srv/airflow_dl_worker/`.
     - Creates the `.env` file.
     - Runs `generate_envoy_config.py` to create service configs (including `docker-compose.camoufox.yaml`).
     - Builds the Camoufox image.
     - Starts the `ytdlp-ops` worker services using `docker compose`.
   - Calls the `airflow-worker` role:
     - Syncs files to `/srv/airflow_dl_worker/`.
     - Templates `configs/docker-compose-dl.yaml`.
     - Builds the Airflow image.
     - Starts services using `docker compose`.
   - Verifies that the Camoufox services are running.

## Service Interaction Flow (Worker Node)

1. **Airflow Worker:** Pulls tasks from the Redis queue.
2. **`ytdlp_ops_worker_per_url` DAG:** Executes tasks on the local worker node.
3. **Thrift Client (in DAG task):** Connects to `localhost:9080` (Envoy's public port).
4. **Envoy Proxy:** Listens on `:9080` and load-balances Thrift requests across the internal ports (`9090`, `9091`, `9092`, ... depending on `YTDLP_WORKERS`) of the local `ytdlp-ops-service`.
5. **`ytdlp-ops-service`:** Receives the Thrift request.
6. **Token Generation:** If needed, `ytdlp-ops-service` connects to a local Camoufox instance via WebSocket (using `camoufox_endpoints.json` for the address) to generate tokens.
7. **Camoufox:** Runs a headless Firefox browser, potentially through a SOCKS5 proxy, to interact with YouTube and generate the required tokens.
8. **Download:** The DAG task uses the token (via `info.json`) and, where configured, the SOCKS5 proxy to run `yt-dlp` for the actual download.
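
As an illustration of step 8, the sketch below shows how a task might invoke `yt-dlp` with a pre-generated `info.json` and an optional SOCKS5 proxy. The paths, the proxy URL, and the helper function are illustrative only; the actual DAG task's invocation may differ.

```python
"""Minimal download-step sketch, assuming the token-bearing info.json already exists."""
import subprocess
from typing import Optional


def download_from_info_json(info_json_path: str, output_dir: str, proxy: Optional[str] = None) -> None:
    # --load-info-json makes yt-dlp reuse the pre-extracted metadata (and tokens)
    # instead of re-extracting the video page.
    cmd = [
        "yt-dlp",
        "--load-info-json", info_json_path,
        "--paths", output_dir,
    ]
    if proxy:
        # e.g. "socks5://user:pass@host:1080", matching one of CAMOUFOX_PROXIES
        cmd += ["--proxy", proxy]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download_from_info_json("/tmp/example.info.json", "/tmp/downloads")
```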

## Environment Variables

Key environment variables used in the `.env` files (generated by Ansible templates) control service behavior (a short usage sketch follows this list):

- `HOSTNAME`: The Ansible inventory hostname.
- `SERVICE_ROLE`: `management` (master) or `worker`.
- `SERVER_IDENTITY`: Unique identifier for the `ytdlp-ops-service` instance.
- `YTDLP_WORKERS`: Number of internal Thrift worker endpoints and Camoufox browser instances.
- `CAMOUFOX_PROXIES`: Comma-separated list of SOCKS5 proxy URLs for Camoufox.
- `MASTER_HOST_IP`: IP address of the Airflow master node (used for connecting back to Redis).
- Various passwords and ports.
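
A minimal sketch of how a service might read these variables; the default values shown here are assumptions, not the services' actual defaults.

```python
"""Sketch of consuming the .env variables at service start-up."""
import os

workers = int(os.environ.get("YTDLP_WORKERS", "1"))
# CAMOUFOX_PROXIES is comma-separated; an empty value means "no proxies".
proxies = [p.strip() for p in os.environ.get("CAMOUFOX_PROXIES", "").split(",") if p.strip()]
master_ip = os.environ.get("MASTER_HOST_IP", "127.0.0.1")

print(f"{workers} worker slot(s), {len(proxies)} proxy(ies), Redis at {master_ip}")
```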

This setup provides a scalable and robust system for managing YouTube downloads with account rotation and proxy usage.

@@ -86,3 +86,78 @@

- **`handle_generic_failure`**: If any of the main tasks fails with an unrecoverable error, this task writes detailed error information to the `_fail` hash in Redis (see the sketch after this list).
- **`decide_what_to_do_next`**: A branching task that runs after success or failure. It decides whether to continue the loop.
- **`trigger_self_run`**: The task that actually launches the next DAG run, creating a continuous loop.
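
For illustration, a failure record of the kind `handle_generic_failure` writes might be produced with `redis-py` roughly as follows; the exact hash key, field names, and payload format used by the real DAG are assumptions.

```python
"""Sketch of recording a failure into a Redis "_fail" hash."""
import json
import time

import redis


def record_failure(r: redis.Redis, queue_name: str, url: str, error: str) -> None:
    # One hash per queue, keyed by URL, value is a JSON blob with error details.
    fail_key = f"{queue_name}_fail"          # assumed naming convention
    payload = {
        "url": url,
        "error": error,
        "ts": int(time.time()),
    }
    r.hset(fail_key, url, json.dumps(payload))


if __name__ == "__main__":
    record_failure(redis.Redis(host="localhost", port=6379), "ytdlp_urls",
                   "https://www.youtube.com/watch?v=example", "download failed")
```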

## Worker Management (Pause/Resume)

The system provides a mechanism for "cooling down" or temporarily pausing a worker. This is useful for performing maintenance, safely shutting down a machine, or reducing load on the cluster without generating errors.

### How It Works

The mechanism is based on a lock file created on the worker node with Ansible.

1. **Pause:** The administrator runs an Ansible playbook that creates an empty file named `AIRFLOW.PREVENT_URL_PULL.lock` in the worker's working directory (`/srv/airflow_dl_worker`).
2. **Check:** The `ytdlp_ops_dispatcher` DAG, which is responsible for distributing URLs, checks for this file before taking a new task from Redis.
3. **Skipping the task:** If the file exists, the dispatcher logs that the worker is paused and finishes its task with the `skipped` status. This prevents the worker from picking up new URLs but does not affect tasks that are already running.
4. **Resume:** The administrator runs another Ansible playbook that renames the lock file (appending a timestamp), thereby "unlocking" the worker. On its next run, the dispatcher will not find the file and will proceed as usual.

### Management Commands

Dedicated Ansible playbooks are used to manage the worker state. Run the commands from the project's root directory.

**Pause a worker:**
(Replace `"hostname"` with a host name from your inventory file)
```bash
ansible-playbook ansible/playbooks/pause_worker.yml --limit "hostname"
```

**Resume a worker:**
```bash
ansible-playbook ansible/playbooks/resume_worker.yml --limit "hostname"
```

## Pinning Workers to Specific Machines (Worker Pinning / Affinity)

To ensure that all tasks related to processing a single URL run on the same machine (worker), the system uses a combination of three components: the Orchestrator, the Dispatcher, and a dedicated Airflow hook.

### 1. `ytdlp_ops_orchestrator` (Orchestrator)

- **Role:** Initiates the processing.
- **Action:** On launch it creates several DAG runs of `ytdlp_ops_dispatcher`. Each such run is meant to process one URL.
- **Parameter passing:** The orchestrator passes its configuration parameters (for example `account_pool`, `redis_conn_id`, `service_ip`) to each dispatcher run.

### 2. `ytdlp_ops_dispatcher` (Dispatcher)

- **Role:** The main mechanism that establishes the affinity (see the sketch after this list).
- **Action:**
  1. **Fetches a URL:** Pops a single URL from the Redis queue (`_inbox`).
  2. **Identifies the worker:** Uses `socket.gethostname()` to determine the name of the machine (worker) it is currently running on.
  3. **Builds the queue name:** Creates a unique queue name for this worker, for example `queue-dl-dl-worker-1`.
  4. **Triggers the Worker DAG:** Starts the `ytdlp_ops_worker_per_url` DAG, passing it:
     * The fetched `url_to_process`.
     * The constructed `worker_queue` name via the `conf` parameter.
     * All other parameters received from the orchestrator.
- **Key point:** This is the step that ties a specific URL to the specific worker on which its processing began.
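
A condensed sketch of this dispatch step is shown below. The `_inbox` key layout, the `queue-dl-` prefix, and the use of Airflow's `trigger_dag` helper are assumptions based on the description above, not the DAG's exact code.

```python
"""Minimal dispatcher sketch: pop a URL, derive the pinned queue, trigger the worker DAG."""
import socket

from airflow.api.common.trigger_dag import trigger_dag
from airflow.exceptions import AirflowSkipException
from airflow.providers.redis.hooks.redis import RedisHook


def dispatch_one_url(redis_conn_id: str, queue_name: str, orchestrator_conf: dict) -> None:
    r = RedisHook(redis_conn_id=redis_conn_id).get_conn()

    url = r.lpop(f"{queue_name}_inbox")          # assumed inbox key layout
    if url is None:
        raise AirflowSkipException("No URLs waiting in the inbox.")
    url = url.decode() if isinstance(url, bytes) else url

    # The hostname of the machine this task runs on becomes the pinned queue.
    worker_queue = f"queue-dl-{socket.gethostname()}"

    trigger_dag(
        dag_id="ytdlp_ops_worker_per_url",
        conf={
            "url_to_process": url,
            "worker_queue": worker_queue,
            **orchestrator_conf,                 # account_pool, redis_conn_id, service_ip, ...
        },
    )
```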

### 3. `task_instance_mutation_hook` (Task Mutation Hook)

- **Location:** `airflow/config/custom_task_hooks.py`
- **Role:** The mechanism that ensures *all* tasks of the Worker DAG run on the right machine (a sketch follows this section).
- **How it works:**
  1. **Registration:** The hook is registered in the Airflow configuration and is invoked before *every* task starts.
  2. **DAG ID check:** The hook checks whether the task (`TaskInstance`) belongs to the `ytdlp_ops_worker_per_url` DAG.
  3. **Extracting `conf`:** If so, it safely extracts the `conf` from the `DagRun` associated with the task.
  4. **Changing the queue:**
     * If the `conf` contains a `worker_queue` key (which is true for all runs initiated by the dispatcher), the hook *overrides* the task's default queue with that value.
     * As a result, the Airflow scheduler places the task in exactly the queue that the intended worker is listening to.
  5. **Fallback:** If `worker_queue` is not found (for example, when the DAG is triggered manually), the task falls back to the default `queue-dl` queue.
- **Key point:** This hook guarantees that *all subsequent tasks* within a single `ytdlp_ops_worker_per_url` run (for example `get_token`, `download_and_probe`, `mark_url_as_success`) execute on the same worker that originally received the URL in the dispatcher.
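
A minimal sketch of a hook with this behaviour follows; the real implementation in `custom_task_hooks.py` may differ in logging and error handling.

```python
"""Sketch of a task_instance_mutation_hook that pins worker-per-URL tasks to a queue."""
from airflow.models.taskinstance import TaskInstance

DEFAULT_QUEUE = "queue-dl"


def task_instance_mutation_hook(task_instance: TaskInstance) -> None:
    # Only worker-per-URL tasks are pinned; every other DAG keeps its queue.
    if task_instance.dag_id != "ytdlp_ops_worker_per_url":
        return

    try:
        dag_run = task_instance.get_dagrun()
        conf = (dag_run.conf or {}) if dag_run else {}
    except Exception:
        conf = {}

    # Route the task to the queue chosen by the dispatcher, or fall back.
    task_instance.queue = conf.get("worker_queue", DEFAULT_QUEUE)
```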

### Summary

The `Orchestrator -> Dispatcher -> Hook` combination effectively implements task-to-worker pinning:

1. The **Orchestrator** starts the process.
2. The **Dispatcher** ties a specific URL to a specific worker by determining its hostname and passing it as `worker_queue` to the Worker DAG.
3. The **Hook** guarantees that all Worker DAG tasks run in the queue that belongs to that worker.

This lets the system use a worker's local resources (for example caches and temporary files) efficiently and predictably when processing each individual URL.

@@ -6,6 +6,7 @@ It pulls a URL from Redis and triggers a worker with a pinned queue.

from __future__ import annotations
import logging
import os
import socket
from datetime import timedelta

@@ -29,6 +30,16 @@ def dispatch_url_to_worker(**context):
    Pulls one URL from Redis, determines the current worker's dedicated queue,
    and triggers the main worker DAG to process the URL on that specific queue.
    """
    # --- Check for worker pause lock file ---
    # This path must be consistent with the Ansible playbook.
    lock_file_path = '/srv/airflow_dl_worker/AIRFLOW.PREVENT_URL_PULL.lock'
    hostname = socket.gethostname()
    if os.path.exists(lock_file_path):
        logger.info(f"Worker '{hostname}' is paused. Lock file found at '{lock_file_path}'. Skipping URL pull.")
        raise AirflowSkipException(f"Worker '{hostname}' is paused.")
    else:
        logger.info(f"Worker '{hostname}' is active (no lock file found at '{lock_file_path}'). Proceeding to pull URL.")

    params = context['params']
    redis_conn_id = params['redis_conn_id']
    queue_name = params['queue_name']

@@ -1,9 +0,0 @@
# Migration Notes

This document tracks the process of migrating the Ansible deployment.

## Guiding Principles

- No changes to business logic or core functionality are permitted during this phase.
- The focus is solely on resolving file path issues, dependency errors, and structural inconsistencies resulting from the migration of a subset of files.
- All changes should be aimed at making the existing playbooks runnable in the new environment.

@ -1,120 +0,0 @@
|
||||
# Ansible-driven YT-DLP / Airflow Cluster – Quick-Start & Cheat-Sheet

> One playbook = one command to **deploy**, **update**, **restart**, or **re-configure** the entire cluster.

---

## 0. Prerequisites (run once on the **tower** server)

---

## 1. Ansible Vault Setup (run once on your **local machine**)

This project uses Ansible Vault to encrypt sensitive data like passwords and API keys. To run the playbooks, you need to provide the vault password. The recommended way is to create a file named `.vault_pass` in the root of the project directory.

1. **Create the Vault Password File:**
   From the project's root directory (e.g., `/opt/yt-ops-services`), create the file. The file should contain only your vault password on a single line.

   ```bash
   # Replace 'your_secret_password_here' with your actual vault password
   echo "your_secret_password_here" > .vault_pass
   ```

2. **Secure the File:**
   It's good practice to restrict permissions on this file so only you can read it.

   ```bash
   chmod 600 .vault_pass
   ```

The `ansible.cfg` file is configured to automatically look for this `.vault_pass` file in the project root.

---

## 1.5. Cluster & Inventory Management

The Ansible inventory (`ansible/inventory.ini`), host-specific variables (`ansible/host_vars/`), and the master `docker-compose.yaml` are dynamically generated from a central cluster definition file (e.g., `cluster.yml`).

**Whenever you add, remove, or change the IP of a node in your `cluster.yml`, you must re-run the generator script.**

1. **Install Script Dependencies (run once):**
   The generator script requires `PyYAML` and `Jinja2`. Install them using pip:

   ```bash
   pip3 install PyYAML Jinja2
   ```

2. **Edit Your Cluster Definition:**
   Modify your `cluster.yml` file (located in the project root) to define your master and worker nodes.

3. **Run the Generator Script:**
   From the project's root directory, run the following command to update all generated files:

   ```bash
   # Make sure the script is executable first: chmod +x tools/generate-inventory.py
   ./tools/generate-inventory.py cluster.yml
   ```

This ensures that Ansible has the correct host information and that the master node's Docker Compose configuration includes the correct `extra_hosts` for log fetching from workers.
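
A rough sketch of what such a generator does is shown below. The `cluster.yml` schema (a `master` entry and a `workers` list with `name`/`ansible_host`) and the `airflow_master` group name are assumptions; the real script additionally renders `host_vars/`, group variables, and the master Docker Compose file.

```python
"""Illustrative inventory-generation sketch; tools/generate-inventory.py may differ."""
import sys

import yaml


def main(cluster_file: str) -> None:
    with open(cluster_file) as fh:
        cluster = yaml.safe_load(fh)

    lines = ["[airflow_master]"]
    master = cluster["master"]
    lines.append(f"{master['name']} ansible_host={master['ansible_host']}")

    lines.append("")
    lines.append("[airflow_workers]")
    for worker in cluster["workers"]:
        lines.append(f"{worker['name']} ansible_host={worker['ansible_host']}")

    with open("ansible/inventory.ini", "w") as fh:
        fh.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "cluster.yml")
```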

---

## 2. Setup and Basic Usage

### Running Ansible Commands

**IMPORTANT:** All `ansible-playbook` commands should be run from within the `ansible/` directory. This allows Ansible to automatically find the `ansible.cfg` and `inventory.ini` files.

```bash
cd ansible
ansible-playbook <playbook_name>.yml
```

The `ansible.cfg` file is configured to automatically use the `.vault_pass` file located in the project root (one level above `ansible/`). This means you **do not** need to manually specify `--vault-password-file ../.vault_pass` in your commands. Ensure your `.vault_pass` file is located in the project root.

If you run `ansible-playbook` from the project root instead of the `ansible/` directory, you will see warnings about the inventory not being parsed, because Ansible does not automatically find `ansible/ansible.cfg`.

---

## 3. Deployment Scenarios

### Full Cluster Deployment

To deploy or update the entire cluster (master and all workers), run the main playbook. This will build/pull images and restart all services.

```bash
# Run from inside the ansible/ directory
ansible-playbook playbook-full.yml
```

### Targeted & Fast Deployments

For faster development cycles, you can deploy changes to specific parts of the cluster without rebuilding or re-pulling Docker images.

#### Updating Only the Master Node (Fast Deploy)

To sync configuration, code, and restart services on the master node *without* rebuilding the Airflow image or pulling the `ytdlp-ops-server` image, use the `fast_deploy` flag with the master playbook. This is ideal for pushing changes to DAGs, Python code, or config files.

```bash
# Run from inside the ansible/ directory
ansible-playbook playbook-master.yml --extra-vars "fast_deploy=true"
```

#### Updating Only a Specific Worker Node (Fast Deploy)

Similarly, you can update a single worker node. Replace `dl001` with the hostname of the worker you want to target from your `inventory.ini`.

```bash
# Run from inside the ansible/ directory
ansible-playbook playbook-worker.yml --limit dl001 --extra-vars "fast_deploy=true"
```

#### Updating Only DAGs and Configs

If you have only changed DAGs or configuration files and don't need to restart any services, you can run a much faster playbook that only syncs the `dags/` and `config/` directories.

```bash
# Run from inside the ansible/ directory
ansible-playbook playbook-dags.yml
```

61
ansible/README.md
Normal file
@@ -0,0 +1,61 @@
# Ansible for YT-DLP Cluster

This directory contains the Ansible playbooks, roles, and configurations for deploying and managing the YT-DLP Airflow cluster.

## Full Deployment

### Deploy the entire cluster with proxies (recommended for new setups):

```bash
ansible-playbook playbook-full-with-proxies.yml
```

### Deploy the cluster without proxies:

```bash
ansible-playbook playbook-full.yml
```

## Targeted Deployments

### Deploy only to the master node:

```bash
ansible-playbook playbook-master.yml --limit="af-test"
```

### Deploy only to worker nodes:

```bash
ansible-playbook playbook-worker.yml
```

## DAGs Only Deployment

To update only DAG files and configurations:

```bash
ansible-playbook playbook-dags.yml
```

## Managing Worker State (Pause/Resume)

The system allows for gracefully pausing a worker to prevent it from picking up new tasks. This is useful for maintenance or for decommissioning a node. The mechanism uses a lock file (`AIRFLOW.PREVENT_URL_PULL.lock`) on the worker host.

### To Pause a Worker

This command creates the lock file, causing the `ytdlp_ops_dispatcher` DAG to skip task execution on this host.

```bash
# Replace "worker-hostname" with the target host from your inventory
ansible-playbook playbooks/pause_worker.yml --limit "worker-hostname"
```

### To Resume a Worker

This command archives (renames) the lock file, allowing the worker to resume picking up tasks.

```bash
# Replace "worker-hostname" with the target host from your inventory
ansible-playbook playbooks/resume_worker.yml --limit "worker-hostname"
```

@@ -1,10 +1,9 @@
[defaults]
-inventory = inventory.ini
-remote_user = alex_p
-roles_path = ./roles
+inventory = ansible/inventory.ini
+roles_path = ansible/roles
retry_files_enabled = False
host_key_checking = False
-vault_password_file = ../.vault_pass
+vault_password_file = .vault_pass

[inventory]
enable_plugins = ini

14
ansible/playbooks/pause_worker.yml
Normal file
@@ -0,0 +1,14 @@
---
- hosts: airflow_workers
  gather_facts: no
  vars_files:
    - ../group_vars/all.yml
  tasks:
    - name: "Create lock file to pause worker"
      file:
        path: "{{ airflow_worker_dir }}/AIRFLOW.PREVENT_URL_PULL.lock"
        state: touch
        owner: "{{ ssh_user }}"
        group: "{{ deploy_group }}"
        mode: '0644'
      become: yes

13
ansible/playbooks/resume_worker.yml
Normal file
@@ -0,0 +1,13 @@
---
- hosts: airflow_workers
  gather_facts: yes
  vars_files:
    - ../group_vars/all.yml
  tasks:
    - name: "Archive lock file to resume worker"
      command: >
        mv {{ airflow_worker_dir }}/AIRFLOW.PREVENT_URL_PULL.lock
        {{ airflow_worker_dir }}/AIRFLOW.PREVENT_URL_PULL.lock.removed-{{ ansible_date_time.year }}{{ ansible_date_time.month }}{{ ansible_date_time.day }}-{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}
      args:
        removes: "{{ airflow_worker_dir }}/AIRFLOW.PREVENT_URL_PULL.lock"
      become: yes

@@ -1,4 +1,27 @@
---
- name: Ensure worker is not paused on deploy (remove .lock file)
  file:
    path: "{{ airflow_worker_dir }}/AIRFLOW.PREVENT_URL_PULL.lock"
    state: absent
  become: yes

- name: Clean up old renamed lock files (older than 7 days)
  ansible.builtin.find:
    paths: "{{ airflow_worker_dir }}"
    patterns: "AIRFLOW.PREVENT_URL_PULL.lock.removed-*"
    age: "7d"
    use_regex: false
  register: old_lock_files
  become: yes

- name: Remove found old lock files
  ansible.builtin.file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ old_lock_files.files }}"
  become: yes
  when: old_lock_files.files | length > 0

- name: Check if YT-DLP worker deployment directory exists
  stat:
    path: "{{ airflow_worker_dir }}"