Provide updates on ytdlp DAGs

This commit is contained in:
aperez 2025-08-06 18:02:44 +03:00
parent 61906a57ef
commit 274bef5370
9 changed files with 1617 additions and 862 deletions

View File

@@ -1,46 +1,78 @@
# Architecture and Description of the YTDLP Airflow DAGs

This document describes the architecture and purpose of the DAGs used to download videos from YouTube. The system is built on a continuous, self-sustaining loop model for parallel, fault-tolerant processing.

## Main processing loop

Processing is performed by two main DAGs that work as a pair: an orchestrator and a worker.

### `ytdlp_ops_orchestrator` (the "ignition system")

- **Purpose:** This DAG acts as the "ignition system" that starts processing. It is triggered manually to start a specified number of parallel worker loops.
- **How it works:**
  - It does **not** process URLs itself.
  - Its only job is to trigger the configured number of `ytdlp_ops_worker_per_url` DAG runs.
  - It passes all required configuration (account pool, Redis connection, etc.) to the workers.

### `ytdlp_ops_worker_per_url` (self-sustaining worker)

- **Purpose:** This DAG processes a single URL and is designed to run in a continuous loop.
- **How it works:**
  1. **Start:** The initial run is triggered by `ytdlp_ops_orchestrator`.
  2. **Getting a task:** The worker pops one URL from the `_inbox` queue in Redis. If the queue is empty, the worker run finishes and its processing "lane" stops.
  3. **Processing:** It calls the `ytdlp-ops-server` service to obtain `info.json` and a proxy, then downloads the video.
  4. **Continue or stop:**
     - **On success:** It triggers a new instance of itself, creating a continuous loop that processes the next URL (a minimal sketch of this self-trigger step follows this section).
     - **On failure:** The loop is interrupted (when `stop_on_failure` is `True`), stopping this processing lane. This keeps one problematic URL or account from halting the entire system.
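A minimal sketch of the self-trigger step mentioned above, assuming the worker re-triggers its own DAG with Airflow's `trigger_dag` helper the same way the orchestrator in this commit does; the function name is illustrative, not the DAG's actual task code:

```python
# Illustrative only: how a worker's final task could re-trigger its own DAG
# to continue the loop. Mirrors the trigger_dag() usage in the orchestrator
# below; names here are assumptions, not the real task implementation.
from datetime import datetime, timezone

from airflow.api.common.trigger_dag import trigger_dag


def trigger_next_worker_run(**context):
    """Start the next iteration of this worker's processing lane."""
    conf = dict(context["dag_run"].conf or {})  # carry the loop configuration forward
    run_id = f"self_triggered_{datetime.now(timezone.utc).isoformat()}"
    trigger_dag(
        dag_id="ytdlp_ops_worker_per_url",
        run_id=run_id,
        conf=conf,
        replace_microseconds=False,
    )
```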
## Management DAGs

### `ytdlp_mgmt_proxy_account`

- **Purpose:** The main tool for monitoring and managing the state of the resources used by `ytdlp-ops-server`.
- **Functionality:**
  - **View statuses:** Shows the current status of every proxy and account (e.g. `ACTIVE`, `BANNED`, `RESTING`).
  - **Proxy management:** Manually ban, unban, or reset the status of proxies.
  - **Account management:** Manually ban or unban accounts.

## Resource management strategy (proxies and accounts)

The system uses an intelligent strategy for managing the lifecycle and state of accounts and proxies in order to maximize the success rate and minimize bans.

- **Account lifecycle ("cooldown"):**
  - To prevent "burnout", accounts automatically move to a `RESTING` state after a period of intensive use.
  - Once the rest period ends, they automatically return to `ACTIVE` and become available to workers again.
- **Smart ban strategy:**
  - **Ban the account first:** On a serious error (e.g. `BOT_DETECTED`) the system penalizes **only the account** that caused the failure; the proxy keeps working.
  - **Sliding-window proxy bans:** A proxy is banned automatically only when it shows **systematic failures across DIFFERENT accounts** within a short time window, which is a reliable indicator that the proxy itself is the problem (an illustrative sketch follows this section).
- **Monitoring:**
  - The `ytdlp_mgmt_proxy_account` DAG is the primary monitoring tool. It shows the current status of all resources, including the time remaining until banned or resting accounts become active again.
  - The execution graph of `ytdlp_ops_worker_per_url` now explicitly shows steps such as `assign_account`, `get_token`, `ban_account`, and `retry_get_token`, which makes debugging more transparent.
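The sliding-window check itself lives inside `ytdlp-ops-server`; the sketch below only illustrates the idea with redis-py, and the key name, window length, and threshold are invented for the example:

```python
# Illustrative sketch only -- the real sliding-window logic is server-side;
# key names and thresholds here are hypothetical.
import time

import redis


def record_failure_and_check_proxy_ban(
    r: redis.Redis,
    proxy_url: str,
    account_id: str,
    window_s: int = 600,         # sliding window length (assumption)
    distinct_accounts: int = 3,  # ban threshold (assumption)
) -> bool:
    """Record a failure and return True if the proxy should be banned."""
    key = f"proxy_failures:{proxy_url}"          # hypothetical key
    now = time.time()
    r.zadd(key, {account_id: now})               # one entry per failing account
    r.zremrangebyscore(key, 0, now - window_s)   # drop entries outside the window
    r.expire(key, window_s)
    # Ban only when several DIFFERENT accounts failed on this proxy recently.
    return r.zcard(key) >= distinct_accounts
```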
## External services

### `ytdlp-ops-server` (Thrift service)

- **Purpose:** An external service that provides the authentication data (tokens, cookies, proxy) required to download videos.
- **Interaction:** The worker DAG (`ytdlp_ops_worker_per_url`) calls this service before starting a download to obtain the data `yt-dlp` needs.

## Worker DAG logic (`ytdlp_ops_worker_per_url`)

This DAG is the workhorse of the system. It is designed as a self-sustaining loop that processes one URL per run.

### Tasks and their purpose:

- **`pull_url_from_redis`**: Pops one URL from the `_inbox` queue in Redis. If the queue is empty, the DAG finishes with a `skipped` status, stopping this processing lane.
- **`assign_account`**: Selects the account for the run. It reuses the account that succeeded in the previous run of its lane (account affinity); on the first run it picks a random account.
- **`get_token`**: The core task. It calls `ytdlp-ops-server` to obtain `info.json`.
- **`handle_bannable_error_branch`**: If `get_token` fails with a bannable error, this branching task decides what to do next based on the `on_bannable_failure` policy.
- **`ban_account_and_prepare_for_retry`**: If the policy allows a retry, this task bans the failed account and selects a new one for the retry.
- **`retry_get_token`**: Makes a second attempt to obtain a token with the new account.
- **`ban_second_account_and_proxy`**: If the second attempt also fails, this task bans the second account and the proxy that was used.
- **`download_and_probe`**: If `get_token` (or `retry_get_token`) succeeded, this task uses `yt-dlp` to download the media and `ffmpeg` to verify the integrity of the downloaded file (a sketch follows this list).
- **`mark_url_as_success`**: If `download_and_probe` succeeded, this task writes the result to the `_result` hash in Redis.
- **`handle_generic_failure`**: If any of the main tasks fails with an unrecoverable error, this task writes detailed error information to the `_fail` hash in Redis.
- **`decide_what_to_do_next`**: A branching task that runs after either success or failure and decides whether to continue the loop.
- **`trigger_self_run`**: The task that actually triggers the next DAG run, creating the continuous loop.
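As referenced in the `download_and_probe` item, a minimal sketch of that step, assuming `yt-dlp` and `ffprobe` are available on the worker's PATH; the paths, flags, and single-file assumption are illustrative:

```python
# Minimal sketch of the download-and-probe step; output layout and flags are
# examples, not the DAG's actual implementation.
import subprocess
from pathlib import Path


def download_and_probe(info_json_path: str, proxy: str, out_dir: str) -> Path:
    out_tmpl = str(Path(out_dir) / "%(id)s.%(ext)s")
    # Download using the pre-fetched info.json and the assigned SOCKS5 proxy.
    subprocess.run(
        ["yt-dlp", "--load-info-json", info_json_path, "--proxy", proxy, "-o", out_tmpl],
        check=True,
    )
    downloaded = next(Path(out_dir).glob("*"))  # simplification: one file expected
    # Probe the file with ffprobe; a non-zero exit code means a corrupt download.
    subprocess.run(["ffprobe", "-v", "error", str(downloaded)], check=True)
    return downloaded
```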

View File

@@ -1,197 +0,0 @@
"""
DAG to manage the state of proxies used by the ytdlp-ops-server.
"""
from __future__ import annotations
import logging
from datetime import datetime
from airflow.models.dag import DAG
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
# Configure logging
logger = logging.getLogger(__name__)
# Import and apply Thrift exceptions patch for Airflow compatibility
try:
from thrift_exceptions_patch import patch_thrift_exceptions
patch_thrift_exceptions()
logger.info("Applied Thrift exceptions patch for Airflow compatibility.")
except ImportError:
logger.warning("Could not import thrift_exceptions_patch. Compatibility may be affected.")
except Exception as e:
logger.error(f"Error applying Thrift exceptions patch: {e}")
# Thrift imports
try:
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from pangramia.yt.tokens_ops import YTTokenOpService
from pangramia.yt.exceptions.ttypes import PBServiceException, PBUserException
except ImportError as e:
logger.critical(f"Could not import Thrift modules: {e}. Ensure ytdlp-ops-auth package is installed.")
# Fail DAG parsing if thrift modules are not available
raise
def format_timestamp(ts_str: str) -> str:
"""Formats a string timestamp into a human-readable date string."""
if not ts_str:
return ""
try:
ts_float = float(ts_str)
if ts_float <= 0:
return ""
# Use datetime from the imported 'from datetime import datetime'
dt_obj = datetime.fromtimestamp(ts_float)
return dt_obj.strftime('%Y-%m-%d %H:%M:%S')
except (ValueError, TypeError):
return ts_str # Return original string if conversion fails
def get_thrift_client(host: str, port: int):
"""Helper function to create and connect a Thrift client."""
transport = TSocket.TSocket(host, port)
transport = TTransport.TFramedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = YTTokenOpService.Client(protocol)
transport.open()
logger.info(f"Connected to Thrift server at {host}:{port}")
return client, transport
def manage_proxies_callable(**context):
"""Main callable to interact with the proxy management endpoints."""
params = context["params"]
action = params["action"]
host = params["host"]
port = params["port"]
server_identity = params.get("server_identity")
proxy_url = params.get("proxy_url")
if not server_identity and action in ["ban", "unban", "reset_all"]:
raise ValueError(f"A 'server_identity' is required for the '{action}' action.")
client, transport = None, None
try:
client, transport = get_thrift_client(host, port)
if action == "list":
logger.info(f"Listing proxy statuses for server: {server_identity or 'ALL'}")
statuses = client.getProxyStatus(server_identity)
if not statuses:
logger.info("No proxy statuses found.")
print("No proxy statuses found.")
else:
from tabulate import tabulate
status_list = [
{
"Server": s.serverIdentity,
"Proxy URL": s.proxyUrl,
"Status": s.status,
"Success": s.successCount,
"Failures": s.failureCount,
"Last Success": format_timestamp(s.lastSuccessTimestamp),
"Last Failure": format_timestamp(s.lastFailureTimestamp),
}
for s in statuses
]
print("\n--- Proxy Statuses ---")
print(tabulate(status_list, headers="keys", tablefmt="grid"))
print("----------------------\n")
elif action == "ban":
if not proxy_url:
raise ValueError("A 'proxy_url' is required to ban a proxy.")
logger.info(f"Banning proxy '{proxy_url}' for server '{server_identity}'...")
success = client.banProxy(proxy_url, server_identity)
if success:
logger.info("Successfully banned proxy.")
print(f"Successfully banned proxy '{proxy_url}' for server '{server_identity}'.")
else:
logger.error("Failed to ban proxy.")
raise Exception("Server returned failure for banProxy operation.")
elif action == "unban":
if not proxy_url:
raise ValueError("A 'proxy_url' is required to unban a proxy.")
logger.info(f"Unbanning proxy '{proxy_url}' for server '{server_identity}'...")
success = client.unbanProxy(proxy_url, server_identity)
if success:
logger.info("Successfully unbanned proxy.")
print(f"Successfully unbanned proxy '{proxy_url}' for server '{server_identity}'.")
else:
logger.error("Failed to unban proxy.")
raise Exception("Server returned failure for unbanProxy operation.")
elif action == "reset_all":
logger.info(f"Resetting all proxy statuses for server '{server_identity}'...")
success = client.resetAllProxyStatuses(server_identity)
if success:
logger.info("Successfully reset all proxy statuses.")
print(f"Successfully reset all proxy statuses for server '{server_identity}'.")
else:
logger.error("Failed to reset all proxy statuses.")
raise Exception("Server returned failure for resetAllProxyStatuses operation.")
else:
raise ValueError(f"Invalid action: {action}")
except (PBServiceException, PBUserException) as e:
logger.error(f"Thrift error performing action '{action}': {e.message}", exc_info=True)
raise
except Exception as e:
logger.error(f"Error performing action '{action}': {e}", exc_info=True)
raise
finally:
if transport and transport.isOpen():
transport.close()
logger.info("Thrift connection closed.")
with DAG(
dag_id="ytdlp_mgmt_proxy",
start_date=days_ago(1),
schedule=None,
catchup=False,
tags=["ytdlp", "utility", "proxy"],
doc_md="""
### YT-DLP Proxy Manager DAG
This DAG provides tools to manage the state of proxies used by the `ytdlp-ops-server`.
You can view statuses, and manually ban, unban, or reset proxies for a specific server instance.
**Parameters:**
- `host`: The hostname or IP of the `ytdlp-ops-server` Thrift service.
- `port`: The port of the Thrift service.
- `action`: The operation to perform.
- `list`: List proxy statuses. Provide a `server_identity` to query a specific server, or leave it blank to query the server instance you are connected to.
- `ban`: Ban a specific proxy. Requires `server_identity` and `proxy_url`.
- `unban`: Un-ban a specific proxy. Requires `server_identity` and `proxy_url`.
- `reset_all`: Reset all proxies for a server to `ACTIVE`. Requires `server_identity`.
- `server_identity`: The unique identifier for the server instance (e.g., `ytdlp-ops-airflow-service`).
- `proxy_url`: The full URL of the proxy to act upon (e.g., `socks5://host:port`).
""",
params={
"host": Param("89.253.221.173", type="string", description="The hostname of the ytdlp-ops-server service."),
"port": Param(9090, type="integer", description="The port of the ytdlp-ops-server service."),
"action": Param(
"list",
type="string",
enum=["list", "ban", "unban", "reset_all"],
description="The management action to perform.",
),
"server_identity": Param(
"ytdlp-ops-airflow-service",
type=["null", "string"],
description="The identity of the server to manage. Leave blank to query the connected server instance.",
),
"proxy_url": Param(
None,
type=["null", "string"],
description="The proxy URL to ban/unban (e.g., 'socks5://host:port').",
),
},
) as dag:
proxy_management_task = PythonOperator(
task_id="proxy_management_task",
python_callable=manage_proxies_callable,
)

View File

@@ -0,0 +1,405 @@
"""
DAG to manage the state of proxies and accounts used by the ytdlp-ops-server.
"""
from __future__ import annotations
import logging
from datetime import datetime
import socket
from airflow.exceptions import AirflowException
from airflow.models.dag import DAG
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow.models.variable import Variable
from airflow.providers.redis.hooks.redis import RedisHook
# Configure logging
logger = logging.getLogger(__name__)
# Import and apply Thrift exceptions patch for Airflow compatibility
try:
from thrift_exceptions_patch import patch_thrift_exceptions
patch_thrift_exceptions()
logger.info("Applied Thrift exceptions patch for Airflow compatibility.")
except ImportError:
logger.warning("Could not import thrift_exceptions_patch. Compatibility may be affected.")
except Exception as e:
logger.error(f"Error applying Thrift exceptions patch: {e}")
# Thrift imports
try:
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from pangramia.yt.tokens_ops import YTTokenOpService
from pangramia.yt.exceptions.ttypes import PBServiceException, PBUserException
except ImportError as e:
logger.critical(f"Could not import Thrift modules: {e}. Ensure ytdlp-ops-auth package is installed.")
# Fail DAG parsing if thrift modules are not available
raise
DEFAULT_YT_AUTH_SERVICE_IP = Variable.get("YT_AUTH_SERVICE_IP", default_var="16.162.82.212")
DEFAULT_YT_AUTH_SERVICE_PORT = Variable.get("YT_AUTH_SERVICE_PORT", default_var=9080)
DEFAULT_REDIS_CONN_ID = "redis_default"
# Helper function to connect to Redis, similar to other DAGs
def _get_redis_client(redis_conn_id: str):
"""Gets a Redis client from an Airflow connection."""
try:
# Use the imported RedisHook
redis_hook = RedisHook(redis_conn_id=redis_conn_id)
# get_conn returns a redis.Redis client
return redis_hook.get_conn()
except Exception as e:
logger.error(f"Failed to connect to Redis using connection '{redis_conn_id}': {e}")
# Use the imported AirflowException
raise AirflowException(f"Redis connection failed: {e}")
def format_timestamp(ts_str: str) -> str:
"""Formats a string timestamp into a human-readable date string."""
if not ts_str:
return ""
try:
ts_float = float(ts_str)
if ts_float <= 0:
return ""
# Use datetime from the imported 'from datetime import datetime'
dt_obj = datetime.fromtimestamp(ts_float)
return dt_obj.strftime('%Y-%m-%d %H:%M:%S')
except (ValueError, TypeError):
return ts_str # Return original string if conversion fails
def get_thrift_client(host: str, port: int):
"""Helper function to create and connect a Thrift client."""
transport = TSocket.TSocket(host, port)
transport.setTimeout(30 * 1000) # 30s timeout
transport = TTransport.TFramedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = YTTokenOpService.Client(protocol)
transport.open()
logger.info(f"Connected to Thrift server at {host}:{port}")
return client, transport
def _list_proxy_statuses(client, server_identity):
"""Lists the status of proxies."""
logger.info(f"Listing proxy statuses for server: {server_identity or 'ALL'}")
statuses = client.getProxyStatus(server_identity)
if not statuses:
logger.info("No proxy statuses found.")
print("No proxy statuses found.")
return
from tabulate import tabulate
status_list = []
# This is forward-compatible: it checks for new attributes before using them.
has_extended_info = hasattr(statuses[0], 'recentAccounts') or hasattr(statuses[0], 'recentMachines')
headers = ["Server", "Proxy URL", "Status", "Success", "Failures", "Last Success", "Last Failure"]
if has_extended_info:
headers.extend(["Recent Accounts", "Recent Machines"])
for s in statuses:
status_item = {
"Server": s.serverIdentity,
"Proxy URL": s.proxyUrl,
"Status": s.status,
"Success": s.successCount,
"Failures": s.failureCount,
"Last Success": format_timestamp(s.lastSuccessTimestamp),
"Last Failure": format_timestamp(s.lastFailureTimestamp),
}
if has_extended_info:
recent_accounts = getattr(s, 'recentAccounts', [])
recent_machines = getattr(s, 'recentMachines', [])
status_item["Recent Accounts"] = "\n".join(recent_accounts) if recent_accounts else "N/A"
status_item["Recent Machines"] = "\n".join(recent_machines) if recent_machines else "N/A"
status_list.append(status_item)
print("\n--- Proxy Statuses ---")
# The f-string with a newline ensures the table starts on a new line in the logs.
print(f"\n{tabulate(status_list, headers='keys', tablefmt='grid')}")
print("----------------------\n")
if not has_extended_info:
logger.warning("Server does not seem to support 'recentAccounts' or 'recentMachines' fields yet.")
print("NOTE: To see Recent Accounts/Machines, the server's `getProxyStatus` method must be updated to return these fields.")
def _list_account_statuses(client, account_id):
"""Lists the status of accounts."""
logger.info(f"Listing account statuses for account: {account_id or 'ALL'}")
try:
# The thrift method takes accountId (specific) or accountPrefix.
# If account_id is provided, we use it. If not, we get all by leaving both params as None.
statuses = client.getAccountStatus(accountId=account_id, accountPrefix=None)
if not statuses:
logger.info("No account statuses found.")
print("\n--- Account Statuses ---\nNo account statuses found.\n------------------------\n")
return
from tabulate import tabulate
status_list = []
for s in statuses:
# Determine the last activity timestamp for sorting
last_success = float(s.lastSuccessTimestamp) if s.lastSuccessTimestamp else 0
last_failure = float(s.lastFailureTimestamp) if s.lastFailureTimestamp else 0
last_activity = max(last_success, last_failure)
status_item = {
"Account ID": s.accountId,
"Status": s.status,
"Success": s.successCount,
"Failures": s.failureCount,
"Last Success": format_timestamp(s.lastSuccessTimestamp),
"Last Failure": format_timestamp(s.lastFailureTimestamp),
"Last Proxy": s.lastUsedProxy or "N/A",
"Last Machine": s.lastUsedMachine or "N/A",
"_last_activity": last_activity, # Add a temporary key for sorting
}
status_list.append(status_item)
# Sort the list by the last activity timestamp in descending order
status_list.sort(key=lambda item: item.get('_last_activity', 0), reverse=True)
# Remove the temporary sort key before printing
for item in status_list:
del item['_last_activity']
print("\n--- Account Statuses ---")
# The f-string with a newline ensures the table starts on a new line in the logs.
print(f"\n{tabulate(status_list, headers='keys', tablefmt='grid')}")
print("------------------------\n")
except (PBServiceException, PBUserException) as e:
logger.error(f"Failed to get account statuses: {e.message}", exc_info=True)
print(f"\nERROR: Could not retrieve account statuses. Server returned: {e.message}\n")
except Exception as e:
logger.error(f"An unexpected error occurred while getting account statuses: {e}", exc_info=True)
print(f"\nERROR: An unexpected error occurred: {e}\n")
def manage_system_callable(**context):
"""Main callable to interact with the system management endpoints."""
params = context["params"]
entity = params["entity"]
action = params["action"]
host = params["host"]
port = params["port"]
server_identity = params.get("server_identity")
proxy_url = params.get("proxy_url")
account_id = params.get("account_id")
if action in ["ban", "unban", "reset_all"] and entity == "proxy" and not server_identity:
raise ValueError(f"A 'server_identity' is required for proxy action '{action}'.")
if action in ["ban", "unban"] and entity == "account" and not account_id:
raise ValueError(f"An 'account_id' is required for account action '{action}'.")
# Handle direct Redis action separately to avoid creating an unnecessary Thrift connection.
if entity == "account" and action == "remove_all":
confirm = params.get("confirm_remove_all_accounts", False)
if not confirm:
message = "FATAL: 'remove_all' action requires 'confirm_remove_all_accounts' to be set to True. No accounts were removed."
logger.error(message)
print(f"\nERROR: {message}\n")
raise ValueError(message)
redis_conn_id = params["redis_conn_id"]
account_prefix = params.get("account_id") # Repurpose account_id param as an optional prefix
redis_client = _get_redis_client(redis_conn_id)
pattern = f"account_status:{account_prefix}*" if account_prefix else "account_status:*"
logger.warning(f"Searching for account status keys in Redis with pattern: '{pattern}'")
# scan_iter returns bytes, so we don't need to decode for deletion
keys_to_delete = [key for key in redis_client.scan_iter(pattern)]
if not keys_to_delete:
logger.info(f"No account keys found matching pattern '{pattern}'. Nothing to do.")
print(f"\nNo accounts found matching pattern '{pattern}'.\n")
return
logger.warning(f"Found {len(keys_to_delete)} account keys to delete. This is a destructive operation!")
print(f"\nWARNING: Found {len(keys_to_delete)} accounts to remove from Redis.")
# Decode for printing
for key in keys_to_delete[:10]:
print(f" - {key.decode('utf-8')}")
if len(keys_to_delete) > 10:
print(f" ... and {len(keys_to_delete) - 10} more.")
deleted_count = redis_client.delete(*keys_to_delete)
logger.info(f"Successfully deleted {deleted_count} account keys from Redis.")
print(f"\nSuccessfully removed {deleted_count} accounts from Redis.\n")
return # End execution for this action
client, transport = None, None
try:
client, transport = get_thrift_client(host, port)
if entity == "proxy":
if action == "list":
_list_proxy_statuses(client, server_identity)
elif action == "ban":
if not proxy_url: raise ValueError("A 'proxy_url' is required.")
logger.info(f"Banning proxy '{proxy_url}' for server '{server_identity}'...")
client.banProxy(proxy_url, server_identity)
print(f"Successfully sent request to ban proxy '{proxy_url}'.")
elif action == "unban":
if not proxy_url: raise ValueError("A 'proxy_url' is required.")
logger.info(f"Unbanning proxy '{proxy_url}' for server '{server_identity}'...")
client.unbanProxy(proxy_url, server_identity)
print(f"Successfully sent request to unban proxy '{proxy_url}'.")
elif action == "reset_all":
logger.info(f"Resetting all proxy statuses for server '{server_identity}'...")
client.resetAllProxyStatuses(server_identity)
print(f"Successfully sent request to reset all proxy statuses for '{server_identity}'.")
else:
raise ValueError(f"Invalid action '{action}' for entity 'proxy'.")
elif entity == "account":
if action == "list":
_list_account_statuses(client, account_id)
elif action == "ban":
if not account_id: raise ValueError("An 'account_id' is required.")
reason = f"Manual ban from Airflow mgmt DAG by {socket.gethostname()}"
logger.info(f"Banning account '{account_id}'...")
client.banAccount(accountId=account_id, reason=reason)
print(f"Successfully sent request to ban account '{account_id}'.")
elif action == "unban":
if not account_id: raise ValueError("An 'account_id' is required.")
reason = f"Manual un-ban from Airflow mgmt DAG by {socket.gethostname()}"
logger.info(f"Unbanning account '{account_id}'...")
client.unbanAccount(accountId=account_id, reason=reason)
print(f"Successfully sent request to unban account '{account_id}'.")
elif action == "reset_all":
account_prefix = account_id # Repurpose account_id param as an optional prefix
logger.info(f"Resetting all account statuses to ACTIVE (prefix: '{account_prefix or 'ALL'}')...")
all_statuses = client.getAccountStatus(accountId=None, accountPrefix=account_prefix)
if not all_statuses:
print(f"No accounts found with prefix '{account_prefix or 'ALL'}' to reset.")
return
accounts_to_reset = [s.accountId for s in all_statuses]
logger.info(f"Found {len(accounts_to_reset)} accounts to reset.")
print(f"Found {len(accounts_to_reset)} accounts. Sending unban request for each...")
reset_count = 0
fail_count = 0
for acc_id in accounts_to_reset:
try:
reason = f"Manual reset from Airflow mgmt DAG by {socket.gethostname()}"
client.unbanAccount(accountId=acc_id, reason=reason)
logger.info(f" - Sent reset (unban) for '{acc_id}'.")
reset_count += 1
except Exception as e:
logger.error(f" - Failed to reset account '{acc_id}': {e}")
fail_count += 1
print(f"\nSuccessfully sent reset requests for {reset_count} accounts.")
if fail_count > 0:
print(f"Failed to send reset requests for {fail_count} accounts. See logs for details.")
# Optionally, list statuses again to confirm
print("\n--- Listing statuses after reset ---")
_list_account_statuses(client, account_prefix)
else:
raise ValueError(f"Invalid action '{action}' for entity 'account'.")
elif entity == "all":
if action == "list":
print("\nListing all entities...")
_list_proxy_statuses(client, server_identity)
_list_account_statuses(client, account_id)
else:
raise ValueError(f"Action '{action}' is not supported for entity 'all'. Only 'list' is supported.")
except (PBServiceException, PBUserException) as e:
logger.error(f"Thrift error performing action '{action}': {e.message}", exc_info=True)
raise
except NotImplementedError as e:
logger.error(f"Feature not implemented: {e}", exc_info=True)
raise
except Exception as e:
logger.error(f"Error performing action '{action}': {e}", exc_info=True)
raise
finally:
if transport and transport.isOpen():
transport.close()
logger.info("Thrift connection closed.")
with DAG(
dag_id="ytdlp_mgmt_proxy_account",
start_date=days_ago(1),
schedule=None,
catchup=False,
tags=["ytdlp", "utility", "proxy", "account", "management"],
doc_md="""
### YT-DLP Proxy and Account Manager DAG
This DAG provides tools to manage the state of **proxies and accounts** used by the `ytdlp-ops-server`.
**Parameters:**
- `host`, `port`: Connection details for the `ytdlp-ops-server` Thrift service.
- `entity`: The type of resource to manage (`proxy`, `account`, or `all`).
- `action`: The operation to perform.
- `list`: View statuses. For `entity: all`, lists both proxies and accounts.
- `ban`: Ban a specific proxy or account.
- `unban`: Un-ban a specific proxy or account.
- `reset_all`: Reset all proxies for a server (or all accounts) to `ACTIVE`.
- `remove_all`: **Deletes all account status keys** from Redis for a given prefix. This is a destructive action.
- `server_identity`: Required for most proxy actions.
- `proxy_url`: Required for banning/unbanning a specific proxy.
- `account_id`: Required for managing a specific account. For `action: reset_all` or `remove_all` on `entity: account`, this can be used as an optional prefix to filter which accounts to act on.
- `confirm_remove_all_accounts`: **Required for `remove_all` action.** Must be set to `True` to confirm deletion.
""",
params={
"host": Param(DEFAULT_YT_AUTH_SERVICE_IP, type="string", description="The hostname of the ytdlp-ops-server service. Default is from Airflow variable YT_AUTH_SERVICE_IP or hardcoded."),
"port": Param(DEFAULT_YT_AUTH_SERVICE_PORT, type="integer", description="The port of the ytdlp-ops-server service (Envoy load balancer). Default is from Airflow variable YT_AUTH_SERVICE_PORT or hardcoded."),
"entity": Param(
"all",
type="string",
enum=["proxy", "account", "all"],
description="The type of entity to manage. Use 'all' with action 'list' to see both.",
),
"action": Param(
"list",
type="string",
enum=["list", "ban", "unban", "reset_all", "remove_all"],
description="The management action to perform. `reset_all` for proxies/accounts. `remove_all` for accounts only.",
),
"server_identity": Param(
"ytdlp-ops-airflow-service",
type=["null", "string"],
description="The identity of the server instance (for proxy management).",
),
"proxy_url": Param(
None,
type=["null", "string"],
description="The proxy URL to act upon (e.g., 'socks5://host:port').",
),
"account_id": Param(
None,
type=["null", "string"],
description="The account ID to act upon. For `reset_all` or `remove_all` on accounts, this can be an optional prefix.",
),
"confirm_remove_all_accounts": Param(
False,
type="boolean",
title="[remove_all] Confirm Deletion",
description="Must be set to True to execute the 'remove_all' action for accounts. This is a destructive operation.",
),
"redis_conn_id": Param(
DEFAULT_REDIS_CONN_ID,
type="string",
title="Redis Connection ID",
description="The Airflow connection ID for the Redis server (used for 'remove_all').",
),
},
) as dag:
system_management_task = PythonOperator(
task_id="system_management_task",
python_callable=manage_system_callable,
)
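For quick checks outside Airflow, the same Thrift calls used above can be scripted directly; a sketch assuming the `pangramia` Thrift bindings are installed and using an example endpoint:

```python
# Standalone sketch mirroring the helpers above: connect to the Thrift
# service and print account statuses outside Airflow. Host/port are examples.
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

from pangramia.yt.tokens_ops import YTTokenOpService

host, port = "127.0.0.1", 9080  # example endpoint

transport = TTransport.TFramedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = YTTokenOpService.Client(protocol)
transport.open()
try:
    # Same call the mgmt DAG uses: no accountId/accountPrefix returns all accounts.
    for status in client.getAccountStatus(accountId=None, accountPrefix=None):
        print(status.accountId, status.status, status.successCount, status.failureCount)
finally:
    transport.close()
```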

View File

@@ -164,7 +164,7 @@ def clear_queue_callable(**context):
    redis_conn_id = params['redis_conn_id']
    queue_to_clear = params['queue_to_clear']
    dump_queues = params['dump_queues']
    # The value from templates_dict is already rendered by Airflow.
    dump_dir = context['templates_dict']['dump_dir']
    dump_patterns = params['dump_patterns'].split(',') if params.get('dump_patterns') else []
@@ -191,34 +191,43 @@ def clear_queue_callable(**context):
def list_contents_callable(**context):
    """Lists the contents of the specified Redis key(s) (list or hash)."""
    params = context['params']
    redis_conn_id = params['redis_conn_id']
    queues_to_list_str = params.get('queue_to_list')
    max_items = params.get('max_items', 10)
    if not queues_to_list_str:
        raise ValueError("Parameter 'queue_to_list' cannot be empty.")
    queues_to_list = [q.strip() for q in queues_to_list_str.split(',') if q.strip()]
    if not queues_to_list:
        logger.info("No valid queue names provided in 'queue_to_list'. Nothing to do.")
        return
    logger.info(f"Attempting to list contents for {len(queues_to_list)} Redis key(s): {queues_to_list}")
    redis_client = _get_redis_client(redis_conn_id)
    for queue_to_list in queues_to_list:
        # Add a newline for better separation in logs
        logger.info(f"\n--- Listing contents of Redis key '{queue_to_list}' (max: {max_items}) ---")
        try:
            key_type_bytes = redis_client.type(queue_to_list)
            key_type = key_type_bytes.decode('utf-8')  # Decode type
            if key_type == 'list':
                list_length = redis_client.llen(queue_to_list)
                items_to_fetch = min(max_items, list_length)
                contents_bytes = redis_client.lrange(queue_to_list, -items_to_fetch, -1)
                contents = [item.decode('utf-8') for item in contents_bytes]
                contents.reverse()
                logger.info(f"--- Contents of Redis List '{queue_to_list}' ---")
                logger.info(f"Total items in list: {list_length}")
                if contents:
                    logger.info(f"Showing most recent {len(contents)} item(s):")
                    for i, item in enumerate(contents):
                        logger.info(f"  [recent_{i}]: {item}")
                    if list_length > len(contents):
                        logger.info(f"  ... ({list_length - len(contents)} older items not shown)")
@@ -226,26 +235,25 @@ def list_contents_callable(**context):
            elif key_type == 'hash':
                hash_size = redis_client.hlen(queue_to_list)
                if hash_size > max_items * 2:
                    logger.warning(f"Hash '{queue_to_list}' has {hash_size} fields, which is large. Listing might be slow or incomplete. Consider using redis-cli HSCAN.")
                contents_bytes = redis_client.hgetall(queue_to_list)
                contents = {k.decode('utf-8'): v.decode('utf-8') for k, v in contents_bytes.items()}
                logger.info(f"--- Contents of Redis Hash '{queue_to_list}' ---")
                logger.info(f"Total fields in hash: {hash_size}")
                if contents:
                    logger.info(f"Showing up to {max_items} item(s):")
                    item_count = 0
                    for key, value in contents.items():
                        if item_count >= max_items:
                            logger.info(f"  ... (stopped listing after {max_items} items of {hash_size})")
                            break
                        try:
                            parsed_value = json.loads(value)
                            pretty_value = json.dumps(parsed_value, indent=2)
                            logger.info(f"  '{key}':\n{pretty_value}")
                        except json.JSONDecodeError:
                            logger.info(f"  '{key}': {value}")
                        item_count += 1
                logger.info(f"--- End of Hash Contents ---")
@@ -256,7 +264,7 @@ def list_contents_callable(**context):
        except Exception as e:
            logger.error(f"Failed to list contents of Redis key '{queue_to_list}': {e}", exc_info=True)
            # Continue to the next key in the list instead of failing the whole task


def check_status_callable(**context):
@@ -292,6 +300,63 @@ def check_status_callable(**context):
        raise AirflowException(f"Failed to check queue status: {e}")


def requeue_failed_callable(**context):
    """
    Copies all URLs from the fail hash to the inbox list and optionally clears the fail hash.
    """
    params = context['params']
    redis_conn_id = params['redis_conn_id']
    queue_name = params['queue_name_for_requeue']
    clear_fail_queue = params['clear_fail_queue_after_requeue']
    fail_queue_name = f"{queue_name}_fail"
    inbox_queue_name = f"{queue_name}_inbox"
    logger.info(f"Requeuing failed URLs from '{fail_queue_name}' to '{inbox_queue_name}'.")
    print(f"Requeuing failed URLs from '{fail_queue_name}' to '{inbox_queue_name}'.")
    redis_client = _get_redis_client(redis_conn_id)
    try:
        # The fail queue is a hash. The keys are the URLs.
        failed_urls_bytes = redis_client.hkeys(fail_queue_name)
        if not failed_urls_bytes:
            logger.info(f"Fail queue '{fail_queue_name}' is empty. Nothing to requeue.")
            print(f"Fail queue '{fail_queue_name}' is empty. Nothing to requeue.")
            return
        failed_urls = [url.decode('utf-8') for url in failed_urls_bytes]
        logger.info(f"Found {len(failed_urls)} URLs to requeue.")
        print(f"Found {len(failed_urls)} URLs to requeue:")
        for url in failed_urls:
            print(f"  - {url}")
        # Add URLs to the inbox list
        if failed_urls:
            with redis_client.pipeline() as pipe:
                pipe.rpush(inbox_queue_name, *failed_urls)
                if clear_fail_queue:
                    pipe.delete(fail_queue_name)
                pipe.execute()
        final_list_length = redis_client.llen(inbox_queue_name)
        success_message = (
            f"Successfully requeued {len(failed_urls)} URLs to '{inbox_queue_name}'. "
            f"The list now contains {final_list_length} items."
        )
        logger.info(success_message)
        print(f"\n{success_message}")
        if clear_fail_queue:
            logger.info(f"Successfully cleared fail queue '{fail_queue_name}'.")
        else:
            logger.info(f"Fail queue '{fail_queue_name}' was not cleared as per configuration.")
    except Exception as e:
        logger.error(f"Failed to requeue failed URLs: {e}", exc_info=True)
        raise AirflowException(f"Failed to requeue failed URLs: {e}")


def add_videos_to_queue_callable(**context):
    """
    Parses video inputs, normalizes them to URLs, and adds them to a Redis queue.
@@ -381,13 +446,14 @@ with DAG(
    - `add_videos`: Add one or more YouTube videos to a queue.
    - `clear_queue`: Dump and/or delete a specific Redis key.
    - `list_contents`: View the contents of a Redis key (list or hash).
    - `check_status`: Check the overall status of the queues.
    - `requeue_failed`: Copy all URLs from the `_fail` hash to the `_inbox` list and clear the `_fail` hash.
    """,
    params={
        "action": Param(
            "add_videos",
            type="string",
            enum=["add_videos", "clear_queue", "list_contents", "check_status", "requeue_failed"],
            title="Action",
            description="The management action to perform.",
        ),
@@ -437,10 +503,10 @@ with DAG(
        ),
        # --- Params for 'list_contents' ---
        "queue_to_list": Param(
            'video_queue_inbox,video_queue_fail',
            type="string",
            title="[list_contents] Queues to List",
            description="Comma-separated list of exact Redis key names to list.",
        ),
        "max_items": Param(
            10,
@@ -455,6 +521,19 @@ with DAG(
            title="[check_status] Base Queue Name",
            description="Base name of the queues to check (e.g., 'video_queue').",
        ),
        # --- Params for 'requeue_failed' ---
        "queue_name_for_requeue": Param(
            DEFAULT_QUEUE_NAME,
            type="string",
            title="[requeue_failed] Base Queue Name",
            description="Base name of the queues to requeue from (e.g., 'video_queue' will use 'video_queue_fail').",
        ),
        "clear_fail_queue_after_requeue": Param(
            True,
            type="boolean",
            title="[requeue_failed] Clear Fail Queue",
            description="If True, deletes the `_fail` hash after requeueing items.",
        ),
        # --- Common Params ---
        "redis_conn_id": Param(
            DEFAULT_REDIS_CONN_ID,
@@ -489,5 +568,16 @@ with DAG(
        python_callable=check_status_callable,
    )

    action_requeue_failed = PythonOperator(
        task_id="action_requeue_failed",
        python_callable=requeue_failed_callable,
    )

    # --- Wire up tasks ---
    branch_on_action >> [
        action_add_videos,
        action_clear_queue,
        action_list_contents,
        action_check_status,
        action_requeue_failed,
    ]
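To seed the pipeline, URLs just need to land in the `<queue_name>_inbox` list that the workers pop from; a minimal sketch with redis-py using an example connection (the DAG's `add_videos` action performs the same `rpush` after normalizing video IDs to URLs):

```python
# Minimal sketch: push URLs straight into the inbox list the workers consume.
# Connection details are examples only.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)  # example connection
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
]
r.rpush("video_queue_inbox", *urls)
print("queue length:", r.llen("video_queue_inbox"))
```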

View File

@@ -0,0 +1,194 @@
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2024 rl <rl@rlmbp>
#
# Distributed under terms of the MIT license.
"""
DAG to orchestrate ytdlp_ops_worker_per_url DAG runs based on a defined policy.
It fetches URLs from a Redis queue and launches workers in controlled bunches.
"""
from airflow import DAG
from airflow.exceptions import AirflowException, AirflowSkipException
from airflow.operators.python import PythonOperator
from airflow.models.param import Param
from airflow.models.variable import Variable
from airflow.utils.dates import days_ago
from airflow.api.common.trigger_dag import trigger_dag
from airflow.models.dagrun import DagRun
from airflow.models.dag import DagModel
from datetime import timedelta
import logging
import random
import time
# Import utility functions
from utils.redis_utils import _get_redis_client
# Import Thrift modules for proxy status check
from pangramia.yt.tokens_ops import YTTokenOpService
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_QUEUE_NAME = 'video_queue'
DEFAULT_REDIS_CONN_ID = 'redis_default'
DEFAULT_TOTAL_WORKERS = 3
DEFAULT_WORKERS_PER_BUNCH = 1
DEFAULT_WORKER_DELAY_S = 5
DEFAULT_BUNCH_DELAY_S = 20
DEFAULT_YT_AUTH_SERVICE_IP = Variable.get("YT_AUTH_SERVICE_IP", default_var="16.162.82.212")
DEFAULT_YT_AUTH_SERVICE_PORT = Variable.get("YT_AUTH_SERVICE_PORT", default_var=9080)
# --- Helper Functions ---
# --- Main Orchestration Callable ---
def orchestrate_workers_ignition_callable(**context):
"""
Main orchestration logic. Triggers a specified number of worker DAGs
to initiate self-sustaining processing loops.
"""
params = context['params']
logger.info("Starting worker ignition sequence.")
worker_dag_id = 'ytdlp_ops_worker_per_url'
dag_model = DagModel.get_dagmodel(worker_dag_id)
if dag_model and dag_model.is_paused:
raise AirflowException(f"Worker DAG '{worker_dag_id}' is paused. Cannot start worker loops.")
total_workers = int(params['total_workers'])
workers_per_bunch = int(params['workers_per_bunch'])
worker_delay = int(params['delay_between_workers_s'])
bunch_delay = int(params['delay_between_bunches_s'])
# Create a list of worker numbers to trigger
worker_indices = list(range(total_workers))
bunches = [worker_indices[i:i + workers_per_bunch] for i in range(0, len(worker_indices), workers_per_bunch)]
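# e.g. total_workers=5, workers_per_bunch=2 -> bunches = [[0, 1], [2, 3], [4]]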
logger.info(f"Plan: Starting {total_workers} total workers in {len(bunches)} bunches.")
dag_run_id = context['dag_run'].run_id
total_triggered = 0
# Pass all orchestrator params to the worker so it has the full context for its loop.
conf_to_pass = {p: params[p] for p in params}
# The worker pulls its own URL, so we don't pass one.
if 'url' in conf_to_pass:
del conf_to_pass['url']
for i, bunch in enumerate(bunches):
logger.info(f"--- Igniting Bunch {i+1}/{len(bunches)} (contains {len(bunch)} worker(s)) ---")
for j, _ in enumerate(bunch):
# Create a unique run_id for each worker loop starter
run_id = f"ignited_{dag_run_id}_{total_triggered}"
logger.info(f"Igniting worker {j+1}/{len(bunch)} in bunch {i+1} (loop {total_triggered + 1}/{total_workers}) (Run ID: {run_id})")
logger.debug(f"Full conf for worker loop {run_id}: {conf_to_pass}")
trigger_dag(
dag_id=worker_dag_id,
run_id=run_id,
conf=conf_to_pass,
replace_microseconds=False
)
total_triggered += 1
# Delay between workers in a bunch
if j < len(bunch) - 1:
logger.info(f"Waiting {worker_delay}s before next worker in bunch...")
time.sleep(worker_delay)
# Delay between bunches
if i < len(bunches) - 1:
logger.info(f"--- Bunch {i+1} ignited. Waiting {bunch_delay}s before next bunch... ---")
time.sleep(bunch_delay)
logger.info(f"--- Ignition sequence complete. Total worker loops started: {total_triggered}. ---")
# =============================================================================
# DAG Definition
# =============================================================================
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=1),
'start_date': days_ago(1),
}
with DAG(
dag_id='ytdlp_ops_orchestrator',
default_args=default_args,
schedule_interval=None, # This DAG runs only when triggered.
max_active_runs=1, # Only one ignition process should run at a time.
catchup=False,
description='Ignition system for ytdlp_ops_worker_per_url DAGs. Starts self-sustaining worker loops.',
doc_md="""
### YT-DLP Worker Ignition System
This DAG acts as an "ignition system" to start one or more self-sustaining worker loops.
It does **not** process URLs itself. Its only job is to trigger a specified number of `ytdlp_ops_worker_per_url` DAGs.
#### How it Works:
1. **Manual Trigger:** You manually trigger this DAG with parameters defining how many worker loops to start (`total_workers`), in what configuration (`workers_per_bunch`, delays).
2. **Ignition:** The orchestrator triggers the initial set of worker DAGs in a "fire-and-forget" manner, passing all its configuration parameters to them.
3. **Completion:** Once all initial workers have been triggered, the orchestrator's job is complete.
The workers then take over, each running its own continuous processing loop.
""",
tags=['ytdlp', 'orchestrator', 'ignition'],
params={
# --- Ignition Control Parameters ---
'total_workers': Param(DEFAULT_TOTAL_WORKERS, type="integer", description="Total number of worker loops to start."),
'workers_per_bunch': Param(DEFAULT_WORKERS_PER_BUNCH, type="integer", description="Number of workers to start in each bunch."),
'delay_between_workers_s': Param(DEFAULT_WORKER_DELAY_S, type="integer", description="Delay in seconds between starting each worker within a bunch."),
'delay_between_bunches_s': Param(DEFAULT_BUNCH_DELAY_S, type="integer", description="Delay in seconds between starting each bunch."),
# --- Worker Passthrough Parameters ---
'on_bannable_failure': Param(
'retry_with_new_account',
type="string",
enum=['stop_loop', 'retry_with_new_account'],
title="[Worker Param] On Bannable Failure Policy",
description="Policy for a worker when a bannable error occurs. "
"'stop_loop': Ban the account, mark URL as failed, and stop the worker's loop. "
"'retry_with_new_account': Ban the failed account, retry ONCE with a new account. If retry fails, ban the second account and proxy, then stop."
),
'queue_name': Param(DEFAULT_QUEUE_NAME, type="string", description="[Worker Param] Base name for Redis queues."),
'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="[Worker Param] Airflow Redis connection ID."),
'clients': Param('mweb,ios,android', type="string", description="[Worker Param] Comma-separated list of clients for token generation."),
'account_pool': Param('ytdlp_account', type="string", description="[Worker Param] Account pool prefix or comma-separated list."),
'account_pool_size': Param(10, type=["integer", "null"], description="[Worker Param] If using a prefix for 'account_pool', this specifies the number of accounts to generate (e.g., 10 for 'prefix_01' through 'prefix_10'). Required when using a prefix."),
'service_ip': Param(DEFAULT_YT_AUTH_SERVICE_IP, type="string", description="[Worker Param] IP of the ytdlp-ops-server. Default is from Airflow variable YT_AUTH_SERVICE_IP or hardcoded."),
'service_port': Param(DEFAULT_YT_AUTH_SERVICE_PORT, type="integer", description="[Worker Param] Port of the Envoy load balancer. Default is from Airflow variable YT_AUTH_SERVICE_PORT or hardcoded."),
'machine_id': Param("ytdlp-ops-airflow-service", type="string", description="[Worker Param] Identifier for the client machine."),
'auto_create_new_accounts_on_exhaustion': Param(True, type="boolean", description="[Worker Param] If True and all accounts in a prefix-based pool are exhausted, create a new one automatically."),
'retrigger_delay_on_empty_s': Param(60, type="integer", description="[Worker Param] Delay in seconds before a worker re-triggers itself if the queue is empty. Set to -1 to stop the loop."),
}
) as dag:
orchestrate_task = PythonOperator(
task_id='start_worker_loops',
python_callable=orchestrate_workers_ignition_callable,
)
orchestrate_task.doc_md = """
### Start Worker Loops
This is the main task that executes the ignition policy.
- It triggers `ytdlp_ops_worker_per_url` DAGs according to the batch settings.
- It passes all its parameters down to the workers, which will use them to run their continuous loops.
"""

View File

@@ -1,215 +0,0 @@
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2024 rl <rl@rlmbp>
#
# Distributed under terms of the MIT license.
"""
DAG to sense a Redis queue for new URLs and trigger the ytdlp_worker_per_url DAG.
This is the "Sensor" part of a Sensor/Worker pattern.
"""
from airflow import DAG
from airflow.exceptions import AirflowException, AirflowSkipException
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.providers.redis.hooks.redis import RedisHook
from airflow.models.param import Param
from airflow.utils.dates import days_ago
from datetime import timedelta
import logging
import redis
# Import utility functions
from utils.redis_utils import _get_redis_client
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_QUEUE_NAME = 'video_queue'
DEFAULT_REDIS_CONN_ID = 'redis_default'
DEFAULT_TIMEOUT = 30
DEFAULT_MAX_URLS = '1' # Default number of URLs to process per run
# --- Task Callables ---
def select_account_callable(**context):
"""
Placeholder task for future logic to dynamically select an account.
For now, it just passes through the account_id from the DAG params.
"""
params = context['params']
account_id = params.get('account_id', 'default_account')
logger.info(f"Selected account for this run: {account_id}")
# This task could push the selected account_id to XComs in the future.
# For now, the next task will just read it from params.
return account_id
def log_trigger_info_callable(**context):
"""Logs information about how the DAG run was triggered."""
dag_run = context['dag_run']
trigger_type = dag_run.run_type
logger.info(f"Sensor DAG triggered. Run ID: {dag_run.run_id}, Type: {trigger_type}")
if trigger_type == 'manual':
logger.info("Trigger source: Manual execution from Airflow UI or CLI.")
elif trigger_type == 'scheduled':
logger.info("Trigger source: Scheduled run (periodic check).")
elif trigger_type == 'dag_run':
# In Airflow 2.2+ we can get the triggering run object
try:
triggering_dag_run = dag_run.get_triggering_dagrun()
if triggering_dag_run:
triggering_dag_id = triggering_dag_run.dag_id
triggering_run_id = triggering_dag_run.run_id
logger.info(f"Trigger source: DAG Run from '{triggering_dag_id}' (Run ID: {triggering_run_id}).")
# Check if it's a worker by looking at the conf keys
conf = dag_run.conf or {}
if all(k in conf for k in ['queue_name', 'redis_conn_id', 'max_urls_per_run']):
logger.info("This appears to be a standard trigger from a worker DAG continuing the loop.")
else:
logger.warning(f"Triggered by another DAG but conf does not match worker pattern. Conf: {conf}")
else:
logger.warning("Trigger type is 'dag_run' but could not retrieve triggering DAG run details.")
except Exception as e:
logger.error(f"Could not get triggering DAG run details: {e}")
else:
logger.info(f"Trigger source: {trigger_type}")
def check_queue_for_urls_batch(**context):
    """
    Pops a batch of URLs from the inbox queue.
    Returns a list of configuration dictionaries for the TriggerDagRunOperator.
    If the queue is empty, it raises AirflowSkipException.
    """
    params = context['params']
    queue_name = params['queue_name']
    inbox_queue = f"{queue_name}_inbox"
    redis_conn_id = params.get('redis_conn_id', DEFAULT_REDIS_CONN_ID)
    max_urls_raw = params.get('max_urls_per_run', DEFAULT_MAX_URLS)
    try:
        max_urls = int(max_urls_raw)
    except (ValueError, TypeError):
        logger.warning(f"Invalid value for max_urls_per_run: '{max_urls_raw}'. Using default: {DEFAULT_MAX_URLS}")
        max_urls = int(DEFAULT_MAX_URLS)

    urls_to_process = []
    try:
        client = _get_redis_client(redis_conn_id)
        current_queue_size = client.llen(inbox_queue)
        logger.info(f"Queue '{inbox_queue}' has {current_queue_size} URLs. Attempting to pop up to {max_urls}.")
        for _ in range(max_urls):
            url_bytes = client.lpop(inbox_queue)
            if url_bytes:
                url = url_bytes.decode('utf-8') if isinstance(url_bytes, bytes) else url_bytes
                logger.info(f" - Popped URL: {url}")
                urls_to_process.append(url)
            else:
                # Queue is empty, stop trying to pop
                break
        if urls_to_process:
            logger.info(f"Found {len(urls_to_process)} URLs in queue. Generating trigger configurations.")
            # Create a list of 'conf' objects for the trigger operator to expand
            trigger_configs = []
            for url in urls_to_process:
                # The worker DAG will use its own default params for its operations.
                # We only need to provide the URL for processing, and the sensor's own
                # params so the worker can trigger the sensor again to continue the loop.
                worker_conf = {
                    'url': url,
                    'queue_name': queue_name,
                    'redis_conn_id': redis_conn_id,
                    'max_urls_per_run': int(max_urls),
                    'stop_on_failure': params.get('stop_on_failure', True),
                    'account_id': params.get('account_id', 'default_account')
                }
                trigger_configs.append(worker_conf)
            return trigger_configs
        else:
            logger.info(f"Queue '{inbox_queue}' is empty. Skipping trigger.")
            raise AirflowSkipException(f"Redis queue '{inbox_queue}' is empty.")
    except AirflowSkipException:
        raise
    except Exception as e:
        logger.error(f"Error popping URLs from Redis queue '{inbox_queue}': {e}", exc_info=True)
        raise AirflowException(f"Failed to pop URLs from Redis: {e}")


# =============================================================================
# DAG Definition
# =============================================================================
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,  # The sensor itself should not retry on failure; the loop triggers it again.
    'start_date': days_ago(1),
}

with DAG(
    dag_id='ytdlp_ops_sensor_queue',
    default_args=default_args,
    schedule_interval=None,  # Runs only on trigger, not on a schedule.
    max_active_runs=1,  # Prevent multiple sensors from running at once
    catchup=False,
    description='Polls Redis queue on trigger for URLs and starts worker DAGs.',
    tags=['ytdlp', 'sensor', 'queue', 'redis', 'batch'],
    params={
        'queue_name': Param(DEFAULT_QUEUE_NAME, type="string", description="Base name for Redis queues."),
        'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="Airflow Redis connection ID."),
        'max_urls_per_run': Param(DEFAULT_MAX_URLS, type="string", description="Maximum number of URLs to process in one batch."),
        'stop_on_failure': Param(True, type="boolean", description="If True, a worker failure will stop the entire processing loop."),
        'account_id': Param('default_account', type="string", description="The account ID to use for processing the batch."),
    }
) as dag:
    log_trigger_info_task = PythonOperator(
        task_id='log_trigger_info',
        python_callable=log_trigger_info_callable,
    )
    log_trigger_info_task.doc_md = """
### Log Trigger Information
Logs details about how this DAG run was initiated (e.g., manually or by a worker DAG).
This provides visibility into the processing loop.
"""

    poll_redis_task = PythonOperator(
        task_id='check_queue_for_urls_batch',
        python_callable=check_queue_for_urls_batch,
    )
    poll_redis_task.doc_md = """
### Poll Redis Queue for Batch
Checks the Redis inbox queue for a batch of new URLs (up to `max_urls_per_run`).
- **On Success (URLs found):** Returns a list of configuration objects for the trigger task.
- **On Skip (Queue empty):** Skips this task and the trigger task. The DAG run succeeds.
"""

    # This operator will be dynamically expanded based on the output of poll_redis_task
    trigger_worker_dags = TriggerDagRunOperator.partial(
        task_id='trigger_worker_dags',
        trigger_dag_id='ytdlp_ops_worker_per_url',
        wait_for_completion=False,  # Fire and forget
        doc_md="""
### Trigger Worker DAGs (Dynamically Mapped)
Triggers one `ytdlp_ops_worker_per_url` DAG run for each URL found by the polling task.
Each triggered DAG receives its own specific configuration (including the URL).
This task is skipped if the polling task finds no URLs.
""",
    ).expand(
        conf=poll_redis_task.output
    )

    select_account_task = PythonOperator(
        task_id='select_account',
        python_callable=select_account_callable,
    )
    select_account_task.doc_md = "### Select Account\n(Placeholder for future dynamic account selection logic)"

    log_trigger_info_task >> select_account_task >> poll_redis_task >> trigger_worker_dags
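
The sensor above drains the `<queue_name>_inbox` Redis list with `LPOP`. A minimal producer sketch, assuming direct access to the same Redis instance and the default `video_queue` name (host, port, and password are placeholders for your own connection settings):

```python
# Minimal producer sketch: push URLs into the inbox list that the sensor
# drains with LPOP. Host/port/password are placeholders for your Redis setup.
import redis

client = redis.Redis(host='localhost', port=6379, password=None)

urls = [
    'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
]
# RPUSH at the tail + LPOP at the head gives FIFO ordering.
client.rpush('video_queue_inbox', *urls)
print(f"video_queue_inbox length: {client.llen('video_queue_inbox')}")
```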

File diff suppressed because it is too large.


@@ -1,4 +1,42 @@
 services:
+  config-generator:
+    image: python:3.9-slim
+    container_name: ytdlp-ops-config-generator
+    working_dir: /app
+    volumes:
+      # Mount the current directory to access the template, .env, and script
+      - .:/app
+    env_file:
+      - ./.env
+    environment:
+      ENVOY_CLUSTER_TYPE: STRICT_DNS
+      # Pass worker count and base port to ensure Envoy config matches the workers
+      YTDLP_WORKERS: ${YTDLP_WORKERS:-3}
+      YTDLP_BASE_PORT: ${YTDLP_BASE_PORT:-9090}
+    # This command cleans up old runs, installs jinja2, and generates the config.
+    command: >
+      sh -c "rm -rf ./envoy.yaml &&
+             pip install --no-cache-dir -q jinja2 &&
+             python3 ./generate_envoy_config.py"
+
+  envoy:
+    image: envoyproxy/envoy:v1.29-latest
+    container_name: envoy-thrift-lb
+    restart: unless-stopped
+    volumes:
+      # Mount the generated config file from the host
+      - ./envoy.yaml:/etc/envoy/envoy.yaml:ro
+    ports:
+      # This is the single public port for all Thrift traffic
+      - "${ENVOY_PORT:-9080}:${ENVOY_PORT:-9080}"
+    networks:
+      - airflow_prod_proxynet
+    depends_on:
+      config-generator:
+        condition: service_completed_successfully
+      ytdlp-ops:
+        condition: service_started
+
   camoufox:
     build:
       context: ./camoufox # Path relative to the docker-compose file
@@ -15,9 +53,8 @@ services:
         "--ws-host", "0.0.0.0",
         "--port", "12345",
         "--ws-path", "mypath",
-        "--proxy-url", "socks5://sslocal-rust-1084:1084",
+        "--proxy-url", "socks5://${SOCKS5_SOCK_SERVER_IP:-89.253.221.173}:1084",
         "--locale", "en-US",
-        "--geoip",
         "--extensions", "/app/extensions/google_sign_in_popup_blocker-1.0.2.xpi,/app/extensions/spoof_timezone-0.3.4.xpi,/app/extensions/youtube_ad_auto_skipper-0.6.0.xpi"
       ]
     restart: unless-stopped
@@ -25,25 +62,36 @@ services:
   ytdlp-ops:
     image: pangramia/ytdlp-ops-server:latest # Don't comment out or remove, build is performed externally
+    container_name: ytdlp-ops-workers # Renamed for clarity
     depends_on:
       - camoufox # Ensure camoufox starts first
-    ports:
-      - "9090:9090" # Main RPC port
-      - "9091:9091" # Health check port
+    # Ports are no longer exposed directly. Envoy will connect to them on the internal network.
+    env_file:
+      - ./.env # Path is relative to the compose file
     volumes:
      - context-data:/app/context-data
+      # Mount the plugin source code for live updates without rebuilding the image.
+      # Assumes the plugin source is in a 'bgutil-ytdlp-pot-provider' directory
+      # next to your docker-compose.yaml file.
+      #- ./bgutil-ytdlp-pot-provider:/app/bgutil-ytdlp-pot-provider
     networks:
       - airflow_prod_proxynet
     command:
+      - "--script-dir"
+      - "/app"
       - "--context-dir"
       - "/app/context-data"
+      # Use environment variables for port and worker count
       - "--port"
-      - "9090"
+      - "${YTDLP_BASE_PORT:-9090}"
+      - "--workers"
+      - "${YTDLP_WORKERS:-3}"
       - "--clients"
+      # Add 'web' client since we now have camoufox, test firstly
      - "web,ios,android,mweb"
       - "--proxies"
-      - "socks5://sslocal-rust-1081:1081,socks5://sslocal-rust-1082:1082,socks5://sslocal-rust-1083:1083,socks5://sslocal-rust-1084:1084,socks5://sslocal-rust-1085:1085"
+      #- "socks5://sslocal-rust-1081:1081,socks5://sslocal-rust-1082:1082,socks5://sslocal-rust-1083:1083,socks5://sslocal-rust-1084:1084,socks5://sslocal-rust-1085:1085"
+      - "socks5://${SOCKS5_SOCK_SERVER_IP:-89.253.221.173}:1084"
+      #
       # Add the endpoint argument pointing to the camoufox service
       - "--endpoint"
       - "ws://camoufox:12345/mypath"
@@ -61,6 +109,13 @@ services:
       - "${REDIS_PORT:-6379}"
       - "--redis-password"
       - "${REDIS_PASSWORD}"
+      # Add account cooldown parameters (values are in minutes)
+      - "--account-active-duration-min"
+      - "${ACCOUNT_ACTIVE_DURATION_MIN:-30}"
+      - "--account-cooldown-duration-min"
+      - "${ACCOUNT_COOLDOWN_DURATION_MIN:-60}"
+      # Add flag to clean context directory on start
+      - "--clean-context-dir"
     restart: unless-stopped
     pull_policy: always
 
@@ -69,5 +124,4 @@
     name: context-data
 
 networks:
-  airflow_prod_proxynet:
-    external: true
+  airflow_prod_proxynet: {}
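
The compose changes above move all tuning into `./.env` (`YTDLP_WORKERS`, `YTDLP_BASE_PORT`, `ENVOY_PORT`, `SOCKS5_SOCK_SERVER_IP`, the Redis settings, and the account cooldown durations). The sketch below only illustrates the relationship implied by the config-generator comment: one Thrift worker per consecutive port starting at the base port, all behind the single Envoy listener. The authoritative layout comes from `generate_envoy_config.py`, so treat the port arithmetic as an assumption.

```python
# Assumption for illustration: workers listen on consecutive ports starting at
# YTDLP_BASE_PORT, matching the "Envoy config matches the workers" comment.
# generate_envoy_config.py is the source of truth for the real layout.
import os

workers = int(os.getenv('YTDLP_WORKERS', '3'))
base_port = int(os.getenv('YTDLP_BASE_PORT', '9090'))
envoy_port = int(os.getenv('ENVOY_PORT', '9080'))

upstreams = [('ytdlp-ops', base_port + i) for i in range(workers)]
print(f"Envoy listener :{envoy_port} -> upstream Thrift workers: {upstreams}")
```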

requirements.txt (new file)

@@ -0,0 +1,9 @@
thrift>=0.16.0,<=0.20.0
backoff>=2.2.1
python-dotenv==1.0.1
psutil>=5.9.0
docker>=6.0.0
apache-airflow-providers-docker
redis
ffprobe3
ffmpeg-python