# YTDLP Airflow DAGs

This document describes the Airflow DAGs used for interacting with the YTDLP Ops service and managing processing queues.

## DAG Descriptions

### `ytdlp_client_dag_v2.1`

* **File:** `airflow/dags/ytdlp_client_dag_v2.1.py`
* **Purpose:** Provides a way to test the YTDLP Ops Thrift service interaction for a *single* video URL. Useful for debugging connection issues, testing specific account IDs, or verifying the service response for a particular URL independently of the queueing system.
* **Parameters (Defaults):**
    * `url` (`'https://www.youtube.com/watch?v=sOlTX9uxUtM'`): The video URL to process.
    * `redis_enabled` (`False`): Use Redis for service discovery?
    * `service_ip` (`'85.192.30.55'`): Service IP if `redis_enabled=False`.
    * `service_port` (`9090`): Service port if `redis_enabled=False`.
    * `account_id` (`'account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Account ID for lookup or call.
    * `timeout` (`30`): Timeout in seconds for Thrift connection.
    * `info_json_dir` (`"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloadfiles') }}"`): Directory to save `info.json`.
* **Results:**
    * Connects to the YTDLP Ops service using the specified method (Redis or direct IP).
    * Retrieves token data for the given URL and account ID.
    * Saves the video's `info.json` metadata to the specified directory.
    * Extracts the SOCKS proxy (if available).
    * Pushes `info_json_path`, `socks_proxy`, and the original `ytdlp_command` (with tokens) to XCom.
    * Optionally stores detailed results in a Redis hash (`token_info:`).

### `ytdlp_mgmt_queue_add_urls`

* **File:** `airflow/dags/ytdlp_mgmt_queue_add_urls.py`
* **Purpose:** Manually add video URLs to a specific YTDLP inbox queue (Redis List).
* **Parameters (Defaults):**
    * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
    * `queue_name` (`'video_queue_inbox_account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Target Redis list (inbox queue).
    * `urls` (`""`): Multiline string of video URLs to add.
* **Results:**
    * Parses the multiline `urls` parameter.
    * Adds each valid URL to the end of the Redis list specified by `queue_name`.
    * Logs the number of URLs added.

### `ytdlp_mgmt_queue_clear`

* **File:** `airflow/dags/ytdlp_mgmt_queue_clear.py`
* **Purpose:** Manually delete a specific Redis key used by the YTDLP queues.
* **Parameters (Defaults):**
    * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
    * `queue_to_clear` (`'PLEASE_SPECIFY_QUEUE_TO_CLEAR'`): Exact name of the Redis key to clear. **Must be changed by user.**
* **Results:**
    * Deletes the Redis key specified by the `queue_to_clear` parameter.
    * **Warning:** This operation is destructive and irreversible. Use with extreme caution. Ensure you specify the correct key name (e.g., `video_queue_inbox_account_xyz`, `video_queue_progress`, `video_queue_result`, `video_queue_fail`).

### `ytdlp_mgmt_queue_check_status`

* **File:** `airflow/dags/ytdlp_mgmt_queue_check_status.py`
* **Purpose:** Manually check the type and size of a specific YTDLP Redis queue/key.
* **Parameters (Defaults):**
    * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
    * `queue_to_check` (`'video_queue_inbox_account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Exact name of the Redis key to check.
* **Results:**
    * Connects to Redis and determines the type of the key specified by `queue_to_check`.
    * Determines the size (length for lists, number of fields for hashes).
    * Logs the key type and size.
    * Pushes `queue_key_type` and `queue_size` to XCom.

### `ytdlp_mgmt_queue_list_contents`

* **File:** `airflow/dags/ytdlp_mgmt_queue_list_contents.py`
* **Purpose:** Manually list the contents of a specific YTDLP Redis queue/key (list or hash). Useful for inspecting queue state or results.
* **Parameters (Defaults):**
    * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
    * `queue_to_list` (`'video_queue_inbox_account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Exact name of the Redis key to list.
    * `max_items` (`100`): Maximum number of items/fields to list.
* **Results:**
    * Connects to Redis and identifies the type of the key specified by `queue_to_list`.
    * If it's a List, logs the first `max_items` elements.
    * If it's a Hash, logs up to `max_items` key-value pairs, attempting to pretty-print JSON values.
    * Logs warnings for very large hashes.

### `ytdlp_proc_sequential_processor`

* **File:** `airflow/dags/ytdlp_proc_sequential_processor.py`
* **Purpose:** Processes YouTube URLs sequentially from a Redis queue. Designed for batch processing. Pops a URL, gets token/metadata via YTDLP Ops service, downloads the media using `yt-dlp`, and records the result.
* **Parameters (Defaults):**
    * `queue_name` (`'video_queue'`): Base name for Redis queues (e.g., `video_queue_inbox`, `video_queue_progress`).
    * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
    * `redis_enabled` (`False`): Use Redis for service discovery? If False, uses `service_ip`/`port`.
    * `service_ip` (`None`): Required Service IP if `redis_enabled=False`.
    * `service_port` (`None`): Required Service port if `redis_enabled=False`.
    * `account_id` (`'default_account'`): Account ID for the API call (used for Redis lookup if `redis_enabled=True`).
    * `timeout` (`30`): Timeout in seconds for the Thrift connection.
    * `download_format` (`'ba[ext=m4a]/bestaudio/best'`): yt-dlp format selection string.
    * `output_path_template` (`"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloads') }}/%(title)s [%(id)s].%(ext)s"`): yt-dlp output template. Uses Airflow Variable `DOWNLOADS_TEMP`.
    * `info_json_dir` (`"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloadfiles') }}"`): Directory to save `info.json`. Uses Airflow Variable `DOWNLOADS_TEMP`.
* **Results:**
    * Pops one URL from the `{{ params.queue_name }}_inbox` Redis list.
    * If a URL is popped, it's added to the `{{ params.queue_name }}_progress` Redis hash.
    * The `YtdlpOpsOperator` (`get_token` task) attempts to get token data (including `info.json`, proxy, command) for the URL using the specified connection method and account ID.
    * If token retrieval succeeds, the `download_video` task executes `yt-dlp` using the retrieved `info.json`, proxy, the `download_format` parameter, and the `output_path_template` parameter to download the actual media.
    * **On Successful Download:** The URL is removed from the progress hash and added to the `{{ params.queue_name }}_result` hash along with results (`info_json_path`, `socks_proxy`, `ytdlp_command`).
    * **On Failure (Token Retrieval or Download):** The URL is removed from the progress hash and added to the `{{ params.queue_name }}_fail` hash along with error details (message, traceback).
    * If the inbox queue is empty, the DAG run skips processing via `AirflowSkipException`.
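
The queue lifecycle described for `ytdlp_proc_sequential_processor` (inbox list, then progress hash, then result or fail hash) can be sketched roughly as follows. This is a minimal illustration, not the DAG's actual code: `FakeRedis` is a hypothetical in-memory stand-in that mimics a few redis-py style methods so the example runs without a server, `process_one` and `handler` are illustrative names, and both the use of a left pop (FIFO order) and the JSON payload stored in the progress hash are assumptions.

```python
import json
import traceback


class FakeRedis:
    """Hypothetical in-memory stand-in for the few Redis commands used below."""

    def __init__(self):
        self.lists = {}
        self.hashes = {}

    def rpush(self, key, *values):
        self.lists.setdefault(key, []).extend(values)
        return len(self.lists[key])

    def lpop(self, key):
        items = self.lists.get(key)
        return items.pop(0) if items else None

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def hdel(self, key, field):
        self.hashes.get(key, {}).pop(field, None)

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))


def process_one(client, queue_name, handler):
    """Move one URL through inbox -> progress -> result or fail."""
    # Assumption: the inbox is consumed from the head (FIFO).
    url = client.lpop(f"{queue_name}_inbox")
    if url is None:
        # The DAG raises AirflowSkipException at this point instead.
        return "skipped"

    # Assumption: the progress hash stores a small JSON status per URL.
    client.hset(f"{queue_name}_progress", url, json.dumps({"status": "processing"}))
    try:
        # handler stands in for the get_token and download_video tasks.
        result = handler(url)
    except Exception as exc:
        client.hdel(f"{queue_name}_progress", url)
        details = {"message": str(exc), "traceback": traceback.format_exc()}
        client.hset(f"{queue_name}_fail", url, json.dumps(details))
        return "failed"

    client.hdel(f"{queue_name}_progress", url)
    client.hset(f"{queue_name}_result", url, json.dumps(result))
    return "succeeded"


if __name__ == "__main__":
    client = FakeRedis()
    client.rpush("video_queue_inbox", "https://example.com/watch?v=demo")

    def fake_handler(url):
        # Placeholder values; the real tasks produce these fields.
        return {"info_json_path": None, "socks_proxy": None, "ytdlp_command": None}

    print(process_one(client, "video_queue", fake_handler))
    print(process_one(client, "video_queue", fake_handler))
```

Because `process_one` only depends on `lpop`, `hset`, and `hdel`, a real Redis client exposing the same methods could be passed in place of `FakeRedis`; keeping the URL in the progress hash until the outcome is recorded means an interrupted run leaves a visible trace rather than silently losing the URL.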