Added DAGs for queue management and Camoufox support

This commit is contained in:
aperez 2025-04-13 18:14:42 +03:00
parent 1f092d6f80
commit 6989d49da3
18 changed files with 2024 additions and 3041 deletions

README-ytdlp-ops-auth.md Normal file

@ -0,0 +1,97 @@
# YTDLP Client Side Integration
This document describes how to integrate and use the YTDLP client with the token service.
## Build
1. **Pull, configure and start server if needed:**
```bash
cd /srv/airflow_worker/
docker login pangramia # Usually already done beforehand; otherwise ask for the pull password
docker compose -f docker-compose-ytdlp-ops.yaml up -d
docker compose -f docker-compose-ytdlp-ops.yaml logs -f
```
The server is bound to a specific proxy, e.g. "socks5://sslocal-rust-1084:1084".
Also check that Redis is bound to 0.0.0.0 in its config.
2. **Build airflow-worker with custom dependencies:**
```bash
cd /srv/airflow_worker/
docker compose build airflow-worker
docker compose down airflow-worker
docker compose up -d --no-deps airflow-worker
```
3. **Test the built-in client:**
```bash
# Show client help
docker compose exec airflow-worker python /app/ytdlp_ops_client.py --help
# Get token and info.json
docker compose exec airflow-worker python /app/ytdlp_ops_client.py --host 85.192.30.55 --port 9090 getToken --url 'https://www.youtube.com/watch?v=vKTVLpmvznI'
# List formats using saved info.json
docker compose exec airflow-worker yt-dlp --load-info-json "latest.json" -F
# Simulate download using saved info.json
docker compose exec airflow-worker yt-dlp --load-info-json "latest.json" --proxy "socks5://sslocal-rust-1084:1084" --simulate --verbose
# Extract metadata and download URLs using jq
docker compose exec airflow-worker jq -r '"Title: \(.title)", "Date: \(.upload_date | strptime("%Y%m%d") | strftime("%Y-%m-%d"))", "Author: \(.uploader)", "Length: \(.duration_string)", "", "Download URLs:", (.formats[] | select(.vcodec != "none" or .acodec != "none") | .url)' latest.json
```
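The same fields the `jq` filter reads can also be pulled in Python; a minimal sketch, assuming `latest.json` is in the working directory and follows the usual yt-dlp info.json layout:
```python
import json

# Load the info.json saved by the getToken call above.
with open("latest.json", encoding="utf-8") as f:
    info = json.load(f)

print("Title: ", info.get("title"))
print("Date:  ", info.get("upload_date"))       # YYYYMMDD
print("Author:", info.get("uploader"))
print("Length:", info.get("duration_string"))

# Direct download URLs for formats that carry audio and/or video.
for fmt in info.get("formats", []):
    if fmt.get("vcodec") != "none" or fmt.get("acodec") != "none":
        print(fmt.get("url"))
```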
4. **Test Airflow task:**
To run the `ytdlp_client_dag_v2.1` DAG, first set up the required Airflow variables and connections:
```bash
docker compose exec airflow-worker airflow variables set DOWNLOAD_OPTIONS '{"formats": ["bestvideo[height<=1080]+bestaudio/best[height<=1080]"]}'
docker compose exec airflow-worker airflow variables set DOWNLOADS_TEMP '/opt/airflow/downloadfiles'
docker compose exec airflow-worker airflow variables set DOWNLOADS_PATH '/opt/airflow/downloadfiles'
docker compose exec airflow-worker airflow variables list
docker compose exec airflow-worker airflow variables set TOKEN_TIMEOUT '300'
docker compose exec airflow-worker airflow connections import /opt/airflow/config/docker_hub_repo.json
docker compose exec airflow-worker airflow connections delete redis_default
docker compose exec airflow-worker airflow connections import /opt/airflow/config/redis_default_conn.json
```
**Using a direct connection with `airflow tasks test`:**
```bash
docker compose exec airflow-worker airflow db reset
docker compose exec airflow-worker airflow dags reserialize
docker compose exec airflow-worker airflow dags list
docker compose exec airflow-worker airflow dags list-import-errors
docker compose exec airflow-worker airflow tasks test ytdlp_client_dag_v2.1 get_token $(date -u +"%Y-%m-%dT%H:%M:%S+00:00") --task-params '{"url": "https://www.youtube.com/watch?v=sOlTX9uxUtM", "redis_enabled": false, "service_ip": "85.192.30.55", "service_port": 9090}'
docker compose exec airflow-worker yt-dlp --load-info-json /opt/airflow/downloadfiles/latest.json --proxy "socks5://sslocal-rust-1084:1084" --verbose --simulate
docker compose exec airflow-worker airflow dags list-runs -d ytdlp_client_dag
```
Or deploy by triggering the DAG:
```bash
docker compose exec airflow-worker airflow dags list
docker compose exec airflow-worker airflow dags unpause ytdlp_client_dag_v2.1
# Alternatively trigger from the UI, or re-check that triggering works from the server deployment
docker compose exec airflow-worker airflow dags trigger ytdlp_client_dag_v2.1 -c '{"url": "https://www.youtube.com/watch?v=sOlTX9uxUtM", "redis_enabled": false, "service_ip": "85.192.30.55", "service_port": 9090}'
```
Check Redis for stored data by video ID:
```bash
docker compose exec redis redis-cli -a XXXXXX -h 89.253.221.173 -p 52909 HGETALL "token_info:sOlTX9uxUtM" | jq -R -s 'split("\n") | del(.[] | select(. == "")) | [.[range(0;length;2)]]'
```
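The same lookup can be done from Python with `redis-py`; a minimal sketch, assuming the host, port and password placeholders above are replaced with real values:
```python
import json
import redis

# Connection details mirror the redis-cli example above (placeholders).
client = redis.Redis(host="89.253.221.173", port=52909, password="XXXXXX", decode_responses=True)

data = client.hgetall("token_info:sOlTX9uxUtM")
for field, value in data.items():
    # Values may be plain strings or serialized JSON; pretty-print JSON where possible.
    try:
        print(field, json.dumps(json.loads(value), indent=2))
    except (json.JSONDecodeError, TypeError):
        print(field, value)
```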

README.md

@ -1,97 +1,100 @@
# YTDLP Airflow DAGs

This document describes the Airflow DAGs used for interacting with the YTDLP Ops service and managing processing queues.

## DAG Descriptions

### `ytdlp_client_dag_v2.1`

* **File:** `airflow/dags/ytdlp_client_dag_v2.1.py`
* **Purpose:** Provides a way to test the YTDLP Ops Thrift service interaction for a *single* video URL. Useful for debugging connection issues, testing specific account IDs, or verifying the service response for a particular URL independently of the queueing system.
* **Parameters (Defaults):**
  * `url` (`'https://www.youtube.com/watch?v=sOlTX9uxUtM'`): The video URL to process.
  * `redis_enabled` (`False`): Use Redis for service discovery?
  * `service_ip` (`'85.192.30.55'`): Service IP if `redis_enabled=False`.
  * `service_port` (`9090`): Service port if `redis_enabled=False`.
  * `account_id` (`'account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Account ID for lookup or call.
  * `timeout` (`30`): Timeout in seconds for Thrift connection.
  * `info_json_dir` (`"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloadfiles') }}"`): Directory to save `info.json`.
* **Results:**
  * Connects to the YTDLP Ops service using the specified method (Redis or direct IP).
  * Retrieves token data for the given URL and account ID.
  * Saves the video's `info.json` metadata to the specified directory.
  * Extracts the SOCKS proxy (if available).
  * Pushes `info_json_path`, `socks_proxy`, and the original `ytdlp_command` (with tokens) to XCom.
  * Optionally stores detailed results in a Redis hash (`token_info:<video_id>`).

### `ytdlp_mgmt_queue_add_urls`

* **File:** `airflow/dags/ytdlp_mgmt_queue_add_urls.py`
* **Purpose:** Manually add video URLs to a specific YTDLP inbox queue (Redis List).
* **Parameters (Defaults):**
  * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
  * `queue_name` (`'video_queue_inbox_account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Target Redis list (inbox queue).
  * `urls` (`""`): Multiline string of video URLs to add.
* **Results:**
  * Parses the multiline `urls` parameter.
  * Adds each valid URL to the end of the Redis list specified by `queue_name`.
  * Logs the number of URLs added.

### `ytdlp_mgmt_queue_clear`

* **File:** `airflow/dags/ytdlp_mgmt_queue_clear.py`
* **Purpose:** Manually delete a specific Redis key used by the YTDLP queues.
* **Parameters (Defaults):**
  * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
  * `queue_to_clear` (`'PLEASE_SPECIFY_QUEUE_TO_CLEAR'`): Exact name of the Redis key to clear. **Must be changed by user.**
* **Results:**
  * Deletes the Redis key specified by the `queue_to_clear` parameter.
  * **Warning:** This operation is destructive and irreversible. Use with extreme caution. Ensure you specify the correct key name (e.g., `video_queue_inbox_account_xyz`, `video_queue_progress`, `video_queue_result`, `video_queue_fail`).

### `ytdlp_mgmt_queue_check_status`

* **File:** `airflow/dags/ytdlp_mgmt_queue_check_status.py`
* **Purpose:** Manually check the type and size of a specific YTDLP Redis queue/key.
* **Parameters (Defaults):**
  * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
  * `queue_to_check` (`'video_queue_inbox_account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Exact name of the Redis key to check.
* **Results:**
  * Connects to Redis and determines the type of the key specified by `queue_to_check`.
  * Determines the size (length for lists, number of fields for hashes).
  * Logs the key type and size.
  * Pushes `queue_key_type` and `queue_size` to XCom.

### `ytdlp_mgmt_queue_list_contents`

* **File:** `airflow/dags/ytdlp_mgmt_queue_list_contents.py`
* **Purpose:** Manually list the contents of a specific YTDLP Redis queue/key (list or hash). Useful for inspecting queue state or results.
* **Parameters (Defaults):**
  * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
  * `queue_to_list` (`'video_queue_inbox_account_fr_2025-04-03T1220_anonomyous_2ssdfsf2342afga09'`): Exact name of the Redis key to list.
  * `max_items` (`100`): Maximum number of items/fields to list.
* **Results:**
  * Connects to Redis and identifies the type of the key specified by `queue_to_list`.
  * If it's a List, logs the first `max_items` elements.
  * If it's a Hash, logs up to `max_items` key-value pairs, attempting to pretty-print JSON values.
  * Logs warnings for very large hashes.

### `ytdlp_proc_sequential_processor`

* **File:** `airflow/dags/ytdlp_proc_sequential_processor.py`
* **Purpose:** Processes YouTube URLs sequentially from a Redis queue. Designed for batch processing. Pops a URL, gets token/metadata via YTDLP Ops service, downloads the media using `yt-dlp`, and records the result.
* **Parameters (Defaults):**
  * `queue_name` (`'video_queue'`): Base name for Redis queues (e.g., `video_queue_inbox`, `video_queue_progress`).
  * `redis_conn_id` (`'redis_default'`): Airflow Redis connection ID.
  * `redis_enabled` (`False`): Use Redis for service discovery? If False, uses `service_ip`/`port`.
  * `service_ip` (`None`): Required service IP if `redis_enabled=False`.
  * `service_port` (`None`): Required service port if `redis_enabled=False`.
  * `account_id` (`'default_account'`): Account ID for the API call (used for Redis lookup if `redis_enabled=True`).
  * `timeout` (`30`): Timeout in seconds for the Thrift connection.
  * `download_format` (`'ba[ext=m4a]/bestaudio/best'`): yt-dlp format selection string.
  * `output_path_template` (`"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloads') }}/%(title)s [%(id)s].%(ext)s"`): yt-dlp output template. Uses Airflow Variable `DOWNLOADS_TEMP`.
  * `info_json_dir` (`"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloadfiles') }}"`): Directory to save `info.json`. Uses Airflow Variable `DOWNLOADS_TEMP`.
* **Results:**
  * Pops one URL from the `{{ params.queue_name }}_inbox` Redis list.
  * If a URL is popped, it's added to the `{{ params.queue_name }}_progress` Redis hash.
  * The `YtdlpOpsOperator` (`get_token` task) attempts to get token data (including `info.json`, proxy, command) for the URL using the specified connection method and account ID.
  * If token retrieval succeeds, the `download_video` task executes `yt-dlp` using the retrieved `info.json`, proxy, the `download_format` parameter, and the `output_path_template` parameter to download the actual media.
  * **On Successful Download:** The URL is removed from the progress hash and added to the `{{ params.queue_name }}_result` hash along with results (`info_json_path`, `socks_proxy`, `ytdlp_command`).
  * **On Failure (Token Retrieval or Download):** The URL is removed from the progress hash and added to the `{{ params.queue_name }}_fail` hash along with error details (message, traceback).
  * If the inbox queue is empty, the DAG run skips processing via `AirflowSkipException`.
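The queue lifecycle used by `ytdlp_proc_sequential_processor` (inbox list, then progress hash, then result or fail hash) can be sketched with plain `redis-py`; this is an illustration only, with an assumed local Redis connection and the default `video_queue` base name:
```python
import json
import time
import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)  # assumed connection
base = "video_queue"

# 1. Pop the next URL from the inbox list (LPOP returns None when empty; the DAG run then skips).
url = client.lpop(f"{base}_inbox")
if url:
    # 2. Track the URL in the progress hash while get_token / download_video run.
    client.hset(f"{base}_progress", url, json.dumps({"status": "processing", "start_time": time.time()}))

    succeeded = True  # stands in for the outcome of the token/download tasks
    client.hdel(f"{base}_progress", url)
    if succeeded:
        # 3a. Record success details in the result hash.
        client.hset(f"{base}_result", url, json.dumps({"status": "success", "end_time": time.time()}))
    else:
        # 3b. Record the error in the fail hash.
        client.hset(f"{base}_fail", url, json.dumps({"status": "failed", "error": "..."}))
```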

camoufox/Dockerfile Normal file

@ -0,0 +1,42 @@
# Use a base Python image
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install necessary system packages for Playwright, GeoIP, and Xvfb
RUN apt-get update && apt-get install -y --no-install-recommends \
libgeoip1 \
# Xvfb for headless browser display
xvfb \
# Playwright browser dependencies
libnss3 libnspr4 libdbus-1-3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2 \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies: camoufox with geoip support and playwright==1.49
# Using --no-cache-dir to reduce image size
RUN pip install --no-cache-dir "camoufox[geoip]" playwright==1.49
# Install Playwright browsers for version 1.49
RUN playwright install --with-deps
# Copy the server script into the image
COPY camoufox_server.py .
# Create directory for extensions and copy them
RUN mkdir /app/extensions
COPY google_sign_in_popup_blocker-1.0.2.xpi /app/extensions/
COPY spoof_timezone-0.3.4.xpi /app/extensions/
COPY youtube_ad_auto_skipper-0.6.0.xpi /app/extensions/
# Expose the default port Camoufox might use (adjust if needed)
# This is informational; the actual port mapping is in docker-compose.
EXPOSE 12345
# Copy the wrapper script and make it executable
COPY start_camoufox.sh /app/
RUN chmod +x /app/start_camoufox.sh
# Default command executes the wrapper script.
# Arguments for camoufox_server.py will be passed via docker-compose command section.
ENTRYPOINT ["/app/start_camoufox.sh"]

camoufox/camoufox_server.py Normal file

@ -0,0 +1,190 @@
#!/usr/bin/env python3
import re
import argparse
import atexit # Import atexit
import shutil # Import shutil for directory removal
import logging # Import the logging module
import sys # Import sys for stdout
import os # Import os module
from camoufox.server import launch_server
def parse_proxy_url(url):
"""Parse proxy URL in format proto://user:pass@host:port"""
pattern = r'([^:]+)://(?:([^:]+):([^@]+)@)?([^:]+):(\d+)'
match = re.match(pattern, url)
if not match:
raise ValueError('Invalid proxy URL format. Expected proto://[user:pass@]host:port')
proto, username, password, host, port = match.groups()
# Ensure username and password are strings, not None
proxy_config = {
'server': f'{proto}://{host}:{port}',
'username': username or '',
'password': password or ''
}
# Remove empty credentials
if not proxy_config['username']:
del proxy_config['username']
if not proxy_config['password']:
del proxy_config['password']
return proxy_config
def main():
parser = argparse.ArgumentParser(description='Launch Camoufox server with optional proxy support')
parser.add_argument('--proxy-url', help='Optional proxy URL in format proto://user:pass@host:port (supports http, https, socks5)')
parser.add_argument('--ws-host', default='localhost', help='WebSocket server host address (e.g., localhost, 0.0.0.0)')
parser.add_argument('--port', type=int, default=0, help='WebSocket server port (0 for random)')
parser.add_argument('--ws-path', default='camoufox', help='WebSocket server path')
parser.add_argument('--headless', action='store_true', help='Run browser in headless mode')
parser.add_argument('--geoip', nargs='?', const=True, default=False,
help='Enable geo IP protection. Can specify IP address or use True for automatic detection')
parser.add_argument('--locale', help='Locale(s) to use (e.g. "en-US" or "en-US,fr-FR")')
parser.add_argument('--block-images', action='store_true', help='Block image requests to save bandwidth')
parser.add_argument('--block-webrtc', action='store_true', help='Block WebRTC entirely')
parser.add_argument('--humanize', nargs='?', const=True, type=float,
help='Humanize cursor movements. Can specify max duration in seconds')
parser.add_argument('--extensions', type=str,
help='Comma-separated list of extension paths to enable (XPI files or extracted directories). Use quotes if paths contain spaces.')
args = parser.parse_args()
proxy_config = None
if args.proxy_url:
try:
proxy_config = parse_proxy_url(args.proxy_url)
print(f"Using proxy configuration: {args.proxy_url}")
except ValueError as e:
print(f'Error parsing proxy URL: {e}')
return
else:
print("No proxy URL provided. Running without proxy.")
# --- Basic Logging Configuration ---
# Configure the root logger (the level is set to DEBUG below for detailed output)
# This might capture logs from camoufox or its dependencies (like websockets)
log_formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
log_handler = logging.StreamHandler(sys.stdout) # Log to standard output
log_handler.setFormatter(log_formatter)
root_logger = logging.getLogger()
# Remove existing handlers to avoid duplicates if script is re-run in same process
for handler in root_logger.handlers[:]:
root_logger.removeHandler(handler)
root_logger.addHandler(log_handler)
# Set level to DEBUG for more detailed output from Camoufox/Playwright
root_logger.setLevel(logging.DEBUG)
logging.debug("DEBUG logging enabled. Starting Camoufox server setup...")
# --- End Logging Configuration ---
try:
# --- Check DISPLAY environment variable ---
display_var = os.environ.get('DISPLAY')
logging.info(f"Value of DISPLAY environment variable: {display_var}")
# --- End Check ---
# Build config dictionary
config = {
'headless': args.headless,
'geoip': args.geoip,
# 'proxy': proxy_config, # Add proxy config only if it exists
'host': args.ws_host, # Add the host argument
'port': args.port,
'ws_path': args.ws_path,
# Explicitly pass DISPLAY environment variable to Playwright
'env': {'DISPLAY': os.environ.get('DISPLAY')}
}
# Add proxy to config only if it was successfully parsed
if proxy_config:
config['proxy'] = proxy_config
# Add optional parameters
if args.locale:
config['locale'] = args.locale
if args.block_images:
config['block_images'] = True
if args.block_webrtc:
config['block_webrtc'] = True
if args.humanize:
config['humanize'] = args.humanize if isinstance(args.humanize, float) else True
# Exclude default addons including uBlock Origin
config['exclude_addons'] = ['ublock_origin', 'default_addons']
print('Excluded default addons including uBlock Origin')
# Add custom extensions if specified
if args.extensions:
from pathlib import Path
valid_extensions = []
# Split comma-separated extensions
extensions_list = [ext.strip() for ext in args.extensions.split(',')]
temp_dirs_to_cleanup = [] # List to store temp dirs
# Register cleanup function
def cleanup_temp_dirs():
for temp_dir in temp_dirs_to_cleanup:
try:
shutil.rmtree(temp_dir)
print(f"Cleaned up temporary extension directory: {temp_dir}")
except Exception as e:
print(f"Warning: Failed to clean up temp dir {temp_dir}: {e}")
atexit.register(cleanup_temp_dirs)
for ext_path in extensions_list:
# Convert to absolute path
ext_path = Path(ext_path).absolute()
if not ext_path.exists():
print(f"Warning: Extension path does not exist: {ext_path}")
continue
if ext_path.is_file() and ext_path.suffix == '.xpi':
# Extract XPI to temporary directory
import tempfile
import zipfile
try:
temp_dir = tempfile.mkdtemp(prefix=f"camoufox_ext_{ext_path.stem}_")
temp_dirs_to_cleanup.append(temp_dir) # Add to cleanup list
with zipfile.ZipFile(ext_path, 'r') as zip_ref:
zip_ref.extractall(temp_dir)
valid_extensions.append(temp_dir)
print(f"Successfully loaded extension: {ext_path.name} (extracted to {temp_dir})")
except Exception as e:
print(f"Error loading extension {ext_path}: {str(e)}")
# Remove from cleanup list if extraction failed before adding to valid_extensions
if temp_dir in temp_dirs_to_cleanup:
temp_dirs_to_cleanup.remove(temp_dir)
continue
elif ext_path.is_dir():
# Check if it's a valid Firefox extension
if (ext_path / 'manifest.json').exists():
valid_extensions.append(str(ext_path))
print(f"Successfully loaded extension: {ext_path.name}")
else:
print(f"Warning: Directory is not a valid Firefox extension: {ext_path}")
else:
print(f"Warning: Invalid extension path: {ext_path}")
if valid_extensions:
config['addons'] = valid_extensions
print(f"Loaded {len(valid_extensions)} extensions")
else:
print("Warning: No valid extensions were loaded")
server = launch_server(**config)
except Exception as e:
print(f'Error launching server: {str(e)}')
if 'Browser.setBrowserProxy' in str(e):
print('Note: The browser may not support SOCKS5 proxy authentication')
return
print(f'\nCamoufox server started successfully!')
print(f'WebSocket endpoint: {server.ws_endpoint}\n')
if __name__ == '__main__':
main()
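Once the server prints its WebSocket endpoint, a Playwright client can attach to it from another process. A minimal sketch, assuming Playwright for Python is installed on the client side and the endpoint below is replaced with the one printed at startup (port 12345 and the `camoufox` path match the defaults used in this setup):
```python
from playwright.sync_api import sync_playwright

# Replace with the endpoint printed by camoufox_server.py at startup (assumed value).
WS_ENDPOINT = "ws://127.0.0.1:12345/camoufox"

with sync_playwright() as p:
    # Camoufox is Firefox-based, so attach through the firefox browser type.
    browser = p.firefox.connect(WS_ENDPOINT)
    page = browser.new_page()
    page.goto("https://www.youtube.com/")
    print(page.title())
    browser.close()
```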

Binary file not shown.

Binary file not shown.

camoufox/start_camoufox.sh Executable file

@ -0,0 +1,58 @@
#!/bin/bash
# Set error handling
set -e
# Function to cleanup resources on exit
cleanup() {
echo "Cleaning up resources..."
# Kill Xvfb if it's running
if [ -n "$XVFB_PID" ] && ps -p $XVFB_PID > /dev/null; then
echo "Stopping Xvfb (PID: $XVFB_PID)"
kill $XVFB_PID || true
fi
# Remove X lock files if they exist
if [ -e "/tmp/.X99-lock" ]; then
echo "Removing X lock file"
rm -f /tmp/.X99-lock
fi
echo "Cleanup complete"
}
# Register the cleanup function to run on script exit
trap cleanup EXIT
# Check if X lock file exists and remove it (in case of previous unclean shutdown)
if [ -e "/tmp/.X99-lock" ]; then
echo "Removing existing X lock file"
rm -f /tmp/.X99-lock
fi
# Start Xvfb with display :99
echo "Starting Xvfb on display :99"
Xvfb :99 -screen 0 1280x1024x24 -ac &
XVFB_PID=$!
# Wait a moment for Xvfb to initialize
sleep 2
# Check if Xvfb started successfully
if ! ps -p $XVFB_PID > /dev/null; then
echo "Failed to start Xvfb"
exit 1
fi
# Export the DISPLAY variable for the browser
export DISPLAY=:99
echo "Xvfb started successfully with PID: $XVFB_PID"
echo "DISPLAY set to: $DISPLAY"
# Start the Camoufox server with all arguments passed to this script
echo "Starting Camoufox server with arguments:"
printf " Arg: '%s'\n" "$@" # Print each argument quoted on a new line
echo "Executing: python3 camoufox_server.py $@"
python3 camoufox_server.py "$@"

Binary file not shown.


@ -468,9 +468,10 @@ class YtdlpOpsOperator(BaseOperator):
  # Write to timestamped file
  try:
+     logger.info(f"Writing info.json content (received from service) to {info_json_path}...")
      with open(info_json_path, 'w', encoding='utf-8') as f:
          f.write(info_json)
-     logger.info(f"Saved info.json to timestamped file: {info_json_path}")
+     logger.info(f"Successfully saved info.json to timestamped file: {info_json_path}")
  except IOError as e:
      logger.error(f"Failed to write info.json to {info_json_path}: {e}")
      return None  # Indicate failure


@ -0,0 +1,189 @@
from airflow import DAG
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from airflow.providers.redis.hooks.redis import RedisHook
from airflow.utils.dates import days_ago
from airflow.exceptions import AirflowException
from datetime import timedelta
import logging
import redis # Import redis exceptions if needed
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_QUEUE_NAME = 'video_queue_inbox' # Default to the inbox queue
DEFAULT_REDIS_CONN_ID = 'redis_default'
# --- Helper Functions ---
def _get_redis_client(redis_conn_id):
"""Gets a Redis client connection using RedisHook."""
try:
hook = RedisHook(redis_conn_id=redis_conn_id)
client = hook.get_conn()
client.ping()
logger.info(f"Successfully connected to Redis using connection '{redis_conn_id}'.")
return client
except redis.exceptions.AuthenticationError:
logger.error(f"Redis authentication failed for connection '{redis_conn_id}'. Check password.")
raise AirflowException(f"Redis authentication failed for '{redis_conn_id}'.")
except Exception as e:
logger.error(f"Failed to get Redis client for connection '{redis_conn_id}': {e}")
raise AirflowException(f"Redis connection failed for '{redis_conn_id}': {e}")
# --- Python Callables for Tasks ---
def add_urls_callable(**context):
"""Adds URLs from comma/newline separated input to the specified Redis list."""
params = context['params']
redis_conn_id = params['redis_conn_id']
queue_name = params['queue_name'] # Should be the inbox queue, e.g., video_queue_inbox
urls_input = params['urls']
if not queue_name.endswith('_inbox'):
logger.warning(f"Target queue name '{queue_name}' does not end with '_inbox'. Ensure this is the intended inbox queue.")
if not urls_input or not isinstance(urls_input, str):
logger.warning("No URLs provided or 'urls' parameter is not a string. Nothing to add.")
return
# Process input: split by newline, then by comma, flatten, strip, and filter empty
urls_to_add = []
for line in urls_input.splitlines():
urls_to_add.extend(url.strip() for url in line.split(',') if url.strip())
# Remove duplicates while preserving order (optional, but good practice)
seen = set()
urls_to_add = [x for x in urls_to_add if not (x in seen or seen.add(x))]
if not urls_to_add:
logger.info("No valid URLs found after processing input. Nothing added.")
return
logger.info(f"Attempting to add {len(urls_to_add)} unique URLs to Redis list '{queue_name}' using connection '{redis_conn_id}'.")
try:
redis_client = _get_redis_client(redis_conn_id)
# Use rpush to add to the end of the list (FIFO behavior with lpop)
added_count = redis_client.rpush(queue_name, *urls_to_add)
logger.info(f"Successfully added {len(urls_to_add)} URLs to list '{queue_name}'. New list length: {added_count}.")
except Exception as e:
logger.error(f"Failed to add URLs to Redis list '{queue_name}': {e}", exc_info=True)
raise AirflowException(f"Failed to add URLs to Redis: {e}")
# Removed clear_queue_callable as this DAG focuses on adding and verifying
def check_status_callable(**context):
"""Checks the type and length/size of the specified Redis key."""
# Access DAG run parameters directly from context['params']
dag_params = context['params']
redis_conn_id = dag_params['redis_conn_id']
# Check the status of the queue specified in the main DAG parameters
queue_to_check = dag_params['queue_name']
if not queue_to_check:
raise ValueError("DAG parameter 'queue_name' cannot be empty.")
logger.info(f"Attempting to check status of Redis key '{queue_to_check}' using connection '{redis_conn_id}'.") # Uses DAG param value
try:
# Use the resolved redis_conn_id to get the client
redis_client = _get_redis_client(redis_conn_id)
# redis_client.type returns bytes (e.g., b'list', b'hash', b'none')
key_type_bytes = redis_client.type(queue_to_check)
key_type_str = key_type_bytes.decode('utf-8') # Decode to string
length = 0
if key_type_str == 'list':
length = redis_client.llen(queue_to_check)
logger.info(f"Redis list '{queue_to_check}' has {length} items.")
elif key_type_str == 'hash':
length = redis_client.hlen(queue_to_check)
logger.info(f"Redis hash '{queue_to_check}' has {length} fields.")
elif key_type_str == 'none': # Check against the decoded string 'none'
logger.info(f"Redis key '{queue_to_check}' does not exist.")
else:
# Attempt to get size for other types if possible, e.g., set size
try:
if key_type_str == 'set':
length = redis_client.scard(queue_to_check)
logger.info(f"Redis set '{queue_to_check}' has {length} members.")
# Add checks for other types like zset if needed
else:
logger.info(f"Redis key '{queue_to_check}' exists but is of unhandled type '{key_type_str}'. Cannot determine size.")
except Exception as size_error:
logger.warning(f"Could not determine size for Redis key '{queue_to_check}' (type: {key_type_str}): {size_error}")
logger.info(f"Redis key '{queue_to_check}' exists but is of unhandled/unsizeable type '{key_type_str}'.")
# Push results to XCom
context['task_instance'].xcom_push(key='queue_key_type', value=key_type_str)
context['task_instance'].xcom_push(key='queue_size', value=length)
# Return status info using the resolved queue_to_check
return {'key': queue_to_check, 'type': key_type_str, 'size': length}
except Exception as e:
# Log error using the resolved queue_to_check
logger.error(f"Failed to check status of Redis key '{queue_to_check}': {e}", exc_info=True)
raise AirflowException(f"Failed to check Redis key status: {e}")
# --- DAG Definition ---
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=1), # Slightly longer retry delay for management tasks
'start_date': days_ago(1)
}
# This single DAG contains operators for different management actions,
# This DAG allows adding URLs and then checking the status of the target queue.
with DAG(
dag_id='ytdlp_mgmt_queue_add_and_verify', # Updated DAG ID
default_args=default_args,
schedule_interval=None, # Manually triggered
catchup=False,
description='Manually add URLs to a YTDLP inbox queue and verify the queue status.', # Updated description
tags=['ytdlp', 'queue', 'management', 'redis', 'manual', 'add', 'verify'], # Updated tags
params={
# Common params
'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="Airflow Redis connection ID."),
# Params for adding URLs (and checking the same queue)
'queue_name': Param(DEFAULT_QUEUE_NAME, type="string", title="Target Queue Name", description="Redis list (inbox queue) to add URLs to and check status of."),
'urls': Param("", type="string", title="URLs to Add", description="Comma and/or newline separated list of video URLs.", multiline=True), # Updated description, keep multiline for UI
# Removed clear_queue_name param
# Removed check_queue_name param (will use queue_name)
}
) as dag:
add_urls_task = PythonOperator(
task_id='add_urls_to_queue',
python_callable=add_urls_callable,
# Pass only relevant params to the callable via context['params']
# Note: context['params'] automatically contains all DAG params
)
add_urls_task.doc_md = """
### Add URLs to Queue
Adds URLs from the `urls` parameter (comma/newline separated) to the Redis list specified by `queue_name`.
*Trigger this task manually via the UI and provide the URLs.*
"""
# Removed clear_queue_task
check_status_task = PythonOperator(
task_id='check_queue_status_after_add',
python_callable=check_status_callable,
# No task-specific params needed; callable uses context['params'] directly.
)
check_status_task.doc_md = """
### Check Queue Status After Add
Checks the type and length/size of the Redis key specified by `queue_name` (the same queue URLs were added to).
Logs the result and pushes `queue_key_type` and `queue_size` to XCom.
*This task runs automatically after `add_urls_to_queue`.*
"""
# Define dependency: Add URLs first, then check status
add_urls_task >> check_status_task
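Besides triggering from the UI, this DAG can be started programmatically. A minimal sketch against Airflow's stable REST API, assuming the API is enabled with basic auth and that the webserver address and credentials below are placeholders:
```python
import requests

AIRFLOW_URL = "http://localhost:8080"  # assumed webserver address
DAG_ID = "ytdlp_mgmt_queue_add_and_verify"

resp = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    auth=("airflow", "airflow"),  # assumed credentials
    json={
        "conf": {
            "queue_name": "video_queue_inbox",
            "urls": "https://www.youtube.com/watch?v=sOlTX9uxUtM\nhttps://youtu.be/vKTVLpmvznI",
        }
    },
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```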

airflow/dags/ytdlp_mgmt_queue_check_status.py Normal file

@ -0,0 +1,133 @@
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2024 rl <rl@rlmbp>
#
# Distributed under terms of the MIT license.
"""
Airflow DAG for manually checking the status (type and size) of a specific Redis key used by YTDLP queues.
"""
from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from airflow.providers.redis.hooks.redis import RedisHook
from airflow.utils.dates import days_ago
from datetime import timedelta
import logging
import redis # Import redis exceptions if needed
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_REDIS_CONN_ID = 'redis_default'
# Default to a common inbox pattern, user should override with the specific key
DEFAULT_QUEUE_TO_CHECK = 'video_queue_inbox'
# --- Helper Function ---
def _get_redis_client(redis_conn_id):
"""Gets a Redis client connection using RedisHook."""
try:
hook = RedisHook(redis_conn_id=redis_conn_id)
client = hook.get_conn()
client.ping()
logger.info(f"Successfully connected to Redis using connection '{redis_conn_id}'.")
return client
except redis.exceptions.AuthenticationError:
logger.error(f"Redis authentication failed for connection '{redis_conn_id}'. Check password.")
raise AirflowException(f"Redis authentication failed for '{redis_conn_id}'.")
except Exception as e:
logger.error(f"Failed to get Redis client for connection '{redis_conn_id}': {e}")
raise AirflowException(f"Redis connection failed for '{redis_conn_id}': {e}")
# --- Python Callable for Check Status Task ---
def check_status_callable(**context):
"""Checks the length/size of the specified Redis key (queue/hash)."""
params = context['params']
redis_conn_id = params['redis_conn_id']
queue_to_check = params['queue_to_check'] # Specific queue/hash name
if not queue_to_check:
raise ValueError("Parameter 'queue_to_check' cannot be empty.")
logger.info(f"Attempting to check status of Redis key '{queue_to_check}' using connection '{redis_conn_id}'.")
try:
redis_client = _get_redis_client(redis_conn_id)
key_type = redis_client.type(queue_to_check)
key_type_str = key_type.decode('utf-8') if isinstance(key_type, bytes) else key_type # Decode if needed
length = 0
if key_type_str == 'list':
length = redis_client.llen(queue_to_check)
logger.info(f"Redis list '{queue_to_check}' has {length} items.")
elif key_type_str == 'hash':
length = redis_client.hlen(queue_to_check)
logger.info(f"Redis hash '{queue_to_check}' has {length} fields.")
elif key_type_str == 'none':
logger.info(f"Redis key '{queue_to_check}' does not exist.")
else:
# Attempt to get size for other types if possible, e.g., set size
try:
length = redis_client.scard(queue_to_check) # Example for set
logger.info(f"Redis key '{queue_to_check}' (type: {key_type_str}) has size {length}.")
except Exception:
logger.info(f"Redis key '{queue_to_check}' exists but is of unhandled/unsizeable type '{key_type_str}'.")
# Optionally push length to XCom if needed downstream
context['task_instance'].xcom_push(key='queue_key_type', value=key_type_str)
context['task_instance'].xcom_push(key='queue_size', value=length)
return {'key': queue_to_check, 'type': key_type_str, 'size': length} # Return status info
except Exception as e:
logger.error(f"Failed to check status of Redis key '{queue_to_check}': {e}", exc_info=True)
raise AirflowException(f"Failed to check Redis key status: {e}")
# --- DAG Definition ---
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(seconds=30),
'start_date': days_ago(1)
}
with DAG(
dag_id='ytdlp_mgmt_queue_check_status',
default_args=default_args,
schedule_interval=None, # Manually triggered
catchup=False,
description='Manually check the type and size of a specific YTDLP Redis queue/key.',
tags=['ytdlp', 'queue', 'management', 'redis', 'manual', 'status'],
params={
'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="Airflow Redis connection ID."),
'queue_to_check': Param(
DEFAULT_QUEUE_TO_CHECK,
type="string",
description="Exact name of the Redis key to check (e.g., 'video_queue_inbox_account_xyz', 'video_queue_progress', 'video_queue_result', 'video_queue_fail')."
),
}
) as dag:
check_status_task = PythonOperator(
task_id='check_specified_queue_status',
python_callable=check_status_callable,
# Params are implicitly passed via context['params']
)
check_status_task.doc_md = """
### Check Specified Queue/Key Status Task
Checks the type and size (length for lists, number of fields for hashes) of the Redis key specified by `queue_to_check`.
Logs the result and pushes `queue_key_type` and `queue_size` to XCom.
Can check keys like:
- `_inbox` (Redis List)
- `_progress` (Redis Hash)
- `_result` (Redis Hash)
- `_fail` (Redis Hash)
*Trigger this task manually via the UI.*
"""

airflow/dags/ytdlp_mgmt_queue_clear.py Normal file

@ -0,0 +1,113 @@
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2024 rl <rl@rlmbp>
#
# Distributed under terms of the MIT license.
"""
Airflow DAG for manually clearing (deleting) a specific Redis key used by YTDLP queues.
"""
from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from airflow.providers.redis.hooks.redis import RedisHook
from airflow.utils.dates import days_ago
from datetime import timedelta
import logging
import redis # Import redis exceptions if needed
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_REDIS_CONN_ID = 'redis_default'
# Provide a placeholder default, user MUST specify the queue to clear
DEFAULT_QUEUE_TO_CLEAR = 'PLEASE_SPECIFY_QUEUE_TO_CLEAR'
# --- Helper Function ---
def _get_redis_client(redis_conn_id):
"""Gets a Redis client connection using RedisHook."""
try:
hook = RedisHook(redis_conn_id=redis_conn_id)
client = hook.get_conn()
client.ping()
logger.info(f"Successfully connected to Redis using connection '{redis_conn_id}'.")
return client
except redis.exceptions.AuthenticationError:
logger.error(f"Redis authentication failed for connection '{redis_conn_id}'. Check password.")
raise AirflowException(f"Redis authentication failed for '{redis_conn_id}'.")
except Exception as e:
logger.error(f"Failed to get Redis client for connection '{redis_conn_id}': {e}")
raise AirflowException(f"Redis connection failed for '{redis_conn_id}': {e}")
# --- Python Callable for Clear Task ---
def clear_queue_callable(**context):
"""Clears (deletes) the specified Redis key (queue/hash)."""
params = context['params']
redis_conn_id = params['redis_conn_id']
queue_to_clear = params['queue_to_clear'] # Specific queue/hash name
if not queue_to_clear or queue_to_clear == DEFAULT_QUEUE_TO_CLEAR:
raise ValueError("Parameter 'queue_to_clear' must be specified and cannot be the default placeholder.")
logger.info(f"Attempting to clear Redis key '{queue_to_clear}' using connection '{redis_conn_id}'.")
try:
redis_client = _get_redis_client(redis_conn_id)
deleted_count = redis_client.delete(queue_to_clear)
if deleted_count > 0:
logger.info(f"Successfully cleared Redis key '{queue_to_clear}'.")
else:
logger.info(f"Redis key '{queue_to_clear}' did not exist or was already empty.")
except Exception as e:
logger.error(f"Failed to clear Redis key '{queue_to_clear}': {e}", exc_info=True)
raise AirflowException(f"Failed to clear Redis key: {e}")
# --- DAG Definition ---
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email_on_failure': False,
'email_on_retry': False,
'retries': 0, # No retries for manual clear operation
'start_date': days_ago(1)
}
with DAG(
dag_id='ytdlp_mgmt_queue_clear',
default_args=default_args,
schedule_interval=None, # Manually triggered
catchup=False,
description='Manually clear/delete a specific YTDLP Redis queue/key (inbox, progress, result, fail). Use with caution!',
tags=['ytdlp', 'queue', 'management', 'redis', 'manual', 'clear'],
params={
'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="Airflow Redis connection ID."),
'queue_to_clear': Param(
DEFAULT_QUEUE_TO_CLEAR,
type="string",
description="Exact name of the Redis key to clear (e.g., 'video_queue_inbox_account_xyz', 'video_queue_progress', 'video_queue_result', 'video_queue_fail')."
),
}
) as dag:
clear_queue_task = PythonOperator(
task_id='clear_specified_queue',
python_callable=clear_queue_callable,
# Params are implicitly passed via context['params']
)
clear_queue_task.doc_md = """
### Clear Specified Queue/Key Task
Deletes the Redis key specified by the `queue_to_clear` parameter.
This can target any key, including:
- `_inbox` (Redis List): Contains URLs waiting to be processed.
- `_progress` (Redis Hash): Contains URLs currently being processed.
- `_result` (Redis Hash): Contains details of successfully processed URLs.
- `_fail` (Redis Hash): Contains details of failed URLs.
**Warning:** This operation is destructive and cannot be undone. Ensure you specify the correct key name.
*Trigger this task manually via the UI.*
"""

airflow/dags/ytdlp_mgmt_queue_list_contents.py Normal file

@ -0,0 +1,163 @@
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2024 rl <rl@rlmbp>
#
# Distributed under terms of the MIT license.
"""
Airflow DAG for manually listing the contents of a specific Redis key used by YTDLP queues.
"""
from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.models.param import Param
from airflow.operators.python import PythonOperator
from airflow.providers.redis.hooks.redis import RedisHook
from airflow.utils.dates import days_ago
from datetime import timedelta
import logging
import json
import redis # Import redis exceptions if needed
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_REDIS_CONN_ID = 'redis_default'
# Default to a common inbox pattern, user should override with the specific key
DEFAULT_QUEUE_TO_LIST = 'video_queue_inbox'
DEFAULT_MAX_ITEMS = 100 # Limit number of items listed by default
# --- Helper Function ---
def _get_redis_client(redis_conn_id):
"""Gets a Redis client connection using RedisHook."""
try:
hook = RedisHook(redis_conn_id=redis_conn_id)
# decode_responses=True removed as it's not supported by get_conn in some environments
# We will decode manually where needed.
client = hook.get_conn()
client.ping()
logger.info(f"Successfully connected to Redis using connection '{redis_conn_id}'.")
return client
except redis.exceptions.AuthenticationError:
logger.error(f"Redis authentication failed for connection '{redis_conn_id}'. Check password.")
raise AirflowException(f"Redis authentication failed for '{redis_conn_id}'.")
except Exception as e:
logger.error(f"Failed to get Redis client for connection '{redis_conn_id}': {e}")
raise AirflowException(f"Redis connection failed for '{redis_conn_id}': {e}")
# --- Python Callable for List Contents Task ---
def list_contents_callable(**context):
"""Lists the contents of the specified Redis key (list or hash)."""
params = context['params']
redis_conn_id = params['redis_conn_id']
queue_to_list = params['queue_to_list']
max_items = params.get('max_items', DEFAULT_MAX_ITEMS)
if not queue_to_list:
raise ValueError("Parameter 'queue_to_list' cannot be empty.")
logger.info(f"Attempting to list contents of Redis key '{queue_to_list}' (max: {max_items}) using connection '{redis_conn_id}'.")
try:
redis_client = _get_redis_client(redis_conn_id)
key_type_bytes = redis_client.type(queue_to_list)
key_type = key_type_bytes.decode('utf-8') # Decode type
if key_type == 'list':
list_length = redis_client.llen(queue_to_list)
# Get range, respecting max_items (0 to max_items-1)
items_to_fetch = min(max_items, list_length)
# lrange returns list of bytes, decode each item
contents_bytes = redis_client.lrange(queue_to_list, 0, items_to_fetch - 1)
contents = [item.decode('utf-8') for item in contents_bytes]
logger.info(f"--- Contents of Redis List '{queue_to_list}' (showing first {len(contents)} of {list_length}) ---")
for i, item in enumerate(contents):
logger.info(f" [{i}]: {item}") # item is now a string
if list_length > len(contents):
logger.info(f" ... ({list_length - len(contents)} more items not shown)")
logger.info(f"--- End of List Contents ---")
# Optionally push contents to XCom if small enough
# context['task_instance'].xcom_push(key='list_contents', value=contents)
elif key_type == 'hash':
hash_size = redis_client.hlen(queue_to_list)
# HGETALL can be risky for large hashes. Consider HSCAN for production.
# For manual inspection, HGETALL is often acceptable.
if hash_size > max_items * 2: # Heuristic: avoid huge HGETALL
logger.warning(f"Hash '{queue_to_list}' has {hash_size} fields, which is large. Listing might be slow or incomplete. Consider using redis-cli HSCAN.")
# Optionally implement HSCAN here for large hashes
# hgetall returns dict of bytes keys and bytes values, decode them
contents_bytes = redis_client.hgetall(queue_to_list)
contents = {k.decode('utf-8'): v.decode('utf-8') for k, v in contents_bytes.items()}
logger.info(f"--- Contents of Redis Hash '{queue_to_list}' ({len(contents)} fields) ---")
item_count = 0
for key, value in contents.items(): # key and value are now strings
if item_count >= max_items:
logger.info(f" ... (stopped listing after {max_items} items of {hash_size})")
break
# Attempt to pretty-print if value is JSON
try:
parsed_value = json.loads(value)
pretty_value = json.dumps(parsed_value, indent=2)
logger.info(f" '{key}':\n{pretty_value}")
except json.JSONDecodeError:
logger.info(f" '{key}': {value}") # Print as string if not JSON
item_count += 1
logger.info(f"--- End of Hash Contents ---")
# Optionally push contents to XCom if small enough
# context['task_instance'].xcom_push(key='hash_contents', value=contents)
elif key_type == 'none':
logger.info(f"Redis key '{queue_to_list}' does not exist.")
else:
logger.info(f"Redis key '{queue_to_list}' is of type '{key_type}'. Listing contents for this type is not implemented.")
except Exception as e:
logger.error(f"Failed to list contents of Redis key '{queue_to_list}': {e}", exc_info=True)
raise AirflowException(f"Failed to list Redis key contents: {e}")
# --- DAG Definition ---
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email_on_failure': False,
'email_on_retry': False,
'retries': 0, # No retries for manual list operation
'start_date': days_ago(1)
}
with DAG(
dag_id='ytdlp_mgmt_queue_list_contents',
default_args=default_args,
schedule_interval=None, # Manually triggered
catchup=False,
description='Manually list the contents of a specific YTDLP Redis queue/key (list or hash).',
tags=['ytdlp', 'queue', 'management', 'redis', 'manual', 'list'],
params={
'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="Airflow Redis connection ID."),
'queue_to_list': Param(
DEFAULT_QUEUE_TO_LIST,
type="string",
description="Exact name of the Redis key (list/hash) to list contents for (e.g., 'video_queue_inbox_account_xyz', 'video_queue_progress', etc.)."
),
'max_items': Param(DEFAULT_MAX_ITEMS, type="integer", description="Maximum number of items/fields to list from the key."),
}
) as dag:
list_contents_task = PythonOperator(
task_id='list_specified_queue_contents',
python_callable=list_contents_callable,
# Params are implicitly passed via context['params']
)
list_contents_task.doc_md = """
### List Specified Queue/Key Contents Task
Lists the contents of the Redis key specified by `queue_to_list`.
- For **Lists** (e.g., `_inbox`), shows the first `max_items`.
- For **Hashes** (e.g., `_progress`, `_result`, `_fail`), shows up to `max_items` key-value pairs. Attempts to pretty-print JSON values.
- Logs a warning for very large hashes.
*Trigger this task manually via the UI.*
"""

airflow/dags/ytdlp_proc_sequential_processor.py Normal file

@ -0,0 +1,910 @@
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2024 rl <rl@rlmbp>
#
# Distributed under terms of the MIT license.
"""
DAG for processing YouTube URLs sequentially from a Redis queue using YTDLP Ops Thrift service.
"""
from airflow import DAG
from airflow.exceptions import AirflowException, AirflowSkipException, AirflowFailException
from airflow.hooks.base import BaseHook
from airflow.models import BaseOperator, Variable
from airflow.models.param import Param
from airflow.operators.bash import BashOperator # Import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.redis.hooks.redis import RedisHook
from airflow.utils.dates import days_ago
from airflow.utils.decorators import apply_defaults
from datetime import datetime, timedelta
from pangramia.yt.common.ttypes import TokenUpdateMode
from pangramia.yt.exceptions.ttypes import PBServiceException
from pangramia.yt.tokens_ops import YTTokenOpService
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport
from thrift.transport.TTransport import TTransportException
import json
import logging
import os
import redis # Import redis exceptions if needed
import socket
import time
import traceback # For logging stack traces in failure handler
# Configure logging
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_QUEUE_NAME = 'video_queue' # Base name for queues
DEFAULT_REDIS_CONN_ID = 'redis_default'
DEFAULT_TIMEOUT = 30 # Default Thrift timeout in seconds
MAX_RETRIES_REDIS_LOOKUP = 3 # Retries for fetching service details from Redis
RETRY_DELAY_REDIS_LOOKUP = 10 # Delay (seconds) for Redis lookup retries
# --- Helper Functions ---
def _get_redis_client(redis_conn_id):
"""Gets a Redis client connection using RedisHook."""
try:
hook = RedisHook(redis_conn_id=redis_conn_id)
client = hook.get_conn()
client.ping()
logger.info(f"Successfully connected to Redis using connection '{redis_conn_id}'.")
return client
except redis.exceptions.AuthenticationError:
logger.error(f"Redis authentication failed for connection '{redis_conn_id}'. Check password.")
raise AirflowException(f"Redis authentication failed for '{redis_conn_id}'.")
except Exception as e:
logger.error(f"Failed to get Redis client for connection '{redis_conn_id}': {e}")
raise AirflowException(f"Redis connection failed for '{redis_conn_id}': {e}")
def _extract_video_id(url):
"""Extracts YouTube video ID from URL."""
if not url or not isinstance(url, str):
logger.debug("URL is empty or not a string, cannot extract video ID.")
return None
try:
video_id = None
if 'youtube.com/watch?v=' in url:
video_id = url.split('v=')[1].split('&')[0]
elif 'youtu.be/' in url:
video_id = url.split('youtu.be/')[1].split('?')[0]
if video_id and len(video_id) >= 11:
video_id = video_id[:11] # Standard ID length
logger.debug(f"Extracted video ID '{video_id}' from URL: {url}")
return video_id
else:
logger.debug(f"Could not extract a standard video ID pattern from URL: {url}")
return None
except Exception as e:
logger.error(f"Failed to extract video ID from URL '{url}'. Error: {e}")
return None
# --- Queue Management Callables ---
def pop_url_from_queue(**context):
"""Pops a URL from the inbox queue and pushes to XCom."""
params = context['params']
queue_name = params['queue_name']
inbox_queue = f"{queue_name}_inbox"
redis_conn_id = params.get('redis_conn_id', DEFAULT_REDIS_CONN_ID)
logger.info(f"Attempting to pop URL from inbox queue: {inbox_queue}")
try:
client = _get_redis_client(redis_conn_id)
# LPOP is non-blocking, returns None if empty
url_bytes = client.lpop(inbox_queue) # Returns bytes if decode_responses=False on hook/client
if url_bytes:
url = url_bytes.decode('utf-8') if isinstance(url_bytes, bytes) else url_bytes
logger.info(f"Popped URL: {url}")
context['task_instance'].xcom_push(key='current_url', value=url)
return url # Return URL for logging/potential use
else:
logger.info(f"Inbox queue '{inbox_queue}' is empty. Skipping downstream tasks.")
context['task_instance'].xcom_push(key='current_url', value=None)
# Raise AirflowSkipException to signal downstream tasks to skip
raise AirflowSkipException(f"Inbox queue '{inbox_queue}' is empty.")
except AirflowSkipException:
raise # Re-raise skip exception
except Exception as e:
logger.error(f"Error popping URL from Redis queue '{inbox_queue}': {e}", exc_info=True)
raise AirflowException(f"Failed to pop URL from Redis: {e}")
def move_url_to_progress(**context):
"""Moves the current URL from XCom to the progress hash."""
ti = context['task_instance']
url = ti.xcom_pull(task_ids='pop_url_from_queue', key='current_url')
# This task should be skipped if pop_url_from_queue raised AirflowSkipException
# Adding check for robustness
if not url:
logger.info("No URL found in XCom (or upstream skipped). Skipping move to progress.")
raise AirflowSkipException("No URL to process.")
params = context['params']
queue_name = params['queue_name']
progress_queue = f"{queue_name}_progress"
redis_conn_id = params.get('redis_conn_id', DEFAULT_REDIS_CONN_ID)
logger.info(f"Moving URL '{url}' to progress hash: {progress_queue}")
progress_data = {
'status': 'processing',
'start_time': time.time(),
'dag_run_id': context['dag_run'].run_id,
'task_instance_key_str': context['task_instance_key_str']
}
try:
client = _get_redis_client(redis_conn_id)
client.hset(progress_queue, url, json.dumps(progress_data))
logger.info(f"Moved URL '{url}' to progress hash '{progress_queue}'.")
except Exception as e:
logger.error(f"Error moving URL to Redis progress hash '{progress_queue}': {e}", exc_info=True)
# If this fails, the URL is popped but not tracked as processing. Fail the task.
raise AirflowException(f"Failed to move URL to progress hash: {e}")
def handle_success(**context):
"""Moves URL from progress to result hash on success."""
ti = context['task_instance']
url = ti.xcom_pull(task_ids='pop_url_from_queue', key='current_url')
if not url:
logger.warning("handle_success called but no URL found from pop_url_from_queue XCom. This shouldn't happen on success path.")
return # Or raise error
params = context['params']
queue_name = params['queue_name']
progress_queue = f"{queue_name}_progress"
result_queue = f"{queue_name}_result"
redis_conn_id = params.get('redis_conn_id', DEFAULT_REDIS_CONN_ID)
# Pull results from get_token task
info_json_path = ti.xcom_pull(task_ids='get_token', key='info_json_path')
socks_proxy = ti.xcom_pull(task_ids='get_token', key='socks_proxy')
ytdlp_command = ti.xcom_pull(task_ids='get_token', key='ytdlp_command') # Original command
logger.info(f"Handling success for URL: {url}")
logger.info(f" Info JSON Path: {info_json_path}")
logger.info(f" SOCKS Proxy: {socks_proxy}")
logger.info(f" YTDLP Command: {ytdlp_command[:100] if ytdlp_command else 'None'}...") # Log truncated command
result_data = {
'status': 'success',
'end_time': time.time(),
'info_json_path': info_json_path,
'socks_proxy': socks_proxy,
'ytdlp_command': ytdlp_command,
'url': url,
'dag_run_id': context['dag_run'].run_id,
'task_instance_key_str': context['task_instance_key_str'] # Record which task instance succeeded
}
try:
client = _get_redis_client(redis_conn_id)
# Remove from progress hash
removed_count = client.hdel(progress_queue, url)
if removed_count > 0:
logger.info(f"Removed URL '{url}' from progress hash '{progress_queue}'.")
else:
logger.warning(f"URL '{url}' not found in progress hash '{progress_queue}' during success handling.")
# Add to result hash
client.hset(result_queue, url, json.dumps(result_data))
logger.info(f"Stored success result for URL '{url}' in result hash '{result_queue}'.")
except Exception as e:
logger.error(f"Error handling success in Redis for URL '{url}': {e}", exc_info=True)
# Even if Redis fails, the task succeeded. Log error but don't fail the task.
# Consider adding retry logic for Redis operations here or marking state differently.
def handle_failure(**context):
"""Moves URL from progress to fail hash on failure."""
ti = context['task_instance']
url = ti.xcom_pull(task_ids='pop_url_from_queue', key='current_url')
if not url:
logger.error("handle_failure called but no URL found from pop_url_from_queue XCom.")
# Cannot move to fail queue if URL is unknown
return
params = context['params']
queue_name = params['queue_name']
progress_queue = f"{queue_name}_progress"
fail_queue = f"{queue_name}_fail"
redis_conn_id = params.get('redis_conn_id', DEFAULT_REDIS_CONN_ID)
# Get failure reason from the exception context. Note: context['exception'] is only
# populated for on-failure callbacks; when this runs as a separate task with
# trigger_rule='one_failed' it is usually absent and the reason falls back to "Unknown error".
exception = context.get('exception')
error_message = str(exception) if exception else "Unknown error"
# Format the traceback from the exception object itself (format_exc() would only see a
# currently handled exception, not the upstream task's failure)
tb_str = "".join(traceback.format_exception(type(exception), exception, exception.__traceback__)) if exception else "No traceback available."
logger.info(f"Handling failure for URL: {url}")
logger.error(f" Failure Reason: {error_message}") # Log the error that triggered failure
logger.debug(f" Traceback:\n{tb_str}") # Log traceback at debug level
fail_data = {
'status': 'failed',
'end_time': time.time(),
'error': error_message,
'traceback': tb_str, # Store traceback
'url': url,
'dag_run_id': context['dag_run'].run_id,
'task_instance_key_str': context['task_instance_key_str'] # Record which task instance failed
}
try:
client = _get_redis_client(redis_conn_id)
# Remove from progress hash
removed_count = client.hdel(progress_queue, url)
if removed_count > 0:
logger.info(f"Removed URL '{url}' from progress hash '{progress_queue}'.")
else:
logger.warning(f"URL '{url}' not found in progress hash '{progress_queue}' during failure handling.")
# Add to fail hash
client.hset(fail_queue, url, json.dumps(fail_data))
logger.info(f"Stored failure details for URL '{url}' in fail hash '{fail_queue}'.")
except Exception as e:
logger.error(f"Error handling failure in Redis for URL '{url}': {e}", exc_info=True)
# Log error, but the task already failed.
# --- YtdlpOpsOperator ---
class YtdlpOpsOperator(BaseOperator):
"""
Custom Airflow operator to interact with YTDLP Thrift service. Handles direct connections
and Redis-based discovery, retrieves tokens, saves info.json, and manages errors.
Modified to pull URL from XCom for sequential processing.
"""
# Removed 'url' from template_fields as it's pulled from XCom
template_fields = ('service_ip', 'service_port', 'account_id', 'timeout', 'info_json_dir', 'redis_conn_id')
@apply_defaults
def __init__(self,
# url parameter removed - will be pulled from XCom
redis_conn_id=DEFAULT_REDIS_CONN_ID,
max_retries_lookup=MAX_RETRIES_REDIS_LOOKUP,
retry_delay_lookup=RETRY_DELAY_REDIS_LOOKUP,
service_ip=None,
service_port=None,
redis_enabled=False, # Default to direct connection now
account_id=None,
# save_info_json removed, always True
info_json_dir=None,
# get_socks_proxy removed, always True
# store_socks_proxy removed, always True
# get_socks_proxy=True, # Removed
# store_socks_proxy=True, # Store proxy in XCom by default # Removed
timeout=DEFAULT_TIMEOUT,
*args, **kwargs):
super().__init__(*args, **kwargs)
logger.info(f"Initializing YtdlpOpsOperator (Processor Version) with parameters: "
f"redis_conn_id={redis_conn_id}, max_retries_lookup={max_retries_lookup}, retry_delay_lookup={retry_delay_lookup}, "
f"service_ip={service_ip}, service_port={service_port}, redis_enabled={redis_enabled}, "
f"account_id={account_id}, info_json_dir={info_json_dir}, timeout={timeout}")
# save_info_json, get_socks_proxy, store_socks_proxy removed from log
# Validate parameters based on connection mode
if redis_enabled:
# If using Redis, account_id is essential for lookup
if not account_id:
raise ValueError("account_id is required when redis_enabled=True for service lookup.")
else:
# If direct connection, IP and Port are essential
if not service_ip or not service_port:
raise ValueError("Both service_ip and service_port must be specified when redis_enabled=False.")
# Account ID is still needed for the API call itself, but rely on DAG param or operator config
if not account_id:
logger.warning("No account_id provided for direct connection mode. Ensure it's set in DAG params or operator config.")
# We won't assign 'default' here, let the value passed during instantiation be used.
# self.url is no longer needed here
self.redis_conn_id = redis_conn_id
self.max_retries_lookup = max_retries_lookup
self.retry_delay_lookup = int(retry_delay_lookup.total_seconds() if isinstance(retry_delay_lookup, timedelta) else retry_delay_lookup)
self.service_ip = service_ip
self.service_port = service_port
self.redis_enabled = redis_enabled
self.account_id = account_id
# self.save_info_json removed
self.info_json_dir = info_json_dir # Still needed
# self.get_socks_proxy removed
# self.store_socks_proxy removed
self.timeout = timeout
def execute(self, context):
logger.info("Executing YtdlpOpsOperator (Processor Version)")
transport = None
ti = context['task_instance'] # Get task instance for XCom access
try:
# --- Get URL from XCom ---
url = ti.xcom_pull(task_ids='pop_url_from_queue', key='current_url')
if not url:
# This should ideally be caught by upstream skip, but handle defensively
logger.info("No URL found in XCom from pop_url_from_queue. Skipping execution.")
raise AirflowSkipException("Upstream task did not provide a URL.")
logger.info(f"Processing URL from XCom: {url}")
# --- End Get URL ---
logger.info("Getting task parameters and rendering templates")
params = context['params'] # DAG run params
# Render template fields using context
# Use render_template_as_native for better type handling if needed, else render_template
redis_conn_id = self.render_template(self.redis_conn_id, context)
service_ip = self.render_template(self.service_ip, context)
service_port_rendered = self.render_template(self.service_port, context)
account_id = self.render_template(self.account_id, context)
timeout_rendered = self.render_template(self.timeout, context)
info_json_dir = self.render_template(self.info_json_dir, context) # Rendered here for _save_info_json
# Determine effective settings (DAG params override operator defaults)
redis_enabled = params.get('redis_enabled', self.redis_enabled)
account_id = params.get('account_id', account_id) # Use DAG param if provided
redis_conn_id = params.get('redis_conn_id', redis_conn_id) # Use DAG param if provided
logger.info(f"Effective settings: redis_enabled={redis_enabled}, account_id='{account_id}', redis_conn_id='{redis_conn_id}'")
host = None
port = None
if redis_enabled:
# Get Redis connection using the helper for consistency
redis_client = _get_redis_client(redis_conn_id)
logger.info(f"Successfully connected to Redis using connection '{redis_conn_id}' for service discovery.")
# Get service details from Redis with retries
service_key = f"ytdlp:{account_id}"
legacy_key = account_id # For backward compatibility
for attempt in range(self.max_retries_lookup):
try:
logger.info(f"Attempt {attempt + 1}/{self.max_retries_lookup}: Fetching service details from Redis for keys: '{service_key}', '{legacy_key}'")
service_details = redis_client.hgetall(service_key)
if not service_details:
logger.warning(f"Key '{service_key}' not found, trying legacy key '{legacy_key}'")
service_details = redis_client.hgetall(legacy_key)
if not service_details:
raise ValueError(f"No service details found in Redis for keys: {service_key} or {legacy_key}")
# Find IP and port (case-insensitive keys)
ip_key = next((k for k in service_details if k.lower() == 'ip'), None)
port_key = next((k for k in service_details if k.lower() == 'port'), None)
if not ip_key: raise ValueError(f"'ip' key not found in Redis hash for {service_key}/{legacy_key}")
if not port_key: raise ValueError(f"'port' key not found in Redis hash for {service_key}/{legacy_key}")
host = service_details[ip_key] # Assumes decode_responses=True in hook
port_str = service_details[port_key]
try:
port = int(port_str)
except (ValueError, TypeError):
raise ValueError(f"Invalid port value '{port_str}' found in Redis for {service_key}/{legacy_key}")
logger.info(f"Extracted from Redis - Service IP: {host}, Service Port: {port}")
break # Success
except Exception as e:
logger.warning(f"Attempt {attempt + 1} failed to get Redis details: {str(e)}")
if attempt == self.max_retries_lookup - 1:
logger.error("Max retries reached for fetching Redis details.")
raise AirflowException(f"Failed to get service details from Redis after {self.max_retries_lookup} attempts: {e}")
logger.info(f"Retrying in {self.retry_delay_lookup} seconds...")
time.sleep(self.retry_delay_lookup)
else:
# Direct connection: Use rendered/param values
host = params.get('service_ip', service_ip) # Use DAG param if provided
port_str = params.get('service_port', service_port_rendered) # Use DAG param if provided
logger.info(f"Using direct connection settings: service_ip={host}, service_port={port_str}")
if not host or not port_str:
raise ValueError("Direct connection requires service_ip and service_port (check Operator config and DAG params)")
try:
port = int(port_str)
except (ValueError, TypeError):
raise ValueError(f"Invalid service_port value: {port_str}")
logger.info(f"Connecting directly to Thrift service at {host}:{port} (Redis bypassed)")
# Validate and use timeout
try:
timeout = int(timeout_rendered)
if timeout <= 0: raise ValueError("Timeout must be positive")
logger.info(f"Using timeout: {timeout} seconds")
except (ValueError, TypeError):
logger.warning(f"Invalid timeout value: '{timeout_rendered}'. Using default: {DEFAULT_TIMEOUT}")
timeout = DEFAULT_TIMEOUT
# Create Thrift connection objects
# socket_conn = TSocket.TSocket(host, port) # Original
socket_conn = TSocket.TSocket(host, port, socket_family=socket.AF_INET) # Explicitly use AF_INET (IPv4)
socket_conn.setTimeout(timeout * 1000) # Thrift timeout is in milliseconds
transport = TTransport.TFramedTransport(socket_conn) # Use TFramedTransport if server expects it
# transport = TTransport.TBufferedTransport(socket_conn) # Use TBufferedTransport if server expects it
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = YTTokenOpService.Client(protocol)
logger.info(f"Attempting to connect to Thrift server at {host}:{port}...")
try:
transport.open()
logger.info("Successfully connected to Thrift server.")
# Test connection with ping
try:
client.ping()
logger.info("Server ping successful.")
except Exception as e:
logger.error(f"Server ping failed: {e}")
raise AirflowException(f"Server connection test (ping) failed: {e}")
# Get token from service using the URL from XCom
try:
logger.info(f"Requesting token for accountId='{account_id}', url='{url}'")
token_data = client.getOrRefreshToken(
accountId=account_id,
updateType=TokenUpdateMode.AUTO,
url=url # Use the url variable from XCom
)
logger.info("Successfully retrieved token data from service.")
except PBServiceException as e:
# Handle specific service exceptions
error_code = getattr(e, 'errorCode', 'N/A')
error_message = getattr(e, 'message', 'N/A')
error_context = getattr(e, 'context', {})
logger.error(f"PBServiceException occurred: Code={error_code}, Message={error_message}")
if error_context:
logger.error(f" Context: {error_context}") # Log context separately
# Construct a concise error message for AirflowException
error_msg = f"YTDLP service error (Code: {error_code}): {error_message}"
# Add specific error code handling if needed...
logger.error(f"Failing task instance due to PBServiceException: {error_msg}") # Add explicit log before raising
raise AirflowException(error_msg) # Fail task on service error
except TTransportException as e:
logger.error(f"Thrift transport error during getOrRefreshToken: {e}")
logger.error(f"Failing task instance due to TTransportException: {e}") # Add explicit log before raising
raise AirflowException(f"Transport error during API call: {e}")
except Exception as e:
logger.error(f"Unexpected error during getOrRefreshToken: {e}")
logger.error(f"Failing task instance due to unexpected error during API call: {e}") # Add explicit log before raising
raise AirflowException(f"Unexpected error during API call: {e}")
except TTransportException as e:
# Handle connection errors
logger.error(f"Thrift transport error during connection: {str(e)}")
logger.error(f"Failing task instance due to TTransportException during connection: {e}") # Add explicit log before raising
raise AirflowException(f"Transport error connecting to YTDLP service: {str(e)}")
# Removed the overly broad except Exception block here, as inner blocks raise AirflowException
# --- Process Token Data ---
logger.debug(f"Token data received. Attributes: {dir(token_data)}")
info_json_path = None # Initialize
# save_info_json is now always True
logger.info("Proceeding to save info.json (save_info_json=True).")
info_json = self._get_info_json(token_data)
if info_json and self._is_valid_json(info_json):
try:
# Pass rendered info_json_dir to helper
info_json_path = self._save_info_json(context, info_json, url, account_id, info_json_dir)
if info_json_path:
ti.xcom_push(key='info_json_path', value=info_json_path)
logger.info(f"Successfully saved info.json and pushed path to XCom: {info_json_path}")
else:
ti.xcom_push(key='info_json_path', value=None)
logger.warning("info.json saving failed (check logs from _save_info_json).")
except Exception as e:
logger.error(f"Unexpected error during info.json saving process: {e}", exc_info=True)
ti.xcom_push(key='info_json_path', value=None)
elif info_json:
logger.warning("Retrieved infoJson is not valid JSON. Skipping save.")
ti.xcom_push(key='info_json_path', value=None)
else:
logger.info("No infoJson found in token data. Skipping save.")
ti.xcom_push(key='info_json_path', value=None)
# Extract and potentially store SOCKS proxy
# get_socks_proxy and store_socks_proxy are now always True
socks_proxy = None
logger.info("Attempting to extract SOCKS proxy (get_socks_proxy=True).")
proxy_attr = next((attr for attr in ['socks5Proxy', 'socksProxy', 'socks'] if hasattr(token_data, attr)), None)
if proxy_attr:
socks_proxy = getattr(token_data, proxy_attr)
if socks_proxy:
logger.info(f"Extracted SOCKS proxy ({proxy_attr}): {socks_proxy}")
# Always store if found (store_socks_proxy=True)
ti.xcom_push(key='socks_proxy', value=socks_proxy)
logger.info("Pushed 'socks_proxy' to XCom.")
else:
logger.info(f"Found proxy attribute '{proxy_attr}' but value is empty.")
# Store None if attribute found but empty
ti.xcom_push(key='socks_proxy', value=None)
logger.info("Pushed None to XCom for 'socks_proxy' as extracted value was empty.")
else:
logger.info("No SOCKS proxy attribute found in token data.")
# Store None if attribute not found
ti.xcom_push(key='socks_proxy', value=None)
logger.info("Pushed None to XCom for 'socks_proxy' as attribute was not found.")
# (Old conditional get_socks_proxy/store_socks_proxy logic removed here;
#  proxy extraction and XCom storage are now unconditional, see above.)
# Get the original command from the server
ytdlp_cmd = getattr(token_data, 'ytdlpCommand', None)
if not ytdlp_cmd:
logger.error("No 'ytdlpCommand' attribute found in token data.")
raise AirflowException("Required 'ytdlpCommand' not received from service.")
logger.info(f"Original command received from server: {ytdlp_cmd[:100]}...") # Log truncated
# Push the *original* command to XCom
ti.xcom_push(key='ytdlp_command', value=ytdlp_cmd)
logger.info("Pushed original command to XCom key 'ytdlp_command'.")
# No explicit return needed, success is implicit if no exception raised
except (AirflowSkipException, AirflowFailException) as e:
logger.info(f"Task skipped or failed explicitly: {e}")
raise # Re-raise to let Airflow handle state
except AirflowException as e: # Catch AirflowExceptions raised explicitly
logger.error(f"Operation failed due to AirflowException: {e}", exc_info=True)
raise # Re-raise AirflowExceptions to ensure task failure
except (TTransportException, PBServiceException) as e: # Catch specific Thrift/Service errors not already handled inside inner try
logger.error(f"Unhandled YTDLP Service/Transport error in outer block: {e}", exc_info=True)
logger.error(f"Failing task instance due to unhandled outer Service/Transport error: {e}") # Add explicit log before raising
raise AirflowException(f"Unhandled YTDLP service error: {e}") # Wrap in AirflowException to fail task
except Exception as e: # General catch-all for truly unexpected errors
logger.error(f"Caught unexpected error in YtdlpOpsOperator outer block: {e}", exc_info=True)
logger.error(f"Failing task instance due to unexpected outer error: {e}") # Add explicit log before raising
raise AirflowException(f"Unexpected error caused task failure: {e}") # Wrap to fail task
finally:
if transport and transport.isOpen():
logger.info("Closing Thrift transport.")
transport.close()
# --- Helper Methods ---
def _get_info_json(self, token_data):
"""Safely extracts infoJson from token data."""
return getattr(token_data, 'infoJson', None)
def _is_valid_json(self, json_str):
"""Checks if a string is valid JSON."""
if not json_str or not isinstance(json_str, str): return False
try:
json.loads(json_str)
return True
except json.JSONDecodeError:
return False
def _save_info_json(self, context, info_json, url, account_id, rendered_info_json_dir):
"""Saves info_json to a file. Uses pre-rendered directory path."""
try:
video_id = _extract_video_id(url) # Use standalone helper
save_dir = rendered_info_json_dir or "." # Use rendered path
logger.info(f"Target directory for info.json: {save_dir}")
# Ensure directory exists
try:
os.makedirs(save_dir, exist_ok=True)
logger.info(f"Ensured directory exists: {save_dir}")
except OSError as e:
logger.error(f"Could not create directory {save_dir}: {e}. Cannot save info.json.")
return None
# Construct filename
timestamp = int(time.time())
base_filename = f"info_{video_id or 'unknown'}_{account_id}_{timestamp}.json"
info_json_path = os.path.join(save_dir, base_filename)
latest_json_path = os.path.join(save_dir, "latest.json") # Path for the latest symlink/copy
# Write to timestamped file
try:
logger.info(f"Writing info.json content (received from service) to {info_json_path}...")
with open(info_json_path, 'w', encoding='utf-8') as f:
f.write(info_json)
logger.info(f"Successfully saved info.json to timestamped file: {info_json_path}")
except IOError as e:
logger.error(f"Failed to write info.json to {info_json_path}: {e}")
return None
# Write to latest.json (overwrite) - best effort
try:
with open(latest_json_path, 'w', encoding='utf-8') as f:
f.write(info_json)
logger.info(f"Updated latest.json file: {latest_json_path}")
except IOError as e:
logger.warning(f"Failed to update latest.json at {latest_json_path}: {e}")
return info_json_path
except Exception as e:
logger.error(f"Unexpected error in _save_info_json: {e}", exc_info=True)
return None
# =============================================================================
# DAG Definition
# =============================================================================
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'email_on_failure': False,
'email_on_retry': False,
'retries': 1, # Default retries for tasks like queue management
'retry_delay': timedelta(minutes=1),
'start_date': days_ago(1),
# Add concurrency control if needed for sequential processing
# 'concurrency': 1, # Ensure only one task instance runs at a time per DAG run
# 'max_active_runs': 1, # Ensure only one DAG run is active
}
# Define DAG
with DAG(
dag_id='ytdlp_proc_sequential_processor', # New DAG ID
default_args=default_args,
schedule_interval=None, # Manually triggered or triggered by external sensor/event
catchup=False,
description='Processes YouTube URLs sequentially from a Redis queue using YTDLP Ops.',
tags=['ytdlp', 'thrift', 'client', 'sequential', 'queue', 'processor'], # Updated tags
params={
# Define DAG parameters
'queue_name': Param(DEFAULT_QUEUE_NAME, type="string", description="Base name for Redis queues (e.g., 'video_queue' -> video_queue_inbox, video_queue_progress, etc.)."),
'redis_conn_id': Param(DEFAULT_REDIS_CONN_ID, type="string", description="Airflow Redis connection ID."),
# YtdlpOpsOperator specific params (can be overridden at task level if needed)
'redis_enabled': Param(False, type="boolean", description="Use Redis for service discovery? If False, uses service_ip/port."), # Default changed to False
'service_ip': Param(None, type=["null", "string"], description="Required Service IP if redis_enabled=False."), # Clarified requirement
'service_port': Param(None, type=["null", "integer"], description="Required Service port if redis_enabled=False."), # Clarified requirement
'account_id': Param('default_account', type="string", description="Account ID for the API call (used for Redis lookup if redis_enabled=True)."), # Clarified usage
'timeout': Param(DEFAULT_TIMEOUT, type="integer", description="Timeout in seconds for the Thrift connection."),
# save_info_json removed, always True
# get_socks_proxy removed, always True
# store_socks_proxy removed, always True
# Download specific parameters
'download_format': Param(
# Default to best audio-only format (e.g., m4a)
'ba[ext=m4a]/bestaudio/best',
type="string",
description="yt-dlp format selection string (e.g., 'ba' for best audio, 'wv*+wa/w' for worst video+audio)."
),
'output_path_template': Param(
# Simplified template, removed queue_name subdir
"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloads') }}/%(title)s [%(id)s].%(ext)s",
type="string",
description="yt-dlp output template (e.g., '/path/to/downloads/%(title)s.%(ext)s'). Uses Airflow Variable 'DOWNLOADS_TEMP'."
),
# Simplified info_json_dir, just uses DOWNLOADS_TEMP variable
'info_json_dir': Param(
"{{ var.value.get('DOWNLOADS_TEMP', '/opt/airflow/downloadfiles') }}",
type="string",
description="Directory to save info.json. Uses Airflow Variable 'DOWNLOADS_TEMP'."
)
}
) as dag:
# --- Task Definitions ---
pop_url = PythonOperator(
task_id='pop_url_from_queue',
python_callable=pop_url_from_queue,
# Params are implicitly passed via context
)
pop_url.doc_md = """
### Pop URL from Inbox Queue
Pops the next available URL from the `{{ params.queue_name }}_inbox` Redis list.
Pushes the URL to XCom key `current_url`.
If the queue is empty, raises `AirflowSkipException` to skip downstream tasks.
"""
move_to_progress = PythonOperator(
task_id='move_url_to_progress',
python_callable=move_url_to_progress,
trigger_rule='all_success', # Only run if pop_url succeeded (didn't skip)
)
move_to_progress.doc_md = """
### Move URL to Progress Hash
Retrieves the `current_url` from XCom (pushed by `pop_url_from_queue`).
Adds the URL as a key to the `{{ params.queue_name }}_progress` Redis hash with status 'processing'.
This task is skipped if `pop_url_from_queue` was skipped.
"""
# YtdlpOpsOperator task to get the token
get_token = YtdlpOpsOperator(
task_id='get_token',
# Operator params are inherited from DAG params by default,
# but can be overridden here if needed.
# We rely on the operator pulling the URL from XCom internally.
# Pass DAG params explicitly to ensure they are used if overridden
redis_conn_id="{{ params.redis_conn_id }}",
redis_enabled="{{ params.redis_enabled }}",
service_ip="{{ params.service_ip }}",
service_port="{{ params.service_port }}",
account_id="{{ params.account_id }}",
timeout="{{ params.timeout }}",
# save_info_json removed
info_json_dir="{{ params.info_json_dir }}", # Pass the simplified path template
# get_socks_proxy removed
# store_socks_proxy removed
retries=0, # Set operator retries to 0; failure handled by branching/failure handler
trigger_rule='all_success', # Only run if move_to_progress succeeded
)
get_token.doc_md = """
### Get Token and Info Task
Connects to the YTDLP Thrift service for the URL pulled from XCom (`current_url`).
Retrieves token, metadata, command, and potentially proxy. Saves `info.json`.
Failure of this task triggers the `handle_failure` path.
Success triggers the `handle_success` path.
**Pulls from XCom:**
- `current_url` (from `pop_url_from_queue`) - *Used internally*
**Pushes to XCom:**
- `info_json_path`
- `socks_proxy`
- `ytdlp_command`
"""
# Task to perform the actual download using yt-dlp
# Ensure info_json_path and socks_proxy are correctly quoted within the bash command
# Use {% raw %} {% endraw %} around Jinja if needed, but direct templating should work here.
# Added --no-simulate, --no-write-info-json, --ignore-errors, --no-progress
download_video = BashOperator(
task_id='download_video',
bash_command="""
INFO_JSON_PATH="{{ ti.xcom_pull(task_ids='get_token', key='info_json_path') }}"
PROXY="{{ ti.xcom_pull(task_ids='get_token', key='socks_proxy') }}"
FORMAT="{{ params.download_format }}"
OUTPUT_TEMPLATE="{{ params.output_path_template }}"
echo "Starting download..."
echo "Info JSON Path: $INFO_JSON_PATH"
echo "Proxy: $PROXY"
echo "Format: $FORMAT"
echo "Output Template: $OUTPUT_TEMPLATE"
# Check if info.json path exists
if [ -z "$INFO_JSON_PATH" ] || [ ! -f "$INFO_JSON_PATH" ]; then
echo "Error: info.json path is missing or file does not exist ($INFO_JSON_PATH)."
exit 1
fi
# Build the command as a bash array (avoids quoting/eval issues with a single string)
CMD_ARRAY=(yt-dlp --load-info-json "$INFO_JSON_PATH")
# Add proxy if it exists
if [ -n "$PROXY" ]; then
CMD_ARRAY+=(--proxy "$PROXY")
fi
# Add format and output template
CMD_ARRAY+=(-f "$FORMAT" -o "$OUTPUT_TEMPLATE")
# Add other useful flags
CMD_ARRAY+=(--no-progress --no-simulate --no-write-info-json --ignore-errors --verbose)
echo "Executing command array:"
# Use printf to safely quote and display the command array
printf "%q " "${CMD_ARRAY[@]}"
echo "" # Newline after command
# Execute the command directly using the array
"${CMD_ARRAY[@]}"
# Check exit code
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
echo "Error: yt-dlp command failed with exit code $EXIT_CODE"
exit $EXIT_CODE
fi
echo "Download command completed successfully."
""",
trigger_rule='all_success', # Run only if get_token succeeded
)
download_video.doc_md = """
### Download Video/Audio Task
Executes `yt-dlp` using the `info.json` and proxy obtained from the `get_token` task.
Uses the `download_format` and `output_path_template` parameters from the DAG run configuration.
Failure of this task triggers the `handle_failure` path.
**Pulls from XCom (task_id='get_token'):**
- `info_json_path`
- `socks_proxy`
"""
# Task to handle successful token retrieval AND download
success_handler = PythonOperator(
task_id='handle_success',
python_callable=handle_success,
trigger_rule='all_success', # Run only if download_video succeeds (see dependencies below)
)
success_handler.doc_md = """
### Handle Success Task
Runs after `get_token` succeeds.
Retrieves `current_url` and results from `get_token` via XCom.
Removes the URL from the `{{ params.queue_name }}_progress` hash.
Adds the URL and results to the `{{ params.queue_name }}_result` hash.
"""
# Task to handle failed token retrieval or download
failure_handler = PythonOperator(
task_id='handle_failure',
python_callable=handle_failure,
trigger_rule='one_failed', # Run only if get_token or download_video fails
)
failure_handler.doc_md = """
### Handle Failure Task
Runs after `get_token` or `download_video` fails.
Retrieves `current_url` from XCom.
Retrieves the error message and traceback from the context.
Removes the URL from the `{{ params.queue_name }}_progress` hash.
Adds the URL and error details to the `{{ params.queue_name }}_fail` hash.
**Important:** This task succeeding only means the failure was *handled*; the DAG run itself may still be marked as failed because an upstream task failed.
"""
# --- Task Dependencies ---
# Core processing flow
pop_url >> move_to_progress >> get_token >> download_video
# Handlers depend on the outcome of both token retrieval and download
# Success handler runs only if download_video succeeds
download_video >> success_handler # Default trigger_rule='all_success' is suitable
# Failure handler runs if either get_token or download_video fails
[get_token, download_video] >> failure_handler # Uses trigger_rule='one_failed' defined in the task
# Removed Jinja filters as they are no longer needed for the simplified info_json_dir
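
For a quick end-to-end check of the queue flow, the sketch below seeds the inbox list and triggers the DAG from the worker container. The queue base name `video_queue`, the Redis host `redis`, and the `<VIDEO_ID>`/`<SERVICE_IP>` placeholders are assumptions; adjust them to your `queue_name` param, the `redis_default` connection, and your token service address.

```bash
# Sketch only: "video_queue", the Redis host "redis", and <SERVICE_IP>/<VIDEO_ID>
# are placeholders -- match them to your DAG params and redis_default connection.

# Seed the inbox list (the DAG consumes with LPOP, so RPUSH keeps FIFO order)
redis-cli -h redis RPUSH video_queue_inbox "https://www.youtube.com/watch?v=<VIDEO_ID>"

# Trigger one sequential processing run with direct (non-Redis) service discovery
docker compose exec airflow-worker airflow dags trigger ytdlp_proc_sequential_processor \
  --conf '{"queue_name": "video_queue", "redis_enabled": false, "service_ip": "<SERVICE_IP>", "service_port": 9090}'

# Inspect queue state afterwards
redis-cli -h redis LRANGE video_queue_inbox 0 -1
redis-cli -h redis HGETALL video_queue_progress
redis-cli -h redis HGETALL video_queue_result
redis-cli -h redis HGETALL video_queue_fail
```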

The same commit updates the Docker Compose stack to run a camoufox browser service alongside ytdlp-ops:

@@ -1,15 +1,40 @@
 version: '3.8'
 services:
-  ytdlp-ops:
-    image: pangramia/ytdlp-ops-server:latest
+  camoufox:
+    build:
+      context: ./camoufox  # Path relative to the docker-compose file
+      dockerfile: Dockerfile
+    ports:
+      # Optionally expose the camoufox port to the host for debugging
+      # - "12345:12345"
+      - "12345"      # Expose port within the docker network, pass in Dockerfile
+      - "5900:5900"  # Expose VNC port to the host
+    networks:
+      - airflow_prod_proxynet
+    command: [
+      "--ws-host", "0.0.0.0",
+      "--port", "12345",
+      "--ws-path", "mypath",
+      "--proxy-url", "socks5://sslocal-rust-1082:1082",
+      "--locale", "en-US",
+      "--geoip",
+      "--extensions", "/app/extensions/google_sign_in_popup_blocker-1.0.2.xpi,/app/extensions/spoof_timezone-0.3.4.xpi,/app/extensions/youtube_ad_auto_skipper-0.6.0.xpi"
+    ]
+    restart: unless-stopped
+    # Add healthcheck if desired
+
+  ytdlp-ops:
+    image: pangramia/ytdlp-ops-server:latest  # Don't comment
+    depends_on:
+      - camoufox  # Ensure camoufox starts first
     ports:
-      - "9090:9090"
-      - "9091:9091"
+      - "9090:9090"  # Main RPC port
+      - "9091:9091"  # Health check port
     volumes:
       - context-data:/app/context-data
     networks:
-      - airflow_workers_prod_proxynet
+      - airflow_prod_proxynet
     command:
       - "--script-dir"
       - "/app/scripts"
@@ -18,10 +43,18 @@ services:
       - "--port"
       - "9090"
       - "--clients"
-      - "ios,android,mweb"
+      # Add 'web' client since we now have camoufox
+      - "web,ios,android,mweb"
       - "--proxy"
-      - "socks5://sslocal-rust-1084:1084"
+      - "socks5://sslocal-rust-1082:1082"
+      # Add the endpoint argument pointing to the camoufox service
+      - "--endpoint"
+      - "ws://camoufox:12345/mypath"
       - "--probe"
+      # Add --camouflage-only if you don't want ytdlp-ops to manage the browser directly
+      - "--camouflage-only"
+      # Add flag to print full tokens in logs by default
+      - "--print-tokens"
     restart: unless-stopped
     pull_policy: always
@@ -30,5 +63,5 @@ volumes:
     name: context-data
 networks:
-  airflow_workers_prod_proxynet:
+  airflow_prod_proxynet:
     external: true
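
As a sanity check after bringing the stack up, the sketch below verifies that the ytdlp-ops container can reach the camoufox WebSocket port over the shared network. It assumes bash is available inside the ytdlp-ops image and that the commands are run next to the compose file (add `-f <compose-file>` otherwise).

```bash
# Confirm camoufox's WebSocket server started (listener on port 12345, path "mypath")
docker compose logs camoufox | tail -n 20

# Sketch only: check TCP reachability of camoufox:12345 from inside ytdlp-ops
# using bash's /dev/tcp builtin (assumes bash exists in the image).
docker compose exec ytdlp-ops bash -c 'timeout 3 bash -c "</dev/tcp/camoufox/12345" && echo "camoufox:12345 reachable"'
```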
