"Worker did not establish connection" and "worker did not complete initializing" errors during S3 and FLR

During S3 operations, you may encounter the message "Worker i-... did not establish connection" in the log of an S3 copy or S3 restore operation.
Error - Worker i-1234567890abcdef did not establish connection - terminating operation

During File Level Restore operations (which, starting with v2.6, also require a worker), you may encounter a "worker i-... did not complete initializing" error in the CPM Server log:
ERROR:  execute(mount_volumes.py:180)  worker i-1234567890abcdef did not complete initializing

Both of these errors are caused by a lack of connectivity between the worker and the CPM server.
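When a log contains many entries, it can help to list only the worker instance IDs that hit either error. The following is a minimal shell sketch, assuming the log lines follow the two sample formats shown above. The function name failed_workers and the log file name in the usage note are hypothetical.

```shell
# Print the unique worker instance IDs that appear in either connectivity
# error. Log text is read from stdin. The patterns are assumptions based
# on the two sample log lines in this article.
failed_workers() {
    grep -E 'did not (establish connection|complete initializing)' \
        | grep -oE 'i-[0-9a-f]{8,}' \
        | sort -u
}
```

Usage, with a placeholder file name: failed_workers < backup.log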


Please check the following:


1. The worker appliance security group settings. The following must be allowed for both backup and restore:

           OUTBOUND HTTPS (port 443):
           OUTBOUND SSH (port 22) - for File Level Restore workers only:
  • To the CPM server's private or public IP.
      Note 1: Amazon regularly rotates public gateway IPs. Check the current IP information here: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

      Note 2: If restoring from S3 to a different region, select a security group in the target region that allows outbound connections to the source region's S3 bucket. See also item 4 below.

2. The CPM server security group settings:

           INBOUND HTTPS (port 443):
    • From the subnet the worker is configured to use.
           INBOUND SSH (port 22) - for File Level Restore workers only:
    • From the subnet the worker is configured to use.

3. If using a network ACL, make sure the proper ports are open for bidirectional communication between the worker and S3: https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/
 Also make sure access to the CPM server is allowed on port 443 where necessary.

4. The worker connects to S3 over the internet, even if the CPM server and S3 bucket are in the same region, so the worker needs outbound internet access.

5. The worker must be able to resolve the DNS name of the S3 endpoint for the region that holds your S3 bucket (e.g. https://s3.us-west-2.amazonaws.com/).

6. If an HTTP proxy is necessary to access public IPs, ensure it is properly configured in the worker settings and is not blocking the transfer to S3. In some cases it may be advisable to test by giving the worker direct internet access without a proxy, to rule out the proxy as the cause.
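Items 4 and 5 can be checked from a shell on the worker. The following is a minimal sketch, not an official N2WS tool. It assumes bash, getent, and timeout are available on the worker, and the function names (s3_endpoint, can_resolve, port_open, check_s3) are hypothetical. The region is passed as an argument.

```shell
# Build the regional S3 endpoint host name, e.g. s3.us-west-2.amazonaws.com.
s3_endpoint() {
    printf 's3.%s.amazonaws.com\n' "$1"
}

# Succeeds if the host name resolves through the system resolver.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds if a TCP connection to host $1, port $2 opens within 5 seconds.
# Relies on bash's /dev/tcp redirection, so bash is invoked explicitly.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report DNS resolution and HTTPS reachability for one region's S3 endpoint.
check_s3() {
    host=$(s3_endpoint "$1")
    if can_resolve "$host"; then
        echo "DNS $host: OK"
    else
        echo "DNS $host: FAILED"
    fi
    if port_open "$host" 443; then
        echo "TCP $host:443: OK"
    else
        echo "TCP $host:443: FAILED"
    fi
}
```

For example, check_s3 us-west-2 tests the endpoint used as the example in item 5. If the worker reaches S3 through an HTTP proxy, the direct TCP check may fail even when the proxy path works, so treat a failure here as a hint rather than proof.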

Testing connectivity from the worker to CPM

To test connectivity from the worker to CPM, run the following commands from the worker (replace "CPMIP" with CPM's IP or hostname):

wget --no-check-certificate https://CPMIP/
This command should result in status 302 (redirecting to "/signin/") followed by 200.

ssh cpmuser@CPMIP
This command should result in "Permission denied (publickey)". This is only needed for File Level Restore workers, not S3.
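The two manual checks can also be wrapped in a small script so that they give a pass/fail answer. This is a sketch under assumptions: it relies on wget's exit status instead of reading the 302 and 200 lines, it treats the "Permission denied" text described above as proof that port 22 is reachable, and the function names ssh_port_reachable and check_cpm are hypothetical.

```shell
# Succeeds if the ssh output on stdin shows that the server answered and
# refused authentication, which means port 22 is reachable.
ssh_port_reachable() {
    grep -q 'Permission denied'
}

# Run both checks against the CPM server given as $1 (IP or host name).
check_cpm() {
    cpm=$1
    # wget exits 0 when the final response, after redirects, is successful.
    if wget --no-check-certificate --timeout=10 --tries=1 -q \
            -O /dev/null "https://$cpm/"; then
        echo "HTTPS (443): OK"
    else
        echo "HTTPS (443): FAILED"
    fi
    # BatchMode prevents password prompts; the host key options avoid an
    # interactive confirmation for a server that has not been seen before.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=10 \
           -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
           "cpmuser@$cpm" true 2>&1 | ssh_port_reachable; then
        echo "SSH (22): OK (authentication refused, as expected)"
    else
        echo "SSH (22): FAILED"
    fi
}
```

Call it as check_cpm CPMIP, replacing CPMIP as in the commands above. The SSH result only matters for File Level Restore workers.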

Important: To log in to the worker instance over SSH (using the "ubuntu" username), the workers must be configured to use a key pair.



The following KB article also shows how to connect to a worker: How To Test Connectivity from a CPM Worker to AWS endpoints

Diagram:

The diagram below illustrates the setup of the worker. It is just one example and may differ depending on your configuration and settings.

For exact technical details, please see the KB instructions above and our User Guide.

 


