N2WS 3.1.x - worker reported error: failed processing segment 0 : Failed to load blocks meta-data

Issue summary

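When running Copy to S3 on version 3.1, the operation might fail with the following error:

worker reported error: failed processing segment 0 : Failed to load blocks meta-data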


Issue description and troubleshooting

This error is usually caused by communication issues with the EBS Direct API endpoint.

Starting with version 3.1, N2WS uses the EBS Direct API for Copy to S3 operations. This API lets the worker read blocks directly from the snapshot, which reduces the cost and time of the copy process.

This means that, in addition to the internet access required in previous versions (worker to S3, worker to N2WS server), the worker must also be able to reach the EBS Direct API endpoints. If this communication is blocked, the error above can occur.

Troubleshooting:
Connect to the worker instance via SSH (the user is ubuntu, authenticated with your private key).
Alternatively, you can launch a new Linux instance under the same VPC/Security Groups and then follow the steps listed below.
If you don't know how to connect via SSH, AWS shows connection instructions when you select the instance and click Connect:


If you connected to the worker instance, turn on termination protection so that the instance is not deleted while you are testing/troubleshooting.

Note: Don't forget to disable this option and delete the worker instance when you are done.

Once connected, you can test connectivity by running this curl command:
  curl -v https://ebs.us-east-1.amazonaws.com
Note: Replace us-east-1 with the relevant region, which is the region where the snapshot to copy is located.
You can see the list of endpoints here: Link - Endpoints for the EBS direct APIs
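
If curl is not available on the instance, the same reachability check can be sketched in Python. This is a minimal sketch, not part of N2WS: the function names ebs_endpoint and can_connect are hypothetical, the hostname pattern is taken from the curl command above (confirm it against the endpoint list for your region), and port 443 is assumed because the endpoint is addressed over HTTPS. It only checks that a TCP connection opens; it does not send an API request.

```python
import socket


def ebs_endpoint(region: str) -> str:
    """Build the endpoint hostname using the pattern from the curl command."""
    return f"ebs.{region}.amazonaws.com"


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    region = "us-east-1"  # replace with the region of the snapshot to copy
    host = ebs_endpoint(region)
    status = "reachable" if can_connect(host) else "NOT reachable"
    print(f"{host}:443 is {status}")
```

A "NOT reachable" result corresponds to the curl timeout case described next.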

If you get a timeout, check your VPC/Security Group settings to find what is blocking the communication. Different networks might have different reasons for the timeout.


For the test instance used for this KB, the issue was resolved by doing the following:
1. Adding an EBS VPC endpoint to the worker VPC.


2. Updating the Security Group attached to the endpoint with an inbound rule.

Note: The security group above was attached only to the endpoint.
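
The two steps above could also be scripted. The following is a rough sketch under several assumptions, not the exact configuration used for the test instance: it assumes the boto3 library is installed, that the interface endpoint service name follows the com.amazonaws.<region>.ebs pattern, and that the inbound rule allows HTTPS (TCP 443) from the worker's network. All function names and the IDs and CIDR passed to them are hypothetical placeholders. Private DNS is enabled here so that the regular endpoint hostname resolves to the VPC endpoint, which assumes DNS support is turned on for the VPC.

```python
def endpoint_params(region, vpc_id, subnet_ids, security_group_id):
    """Arguments for EC2 create_vpc_endpoint (interface endpoint, EBS direct APIs)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.ebs",  # assumed naming pattern
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": [security_group_id],
        "PrivateDnsEnabled": True,
    }


def https_ingress_params(security_group_id, worker_cidr):
    """Arguments for EC2 authorize_security_group_ingress (HTTPS from the worker)."""
    return {
        "GroupId": security_group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 443,  # assumption: the endpoint is reached over HTTPS
            "ToPort": 443,
            "IpRanges": [{"CidrIp": worker_cidr}],
        }],
    }


def apply(region, vpc_id, subnet_ids, security_group_id, worker_cidr):
    """Create the endpoint and add the inbound rule using the helpers above."""
    import boto3  # third-party; imported here so the helpers work without it

    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_vpc_endpoint(
        **endpoint_params(region, vpc_id, subnet_ids, security_group_id)
    )
    ec2.authorize_security_group_ingress(
        **https_ingress_params(security_group_id, worker_cidr)
    )
```

Keeping the request parameters in plain functions makes it possible to review them before anything is changed in the account.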

Once you have updated your network so that the curl command works, you can try to run Copy to S3 again.



In addition to the above steps, if you are connected to the worker, you can also view the worker logs:
  vi /var/log/cpm/c2s3_log.log
You will probably see this error, which again indicates a communication problem:
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f5d6abf9b38>, 'Connection to ebs.us-east-1.amazonaws.com timed out. (connect timeout=20)')
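
Instead of paging through the log in vi, the timeout messages can be pulled out with a short script. This is a minimal sketch that assumes the log lines follow the message format shown in the excerpt above; the function name timed_out_hosts is hypothetical, and the log path is the one given in this article.

```python
import re

# Matches the message format shown in the log excerpt above.
TIMEOUT_RE = re.compile(r"Connection to (\S+) timed out")


def timed_out_hosts(lines):
    """Return the distinct hosts named in connect-timeout messages, in order seen."""
    hosts = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


if __name__ == "__main__":
    log_path = "/var/log/cpm/c2s3_log.log"  # path from this article
    try:
        with open(log_path, errors="replace") as log_file:
            for host in timed_out_hosts(log_file):
                print(f"Connection timed out to: {host}")
    except OSError as exc:
        print(f"Could not read {log_path}: {exc}")
```

Any host it reports is an endpoint the worker could not reach, which is where the network check should start.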


You can find general information about the required internet access in this KB article:
  1. Internet access required for CPM operations