How to retrieve logs from a CPM AWS Worker instance

Linux and AWS knowledge is required.
Please read the entire KB before starting.

N2WS uses temporary EC2 worker instances for several operations (copy to S3, FLR, etc.). If a worker fails before it can communicate with the main server, its logs are not automatically copied to the main server. In such cases, support may request logs directly from the EC2 worker instance.

These logs can be collected in two ways:
  1. Set termination protection on a running EC2 worker instance and SSH into it.
  2. Configure the server to retain either the entire failed worker or only its root volume.



Option 1: Setting termination protection

If the issue is easy to reproduce, you can run the scenario, enable termination protection on the instance, connect to it, and collect the logs manually.

Step 1: Verify that a key pair is configured

Go to N2WS UI -> Worker Configuration and make sure that the configuration for the relevant worker has an AWS SSH key pair configured.

If no key pair is configured, select the relevant configuration, click "Edit", and select a key pair.

Step 2: Re-run the scenario and enable termination protection

Re-run the scenario (worker test, Copy to S3, or File Level Recovery). When the worker is launched, enable termination protection on it from the AWS EC2 console.

Wait for the scenario to fail, then continue to the next step.
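
If you prefer the AWS CLI to the EC2 console, termination protection can also be enabled with `aws ec2 modify-instance-attribute`. The following is a minimal sketch, assuming the AWS CLI is installed and configured with permissions on the worker's account. The function names `protect_worker` and `is_protected` are hypothetical helpers, and the instance ID and region are values you supply.

```shell
# Hypothetical helper: enable termination protection on a worker instance.
# Arguments: <instance-id> <region>
protect_worker() {
    aws ec2 modify-instance-attribute \
        --instance-id "$1" \
        --region "$2" \
        --disable-api-termination
}

# Hypothetical helper: print the current termination protection state.
# Arguments: <instance-id> <region>
is_protected() {
    aws ec2 describe-instance-attribute \
        --instance-id "$1" \
        --region "$2" \
        --attribute disableApiTermination \
        --query 'DisableApiTermination.Value' \
        --output text
}
```

For example, `protect_worker <worker-instance-id> <region>` followed by `is_protected <worker-instance-id> <region>` to confirm the setting before the scenario fails.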

Step 3: Connect to the worker instance and collect logs

Connect to the worker instance with the username ubuntu and the key pair you configured. You can use software such as MobaXterm, which also makes it easy to copy files out of the instance.
Collect the following from the instance:
  1. the entire folder: /var/log/cpm/
  2. file: /var/log/cloud-init-output.log
  3. file: /var/log/syslog.log
  4. file: /var/log/cloud-init.log
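
As an alternative to copying the files one by one, the paths listed above can be packed into a single archive on the worker. This is only a sketch: `collect_worker_logs` is a hypothetical helper, it skips any listed path that does not exist, and reading some of the logs may require running it with elevated privileges.

```shell
# Hypothetical helper: pack the log paths from this article into one archive.
# Arguments: <root prefix> <output .tar.gz>
#   root prefix is "/" on the worker itself, or a mount point of its root volume.
collect_worker_logs() {
    root="${1%/}"          # strip a trailing slash; "/" becomes ""
    out="$2"
    set --                 # reuse the positional parameters as the list of found paths
    for p in var/log/cpm \
             var/log/cloud-init-output.log \
             var/log/syslog.log \
             var/log/cloud-init.log; do
        if [ -e "$root/$p" ]; then
            set -- "$@" "$p"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no log paths found under ${root:-/}" >&2
        return 1
    fi
    # Store paths relative to the root prefix so the archive layout is the same either way.
    tar -czf "$out" -C "${root:-/}" "$@"
}
```

On the worker this could be called as `collect_worker_logs / /tmp/worker-logs.tar.gz`, after which the archive can be copied out with `scp` or MobaXterm. The output path is just an example.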

Step 4: Cleanup

  1. Disable termination protection. The worker cannot be terminated while it is enabled.
  2. Terminate the worker
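
Both cleanup actions can also be done from the AWS CLI. A minimal sketch, assuming the AWS CLI is configured; `cleanup_worker` is a hypothetical helper name, and the instance is terminated only if removing the protection succeeded.

```shell
# Hypothetical helper: remove termination protection, then terminate the worker.
# Arguments: <instance-id> <region>
cleanup_worker() {
    aws ec2 modify-instance-attribute \
        --instance-id "$1" \
        --region "$2" \
        --no-disable-api-termination &&
    aws ec2 terminate-instances \
        --instance-ids "$1" \
        --region "$2"
}
```

Double-check the instance ID before running it, since termination cannot be undone.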


Option 2: Keep the root volume or the worker

If the issue happens intermittently, or you have many instances and are not sure which worker is the relevant one, you can configure CPM to keep either the root volume of the worker or the entire worker.

Step 1: Verify that a key pair is configured

Go to N2WS UI -> Worker Configuration and make sure that the configuration for the relevant worker has an AWS SSH key pair configured.

If no key pair is configured, select the relevant configuration, click "Edit", and select a key pair.

Step 2: Configure keeping the temporary resources

Connect to the N2WS server via SSH (the user is cpmuser) and do the following:

1. Edit or create this file: /cpmdata/conf/cpmserver.cfg
2. Add one of the following options to the [backup_copy] section:
   keep_failed_proxy - tells the server to keep the entire worker instance
   keep_root_volume_of_failed_proxy - tells the server to detach and keep only the root volume of the worker instance
   For example:
   [backup_copy]
   keep_failed_proxy=True

3. When no backup is running, restart the Apache server:
sudo service apache2 restart
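
The edit in items 1 and 2 can be scripted for the simple case where the file has no [backup_copy] section yet. This is a sketch under that assumption: `add_keep_option` is a hypothetical helper, it refuses to touch a file that already contains the section (edit that by hand instead), and it may need to be run with enough privileges to write to the file.

```shell
# Hypothetical helper: append a [backup_copy] section with one keep option.
# Arguments: <config file> <keep_failed_proxy | keep_root_volume_of_failed_proxy>
add_keep_option() {
    cfg="$1"
    opt="$2"
    case "$opt" in
        keep_failed_proxy|keep_root_volume_of_failed_proxy) ;;
        *) echo "unknown option: $opt" >&2; return 2 ;;
    esac
    # Do not create a second [backup_copy] section in an existing file.
    if [ -f "$cfg" ] && grep -q '^\[backup_copy\]' "$cfg"; then
        echo "$cfg already has a [backup_copy] section; edit it manually" >&2
        return 1
    fi
    # The leading blank line keeps the section separate from any existing content.
    printf '\n[backup_copy]\n%s=True\n' "$opt" >> "$cfg"
}
```

For example, `add_keep_option /cpmdata/conf/cpmserver.cfg keep_root_volume_of_failed_proxy`, followed by the Apache restart from item 3.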

Step 3: Re-run the scenario or wait for the issue to happen again

Re-run the scenario or wait for the issue to happen again. Then check the relevant region for the leftover worker instance or root volume, or check the logs for an info message with the details, for example:
Info - Root volume of failed worker i-1234567890abcdef (vol-1234567890abcdef) has been retained
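
If the logs are long, the instance and volume IDs can be pulled out of messages of this form with `sed`. A small sketch, assuming the message wording matches the example above exactly; `retained_volumes` is a hypothetical helper that reads log text on standard input, and the location of the log file is not specified here.

```shell
# Hypothetical helper: read log text on stdin and print
# "<instance-id> <volume-id>" for each "has been retained" message.
retained_volumes() {
    sed -n 's/.*Root volume of failed worker \(i-[0-9a-f]*\) (\(vol-[0-9a-f]*\)) has been retained.*/\1 \2/p'
}
```

It can be fed an exported log, for example `retained_volumes < exported-server.log` (the file name is a placeholder). The printed volume ID can then be looked up in the relevant region with `aws ec2 describe-volumes --volume-ids <volume-id>`.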

Step 4: Collect logs.

If you configured the server to keep the entire worker instance, start the instance and then connect to it with the username ubuntu and the key pair you configured.
You can use software such as MobaXterm, which also makes it easy to copy files out of the instance.

Then collect the following from the instance:
  1. the entire folder: /var/log/cpm/
  2. file: /var/log/cloud-init-output.log
  3. file: /var/log/syslog.log
  4. file: /var/log/cloud-init.log

If you configured the server to keep only the root volume, create a Linux instance, attach and mount the volume on that instance, and collect the logs listed above.
This AWS article explains how to mount a volume on an instance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html
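
The attach and mount steps might look like the sketch below. Several things here are assumptions: the helper names are hypothetical, the device name /dev/sdf is only an example, and the name the volume gets inside the instance (for example /dev/xvdf1 or an NVMe name) depends on the instance type, so check it with `lsblk` first. The Linux instance must be in the same Availability Zone as the retained volume.

```shell
# Hypothetical helper, run where the AWS CLI is configured:
# attach the retained root volume to the Linux instance.
# Arguments: <volume-id> <instance-id> <region>
attach_retained_volume() {
    aws ec2 attach-volume \
        --volume-id "$1" \
        --instance-id "$2" \
        --device /dev/sdf \
        --region "$3"
}

# Hypothetical helper, run on the Linux instance:
# mount the volume's partition read-only so the logs are not modified.
# Arguments: <partition device as shown by lsblk> <mount point>
mount_retained_volume() {
    sudo mkdir -p "$2" &&
    sudo mount -o ro "$1" "$2"
}
```

Once mounted, the logs are under the mount point, for example /mnt/worker/var/log/cpm/ if /mnt/worker was used as the mount point.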

Step 5: Cleanup

  1. After you have finished collecting logs, don't forget to delete the leftover worker instance or root volume.
  2. Disable the configuration added in Step 2: connect to the N2WS server via SSH, edit cpmserver.cfg, and set the values to False.
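
Setting the values back to False can be done with a single `sed` call instead of a manual edit. A minimal sketch, assuming GNU sed (for `-i`) and that each option sits at the start of its own line; `disable_keep_options` is a hypothetical helper and leaves every other setting in the file untouched.

```shell
# Hypothetical helper: set both keep options to False wherever they appear.
# Arguments: <config file>
disable_keep_options() {
    sed -i \
        -e 's/^\(keep_failed_proxy\)[[:space:]]*=.*/\1=False/' \
        -e 's/^\(keep_root_volume_of_failed_proxy\)[[:space:]]*=.*/\1=False/' \
        "$1"
}
```

For example, `disable_keep_options /cpmdata/conf/cpmserver.cfg`. It is an assumption that the server needs the same Apache restart as in Step 2 before the change takes effect.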