CPM in Recovery Mode error - S3 worker fails with worker i-12345678912345678 stopped communicating - terminating operation

Issue: When CPM is running in recovery mode for a DR drill, the original CPM instance's process for detecting “zombie” workers can be triggered, and it then terminates the workers launched by the CPM DR instance. This only occurs in DR drills where the original CPM instance is still up and running in its original location.

There are two solutions:

1. Enable termination protection on the workers that the CPM DR instance launches to perform the recovery. You will need to start the recovery, then go into AWS and enable termination protection on each worker as it launches.

2. Change the UUID of the CPM DR server (not the original server) by selecting the “Generate New UUID” option.

 

If you enable termination protection, the original CPM instance can no longer terminate the DR CPM instance's workers.

If you change the UUID of the DR CPM instance, the original CPM instance will also stop terminating those workers. Either option corrects the issue.
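
For option 1, termination protection can also be enabled through the AWS API instead of the console. The following is a minimal sketch, not part of CPM: the helper name `enable_termination_protection` is hypothetical, and it assumes you already know the instance IDs of the workers (for example, the ID reported in the error message) and have a boto3 EC2 client with permission to modify instance attributes.

```python
def enable_termination_protection(ec2_client, instance_ids):
    """Enable termination protection on each given EC2 instance.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the list of instance IDs that were processed.
    """
    protected = []
    for instance_id in instance_ids:
        # In the EC2 API, termination protection is the
        # DisableApiTermination instance attribute.
        ec2_client.modify_instance_attribute(
            InstanceId=instance_id,
            DisableApiTermination={"Value": True},
        )
        protected.append(instance_id)
    return protected


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2")
#   enable_termination_protection(ec2, ["i-12345678912345678"])
```

Because the workers only exist once the recovery has started, a script like this would have to be run after the recovery is launched, as with the manual console steps.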