Disaster Recovery for SonarC

SonarC High Availability Disaster Recovery (HADR) consists of two SonarC nodes that act as a single system, providing continuity in the face of hardware or software failures. The system is based on an active/passive structure: there is a single primary and one secondary to which data is replicated. Additionally, the secondary Disaster Recovery (DR) machine can take over as primary if communication with the Main machine fails.

Terminology

  • Main Machine: The default server running the primary SonarC application. This is a static definition.
  • DR Machine: The second server running the SonarC application. This node is used primarily as a backup and will generally be the weaker machine in the disaster recovery set. This is also a static definition.
  • Primary Sonar: One of the dynamic states sonar can be in. In this state sonar interacts with all other applications and is used for all the ordinary uses of sonar (write, read). This is the default state for the main machine. The term ‘primary’ is interchangeable with the term ‘master’.
  • Secondary Sonar: The other dynamic state sonar can be in. In this state sonar does not interact with any other applications, and solely connects to the Primary Sonar. This sonar acts as a replica for the Primary Sonar and is the default state for the DR machine. The term ‘secondary’ is interchangeable with the term ‘slave’.

SonarC’s disaster recovery system is made of three parts:

  1. The Main machine, which has a primary sonar running on it. This is the machine you interact with continuously unless there is a failure.
  2. The DR machine, which has a secondary sonar. This node syncs data from the Main machine and takes over operations if the primary fails.
  3. A dynamic DNS definition that maps a virtual-IP/DNS-Name, used by the Guardium appliances and any external applications or users, to either the primary or the secondary based on availability.

In the most common setup two nodes will be installed, with similar hardware, with the two nodes in separate data centers or at least separate areas. Additionally, some dynamic DNS system will be configured such that there is a virtual hostname that will be mapped either to the primary’s IP address or to the secondary’s IP address (at different times). This virtual hostname is the one used when configuring all of the Guardium appliances with the data marts - i.e. the appliances push the data to the virtual IP address which is mapped to either the primary or the secondary.
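
As an illustration only, with a BIND-style dynamic DNS server the virtual hostname could be repointed during a failover using nsupdate. Everything below (server, zone, key file, hostname, and addresses) is a placeholder; your environment may use a different dynamic DNS mechanism entirely:

    # Repoint the virtual hostname from the Main's address to the DR's address
    # (hypothetical names; addresses are documentation placeholders):
    nsupdate -k /etc/named/ddns.key <<'EOF'
    server ns1.example.com
    zone example.com
    update delete sonarc.example.com A
    update add sonarc.example.com 60 A 192.0.2.20
    send
    EOF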

As long as the primary is operational, the secondary node on the DR machine continuously replicates data from it. During this time the DR system is in passive mode: apart from replicating data it can be used for querying the data, but not for making any changes. The DR node also monitors the Main node continuously. Values related to monitoring and replication can be adjusted in /etc/sonar/sonard.conf.

If the DR sees that the Main is down for some length of time (3 hours by default, configurable), it initiates a takeover process. The DR node's behavior is as follows (a configuration sketch appears after the list):

  • The DR sonar will attempt to connect to the Main machine every minute. This heartbeat can be adjusted by editing dr_heartbeat_interval = 60.
  • If the DR cannot reach the Main it will send an email indicating that it cannot connect, by default once every half hour (rs_wait_time = 30). This email provides the opportunity to rectify the issue on the Main before there is a failover.
  • After 6 failed connection cycles (rs_connection_attempts = 6; with the default half-hour wait this is 3 hours) the DR will promote itself from secondary to primary and will send an email confirming that its own status has changed.
  • If turned back on after a failover, the Main will demote itself to secondary and start syncing data from the DR (the new primary).
  • It is recommended that once the issue has been fixed you reset the Main back to primary and the DR to secondary. You do this by running a script to convert the sonar on both machines back to the default/initial states. See Operational Scenario 3 below for instructions.
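
A minimal sketch of the corresponding settings in /etc/sonar/sonard.conf, using the defaults described above (the parameter names are taken from this section and may appear commented out in your file; confirm them against your installation):

    # Seconds between the DR's heartbeat connection attempts to the Main:
    dr_heartbeat_interval = 60
    # Minutes between "cannot connect" warning emails:
    rs_wait_time = 30
    # Failed cycles before the DR promotes itself (6 x 30 minutes = 3 hours):
    rs_connection_attempts = 6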

Setting up SonarC DR

To install a SonarC HADR system follow these instructions:

  • On Both machines (Main & DR):

    1. Install SonarC.
    2. Install rsync:
    • sudo yum install rsync
    3. Run the sonarg-setup script on both machines:
    • sudo sonarg-setup
    • Note: during the setup you need to enter the “Public IP address/virtual-IP/DNS-Name” address of the system.
    • Note: the Main and DR machines should be set up identically, i.e. use the same passwords, SONAR_HOME, etc.
    4. On both machines edit the HADR setup config file. Values in “<>” below must be updated according to the information provided in sonarg-setup (the previous step). All other values are set to defaults and there is normally no need to change them:

      vi /usr/lib/sonarw/hadr_install_defaults.sh
        REPLSET_NAME="rs0"
        SONAR_ADMIN_USER="admin"
        SONAR_ADMIN_PASSWORD=<”admin” password specified in the sonarg-setup>
        REPLICATION_SYNC_INTERVAL=30
        MAIN_MEMBER_ADDRESS=<IP-address/DNS-name of the “Main” sonar Server>
        MAIN_MEMBER_PORT=27117
        DR_MEMBER_ADDRESS=<IP-address/DNS-name of the “DR” sonar Server>
        DR_MEMBER_PORT=27117
        SONAR_HOME=<Location of “SonarW” specified in the sonarg-setup>
        SONARGD_HOME="/var/lib/sonargd"
        SONARGD_BASEDIR=<Location of “sonargd directory” specified in the sonarg-setup>
        CLOUD_BUCKET_NAME=<name of cloud storage bucket>
      Save and exit
      
    5. On “Main” run the script:

    • sudo /usr/lib/sonarw/hadr_install.sh main
    • Follow the instructions at the end of the script: on the DR, create a .ssh directory under the SONAR_HOME; copy the required files from /tmp/sshkeys to the newly created .ssh folder on the DR. Additionally, copy the contents of /etc/sonar/ssl on the Main to /etc/sonar/ssl on the DR (see the sketch after this list).
    • Run the script on the DR: sudo /usr/lib/sonarw/hadr_install.sh dr
    6. Verify on the DR the results of the hadr script:
    • The last command in the previous step, “sudo /usr/lib/sonarw/hadr_install.sh dr”, must complete successfully.
    • There will be a new log (on the secondary) detailing the rsyncing of data between Main and DR: /var/log/sonargd/replication.log
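
A minimal sketch of the copy steps referenced in step 5, assuming SONAR_HOME is /var/lib/sonarw and that the Main is reachable over ssh as “main” (both are assumptions; substitute your own values and run with appropriate privileges):

    # On the DR: create the .ssh directory under SONAR_HOME (assumed /var/lib/sonarw)
    sudo mkdir -p /var/lib/sonarw/.ssh
    # Copy the key files that the Main's install script placed in /tmp/sshkeys:
    scp main:/tmp/sshkeys/* /var/lib/sonarw/.ssh/
    # Copy the SSL material from the Main:
    scp -rp main:/etc/sonar/ssl/* /etc/sonar/ssl/
    # Ownership should match the sonarw service user (as used elsewhere in this guide):
    sudo chown -R sonarw:sonar /var/lib/sonarw/.ssh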

Synchronizing Configuration Files with config_replication.py

config_replication.py is a script that allows you to automatically synchronize configuration files between the Main and DR machines. The default location for the script is /usr/lib/sonarw.

You can use config_replication.py to copy configuration files from the Main to the DR machine, or vice versa.

Copying all configuration files from the MAIN to the DR machine

  1. On the DR machine, change the directory to /usr/lib/sonarw:

    cd /usr/lib/sonarw
    
  2. From the /usr/lib/sonarw folder on the DR machine, run config_replication.py with sudo access:

    sudo python3 config_replication.py
    
  3. You will be prompted to select the machine to synchronize from; select the MAIN machine.

Notes:
  • During the upgrade process from SonarC 3.2.2 to 4.0, you must run the config_replication.py script to synchronize the configuration files exactly as shown above. This procedure automates the process of copying and moving the SSL folder from the Main to the DR machine, rather than requiring you to do it manually.
  • The config_replication.py script will also correct the permissions for the SSL folder, something that would normally be done by the HADR setup script (which will not be run during the upgrade process).

Operational Scenario 1: DR system cannot connect to primary and sends a warning email (SonarW replication connectivity issue)

This means that the Secondary Sonar (probably on the DR machine) cannot connect with the Primary Sonar (probably on the Main machine).

Check the Main machine

  • Check that the main machine is up. If it isn’t, start it.
  • Check that sonar is up: systemctl status sonard
  • If sonar isn’t running, start it: systemctl start sonard
  • Check that there is free disk space on the volume holding the SONAR_HOME directory. At least 20% of the disk should be free.
  • Connect to sonar via a mongo shell and check that it is in a Primary Sonar state. The result of the following command should be ‘true’ (see the example below): rs.isMaster()["ismaster"]
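
For example, a one-line check (a sketch, assuming the default port 27117 and the admin user from setup; add any TLS or authentication flags your installation requires):

    mongo --port 27117 -u admin -p <password> --eval 'rs.isMaster()["ismaster"]'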

Check Connectivity between Main machine and DR Machine

  • Open a shell on the DR machine
  • Run a mongo shell connecting to the sonar on the Main machine. Open a mongo shell the way you normally would, but set the --host option to the IP/hostname of the Main machine (see the example below).
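
For example (a sketch, assuming port 27117 and the admin user; adjust flags to match your installation):

    mongo --host <main-ip-or-hostname> --port 27117 -u admin -p <password>

If this connects, run rs.isMaster()["ismaster"] in the shell to confirm the Main's state.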

Wait until the next email

If you tried the previous steps and everything seems OK, there may have been a hiccup in the connection and everything is fine now.

You will get another email within 30 (configurable) minutes if it is still not connecting.

Please contact SonarC support if you get a second email and you cannot find a problem with the main machine or the connectivity between the main machine and the DR machine.

Operational Scenario 2: Secondary takeover (getting a SonarW replication state change email)

This means that the Secondary Sonar (on the DR machine) has not been able to connect to the Primary Sonar (on the Main machine) and has decided to promote itself to be the primary sonar.

Check the Main machine

  • Check that the main machine is up. If it isn’t, start it.
  • Check that sonar is up: systemctl status sonard
  • If sonar isn’t running, start it: systemctl start sonard
  • Check that there is free disk space on the volume holding the SONAR_HOME directory. At least 20% of the disk should be free.
  • Connect to sonar via a mongo shell and check that it is in a Secondary Sonar state. The result of the following command should be ‘false’: rs.isMaster()["ismaster"]

Check Connectivity between Main machine and DR Machine

  • Open a shell on the DR machine
  • Connect to sonar via a mongo shell and check that it is in a Primary Sonar state. The result of the following command should be ‘true’: rs.isMaster()["ismaster"]
  • Run a mongo shell connecting to the sonar on the Main machine, setting the --host option to the IP/hostname of the Main machine, as in Scenario 1.

If everything is running as desired, change the states back to the default settings (Primary on the Main and Secondary on the DR). This can be accomplished by following the instructions in Scenario 3.

Operational Scenario 3: Switch back the states of your machines

This means that we are in the following states:

  • Secondary Sonar (probably on the DR machine) has not been able to connect to the Primary Sonar (probably on the Main machine) for a while and has decided to promote itself to be the primary sonar.
  • The previously Primary sonar (on the Main machine) has reconnected and is now functioning as a Secondary sonar.
  • You have run all the steps in Scenario 2.

Switch the states of sonar on your machines back to the defaults

  • On the DR machine, run the DR_secondary_to_primary script with sudo, passing the following parameters:

    • user - The admin user
    • password - The password of the admin user.
    • port - the sonar port on the DR machine; if omitted, defaults to 27117.
  • Run command:

    sudo DR_secondary_to_primary --user=admin --password=<sonar password> [--port=27117]
    
  • Watch the output of the script on the DR machine. Wait for the following line:

    START THIS SCRIPT IN MAIN MACHINE
    
  • On the Main machine, run the same script with sudo, exactly as previously run on the DR machine.

  • Wait for script to finish.

  • If the script succeeds:

    • Check that the states have really changed by connecting to sonar on both machines and running: rs.isMaster()["ismaster"]
  • If the script fails:

    • Follow the instructions printed by the script and try again.
    • Copy the script printout into an email to support@jsonar.com to open a ticket with SonarC support.

Example of script output:

On DR:

2016-08-04 16:18:22 || INFO  || Start primary transfer script!
2016-08-04 16:18:23 || INFO  || Connections to sonarw established
2016-08-04 16:18:23 || INFO  || Stopping sonargd...
2016-08-04 16:18:24 || INFO  || Stopping sonardispatcher...
2016-08-04 16:18:24 || INFO  || Stopping sonarfinder...
2016-08-04 16:18:25 || INFO  || Stopping sonarsql...
2016-08-04 16:18:25 || INFO  || Changing permissions of /var/lib/sonargd/incoming
2016-08-04 16:18:25 || INFO  || START THIS SCRIPT IN MAIN MACHINE
This script will continue automatically when the other machine finishes the sync process.
If other script fails, enter the prompt it displays here:
2016-08-04 16:27:09 || INFO  || Starting sonard...
2016-08-04 16:27:09 || INFO  || Starting sonargd...
2016-08-04 16:27:09 || INFO  || Starting sonardispatcher...
2016-08-04 16:27:09 || INFO  || Starting sonarfinder...
2016-08-04 16:27:09 || INFO  || Starting sonarsql...
2016-08-04 16:27:09 || INFO  || Changing permissions of /var/lib/sonargd/incoming back
2016-08-04 16:27:09 || INFO  || Finished primary transfer script

On Main:

[date] || INFO  || Start primary transfer script!
[date] || INFO  || Connections to sonarw established
[date] || INFO  || Connection between machines via ssh established
[date] || INFO  || Stopping sonargd...
[date] || INFO  || Copying files over from DR to main machine
[date] || INFO  || Getting the list of files to copy from current primary to new primary
[date] || INFO  || Copying directory: 'sonarw@dr:/var/lib/sonargd/audit/' to '/var/lib/sonargd/audit/'
[date] || INFO  || Copying directory: 'sonarw@dr:/opt/sonarfinder/sonarFinder/reports/' to '/opt/sonarfinder/sonarFinder/reports/'
[date] || INFO  || Copying directory: 'sonarw@dr:/var/lib/sonargd/inprogress/' to '/var/lib/sonargd/inprogress/'
[date] || INFO  || Copying directory: 'sonarw@dr:/var/lib/sonargd/incoming/' to '/var/lib/sonargd/incoming/'
[date] || INFO  || Starting data sync with DR machine
[date] || INFO  || Running SonarW data sync. Log is at: /var/log/sonargd/replication.log
[date] || INFO  || Finished SonarW data sync
[date] || INFO  || Stopping sonard...
[date] || INFO  || Starting sonard...
[date] || INFO  || Starting sonargd...

Operational Scenario 4: Defining a new Main in an HADR (Total loss of Main machine)

Install a Brand New SonarC

Install a new SonarC on the system that will become your new Main. During setup, ensure that the passwords, reports folder location, sonargd home, and sonarw home are all identical to those on the original machines.

scp the entire .ssh directory from the DR to the new Main, and ensure the correct ownership of all files.

  • Send the entire .ssh directory on the DR over to the new Main:

    1. On the DR: scp -rp /var/lib/sonarw/.ssh/ [username]@[ip.address]:
    2. On the new Main: cp -rp .ssh/ /var/lib/sonarw/
    3. On the new Main: sudo chown -R sonarw:sonar /var/lib/sonarw/.ssh/

Correct the configuration files on both machines

On Main:

  1. sudo vi /etc/sonar/sonard.conf:

    rs_id = 0
    drset_main_address = (new Main IP):27117
    drset_dr_address = (existing DR IP):27117
    bind_ip = 0.0.0.0
    replSet = (Same as existing DR)
    cloud_bucket_name = (Same as existing DR)
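
For instance, with placeholder values (192.0.2.10 for the new Main and 192.0.2.20 for the existing DR; the replica set name rs0 matches the install default shown earlier, and the bucket name is hypothetical):

    rs_id = 0
    drset_main_address = 192.0.2.10:27117
    drset_dr_address = 192.0.2.20:27117
    bind_ip = 0.0.0.0
    replSet = rs0
    cloud_bucket_name = my-sonarc-bucket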
    

On DR:

  1. sudo vi /etc/sonar/sonard.conf:

    drset_main_address = (new Main IP):27117
    

Still on the DR, edit /etc/hosts, replacing the outdated IP address so that the entry for the Main points to the new Main:

  1. sudo vi /etc/hosts

    xxx.xx.x.xxx main

  2. Restart sonard service

    sudo systemctl restart sonard

Note: If on the same network, it’s possible to use the ‘internal’ IP addresses.

If these steps were successful, the new Main will be active but running as a Secondary.

Follow the steps listed in Operational Scenario 3 to complete a switchback and establish the new Main as Master.

Upgrading HADR Set from SonarC Version 3.2.2 to 4.0

To upgrade the HADR set from SonarC 3.2.2 to 4.0:

  1. Stop sonard service on the DR machine to prevent replication during the upgrade process.

  2. Upgrade Main machine to SonarC 4.0.
    • Run sonarg-setup.
  3. Upgrade DR machine to SonarC 4.0.
    • Ensure sonard service remains stopped and do not run setup.
  4. Run the config_replication.py script (from /usr/lib/sonarw on the DR machine) to ensure all configuration files and permissions are synchronized from the Main machine. (When prompted, ensure the MAIN machine is selected as the synchronization source.)

    Note: The synchronization procedure must be completed exactly as shown in Copying all configuration files from the MAIN to the DR machine.

  5. Start sonard service on the DR machine and ensure that a complete replication occurs.

  6. Run sonarg-setup on the DR machine.

  7. Since sonard is restarted during sonarg-setup, the DR will replicate again after setup is complete. Ensure setup and replication are both completed without errors before proceeding further.
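
A simple way to confirm that each replication pass completes without errors is to follow the replication log mentioned earlier in this document:

    tail -f /var/log/sonargd/replication.log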