Long-term Retention using Object Stores and Cloud Storage

SonarC is especially suited for building compliance data lakes/clouds. The underlying architecture makes use of low-cost object stores on-prem and in the public cloud for long-term retention. All data is always available for querying, reporting and analytics regardless of whether it resides in local storage, in the cloud, in cold storage or even offline. All of this is transparent to the user: the user simply runs a query and the system manages access to the correct data.

By default all data is “born” on local storage and has an infinite retention policy. You manage retention policies using the Cloud Storage Lifecycle Management option in Cloud Management. Policies can be defined for an entire database or on a collection-by-collection basis. If you do not use cloud storage you can set the purge policy for your collections/databases and manage all data locally. If you do want to use cloud storage or an on-prem S3-compatible object store, you select the storage provider and then select retention policies for the different storage tiers. Note that different cloud providers offer different storage tiers and the system knows which tiers are available to you. All policy decisions are based on the age of the data - i.e. you specify the age at which data should be moved to each storage tier and/or purged, as shown below.

[Image _images/cloud1.png: example retention policy in Cloud Storage Lifecycle Management]

In this example, data resides in local storage for 6 months. As the data is created in local storage it is also automatically copied to Amazon AWS S3, which serves as a hot backup. Data that is older than 1 year is moved off to S3-IA (cold storage). Data that is older than 2 years is moved off to Amazon Glacier (offline storage). Data that is older than 3 years is purged from the cloud.
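
All policy decisions like these reduce to simple age thresholds. The following Python sketch is purely illustrative (it is not SonarC code); it expresses the example policy above as a tier-selection function:

    # Illustrative only - mirrors the example policy above: local storage for
    # 6 months (mirrored to S3 as a hot backup from the start), S3-IA after
    # 1 year, Glacier after 2 years, purged after 3 years.
    def tier_for(age_in_days: int) -> str:
        """Return the storage tier the example policy keeps data of this age in."""
        if age_in_days >= 3 * 365:
            return "purged"
        if age_in_days >= 2 * 365:
            return "Glacier (offline)"
        if age_in_days >= 1 * 365:
            return "S3-IA (cold)"
        if age_in_days >= 182:        # roughly 6 months
            return "S3 (hot)"
        return "local storage (+ S3 hot backup)"

    assert tier_for(30) == "local storage (+ S3 hot backup)"
    assert tier_for(400) == "S3-IA (cold)"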

At all times the user can query and generate reports on this data; the only impact is on reporting times, since accessing S3 is typically slower than accessing EBS or local storage. All of this is available for Amazon AWS, Microsoft Azure, Google GCP and IBM Softlayer, as well as for in-house S3-compatible object stores.
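
For example, assuming the deployment is reached through a MongoDB-compatible client (an assumption made here for illustration, not a statement about SonarC's interface), a report query over three-year-old data is written exactly like one over yesterday's data; only the response time varies with the tier the data is on:

    # Hypothetical sketch: the address, port, database, collection and field
    # names are placeholders. The query itself does not change based on where
    # the matching documents physically reside.
    from datetime import datetime
    from pymongo import MongoClient

    client = MongoClient("mongodb://sonarc.example.com:27117/")
    db = client["sonargd"]

    recent = db["session"].count_documents({"ts": {"$gte": datetime(2018, 1, 1)}})
    archived = db["session"].count_documents({"ts": {"$lt": datetime(2015, 1, 1)}})
    print(recent, archived)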

When you install SonarC the installer asks you whether you have a storage provider and which one to use. When you use one of the public providers the system knows the storage locations, and you only need to add the following parameters when setting up the policies (see the example after this list):

  • AWS - Access key id and secret access key.
  • Azure - Storage account and key.
  • GCP - Project id, private key and service account email.
  • Softlayer - Access key id and secret access key.
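
Whichever provider you choose, it can save troubleshooting time to verify the credentials before entering them. The snippet below is independent of SonarC; it covers the AWS case with placeholder values, using boto3, a standard AWS SDK:

    # Sanity-check an AWS access key id / secret access key pair before using
    # it in the lifecycle policy setup. All values shown are placeholders.
    import boto3

    session = boto3.session.Session(
        aws_access_key_id="AKIA...",          # access key id
        aws_secret_access_key="wJalr...",     # secret access key
    )
    s3 = session.client("s3")
    s3.list_buckets()   # raises ClientError if the key pair is invalid or lacks S3 access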

When you use an on-prem S3-compatible object store, use the S3 Compatible & Softlayer option; you also need to configure the IP address of the storage device in your sonard.conf using the privates3_server_address parameter.
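
Before pointing sonard.conf at the device, it can be useful to confirm that the address answers S3 calls. Any S3 client will do; the boto3 sketch below uses a placeholder address and placeholder keys:

    # Illustrative connectivity check against an on-prem S3-compatible object
    # store - the same address you would place in privates3_server_address.
    # Address, keys and bucket listing are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://10.0.0.42:9000",   # on-prem object store address
        aws_access_key_id="ACCESSKEY",
        aws_secret_access_key="SECRETKEY",
    )
    print(s3.list_buckets()["Buckets"])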

When you install SonarC, no collections are configured to use cloud storage by default. Use the Information Lifecycle Management screen to decide which collections (for example session, instance, exception and full_sql) to upload to cloud storage as a hot backup.

Note that you should only use a single storage provider for a SonarC system, and you should select the provider hosting your SonarC instance, since cloud providers usually charge for outbound data transfer.