Data Recovery
All data ingested into and/or generated inside Polly is stored in AWS. This includes input/output files, logs, DB transactional data and static data. Currently, no data is stored outside the AWS account. Although AWS provides very high durability and reliability for all of its data stores, some risk remains from events such as accidental deletion, natural disasters and unauthorized access to the AWS account.
Datastores
Data is currently stored in five main data stores inside AWS:
Datastore | Purpose | Backup | Frequency |
---|---|---|---|
RDS (Postgres) | Tabular data: relational transactional data, including but not limited to user and organization information and workspace data/metadata | AWS Backup (managed service for automated backups) | Daily |
DynamoDB | JSON-based data: works as an extension to Postgres for the core functioning of the platform | AWS Backup (managed service for automated backups) | Daily |
Elasticsearch | Analysis-ready data | Data is loaded into Amazon S3 bucket(s) before it is ingested into Elasticsearch; if Elasticsearch data is lost, it can be re-indexed from those buckets | Daily |
EFS | Application state: applications built on the Shiny architecture use EFS to store application state at a given point in time | AWS Backup (managed service for automated backups) | Daily |
S3 buckets | File storage: static content (media, configurations, etc.), input/output files (files ingested into the platform and/or generated output files) and application logs | Custom cron job that copies the files every day into a separate AWS account | Daily |
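The S3 buckets are the only data store whose backup relies on custom tooling rather than a managed service. The sketch below shows one way such a daily job could decide what to copy; it is not Polly's actual job. The use of boto3, the placeholder bucket names and the rule "copy an object when it is missing from the destination or newer than its copy" are all assumptions, and the destination bucket in the separate account would need a policy granting the job's credentials write access.

```python
"""Hypothetical sketch of a daily S3 cross-account copy job.

Assumptions (not from the Polly docs): bucket names are placeholders, the job's
credentials can read the source bucket and write the destination bucket, and an
object needs copying when it is missing from, or newer than, its copy.
"""
from datetime import datetime
from typing import Dict, List


def objects_to_copy(src: Dict[str, datetime], dst: Dict[str, datetime]) -> List[str]:
    """Return source keys that are missing from, or newer than, the destination."""
    return sorted(
        key for key, modified in src.items()
        if key not in dst or modified > dst[key]
    )


def list_last_modified(s3, bucket: str) -> Dict[str, datetime]:
    """Map every key in `bucket` to its LastModified timestamp (paginated listing)."""
    result: Dict[str, datetime] = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            result[obj["Key"]] = obj["LastModified"]
    return result


def run_daily_copy(src_bucket: str, dst_bucket: str) -> int:
    """Copy new or updated objects and return how many were copied."""
    import boto3  # imported lazily so the pure logic above works without boto3

    s3 = boto3.client("s3")
    keys = objects_to_copy(list_last_modified(s3, src_bucket),
                           list_last_modified(s3, dst_bucket))
    for key in keys:
        # Managed transfer copy: also handles objects above the 5 GB copy_object limit.
        s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)
    return len(keys)
```

Scheduled from cron (for example a crontab entry starting with `0 2 * * *`, which runs daily at 02:00), a call such as `run_daily_copy("source-bucket", "backup-bucket")` would copy only new or updated objects. Note that this rule never propagates deletions, so an accidental deletion in the source bucket leaves the copy in the other account intact.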
Key takeaways
- Backups are taken at least once a day.
- The following datastores are backed up: RDS, DynamoDB, Elasticsearch (recoverable by re-indexing from S3), EFS and S3 buckets.
- In case of a disaster, and assuming each daily backup completes successfully, at most one day of data would be lost.
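The one-day bound holds only while every daily backup actually succeeds. As a hedged illustration, the sketch below computes the worst-case data-loss window from a list of backup completion times and checks it against a one-day target. Where those timestamps come from (for example AWS Backup job history) and the function names are assumptions for illustration only.

```python
from datetime import datetime, timedelta
from typing import List

# Target taken from the takeaways: at most one day of data may be lost.
RPO_TARGET = timedelta(days=1)


def worst_case_data_loss(backups: List[datetime], now: datetime) -> timedelta:
    """Longest window of data a disaster could have wiped out.

    Data written after a backup is protected only once the next backup runs,
    so the exposure is the largest gap between consecutive backups, including
    the still-open gap from the most recent backup until `now`.
    """
    times = sorted(backups)
    if not times:
        raise ValueError("no backups recorded")
    points = times + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backups: List[datetime], now: datetime,
              target: timedelta = RPO_TARGET) -> bool:
    """True if no gap in the backup history exceeds the target."""
    return worst_case_data_loss(backups, now) <= target
```

A single missed backup doubles the exposure to two days, which is why monitoring backup job completion matters as much as the schedule itself.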