Strategy for long-term preservation of your digital memories
Almost everyone these days has a growing collection of digital memories: most (or all) photos are never printed, just stored digitally; the same goes for documents; home videos are created and stay in digital form.
How do you preserve digital memories perfectly for, say, 30 years - for your kids to enjoy when they become adults? The answer is actually rather complicated.
The simple (and very incomplete) answer is "back up your data". But a good backup strategy can make all the difference between a perfectly preserved home digital archive and a total loss of your digital memories in just a few years.
In my personal experience, optical media (CD, DVD) archives don't last. I had at least a couple of old CDs and DVDs that became unreadable after 4-5 years. Hard drives are no better in this regard - you can't expect your data to be passively preserved on a hard drive for much longer than 3 years. You should also take into account the possibility of a virus wiping out your data, a meltdown of the hard drive due to a failure of your PC (happened to me) etc. But what is actually more likely is gradual corruption (decay) of your data, regardless of where it is stored. See, for example, this article on a massive hard drive data decay study carried out at CERN:
http://storagemojo.com/2007/09/19/cerns ... -research/
I observed at least one clear case of data corruption, where an old photo on my PC's hard drive all of a sudden became corrupt.
There is no medium on which your digital memories could be passively preserved for 30 years. But that doesn't mean a good long-term strategy using such media cannot be developed. I've spent a few years perfecting my archiving setup, and want to share the details here. Obviously it is not the only way to archive your data, but I think the following points have to be covered by any good strategy:
These points are a must:
1) Keep multiple copies of your whole archive, on different hard drives. I think 3 copies is a good and reasonable number. (That's what I do.) Optical media are extremely inconvenient (it'd take forever to do a bit-by-bit health check of your whole DVD collection, so chances are you'll end up not doing it regularly) and don't really provide a longer life for your data, so they should be avoided.
2) Regularly (say, once a month) verify the health of all of your copies by comparing hashes of all the files between the copies. (A hash is a short digital fingerprint computed from a file's contents. There are many programs which can compute and compare hashes. This is faster and more convenient than a bit-by-bit comparison of two files, especially if the two copies can only communicate over the Internet, but is almost as reliable: with a good hash function, even a single-bit change in a file changes its hash in all but a vanishingly small fraction of cases.) It is convenient to do this while backing up your data.
3) Only ever do a "full backup" of your data, not "mirroring", "synchronizing" etc. Basically, no files should ever be erased from your archive copies.
4) The primary backup step (from your PC, tablet etc.) is the most critical, and requires your full attention: every case where an original file has changed has to be analyzed, because the change could be due to corruption of the original data (or of the archived data). Fortunately, not many digital memory files change naturally - these are mostly ongoing documents, notes etc. Things like photos and home videos are created once and should never change.
5) Don't wait until one of your hard drives fails! Instead, you should regularly (say, every 3-4 years) move the archive copy to a new hard drive. Do the health check (hash comparison between the old and new hard drive copies) right after copying the data, to catch any corruption introduced during copying.
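To make the hash comparison from points 2 and 5 concrete, here is a minimal Python sketch. It is only an illustration, not the tool used in the setup described further down: the function names (hash_file, hash_tree, compare_copies) are made up for this example, and the choice of SHA-256 is an assumption (any good hash function would do). It hashes every file in two copies of an archive and lists the files that differ or exist in only one copy.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root, with '/' separators) to its hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_copies(copy_a, copy_b):
    """Report files that differ between the copies or exist in only one of them."""
    a, b = hash_tree(copy_a), hash_tree(copy_b)
    return {
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
    }
```

Calling compare_copies with the root folders of two archive copies returns three lists of relative paths. Any entry under "changed" is a file to inspect by hand: the hashes alone cannot tell you which of the two copies is the corrupted one.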
These points are less critical, but still highly desirable:
6) Keep at least one archive copy in a physically remote location - on your workstation at work (my case), on a PC owned by a relative etc. In any case, you need to have full access to that computer. This is to prevent catastrophic data loss due to floods, fires, burglaries etc.
7) Ideally, keep at least one copy under a different operating system, to reduce the risk of a global data wipeout due to viruses, Trojan horses and hackers. For example, if your primary copy is under Windows, keep at least one copy under Linux (my case) or Mac OS.
And here are the specific details of my setup (no need to follow this literally; as long as the above points are covered, you can modify this to your liking).
- Most of my original files reside on my home desktop, which runs Windows 7. For a number of years I have been using SyncBack, a free and quite powerful Windows backup program, to do the primary backup - from my desktop to an external hard drive over USB. It allows you to set up a number of different backup profiles (for unrelated locations on your desktop - say, one profile for photos, another for videos etc.) if needed. Make sure you choose the "Backup" option (not "Synchronize") when creating a new profile, to avoid erasing files on your archive. Also, for each backup you should go to "Expert" settings, then to the "Compare Options" tab, and enable the option "Use slower but more reliable method of file change detection" at the top. (This option enables hash computation and comparison, so it will catch even a single-bit corruption.) For flexibility I also use the "Sub-dirs" option "Let me choose which subdirectories to include", which lets you easily check/uncheck any subdirectory to archive in the "Sub-directories" tab. You can combine multiple profiles into a single backup job - simply create a new profile of the type "Group: contains links to other profiles". Then all you need to do is run the group profile. (You will still get multiple confirmation windows listing any file changes, one per profile.)
- Right after carrying out the regular (monthly) primary backup, I do the secondary backup - to a hard drive inside my workstation at work, which runs Linux. I do it either over the Internet (if there is not too much data to copy), or by bringing my external hard drive to work and connecting it over USB. I wrote a Bash script (which I run on my workstation under Linux) that does everything automatically: it uses rsync internally to first copy all the data, and then do the health check (hash comparison between the two copies). It stores all the file changes in a log file. For this to work, I installed Cygwin (a Unix-like environment for Windows) on my home desktop. I tunnel all the rsync traffic over an encrypted SSH channel for security. As my workstation is behind a corporate firewall (it cannot be directly accessed from outside), the procedure is to initiate the SSH connection from the workstation to my home desktop (which is accessible from the Internet, via port forwarding for the SSH port in my router). I can run the backup script either directly from work, or from home, using an SSH connection to my workstation. The archival copy I create under Linux is all set to "read-only" file permissions, to prevent accidental data deletion. Again, no data is ever erased on the third copy; it is a full backup.
Here is my backup Bash script (UPDATED 2022); you can copy it and adapt it to your particular circumstances:
https://drive.google.com/file/d/0B_p2gA ... bCyHyc5vNg
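For readers who just want to see the rules in code form, here is a minimal local Python sketch of the "full backup" idea. To be clear, this is not the Bash script linked above (which works with rsync over SSH); the function names (file_hash, additive_backup) and the use of SHA-256 are assumptions made for this illustration. It copies files that are new, never deletes or overwrites anything already in the archive, marks archived files read-only, and reports every file whose source and archive contents differ so that a human can review it.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def additive_backup(source, archive):
    """Copy new files into archive; never delete or overwrite anything there.

    Returns (copied, changed): relative paths that were newly copied, and
    relative paths whose source and archive contents differ (needs review).
    """
    source, archive = Path(source), Path(archive)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    copied, changed = [], []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = archive / rel
        if dst.exists():
            # Already archived: only compare, and leave the archive untouched.
            if file_hash(src) != file_hash(dst):
                changed.append(rel.as_posix())
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if file_hash(src) != file_hash(dst):
            # The fresh copy does not match its source: flag it for review.
            changed.append(rel.as_posix())
            continue
        # Drop write permission to guard against accidental modification.
        dst.chmod(dst.stat().st_mode & ~write_bits)
        copied.append(rel.as_posix())
    return copied, changed
```

Files deleted from the source are simply ignored, so they stay in the archive. A file listed in "changed" is left as it is in the archive; deciding whether the change was intentional (an edited document) or a sign of corruption is the manual step described in point 4.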
This is not a perfect system, but if it fails, it will most likely be due to human error: me accidentally changing some files (or perhaps a virus changing my files on purpose), and then ignoring the warnings about the file changes during the two backup steps - very unlikely, but not impossible. As long as I am attentive during the primary backup step, pay attention to the error log of the secondary backup step, do the backups regularly, and upgrade hard drives every 3-4 years, my digital memories (which are already ~300GB, mostly due to HD home videos) have a very good chance of being perfectly preserved for the next 30 years - or longer!
Last edited by pulsar123 on Jan 24th, 2022 9:34 am, edited 1 time in total.