Strategy for long term preserving of your digital memories

Last Updated: May 19th, 2014 10:02 am
[OP]
Deal Addict
Aug 30, 2007
1933 posts
1262 upvotes

Almost everyone these days has a growing collection of digital memories: most (or all) photos are never printed, just stored digitally; the same goes for documents; home videos are created and stay in digital form.

How do you preserve digital memories perfectly for, say, 30 years, so your kids can enjoy them when they become adults? The answer is actually rather complicated.

The simple (and very incomplete) answer is "back up your data". But a good backup strategy can make all the difference between a perfectly preserved home digital archive and a total loss of your digital memories in just a few years.

In my personal experience, optical media (CDs, DVDs) archives don't last. I had at least a couple of old CDs and DVDs that became unreadable after 4-5 years. Hard drives are no better in this regard: you can't expect your data to be passively preserved on a hard drive for much longer than 3 years. You should also take into account the possibility of a virus wiping out your data, a meltdown of the hard drive due to a failure of your PC (happened to me), etc. But what is actually more likely to happen is gradual corruption (decay) of your data, regardless of where it is stored. See, for example, this article on a massive hard drive data decay study carried out at CERN:

http://storagemojo.com/2007/09/19/cerns ... -research/

I observed at least one clear case of data corruption, where an old photo on my PC's hard drive all of a sudden became corrupt.

There is no medium on which your digital memories can be passively preserved for 30 years. But that doesn't mean a good long-term strategy using such media cannot be developed. I've spent a few years perfecting my archiving setup, and want to share the details here. Obviously it is not the only way to archive your data, but I think the following points have to be covered by any good strategy:

These points are a must:

1) Keep multiple copies of your whole archive, on different hard drives. I think 3 copies is a good and reasonable number. (That's what I do.) Optical media is extremely inconvenient (it'd take forever to do a bit-by-bit health check of your whole DVD collection, so chances are you'll end up not doing it regularly) and doesn't really provide a longer life for your data, so it should be avoided.

2) Regularly (say, once a month) verify the health of all of your copies by comparing hashes of all the files between the copies. (A hash is a short digital fingerprint computed from a file's contents; for practical purposes it is unique to each file. There are many programs which can compute and compare hashes. This is faster and more convenient than a bit-by-bit comparison of two files, especially if the two copies can only communicate over the Internet, and is almost as reliable: with a decent hash function, even a single flipped bit will virtually always change the hash.) It is convenient to do this while backing up your data.

3) You should only be doing "full backup" of your data, not "mirroring", "synchronizing" etc. Basically, no files should ever be erased on your archive copies.

4) The primary backup step (from your PC, tablet etc.) is the most critical, and requires your full attention: every case where an original file has changed has to be analyzed, because the change could be due to corruption of the original data (but could also be due to corruption of the archived copy). Fortunately, not too many digital memory files change naturally - these are mostly ongoing documents, notes etc. Things like photos and home videos are created once and should never change.

5) Don't wait until one of your hard drives fails! Instead, you should regularly (say, every 3-4 years) move the archive copy to a new hard drive. You should do the health check (hash comparison between the old and new hard drive copies) right after copying the data, to eliminate the chance of the data getting corrupted during copying.
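
To make points 2 and 5 concrete, here is a minimal Python sketch of a hash-based health check between two archive copies. It is only an illustration under my own assumptions: the function names (file_hash, build_manifest, compare_copies) and the choice of SHA-256 are made up for this example, and any backup tool that compares checksums does the same job.

```python
import hashlib
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file path (relative to root) to the hash of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(root_a, root_b):
    """Return (only_in_a, only_in_b, differing) as sorted lists of relative paths."""
    a, b = build_manifest(root_a), build_manifest(root_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    # Same name in both copies but different content: possible bit rot.
    differing = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, differing
```

Any file that shows up in the "differing" list needs a human look: with three copies, the two that still agree with each other tell you which one went bad.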

These points are less critical, but still highly desirable:

6) Keep at least one archive copy in a physically remote location - on your workstation at work (my case), on a PC owned by a relative etc. In any case, you need to have full access to that computer. This is to prevent catastrophic data loss due to floods, fires, burglaries etc.

7) Ideally, keep at least one copy under a different operating system, to reduce the risk of a global data wipe-out due to viruses, Trojan horses and hackers. For example, if your primary copy is under Windows, keep at least one copy under Linux (my case) or Mac OS.


And here are the specific details of my setup (no need to follow this literally; as long as the above points are covered, you can change and modify this to your liking).

- Most of my original files reside on my home desktop, which runs Windows 7. For a number of years I have been using SyncBack, a free and quite powerful Windows backup program, to do the primary backup from my desktop to an external hard drive over USB. It allows you to set up a number of different backup profiles (for unrelated locations on your desktop, say one profile for photos, another for videos, etc.) if needed. Make sure you choose the "Backup" option (not "Synchronize") when creating a new profile, to avoid erasing files in your archive. Also, for each backup you should go to the "Expert" settings, then to the "Compare Options" tab, and enable the option "Use slower but more reliable method of file change detection" at the top. (This option enables hash computation and comparison, so it will catch even a single-bit corruption.) For flexibility I also use the "Sub-dirs" option "Let me choose which subdirectories to include", as this lets you easily check/uncheck any subdirectory to archive in the "Sub-directories" tab. You can combine multiple profiles into a single backup job: simply create a new profile of the type "Group: contains links to other profiles". Then all you need to do is run the group profile. (You will still get multiple confirmation windows listing any file changes, one per profile.)

- Right after carrying out the regular (monthly) primary backup, I do the secondary backup to a hard drive inside my workstation at work, which runs Linux. I do it either over the Internet (if there is not too much data to copy) or by bringing my external hard drive to work and connecting it over USB. I wrote a Bash script (I run it on my workstation under Linux) which does everything automatically: it uses rsync to first copy all the data and then do the health check (hash comparison between the two copies). It stores all the file changes in a log file. For this to work, I installed Cygwin (a Unix-like environment for Windows) on my home desktop. I tunnel all the rsync traffic over an encrypted SSH channel for security. As my workstation is behind a corporate firewall (it cannot be accessed directly from outside), the procedure is to initiate the SSH connection from the workstation to my home desktop (which is reachable from the Internet via port forwarding of the SSH port in my router). I can run the backup script either directly at work or from home, using an SSH connection to my workstation. The archival copy I create under Linux is all set to "read-only" file permissions, to prevent accidental data deletion. Again, no data is ever erased on this third copy; it is a full backup.

Here is my backup Bash script (UPDATED 2022); you can copy and edit it to your particular circumstances:

https://drive.google.com/file/d/0B_p2gA ... bCyHyc5vNg
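
In case the link is not accessible, the core idea (copy new files, never delete anything, flag every file whose content has changed, make the archive read-only) can be sketched in a few lines of Python. To be clear, this is not the Bash/rsync script from the link, just a rough local-disk illustration; the names additive_backup and sha256_of are made up for this example.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def additive_backup(source, archive):
    """Copy new files into the archive; never delete or overwrite anything.

    Returns (copied, changed): relative paths newly copied, and relative
    paths whose content differs between source and archive (these need a
    human decision, since either side may be the corrupted one).
    """
    source, archive = Path(source), Path(archive)
    copied, changed = [], []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = archive / rel
        if not dst.exists():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            # Verify the fresh copy before trusting it.
            if sha256_of(src) != sha256_of(dst):
                raise IOError(f"copy verification failed: {rel}")
            # Read-only permissions guard against accidental edits.
            os.chmod(dst, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
            copied.append(rel.as_posix())
        elif sha256_of(src) != sha256_of(dst):
            changed.append(rel.as_posix())
    return copied, changed
```

Note that a file deleted from the source simply stays in the archive, which is the whole point of a "full backup" as opposed to mirroring.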

This is not a perfect system, but if it fails it will most likely be due to human error: me accidentally changing some files (or perhaps a virus changing my files on purpose), and then ignoring the warnings about the file changes during the two backup steps - very unlikely, but not impossible. As long as I am attentive during the primary backup step, pay attention to the error log of the secondary backup step, do the backups regularly, and upgrade hard drives every 3-4 years, my digital memories (which are already ~300GB, mostly due to HD home videos) have a very good chance of being perfectly preserved for the next 30 years - or longer!
Last edited by pulsar123 on Jan 24th, 2022 9:34 am, edited 1 time in total.
66 replies
Sr. Member
Jan 20, 2013
544 posts
123 upvotes
Woodbridge
Agree, very useful information for a lot of users.
One more point - in 3-4 years, typical new hard disks will be much bigger. That would cover the additional data accumulated in those 3-4 years, and you would transfer the "older" data to the new disk.
I keep my data on two different computers (Windows) and on one disk that I connect once every 3-4 months to make a third copy. So, no chance for a virus. But yes, fire, flood...
So far my last data loss was in 1998 (I know how it happened and I learned my lesson). Since then I have never lost any data.
In general, it is a problem for most users. People do not realize the problem with reliability. Yes, disks are pretty reliable for several years, and DVDs are not bad if you keep them properly, but not for too long.
You just have to keep moving your data to keep it safe, and keep at least two safe copies.
[OP]
Deal Addict
Aug 30, 2007
1933 posts
1262 upvotes
walker2238 wrote: What about using Flickr?
You mean - using Flickr as a cloud storage for all your data, or just for photos? In either case, it is a free service, Flickr doesn't guarantee anything, and it doesn't provide the means to check the health of your data via hashing. The same issue is with almost any other cloud storage provider, either free or not. So by uploading to a cloud you are basically okaying that sooner or later your data will decay.

As a 30 years strategy - fail.
[OP]
Deal Addict
Aug 30, 2007
1933 posts
1262 upvotes
vnkvnk wrote: One more point - in 3-4 years, typical new hard disks will be much bigger. That would cover the additional data accumulated in those 3-4 years, and you would transfer the "older" data to the new disk.
Good point, I forgot to mention that. I already moved to a new hard drive, once, and it actually happened at the time when the old hard drive (at 250GB) was almost full. I moved to a 2TB disk; as my rate of data generation is less than 50 GB per year (for now), it is kind of an overkill, but given the price (I paid 70$ for an external 2TB disk) you can't really save money by using a smaller disk.
Banned
Jun 8, 2008
3977 posts
1421 upvotes
Toronto
I've actually made photo books (Blurb): I took the best pictures from 5 years and put them in a book, added my own comments about what was where, and made enough copies for my kids for when they get older. I need to do it again actually, but it's a way to produce meaningful books. I've got backups on an external drive outside the home, on cloud storage and at home.
Deal Expert
Aug 22, 2006
29402 posts
14902 upvotes
I keep all my important data on monitored redundant storage systems (currently ZFS) with important bits encrypted and sent off to Amazon Glacier.
ZFS should protect me against corruption (since it does active monitoring of files) and S3/Glacier keeps it around in case of disaster.
[OP]
Deal Addict
Aug 30, 2007
1933 posts
1262 upvotes
death_hawk wrote: I keep all my important data on monitored redundant storage systems (currently ZFS) with important bits encrypted and sent off to Amazon Glacier.
ZFS should protect me against corruption (since it does active monitoring of files) and S3/Glacier keeps it around in case of disaster.
This sounds like an excellent way to safeguard your digital memories, but I suspect it is more expensive than the DIY approach I described (my only expense is buying two hard drives every 4 years, or ~35$/year). Also, will any cloud provider be around in 30 years? Probably not. Then the risk is that when it goes out of business something might happen to your data. I also like the idea of having full control over your data, but that is just me.
Deal Fanatic
Jan 17, 2003
8953 posts
1437 upvotes
This is how I run my backups at home.

Server running raid 5.
movies, tv shows, security camera recordings etc. - unimportant data
Pictures, videos, personal data - important data


RDX library automatically backs up the entire server weekly. (So both sets of data have 1 backup.)
My movies and tv shows are shared with friends (unimportant data has multiple backups, ease of restoring data)

Ironkey external hard drive using FIPS level 3 encryption and has a biometric scanner - important data
A second identical drive that I keep at the safety deposit box, in no situation is my data all in one place. I do a backup of the important data then switch it with the drive at the safety deposit box. Because I know when my server is updated with new pictures, it allows me to schedule my backups manually. I use the ironkey drive for the encryption and build protection since it's off site.

I have an additional RDX drive connected to my desktop, using a secure FIPS level 2 software-encrypted cartridge that I back up to and keep around the house with my important data. In the past I have kept this backup at my parents', but I never remembered to update it, so I've moved it home. (Important data has 3 backups.)

Currently in use is about 20TB of data for online and backup storage. I have an additional 55TB of RDX carts and hard drives in case drives fail. I still worry about things going down.
[OP]
Deal Addict
Aug 30, 2007
1933 posts
1262 upvotes
r1lee - multiple backup copies are great, but do you have any protection against data decay? Data decay is unavoidable, and a solid long-term (decades) archiving strategy must have means to monitor the health of your data. AFAIK RAID doesn't really protect from data decay.
Deal Addict
Sep 15, 2004
3309 posts
147 upvotes
My strategy for family images is to produce optical multimedia gifts and distribute them throughout the family at holiday get-togethers (complete with artwork covers and 'backup' folders). I use brand-name media that other RFDers have mentioned as fairly reliable. Even so, over time I can see that I will have to produce a separate image disk for the backup. You can get a lot of pictures on 1 DVD but not so much with HD videos, so boxed sets are the future. The family looks forward to these gifts every few years, and I enthusiastically solicit donations of all images etc. to incorporate into the latest storytelling. If your storytelling is interesting enough, a YouTube account might come in handy.

Spinning rust is not the best place for loved images; move the data blocks every 2-3 years, preferably to new rust.
Optical storage can be good, but don't count on 20 years: plastic degrades as the azo dyes fade. Use a 5-year move program with verify, and keep the old ones.
Internet storage is a gamble: what was here today is gone tomorrow. (Think Megaupload)
Google drive, Google Mail, Hotmail, Yahoo etc.. there was a time. (Hotmail used to have a 3 months or lose it policy)
Share; the more people you can dupe into holding your data the more you'll be able to sleep.
And most important of all! Keep all your content dated and catalogued on the outside of the box, organization will be your ally.
Deal Addict
Nov 11, 2009
1948 posts
337 upvotes
pulsar123 wrote: You mean - using Flickr as a cloud storage for all your data, or just for photos? In either case, it is a free service, Flickr doesn't guarantee anything, and it doesn't provide the means to check the health of your data via hashing. The same issue is with almost any other cloud storage provider, either free or not. So by uploading to a cloud you are basically okaying that sooner or later your data will decay.

As a 30 years strategy - fail.
I've never had any issues... I have photos on a local drive and also on Flickr. If my house burns down and I survive I can just download them from Flickr.
Banned
Feb 15, 2008
26318 posts
3237 upvotes
Calgary
pulsar123 wrote: r1lee - multiple backup copies are great, but do you have any protection against data decay? Data decay is unavoidable, and a solid long-term (decades) archiving strategy must have means to monitor the health of your data. AFAIK RAID doesn't really protect from data decay.
ECC codes on the drives themselves will tell you if the data is a true representation of the original. So as long as a backup is made every few years, shouldn't be a problem.

Personally I have a cron mdadm job that runs every week to verify my RAID-1's against each other. This involves reading both sides of the mirror and comparing. Any errors in such are reported to me. The key is to keep backups current, and to store things in a manner that is as non-proprietary as possible. Avoiding compression or even encryption as much as possible.
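
For reference, on Linux a mirror check like this can be driven through the md sysfs files sync_action and mismatch_cnt. The Python sketch below is one assumed way to do it, not necessarily what the cron job above runs; the array name md0 and the helper names are placeholders, and writing to sync_action needs root.

```python
from pathlib import Path


def request_check(array="md0", sysfs_root="/sys/block"):
    """Ask the kernel to start a read-and-compare pass over the array."""
    (Path(sysfs_root) / array / "md" / "sync_action").write_text("check\n")


def mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Return the mismatch counter the kernel reports for the last check."""
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())
```

A weekly job would call request_check(), wait for the pass to finish, and send a report if mismatch_count() is non-zero. Keep in mind that on a two-disk mirror a mismatch only says the two sides differ, not which side is right.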

I personally run 6 hard drives now, and intend to replace one every 2 years or so to keep the fleet life reasonable. Of course, my data isn't anywhere near filling even a single drive, but I use the spare space for my PVR.
TodayHello wrote: ...The Banks are smarter than you - they have floors full of people whose job it is to read Mark77 posts...
Deal Fanatic
Jan 17, 2003
8953 posts
1437 upvotes
Mark77 wrote: ECC codes on the drives themselves will tell you if the data is a true representation of the original. So as long as a backup is made every few years, shouldn't be a problem.

Personally I have a cron mdadm job that runs every week to verify my RAID-1's against each other. This involves reading both sides of the mirror and comparing. Any errors in such are reported to me. The key is to keep backups current, and to store things in a manner that is as non-proprietary as possible. Avoiding compression or even encryption as much as possible.
.

Agreed. Even if there is possibly some bit rot, I'm trusting my luck that it's not large. I can deal with the loss of a few pictures.

I don't run compression, but encryption based on FIPS is my only security when the drives are not on site and I want to keep my data for my eyes only.
Deal Expert
Aug 22, 2006
29402 posts
14902 upvotes
pulsar123 wrote: This sounds like an excellent way to safeguard your digital memories, but I suspect it is more expensive than the DIY approach I described (my only expense is buying two hard drives every 4 years, or ~35$/year).
That all depends on the amount of data you're backing up. If you have 1GB of photos, something like Amazon Glacier would be MUCH cheaper than $35/year.
Hell at $0.01/gb, $35/year means I could cover 291GB.
The flip side of that is large scale. I plan for 10% failures of drives per year. I actually run about 5%. Not a huge cost when you're dealing with 5 drives, but when you're dealing with 50 drives, that's 5 drives per year.
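A quick back-of-the-envelope check of the numbers in this exchange, as a small Python sketch. It assumes the $0.01/GB rate is billed per month (that is what makes $35/year come out to about 291GB) and ignores retrieval and request fees; the function names are just for illustration.

```python
def diy_cost_per_year(drive_price, drives_per_cycle, cycle_years):
    """Yearly cost of replacing a set of drives on a fixed cycle."""
    return drive_price * drives_per_cycle / cycle_years


def cloud_cost_per_year(size_gb, price_per_gb_month):
    """Yearly storage cost at a flat per-GB monthly rate (assumed pricing)."""
    return size_gb * price_per_gb_month * 12


def affordable_gb(budget_per_year, price_per_gb_month):
    """How many GB a yearly budget covers at that monthly rate."""
    return budget_per_year / (price_per_gb_month * 12)


def expected_drive_failures(drive_count, yearly_failure_rate):
    """Expected number of drives to replace per year."""
    return drive_count * yearly_failure_rate
```

With these, diy_cost_per_year(70, 2, 4) gives 35 dollars a year (two $70 drives every 4 years), affordable_gb(35, 0.01) gives a bit under 292GB, and expected_drive_failures(50, 0.10) gives 5 drives a year.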
Also, will any cloud provider be around in 30 years? Probably not. Then the risk is that when it goes out of business something might happen to your data. I also like the idea of having full control over your data, but that is just me.
Will the data on your HDD be around in 30 years? How about that CD/DVD/BD?
There's a reason why a backup is kept in a separate location. If something goes wrong, you can (hopefully) rely on another location or at least the original copy.
And I would imagine that someone as large as Amazon would give you notice before pulling the plug on their network considering how many large companies use it.
Mark77 wrote: ECC codes on the drives themselves will tell you if the data is a true representation of the original. So as long as a backup is made every few years, shouldn't be a problem.
What? No it's not. Most file systems wouldn't tell you if data is suddenly corrupt until you actually access said file. There's a couple of file systems that do active checking, but they're rarely used (relatively speaking).
I wouldn't trust anything on a HDD that doesn't do active file monitoring.
Deal Fanatic
Dec 4, 2009
7650 posts
3438 upvotes
How bout shooting them into space...?
"I'm a bit upset. I've been grab by the back without any alert and lubrification"
Lucky
[OP]
Deal Addict
Aug 30, 2007
1933 posts
1262 upvotes
death_hawk wrote: That all depends on the amount of data you're backing up. If you have 1GB of photos, something like Amazon Glacier would be MUCH cheaper than $35/year.
Hell at $0.01/gb, $35/year means I could cover 291GB.
The flip side of that is large scale. I plan for 10% failures of drives per year. I actually run about 5%. Not a huge cost when you're dealing with 5 drives, but when you're dealing with 50 drives, that's 5 drives per year.
That's what I have right now - around 300GB, and it's growing 30-50GB per year for now. So money wise it sounds comparable, but I really love the fact that I have a full control over all my copies, and don't have to rely on some large corporation's word that they indeed do what they say they are doing.
