Episode 9, Backups

Here are the show notes for episode 9.
Make sure to send us feedback so we can make the show even better.
PodCast Feed



Links:

DisasterRecovery
Netbackup
Amanda
BackupPC
Thud's backup script

RYOS, Episode 9 - Backups

Thud: The RunYourOwnServer podcast for September 8th, 2006.

Thud: In this episode, backups. Why would you want to back stuff up? Tapes or disks? Local or remote? Commercial, open source, or home-grown? And a moment of seg.


This episode's reverse sponsor is BackupPC. BackupPC is a high-performance, enterprise-grade system for backing up Linux and Windows PCs and laptops to a server's disk. BackupPC is highly configurable and easy to install and maintain. You can find more details at backuppc.sourceforge.net.


OK, let's get started. Gek, why do you want to back stuff up?

Gek: There are a lot of different reasons that somebody might want to back something up. Some of the obvious ones are: If you are running a server, you're most likely doing it because you have data that you need access to or that you are trying to create, so you back that stuff up so you don't lose it. The whole point of having a server, in most cases, is to have data on it. And although sometimes you may have multiple web-heads pointing to the same database, you are going to want to at least back up one of the web-heads, so that if a web-head goes down you have a copy of the website.


There are other reasons too. You could want to back up data because you don't want to store it on disk anymore, you want to just keep it on tape, and should you ever need it down the road, you have it on that tape. You might back it up on CDs too, that's pretty common. That's about all I can think of. Do you have anything that you can think of?

Thud:
Well, really backing up stuff protects against the two most dangerous things in the computer industry. The first one being bad hard drives. Even if you have RAID so that you can survive a single hard drive failure, I've seen situations where multiple drives go down, or 50% of your drives go down, or even more, and your data is all gone.


The other most dangerous thing in the computer industry is stupid admins. Right now, I'm running about 50/50. I've had data loss about 50% of the time due to bad disks and about 50% of the time due to a misplaced rm. I was in the middle of typing something, forgot where I was, and ended up wiping out a directory I didn't really mean to. It's happened to just about every admin that I know. The best thing to do: if you back stuff up all the time and you do something like that, you don't have to feel too bad, because you think, "Yeah, I did it in the wrong place, I lost the data, but I can recover. I can get it back pretty quickly."


OK, so there's different kinds of backup media. There is copying data to another disk, so there is hard drive backup media, the other one is CD or DVD, which is optical, and the third one is tape. Gek, what's your experience with those three?

Gek:
Most of the clients that I work with just back up to tape. There are some where, like for databases it's necessary, they don't want to take their database down, and most databases have a way of dumping a copy of their contents to disk. So what we do is, we dump to disk, and then we back up the dump instead of backing up the actual live database. Typically in order to back up a database, you have to bring it down, and most people don't want to do that.
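As a rough illustration of the dump-then-back-up approach Gek describes, here is a minimal Python sketch. It assumes SQLite as a stand-in for whatever database is actually in use; the function names `dump_database` and `archive_dump` are hypothetical, and a real deployment would use the database's own dump tool.

```python
import sqlite3
import tarfile
from pathlib import Path


def dump_database(conn: sqlite3.Connection, dump_path: Path) -> Path:
    """Write a SQL text dump of a live database without taking it offline."""
    with open(dump_path, "w", encoding="utf-8") as out:
        for statement in conn.iterdump():
            out.write(statement + "\n")
    return dump_path


def archive_dump(dump_path: Path, archive_path: Path) -> Path:
    """Back up the dump file (not the live database files) as a tar.gz."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dump_path, arcname=dump_path.name)
    return archive_path
```

The point of the two steps is that the backup job only ever reads a static file, so the database itself can stay up while the backup runs.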


My ideal solution is actually a combination of the two. What I really like to see is, you have some kind of massive storage device, a storage array or a SAN or something where you can dump files, and then you back that up. I honestly believe that if you have to go to a tape, then you're already in a whole lot of trouble. Tapes go bad, they are slow to restore. It's much faster if you can pull things straight off of a disk, and you can always lose tapes. Hard drives are typically mounted in machines or arrays, and it's a harder thing to just walk away with. But tapes, if you've got a massive tape library that's got 400 tapes in it, if people are swapping tapes in and out of the library constantly, then it's real easy for a tape to get misplaced and now you don't have that backup.


I prefer backing up to disk at least for your hot backup. You want to have a copy of everything on some other disk array and then back that up as you need to, to make archives of things. How have you seen things done, usually?

Thud:
Well, I've seen it done a lot of different ways. In my personal experience, I try not to rely on tape. It tends to be slow unless you have a massive infrastructure -- high speed tapes, high speed backup server, high speed network that the rest of the servers are connected to so you can back up as quickly as possible. It's real expensive to make tape fast, and the only legitimate reason that I can think of for using tape is if you need to move large amounts of data off-site in a small medium. I think that they're approaching one terabyte on a tape. I think right now they're around 400 gigs; I have seen a couple of ads for 500 gigs on a single tape.


It just makes it easy to get a lot of data off-site. It's not something that, in my experience, you can really rely on. Some of the commercial products can actually make copies of tapes, and I've even seen that fail. Like I said, it's expensive to do fast, it's a lot cheaper just to get a whole bunch of hard drives and back it up there. Even if you back it up to the hard drive initially, and then from there to tape so you have different backup media, or from the secondary hard drive to optical like CDs or DVDs.


I really like the idea of having backups on disk because they are fast to access and it's more readily available. I've seen times when we suddenly realized we had to do a restore and the tape was just shipped off-site. Now we have to wait for the tape to be delivered at the storage facility and then turned around the next day to be sent back; now you are waiting two days for a restore that should take ten minutes.

Gek:
Another thing with larger storage devices, like EMC filers. You can get advanced features, where they'll make versions of documents, so as things change, you actually get kind of an archival backup. That alleviates some of the concern that people have backing up to disk, where, if you're just copying stuff to disk, you don't have different versions of documents and you can't go back. If you have to go back a month, then that means you're going to tape. With some of the more advanced file servers, you can actually just roll back a file or roll back a directory and recover things.

Thud:
Okay, let's talk a little about the difference between local and remote storage. Local, of course, is local to the system. So, basically, just backing up to another hard drive in the system, or to another directory in the system, for that matter. Remote being on a remote server that's either in the same facility, or it could be halfway around the world. Of those two options, gek, which one do you prefer?

Gek:
It depends. A lot of customers that I work with need offsite storage. If the building that they host their stuff in should crumble, they need the ability to get their stuff back - not necessarily back up and running, like a failover site, but they do need access to the data. For them, it makes sense to have something stored offsite, and that usually means a different zip code. If you do something like that, it's very, very expensive, and it is the kind of thing that you're not going to do it just because you think it's a good idea. If you need that, it will be obvious.


I like local storage for most situations. Really, it is the most cost effective of the two. If you need to store something offsite, you could always use somebody like Iron Mountain to come pick up backup tapes, or even spare hard drives, and keep it offsite without setting up a remote site to do backups to.

Thud:
Yeah, I kind of agree with that. Generally, like some of the home-brewed backup groups I have for my personal servers, I do an incremental backup every day to a local directory, and once a week I do a full backup. Because my servers don't change that often, usually every couple of weeks I'll go download those and remove them from the server. Just so I have them here at home, so if I need them I can restore it. The important thing for me is, I am trying to protect against a drive failure, but the data isn't that hard to replicate.
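The daily-incremental, weekly-full routine Thud describes could look something like the following Python sketch. This is not Thud's script; `make_backup` is a hypothetical helper that uses file modification times to decide what counts as changed, and it assumes the archive is written outside the directory being backed up.

```python
import os
import tarfile
from pathlib import Path
from typing import List, Optional


def make_backup(source: Path, archive: Path, since: Optional[float] = None) -> List[str]:
    """Create a tar.gz of `source` and return the relative paths archived.

    since=None  -> full backup (every file).
    since=<ts>  -> incremental (only files modified after that timestamp).
    """
    included = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            if since is not None and path.stat().st_mtime <= since:
                continue  # unchanged since the reference backup
            rel = path.relative_to(source).as_posix()
            tar.add(path, arcname=rel)
            included.append(rel)
    return included
```

A weekly job would call it with `since=None`, and the daily job would pass the timestamp of the last full backup. Note that this simple approach does not record deleted files.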


If I were in a commercial environment, I would want backups as often as possible. I would want them local to the drive or local to the server, so that I can restore quickly. I would also want them off on another server somewhere, whether that's a backup server with tapes or whether it's just another NFS server that's mounted. I'd also want it on tapes, so that I could easily move it around. If I were making millions of dollars, of course I'd want it offsite. I'd want a second site, where the data gets sent to.


There are a lot of options when it comes to local and remote storage. It just boils down to how important is the data, how easy is it to replicate, and how much you are going to lose if you're down while you're trying to rebuild the data.


Now let's talk about commercial backup applications. You have a lot more experience in this arena than I do, gek, so why don't you tell us a little bit about it.

Gek:
My main experience has been with Veritas: Backup Exec and NetBackup. They're both pretty good products. NetBackup is a much better product than Backup Exec.


I've found that, for my own stuff and when I've been in charge of making the decision as to what gets used for backups, it's just easier to use Tar or rsync and some scripts, because it allows you a lot more flexibility. For instance, if you want to run encrypted backups on NetBackup, you're talking about buying the backup server, which is really expensive software, then you have to buy an encryption license for each client that you want to do that on. It can rack up the dollars really quickly.


If you wanted to do it with just open source and regular tools, you could use tar and GPG, and you could password protect or use a key to encrypt backups. It costs you nothing; the software is free, it's just a little bit of labor to set it up. I actually used that in a situation where I was the admin for a bunch of boxes.
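A minimal sketch of the tar-plus-GPG idea, in Python, might look like this. It assumes the `gpg` binary is installed and that the recipient's public key is already in the local keyring; the function names and the recipient address used in any call are placeholders, not anything from the show. It uses the key-based variant rather than a password.

```python
import subprocess
import tarfile
from pathlib import Path
from typing import List


def gpg_encrypt_command(recipient: str, output: Path) -> List[str]:
    """Build a gpg command that encrypts stdin to `output` for `recipient`."""
    return [
        "gpg", "--batch", "--yes",
        "--encrypt", "--recipient", recipient,
        "--output", str(output),
    ]


def encrypted_backup(source: Path, output: Path, recipient: str) -> None:
    """Stream a gzip-compressed tar of `source` straight into gpg.

    The unencrypted archive is never written to disk.
    """
    proc = subprocess.Popen(
        gpg_encrypt_command(recipient, output), stdin=subprocess.PIPE
    )
    try:
        # "w|gz" is tarfile's streaming mode, suitable for writing to a pipe.
        with tarfile.open(fileobj=proc.stdin, mode="w|gz") as tar:
            tar.add(source, arcname=source.name)
    finally:
        proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("gpg exited with an error; do not trust this backup")
```

Encrypting to a public key means the server making the backup never needs to hold the secret needed to read it.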


The commercial products are great if you're working on a large scale. If you've got hundreds and hundreds of servers, scripts are probably not the best way to go; you definitely want to look at a product like NetBackup. There are others out there; I haven't worked with many of them. I've worked with ARCserve a long time ago, I think I've seen BrightStor. But, honestly, I've never seen a product put in front of me that I liked better than NetBackup when it comes to the commercial side of the house.


Thud, what do you use typically if you're going to try and go the open source route?

Thud:
There are a couple of projects out there. The most common one, and probably best known, is AMANDA. I have used it in the past. It's very flexible and easy to set up. The one problem I had with it is that if you have a tape library that's never been used with AMANDA before, you're not going to get it to work. It seems that the tape library controls are different enough that they pretty much have to write it specifically for it. There are some tape library tools that are built in, to Linux for example, that assist with that, but they always have issues. In our particular case, and we were trying it, I don't know, probably five years ago, the issue was that it would eject the tape, command the robot to take the tape and put it into a certain slot and it was always off by one slot. So, if we told it to put it in slot three, it would put it in slot four, even if there was already a tape in it. So that caused a lot of issues, but for doing small server backups, it's not that big a deal. It seems to work great for that, if you've got manual tapes or things like that.


The other one that I used, and this was probably the last time that I used a project backup application, is called BackupPC. You can find it on SourceForge, just search for BackupPC. It's a nice little application that is designed for backing up Windows and Linux servers. You can back up through SCP or NFS or, on the Windows servers, with Samba shares.


It has a very interesting way of storing the backups. Basically, if you do a full backup and then it does incremental backups, the way that it works is that every incremental backup has, depending on how you do it, a hard link or a soft link on the Linux server back to the files that haven't changed. So, what you have is a directory that is only the size of an incremental, but if you go to restore from that, you can restore a full backup from it. It simply links back to the things that haven't changed. They don't take up any extra space. If you have a one gig file that's on there on every full, but it doesn't change throughout the week when you're doing your incrementals, you only have to store the one gig and then the six additional hard links for it. That came in really handy, because normally when you have to do a restore, you have to do the full and then all the individual incrementals after that. This way, you can just do a full restore off your last incremental.
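The hard-link idea can be sketched in a few lines of Python. This is only an illustration of the technique as described here, not BackupPC's actual code; `snapshot` is a hypothetical helper, it handles regular files and directories only, and it assumes both snapshots live on the same filesystem (a requirement for hard links).

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Create a full-looking snapshot of `source` in `dest`.

    Files identical to the copy in `previous` are hard-linked instead of
    copied, so they take no additional data space.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for path in sorted(source.rglob("*")):
        rel = path.relative_to(source)
        target = dest / rel
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(path, old, shallow=False):
            os.link(old, target)        # unchanged: share the existing data
        else:
            shutil.copy2(path, target)  # new or changed: store a fresh copy
```

Every snapshot directory looks like a complete copy, so a restore only needs the most recent one, while unchanged files are stored once.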


And it also has some really nice - it has a nice web interface into it. So you can see what's backed up, handle some of the scheduling. It was a pretty well thought out system. As it turned out, it works OK if you're doing multiple servers; but for me personally, because I only have a handful of personal servers, I want the stuff backed up, but I don't really care about it that much. It was just overkill for what I do.


That kind of brings us into the next section which is homegrown solutions, which is more or less what I use for all of my stuff, using things like tar, GZip, rsync, SCP, and a variety of other tools, just to script together something that meets my needs and works exactly the way that I want. Gek, do you have anything that you've built like that?

Gek:
Actually, I have a couple of things I've used and one idea that I'm kind of working on at the moment. My backups right now, I just do a very simple rsync from my server to a USB drive that I disconnect once the backups are done.


But in the past, when I was running servers out on the internet, and some at my dad's office, and I had more servers here, I would really just use tar and GPG to encrypt them. Then I would have scripts that automatically went and copied all of the stuff that was out on the internet, at my dad's office, or at my hosting provider. I'd copy the backups to my machine, and then my machine, I'd copy to my dad's office, so that the stuff was in different locations. And even if somebody came across one of the files, they wouldn't be able to get to it, because it was encrypted. Stuff like that is good just for quick and dirty, but if you've got data you really care about, I would not suggest, even if it is encrypted, taking it somewhere that is public, or could become public if it was breached.


One of the ideas I want to try and play around with, is with rsync you have the ability to build a list of files that it would have copied over, and I was going to try and make some sort of incremental backup using that list. How about you, thud, what do you usually use?
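One way to get that list is rsync's dry-run mode, which reports what would be transferred without copying anything. The Python sketch below is an assumption about how such a tool might start, not Gek's implementation; the function names are hypothetical, and it assumes file names contain no newlines.

```python
import subprocess
from typing import List


def rsync_dry_run_command(source: str, dest: str) -> List[str]:
    """Ask rsync what it would transfer, without copying anything."""
    return ["rsync", "--archive", "--dry-run", "--out-format=%n", source, dest]


def parse_changed_files(rsync_output: str) -> List[str]:
    """Keep file paths from the dry-run output; drop blanks and directories."""
    files = []
    for line in rsync_output.splitlines():
        if line and not line.endswith("/"):
            files.append(line)
    return files


def changed_files(source: str, dest: str) -> List[str]:
    """Run the dry run and return the files an incremental would need."""
    result = subprocess.run(
        rsync_dry_run_command(source, dest),
        check=True, capture_output=True, text=True,
    )
    return parse_changed_files(result.stdout)
```

The resulting list could then be handed to tar to build an incremental archive containing only those files.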

Thud:
Well, I have a script that I've kind of been working on over the years and adding features onto it. I think the last one I added was the ability to do encrypted backups, if I wanted to. I'm actually kind of debating about whether or not it's ready for public consumption. I mean, I use it and I've never had any issues with it. It's not really a complete backup system, which kind of bothers me. It basically just creates a directory with full backups and then incrementals, and you have to have some other mechanism for getting it off of the server. That's one of the reasons why I'm not - I really don't want to release it to the public, because people may think it's the end-all, be-all.


Basically what I'll do instead of making it available for the general public, if you're interested in seeing it or want some more detail about it, just send us an email at Podcast at RunYourOwnServer.org and ask for my backup script, and I'll send you a copy of it. It's not very long. It doesn't need very many tools. It was originally written for OpenBSD, but I know it works on Linux and FreeBSD, but you just have to make sure, for example, if you want to do the encryption, you have to have GPG installed and configured. But yeah, if you're interested, just shoot us an email and I'll do it that way. It's not something I really want showing up in Google. It's not something I want to have to support later on.


Thud: This week since we're on backups and we've mentioned it a couple of times, let's go into a little bit more detail about encrypted backups. Gek, why do you think encrypted backups are important?

Gek:
Well, if you follow the news, I don't think you could have missed the stories about the government losing backups of sensitive data. If you've got data that you care about and you don't want anybody to get it, you need to protect your backups just as well as you do your data and that's the bottom line.


If anybody has their backups, that might even be more valuable to them than what's live, because now they've got a history of what's happened. And really when you think about it, that's a ton more data than just the way things are currently. If you can see how things have been, you could even learn things like people's upgrade cycles: How frequently do they upgrade? When do they patch? There's a lot of information besides just the actual data that can be gleaned from somebody stealing somebody's backup tapes.


And really it isn't that hard to get the data off the tape. Once you have the tape, you don't necessarily have to have the software even all the time to read the data off of it. All I can say is that if you care about the data enough to encrypt it when it's not backed up, then it definitely needs to be encrypted when you back it up. What about you, thud?

Thud:
Yeah, I'd have to agree. The other thing to think about is that a lot of time in backup, especially on the enterprise level, backup solutions - because a tape would hold, say 400 gigs - and if it has a bunch of incrementals on it, you now have data from a lot of different servers on that tape. So getting hold of a tape is much more valuable than trying to hack in, because if you hack in you might get one server. If you get a tape, there might be ten or fifteen servers on there. So, you know, it's very important to encrypt it.


I'm even willing to go to the point of saying that if you're in a commercial atmosphere, where you've got any kind of data that is making you money, whether you think it's personal information or not, it should be encrypted. Every commercial package I know of supports encryption. Most of the open source do on some level. Or you can write your own. You know, tar and GZip and GPG and a shell language like bash, you've got a backup system that can encrypt the data.


It's very important to get all the data encrypted. It's so easy, I just don't know why everybody doesn't do it. Take the extra time to do the backups properly, so you don't have to worry about it when a tape goes missing.

Gek:
Yeah, you have to ask yourself: Is it going to cost you more to have to pay to have the data encrypted now? Or is it going to cost you more when you have to recover from the data having been stolen?


[music, "Down the Road" by Rob Coslo]

Thud:
For show notes, or other details, please visit our website at RunYourOwnServer.org.


[music continues]
Thud: If you would like to send us feedback, or have a question you would like answered on the show, please send an email to Podcast at RunYourOwnServer.org.


[music continues]

Thud:
The intro music, "I Like Caffeine," is by Tom Cody. This song, "Down the Road," is by Rob Coslo. Please visit our website for links to their websites.



Thud: This podcast is covered under a Creative Commons license. Please visit our website for more details.




Transcription by CastingWords