I'm sure that many of you have read that HP’s online backup service went down shortly after it was introduced. It’s back up now, and HP has explained that what they experienced was a "technical glitch." Here are a few links about this story:
Enterprise Storage Forum: HP Upline Suffers Downtime
InformationWeek: HP Shuts Down Online Storage Service
Beta News: Bringing down the cloud: HP's Upline down for a third of its life
Why am I blogging about HP’s problems? Only because they underscore the difficulty of building a reliable large-scale backup service.
When I was out raising money for Carbonite, one venture capitalist dismissed backup as a "trivial" application. It reminded me of an incident when I was teaching at MIT a few years ago: one of my students insisted that Google wasn't worth billions of dollars because "anyone can write a search engine." It’s true that anyone can write a search engine, but Google's barrier to entry is its huge scale. Building a backup service that can flawlessly store and retrieve billions of files is not so easy, as HP has learned.
When we first started out, we were using Microsoft’s NTFS file system. When we got to about 500 million files, it started crashing and giving us all kinds of problems. We called Microsoft for help, and the engineer on the other end asked us how many files we were storing. When we said "about 500 million," there was silence on the other end of the phone. Then he said, "Uh, well, we didn’t really design NTFS for that many files." So we set about building our own proprietary file system that could handle more than a trillion files, because that’s where we’ll be in a couple of years. We currently receive almost 40 million new files every day, we have close to 7 billion files backed up, and we restore millions of files every day.
To do all that without losing any data is an enormously complicated engineering challenge. We're three and a half years into this effort, and there is no shortcut to where we are, no matter how much money you throw at the problem. Our confidence in our own infrastructure comes from having built a customer base of hundreds of thousands of users. Until a company does that, it will never know whether its systems are going to fall over. HP, with all its resources, is going through the same learning curve that we’ve gone through for the last three and a half years.
— Dave
CEO, Carbonite