The following commentary comes from an independent investor or market observer as part of TheStreet's guest contributor program, which is separate from the company's news coverage.
NEW YORK -- By now, of course, nearly everyone has heard about the several days of largely unexplained outages affecting BlackBerry services around the world.
I was just listening to the press conference to try to get a little more clarity about what happened. Here are my thoughts:
First, I take them at their word that it was a hardware failure of what was supposed to be a fully redundant core switch. Unfortunately, as good as we think we are at designing redundant systems, there are numerous examples of such systems not being truly redundant and failing massively. What did surprise me, however, is that it took so long to rectify the problem.
Second, while no time is a good time for a massive system failure, the timing of this one is especially bad. Given RIM's position in the current market, with falling market share and everyone looking at the company with a magnifying glass, this failure has much higher visibility than it might have gotten a couple of years ago. That's not to say a failure of this magnitude wouldn't have been noticed. But RIM's current challenges mean increased scrutiny.
Third, we should remember that there is no such thing as a totally fail-proof system. SMS systems have gone down. Gmail, Yahoo! and Hotmail have gone down. Carrier outages, while rare, have occurred. Even iTunes has had glitches on occasion. So we need to understand that all systems can have this kind of issue, and we need to look at the relative uptime of the system (although obviously everyone always focuses on downtime because of the disruption). The BlackBerry service actually has a pretty good uptime record (99.97% if I recall correctly). That's probably better than most Exchange servers in large organizations and better than many ISPs and web hosting services.
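To put an uptime percentage in perspective, it helps to convert it into hours of downtime per year. The short Python sketch below does that arithmetic. The function name is illustrative, it assumes a 365-day year, and it takes the 99.97% figure, which is quoted from memory above, at face value.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours of downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The fraction of the year the service is down, times the hours in a year.
    return (1.0 - uptime_percent / 100.0) * HOURS_PER_YEAR


if __name__ == "__main__":
    # 99.97% is the recalled figure from the text, not a verified statistic.
    hours = downtime_hours_per_year(99.97)
    print(f"99.97% uptime allows about {hours:.1f} hours of downtime per year")
```

At 99.97%, that works out to roughly 2.6 hours of downtime a year, which is why a multi-day outage stands out so sharply against the long-run record.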
Fourth, while I think RIM should have been more upfront with the public about the problem and done so sooner, I also believe it probably took them a while to determine the crux of the problem. Most systems like this are computer-modeled in great detail before being installed to try to find potential problems. In all likelihood, the failure mechanism involved was not part of any modeling, and so it was hard to find. That is probably why it took them some time to determine what was going on.