Virus attack: DR fails NSW ambulances?


From NSW Opposition health spokesperson Jillian Skinner this weekend comes news of a dramatic new attack on the state’s health system — a “virus” that had infested the computer-aided dispatch system used by the Ambulance Service of New South Wales. Quoth Skinner:

“There’s been a complete failure of the computer-aided dispatch system that allows ambulances to respond, sometimes to critically ill patients. This could potentially cost lives.”

Now it is (mostly) understandable that a single system could be taken down by a virus attack — large organisations have been dealing with this sort of thing for years, after all, especially since almost every system in existence became connected to the Wild West that is the Internet. And ambulances are still going out — using manual operations, according to general manager of operations Mike Willis.

But what we’re really wondering here is why the Ambulance Service of NSW didn’t switch over straight away to its disaster recovery facility — you know, the one it built several years ago, presumably to cope with precisely this sort of problem? Quoting from a Computerworld article in May 2007:

“The Ambulance Service of New South Wales will procure new data centre facilities and services for the co-location of disaster recovery equipment for its mission-critical computer aided dispatch (CAD) platform.”

Another question might be: what system did the virus actually attack? It is unlikely to have been the VisiCAD software itself — after all, it seems unlikely that many people other than high-grade terrorists would bother writing a virus specifically to target an emergency services system. It’s far more likely that this was a general virus which attacked the underlying server platform the dispatch system ran on, or the desktop systems used to access it.

Which raises the question … why didn’t the Ambulance Service of NSW simply switch over to its backup systems?

Image credit: Whrelf Siemens, royalty free


    • Are you saying virus attacks don’t happen and don’t get through? The security vendors have never been able to guarantee they can block everything … zero day attacks etc.

      • I believe that the ability of virus attacks to cause significant downtime and business interruption shows a lack of adequate governance and security policy. A virus outbreak should be limited in scope and impact if you have your systems and infrastructure well defined and properly planned.

        Also worth noting that we haven’t had a failure to detect and isolate viruses since dumping McAfee. I particularly believe in testing antivirus against competing products, and McAfee performed abysmally each time. That said, we pay a lot of attention to our security, and especially malware. Sadly, I can’t say the same of other government agencies.

        I am however trying to be restrained in my assumptions until more is known about what happened, purely out of respect for their ICT staff.

  1. I’d suspect that the virus (or, more likely, a worm) just flooded their network or attacked services such as RPC/DCOM. While they might have had a backup application site, they probably didn’t have a backup network or backup network services.

    Some dill probably brought their laptop in to while away the time between call outs…

    • Still … an internal virus attacking an internal network; surely they should be able to route around that fairly easily? It’s 2011 … this stuff shouldn’t be rocket science — especially for what should be a hardened emergency services organisation. And the terminals in the ambulances actually use radio — not 3G — to my knowledge, so network attacks wouldn’t be an issue there.
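The worm scenario described above is plausible because the network worms of that era (Blaster over the RPC/DCOM port TCP 135, Conficker over SMB on TCP 445) spread laterally across flat internal networks, and the standard mitigation is to deny those ports between segments. Below is a minimal sketch of checking such a policy; the rule format, the segment names (“stations”, “cad”) and the idea of a flat rule list are illustrative assumptions, not anything known about the Ambulance Service’s actual network.

```python
# Ports historically abused by Windows network worms:
# TCP 135 (RPC/DCOM, e.g. Blaster), 139 (NetBIOS session),
# 445 (SMB, e.g. Conficker).
WORM_PORTS = {135, 139, 445}

def worm_ports_blocked(rules, src, dst):
    """Return True if every worm-associated port is explicitly denied
    from the src segment to the dst segment (first matching rule wins)."""
    blocked = set()
    for rule in rules:
        if rule["src"] == src and rule["dst"] == dst \
                and rule["port"] in WORM_PORTS:
            if rule["action"] == "deny":
                blocked.add(rule["port"])
            else:
                return False  # an allow rule matched first
    return blocked == WORM_PORTS

# Hypothetical policy: ambulance-station LANs may not reach the CAD
# core on any of the worm-associated ports.
rules = [
    {"src": "stations", "dst": "cad", "port": 135, "action": "deny"},
    {"src": "stations", "dst": "cad", "port": 139, "action": "deny"},
    {"src": "stations", "dst": "cad", "port": 445, "action": "deny"},
]
print(worm_ports_blocked(rules, "stations", "cad"))  # True
```

A laptop brought in from outside, as suggested above, is exactly the kind of source such segmentation is meant to contain: the infection stays in the station segment rather than reaching the dispatch core.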

  2. Two fundamental governance questions apply:

    What evidence exists to demonstrate that all critical business activities can be promptly restored in the event of any serious failure of IT systems?

    When was the last completely successful proof of prompt recovery and how frequently is this capability reconfirmed?

    These are universal questions that apply in any organisation. They demand an understanding of how the business activities depend on IT, of exactly what aspects of IT are essential for critical business activities and of the success criteria for a test.

    Unclear or equivocal answers should result in at least a formal review of policy and capability and may be cause for obtaining independent advice.

    Any answers that transfer responsibility to an outside organisation (“that’s the outsourcer’s responsibility” or “we use the cloud so that’s no longer an issue”) should be regarded as indications of complete failure to understand the fundamental issues in business continuity and should most certainly be a trigger for obtaining independent advice.
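The second governance question above (how recently recovery was proven, and how often it is reconfirmed) lends itself to a simple automated nag. Here is a minimal sketch; the 90-day interval and the idea of a drill log are assumptions for illustration, not the Ambulance Service’s actual policy.

```python
from datetime import date, timedelta

# Assumed re-test interval: a quarterly cadence, purely illustrative.
DRILL_INTERVAL = timedelta(days=90)

def recovery_overdue(last_successful_drill: date, today: date) -> bool:
    """True if the last fully successful failover proof is older than
    the agreed re-test interval."""
    return today - last_successful_drill > DRILL_INTERVAL

# Example: a drill that passed on 1 January is overdue by mid-June.
print(recovery_overdue(date(2011, 1, 1), date(2011, 6, 15)))  # True
```

The point is not the code but the discipline it encodes: “when did we last prove this?” becomes a dated fact that can be checked, rather than a routine answer to a routine question.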

    • I agree Mark — these are the questions that it looks like the Ambulance Service of NSW was asking itself several years ago … I am just curious as to why their systems didn’t stack up in the event of a real-life issue. They clearly were not able to maintain business continuity.

      • Indeed we should all be curious. It’s not enough to ask the questions once in a blue moon. These are questions that should be asked regularly and relatively frequently. Every three months should be the norm in an essential service. There should also be supplementary questions to prevent the development of “routine answers to routine questions”. An independent validation at least once per year is probably a good idea.

        Another angle for governance questioning is to understand which kinds of failures are actually covered by the BCP and DR arrangements. Disturbingly often, the focus is on catastrophic loss of the primary data centre, with very little attention given to the much more likely scenarios of software malfunction, data corruption and loss of access.

        • Absolutely Mark. The message that ICT has to communicate to executives is that security isn’t a one-off project, it’s an ongoing process of improvement and review – and both the exec and ICT are responsible. Need more emphasis on good governance.

  3. Let’s just hope that, whatever it was, they learn from this particular incident and put a better BCP in place for the next time something like this happens.
