The Commonwealth Bank of Australia is currently reeling from internal chaos and service delivery problems, following what appears to be a disastrous misapplication of an operating system patch to thousands of desktop PCs and hundreds of servers last week.
According to sources, on Thursday last week a patch was issued using Microsoft’s System Center Configuration Manager (SCCM) remote deployment tool. It appears the patch was intended for only a limited number of the bank’s desktop PCs, but it was mistakenly applied to a much wider swathe of the bank’s desktop and server fleet.
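The exact cause of the mis-targeting hasn’t been confirmed, but a common failure mode with tools like SCCM is a deployment collection whose membership rule is broader than intended. As a hypothetical sketch only (the machine names, OS strings and filter logic below are invented for illustration, not details from the bank), the difference between a correctly scoped and an unscoped targeting rule can be as small as one omitted filter:

```python
# Hypothetical sketch of deployment-collection targeting gone wrong.
# None of these machine names or OS strings come from the actual incident.
machines = [
    {"name": "BR-DESKTOP-001", "os": "Windows 7 Workstation"},
    {"name": "BR-DESKTOP-002", "os": "Windows 7 Workstation"},
    {"name": "EXCH-SRV-01",    "os": "Windows Server 2008"},
]

def collection(members, os_filter=None):
    """Return the machines a deployment would target, given an
    optional substring filter on the operating system name."""
    if os_filter is None:
        return list(members)  # no filter: every machine is targeted
    return [m for m in members if os_filter in m["os"]]

# Intended scope: workstations only.
intended = collection(machines, os_filter="Workstation")  # 2 machines

# The mistake: the filter is omitted, so servers get the patch too.
actual = collection(machines)  # all 3 machines
```

In a real SCCM environment the equivalent filter lives in the collection’s WQL membership query, and forgetting to restrict it to workstation-class operating systems has exactly this effect: the deployment silently fans out to every device the query matches.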
In a statement last week, the bank didn’t provide any details of what it said was “a problem with an internal software upgrade”, and played down the issue. It noted that while the vast majority of its over 1,000 branches were offering full services and its ATM, Internet and phone banking services were unaffected, about 95 branches were only offering limited services, such as access to automatic teller machines.
“Our branch staff in each affected branch are available as usual to assist our customers with their enquiries,” the bank said in a statement issued on Friday. “Customers may experience some increased wait times in some of our branches and when calling our call centres. Our priority is on restoring all services as quickly as possible and we apologise for the inconvenience.”
However, internally at CommBank, the situation appears far more dramatic than it does from an external customer’s point of view. Late last week, sources said that some 9,000 desktop PCs, hundreds of mid-range Windows servers (sources said as many as 490) and even iPads had been rendered unusable due to software corruption issues associated with the patch.
One of the bank’s IT services partners, HP, is believed to have allocated additional resources in an emergency effort to re-image the servers and desktop PCs from scratch with the bank’s standard operating environment and other platforms where appropriate, with the bank lodging a ‘P1’ highest-priority incident notice with the company. Internally, some staff at HP have been told to throw every resource possible at the situation. CommBank’s own backup and restore teams are also believed to be throwing resources at the issue wholesale.
Over the past several days Delimiter has received a number of tips from CommBank staff and the bank’s partners with respect to the issue. “I work at Commonwealth Bank Place and in the last two hours several colleagues have had SOE patches pushed out which have totally killed the machines resulting in No Operating System found messages,” one unverified tip stated.
“Execs have received emails advised that all machines at [Commonwealth Bank Place] are to be logged out and removed from the network immediately,” wrote another. “Apparently a virus has been mentioned, from what I have seen it would perhaps be in the SOE update?” And a third comment was: “Heard the one about the CBA losing 20% of their laptops this afternoon after a mysterious system update rebooted machines without notice and into a system error screen?”
One industry source with knowledge of the situation said they had never seen a situation quite like it in Australia. The problem is believed to have affected around a quarter of the bank’s desktop PC machines.
Thousands of desktop PCs down, hundreds of mid-range Windows servers offline and even iPads being rendered inoperable, with an outsourcer throwing every available resource at the situation and even some bank branches unable to serve customers? Well, I wouldn’t say it’s the biggest IT disaster Australia has ever suffered (most would say Customs’ Integrated Cargo System catastrophe was the biggest), but in terms of one-off enterprise IT outages that interrupt normal business, it doesn’t get much worse than this. I would say CommBank will still be fixing this situation for months to come.
The damage to the desktop machines is bad enough. But it’s in the damage to the mid-range Windows servers that the real pain will be felt at the bank. With email usually stored remotely and backed up pretty rigorously, and most documents these days stored on network drives, it’s normally not that huge a deal to have a desktop PC fleet suffering problems (although to see 9,000-odd machines go down in one go really beggars belief). Even if data is lost, the desktop PCs can be re-imaged and back to providing basic services to staff pretty quickly — ideally using the same remote deployment tools which caused this headache in the first place.
But servers are a completely different beast altogether. Server admins know that most servers are pretty customised to fit the specific need they’re serving. That configuration usually isn’t something you can easily re-image from a base install package — and meanwhile, although the desktop PCs might be back up soon, the servers providing services across CommBank will still be down, helping to further cripple its operations. I really don’t envy the task of the admins at CommBank and HP for the next few months. I wouldn’t be surprised to find out that quite a few people had their annual leave cancelled as all hands will be needed to resolve these issues.
And all, or so it would appear, from one little operating system patch issued mistakenly to the wrong list of machines. Centralised software deployment and administration is fantastic, of course, but when it goes wrong, it really goes wrong. There will be more than a few red faces and angry meetings about this one before all is said and done. Let’s just hope the bank didn’t lose too much data along with the downtime. And I look forward to the inevitable internal investigation and report allocating blame for this little issue.