On Monday long-time Slashdot reader TorinEdge wrote that Microsoft “appears to have botched an internal Office365 cloud services rollout today, with outages confirmed up and down the West Coast of North America. Confirmed roll backs were good early omens, but in the end did not appear to be successful… Symptoms may include: All 365-related services flaking out, borking, alternately approving logins and confirming they definitely do not exist.”
CRN reported service was impacted for five hours. But on Thursday some users were intermittently unable to access Microsoft Exchange from 12:52 a.m. until 10:50 p.m., “according to a Microsoft email update to Office 365 administrators…”
“Some partners believe the tech giant is grappling with a DevOps crisis.” “It looks like they are pushing out software updates that are causing the outages,” said a channel source impacted by one of the outages. “They have so much going on right now, rolling Teams out at a breakneck pace. I think they are running into an issue where code tested out fine but there is a configuration problem when they deploy it.”
DevOps is a set of practices that, according to the Wikipedia definition, shortens the systems development life cycle and provides continuous delivery of code with high software quality… A senior executive for one of Microsoft’s top partners, who did not want to be identified, said he sees both recent outages as clearly DevOps-related… “Microsoft is a development first company, well known in general for DevOps, so the question is: why is this happening?” said the executive. “I love Microsoft but why is a company that paid $7.5 billion for Github, the leading source code repository company in the world, getting taken down by code that is not being well tested or has a single point of failure. That is ridiculous. If we caused this kind of production outage for a customer we would be fired and possibly blacklisted from the ecosystem. We have to bat 1,000 as a partner.”
The lesson from the outages may well be that a company’s DevOps is only as “good as the humans who configure it and execute upon it,” said the executive. The executive said the outages will definitely have a ripple effect in the channel. “I bet the Google G Suite sales reps threw a party when they saw this,” he said.
“No cloud vendor is immune to downtime,” Microsoft says in a statement quoted by CRN. “Our number one priority is to get to resolution as quickly as possible and ensure our customers stay updated along the way, as was the case here.
“We continuously invest in the resilience of our platform and focus on learning from these incidents to ultimately reduce the impact of inevitable outages…”