Facebook, Instagram and WhatsApp were all brought down for almost seven hours last week in a massive global outage.
Problems began at around 16:45 BST (11:45 ET) on October 4, leaving users unable to access the three platforms, as well as Facebook Messenger and Oculus, for the rest of the evening.
Facebook, which owns all the services, blamed the outage on a bungled server update and said there was ‘no malicious activity’ behind it.
The US tech giant said the problem was caused by a faulty update that was sent to its core servers, which effectively disconnected them from the internet.
But what exactly went wrong, and why did it take almost seven hours to fix? Here is MailOnline’s breakdown of the issue…
Why did Facebook go offline?
Facebook issued a statement saying the cause of the problem was a configuration change to the company’s ‘backbone routers’, which coordinate network traffic between the tech giant’s data centres.
‘This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt,’ the statement said.
Web security firm Cloudflare offered more details about what happened, revealing that Facebook had effectively vanished from the internet.
The social media company sent out a series of Border Gateway Protocol (BGP) updates that withdrew the routes to its own servers, Cloudflare’s chief technology officer John Graham-Cumming said, causing it to ‘disappear’.
BGP lets the networks that make up the internet exchange routing information, so that traffic can find its way to the websites people want to reach.
If the Domain Name System (DNS) is the internet’s address book, translating a website’s name into its numerical IP address, BGP is the roadmap that tells traffic how to reach that address.
With the routes to Facebook’s own DNS servers gone, DNS resolvers all over the world could no longer reach them and stopped resolving Facebook’s domain names.
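To make that chain of failure concrete, here is a toy Python model. It is a minimal sketch, assuming a single resolver, a single authoritative nameserver and a hand-written routing table; every name and address in it is hypothetical and taken from reserved documentation ranges, not from Facebook’s real network. Withdrawing the route to the nameserver is enough to make the name unresolvable.

```python
import ipaddress

# Toy routing table: the prefixes this resolver has learned via BGP.
# Addresses are from the reserved documentation ranges (RFC 5737) and
# are purely illustrative, not Facebook's real prefixes.
routes = {
    ipaddress.ip_network("192.0.2.0/24"),     # hypothetical prefix for the nameservers
    ipaddress.ip_network("198.51.100.0/24"),  # hypothetical prefix for the web servers
}

# Hypothetical authoritative nameserver and the zone data it serves.
NAMESERVER = ipaddress.ip_address("192.0.2.53")
ZONE = {"www.example.com": "198.51.100.10"}


def reachable(addr):
    """An address is reachable only if some known route covers it."""
    return any(addr in net for net in routes)


def resolve(name):
    """Ask the authoritative nameserver; fail if there is no route to it."""
    if not reachable(NAMESERVER):
        return "SERVFAIL"  # the resolver cannot contact the nameserver at all
    return ZONE.get(name, "NXDOMAIN")


def withdraw(prefix):
    """Simulate a BGP withdrawal: the prefix disappears from the table."""
    routes.discard(ipaddress.ip_network(prefix))


print(resolve("www.example.com"))  # 198.51.100.10
withdraw("192.0.2.0/24")
print(resolve("www.example.com"))  # SERVFAIL
```

In this toy model the web servers’ prefix is still in the routing table after the withdrawal: the servers themselves may be fine, but nobody can find them by name.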
Why were Instagram, WhatsApp and Facebook Messenger also down?
It wasn’t just Facebook that went offline – its associated services Instagram, WhatsApp and Facebook Messenger were affected, too. Some people also reported issues with Facebook’s virtual reality headset platform, Oculus.
This is because the tech giant runs all of its products on a single, centralised back end.
Facebook’s internal systems run on the same infrastructure, meaning everything staff needed to fix the problem, from digital engineering tools and messaging services to key-fob door locks, was also taken offline.
Matthew Hodgson, co-founder and CEO of Element and technical co-founder of Matrix, said the outage illustrated the advantage of having a ‘more reliable’ decentralised system that doesn’t put ‘all the eggs in one basket’.
‘There’s no single point of failure so they can withstand significant disruption and still keep people and businesses communicating,’ he added.
How many people were affected?
Downdetector, which tracks outages, said it was the biggest failure it has ever seen, with 10.6 million problem reports around the world.
In total, Facebook has 2.9 billion monthly active users.
The issues started at 16:44 BST (11:44 ET), with nearly 80,000 reports for WhatsApp and more than 50,000 for Facebook, according to Downdetector.
From around 22:30 BST (17:30 ET), some users were reporting that they were able to access the four platforms once again. However, Facebook did not work again for many people until at least an hour after that.
WhatsApp said it was back up and running ‘at 100 per cent’ as of 03:30 BST on Tuesday morning (22:30 ET on Monday).
Why did it take so long to resolve the problem?
When Facebook’s platforms went offline, engineers rushed to the company’s data centres to reset the servers manually, only to find they couldn’t get inside.
New York Times technology reporter Sheera Frenkel told the BBC’s Today programme that this was part of the reason it took so long to fix the issue.
‘The people trying to figure out what this problem was couldn’t even physically get into the building’ to work out what had gone wrong, she said.
To make matters worse, one insider claimed the outage was prolonged because large numbers of staff were still working from home in the wake of Covid, so it took longer for them to reach the data centres.