#195 – Saumya Majumder on How Cloudflare Outages Impact the Web and WordPress Performance Solutions

Transcript

[00:00:19] Nathan Wrigley: Welcome to the Jukebox Podcast from WP Tavern. My name is Nathan Wrigley.

Jukebox is a podcast which is dedicated to all things WordPress. The people, the events, the plugins, the blocks, the themes, and in this case, how Cloudflare outages impact the web, and WordPress performance solutions.

If you’d like to subscribe to the podcast, you can do that by searching for WP Tavern in your podcast player of choice. Or by going to wptavern.com/feed/podcast, and you can copy that URL into most podcast players.

If you have a topic that you’d like us to feature on the podcast, I’m keen to hear from you and hopefully get you, or your idea, featured on the show. Head to wptavern.com/contact/jukebox and use the form there.

So on the podcast today, we have Saumya Majumder. Saumya is the lead software engineer at BigScoots with a deep specialization in high performance WordPress engineering and advanced Cloudflare powered architectures. Throughout his career Saumya has built large scale systems ranging from custom caching engines, to migration tools, worker based automations, and edge computing solutions. He’s played a pivotal role at BigScoots overseeing enterprise customers, and developing scalable developer friendly solutions that push the boundaries of hosting for WordPress.

We begin our conversation with a timely discussion about a major Cloudflare outage that recently rippled across the internet. Saumya explains what happened behind the scenes, the nature of these kinds of global infrastructure hiccups, and why, even with the most robust systems in place, some downtime is simply inevitable. He offers valuable insights into how BigScoots is able to mitigate these issues for their customers, even automating rapid failovers to keep sites online during outages.

We then move on to explore some of the innovations that the team at BigScoots have been working on. They focus upon site speed and reliability. This includes CDN level page caching, and their close integration with Cloudflare Enterprise. Saumya breaks down how this caching differs from traditional server based caching, and how it ensures that users around the world get fast, local access to website content.

If you’re curious about how hosting companies manage such advanced caching strategies and how Cloudflare might fit into the hosting jigsaw, this episode is for you.

If you’re interested in finding out more, you can find all of the links in the show notes by heading to wptavern.com/podcast, where you’ll find all the other episodes as well.

And so without further delay, I bring you Saumya Majumder.

I am joined on the podcast by Saumya. Hello, how are you doing?

[00:03:04] Saumya Majumder: Hey, I’m doing well. How are you doing?

[00:03:05] Nathan Wrigley: Yeah, very well, thank you. So this is going to be an interesting conversation. I got put in touch with Saumya via Tammie Lister, who has been communicating with Saumya over the last period of time. I don’t know exactly for how long. But the idea is that we’re going to talk about what they’re doing over at BigScoots and the interesting innovations that they’ve got.

By pure coincidence, the day before we recorded this, the Cloudflare, I’m going to call it fun, the fun that Cloudflare had with the entire internet happened. And so I think we’ll digress for a bit at the beginning of the podcast and talk a little bit about that as well, which was unexpected. But given that you are working heavily based upon Cloudflare, it’ll be interesting to talk that through.

Would you mind just spending a moment though, just introducing yourself. Just tell us who you are, what it is that you do at your current role, that kind of thing, and then we’ll get stuck into our conversation.

[00:03:55] Saumya Majumder: I’m Saumya. I work as a lead software engineer at BigScoots, specialising in high performance WordPress engineering and advanced Cloudflare powered architectures.

I also build large scale systems from custom cache engine to migration tools, worker based automations, edge computing and whatnot.

I also look after our enterprise customers, all of our internal WordPress projects and plugins and IPs. And I also build scalable, developer friendly solutions for our clients to ensure that they are getting the best service product out of it.

[00:04:29] Nathan Wrigley: Thank you very much indeed. Now, I’m just going to dwell on that for a little bit. A lot of that seems extremely technical, but also it kind of feels like that you went very much down a particular road very early on.

How is it that you ended up doing all of that interesting, but quite specific stuff? How is it that that happened? Is it something that you pursued out of college or something like that? How is it that you went down that path?

[00:04:51] Saumya Majumder: It’s an interesting question actually. So I remember, back in my second year of college, I started doing projects, like outside projects. So I started dabbling with PHP, like at the very early days of WordPress. So I get into the WordPress and I was like doing coding, changing things, pushing things to the core, tinkering with the WordPress. That was like way back in the days of the WordPress ecosystem.

From that, I was dabbling with PHP and other stuff. So that was like back in the days when I started, and then slowly I started seeing problems and how to solve the solution. So for example, a lot of the companies today, like CDN page based page caching, in today’s 2025 it’s like a very, pretty much common thing across the world. If you go to any premium hosting or any premium package, you kind of expect like CDN based page caching.

You know that that wasn’t the case, even like a few years back. It was like disk level page caching or RAM level page caching, like it’s all on the server. So me and one of my friends, whom I met online due to the WordPress coding things, we actually invented the CDN level page caching. So it wasn’t a thing before that. So there was a plugin that we created called Super Page Cache for Cloudflare that later got acquired by a different company called Optimole.

In that plugin we actually looked at like, okay, all the current solutions, like if you break down how the request is happening or how internet works, like you make a request from wherever in the world, that request then travels across optical fiber cable, blah, blah, blah, to the ISP data center. Then from there it goes to the data center, well, then it reaches the server from there on. If you don’t have cache, then the server has to populate the entire thing, get the response, give it back to you. If you have the cache, it can skip that work.

So we were saying that, you know, this is adding like a huge amount of latency, especially if you are, like the distance between the server and you is larger. Back then there was like MaxCDN, KeyCDN, and all of this provider who are like focusing on static files being served from the CDN.

So that was like already a thing, but we were like, okay fine. But like if static files coming from CDN, that’s great, but the main leap frog forward is if we can move the page. Like, literally serving the page HTML from the CDN itself. So if you are in Australia, the request doesn’t have to come to the US. Like, if it’s cached, it’s literally coming from your neighborhood.

So caching was one of the most complex problems that I kind of always loved solving because it was one of those unsolvable problems in the computer engineering world. So that’s how I like get into it, and then started. I broke a lot of things and fixed them and it’s like a journey. It’s hard to explain, but it’s like a journey of a lot of failure and a little of success, I guess.

[00:07:37] Nathan Wrigley: Yeah, I can imagine. Do you ever get the sense that you are approaching the destination or is this whole thing just, I’ll do this and then I know that in a week’s time, there’ll be something else that I can optimise. Is there ever a moment where you’ve thought to yourself, okay, that’s it, we cracked it for now? Or is it always just, no, there’s another thing?

[00:07:55] Saumya Majumder: It’s always a process, right? The technology is evolving. There’s way, way more to dig deeper. So one of the things we recently released was end DB protection caching. I’m going to talk about it in a moment and also login user caching. Both of these things were in my bucket list for years, and I have done like R and Ds, and R and Ds, and R and Ds to figure out exactly the way to do things. So again, you know, like it’s a process, right? And it takes time.

[00:08:20] Nathan Wrigley: Yeah, that’s lovely. Like you say, we’ll get into those bits and pieces. But as I said at the top of the show, by pure coincidence, we had this, let’s just call it a real collapse in a sense of what Cloudflare provides to the internet as a whole. And I think, depending on where you were and when you were awake in the world, I think for Europeans and maybe the part of the world where you are, it hit us right at the time when we’re all awake. I think maybe if you’re in North America, especially on the West Coast, you might have missed much of it.

But for most of the day here, everything on Cloudflare just declined to work. And it was really interesting how profound that was. And we’ve all heard this problem before. We’ve seen the little drawing of the great big tower built of Lego bricks, and there’s the one little brick at the bottom holding the whole thing up, and it’s called Cloudflare, or it’s called AWS or what have you.

Can you explain to us what the heck happened yesterday? Are you able to sort of get into, do you understand it at this point?

[00:09:15] Saumya Majumder: Yeah. So internet is a magical thing. It works by magic. If I get into explaining how it works, it’s going to be another thing. But the way it works is, and especially in case of Cloudflare, right? Like, a lot of people look at Cloudflare, that it is a CDN provider, like MaxCDN or Akamai or like any of these providers. But CDN is just one bit of Cloudflare. Cloudflare is like a, such a gigantic service that is like built on top of it.

So as a result, what happens is, when you have such a big system working together, there are lots of critical dependencies that happens. You have all these boxes, but all these boxes are depending on one of these config file, or one of these things that is coming from the layer below that, right?

And if anything happens in that one thing, the things at the top are working fine, but it cannot work because the one thing that is below it is gone.

I would also like to say there is no such thing in the world of internet that just works. Everything is supposed to break at some point in time. There’s no such thing. Be it Google, be it Azure, be it AWS, Cloudflare, anything it is. Even if you have your own data center and everything like that, like we have, there’s no way that, like a lot of things can happen even after you are prepared to mitigate all of those things, like you have failover, and a failover of the failover, and all these backup systems, still things can go wrong. Maybe that didn’t turn on, maybe that didn’t happen.

I saw a lot of memes yesterday on Twitter, like a lot of people were posting like, hey, I just joined as an intern at Cloudflare. I pushed a code and that happened. And I understand that it’s funny, but when you look deeper into it, it is actually not funny. It is really like a code red scenario. And trust me, no one, no company wants to get into that code red scenario. Because you have to understand, all of these companies are also dealing with a lot of enterprise customers to whom they have promised like 100% SLA or 99.99% SLA. And so when they don’t meet that, they have to pay a hefty amount of credit back to them.

So it’s not just the downtime and bad reputation and marketing and all that, it’s literal money being bled out of the company because of that. And it’s like all of those systems.

But at the same point in time, the way technology works, things can mess up. You can do multiple tiers of review of the code, you’re still going to miss a certain edge case scenario, which will only occur if this happened and that happened. And the probability of that happening is probably 0.00001%. But that 0.00001%, it’s not zero. It can happen.
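To put rough numbers on that "it’s not zero" point, here is a minimal sketch of how a 0.00001% chance per trial adds up over many trials. It assumes every trial is independent, which real deployments and requests are not, and the trial counts are illustrative rather than anything quoted in the conversation.

```python
import math


def prob_at_least_once(p_single: float, trials: int) -> float:
    """Chance that an event with per-trial probability p_single
    happens at least once across `trials` independent trials."""
    # 1 - (1 - p)^n, computed with log1p/expm1 so tiny p values stay accurate.
    return -math.expm1(trials * math.log1p(-p_single))


p = 0.00001 / 100  # 0.00001% written as a fraction (1e-7)

# Illustrative trial counts only (assumed, not from the episode).
for n in (1_000, 10_000_000, 1_000_000_000):
    print(f"{n:>13,} trials -> {prob_at_least_once(p, n):.4f}")
```

Under those assumptions, ten million trials already give roughly a 63% chance of hitting the edge case at least once, which is why "super low priority" cases still show up at internet scale.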

In the world of engineering, we call certain things that are super low priority, like it’s never going to happen. I’m not saying that it cannot happen, it can happen, but the probability of that is so low that spending engineering hours on that at this moment, where we have much more critical things to do, it doesn’t come up, right?

But sometimes things happen. And as a senior engineer, it happens like this. And in case of Cloudflare, what happened is as this is like a such a big system, even if they identified the root cause, let’s say that takes some amount of time for the engineers to figure out, and they push that. And you have to understand, a lot of people are sending requests, requests are going down, and they figured out the root cause. They’re pushing the fix and then like a boatload of requests is coming to Cloudflare.

So it takes time for everything to stabilise, you know? So it is bad. It is bad, but anyone who is thinking like, oh, Cloudflare is bad, if I move from Cloudflare to, I don’t know, X, Y, or Z, or something like that, it won’t happen, that is not the case. I have seen, like, a Tweet yesterday where somebody sent cold emails to people saying, Cloudflare is down, but we don’t use Cloudflare, we use our own VPS and dedicated server for that. And I was like laughing out loud. I’m like, I understand that, you know, your data center did not go down, but that does not mean that it can never go down.

[00:13:06] Nathan Wrigley: It’s kind of guaranteed. I think one of the interesting things that I saw was in the mitigation, the sort of summing up posts that Cloudflare created, there was this whole thing about this unexpected file which kind of doubled in size. It was supposed to be this size, but it doubled in size, and that got propagated. And then for a period of time, the ripple effect of that was that it looked like a DDoS attack. For a period of time it looked as if it may have been malicious actors.

And so the Cloudflare engineers, I think kind of went off, as it turned out, wrong headedly. They went off in the wrong direction, searching for the problem, which probably added a number of hours to the mitigation, and then kind of figured out what was going on. And then, like you said, the whole ripple effect is, it’s not like you turn off a computer, switch the computer back on, and Cloudflare is restored. There’s this whole propagation thing where you find the problem, mend the problem, the problem mitigates, and that is presumably going to take hours and hours and hours. And then you could just see the sort of downtime reports slowly repairing themselves over the internet.

[00:14:08] Saumya Majumder: And you have to understand that, as I said, Cloudflare, people think of Cloudflare as a, either a security company or a CDN company. But Cloudflare is way, way, way more than that, right? The CDN backbone that they have, it’s literally their backbone, the powerhouse on top of which Cloudflare builds their own thing.

So anytime they find a fix in what they call their control plane, you know, push the fix to their control plane, that has to get propagated across all of their edges. And Cloudflare has the highest number of CDN PoPs, you know? So it has to get pushed across all of these places, rebooted and all of these crazy things have to happen in order for everything to go properly. And then all the burst of traffic that is coming in, it has to handle that. It is a crazy thing.

But one of the things that I liked about Cloudflare is that, it’s not that this is the first time Cloudflare had a global outage. They had global outage before as well. There are two things I really love about Cloudflare.

Number one is that they’re super transparent. So anytime things go wrong or situation like this happens, they always push like a detailed blog article explaining exactly what happened, what they did to fix it, and how they’re making sure that this does not happen again in the future. And it never happens in the future.

So if you look at the previous global outage that they had, I think back in June, it was caused because there’s a thing called Cloudflare KV, which had a dependency on GCP. So when GCP went down, so KV went down and as a result the system went down. And from there on, they’re now working on to remove that dependency, building things internally in house to make sure that doesn’t happen.

Previously, there was another, I think last year or something like that, another global outage where the entire main data center went down. There was like multiple failover but the generator didn’t start and then this didn’t start and that didn’t start. And that caused like a huge failover scenario, I think, if you remember that, right?

And from there on, they make sure that, okay, we now have to make sure that we have multiple, that scenario is never going to come back. So they always work towards to make sure everything that happening never happens the second time. And it really does that. But at the end of the day, in the world of technology, things can go wrong. It’s just how it is.

[00:16:11] Nathan Wrigley: What’s kind of curious though, from an end user’s perspective, and you are going to explain to us some of the complexities of the inner workings of BigScoots and how it combines with Cloudflare in a minute, and that’ll be really interesting. But from a non-technical user’s point of view, it just feels like the sky is falling in because so much of the internet has collapsed, so many things that they’re familiar with.

So just a couple of examples which many people would be familiar with. So for example, if you were a user of the social network X, that completely failed. There must be a dependency on Cloudflare at some point there. Also ChatGPT, which is now becoming almost, it’s just a thing which almost everybody at some point of the day is plugged into, that went away.

But then it just rippled out across so many other things. News organisations go down. The ability to log into a variety of things went down. So it may be that your platform itself worked, but you might have had the Turnstile sort of CAPTCHA system, which Cloudflare run, enabled, and nobody could log into the proprietary platform that you got because the Cloudflare portion, the Turnstile wasn’t working and so on.

So it just had this enormous effect. And the sort of chilling effect of that is that people then, erroneously I think, sort of view Cloudflare in some way as a bit of a, I don’t know, a giant that needs to be brought to heel in some way. You know, we can never let this happen again, there’s too much dependency on this small group of massive organisations and what have you.

But by today, everybody’s forgotten that, you know, they kind of moved on with their lives and we’re back to what it was like on Monday. And so there’s no question in there, but I think there’s some insight that I’m sharing.

[00:17:41] Saumya Majumder: Oh yeah, absolutely. So there are a couple of very important things to understand here, right? So first of all, as you said, the people who talks about these kind of things on the social media, trust me, either they’re not engineers, senior engineers, or they don’t understand the problem.

And so these are the people who talks about this exact same thing where a few weeks back AWS went down, and then a couple of months back, GCP went down. And then they were like, well, Facebook went down, they literally just use this exact same word every single time something goes down. But things can go down. That’s like, you have to accept that and move on.

And that’s why when you get onto these enterprise deals with these big companies, they have this SLA agreement, like where they say, we grant to you, as I told you about earlier, right? So all of these companies, GCP, AWS, Cloudflare, if you are like a big enterprise customers of them, you have like an SLA agreement with them. Where they say, okay, we are going to guarantee that we’re going to give you 100% uptime, or 99.999999% uptime. And anytime they miss that mark, they have to pay back a huge sum of money as a credit to the customers saying, okay, we missed on our contract, so this is that credit back to you.
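For a sense of what those uptime percentages allow, here is a small arithmetic sketch. The function name and the 365-day period are assumptions for illustration; real SLA contracts define their own measurement windows and credit terms, which the conversation does not spell out.

```python
def allowed_downtime_seconds(uptime_percent: float, period_days: float = 365.0) -> float:
    """Seconds of downtime permitted over the period at a given uptime percentage."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - uptime_percent / 100)


for sla in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(sla) / 60
    print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99%, the yearly budget works out to a little under an hour, so an outage lasting several hours exceeds it many times over, which is where the credits come in.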

So you have to understand that anytime situation like this happen, it is not only a bad thing on the companies, on the marketing front of it, but it is also a bad thing on the financial side of things. Because you have to understand like all of these big companies, there are these smaller clients who are dealing with companies like, there are smaller clients and there are like giant clients, the enterprise customer who companies are really worried about. And for these giant clients, they have to pay huge amount of money back as credit because things didn’t come back within time. So it is not something that they are not worried about to fix immediately. They’re literally trying as hard as possible to fix that.

So that being said, now talk about the other points that you brought up, the turnstile, the WAF and the other things, right?

So as I said, Cloudflare is not just a security company. It’s like a huge thing. Cloudflare has a thing called Developer Platform where you can literally deploy your own APIs, your AI workload, your workflows, your entire React or entire application on Cloudflare, which is amazing. I use it. I love that platform.

And then that is one side of using Cloudflare, and then there’s another side of using Cloudflare like, for example, using BigScoots. You have let’s say a WordPress website that is hosted on BigScoots, but it is being proxied via Cloudflare to leverage their CDN, their security and all of those features.

So in a scenario like a WordPress site where you are not using Cloudflare as your host, so your Cloudflare is just there as a proxy, making sure that your origin IP is not there, your site is super protected and performance and CDN and whatnot. In that scenario, anytime this kind of problem happens, you can kind of, when this outage was there, the API was still working and we actually, for all of our customers, we leveraged our API to make sure that any request does not proxy via Cloudflare, but instead it just goes directly to our server just for the moment in time until Cloudflare is back in the game.

[00:20:42] Nathan Wrigley: Oh, so you could turn the proxy off via the API.

[00:20:45] Saumya Majumder: Via the API, yes.

[00:20:46] Nathan Wrigley: Right. So the fact that the rest of us couldn’t log in because Turnstile was down, we couldn’t authenticate into the Cloudflare network on the web. The API was still available, so you could turn the proxy off for a variety of your customers, and the domains and the websites that they had.

Oh, that’s really interesting. So they had a few minutes of downtime. Okay, that’s fascinating.

[00:21:04] Saumya Majumder: So what we did is when we saw this outage happening, anytime requests are coming in, it was a code red scenario on our end as well. All hands on deck. So anytime requests are coming in, like people are having problems, we immediately turned off the proxying via the API to make sure that the site is up and online.

So that way the request is not going via Cloudflare anymore, it’s coming directly to us for the moment, until Cloudflare is back on track. And that helped us to mitigate the downtime as much as possible for the customer, even though Cloudflare was technically down.
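As a rough illustration of turning the proxy off through the API, the sketch below builds (but does not send) a request that sets the `proxied` flag on a single DNS record. The URL shape follows Cloudflare’s public v4 DNS records endpoint as generally documented, so check it against the current API reference before relying on it. The function name and the placeholder IDs are hypothetical, and the conversation does not say which calls BigScoots’ own automation actually makes.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_proxy_toggle_request(
    zone_id: str, record_id: str, api_token: str, proxied: bool
) -> urllib.request.Request:
    """Build a PATCH request that flips the `proxied` flag on one DNS record.

    Nothing is sent here; the caller decides when to call urlopen().
    """
    url = f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}"
    body = json.dumps({"proxied": proxied}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder identifiers (assumed), not real values.
req = build_proxy_toggle_request("ZONE_ID", "RECORD_ID", "API_TOKEN", proxied=False)
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req)
```

With `proxied` set to false, DNS answers with the origin address and traffic bypasses the CDN, so the origin is exposed for as long as the flag stays off. Setting it back to true restores the proxy once things are stable.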

But if you would have been hosting your Nuxt or React or Next.js kind of application on Cloudflare, where you are using Cloudflare workers and things like that as your host, in that scenario, you couldn’t push anything.

[00:21:49] Nathan Wrigley: Yeah, the API is not going to help you.

[00:21:51] Saumya Majumder: Yes, yeah. It was bad but it’s going to happen. It can happen.

[00:21:54] Nathan Wrigley: Yeah, I think that’s kind of the message, you know? Nothing that humans create is immutable. Everything has a moment of breaking. But, you know, if you were to cast your mind back until, well, just Monday when everything was, you know, just plain sailing, Cloudflare was working as normal, then everybody was entirely happy. We had this period of time, it was maybe something like 8 hours where everybody’s kind of throwing their arms in the air and, you know, moaning on whatever social networks are still working.

But now we’re onto Wednesday, that whole thing is long behind us. That ship sailed, whatever, move on. Confidence, I think basically what you’re saying is you can be confident in Cloudflare. They’re going to have hiccups because they’re like any other company, things will go wrong.

[00:22:33] Saumya Majumder: Everything can have hiccups. So it’s not just, so you have to understand this, right? Again, I’m saying that Cloudflare is not just a CDN provider, but if you look at Cloudflare and all the things that they do, the complexity of it is like mind-bogglingly crazy, you know? Like it’s immense, immensely complex. It makes things super easy for you. Okay, you just toggle this on and it’s done. But if you look under the hood, and all the things and chains it has to go through, and that happens in a blink of milliseconds, it’s crazy complicated.

As I said, right, like I’m not saying that Cloudflare is bad. I think Cloudflare is amazing because two things, they have super transparency, so anytime anything happens, the blog article that you are like referencing here, they didn’t hide behind anything like, oh, it was not my problem, like not doing the blame game thing. No, no, no. Like, it was our problem. This is the problem.

For example, in that blog article, they could have completely not talked about the DDoS thingy, right? They could have just said, oh, this was the configuration file problem. We fixed this, it’s done. But no, they actually literally walk you through how exactly they processed the problem, which is really great. And then they actually learn from their mistakes to make sure that particular mistake never happens again, while they are like growing rapidly and building things, pushing things like crazy, like always pushing new things, which is like amazing to me.

[00:23:50] Nathan Wrigley: I think the article even started, if it wasn’t the first set of words, it was definitely in the first couple of sentences. It was something like, we let you down. It was full ownership, I think. So bravo to them.

And you’re right, the complexity behind it, you know, like you said earlier, the internet, the fact that anything works on the internet is an utter miracle of engineering, of computer engineering.

You know, the fact that we’re on a platform that we are staring at each other. I can see your image, you can see my image, you can hear my audio, I can hear your audio. You are on the, a different side of the planet, but it’s happening like you’re stood next to me. And the millions of packets of information that have flown during the course of this conversation, it’s insane. And Cloudflare add a whole layer of other stuff on top of that, which makes it even more insane.

[00:24:33] Saumya Majumder: Yeah. And you have to ask the question, like, why all these big companies are using Cloudflare like if it is so bad. Because they are doing things that nobody else even thinks about doing at scale. And it’s like mind-bogglingly crazy. It’s crazy.

[00:24:46] Nathan Wrigley: Yeah, yeah, it really is. So we’ll leave that for another day. But obviously over at BigScoots, you’ve really attached your wagon, if you like, to Cloudflare. And when you agreed to come on the podcast to talk to me, it became obvious to me that the pay grade that you are at is very different to the pay grade that I’m able to keep up with.

So we’re going to talk about what you’re doing over at BigScoots. I’m going to try to keep up, but if I misunderstand something, or I have to ask you to repeat something, I hope that’s okay with you. But I’m just curious because Tammie Lister, like I said at the beginning of this episode, she’s somebody whose opinion I respect a lot, and she said that you are doing some really innovative, interesting things with your connections to Cloudflare at BigScoots. So just lay out some of the interesting engineering work that you’ve been doing. I’ll try to hold on.

[00:25:30] Saumya Majumder: First, Tammie is great. Tammie is amazing. But yeah, I mean, I think BigScoots have been one of the first to utilise Cloudflare Enterprise in the hosting world. I know we didn’t do any kind of huge marketing like other hosts, but we have been the first to leverage Cloudflare Enterprise in our hosting ecosystem. And it was such early days, like back then, all of these things, this market wasn’t there. So we were building things that people didn’t even test out.

So as I said in the beginning, like I, along with one of my colleagues, we invented the CDN level page caching. This is way before APO and all of that. So all of those things actually build upon the architecture and systems that we built, including APO and the workers and stuff.

So at BigScoots, the Cloudflare thing, especially the Cloudflare Enterprise thing opens up a whole new door for us because it now allowed us to provide CDN level page caching for every single user at a super high cache hit ratio. I mean it’s like, every time you hit a page, chances of that getting, coming out of cache is much higher, compared to if you are, or like a free plan or any other plan, right?

So that was the beginning. And on top of that, we build our own proprietary plugin called BigScoots Cache, which allows you to not only leverage and take advantage of the Cloudflare page caching, but giving you the ability to fine tune every aspect of page caching that you would like on webpage.

[00:26:56] Nathan Wrigley: I’m going to pause you right there. Firstly, because I’m sure that almost everybody in the audience, because they’re WordPress aligned, is going to understand what a cache is. They’re going to understand this process of kind of, okay, let’s remember something for next time so that when we need it next time, it’s kind of ready. But they may not understand how Cloudflare does this on their Enterprise plan.

So what is it that’s different? Because we may be familiar with, I don’t know, a WordPress plugin and we’ve got some idea that there’s a cache. It’s sitting on the server somewhere in a file, it’s an HTML file or something like that. You are describing something not in one location, but like really just spread globally so it’s ready at the point of least distance from wherever somebody is. So tell us a bit more about that.

[00:27:35] Saumya Majumder: So let me explain that with like an analogy, right? So before CDN level page caching, I think pretty much everybody would remember, like we used to have caching plugins. I’m not going to name anything, but they were caching plugins. So when you turn them on, what they essentially did was they would create like an advanced-cache.php. You have everything of that file inside your WordPress installation.

What that used to do is, when you send a request, let’s say you are in Australia, right, and your server is in US, so you want to open example.com, and that request flows through under the ocean, it goes to the data center, it goes to the server, the server receives the request, it starts processing that, runs all the database queries and all of that, and then it gets the HTML to show it to you.

Back then what it used to do is then, advanced-cache.php would kick in, it would create a copy of the HTML, store that locally on the server so the next time if someone requests for that page, instead of asking the server, hey, please process the PHP and database and all of that, it would require much less amount of server resources because it’s just like, WordPress is like warming up. The request goes to advanced-cache.php, then it says oh, I have that cache file, sends that cache response back to you.

But even in this scenario, if you are making this request from Australia and your server is in the US, you have to understand that the latency is very high, because the request has to go from Australia to the US and then whatever gets there, you know, the response from there has to come back from the US to Australia. So the traversing time is pretty high.
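
Editor's sketch: a back-of-the-envelope estimate of that travel time. The figures are assumptions for illustration only: light in optical fibre covers roughly 200,000 km per second, the Sydney to US path is taken as 12,000 km, and a nearby CDN PoP is taken as 50 km away. Real routes are longer and add processing time, so these are lower bounds.

```python
FIBRE_SPEED_KM_PER_S = 200_000  # approximate speed of light in optical fibre


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre, ignoring all processing."""
    return 2 * distance_km * 1000 / FIBRE_SPEED_KM_PER_S


far_origin_ms = min_round_trip_ms(12_000)  # assumed Sydney to US distance
nearby_pop_ms = min_round_trip_ms(50)      # assumed distance to a local PoP
```

Under these assumptions the distant origin costs at least about 120 ms per round trip, while a local PoP costs well under a millisecond, before any server work is counted.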

From there on, and back then we are thinking about MaxCDN, you know, KeyCDN and like putting static files on the CDN so that, yes, the page is being generated by the server, but the static files are being served literally where you are. Like, if you are in Australia, in Sydney, so maybe the CDN PoP in Sydney is like, when you make a request for that, the static file is coming from Sydney.

That’s where we thought about, what if we can put this page HTML, instead of in the server, we can put it on the CDN? There were two benefits out of this. First, it is insanely fast. Because if this page HTML is across the world, so if you are in Sydney making the request and the request is like, oh, okay, I have this page cached with me, here you go, the response, you get that in like less than 100ms, you know?

Same thing happens for someone sitting in India and Germany and some other places of the world, because it’s cached across the globe. So it’s not just coming from a single place. And anytime it is not cached, the request goes to the server, HTML processed, and by the time the response is sent out, it got cached. It’s cached across the world.

Now, that was the page caching part of it, right? And then there’s other things, the object cache and OPcache, that’s like whole another different level. But I’m not going to get into that. I’m just going to stay with, because then it’s going to get way too long.

So that’s where this object caching and Cloudflare Enterprise came into play, right? Cloudflare Enterprise then allowed us to make sure that we can cache all these pages across the globe with a very high cache hit rate. Cache hit rate means, when something gets cached somewhere, let’s say someone makes a request to that file and that cache is expired from there and it’s not there. So the request, again, has to go to the origin and get processed and come back to you.

So that is generally the case with the lower tier plans with Cloudflare. So with Cloudflare Enterprise you get a very high cache hit ratio. So when it’s getting the cache, it stays on the cache for a very long time. On top of that, we got tiered cache and regional tiered cache and all of those crazy things.
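
Editor's sketch: cache hit ratio, as the term is commonly used, is the share of requests answered from cache rather than from the origin. The function name and the sample numbers below are illustrative assumptions, not figures from BigScoots or Cloudflare.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


# Hypothetical traffic: 950 requests served from cache, 50 sent to the origin.
ratio = cache_hit_ratio(hits=950, misses=50)
```

A higher ratio means fewer requests reach the origin server, which is the goal of the tiering described next.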

What that means is, we have tiering systems. So when you make a request, the request first gets cached in the upper tier. And when a lower tier, so let’s say, how can I explain this to you? So let’s say you are in Phoenix, okay? And in Phoenix there’s a data center, or a PoP that is called, in case of CDN, a PoP is there in Phoenix but the upper tier PoP is Chicago.

So let’s say someone made a request from Chicago, the page was cached in Chicago data center, okay? Now, as we have this tiered cache system, when you, from Phoenix, are making the request, instead of that PoP directly sending the request to the origin, it would first ask internally within the intranet of Cloudflare, not the internet, okay? The intranet of Cloudflare. The internal network, like, hey, does anyone in the upper tier have this page cached? And if they say yes, they would fetch it from the upper tier, which is like crazy fast because there’s no traffic, and it’s like an internal network of Cloudflare.

And if it does not, then it passes on the request to the upper tier, because the upper tier is the only one who has the power to pull the request from origins. It goes to the upper tier. Upper tier pulls it from the origin, creates a copy in the upper tier, and then sends it back to the lower tier. So in that way, in the tiered architecture, it makes sure that the cache hit ratio is insanely high.
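
Editor's sketch: the tiered lookup described above, reduced to two tiers. This is a simplified Python model under stated assumptions, not Cloudflare's implementation. The class names are hypothetical, Phoenix and Chicago come from the conversation, and the Mississippi PoP sharing Chicago as its upper tier is assumed.

```python
from typing import Dict


class Origin:
    """Stand-in for the hosting server; counts how often it is asked."""

    def __init__(self) -> None:
        self.requests = 0

    def fetch(self, url: str) -> str:
        self.requests += 1
        return f"<html>{url}</html>"


class UpperTier:
    """Only the upper tier is allowed to pull from the origin."""

    def __init__(self, origin: Origin) -> None:
        self._origin = origin
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._cache:
            self._cache[url] = self._origin.fetch(url)
        return self._cache[url]


class LowerTier:
    """A local PoP: checks its own cache, then asks its upper tier."""

    def __init__(self, upper: UpperTier) -> None:
        self._upper = upper
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._cache:
            self._cache[url] = self._upper.get(url)
        return self._cache[url]


origin = Origin()
chicago = UpperTier(origin)
phoenix = LowerTier(chicago)
mississippi = LowerTier(chicago)

phoenix.get("/pricing/")      # cold everywhere: Chicago pulls from the origin
mississippi.get("/pricing/")  # served by Chicago, the origin is not contacted
phoenix.get("/pricing/")      # served from Phoenix's own cache
```

Three visitor requests result in a single origin fetch, because every lower tier miss is absorbed by the shared upper tier.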

[00:32:24] Nathan Wrigley: Let me just sort of read that back to you just to make sure I’ve understood. And I’m imagining that, the simplest way my head is understanding that is a bunch of concentric circles. So in the center is me, and I wish to find something on, let’s say, the outer circle. So the first thing I’m going to do is go to my inner circle, and if the inner circle doesn’t have it, we need to go to the next circle out, and the next circle out, and the next circle out.

Now in the old world, if you like, or the non-enterprise version of Cloudflare, at some point we have to go further out of the circles in order to find what it is that we’re looking for. But what I think you are saying is that on the enterprise level, that outer circle is constantly pushing things towards the inner circle on a much more local basis. So rather than having to go out circle, another one, another one, another one, it can just hop one circle out, get what it needs, and then hop right back. In other words, every single thing is always closer, geographically, than it would be in any other setup.

[00:33:22] Saumya Majumder: Yes, and on top of that, if you look at the opposite architecture of this, right? So imagine you are in Phoenix, Phoenix doesn’t have it in cache. Phoenix sends a request to origin, now someone from Mississippi makes a request, they don’t have it in cache, their PoP makes a request too. So all these PoPs are making requests to the origin because they don’t have it in their own local cache, which is bad because that would then mean the request to the origin would increase dramatically, which we are trying to reduce.

But in this sense we have, imagine like a fixed set of upper tier data centers, then we have like a middle tier and then the lower tier, right? So if lower tier doesn’t have it, it asks the middle tier, middle tier checks if any of the middle tier across the world have it. If they do, immediately send it. And that’s happening within the internal network of Cloudflare and not on the open internet, okay? It’s like crazy fast.
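
Editor's sketch: the difference in origin load for one page when every cache is cold. The mapping below is a hypothetical example. Phoenix pointing at Chicago comes from the conversation, while the Mississippi entry is an assumption made for illustration.

```python
from typing import Dict, Iterable

# Hypothetical mapping of lower tier PoPs to their upper tier.
UPPER_TIER_OF: Dict[str, str] = {
    "Phoenix": "Chicago",
    "Mississippi": "Chicago",  # assumed for this example
}


def origin_fetches_without_tiering(pops: Iterable[str]) -> int:
    # Every PoP with a cold cache goes to the origin itself.
    return len(set(pops))


def origin_fetches_with_tiering(pops: Iterable[str]) -> int:
    # Only upper tiers talk to the origin, so count distinct upper tiers.
    return len({UPPER_TIER_OF[pop] for pop in pops})


cold_pops = ["Phoenix", "Mississippi"]
without_tiering = origin_fetches_without_tiering(cold_pops)
with_tiering = origin_fetches_with_tiering(cold_pops)
```

With two cold PoPs the untiered setup hits the origin twice and the tiered setup once. The gap widens as more PoPs share the same upper tier.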

[00:34:11] Nathan Wrigley: Right, okay. So again, forgive me, I’m going to make a leap of faith here, I could have this wrong. I’m guessing that on the Cloudflare side, they have their own bespoke hardware to route all of this stuff. So like you said, you described it as an, it’s like an internet intranet, almost, the scale that they’re on. But they’ve got their own hardware, which will be able to route that information presumably more quickly, and with less, I don’t know, less latency than you and I might have.

[00:34:36] Saumya Majumder: Yeah, it’s an intranet, it’s not internet. It’s a private channel, right? So no one is talking there except for Cloudflare. And the best part of that is, so imagine let’s say you are making a request from Mississippi, and there is like an upper tier data center in Mumbai, India, right? So what happens is, even though it’s not cached in the US, it’s going to see that, okay, I have it cached in Mumbai, let’s take it from there instead of making a call to the origin, reducing the calls to origin, yeah.

[00:35:05] Nathan Wrigley: Okay, that bit I didn’t understand. So the entire network is aware of where the closest thing is even before it needs to have it. I got it. Okay. That’s fascinating. And do they own the cables? Do Cloudflare own the cables connecting these things?

[00:35:18] Saumya Majumder: Yes. Yes, they have their own data center, their own backbone, all of that. And on top of that, like at BigScoots we even have direct physical connections to Cloudflare servers. That’s called CNI. That’s like a next step. So again, let me kind of paint a picture. This is you as a user, right? This is Cloudflare sitting in the middle, acting as a reverse proxy, and this is origin, okay?

So the way it works is you make a request, right? So let’s say a request is received by this reverse proxy, Cloudflare. Then it processes that thing, whether it has to show you a WAF page, whatever the logic is, right? Does it have it in cache and all of that? You know, if it is not being blocked or challenged, do I need to show it from cache? Do I have it in cache? You’re talking to the internal network, all of that. And that’s happening in this middle tier, right?

And this middle tier is now connected to their entire Cloudflare chain, right? So if, let’s say Mumbai has it, and it pulls from Mumbai, give it back to you. So the request never goes to the origin, right?

Now, for whatever reason, you make a request to Cloudflare, Cloudflare checks its internal network, it doesn’t have it itself, so it has to make a request to the origin, right?

There’s the interesting part. This bit of connection that is you and the Cloudflare, that’s happening over the open internet, right? Because you are making the request, and the request goes by the open internet and lands at Cloudflare, right? And then this is your origin, so your Cloudflare to origin, right, that also generally happens by open internet. Cloudflare then makes a request, and that request goes by the internet and, you know, lands on the data center.

But here’s the magical part that we have done. As we run and own our own data center, what we have done is we have connected a physical cable, like literally optic fibre cable with super insanely high bandwidth with the Cloudflare servers, with our servers. So what happens is, anytime Cloudflare has to fetch something from our origin, instead of sending that request by the open internet, which could be slow, there could be congestion and whatnot, it then sends via that private network that we have created, that private optical fiber cable and lands directly to our origin. Like, oh, this is hosted on BigScoots. We need to talk to BigScoots. Okay, send via this channel, which is not part of the open internet. And boom, it gets there, comes back, it’s like insanely fast.

[00:37:32] Nathan Wrigley: Okay. How did that happen? Like, is that some sort of agreement that you have struck up directly with Cloudflare so that you can tap, you know, in a sense it feels like you’ve become a third party piece of their network infrastructure almost.

[00:37:47] Saumya Majumder: Think of it like, if Cloudflare is like one gigantic network, our systems are also plugged into their network so that they can use the intranet system to fetch data directly from us, instead of using the open internet, which is much slower, there could be congestion and whatnot. So it’s making that request between Cloudflare, the proxy, and the origin instantly fast.

[00:38:10] Nathan Wrigley: So how did that whole thing come about? How is it that you fell into this agreement? Because I don’t know if many other organisations do this, you know, outside of the web hosting space, maybe this is a typical thing where you could follow a roadmap from another company that had done it. I’ve not heard of this, so that’s kind of interesting. How did that relationship come about?

[00:38:26] Saumya Majumder: If you don’t run your own data center, it is very hard to do this.

[00:38:29] Nathan Wrigley: Yeah, I do not.

[00:38:30] Saumya Majumder: Yeah, because you have to literally connect your servers and routers and everything to the Cloudflare network, you know? So most of the hosting companies out there, they don’t run their own data center on their own space. They actually lease, what I call lease their hardware and services from other cloud providers. Whereas we run our, you know, our private cloud, our private system, our own data centers, you know?

So like, for example, some company could use AWS or GCP or Azure and then create their own flavor of it and run Cloudflare through it. So they actually don’t have physical access to those data centers or servers. Whereas we do. If we see something, we can literally pull up the drive, we can do things at our data center, we can change things, we can attach those things physically, which pretty much none of the hosting providers that I know of has access to.

[00:39:19] Nathan Wrigley: It’s so interesting. Honestly, we could go on about this for absolutely ages. But basically, the long and the short of it is, you’re making things as fast as it’s possible for electrons to be. In a distributed network where some things don’t know things, and other things do know things. It’s all an enterprise in trying to figure out how to make it so that everything knows everything as fast as it is possible for electrons to fly around through the optical cables that there are spread throughout the world.

[00:39:47] Saumya Majumder: I haven’t even described the servers.

[00:39:47] Nathan Wrigley: I’m nowhere near finished because I want to get into what it’s like for somebody using, we’re a WordPress podcast, so I guess at some point we need to sort of grind it into that. So how would it benefit just some normal human being who’s got a WordPress website? What does all of this clever technology that you’ve created and that you’ve combined with Cloudflare over at BigScoots, what does it bring?

[00:40:09] Saumya Majumder: It brings insanely fast speed. Insanely fast speed, super improved Core Web Vitals, and super DDoS protection and all of that. It brings all of that. And I don’t want to talk about this kind of thing, which I know the audience might not be interested in. I want to talk more about other interesting things that the users can use.

So I was talking about BigScoots cache, which is our own IP, right? So we created our BigScoots cache plugin to manage this entire Cloudflare caching system and work with that. And not just that, it gives you, if you are an advanced user, it literally gives you the ability to fine tune and manage every aspect of caching system that you want, every aspect of it.

So let’s say for example, we by default set the cache TTL, CDN cache TTL to let’s say X, but you have like a bunch of pages where you want, I want the TTL to be lower. There’s a hook for that. You can use that.
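
Editor's sketch: how a filter hook can lower the TTL for a subset of pages. WordPress hooks are PHP, so this Python model only mirrors the pattern of `add_filter` and `apply_filters`. The hook name `example_cdn_cache_ttl`, the default of one day standing in for "X", and the `/news/` rule are all hypothetical and are not the BigScoots cache plugin's real hooks or values.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

_filters: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)


def add_filter(hook: str, callback: Callable[..., Any]) -> None:
    """Register a callback that may change the value passed through a hook."""
    _filters[hook].append(callback)


def apply_filters(hook: str, value: Any, *args: Any) -> Any:
    """Pass the value through every callback registered on the hook, in order."""
    for callback in _filters[hook]:
        value = callback(value, *args)
    return value


DEFAULT_TTL_SECONDS = 86_400  # placeholder for the default "X"


def shorter_ttl_for_news(ttl: int, url: str) -> int:
    # Site-specific rule: news pages change often, so cache them for 5 minutes.
    return 300 if url.startswith("/news/") else ttl


add_filter("example_cdn_cache_ttl", shorter_ttl_for_news)

news_ttl = apply_filters("example_cdn_cache_ttl", DEFAULT_TTL_SECONDS, "/news/launch/")
about_ttl = apply_filters("example_cdn_cache_ttl", DEFAULT_TTL_SECONDS, "/about/")
```

The caching code keeps its default, and the site owner changes behaviour only for the pages they care about, without editing the plugin.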

Or maybe, let’s say, we have intelligent cache purging systems. So whenever you push a button to create a post or update a post or something like that, what happens is anytime you push that button, like publish or update, behind the scenes the BigScoots cache plugin is intelligently not only clearing cache for that particular page, but it also knows all the other important pages like taxonomy pages, like archive pages and all that, like author pages that are linked to that article, and then clearing cache for those as well.
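
Editor's sketch: working out which URLs to purge when a post is published or updated. This is an illustrative Python model, not the plugin's code. The `Post` class is hypothetical, the URL shapes follow WordPress's default `/category/`, `/tag/`, and `/author/` bases, and purging the front page is an added assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    slug: str
    author: str
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def urls_to_purge(post: Post) -> List[str]:
    """Return the post URL plus the listing pages that link to it."""
    urls = [f"/{post.slug}/"]
    urls.append("/")  # assumption: the front page lists recent posts
    urls += [f"/category/{name}/" for name in post.categories]
    urls += [f"/tag/{name}/" for name in post.tags]
    urls.append(f"/author/{post.author}/")
    return urls


post = Post(slug="hello-world", author="nathan", categories=["news"], tags=["cloudflare"])
purge_list = urls_to_purge(post)
```

Purging only this list, rather than the whole site, keeps the cache hit ratio high while still showing the new post everywhere it appears.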

So you can also use other hooks. So let’s say you have some fake archive pages that we have seen a lot. Let’s say you are using a theme where you are showing list of articles on a page, which is like technically a page where you are using like a short code, which is not like a real archive page. So the system doesn’t recognise it as an archive page, but you want to clear that page cache whenever something of this tag or this category is published. There’s a hook for that. You don’t have to do that yourself. If you come to us and tell us like, this is our problem, this is the problem, we can actually write the code for you and do it for you. Like, we can literally just set that up for you. We provide like fully managed system.

[00:42:10] Nathan Wrigley: So I’m guessing that the level that you’re at there is you’ve got to have a fairly deep understanding of the sort of caching infrastructure, or would what you are offering be available, not necessarily to deploy, but could anybody understand this with a rifle through your documentation or is it fairly, propeller hat, tinfoil hat stuff?

[00:42:28] Saumya Majumder: We have like a proper documentation for every single hook there is. At the very top we talk about, like this is for the advanced audience. And if you don’t know what hooks are and things like that, it is going to be hard for you to understand what’s going on. But if you know, if you are familiar with actions and filters and things like that, it is going to be pretty straightforward for you.

So that’s why I said, if you don’t know, but you have a problem, and that happens a lot of time, people come to us, we just literally just write a snippet and just make that happen for them.

So you don’t have to know all of that crazy things, you know? It’s there if you are an advanced user, the documentation is there, but if you are not, it’s also there. On top of that, BigScoots cache has its own REST API, which you can use to clear cache, like you can literally use BigScoots cache REST API to clear cache. Imagine you have built like a Laravel system, or some backend system where you are adding something to your e-commerce site and you want to clear cache. When that happens, you can literally leverage BigScoots cache REST API to do that. So that’s like the, on the end of BigScoots cache. Then inside our BigScoots portal.

[00:43:34] Nathan Wrigley: Ah, that was where I was going next actually. Go on, yeah.

[00:43:36] Saumya Majumder: Yeah. We have, I think we have the most advanced and fine-grained control to Cloudflare Enterprise that no one else in the industry provides. So I don’t know if you got a chance to look at our enterprise settings page. We really allow users to fine tune things exactly the way they want. So for example, let’s say, do you want to protect your login pages from bad bots and actors, so that they can’t DDoS that? There’s a toggle for that. Turn that on, it’s done.

You want to enable our own advanced hardening protection, which is not using Cloudflare hardening protection, it’s using our own proprietary algorithm for that. You want to use that, feel free. Turn on, that toggle is there.

You want to change your image optimisation settings, do that. You want to enable Rocket Loader? Every single thing, starting from cache settings, speed optimisation settings, there are like a bunch of things that you can play around with. You want to block AI bots, do that. You want to block bad bots, like managed challenge bad bots altogether, just turn a toggle, it’s done.

So we have so many settings there. I think, if you go take a look at just that settings, you would be blown away. Like, all the things that we allow our customers to customise and fine tune.

Let’s say, for example, you want to block requests from certain countries or continents, and now the setting is there. Just choose the countries or continents, requests are blocked. You want to managed challenge, you don’t want to block, you want to challenge the requests from certain countries and continents, you can just go to the settings inside our portal, choose the continents and countries from where you want to challenge. So you could have a combination. So you want to block requests from these countries and continents, challenge from these continents and countries and don’t do anything for the rest of them. So you can play around with this to a whole new level, like you can just do anything you want.
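
Editor's sketch: the block, challenge, or allow combination described above, written as a small decision function. The function name, the use of plain place names, and the rule that a block takes precedence over a challenge are assumptions for illustration. They do not describe how the BigScoots portal or Cloudflare evaluates rules internally.

```python
from typing import Set


def decide_action(
    country: str,
    continent: str,
    *,
    blocked: Set[str],
    challenged: Set[str],
) -> str:
    """Return 'block', 'challenge', or 'allow' for a request's location."""
    if country in blocked or continent in blocked:
        return "block"      # assumed precedence: block wins over challenge
    if country in challenged or continent in challenged:
        return "challenge"
    return "allow"


# Hypothetical settings: block one country, challenge one continent.
blocked = {"Thailand"}
challenged = {"Europe"}

from_thailand = decide_action("Thailand", "Asia", blocked=blocked, challenged=challenged)
from_germany = decide_action("Germany", "Europe", blocked=blocked, challenged=challenged)
from_canada = decide_action("Canada", "North America", blocked=blocked, challenged=challenged)
```

Anything not named in either set falls through to "allow", which matches the "don't do anything for the rest of them" case.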

[00:45:19] Nathan Wrigley: It’s absolutely fascinating. And it kind of makes me feel that your target audience would not be really the bricks and the mortars shop, the mom and the pop website?

[00:45:27] Saumya Majumder: There actually are. Yeah, like you won’t believe how many times we have got a request like, hey, you know what? In our analytics, we are seeing that we are getting a lot of requests from Thailand, and that’s like broken our tools like that, so I want to either challenge or block that. So we are like, you go to the settings, choose the Thailand, click save, it’s done. So it’s like as simple as that.

[00:45:45] Nathan Wrigley: Yeah, I’m kind of imagining though, that your kind of ideal customer, for want of a better word, maybe that’s the wrong wording, but would be kind of agencies, WordPress agencies, that kind of thing, who could obviously make use of this. They’ve probably got teams of people who can dedicate time to figuring out how BigScoots works, and maybe having a constant conversation with you to optimise the websites that they’ve got and, you know, maybe some of their clients are what we might call enterprise clients and things like that.

If that’s the case, there’s always this merry dance of agencies trying to find the perfect host and kind of figure out, okay, which company do we want to go with this year? And all of that. Do you make it straightforward for people to sort of come to you and say, okay, we’ve got 150 websites, it’s really important that we don’t have any downtime? Do you have some sort of onboarding, migration, something along those lines?

[00:46:30] Saumya Majumder: So we have a lot of enterprise customers, and for every single one of them we have a proper systematic onboarding flow. So that’s making sure that they do, we do migrations with zero downtime, have multiple peer reviews. Then if they have taken our performance optimisation packages and things like that, we would actually optimise their performance and speed metrics for them. And then if they have taken our engineering and services projects, then we would actually do all the, like if they have any technical problems, we would actually go and write code for them, solve their problems.

So we go very hand in hand with our enterprise customers doing onboarding call, making sure they’re happy from end to end. And whether that’s agencies or just normal enterprise customers, it’s for all of them.

And I also want to talk about the settings that you just talked about. So we build all of these things, keeping in mind that they are dead simple to use for anyone. But that doesn’t necessarily mean that they have to do it. A lot of the times customers come to us and they’re like, hey, we want to do this. As we provide managed support, we actually go into the exact same settings and do that. And that actually solves the problem a lot because now anybody can go to the settings and just do this. Be it our own team or, because it doesn’t have to be escalated, it doesn’t have to come to a specific team. Anybody can do that. And we are constantly growing the more things that people can do to leverage that out. And yes, agencies and enterprise are taking huge advantage of that.

[00:47:54] Nathan Wrigley: Yeah, honestly, it’s absolutely fascinating. You never know, hopefully you and I, our paths will cross at some point in the year 2026. Maybe I’ll see you in Mumbai or something like that.

But what I’m going to do is I’m just going to say, if you’re curious about any of this, I will provide links to everything that we talked about. So if you head over to wptavern.com and you search for the episode with Saumya, so S-A-U-M-Y-A, you’ll be able to find it over there. Honestly, I feel like we’ve just scratched the surface. I feel like there’s another 8 hours in the pair of us, really could get into the weeds of it.

But thank you so much for peeling back the curtain a little bit on what you’re doing and how it all works with Cloudflare. Thank you so much.

[00:48:28] Saumya Majumder: No problem. Thanks for having me.

If you’re curious about how hosting companies manage such advanced caching strategies, and how Cloudflare might fit into the hosting jigsaw, this episode is for you.

Useful links

BigScoots

Cloudflare

Super Page Cache plugin

Blog post about recent outage, 18th November 2025

Cloudflare for Enterprise

Introducing BigScoots Cache
