Even before COVID-19 drove a year’s worth of bandwidth demand growth in less than a month, network operators had been seeing a 50 percent annual increase in demand. 5G is expected to tax capacity even more. Steve Vogelsang, Nokia’s CTO of IP and Optical Networks, discusses the challenges and the solutions as telecom companies prepare for the future of wireless.
The following is a transcript of this conversation. Some parts have been edited for clarity.
Michael Hainsworth: Network operators accustomed to a 50 percent increase in demand year-over-year have found themselves delivering that data boost almost overnight, thanks to COVID-19 and shelter-in-place. And new technologies embedded into 5G will make expanding the network or accommodating a crisis that much easier and faster and more efficient. But what are the biggest challenges that come with meeting this exponential demand for new bandwidth? For insight, I turn to Steve Vogelsang, the Chief Technology Officer of IP and Optical Networks at Nokia. He says, “Supply will only meet demand if the network evolves.”
Steve Vogelsang: Yeah, it’s interesting because the recent weeks have been very challenging, specifically related to the COVID-19 outbreak. What we’ve seen is demand increase literally overnight. We typically see a 40 to 50 percent increase per year in networks. And with COVID-19 and people working from home, we’ve seen that jump go from happening over a year to happening over a period of just a few days. And this has had pretty fundamental impacts on the network. It came out of nowhere; we couldn’t have anticipated it. Even the best capacity planning wouldn’t have been able to deal with it.
Now what’s been interesting is the networks that we’ve built have responded very nicely to this demand. So we’ve seen this huge increase that’s happening in new areas of the network at new times of day, and our networks that we’ve built largely have responded to handle this. And I think it’s really a testament to our philosophy of designing our IP and optical equipment under the idea of performance without compromise. We want to make sure that as we’re adding capacity, we’re allowing our customers to do that without compromising performance and functionality.
In many ways, this is a precursor to what we see coming with 5G. If you think about 5G, what it’s doing is first increasing capacity through technologies like massive MIMO, and opening up new spectrum so we get better utilization of the spectrum and access to more spectrum. And that really is geared toward meeting the demand, this constant increase in bandwidth demand.
5G is also geared toward new applications. There’s new functionality, such as low latency and ultra-reliability, that is geared toward the more industrial type of applications, allowing industries to start utilizing the 5G network to find ways to better operate and run their businesses. If you think about that from a transport network perspective, the transport network is going to have to respond to new applications and new functionality that right now we can’t really predict. We’ll see how those evolve over time.
MH: Well, tell me more about that architectural evolution because we’re not just presently pushing the limits on bandwidth, but on distance too.
SV: That’s a big challenge, particularly for transport networks. If you think about the performance of an optical transport network, it’s really a function of capacity and distance. If we go longer distances, we typically have to dial back the capacity in order to get those longer reaches. If it’s shorter distances, we can increase the capacity on a given wavelength.
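The capacity-versus-reach trade-off described here can be sketched in a few lines of Python. The modulation names are common coherent formats, but the capacity and reach figures are hypothetical placeholders chosen only to show the shape of the trade-off. They are not vendor specifications, and `MODES` and `best_mode` are names introduced for this illustration.

```python
# Hypothetical (modulation, capacity in Gb/s per wavelength, reach in km) entries.
# The figures are illustrative placeholders, not measured or vendor-published values.
MODES = [
    ("16QAM", 400, 600),
    ("8QAM", 300, 1500),
    ("QPSK", 200, 3000),
]


def best_mode(distance_km):
    """Return the highest-capacity mode whose reach covers the distance, or None."""
    feasible = [mode for mode in MODES if mode[2] >= distance_km]
    if not feasible:
        return None
    return max(feasible, key=lambda mode: mode[1])


if __name__ == "__main__":
    for distance in (100, 1000, 2500, 5000):
        print(distance, best_mode(distance))
```

Under these assumed figures, a longer path forces a lower-capacity mode, which is the planning difficulty the speaker goes on to describe.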
When you think about the shifts in the network, it becomes very difficult to do capacity planning. Because if I need to extend bandwidth today over a long distance, and the application shifts tomorrow and it’s over a shorter distance, I may have the ability to flex the bandwidth. What we’ve really been looking to do with new technologies and new generations of optical technology is to see if we can help operators simplify that. So we really want to try to deliver a new performance paradigm where we can enable that capacity over really any distance.
MH: And when you mention these new industrial applications and the cloudification of Industry 4.0, distance becomes critical, but not on the long-distance side of the equation.
SV: Yeah, that’s exactly right. A lot of these industrial applications ultimately require very low latency. For years now we’ve been centralizing compute into more central data centers (and I’ve talked about enterprise cloud migration, which further centralizes some of the compute functions), but we’re now seeing a need to distribute other compute functions.
Those functions that are required to operate a factory need very fast response times so that a robot inside the factory has the instructions it needs. It needs to know where it goes. Or take an assembly line where you’ve got different equipment that needs to be coordinated. That requires distribution of compute. And so we want to centralize things to get efficiencies, but we’ve got to distribute them to enable these industrial applications. And you can imagine the implications on the network. It just makes things very, very complicated.
MH: Well then, how do we deal with the complexity in this evolution of the transport within a network, without increasing the complexity to the point where we can’t handle it? Or is the answer just throw it at machine learning and let algorithms handle it because we simply can’t do it ourselves?
SV: There’s a couple of things that we’ve been doing. One is to start thinking about the advances that we have in technology to simplify, rather than always pushing on the highest capacity across the shortest link. If you think about the optical world, what’s been happening for years is every new generation of coherent technology, people talk about the peak bit rate they can get across a very short fiber that really has no applicability in a real network.
What we’ve been doing at Nokia is looking at this problem a bit differently. With each new generation of silicon, with our new DSPs, with the integration of photonics into very small packages, we ask: instead of going after the latest performance highlight across a short fiber link, how do we use that combination of silicon photonics in small packages and next-generation DSPs to make the network simpler? So whether I’m going a long distance or a short distance, I get the exact same performance end-to-end by utilizing that technology. And that’s been a big focus area of ours.
The other thing you mentioned there was AI and machine learning. We’re starting to look at those technologies. We’ve been doing a number of experiments. A big area that we see a lot of potential in is utilizing machine learning to help operators understand what’s going on in the network.
If you think of a typical day in a NOC as an example, it’s just screens filled with alerts, alarms, information flowing from the network. It becomes impossible for the network operators to understand. And often what they do is they just dismiss a lot of that information because it’s overloading them. What we can do with machine learning is we can analyze all of that information.
We’ve started to find that we can very quickly take huge volumes of alerts, statistics, information coming from the network, and through machine learning systems make sense of it so that we can tell events that are occurring that may not produce exactly the same set of alarms and statistics but are similar. A machine can recognize that they’re similar and suddenly correlate these things. So if something we saw two days ago pops up again, where an operator wouldn’t really see that as the same event, machine learning systems can identify that. So you can very quickly determine what’s happening. And then most importantly, what’s the action that needs to be taken to address the situation and improve network performance?
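The idea of recognizing that two events are similar even when their alarms do not match exactly can be illustrated with a very small example. The sketch below uses Jaccard similarity over sets of alarm names, which is a much simpler stand-in for the machine learning systems described here and not the method the speaker's team uses. The alarm names, the event history, and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two alarm collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(new_event, history, threshold=0.5):
    """Return (name, score) of the closest past event, or (None, score) if below threshold."""
    best_name, best_score = None, 0.0
    for name, alarms in history.items():
        score = jaccard(new_event, alarms)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical past events, each described by the set of alarms it raised.
history = {
    "fiber_degradation": {"LOS", "BER_HIGH", "OSNR_LOW", "LINK_FLAP"},
    "card_overheat": {"TEMP_HIGH", "FAN_FAIL", "CARD_RESET"},
}

# A new event that shares most, but not all, alarms with an earlier one.
new_event = {"BER_HIGH", "OSNR_LOW", "LINK_FLAP", "LATENCY_HIGH"}
match, score = most_similar(new_event, history)

if __name__ == "__main__":
    print(match, round(score, 2))
```

A production system would work on far richer data (timing, topology, statistics) and learned representations; the point here is only that similarity, rather than exact equality, is what lets a recurring event be recognized.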
MH: So as we look at the evolution of the architecture of a wireless network from 4G to 5G, we move from the simple, “Let’s just pump as much as we can down a pipe as quickly as we possibly can” to a more contextual relationship with those ones and zeros. Let’s start right at the front lines with the radio access network. We used to talk about fronthaul, we used to talk about midhaul, now we’re talking about anyhaul.
SV: It’s a recognition that one-size-fits-all is no longer the best solution in networks. Given all of the challenges that we’ve already discussed, we need to have the right solution for the right locations in the network. And so in some cases that will mean fronthaul. Fronthaul allows the centralization of a lot of the radio resources. So you can imagine scenarios where you want to reapply that capacity to deal with events. An example that I often will use is something like a venue where you’re holding large numbers of people, but they’re not there all the time. And so with a fronthaul type solution, we can light up, if you will, radio capacity in that event. But when the event’s not on, we can redirect a lot of those resources for other purposes. It could be an event in another part of the city or it could be something else. So that’s one of the advantages.
Now the disadvantage of fronthaul is it requires a lot of capacity. You need very high optical transport in order to get into and out of those sites. That’s traditionally where backhaul was the right answer. With backhaul, we push all of the radio functionality out to the site, and the end result is you have a lower bandwidth connection. So you have lower costs in terms of the IP and optical transport, but that capacity now is fixed to that location. So this is why we’ve started to look at midhaul solutions. It’s sort of a compromise where we centralize some of the functionality and distribute others. And you also get a compromise in terms of the capacity required.
When we step back and say, “Well how will 5G networks get deployed?”, ultimately it’s a combination of all of those. There’ll be cases where the right answer is to use a fronthaul, other cases where the right answer is to use backhaul and other cases where the right answer is to stick to some form of midhaul. And over time, we’re going to see various types of midhaul.
So when we think about how we build these networks, we need to make sure that we can handle all of these scenarios. We can handle them all efficiently and hence we now talk about anyhaul. We build packet transport networks that can handle the time-sensitive networking requirements of things like fronthaul; can handle the new encapsulations and requirements like eCPRI as an example, for a midhaul style network; and of course have all the functionality required for something like backhaul, where you have higher layer IP and MPLS capabilities to handle the traffic. It’s a combination of all those and we constantly look to building those solutions and bringing them to the market.
MH: So I understand then why it’s important to have a common optical network fabric, but I can’t imagine it comes without challenges.
SV: Over time, we’ve made the optical network much, much more flexible in providing an end-to-end fabric. And that’s flexibility in terms of how we can route those optical signals. So we’ve gone to wavelength routing across multiple degrees. We’ve deployed many of those now. But there’s also flexibility in the size of those signals, something called flex grid. These systems (wavelength routing with flex grid) are being deployed. We continue to believe there’s significant value in utilizing this optical fabric. And one of the big things is really not complicated. People ask me, “What’s the value?” and simply put, there’s a big investment when you build an optical network in the transponders or transceivers, the coherent technology that’s used to get the bits onto and off of the optical network.
When you have an optical fabric and you do experience a fiber cut, what that ultimately means is we can take all of that transponder capacity and we can reroute it over different fiber paths. So you’re not losing the transponder capacity when you have a fiber cut, you’re able to retain it and just reroute it at the optical layer. And that’s really a huge benefit in terms of the cost required to build a highly robust network that can handle fiber cuts.
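Rerouting around a fiber cut can be pictured as a path search over the fiber topology with the cut link removed. The sketch below is a minimal illustration using breadth-first search on a hypothetical five-node network; the node names, the `LINKS` list, and the `shortest_path` helper are assumptions made for this example, and a real optical control plane would also account for wavelength availability and reach.

```python
from collections import deque


def shortest_path(links, src, dst, cut=None):
    """Fewest-hop path over undirected links, skipping the cut link if one is given."""
    cut_set = {frozenset(cut)} if cut else set()
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in cut_set:
            continue  # this fiber is down, so it is left out of the graph
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no surviving path between src and dst


# Hypothetical topology: a short route A-B-D and a longer route A-C-E-D.
LINKS = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]

normal = shortest_path(LINKS, "A", "D")
rerouted = shortest_path(LINKS, "A", "D", cut=("B", "D"))

if __name__ == "__main__":
    print("normal:  ", normal)
    print("rerouted:", rerouted)
```

In this toy topology the same transponders at A and D stay in service after the cut; only the path between them changes, and it becomes one hop longer.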
MH: And that’s all part of the challenge of planning for failure. How does a CSP take advantage of this technology to ensure that it doesn’t become the headache that we’ve seen in the past? As you point out, when a construction worker cuts a fiber line, entire regions freeze up, or at least they have in the past.
SV: That’s something that we can address now with the optical fabric and wavelength routing, where we can redirect onto a different path. If you think about the way that this would be dealt with without having that optical rerouting, you would end up having to invest in having sufficient transponder capacity and router capacity to deal with any particular fiber cut. It’s possible, you can do it, but it requires a different investment profile.
Now what I will say is the challenge when you’re doing an optical fabric with wavelength routing is, as I’ve mentioned, the performance of the optical network and the transponders is ultimately a function of the distance and the number of pieces of equipment that you’re traveling through. So when you’re rerouting, suddenly the equation changes, and you may be going a longer distance for some of those fiber-optical transmission lines.
Traditionally that would mean complete reconfiguration, changing modulation formats, all sorts of different things that you would have to do. In the end, it’s very hard to determine if we have enough capacity when we reroute. This is where we see an upcoming opportunity to rethink that process and say, “Hey, what if we could deliver coherent transponder technology that could maintain performance regardless of distance?” That’s an area that we’ve been looking at and we think we’ll be able to bring some pretty interesting solutions to market. And the end result for a CSP is, not only do you get protection when you have a fiber cut, but you know the network is going to continue to perform at the same capacity when you get a fiber cut.
MH: What though of trying to boost the amount of ones and zeros we can pump down a line? Ever since I started researching the Shannon limit, I’ve had Del Shannon’s Runaway stuck in my head. Let’s define the term, if not the musician.
SV: Shannon’s limit is essentially the maximum bits per second per hertz, or spectral efficiency, that you can get across a nonlinear fiber optic transmission. It will vary based on the distance, but you basically have a curve that says, “I can’t get efficiency in the fiber that exceeds that curve.” That’s something that Shannon calculated a number of years ago and it’s held true to this day.
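For reference, the classical form of Shannon's result for a linear channel with additive white Gaussian noise is shown below. The symbols are notation introduced here rather than taken from the conversation: C is capacity in bit/s, B is bandwidth in Hz, SNR is the signal-to-noise ratio as a linear power ratio, and η is spectral efficiency. Fiber nonlinearity, which the speaker mentions, makes the achievable SNR depend on launch power and distance, and this simple expression does not capture that.

```latex
C = B \log_2\left(1 + \mathrm{SNR}\right),
\qquad
\eta = \frac{C}{B} = \log_2\left(1 + \mathrm{SNR}\right) \ \text{bit/s/Hz}
```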
In the fiber optic transmission space, as you mentioned, we’ve been improving the efficiency of how many bits we can send down a bit of spectrum in the fiber pretty consistently. From 2000 until today, we’ve increased the capacity of the fiber by more than 200 times. If you measure against Shannon’s limit in dB, I think we’ve improved by something like 25 dB over the last 20 years, and we only have about three dB left. Every three dB, basically, you’re doubling the performance. We’re getting so close to that limit that it’s starting to become impractical to push the technology to try to get those last few dB of capability out of these links.
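The decibel figures quoted here are logarithmic power ratios, so the "every three dB is a doubling" rule can be checked directly. The sketch below only converts between dB and linear ratios; the function names are illustrative, and it makes no claim about how a given dB gain maps onto the fiber capacity increase mentioned in the conversation, since that depends on what is being measured.

```python
import math


def db_to_ratio(db):
    """Convert a decibel value to a linear power ratio: 10^(dB / 10)."""
    return 10 ** (db / 10)


def ratio_to_db(ratio):
    """Convert a linear power ratio to decibels: 10 * log10(ratio)."""
    return 10 * math.log10(ratio)


if __name__ == "__main__":
    print(f"3 dB  -> x{db_to_ratio(3):.3f}")   # very close to a factor of 2
    print(f"25 dB -> x{db_to_ratio(25):.1f}")  # a factor of a few hundred
    print(f"x2    -> {ratio_to_db(2):.3f} dB")
```

Because the scale is logarithmic, the roughly three dB said to remain corresponds to about one more doubling, compared with the eight or so doublings that 25 dB represents.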
This is why we’re changing our thought process. Because we know we can’t really go much further in terms of applying technology to increase fiber capacity, we start thinking about the realities that we’ll need to put more fiber into the ground, light up more fiber, complicate the networks because we have more links. So how can we apply the new capabilities and new capacity that we get from silicon, from silicon photonics, from our DSP technology, to simplify the network? That’s a big, big focus. And I think you’ll see over the coming years, more and more of the industry will start talking about network simplification rather than trying to push fiber capacity.
MH: With us pushing up against Shannon’s limit and the solution being more fiber, what about other means of simplifying this issue? We’re integrating coherent optics into a router. Is that part of the simplification solution?
SV: Yes, it is, potentially. It’s a hot topic today because what’s happened is we’re now starting to, as an industry, develop standard coherent interfaces. Traditionally, coherent interfaces were proprietary, which served the industry very well and allowed vendors the flexibility to really push the limits and get the most efficiency through innovation without being tied back to a standard. But because we’re getting so close to Shannon’s limit, it’s now becoming possible to say, “Well, okay, we may not get the absolute best performance, but we can create some standard interfaces.” And that opens the possibility of integrating more efficiently into routers.
One of the things that’s happened as we’ve gone through the evolution of coherent technology over the last few generations is the size of a coherent transponder has decreased. If we go back to 2009 or 2010, the early days of coherent, we had kind of double-wide cards that were state-of-the-art for a coherent interface.
That’s decreased over time to the point where we now have pluggable modules. So you have a CFP2-DCO coherent module. And now the latest, this new standard that’s been developed in the OIF, is something called a 400ZR module. And that can fit potentially into a QSFP-DD form factor. That’s significant from a router perspective because it’s really the first time that we have a pluggable coherent transceiver that matches the capacity of the router. Traditionally they were larger than a typical router interface and so you’d have to give up some capacity.
Now that we’re seeing that gap close, where potentially I don’t have any density trade-off in the router, the industry is all abuzz talking about, “Okay, what does this mean and how do I re-architect the network with coherent optics directly plugged into the router? And ultimately, what is the value?” And this is something I think we’re working through as an industry. There are some very basic use cases for 400ZR optics where there are routes, or interfaces, where I may just want a direct point-to-point connection between two routers. 400ZR can do that at 400 gigabits over a pretty decent fiber span.
MH: So if there’s no density trade-off in this scenario, there’s got to be a trade-off somewhere. What do we need to know?
SV: There are still rather significant performance trade-offs that we see. The 400ZR technology that can be integrated into a router today is really designed for a single point-to-point span. A single point-to-point fiber span could be literally a single 400ZR interface, just getting 400G of capacity on the fiber. It does have the option to add DWDM multiplexing and some amplification to extend the reach a little bit further, maybe 200 or 120 kilometers, and that could deliver fairly significant fiber capacity. But again, the trade-off there is it’s a single point-to-point fiber span, and that’s what 400ZR is designed to do.
MH: So as we see the architectural evolution and it shows us the value of a common optical network fabric, what though of a CSP who says, “I’ve got gear from other people, it just sounds like you want to sell me more of your gear”?
SV: Well, one of the things that’s happened with the optical fabric over the last few years is we’ve gone to, what we call, open line systems. And so we’re increasingly building networks that will have an end-to-end optical fabric that is carrying traffic from multiple vendor transponders, and that will continue over time.
I’d say at Nokia, we’ve been very much committed to the concept of open networking. We recognize ultimately that for a CSP, they really need the flexibility to be able to select the vendor that delivers the best solution for them. Whether they’re looking at the price per bit, whether they’re looking at the long term value, we obviously think that we can deliver those solutions, but we recognize that our customers will want the ability to assess and bring other vendors into the network where it makes sense. And so we’ve been committed to those open types of structures, including with the optical line system, where we have open line systems today.
MH: So as we’re seeing this traffic increase, it sounds like the solution, when we’re not seeing a commensurate revenue increase, is this evolution of the architecture of the network.
SV: Yeah, that’s correct. There’ve been a number of things that have been happening over the last few years that have enabled operators to keep up with this 50 percent increase. And we see some more opportunities to continue that.
The first thing is that as vendors, we’ve been driving down the cost per bit of our equipment, and that’s through new generations of silicon. So we will continue to drive down the cost per bit, drive down the power per bit. And that’s a big factor in helping operators keep up. It’s not the only factor.
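To put the 50 percent figure in perspective, the compounding can be worked through directly. The sketch below assumes, purely for illustration, that an operator wants total transport spending to stay flat while traffic grows at a fixed annual rate; that flat-spending target and the function names are assumptions introduced here, not figures from the conversation.

```python
import math


def doubling_time_years(annual_growth):
    """Years for traffic to double at a fixed annual growth rate (0.5 means 50 percent)."""
    return math.log(2) / math.log(1 + annual_growth)


def required_cost_per_bit_decline(annual_growth):
    """Annual fractional drop in cost per bit that keeps total spending flat."""
    return 1 - 1 / (1 + annual_growth)


if __name__ == "__main__":
    growth = 0.5  # the 50 percent annual increase cited in the conversation
    print(f"doubling time: {doubling_time_years(growth):.2f} years")       # about 1.71
    print(f"cost per bit must fall: {required_cost_per_bit_decline(growth):.1%} per year")  # about 33%
```

Under that assumption, traffic doubles roughly every 21 months and cost per bit has to fall by about a third each year, which is why equipment cost reductions are described as only one of several factors.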
Another key thing that’s been happening is re-architecting the network to remove layers, to remove hops in the network. There’s been a big push, if you look at the way the internet is structured, toward more distributed peering. So for an access provider, the distance between where traffic comes onto their network and where they need to deliver it to their consumers has decreased over time, as the big content providers have distributed their backbones and moved to denser peering. We see that continuing to a certain degree.
We’re also seeing a de-layering of the network. We used to have redundancy schemes that would span across both the IP layer and the optical layer. We’re now building end-to-end networks where we think through the various redundancy schemes and ensure they’re implemented in one layer or the other. That provides a pretty significant saving. So we’ve been able to keep up with this bandwidth. I think we can continue to do that. It just requires us to not only drive down the cost per bit of the equipment that we deliver, but also constantly reassess and reevaluate the network architecture to make sure we’re extracting the efficiencies that are necessary to continue meeting the demand.