Accelerating AI innovation through data centre digital twin simulations

24 April 2024

Stuart Redshaw, CTIO and co-founder, EkkoSense


While consultancies such as McKinsey acknowledge the strong business case for deploying AI applications, they also recognise that establishing a clear route to unlocking GenAI value is challenging. That’s why we believe an entirely new approach to data centre digital twin simulation is necessary when it comes to accelerating AI adoption in the data centre.

Of course, the deployment of a new generation of AI applications represents a significant opportunity for businesses to differentiate and gain market advantage. Consequently, there’s huge pressure to get things right! It is critical that your organisation’s IT infrastructure and data centres are ready to support GenAI apps, and that you’ve made all the right preparation choices.

So for data centre teams tasked with responding to the rapid pace of change triggered by continued AI innovation, what’s really needed are straight answers to some basic questions. Given that high-density AI workloads can easily hit 10kW per rack (and that figure will keep rising), data centre infrastructures will inevitably be stretched – but by how much?
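To make the “by how much?” question concrete, here is a back-of-envelope sketch (illustrative figures only, not EkkoSense’s model) of how a hall’s heat load grows as racks move from a typical enterprise density to a 10kW AI density, and when that load overtakes an assumed installed cooling capacity:

```python
def hall_heat_load_kw(n_racks, ai_fraction, base_kw=5.0, ai_kw=10.0):
    """Total IT heat load (kW) for a hall where `ai_fraction` of racks run AI workloads."""
    n_ai = round(n_racks * ai_fraction)
    return n_ai * ai_kw + (n_racks - n_ai) * base_kw

# Hypothetical hall: 200 racks, 1,200 kW of installed cooling capacity.
cooling_capacity_kw = 1200.0
for frac in (0.0, 0.25, 0.5):
    load = hall_heat_load_kw(200, frac)
    headroom = cooling_capacity_kw - load
    print(f"{frac:.0%} AI racks -> {load:.0f} kW load, {headroom:+.0f} kW headroom")
```

Even this crude arithmetic shows the shape of the problem: at 25% AI racks the hypothetical hall is already past its cooling capacity, which is exactly the kind of bottleneck a proper simulation needs to surface before any hardware is bought.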

Key questions that need clarifying here include: ‘how can I predict and maintain optimum performance?’, ‘what equipment do I need to replace, and when?’, ‘what spaces can I use for high-density loads such as AI?’, ‘where are my capacity bottlenecks?’, and ‘what is my best cooling strategy?’.

Given the scale and cost of this challenge, it’s simply not practical for IT teams to build full-scale test facilities, particularly when there are so many questions and variables that need addressing fast. So, whether you’re a CFO validating potential AI investments, a CIO trying to identify capacity bottlenecks for increasing IT loads, or a data centre manager looking to specify the most appropriate cooling strategy, some way of simulating potential AI data centre scenarios before committing to major infrastructure investments is essential.

Simulating for innovation

First, it’s important to understand the type of simulation you’re looking to undertake. When you consider a traditional simulation application such as a flight simulator, you’re dealing with a solution whose purpose is to train pilots to understand specific controls and perfect precise manoeuvres in given conditions in order to fly safely and efficiently. Everything is defined; it’s all about practice, repetition and learning, and the whole process – despite being highly technical – sits at the low end of the innovation scale.

What we’re doing at EkkoSense with data centre digital twins operates at the other end of the simulation spectrum for innovation. Whereas flight simulators or Olympic cycling simulations aim to be unbelievably accurate so that repeat training effectively becomes mind and muscle memory, we’re doing something else entirely.

Our focus has been on providing organisations with comprehensive grey space modelling and simulation across their full, end-to-end data centre infrastructure. We wanted to provide IT teams with a data centre digital twin approach that would extend beyond the white space, enabling them to respond to their AI infrastructure concerns by carrying out ‘what-if’ analysis queries, tracking the performance of their current capital investments, simulating extensions to existing capacity, or even planning entire new builds.

The target has been to rapidly simulate data centre change in response to AI deployment plans. The approach has been to draw on first principles maths and physics models so that businesses could create precise cooling, power and capacity simulations with accuracy levels running at 99% plus. However, we knew this kind of AI sandbox simulation couldn’t deliver if we dialled in too much detail.

While it would be possible to create data centre digital twins that simulated every single switch position, setpoint and power circuit, the models would have to be unbelievably sophisticated to get every detail exactly right. That’s why we’ve focused on creating an innovation tool for data centres – one that uniquely enables people to ask questions about rapidly evolving issues such as ultra-high density AI compute deployments, but that doesn’t overlook the associated heat transfer, thermodynamics, energy and cost components that need to underpin every decision.

The whole point of an innovation tool for data centres is to allow businesses to simulate change quickly, note the broad effects and results, and draw whatever conclusions they need to. Thanks to the accessibility of our EkkoSim data centre digital twin software, data centre teams can accurately analyse and predict the most suitable designs, map metered data against predictive models, and triage those data centre assets that would benefit most from configuration assessments. Critically, this new approach also equips customers with a rapid simulation capability that allows changes to their IT load so they can understand exactly how increased GenAI workloads might impact their infrastructure.

The value of the data centre digital twin approach for innovation simulation is that it provides people with the visibility they need at one of the most pivotal points in data centre history. Of course, it’s still underpinned by really complex statistical mathematics and the power of EkkoSense’s 50-billion-point data lake – but that’s all hidden within the engineering. What CIOs, CFOs and data centre managers get to see, though, is whether it’s possible to take their infrastructure forward so that they can bring in the capacity needed to deliver AI computation at scale within their business timeframe. So, when they’re asked: ‘are we ready for that?’, they will – for the first time – be able to answer with confidence.