Flow Computing's PPU Claims to Boost CPU Power by 100x

July 3, 2024

CPUs have long been the dominant computational device in electronics thanks to their ability to handle any computational task, but due to the specific nature of some key applications, other processor types, such as GPUs and NPUs, have also emerged. Considering that CPUs are now often the bottleneck in computational applications, researchers and engineers alike continue to look for new ways to improve their capabilities, and a new startup believes it may have the solution in what it calls a Parallel Processing Unit. What challenges do CPUs face compared to GPUs and NPUs, what exactly can this new PPU do, and could such Parallel Processing Units become critical in future designs?

Key Things to Know:

Flow Computing’s Parallel Processing Unit (PPU) aims to significantly enhance CPU performance by managing data flow efficiently, particularly benefiting AI and machine learning applications.
The PPU can integrate with existing CPU architectures without requiring major modifications to legacy code, offering a versatile solution for various platforms.
Despite its potential, the PPU is still in development, and practical implementation faces challenges such as cost, compatibility, and security concerns.
If successful, the PPU could revolutionise computing by enabling more efficient and sustainable processing, driving innovation in software development and advanced applications.

Challenges of CPUs and the Rise of GPUs and NPUs

When the Intel 4004 was introduced, the world saw the immense benefits of fully integrated processing systems, sparking a race among manufacturers to build the best machines. However, as computers moved into the commercial sector, it became apparent that such processors could not efficiently handle specific tasks such as graphics processing and floating-point arithmetic, which gave rise to co-processors such as GPUs and FPUs.

As computers moved into the realm of scientific computing and servers, the need for highly efficient processors continued to grow, while the rapid expansion of new programming languages and software concepts prevented CPUs from being as efficient as they could be. This has been especially true of x86 devices, which must maintain backwards compatibility and, as a result, carry circuits that are rarely used yet take up valuable die space.

Efficiency Challenges and Memory Access Issues

The way in which CPUs access memory has also long been a concern, often leading to performance bottlenecks. The limited amount of cache on CPUs, combined with the relatively slow speed of external memory, means that CPUs are rarely able to operate at their full speed. While multiple cache levels help to alleviate this, branching code can still lead to cache misses, causing a significant slowdown in performance.
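The effect of cache misses can be illustrated with the standard average memory access time model, in which every access pays the hit time and a fraction of accesses additionally pays the miss penalty. The sketch below is a minimal single-level version; the latencies and miss rates are illustrative assumptions, not measurements of any particular CPU.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for a single cache level."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Illustrative numbers only: a 1 ns cache hit and a 100 ns trip to external memory.
for miss_rate in (0.01, 0.05, 0.20):
    amat = average_access_time(hit_time_ns=1.0, miss_rate=miss_rate, miss_penalty_ns=100.0)
    print(f"miss rate {miss_rate:.0%}: average access time {amat:.1f} ns")
```

With these assumed figures, a miss rate of only 5% makes the average access roughly six times slower than a cache hit.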

Additionally, the nature of applications is changing, with future applications looking towards artificial intelligence. Considering that AI is heavily reliant on massively parallel operations on floating-point numbers, modern CPUs are simply not up to the task of executing such applications efficiently. Because of this, engineers are looking to incorporate neural processing units into CPUs so that such tasks can be offloaded and executed more efficiently.

However, because GPUs and NPUs are dedicated hardware, they suffer from poor latency, as messages have to be passed from the CPU to the GPU or NPU across a bus. GPUs and NPUs are therefore only ever as fast as the CPU that controls them, meaning that they often sit idle. As such, CPUs face a range of issues that prevent them from being as powerful as they could be, while GPUs and NPUs offer far better energy efficiency but struggle with latency.
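This latency trade-off can be sketched as a simple break-even check: offloading only pays off when the time saved on the accelerator exceeds the cost of moving data across the bus. The function name, parameters, and figures below are hypothetical and chosen purely for illustration.

```python
def offload_is_worthwhile(cpu_time_ms: float, accel_time_ms: float, transfer_time_ms: float) -> bool:
    """Return True if running on the accelerator, including bus transfers, beats the CPU."""
    return accel_time_ms + transfer_time_ms < cpu_time_ms


# Small job: the accelerator is 10x faster, but the transfer overhead dominates.
print(offload_is_worthwhile(cpu_time_ms=0.5, accel_time_ms=0.05, transfer_time_ms=1.0))    # False
# Large job: the same fixed transfer cost is amortised over far more work.
print(offload_is_worthwhile(cpu_time_ms=500.0, accel_time_ms=50.0, transfer_time_ms=1.0))  # True
```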

The Future of Computing with Parallel Processing Units

The CPU is often referred to as the brain of a computer, but its ability to execute instructions is limited. While the CPU can execute multiple instructions in parallel, it is still constrained by its hardware, with large portions of the CPU sitting unused at any given time. Furthermore, traditional CPUs are designed to handle sequential code, meaning that if one task depends on the result of another, the two tasks must be executed one after the other.
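The cost of such dependencies can be made concrete by comparing the total number of tasks with the length of the longest dependency chain, which is the minimum number of steps needed even with unlimited parallel hardware. The sketch below uses a hypothetical task graph and assumes every task takes one step; the task names are illustrative only.

```python
from functools import lru_cache

# Hypothetical task graph: each task maps to the tasks whose results it needs.
DEPENDENCIES = {
    "load": [],
    "decode": ["load"],
    "filter_a": ["decode"],
    "filter_b": ["decode"],
    "filter_c": ["decode"],
    "merge": ["filter_a", "filter_b", "filter_c"],
}


@lru_cache(maxsize=None)
def depth(task: str) -> int:
    """Length of the longest dependency chain ending at `task`, counted in tasks."""
    return 1 + max((depth(dep) for dep in DEPENDENCIES[task]), default=0)


sequential_steps = len(DEPENDENCIES)                   # one task at a time
parallel_steps = max(depth(t) for t in DEPENDENCIES)   # unlimited parallel units

print(sequential_steps, parallel_steps)  # 6 4
```

Only the three independent filter tasks can overlap here, so extra hardware shortens the run from six steps to four and no further.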

Flow Computing, a startup backed by the state-owned VTT in Finland, is addressing these inherent limitations by developing a co-processor that can manage data flow efficiently. Their co-processor, called a Parallel Processing Unit (PPU), acts as an intermediary between a regular CPU and external code, helping to free up resources on the CPU to perform more useful work. This parallel processing approach allows the CPU to offload tasks, thereby maximising resource utilisation and significantly boosting performance. Such enhancements are particularly beneficial in applications requiring intensive computations, such as AI and machine learning, where rapid data processing is crucial.

Seamless Integration and Versatility of the PPU

Flow Computing claims their PPU can seamlessly integrate with existing CPU architectures without requiring significant modifications to legacy code. This compatibility ensures that the PPU can be adopted across various platforms, providing a versatile solution for enhancing computational efficiency in diverse applications.

According to Flow Computing, their PPU can help with legacy code that has been written with parallel programming in mind. The startup also expects that future CPUs will integrate their technology, boosting performance by as much as 100 times. However, such a figure may only be achievable with code that is heavily reliant on parallel execution and may not apply to all applications (such as games and other largely sequential programs).
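Why the headline figure depends so heavily on the workload can be illustrated with Amdahl's law, which bounds the speedup of a program by the fraction of it that can run in parallel. This is a general model rather than Flow Computing's own methodology, and the unit count of 256 is an assumption taken from the core count mentioned later in this article.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the work is spread over `units`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


UNITS = 256  # assumption for illustration
for fraction in (0.50, 0.90, 0.99, 0.999):
    print(f"{fraction:.1%} parallel: at most {amdahl_speedup(fraction, UNITS):.1f}x")
```

Under this model, code that is 99% parallel tops out at roughly 72x on 256 units, and a 100x gain requires more than 99% of the work to be parallel.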

The potential of Flow Computing’s PPU extends beyond just performance improvements. By enabling more efficient parallel processing, the PPU can help reduce power consumption and heat generation in CPUs, making computing systems more sustainable and cost-effective. This is particularly relevant as the demand for energy-efficient computing continues to grow in both consumer and enterprise markets.

However, the PPU is still in development, and the current reliance on FPGAs means that it is far from being a practical device. The use of such hardware also limits the available software, with only a few tools on offer to developers looking to exploit the new architecture.

Development Challenges and Future Potential

Despite these challenges, Flow Computing’s PPU represents a significant leap forward in processing technology. The ability to manage data flow at such a granular level could revolutionise the way CPUs operate, paving the way for more advanced and efficient computing solutions. As the technology matures, it is expected that more tools and resources will become available to help developers fully harness the power of PPUs.

Overall, what Flow Computing has developed is exciting, and the use of parallel processing could be the future of computing. However, the PPU is far from being a practical device, and the 100x performance boost may only be achievable under ideal conditions.

The integration of Flow Computing’s PPU into mainstream computing could also drive innovation in software development. By enabling faster and more efficient processing, developers can explore new possibilities in application design and functionality, potentially leading to breakthroughs in various fields such as artificial intelligence, data analytics, and real-time processing systems.

Future Challenges and Possibilities of Flow Computing’s PPU

As the world continues to rely on technology to power everyday life, the need for increased CPU performance will only continue to grow. However, it is not just the core frequency that will need to increase, but the total number of cores as well as their parallel capabilities. This is already being seen with the introduction of chiplets and the use of dedicated cores for specific tasks such as AI and graphics, but the use of a co-processor that can handle massive parallel operations could be the key to future computing systems.

The Flow Computing PPU could be the answer to future computing needs as it has been designed with parallel computing in mind. While the use of 256 RISC cores may seem excessive, it is important to consider that each of these cores can be assigned to handle individual tasks as well as divide up work on larger tasks in parallel. This would make the PPU ideal for running multiple threads simultaneously as well as executing the complex vector operations commonly found in AI and graphics tasks.
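The idea of dividing a larger task across many cores can be sketched in ordinary Python by splitting a vector operation into chunks and handing each chunk to a worker. This runs on a standard thread pool, not on a PPU, and the helper names are hypothetical; it illustrates the decomposition only, not real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, parts):
    """Split `values` into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(values) // parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_dot(pair):
    """Dot product of one pair of chunks."""
    xs, ys = pair
    return sum(x * y for x, y in zip(xs, ys))


def parallel_dot(a, b, workers=4):
    """Dot product computed as independent partial sums that are then combined."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pairs = zip(chunked(a, workers), chunked(b, workers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_dot, pairs))


a = list(range(1000))
b = [2] * 1000
print(parallel_dot(a, b))  # 999000
```

Each partial sum is independent of the others, which is the property that lets this kind of work be spread across many cores.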

However, the use of such a PPU in a mass-production device presents a number of challenges. The first is cost; devices that are designed for high performance often come with a higher price. While the PPU may help to future-proof devices against increasing performance needs, the increased cost could make such devices less popular with consumers.

The second challenge is compatibility; such a PPU would not be able to handle all tasks efficiently. It would therefore be essential for a computer using a PPU to detect tasks and automatically assign them to either the CPU or the PPU. An operating system capable of this would be complex to develop, and the PPU would essentially remain a co-processor rather than a replacement for the CPU, as it would not be able to handle all tasks.
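The kind of task routing described above might look something like the following sketch. Everything here is hypothetical: the `Task` fields, the thresholds, and the routing rule are illustrative assumptions and do not reflect any real operating system scheduler or Flow Computing interface.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    parallel_fraction: float  # share of the work that can run in parallel (assumed known)
    work_items: int           # number of independent work items in the task


def choose_unit(task: Task, min_fraction: float = 0.9, min_items: int = 64) -> str:
    """Route highly parallel, sufficiently large tasks to the PPU; everything else to the CPU."""
    if task.parallel_fraction >= min_fraction and task.work_items >= min_items:
        return "PPU"
    return "CPU"


tasks = [
    Task("matrix_multiply", parallel_fraction=0.98, work_items=4096),
    Task("config_parser", parallel_fraction=0.10, work_items=1),
    Task("tiny_vector_add", parallel_fraction=0.95, work_items=8),
]
for task in tasks:
    print(f"{task.name} -> {choose_unit(task)}")
```

In practice, the hard part is the input this sketch takes for granted: knowing in advance how parallel a given task really is.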

The third challenge is security. With 256 RISC cores available, all of which can access external resources, the increased number of entry points potentially opens up such a system to unauthorised access. The PPU would therefore need to integrate strong security features to prevent such access, which would, in turn, make it more complex to manufacture.

Overall, the PPU being developed by Flow Computing could usher in a new era of high-performance computing, and its use could lead to the development of more efficient systems that are able to handle multiple tasks simultaneously.