# Identifying the barriers that reduce the efficiency of modern computing

Tue, May 22, 2018

Modern computing devices are fast, but not as fast as they could be. One of the principles of software programming is abstraction, which makes developing software faster and easier than developing hardware, at the price of efficiency when the abstraction is not ideal.

One of the most inefficient abstractions is sequential execution, for which, to be fair, von Neumann cannot be blamed. In the early days of hardware development, there were only enough circuit elements to support a single core. Today's hardware is far more capable and can support up to thousands of cores. Under sequential execution, however, every asynchronous operation must be manually initiated and synchronized, which is hard to get right. This is why today's programs often run on a single thread.

Sequential execution is a problem for hardware, too. Sequential operations have no practical use unless they are combined with branch operations, and a branch can break the pipeline, which attempts to parallelize the execution of instructions, because the pipeline must preserve the abstraction of sequential execution. The problem does not stop there. Because the program executes sequentially, the meaning of any symbol can change over time and can be hard to track, which can force the processor to give up on predicting the outcome of a branch condition. This limitation applies not only to conditional jump instructions but also to other implied branches, such as permission checks. As long as this abstraction exists, neither the hardware nor the optimizer can freely rearrange the order in which code runs to increase performance.
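To make the cost of manual initiation and synchronization concrete, here is a minimal sketch of splitting a simple sum across threads by hand. The names `parallel_sum`, `work`, and `workers` are made up for this illustration. Note that in standard CPython builds the global interpreter lock keeps these additions from actually running in parallel, so the point here is the bookkeeping the programmer has to write, not the speedup.

```python
import threading


def parallel_sum(values, workers=4):
    """Sum `values` by splitting the work across threads by hand."""
    total = 0
    lock = threading.Lock()  # guards the shared accumulator

    def work(chunk):
        nonlocal total
        partial = sum(chunk)  # independent work, nothing shared
        with lock:            # explicit synchronization point
            total += partial

    # Split the input into roughly equal chunks, one per worker.
    size = max(1, (len(values) + workers - 1) // workers)
    threads = [
        threading.Thread(target=work, args=(values[i:i + size],))
        for i in range(0, len(values), size)
    ]
    for t in threads:  # every worker must be started explicitly
        t.start()
    for t in threads:  # and explicitly waited for
        t.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

The sequential version is a single call to `sum(values)`. Everything else in the function (chunking, the lock, starting, joining) exists only because the programmer has to describe the parallelism by hand, and forgetting the lock or a `join` produces bugs that are hard to reproduce.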

The second reason programs run at reduced speed is the linear memory abstraction, which is again a legacy of early hardware. Today's computers usually present memory as a linear, addressable space in which any program, at any time, can access all of the memory it has been allocated. This forces hardware to be designed to allow synchronous main memory access and to guess what memory a program will need only after the program has started executing. It neglects the nature of the cache that exists in the processor and limits the processor's ability to load data into the cache while another program is executing. And since programs run in a virtual memory space, checks and protection must take place before every memory access, which forces the processor to include an MMU to reduce the need to call the operating system on a page fault. Software suffers as well: because memory access cannot be forecast, programs have to manage memory through reference counting, a garbage collector, ownership, or manual management. None of these is standardized and all of them are a black box to the processor, so they are not hardware-accelerated and they place an additional burden on the engineer or the runtime.
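As a rough illustration of memory management being a software convention the processor knows nothing about, here is a minimal reference-counting sketch. The class `RefCounted` and its `retain`/`release` methods are hypothetical names made up for this example and are not how any particular runtime implements it.

```python
class RefCounted:
    """A payload with a hand-maintained reference count (illustrative only)."""

    def __init__(self, payload, on_free):
        self.payload = payload
        self._count = 1          # the creator holds the first reference
        self._on_free = on_free  # called when the last reference is dropped

    def retain(self):
        # Every new owner has to remember to call this.
        self._count += 1
        return self

    def release(self):
        # Every owner has to remember to call this exactly once.
        self._count -= 1
        if self._count == 0:
            self._on_free(self.payload)
            self.payload = None


if __name__ == "__main__":
    freed = []
    buf = RefCounted("buffer", freed.append)
    buf.retain()   # a second owner appears
    buf.release()  # first owner is done; the payload is still alive
    buf.release()  # last owner is done; the payload is freed now
    print(freed)
```

To the processor, the counter is just another integer being incremented and decremented. Nothing tells the hardware that the counter decides when a block of memory stops being needed, so the hardware cannot use that information.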

The third reason programs run slower than they could is the abstraction that all processing units have uniform capabilities. Requiring identical functionality on every processing core either makes it infeasible to significantly increase the number of cores or forces the functionality of the entire product to be limited (as happened with CPUs and GPUs, respectively). Even if a device has additional programmable cores to process data (for example, a device with both a CPU and a GPU), those cores usually require separate tools to develop for, which makes it hard for developers to actually use all of the available processing power without a significant investment.

In upcoming articles, I will discuss other factors that limit the power of today's computing devices. At the end of this series, I will propose a new approach to hardware abstraction, in the hope that it can address all of the problems mentioned.