Speeding up code with clever data manipulation

Kasikci presents a method to improve a program’s ability to use data in a straightforward, efficient way

Baris Kasikci Enlarge
Prof. Baris Kasikci

The importance of program optimization is a no-brainer – making programs faster and more resource friendly are some of the core goals of programming. But current methods to analyze programs and identify their bottlenecks, inefficiencies, and other weaknesses face issues that run the gamut from labor-intensiveness to inaccuracy.

Prof. Baris Kasikci has proposed a new technique to tackle this issue broadly and automatically. In his project called “Huron: Taming False Sharing via Memory Layout Transformations,” Kasikci presents a method to improve a program’s ability to use data in a straightforward, efficient way by better use of a coding principle called locality. His project has earned an Intel Faculty Award, a renewal of his initial award for the project’s proposal.

The way program optimization currently works is typically through the use of special tools called profilers. When programs are run through profilers, they’re able to identify its runtime and memory usage hotspots. From there, actually addressing the issues becomes labor-intensive – a developer must inspect the program, understand the code, and make manual adjustments to the way it functions.

Other approaches use automated performance fixes to work around this. These simulate the effect of manually fixing a program with a profiler by analyzing multiple runs of the program and collecting data on how each instance is performing. Later, it uses this data to optimize the program automatically the next time the code is going to be compiled and deployed.

“The automated approach is better, but it’s very limited,” says Kasikci. These automated solutions stick to manipulating the text part of the program – the actual code. A vital part of how well a program performs is actually in the data part of the program – the information it works with.

The goal of Kasikci’s project is to achieve better performance and handling of performance bugs by addressing this other key component. His Huron tool restructures the data layout of a program after analyzing locality.

Locality can mean different things in different contexts, but one example of locality is what’s called spatial locality. This trait is essentially the proximity of different data elements to one another in memory. The more easily accessible the next element needed by the program is, the better the performance.

“Data locality is not always perfect,” Kasikci says. “A computer’s hardware is built to take advantage of this locality. There are algorithms built into processors that will allow you to run your programs faster if it exhibits good locality.”

In certain cases there are insidious performance bugs that destroy locality. These issues are very hard to identify when writing the code.

“Code is very high-level, it’s almost like English,” he says. “It’s hard to imagine everything that’s going to happen as it’s going to be executed on a machine. That’s why an automated tool can help you do that.”

The idea behind Kasikci’s solution is to reorganize the data layout, literally moving elements around, so that the program exhibits better locality. It’s a challenging undertaking, and there are tradeoffs. When you move data, you change the way it’s formatted and affect the way it needs to be read into a program. Because of this, the tool also has to modify the code to accept this newly formatted data. Modifying code is also a risky maneuver – you can introduce new performance issues in the process.

“You can perform these data manipulations to a certain extent, and beyond that you will have diminishing returns,” Kasikci says. “It’s about navigating that tradeoff automatically and finding a good balance while creating a more efficient program.”

Huron was developed to take on one especially nasty performance bug in particular, called false sharing. This bug arises when two different data elements needed by code reside on the same line in the memory cache. If these data elements are accessed by different processor cores at the same time, this can cause a performance spike.

Huron detects and repairs all false sharing bugs in the 21 benchmarks that Kasikci used with 100% accuracy. Moreover, Huron achieves significant speedups of up to 11x and on average 3.82x. Overall, Huron is 33.33% more accurate than state-of-the-art detection and repair tools and it provides up to 8x and on average 2.11-2.27x greater speed up.

“Our approach is completely different than the current state of the art,” Kasikci says. “By actually reorganizing the memory layout to improve locality, we’re improving performance while eliminating these types of bugs as a side effect. We’ve shown that we can beat manual developer fixes in most of the cases automatically.”