Interfaces for efficient software composition on modern hardware

For decades, developers have been productive writing software by composing optimized libraries and functions written by other developers. Though hardware trends have evolved significantly over this time (with the ending of Moore's law, the increasing ubiquity of parallelism, and the emergence of new accelerators), many of the common interfaces for composing software have nevertheless remained unchanged since their original design. This lack of evolution is causing serious performance consequences in modern applications. For example, the growing gap between memory and processing speeds means that applications that compose even hand-tuned libraries can spend more time transferring data through main memory between individual function calls than they do performing computations. This problem is even worse for applications that interface with new hardware accelerators such as GPUs. Though application writers can circumvent these bottlenecks manually, these optimizations come at the expense of programmability. In short, the interfaces for composing even optimized software modules are no longer sufficient to best use the resources of modern hardware.

This dissertation proposes designing new interfaces for efficient software composition on modern hardware by leveraging algebraic properties intrinsic to software APIs to unlock new optimizations. We demonstrate this idea with three new composition interfaces. The first interface, Weld, uses a functional intermediate representation (IR) to capture the parallel structure of data analytics workloads underneath existing APIs, and enables powerful data movement optimizations over this IR to optimize applications end-to-end. The second, called split annotations (SAs), also focuses on data movement optimization and parallelization, but uses annotations on top of existing functions to define an algebra for specifying how data passed between functions can be partitioned and recombined to enable cross-function pipelining. The third, called raw filtering, optimizes data loading in data-intensive systems by redefining the interface between data parsers and query engines to improve CPU efficiency.

Our implementations of these interfaces have shown substantial performance benefits from rethinking the interface between software modules. More importantly, they have also shown the limitations of established interfaces. Weld and SAs show that a new interface can accelerate data science pipelines by over 100x in some cases in multicore environments, by enabling data movement optimizations such as pipelining on top of existing libraries such as NumPy and Pandas. We also show that Weld can be used to target new parallel accelerators, such as vector processors and GPUs, and that SAs can enable these speedups even on black-box libraries without any library code modification. Finally, the I/O optimizations in raw filtering show over 9x improvements in end-to-end query execution time in distributed systems such as Spark SQL when processing semi-structured data such as JSON.
Thesis, Dissertation, English, 2020
[Stanford University], [Stanford, California], 2020
Stanford University
1 online resource
Submitted to the Computer Science Department