Array Programming with NumPy

Charles R. Harris,K. Jarrod Millman,Stéfan J. van der Walt,Ralf Gommers,Pauli Virtanen,David Cournapeau,Eric Wieser,Julian Taylor,Sebastian Berg,Nathaniel J. Smith,Robert Kern,Matti Picus,Stephan Hoyer,Marten H. van Kerkwijk,Matthew Brett,Allan Haldane,Jaime Fernández del Río,Mark Wiebe,Pearu Peterson,Pierre Gérard-Marchant,Kevin Sheppard,Tyler Reddy,Warren Weckesser,Hameer Abbasi,Christoph Gohlke,Travis E. Oliphant
DOI: https://doi.org/10.1038/s41586-020-2649-2
2020-06-18
Abstract:Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It plays an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, material science, engineering, finance, and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves and the first imaging of a black hole. Here we show how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring, and analyzing scientific data. NumPy is the foundation upon which the entire scientific Python universe is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Because of its central position in the ecosystem, NumPy increasingly plays the role of an interoperability layer between these new array computation libraries.
Mathematical Software,Computation
What problem does this paper attempt to address?
### Problems the paper attempts to solve The paper "Array Programming with NumPy" aims to solve the following problems: 1. **The need for efficient array programming**: - Array programming provides a powerful, compact, and expressive syntax for accessing, manipulating, and processing data in vectors, matrices, and high - dimensional arrays. NumPy is the main array programming library in the Python language and is widely used in research analysis pipelines in multiple fields such as physics, chemistry, astronomy, earth science, biology, psychology, materials science, engineering, finance, and economics. - For example, in astronomy, NumPy is used in the discovery of gravitational waves and the first imaging of black holes. 2. **Unifying and standardizing the array interface**: - Before the appearance of NumPy, there were two main array packages in Python: Numeric and Numarray. Although these two packages had similar functions, there were some differences, which led to the split of the community. The emergence of NumPy combined the advantages of these two packages and became a unified solution that had the best of both worlds. - With the development of time, new array computing libraries keep emerging, and these libraries usually develop their own NumPy - like interfaces and array objects. In order to prevent these new libraries from splitting the user community again, NumPy is transforming into a central coordination mechanism, defining a clear array programming API and dispatching it to specific array implementations as needed. 3. **Supporting modern storage and hardware**: - Modern scientific data sets often exceed the memory capacity of a single machine and may be stored on multiple machines or in the cloud. In addition, the need to accelerate deep learning and artificial intelligence applications has led to the emergence of special - purpose acceleration hardware such as GPUs, TPUs, and FPGAs. - Due to its in - memory data model, NumPy cannot directly utilize these storage and special - purpose hardware at present. However, distributed data and parallel execution on GPUs, TPUs, and FPGAs fit very well with the array programming paradigm. Therefore, NumPy has added the ability as a central coordination mechanism to support the operation of external array objects through a well - defined API. 4. **Improving interoperability and extensibility**: - In order to support different types of arrays (such as sparse arrays, distributed arrays, etc.), NumPy provides "protocols" (or operation contracts) that allow specific arrays to be passed to NumPy functions. NumPy dispatches the operations back to the original library as needed. - These protocols enable users to write code once and easily switch to NumPy arrays, GPU arrays, distributed arrays, etc. when needed without a large amount of code modification. 5. **Challenges in maintenance and development**: - NumPy is not only a basic array library but also has become the standard API for tensor computing and the central coordination mechanism among array types and technologies in Python. With the rapid development of data science, machine learning, and artificial intelligence, NumPy is facing new challenges, such as the development of new devices, the evolution of existing special - purpose hardware, and the diversification of data science practitioners. - In order to meet these challenges, NumPy needs continuous financial support and the participation of a new generation of developers to meet the needs of data science in the next decade. In summary, this paper explores how to solve various problems in modern scientific computing through array programming by introducing the history, functions, design principles of NumPy and its core position in the scientific computing ecosystem.