Welcoming Ralf Gommers as Director of Quansight Labs


I am so excited to welcome Ralf Gommers to Quansight as the Director of Quansight Labs. Ralf started with us full-time on April 1, and he will be working from the Netherlands for the time being. Ralf has a long history with the SciPy and NumPy communities, including serving as a director of NumFOCUS. I have long been impressed with Ralf's community-minded spirit and willingness to serve the SciPy, NumPy, and PyData communities even while working full-time at another job. I'm thrilled that Quansight can support him directly in devoting his full-time job to helping the NumPy ecosystem. This is only possible because of funding for Quansight Labs provided by industry sponsors, grants, our new "Community Work Orders", and our open source Support subscriptions. More on that in a bit.

Continuing Support for Open Source

One of my favorite parts of building Anaconda, Inc. was creating and growing the Community Innovation team, which built Conda, Dask, Dask-ML, Numba, Bokeh, JupyterLab, Panel, GeoViews, Holoviews, Datashader, Plures, Intake, and several other open-source innovations. That team continues under Peter Wang, who co-founded Anaconda with me. That team still does wonderful work, supporting and contributing to Conda, Numba, Dask, Dask-ML, Pandas, scikit-learn, and other projects in the PyData ecosystem.

When I spun out Quansight from Anaconda last year, one of my major goals was to grow and expand the open-source efforts that we had started by continuing to direct industry, non-profit, and government funding to the NumPy, Pandas, and Jupyter ecosystems. We organized Quansight Labs last year as a public-benefit division of Quansight's activities in order to continue and grow the number of people who can be employed full-time working on foundational libraries in these open-source ecosystems.

As we started the Labs last year, we first identified a few key innovation projects that we would work on and support. JupyterLab and PhosphorJS were the first projects we supported at Quansight, and we have continued to work on those projects with others in the community. In addition, we have continued to support the work of Stefan Krah on Xnd as an important project that generalizes and refactors NumPy (allowing multiple languages to have general tensors and computational infrastructure). This is similar to how Arrow generalizes Pandas. Xnd now has support for generalized ufuncs on many array-like containers supporting many kinds of data, including ragged arrays of categorical and string types. The project also has support for GPU arrays and GPU functions on those data structures. These capabilities are now being surfaced into Python-level objects such as a NumPy-like array object so that they can be used with Dask, xarray, and other higher-level projects. The C libraries in Xnd remain available for the NumPy community to use and perhaps can help other projects in Python and other languages.

So Many Tensors

Last year, I also spoke in several venues about the confusing state of the array-computing world in Python. Python has received an incredible investment from Google, Facebook, and Amazon in open-source libraries like Tensorflow, PyTorch, and MxNet. These libraries have provided powerful new functionality, but they have also essentially retraced the steps of NumPy and SciPy instead of contributing to those libraries and ecosystems. This richness and diversity has been exciting as it has brought many more people to the Python community, but it has also left me wondering what will become of the SciPy and scikit-* community and user-base. Many of us have labored for many years to create the diverse, organic, and multiple-stakeholder ecosystem that NumPy, Matplotlib, scikit-learn, scikit-image, Jupyter, Pandas, SciPy, yt, PyMC3 (and many more) inhabit. Will this all be replaced by new libraries like Tensorflow or PyTorch and their associated ecosystems? Witness that PyMC4 has moved on to Tensorflow, and odetorch is basically scipy.integrate.ode except on PyTorch.

Certainly our community must expand to include these impressive frameworks. In many ways, in 2018 I was in the same position I had found myself in back in 2005, with multiple array libraries disrupting the SciPy community. History does indeed repeat itself, but perhaps usually not this quickly.

After much contemplation and visiting with the PyTorch and Tensorflow teams last year, I realized we can make progress not by re-writing NumPy again, but perhaps by revisiting the two efforts that we also pursued while NumPy was being developed. In particular, back in 2005 we also worked on PEP 3118 to create a new buffer protocol so that multiple Python objects could share memory. This is actually what Xnd is at its core --- a generalized cross-language buffer protocol, with generalized array-map functions that can work on those buffers and support all kinds of data beyond what NumPy supports.
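To make the memory-sharing idea concrete, here is a minimal sketch of the PEP 3118 buffer protocol using only the Python standard library. It does not use Xnd or its API. The variable names are illustrative, and the choice of `array.array` as the buffer owner is an assumption made only to keep the example self-contained.

```python
import array

# An array.array owns a contiguous block of four C ints.
source = array.array("i", [1, 2, 3, 4])

# memoryview asks `source` for its buffer via the PEP 3118 protocol.
# No data is copied; the view points at the same memory.
view = memoryview(source)

# The view exposes the metadata the protocol standardizes:
# element format, element size in bytes, shape, and number of dimensions.
description = (view.format, view.itemsize, view.shape, view.ndim)

# Writing through the view changes the memory that `source` owns.
view[0] = 100

# A second consumer can reinterpret the same memory as raw unsigned bytes,
# again without copying.
raw = view.cast("B")

print(source[0], description, len(raw))
```

Any object that exports a buffer this way (for example `bytes`, `bytearray`, or a NumPy array) can be consumed by any other object that understands the protocol, which is what lets independent libraries operate on the same memory.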