I am so excited to welcome Ralf Gommers to Quansight as the Director of Quansight Labs. Ralf started with us full-time on April 1, and he will be working from the Netherlands for the time being. Ralf has a long history with the SciPy and NumPy communities, including serving as a director of NumFOCUS. I have long been impressed with Ralf's community-minded spirit and willingness to serve the SciPy, NumPy, and PyData communities even while working full-time at another job. I'm thrilled that Quansight can support him in devoting his full-time job to helping the NumPy ecosystem directly. This is only possible because of funding for Quansight Labs provided by industry sponsors, grants, our new "Community Work Orders", and our open source Support subscriptions. More on that in a bit.
Continuing Support for Open Source
One of my favorite parts of building Anaconda, Inc. was creating and growing the Community Innovation team which built Conda, Dask, Dask-ML, Numba, Bokeh, JupyterLab, Panel, GeoViews, Holoviews, Datashader, Plures, Intake, and several other open-source innovations. That team continues under Peter Wang who co-founded Anaconda with me. That team still does wonderful work, supporting and contributing to Conda, Numba, Dask, Dask-ML, Pandas, scikit-learn, and other projects in the PyData ecosystem.
When I spun out Quansight from Anaconda last year, one of my major goals was to grow and expand these open-source efforts that we started by continuing to direct industry, non-profit, and government funding to the NumPy, Pandas, and Jupyter ecosystems. We organized Quansight Labs last year as a public-benefit division of Quansight's activities in order to continue growing the number of people who can be employed full-time working on foundational libraries in these open-source ecosystems.
As we started the Labs last year, we first identified a few key innovation projects that we would work on and support. JupyterLab and PhosphorJS were the first projects we supported at Quansight, and we have continued to work on those projects with others in the community. In addition, we have continued to support the work of Stefan Krah on Xnd as an important project that generalizes and refactors NumPy (allowing multiple languages to have general tensors and computational infrastructure). This is similar to how Arrow generalizes Pandas. Xnd now has support for generalized ufuncs on many array-like containers supporting many kinds of data --- including ragged arrays of categorical and string types. The project also has support for GPU arrays and GPU functions on those data structures. These capabilities are now being surfaced into Python-level objects such as a NumPy-like array object so that they can be used with Dask, xarray, and other higher-level projects. The C libraries in Xnd remain available for the NumPy community to use and perhaps can help other projects in Python and other languages.
So Many Tensors
Last year, I also spoke in several venues about the confusing state of the array-computing world in Python. Python has received an incredible investment from Google, Facebook, and Amazon in open-source libraries like Tensorflow, PyTorch, and MxNet. These libraries have provided powerful new functionality, but they have also essentially re-traced the steps of NumPy and SciPy instead of contributing to those libraries and ecosystems. This richness and diversity has been exciting as it has brought many more people to the Python community, but it has also left me wondering what will become of the SciPy and scikit-* community and user-base. Many of us have labored for many years to create the diverse, organic, and multiple-stakeholder ecosystem that NumPy, Matplotlib, scikit-learn, scikit-image, Jupyter, Pandas, SciPy, yt, PyMC3 (and many more) inhabit. Will this all be replaced by new libraries like Tensorflow or PyTorch and their associated ecosystems? (Witness that PyMC4 has moved on to Tensorflow, and odetorch is basically scipy.integrate.ode except on PyTorch.)
Certainly our community must expand to include these impressive frameworks. In many ways, in 2018 I was in the same position I had found myself in back in 2005, with multiple array libraries disrupting the SciPy community. History does indeed repeat itself, but perhaps usually not this quickly.
After much contemplation and visiting with the PyTorch and Tensorflow teams last year, I realized we can make progress not by re-writing NumPy again, but perhaps by revisiting the two efforts that we also pursued while NumPy was being developed. In particular, back in 2005 we also worked on PEP 3118 to create a new buffer protocol so that multiple Python objects could share memory. This is actually what Xnd is at its core --- a generalized cross-language buffer protocol, with generalized array-map functions that can work on those buffers and support all kinds of data beyond what NumPy supports.
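To make the memory-sharing idea concrete, here is a minimal sketch of the PEP 3118 buffer protocol using only the Python standard library. It is an illustration, not Xnd code: the variable names are made up for the example, and it covers only the flat, single-type case rather than the richer data that Xnd targets.

```python
from array import array

# An array.array of C doubles exports its memory through the buffer protocol.
samples = array("d", [1.0, 2.0, 3.0])

# memoryview is a consumer of the buffer protocol; no data is copied here.
view = memoryview(samples)

# Writing through the view changes the original object, because both
# names refer to the same block of memory.
view[0] = 10.0

# The exporter also describes the layout of that memory to the consumer.
layout = (view.format, view.itemsize, view.shape, view.strides)

# The same bytes can be reinterpreted without copying, here as raw bytes.
raw = view.cast("B")

print(samples[0], layout, len(raw))
```

The point of the protocol is that the consumer never needs to know which library created the memory, only how it is laid out.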
The second thing we worked on while writing NumPy was developing a very nascent array protocol (called __array_interface__). This work could be generalized into an actual, formal array protocol for Python that could unify how libraries in Python talk to arrays (I typically call arrays what other people like to call tensors). This protocol and interface would allow downstream libraries like SciPy to talk to and be built on a protocol which could then be implemented by PyTorch, Tensorflow, or anything else to support GPUs, FPGAs, or whatever comes next. This indirection also opens the door to some very interesting possibilities that have been envisioned for years by Lenore Mullin (of APL fame) in her formal Mathematics of Arrays.
The uarray project is our early effort in this direction. After some exploration, we (primarily Hameer Abbasi) have made recent progress on an approach that would allow libraries like SciPy to rely not on NumPy directly but on an intermediate interface object that could have PyTorch, Tensorflow, Xnd, CuPy, or NumPy as its backend. This project is not ready for prime-time, but stay tuned for how it develops.
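As a rough illustration of what the existing __array_interface__ already allows, the sketch below defines a container that is not a NumPy array but describes its memory so that NumPy can wrap it without copying. The `SensorGrid` class and its attributes are hypothetical names invented for this example; only the dictionary keys (`version`, `shape`, `typestr`, `data`) come from the protocol itself.

```python
import sys
from array import array

import numpy as np


class SensorGrid:
    """Hypothetical container that owns its data but is not a NumPy array."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        # Row-major storage of C doubles, owned by this object.
        self.storage = array("d", [0.0] * (rows * cols))

    @property
    def __array_interface__(self):
        # Describe the memory: shape, element type, and where the data lives.
        byte_order = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": (self.rows, self.cols),
            "typestr": byte_order + "f8",
            "data": self.storage,  # any object exposing the buffer protocol
        }


grid = SensorGrid(2, 3)

# NumPy reads the description and wraps the existing memory.
as_numpy = np.asarray(grid)

# A change made through the container is visible through the NumPy array.
grid.storage[4] = 7.5
print(as_numpy)
```

A formal array protocol would go further than this, covering operations as well as memory layout, but the same principle applies: libraries agree on a description rather than on a single implementation.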
Creating a PyData Core Team
While we are excited by our innovations in JupyterLab, Xnd, uarray, and others still emerging from our talented team, it was clear to us that Quansight Labs must not just be about innovation. We must also build a core team of maintainers for the entire NumPy, Pandas, and Jupyter ecosystems. We needed to build a place where open-source developers could find a home with like-minded people. I wanted to create the place that I would have wanted to join when I first started writing SciPy and NumPy. I have had the great fortune of knowing so many talented engineers who have built amazing things like Pandas, Matplotlib, Jupyter, scikit-learn, Mayavi, and more. Quansight Labs is about giving developers like that a place to work on and support the creations of theirs that are used by so many.
How much of our efforts will be innovation versus maintenance will ultimately be determined by our sponsors and funding sources, but I believe it is critical that we find ways to support the NumPy, Pandas, and Jupyter ecosystems or else they will be replaced by better supported ecosystems.
Building this "PyData core team" for NumPy, SciPy, Pandas, Matplotlib, JupyterLab, PhosphorJS, and important Scikits is a specific goal of Quansight Labs. Hiring Ralf Gommers is a critical step to that goal. Over the next 3-4 years we plan to sell enough open-source support, sell enough training and mentoring workshops, and establish enough open-source partnerships to hire 30-40 developers all over the world on this team.
To accomplish our goals we will need many millions of dollars annually. We plan to earn that money from companies eager to protect their investment in NumPy, Pandas, Jupyter, and the related ecosystem. Matt Harward and I will use the understanding, industry connections, and knowledge we have built over the past decade to ensure Quansight Labs succeeds. We are already well ahead of our Anaconda track record financially, and we are also joined by very capable developers and community managers, whom Ralf Gommers will join. We also have the support of talented back-office leaders, sales professionals, and partners, as well as a young Venture/Angel Fund (Quansight Initiate) whose investments will also eventually support the Labs.
How Will We Fund This?
How we will get the money to achieve our goals is the million dollar question, literally. We have ambitious long-term plans that will use all of what we have learned over the past decades of working on the problem of creating and supporting open source. They will involve many people, many customers, and a few key startups that we will discuss further in the coming year. Primarily, however, we will fund Quansight Labs with industry conversations and sales efforts. We have identified several sales "playbooks" that we will continue to refine in order to direct industry dollars to Quansight Labs. These include 1) direct industry sponsorships, 2) non-profit and government grants, 3) Community Work Orders, 4) Open Source Partnerships, and 5) Open-source support subscriptions.
In addition, profits from Quansight training, mentoring, staffing, and consulting (in machine-learning, big data, visualization, and open-source), will also fund the Labs. If you would like to learn how your organization can benefit from our experience and connections, please get in touch.
There are millions of people and tens of thousands of companies who now depend on NumPy and SciPy even as hardware architecture is changing (many-core, GPUs, TPUs, and FPGAs), file formats and file systems are changing, and cloud/remote computing becomes the norm. In order to keep taking advantage of these innovations, all of the projects in the NumPy, Pandas, and Jupyter ecosystems need full-time people to make improvements and modifications that will ensure these libraries can be relied on for years to come. Everyone who uses these libraries can benefit from an affordable open source subscription from Quansight, which will give you direct access to Quansight Labs members.
Hiring Open Source Developers
Many (particularly large) companies have recognized the benefit they receive by hiring open-source developers directly on projects they care about. This is a common way that open-source will be supported in the future. However, this approach is not really accessible to every company that needs the connections, influence, and market intelligence that come from directly supporting open-source developers. In fact, what many people find is that you can't really hire just one open-source project developer and get the same benefit as you can by "hiring" the entire Quansight Labs team for half of the cost.
Instead of hiring that one person to work on JupyterLab or SciPy or Pandas in your organization where you have to manage, motivate, and connect their output to your organizational goals, you simply make a contribution to Quansight Labs, or purchase a Community Work Order to identify your priorities. Those developers are then able to be mentored and managed by me (Travis), Ralf Gommers, Anthony Scopatz, David Charboneau, Chris Colbert, Pearu Peterson, and others while being surrounded by talented open-source enthusiasts. You end up getting more impact because of the network effects of our team and the fact that people like to work together and on teams.
Open Source is fundamentally a social activity. When you hire a single open-source developer, you run the risk that the person either fails to integrate with your company culture or is absorbed by your company priorities, leading to what open-source communities call "corporate-capture" of open-source developers. Partner with Quansight and Quansight Labs, and you can achieve your open-source impact while ensuring that the open-source you rely on is maintained and moved in the direction you need. Reach out to us and let us know how we can help you.