Access your data science: Jupyter

Data science is all about finding new ways to use and view datasets, but sometimes it can be difficult to work with that data once it reaches a certain scale. Even early on in the development of Jupyter, people working on the project recognized the potential that it had to make computer science more streamlined. The interface was simple enough to be non-intimidating, but it was also capable of being engaging and doing complex things.

A screenshot of a Jupyterlab session showing a simulation of the Lorenz equations

On an episode of Open Source Directions we were joined by two core developers of Jupyter, Carol Willing and Matthias Bussonnier, both of whom see the difference it is making in developers' lives. They explained more about the project and the direction it is going. To see the webinar, follow this link to YouTube, or visit our Open Source Directions page for the RSS podcast version.

At its core, the project fills a need for interactive computing while also supporting literate computing (as opposed to literate programming). It plays a crucial role in exploratory programming, letting users interact and play with their data. Even someone without a background in computer science can quickly put their computing power to use with Jupyter. Jupyter provides a default interface for interactive computing, but it also defines a protocol for how that interface communicates with the process doing the computing. Part of what makes it so great is how much of the project users can customize. This customizability comes from the way Jupyter is built around a standard messaging specification designed to work with any number of front ends. With all this in mind, it is no wonder that in 2018 there were approximately 8 million Jupyter users globally. For comparison, that same year there were only about 21 million total developers worldwide.
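To make the idea of a standard messaging specification more concrete, here is a minimal sketch of what a single request to run some code might look like. It only builds and prints a dictionary in the general shape that the Jupyter messaging documentation describes (a header, a parent header, metadata, and content). The helper name `make_execute_request`, the `demo-user` username, and the protocol version string are assumptions made for illustration, and nothing is actually sent to a running kernel. Real front ends also serialize and sign these messages and send them over ZeroMQ sockets, which in practice is handled by libraries such as `jupyter_client`.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build a dict shaped like a Jupyter `execute_request` message.

    Hypothetical helper for illustration only; it does not talk to a kernel.
    """
    return {
        # The header identifies the message and the session it belongs to.
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",  # assumed protocol version for this sketch
        },
        # Empty here because this request is not a reply to another message.
        "parent_header": {},
        "metadata": {},
        # The content carries the code to run plus a few execution options.
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


message = make_execute_request("print(1 + 1)", session_id=uuid.uuid4().hex)
print(json.dumps(message, indent=2))
```

Because every front end builds messages in the same shape, a notebook, a terminal console, or a custom tool can all talk to the same kernel, which is where much of the customizability described above comes from.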

"8 million Jupyter users globally, and only about 21 million total developers exist worldwide"

Though the concept behind Jupyter began in 2001 as IPython with Fernando Pérez, there have been many contributors who have helped shape this project and ecosystem into what it is today. Among them are John Hunter (matplotlib), Travis Oliphant (NumPy), Brian Granger, Min Ragan-Kelley, Paul Ivanov, and Thomas Kluyver, to name just a few. In addition to the variety of talented contributors, the community is also composed of diverse users. As it is an open source project, it has been adopted in many areas of the world where the cost of a proprietary software license would be prohibitive. Furthermore, while many open source projects tend to appeal to users of one specific language, Jupyter users come from Python, Julia, R, and other language communities. Use cases can be seen anywhere that data science happens, whether that is in academia, industry, or government, which has helped make this such a ubiquitous tool. To promote this growth even further, developments like JupyterHub and Binder have helped to increase accessibility within the education field and have provided an unprecedented level of scientific collaboration.

With such a broad appeal, Jupyter is only going to gain a larger following as the data science field continues to grow. Looking ahead, there are many plans in the works for the Jupyter community. At the time of our webinar in February of 2019, the JupyterLab team was pushing hard to release 1.0 and implement real-time collaboration. Aside from the technical development, there is also a growing need to make the project sustainable, particularly when it comes to fundraising. Additional support from companies, organizations, and users will be a necessity moving forward because of the incredible amount of effort that goes towards making Jupyter a living project that can adapt to the needs of its users over time. Beyond monetary support, much of Jupyter’s success has come through the other libraries that are used within it, so it is important not to overshadow their contribution.

A picture of an orange and gray Jupyterhub logo

Jupyter leadership is currently exploring whether to hold a conference in 2019, which would help increase community cohesion and engagement. In general, greater efforts also need to be made to empower the community through channels such as Discourse and GitHub. With these areas of focus, Jupyter will continue moving forward, and everyone in data science should be excited to see it progress. The software is free, so if you are ready to begin having more interactive access to your datasets, then get started on the Jupyter website here. If you would like to contribute to this project yourself, a list of communication channels can be found here.
