Wednesday, March 20, 2013

Getting started in open source, PyData style

I'm attending/volunteering at PyData, and it's been great. Tutorials on pandas, Matplotlib, and NumPy were very useful, and meeting real-world data scientists is fantastic. Volunteering is a great excuse for saying hello to people (like Google director of research Peter Norvig!), as is helping attendees with their Matplotlib questions.

I was moved to blog by a panel featuring heavy contributors to major open source Python packages, including SciPy, pandas, IPython, scikit-learn, and CherryPy. This discussion and my observations of the PyCon sprints have changed my mindset: contributing to open source software has become much less intimidating! I want everyone to have this feeling, but particularly female students, because we are especially underrepresented in the OSS (open source software) world. Even small contributions to projects could be a way to impress employers, and could help change the ratio in CS overall.


The moderator asked the panelists about non-typical answers to the question of how people can get started contributing to OSS. Here are my favorite (paraphrased) quotes:
Documentation is HIGHLY needed. Improving documentation can be a great way to get started if you don't want to contribute code yet. Not only will it help you learn the project and the language, but the beginner's perspective is often missing from existing documentation.
Check out the pull requests on GitHub and contribute to the discussion. You can do this without having contributed any code yourself. Your perspective as a user is important.
Contribute cookbook-style examples to help people see the forest for the trees and get started with the package. You could post one as a gist, as an IPython notebook, or both. These examples help expand the community and show developers how best to grow the library for users. We [developers] don't see the successful use cases; we see the bugs.
Here are some quotes that were a bit more conventional or more difficult to act on, but still worth thinking about:
Check out the issue tracker on GitHub and see if there are any problems you can try to fix. In scikit-learn's repository we have labeled some issues "easy", so they would be good for new programmers.
Join someone who already knows what they are doing. Talk to them at conference sprints and perhaps code next to them. Tell them what your "pain points" are and maybe they can do something to help. In fact, developers want to know what trips up average users, for example: "this was difficult to understand in the documentation" or "this was tricky to use". Let us know, so we can have a friendlier product and grow the community.
"Scratch your own itch" — discuss or fix a problem that you yourself have.

Consider helping with a project that's not so mature; you can have more influence there.