R Community Explorer – Google Summer of Code Projects

February 21, 2020

By Benaiah Ubah, Claudia Vitolo and Rick Pack

Introduction

Google Summer of Code (GSoC) is an annual three-month open-source software development program in which mentors and students (mentees) collaborate on open-source projects. This article highlights our accomplishments in the final coding phase of the 2019 GSoC project “Data-Driven Exploration of the R Community”. The first part of the project explored R-Ladies chapters, the second part explored all R user groups available through Meetup.com, and the last phase explored Google Summer of Code projects under the R Project over the past 12 years.

What We Achieved

1. Aggregating all R-GSoC projects into a CSV file that lists the names of students, mentors, and projects, then computing summaries for each of the three and storing them in JSON format

2. Assigning each of the 215 R-GSoC projects to one work-product category: Package, Infrastructure, Data, Database, GUI, Visualization, Documentation, or Application

3. Updating the names of students and mentors for consistency: some names were abbreviated, some were just Google user names, and others appeared differently across projects

4. Charting work-product distribution using grouping functions from d3.js and charting functions from echarts.js

5. Building a dashboard using tools similar to those described in our earlier article

6. Creating a word cloud from the projects’ topics using the d3.js and d3-layout.cloud.js libraries, and charting the 20 most frequent words
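As a rough illustration of steps 1 and 3 above, the sketch below normalizes names and then computes per-student and per-mentor project counts as JSON. It is not the code behind the dashboard: the column names, the semicolon-separated mentor field, the sample rows, and the `ALIASES` table are all assumptions made for this example.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample rows; the real CSV layout is not given in the article.
SAMPLE_CSV = """year,project,student,mentors
2018,Project A,J. Doe,Alice Smith;Bob Lee
2019,Project B,Jane Doe,alice smith
2019,Project C,Sam Roe,Bob Lee;Jane Doe
"""

# Hypothetical alias table mapping known name variants to one canonical spelling.
ALIASES = {"j. doe": "Jane Doe"}


def canonical_name(raw):
    """Collapse whitespace, then apply the alias table or title-case the name."""
    cleaned = " ".join(raw.split())
    return ALIASES.get(cleaned.lower(), cleaned.title())


def summarize(csv_text):
    """Count projects per student and per mentor after normalizing names."""
    students = Counter()
    mentors = Counter()
    n_projects = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        n_projects += 1
        students[canonical_name(row["student"])] += 1
        for mentor in row["mentors"].split(";"):
            mentors[canonical_name(mentor)] += 1
    return {
        "projects": n_projects,
        "students": dict(students),
        "mentors": dict(mentors),
    }


summary = summarize(SAMPLE_CSV)
summary_json = json.dumps(summary, indent=2, sort_keys=True)
print(summary_json)
```

Title-casing is a blunt default that would mangle names such as "McDonald", so a real clean-up would probably lean more heavily on a hand-curated alias table.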

While you may not read about R-Google Summer of Code (R-GSoC) activities every day in blog posts or on Twitter, many important R contributions have emerged from them. Past R-GSoC projects include enhancements to Toby Dylan Hocking’s animint (animated interactive plots) package and statistical modeling R packages such as the Stan-based BayesHMM.

This is a screenshot from our R Community Explorer’s “Past GSoC R Projects” section:

Our previous articles celebrated R-Ladies and the general R User Group community through open-source dashboards that highlight the growth, geographical distribution, and activity of the R community on Meetup. We hope that applying a similar approach to R-GSoC projects will encourage more R-GSoC proposals, increase consideration of prior projects, and attract more participants to the R ecosystem.

Dashboard Summaries

The following summaries are displayed on the dashboard:

+ Most active mentors

+ Students returning as mentors

+ Students returning for another GSoC

+ Counts and averages of projects, students and mentors

+ Count of projects co-mentored by former GSoC students

+ Work-product distribution

The dashboard can be found at this link: https://benubah.github.io/r-community-explorer/gsoc.html
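The summaries listed above can be derived from a flat list of project records. The sketch below shows one possible way to do so; the record layout and the sample data are hypothetical, and the rule used for "former student" (the person's first year as a student precedes the project's year) is an assumption for this example rather than the dashboard's documented definition.

```python
from collections import Counter

# Hypothetical project records: (year, project, student, mentors).
PROJECTS = [
    (2016, "P1", "Ann", ["Mia", "Raj"]),
    (2017, "P2", "Ben", ["Mia", "Ann"]),
    (2017, "P3", "Ann", ["Raj", "Tom"]),
    (2018, "P4", "Cy", ["Ann", "Ben"]),
]


def dashboard_summaries(projects):
    """Compute counts, returning students, and students who came back as mentors."""
    student_counts = Counter(student for _, _, student, _ in projects)

    # First year in which each person took part as a student.
    first_year = {}
    for year, _, student, _ in projects:
        first_year[student] = min(year, first_year.get(student, year))

    returned_as_mentors = set()
    co_mentored = []
    for year, name, _, mentors in projects:
        # Mentors on this project who were students in an earlier year.
        former = {m for m in mentors if m in first_year and first_year[m] < year}
        if former:
            returned_as_mentors |= former
            co_mentored.append(name)

    all_mentors = {m for _, _, _, mentors in projects for m in mentors}
    total_mentor_slots = sum(len(mentors) for _, _, _, mentors in projects)
    return {
        "projects": len(projects),
        "students": len(student_counts),
        "mentors": len(all_mentors),
        "mentors_per_project": total_mentor_slots / len(projects),
        "returning_students": sorted(s for s, n in student_counts.items() if n > 1),
        "students_returned_as_mentors": sorted(returned_as_mentors),
        "projects_co_mentored_by_former_students": co_mentored,
    }


result = dashboard_summaries(PROJECTS)
print(result)
```

Comparing years keeps someone who mentored before ever being a student from being counted as a "returning" student.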

A Few Highlights

Google has funded 215 R projects, carried out by 189 students and 202 mentors, over the past 12 years of GSoC. This sizeable number of projects reflects Google’s generosity towards the R Project in granting it adequate GSoC slots each year.

The word cloud and the bar chart of the top 20 words on the dashboard show that, over the past 12 years of Google Summer of Code under R, data analysis, package development/enhancement, and biodiversity applications have been the most popular topics. Modeling, interactive visualization, optimization, and performance improvement have also ranked highly among GSoC projects.
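The counting behind a top-20 word chart is simple to sketch. The dashboard itself uses d3.js for this; the example below swaps in plain Python purely for illustration, and the project titles and stop-word list are made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with", "r"}

# Hypothetical project titles, not taken from the R-GSoC data.
TITLES = [
    "Interactive visualization of biodiversity data",
    "Package for biodiversity data analysis",
    "Optimization of data analysis in R",
]


def top_words(titles, n=20):
    """Return the n most frequent non-stop-words across all titles."""
    words = []
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(n)


top = top_words(TITLES, n=3)
print(top)
```

The resulting (word, count) pairs are the kind of input a bar chart or a word-cloud layout would take.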

Since 2013, the number of mentors each year has been at least double the number of participating students. This results from a policy of the R org admins, who require at least two mentors for each project in order to reduce student failure rates and improve mentor availability throughout the program.

Twenty-five Google Summer of Code students under the R Project (13% of all students) have returned as mentors, and they have co-mentored about 72 projects (33% of all projects) in the past 12 years.
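These shares follow directly from the totals reported earlier in this section (189 students and 215 projects). A quick check, using only figures from this article:

```python
# Totals and counts as reported in this article.
total_students = 189
total_projects = 215
students_returned_as_mentors = 25
projects_co_mentored = 72

share_of_students = 100 * students_returned_as_mentors / total_students
share_of_projects = 100 * projects_co_mentored / total_projects

print(f"Returned as mentors: {share_of_students:.1f}% of students")
print(f"Co-mentored by former students: {share_of_projects:.1f}% of projects")
```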

Future Directions

The R Project participated in the Google Code-In contest for the first time in 2019, and we look forward to exploring the resulting data and reporting our findings. More generally, we hope that aggregating and reporting activities around both popular and less popular aspects of the R language will bring greater visibility to the hard work of its contributors, highlight opportunities around Google programs, and continue to give the global R community a sense of R’s popularity over the years.