The March 2020 ISC Call for Proposals is now open. Once again, we are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community. Our goal is to stimulate creativity and help you turn good ideas into tangible benefits.
It is very likely that everyone who reads this post will be reorganizing aspects of their everyday lives to cope with the challenge of the Covid-19 virus. Accordingly, we are suggesting a theme for this call for proposals: What can we do to improve the R infrastructure for locating, accessing, cleaning and reporting on data related to the epidemic that will be useful now and in the future?
In the recently published post COVID-19 epidemiology with R, researcher Tim Churches highlights some of the challenges presented in acquiring accurate “real time” data. These include locating sources; writing code to scrape Wikipedia, a site whose structure may change every time it is updated; digging out data embedded in multiple languages; and providing mechanisms for researchers to store data, share code and exchange ideas.
But don’t be constrained by the theme. There is other work that needs to be done and we want to hear about ideas that we may be able to facilitate.
As always, “Think Big” but structure your proposal with intermediate milestones. The ISC is not likely to fund proposals that ask for large initial cash grants. We tend to be conservative with initial grants, preferring projects structured in such a way that significant initial milestones can be achieved with modest amounts of cash.
As with any proposed project, the more detailed and credible the project plan, and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes measurable objectives, intermediate milestones, a list of all team members who will be contributing work and a detailed accounting of how the grant money will be spent.
To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained pdf using the online form. You should receive confirmation within 24 hours.
The deadline for submitting a proposal is midnight, April 2, 2020.
Services include R consulting, development, and training; ThinkR contributes to multiple R open source projects including golem, a framework for building robust Shiny apps
SAN FRANCISCO, March 3, 2020 – The R Consortium, a Linux Foundation project supporting the R Foundation and R community, today announced that ThinkR has joined the R Consortium as a Silver Member. ThinkR provides R engineering, training, and consulting, and is based in France.
“We provide R Language infrastructure, engineering and training to our clients, and at the same time we believe it is important to give back to the R community by participating in open source projects, holding meetups and training, and promoting R in many ways. Joining the R Consortium will help us to expand our support for R even more, and allow us to work toward building better R infrastructure that helps R developers and our customers,” said Diane Beldame, CEO, ThinkR. “Joining the R Consortium will allow us to better support and promote the R community and that is a big benefit for our clients.”
ThinkR developers devote part of their time to the R and data science communities. This includes supporting various R packages on GitHub, holding meetups and other conferences connected to R, posting development tips on the ThinkR blog, and answering questions on Stack Overflow and in Slack communities.
“We are excited to welcome ThinkR to the R Consortium. ThinkR is on the front lines of providing R to industries in ways that immediately contribute to their customers’ success,” said Joseph Rickert, RStudio’s R Community Ambassador and R Consortium Board Chair. “At the same time, ThinkR contributes to the R community with open source projects and much more, and we’re very pleased they will be involved in moving the R Consortium forward.”
ThinkR has clients in a wide range of industries, including public institutions, pharmaceuticals, energy, banking, electronics manufacturing, research, and more.
The R Consortium is a 501(c)6 nonprofit organization and Linux Foundation project dedicated to the support and growth of the R user community. The R Consortium provides support to the R Foundation and to the greater R Community for projects that assist R package developers, provide documentation and training, facilitate the growth of the R Community and promote the use of the R language. For more information about R Consortium, please visit: http://www.r-consortium.org.
About Linux Foundation
Founded in 2000, the Linux Foundation is supported by more than 1,000 members and is the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Linux Foundation projects like Linux, Kubernetes, Node.js and more are considered critical to the development of the world’s most important infrastructure. Its development methodology leverages established best practices and addresses the needs of contributors, users and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org
Google Summer of Code (GSoC) is an annual 3-month open-source software development (coding) program that provides a platform for mentors and students (mentees) to collaborate on open source projects. This article highlights our accomplishments in the final coding phase of the 2019 GSoC project: “Data-Driven Exploration of the R Community”. The first part of the project explored R-Ladies chapters, the second part explored all R user groups available through Meetup.com and in the last phase we explored Google Summer of Code projects under the R Project over the past 12 years.
What We Achieved
1. Aggregating all R-GSoC projects into a CSV file listing the names of students, mentors, and projects, then computing summaries for students, mentors, and projects and storing them in JSON format
2. Assigning all 215 R-GSoC projects into a work-product category among: Package, Infrastructure, Data, Database, GUI, Visualization, Documentation and Application
3. Updating the names of students and mentors to maintain consistency – some names are abbreviated, some are just Google user names and others appear differently across projects
4. Charting work-product distribution using grouping functions from d3.js and charting functions from echarts.js
5. Building a dashboard using tools similar to those described in our article here
6. Creating a word-cloud from the projects’ topics using the d3.js and d3-layout.cloud.js libraries, and charting the 20 most frequent words
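The word counting behind a "20 most frequent words" chart can be sketched in a few lines. This is only an illustration in Python, not the d3.js code used for the dashboard: the stop-word list, the tokenizing rule and the sample titles are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the original analysis does not say which
# words, if any, were excluded.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def top_words(titles, n=20):
    """Return the n most frequent non-stop-words across project titles."""
    counts = Counter()
    for title in titles:
        # Lower-case the title and keep tokens of two or more characters
        # that start with a letter.
        words = re.findall(r"[a-z][a-z0-9]+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up project titles, for illustration only.
sample_titles = [
    "Interactive visualization of spatial data",
    "Optimization of data import performance",
    "Bayesian modeling package for spatial data",
]

print(top_words(sample_titles, n=2))
```

The resulting (word, count) pairs are the kind of input a bar chart or word-cloud layout would consume.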
While you may not read about R-Google Summer of Code (R-GSoC) activities every day via blog posts and Twitter, many important R contributions have emerged from R-GSoC activities. Example past R-GSoC projects include enhancements of Toby Dylan Hocking’s animint [animated interactive plots] package and statistical modeling R packages like the Stan-using BayesHMM.
This is a screenshot from our R Community Explorer’s “Past GSoC R Projects” section:
Our previous articles discussed our celebration of R-Ladies and the general R User Group community through open-source dashboards that highlight the growth, geographical distribution, and activity of the R community on Meetup. We hope that applying a similar approach to R-GSoC projects will encourage more R-GSoC proposals, increase consideration of prior projects, and attract more participants to the R ecosystem.
The following summaries are displayed on the dashboard:
+ Most active mentors
+ Students returning as mentors
+ Students returning for another GSoC
+ Counts and averages of projects, students and mentors
+ Count of projects co-mentored by former GSoC students
Google has funded 215 R projects, carried out by 189 students and 202 mentors, in the past 12 years of GSoC. The number of projects is significant, thanks to Google’s generosity in giving the R-Project adequate GSoC slots each year.
The word-cloud and bar chart of top 20 words on the dashboard show that in the past 12 years of Google Summer of Code under R, data analysis, package development/enhancement and biodiversity applications have been the most popular. Modeling, interactive visualization, optimization and performance improvement have also taken top positions within GSoC projects.
Since 2013, the number of mentors each year has been at least double the number of participating students. This is a result of a policy of the R org admins, who require at least two mentors for each project so as to reduce student failure rates and improve mentor availability throughout the program.
25 Google Summer of Code students (13% of all students) under the R-Project have returned as mentors, and they have co-mentored about 72 projects (33% of all projects) in the past 12 years.
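As a quick check, the two percentages follow directly from the totals reported above (189 students, 215 projects). The snippet below is just that arithmetic written out in Python; the variable names are ours.

```python
# Totals and counts as reported in this article.
students_total = 189
projects_total = 215
returning_mentors = 25
co_mentored_projects = 72


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


print(share(returning_mentors, students_total))     # 13
print(share(co_mentored_projects, projects_total))  # 33
```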
The R-Project participated in the Google Code-In contest for the first time in 2019 and we are glad to explore the resulting data and report our findings. We hope that aggregating and reporting activities around many popular and less popular aspects of the R language will bring greater visibility to the hard work of many contributors, highlight opportunities around Google programs, and continue to give the global R community a feel for the popularity of R over the years.
By Heather Turner, Chair of Forwards, the R Foundation taskforce for underrepresented groups in the R Community
In this post I will give the background to the Forwards Southern Africa 2020 project, for which we are running a crowd-funding campaign until February 5, 2020.
On March 6-7, 2020, Johannesburg will host the fourth satRday to be held in South Africa. satRdays are community-led, regional conferences that support collaboration, networking and innovation within the R community. They were initiated by an R Consortium funded project that ran pilot events in Budapest and Cape Town in 2016/2017. The conference series has been expanding around the world since then, with ten events in 2019.
For Joburg satRday 2020, I was invited to be a keynote speaker. As chair of Forwards, the R Foundation taskforce for underrepresented groups, I saw this as an opportunity to create an initiative focused on building the R Community in Southern Africa.
A first step was to offer a workshop on R package development, using the materials developed under the R Consortium project, Forwards Workshops for Women and Girls. This project ran package development workshops for women in New Zealand, Budapest and Chicago. Since there are still some funds left in the grant, we are able to offer some scholarships to women in Africa to attend the Joburg workshop and satRday. Women with visa-free access to South Africa may apply; the deadline for applications is midnight SAST, January 31.
The next step was to look beyond South Africa, to neighbouring countries. The following map shows cities in Africa with R-Ladies groups (purple), R User Groups (blue) or both (blue-grey):
The AfricaR consortium that took off at the start of 2019 has really helped to support the R community across Africa and has led to the founding of several R User Groups, as well as the first satRday in East Africa (Kampala 2019) and the first satRday in West Africa, which will take place in Abidjan, February 1, 2020. In Southern Africa, there are strong R User Groups and R-Ladies groups in both Cape Town and Johannesburg, but the R Community is only just starting to go beyond South Africa, with the establishment of Eswatini useRs last year.
UPDATE: The Abidjan satRday event was a big success! Here’s a photo of the full group. Videos of the talks should be available online soon.
The Forwards Southern Africa Project aims to build on this foundation, by organizing free workshops and meetups in collaboration with local partners in Eswatini, Botswana and Namibia. This project is also supported by the WhyR Foundation and AfricaR. The details of the events are still being finalised, but the planned itinerary is as follows:
Windhoek, Namibia (March 4, 2020, TBC)
In partnership with the Department of Statistics and Population Studies, University of Namibia:
Introduction to R for data analysis workshop (1 day)
Launch event of the first R User Group in Namibia
Manzini, Eswatini (March 11-12, 2020)
In partnership with the recently established Eswatini useR group. Registration is open for this two-day event, which includes:
Introduction to R for data analysis workshop (1 day)
Data visualization workshop (1/2 day)
Meetup including talk on the R community and resources available for newcomers
Gaborone, Botswana (March 14, 2020)
In partnership with WiMLDS Gaborone and PyData Botswana:
Introduction to R workshop (1/2 day)
All these events can be supported via the crowdfunder where further updates will be posted. Updates will also be shared on the Forwards Twitter.
By Aurora González-Vidal (president) and Antonio Maurandi (vice-president), Users Murcia R (UMUR)
UMUR (Users Murcia R) is an association whose first official act was organizing the X National Spanish meeting of R users in Murcia (2018), which marked a turning point for this annual meeting. We brought two amazing speakers: François Husson, who joined us in person, and Julia Silge, who participated by video call. Since then, we have held meetings every other month (workshops, talks…) with 35-45 people attending. We are trying not only to bring local people together but also to give them the chance to meet the people they look up to in the R community and to share in the R spirit. Recently we also had the opportunity to meet Max Kuhn at the XI National meeting of R users, held in Madrid (2019).
Our secret as a young organization with high participation is that we are made up of at least two small groups, plus individuals, that were independently “spreading the word of R,” each in their own environment.
The first informal group is called 00Rteam. We are based in academia and teach a large number of R courses aimed at specific audiences at the university level: PhD students (writing scientific papers with R Markdown, introduction to R and RStudio, data tabulation in R, hypothesis testing in R, multivariate analysis in R), teachers (machine learning with R) and administrative staff (R4U).
Another informal group is made up of mathematicians and economists from the Faculty of Economics and Business, who are working on integrating R into their daily teaching. They use R and RStudio to create interactive teaching materials, exploring packages like rmarkdown, shiny, swirl and exams for teaching statistics in the courses offered by the faculty.
Apart from that, our board of directors includes professionals who are actively spreading R in engineering businesses and banks, who have personal blogs, and who have authored manuals about R. Having this mix of interdisciplinary and enthusiastic people in charge of the association has attracted a pool of interested people that is bringing a lot of joy and knowledge exchange to Murcia.
An important characteristic of UMUR is that the number of women on the board is higher than the number of men. We are sensitive to gender equality, and we want to be an example of parity in the technology space showing that we are diverse in many ways. We think this fact is the key to success.
Signed: Aurora González-Vidal (president) and Antonio Maurandi (vice-president)
Thank you to Carlos Ortega, Principal Data Scientist, Teradata, for providing this summary and pictures from the conference
The XI Conference of R Users (XI Jornadas de Usuarios de R), held November 14 – 16 in Madrid, Spain, was organized by the Asociación Comunidad R Hispano. The ambitious program and the invited international speakers drew more than 200 attendees. The conference was split between two venues, Repsol (a Spanish oil and gas company) and UNED (the Spanish distance learning university), highlighting the university-business combination that has been one of the key factors in the success of the conference.
On Thursday, November 14, the opening ceremony was held at the Repsol Campus auditorium and attended by Emilio López Cano (president of the Asociación Comunidad R Hispano), Julio Gonzalo (deputy vice chancellor for research at UNED), Enrique Dameno (Director of Digitalization and Integrated Customer Management of Repsol), and Teresa García (Repsol).
Max Kuhn (RStudio) gave a lecture on “Modeling in the Tidyverse,” and after that, the round table “R in business” covered the crucial role of data scientists in solving problems in diverse areas. Raúl Vaquerizo (Pont Group), Noelia Ruiz (Mutua Madrileña), Jorge Ayuso (Telefónica España), Enrique Lasso (Repsol) and Carlos Ortega (Teradata) participated in the round table.
On the 15th and 16th, at the School of Education of the UNED, an extensive and vibrant program was held with workshops, communications sessions, “lightning sessions,” poster sessions, round tables and invited talks. Bernd Bischl (University of Munich) gave a lecture on mlr3, Jo-Fai Chow (H2O.ai) presented “Automatic and explainable machine learning in R,” and Max Kuhn gave a workshop on “Designing R modeling packages.”
Following the multidisciplinary philosophy of using R to handle any kind of data, communications sessions dealt with applications in genetics, data analysis, model and project management, society and culture, surveys and education, medicine and veterinary science, and economics and business. In addition to these themed sessions, the “lightning sessions” dealt with many different topics.
A round table on Data Journalism closed the conference. It was moderated by Leonardo Hansa (R-Hispano), and Virginia Peón (Indigitall), Alba Martín (Newtral), Antonio Delgado (Datadista) and Carmen Aguilar (Sky News) participated. The discussion highlighted the importance of treating data in an appropriate and honest way, so that the information that reaches the public is truthful.
In the closing ceremony, the prize for the Best Young Work of the Conference was announced, which went to Rocío Aznar Gimeno (Technological Institute of Aragon) for the work “Multilevel mixed models: An application of the lme4 library to estimate the fetal weight percentile in twin pregnancies.”
Congratulations to our very own Hadley Wickham, Infrastructure Steering Committee Chairperson, for winning the COPSS Presidents’ Award, often called the “Nobel Prize of Statistics.” The award is given to a person under the age of 41 in recognition of outstanding contributions to the profession of statistics. According to Wikipedia, the COPSS Presidents’ Award and the International Prize in Statistics are considered the two highest awards in statistics.
The award citation recognized Wickham’s “influential work in statistical computing, visualization, graphics, and data analysis” including “making statistical thinking and computing accessible to a large audience.”
In previous years, the award has primarily recognized theoretical contributions to statistics. This year is the first time it has been awarded for practical application.
Hadley is Chief Scientist at RStudio, a Platinum member of the R Consortium, and Adjunct Professor at Stanford University and the University of Auckland. Skill with statistics runs in the family: his sister is an Assistant Professor of Statistics at Oregon State University.
Hadley builds tools – both computational and cognitive – to make data science easier, faster, and more fun. His work includes packages for data science – a pioneering suite of tools for R known as the “Tidyverse,” including ggplot2, dplyr, tidyr, purrr, and readr – and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.
R is a fast-growing language for statistical
computing and graphics backed by a powerfully inclusive community of users and
developers. The R community received a significant boost when some
enterprises came together to establish
the R Consortium in 2015. Since then, the R Consortium has clearly proven its
purpose by operating transparently and in an unbiased manner – supporting the R
Foundation, infrastructure that broadly affects the R community, tools that
enhance the R software, R user-groups, events and diversity on a global scale.
R Consortium’s top-level projects (R-Hub, R-Ladies, the RUGS program, Events sponsorship and the R Community Diversity and Inclusion program), working groups and other ISC funded projects highlight the significance of R Consortium’s involvement as a major supporter of several critical developments around R in recent times.
To further enhance transparency, measure impact and achieve even greater community inclusiveness, the R Consortium funded a new data-driven initiative in Fall 2018 to provide a way for the R community to discover and track its activities over the years. This infrastructure is dedicated to curating and rendering R Consortium activities via dashboards using open-source technologies. All data and code are available at this GitHub repository, which is primarily maintained by me.
For a start, the ISC approved the development of dashboards that highlight R Consortium’s accomplishments, with a focus on ISC Funded Projects, the RUGS program and the Events/Marketing program. I am delighted to report that this initial scope has been successfully covered and the corresponding milestones delivered. The next iteration of development would include more aspects of R Consortium’s activities that have broad impact on the larger R community. The following sections of this article present reasons why a data-driven initiative is useful for tracking R Consortium’s activities, the deliverables for this project, benefits and future
Why a data-driven initiative to track R
1. In the past 5 years, R Consortium has supported many R initiatives that encompass user-groups, events, diversity, technical infrastructure, documentation, teaching materials, working groups, etc. But how could the impact of these initiatives be measured in numbers over the years? How could the global distribution of activities like the user-group and event support programs be ascertained?
2. ISC funded projects (both completed and ongoing) are usually curated on a single web page. This initiative provides a way to search these projects by year, grant-cycle, status, primary investigator, etc.
3. Before this project, there was no way of ascertaining the distribution of funding across work-products. A data-driven infrastructure will help those without experience applying for ISC grants by giving them an overview of the work-products and cash-grant ranges that have received more funding over time.
4. R Consortium’s decision makers may find a
data-driven initiative helpful in planning future programs and packages.
5. Prospective members contemplating joining the R Consortium could easily find and understand R Consortium’s past accomplishments in a broad, transparent, insightful, and
6. Finally, comparing R Consortium’s mission statement with its accomplishments from a data-driven perspective is something that the R Foundation, the global R community, and present and future members of the R Consortium would like to track and provide feedback on over time, for the long-term growth and stability of the R ecosystem.
We now present to the R community a suite of dashboard pages that render the corresponding R Consortium activities in a data-driven manner:
projects dashboard: Easily find ISC projects, with enough information to contact project owners if you are thinking of contributing. Find the most popular work-products and cash-grant ranges if you have no experience applying for grants.
RUGS program dashboard:
Understand the global distribution of funded user-groups and their
funding-level distribution. Find information about these groups and how to get
in touch with those within your reach.
Events / Marketing dashboard:
Understand the global distribution of sponsored events.
Find aggregated summaries around all of ISC projects, RUGS program,
Events/Marketing program and the R-Ladies project.
It would be interesting to explore more of R
Consortium activities like working groups, and ISC projects that have
observable global impact on a running basis.
The R Consortium is committed to supporting the R community by funding projects that create important infrastructure and fortify long term stability for the R Community. The R Consortium’s Infrastructure Steering Committee (ISC) has developed a grant program that looks to help the broader R community.
The Call for Proposals opens today, September 13, 2019, and runs for a full month, through October 14, 2019.
In this round, the ISC is looking for projects that:
Are likely to have a broad impact on the R community.
Have a focused scope. Simple is better than over-ambitious. Larger projects can often be broken up into smaller steps.
The process for submitting a proposal has been updated annually to ensure that it is as smooth as possible. Full details on proposal requirements, examples of previous projects, suggestions for what to avoid, and more, are included here.
Identifying all R user groups on Meetup.com required more effort than identifying R-Ladies groups. While R-Ladies groups are centrally created and their names follow a standard convention, the names of other R user groups are more difficult to predict.
We extended Curtis
Kephart’s technique for using string matching to retrieve upcoming R events
Match, among all data science groups on Meetup (7,700+), those with strings like “r user”, “r-user”, “r-lab”, “phillyr”, “rug”, “bioconductor” or “r-data” in their Meetup URL names. We then performed a second round of string matching to search for strings like “programming-in-r”, “r-programming-”, “-using-r”, “r-language” and “r-project-for-statistical” in the groups’ topics field.
Retrieve all user groups that mention
“r-project-for-statistical-computing” in their topics separately.
Retrieve all R-Ladies groups separately, which
was necessary to avoid missing some groups.
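The two rounds of string matching described above can be sketched roughly as follows. This is a Python illustration rather than the meetupr-based R code we actually ran: the field names `urlname` and `topics` and the sample groups are assumptions made for the example, while the patterns are the ones listed above.

```python
# Patterns quoted in the text above.
URL_PATTERNS = ("r user", "r-user", "r-lab", "phillyr", "rug",
                "bioconductor", "r-data")
TOPIC_PATTERNS = ("programming-in-r", "r-programming-", "-using-r",
                  "r-language", "r-project-for-statistical")


def is_r_group(group):
    """Return True if a group looks like an R user group.

    `group` is a dict with hypothetical keys "urlname" (str) and
    "topics" (list of topic keys); real Meetup records may differ.
    """
    urlname = group["urlname"].lower()
    # First round: substring match on the group's URL name.
    if any(pattern in urlname for pattern in URL_PATTERNS):
        return True
    # Second round: substring match on each of the group's topics.
    return any(pattern in topic.lower()
               for topic in group.get("topics", [])
               for pattern in TOPIC_PATTERNS)


# Made-up groups, for illustration only.
groups = [
    {"urlname": "example-r-users-group", "topics": ["data-analytics"]},
    {"urlname": "example-data-science",
     "topics": ["r-project-for-statistical-computing"]},
    {"urlname": "example-python-club", "topics": ["python"]},
]

matched = [g["urlname"] for g in groups if is_r_group(g)]
print(matched)
```

Plain substring matching is permissive: a short pattern like “rug” also matches unrelated names that merely contain those letters, so results from a filter like this still need a manual review.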
For this dashboard, the following procedure was followed:
We used the meetupr package to extract R user groups from Meetup.com
Modified the existing find_groups() functions in meetupr to meet our requirements and switched from the defunct Meetup API keys to the OAuth 2.0 authentication system. This switch was quite complicated and will be discussed further in
Converted the data retrieved from Meetup via meetupr from data frames to JSON, GeoJSON and CSV
Stored the data by committing the JSON/GeoJSON/CSV files to the GitHub repository of
Built a static HTML dashboard interface based on an open-source Bootstrap template
Rendered the stored data via the dashboard interface
Automated the process of extracting R user groups, data transformation and storage.
Deployed the dashboard via GitHub Pages
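To make the data-frame-to-GeoJSON step concrete, here is a minimal sketch of turning tabular group records into a GeoJSON FeatureCollection. It is written in Python for illustration and is not the R code from our pipeline; the column names (`name`, `city`, `members`, `lat`, `lon`) and the sample record are assumptions.

```python
import json


def groups_to_geojson(rows):
    """Convert row dicts that have "lat" and "lon" keys into a GeoJSON
    FeatureCollection with one Point feature per row."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [row["lon"], row["lat"]]},
            # Every other column becomes a feature property.
            "properties": {key: value for key, value in row.items()
                           if key not in ("lat", "lon")},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up record, for illustration only.
rows = [
    {"name": "Example R User Group", "city": "Johannesburg",
     "members": 120, "lat": -26.2, "lon": 28.05},
]

collection = groups_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

A file in this shape can be committed to the repository next to the JSON and CSV exports and read directly by a web map on the dashboard.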
Tools We Used
this project as this combination offers great flexibility with automation and
used a mix of these tools to develop the dashboard:
R, RStudio and the following packages:
meetupr, curl, jsonlite and leafletR
Gentelella Admin Dashboard Bootstrap HTML template
Travis CI to automatically build the project, execute R scripts and bash commands
Bash commands to call R scripts and commit modified files to GitHub
We thank Curtis Kephart (RStudio) for contributing code that helped us with ideas on identifying R user groups on Meetup. We also thank the authors of the meetupr package for their excellent work.
Special thanks to Jenny Bryan, Erin LeDell, and Greg
Sutcliffe for their help
over the last month with implementing the requirements for the new Meetup OAuth
2.0 authentication system.