What’s new with R Consortium funded projects in Q2 2018

By | Blog, R Consortium Project

In an effort to provide greater transparency with respect to R Consortium activities, the ISC provides quarterly updates for all R Consortium funded projects. The following is our update for Q2 2018.

histoRicalg — Preserving and Transferring Algorithmic Knowledge

The histoRicalg project is seeking participants, both active and in a review capacity, to help select issues in older algorithms that should be addressed.
We are setting up a working group to identify possible issues in older algorithms.
Some concerns have already been identified and we are starting to address them. See the wiki for details.

Forwards Workshops for Women and Girls

foRwards is pleased to announce upcoming R workshops in Melbourne and Auckland. We thank the R Consortium for funding. See the GitHub page for details.

Code Coverage Tool for R

While the software development goals of this project have been achieved through the covr package re-released in summer 2017, we continue to make progress on the secondary goal to integrate package best practices into the R Community. We pursued this along two threads.

First, we conducted a survey on the understanding and use of open source licenses and their implications for the R Community. We blogged about the results here.

Second, we reviewed the Linux Foundation Core Infrastructure Initiative Best Practices Badge Program. Initially, we considered branching a version of the CII tailored to R; however, after further discussions, it appears the CII Best Practices Badge Program can be adopted as is. We are currently conducting a survey of the R Community to solicit feedback on making the CII Best Practices Badge Program a recommended practice from the R Consortium, as well as to identify any necessary enhancements to the questionnaire.

Look here for refactoring and updating the SWIG R module

R User Group Support program

The RUGs program continues to enroll additional user groups. As of June 14th, 87 groups are participating in the program: 2 at Array level, 14 at Matrix level and 71 at Vector level. Additionally, we have sponsored 10 small conferences since the beginning of the program. The RUGs program will run through September 30, 2018.

stars: Scalable, spatiotemporal tidy arrays for R

The stars project is underway. Look here for details and to get involved and here for examples and reports.

Sat R Days

satRdays is growing amazingly well!

We’ve got 8 events coming up in the next year, with plans to add more events too. At satRdays we’re baking in a commitment to diversity and it’s going amazingly well. The most recent event in Cardiff, UK had 11 of 14 speakers coming from under-represented groups.

We’re looking for more folks who want to organise satRdays events, particularly outside of the Europe region. If anyone is interested, you can read our growing docs, including what it’s all about, at knowledgebase.satrdays.org and chat to us on the global R User Group leaders Slack at bit.ly/ruglslack. You can also come on board to help with central stuff like the website, building up the documentation, marketing, and supporting new event organisers.

Conference Management System for R Consortium Sponsored Conferences

One of the next big areas for us to think about is the next wave of central funding from the Consortium and whether we should do anything to have an official entity for the central administration associated with these conferences.

After performing an extensive review, Odoo was identified as the most feature-rich platform for hosting events; however, there were limitations that particularly impacted academic-oriented events.

We discussed the option of extending Odoo as it was based on Python, but it was felt that trying to make a one size fits all solution was not the best approach.

The alternative we discussed and will now move forward with is a flexible and minimal solution in Hugo, the basis for blogdown.

Our revised proposal will see rapid development and iteration using satRdays as the test subject. This will mean other R events can leverage the solutions developed for satRdays, and the technology will be proven by the next UseR! iteration.

Quantities for R

The r-quantities project has reached the third milestone. The first prototype has been polished and aligned with recent developments in the units package. Efficient parsers have been implemented to read data with units and/or errors into quantities objects. The documentation has been extended to provide a comprehensive guide on working with quantities in two common data wrangling workflows. Further details about these developments can be found in the three articles published so far in r-spatial.org.

Proposal to Create an R Consortium Working Group Focused on US Census Data

The working group held its first meeting on August 8th. If you are interested in getting involved, write to us at rconsortium-isc@lists.r-consortium.org

Ongoing infrastructural development for R on Windows and MacOS

Development of the new version of Rtools and rebuilding of the C++ libraries used by packages are in progress. We are now in the process of testing base-R and all CRAN packages with the new GCC 8.1 toolchain.

Developing Tools and Templates for Teaching Materials

We are in the planning phase at this stage. We’ll soon set up a GitHub repository and website for more visibility, share planned features and progress, and give the community an opportunity to provide feedback.

Joint profiling of native and R code

OS X support has been added and the main package has been renamed to jointprof to avoid a name clash with an existing package. Try it out, happy to take your feedback!

Maintaining DBI

The third DBI project is focused on technical and non-technical issues. We would like to present DBI at R meetups in Zurich and Berlin, and we have submitted a talk for the next satRday in Amsterdam. The renaming of duplicate columns in the output introduced in DBI 1.0.0 caused problems for RSQLite and will be reverted. The sqlr package by Nicolas Bennett aims at providing a backend-agnostic way to define the structure of a database, i.e., generate DML statements from R code similarly to SQLAlchemy for Python.

R-Ladies

Growth: The growth we saw at the start of 2018 has continued, with 25,000 R-Ladies now signed up as members on meetup.com. The 17 new groups this quarter (5 in the US, 1 in Canada, 4 in Latin America, 3 in Europe, 2 in Australia, 1 in Asia and 1 Remote) bring the total to more than 115 R-Ladies chapters worldwide (on meetup.com). Additionally, a new R-Ladies Remote group was launched to allow R-Ladies far from a chapter or city to be involved in an R-Ladies group.
Improving infrastructure: Move to an R-Ladies global Meetup Pro account to help align chapters’ expenses; new initiatives for the Slack community are in development.
Long-term planning: Progress made on charity set-up.
Supporting the R Consortium and RStudio, along with Forwards, to improve conference organisation and diversity requirements so that all future R conferences are inclusive.

PSI application for collaboration to create online R package validation repository

The PSI AIMS SIG will lead the creation of an online repository / web portal, where validation which is of regulatory standard for R packages can be submitted and stored for free use. We will define a set of “Validation Criteria”, demonstrate it by applying it to the dplyr package, and then encourage contributions from R users to document validation of other packages and load them to the shared free access portal.

In June 2018, we attended the PSI Conference in Amsterdam to promote the idea and make contacts with potential future collaborators. Our next steps will be to continue work on the Validation Criteria Framework, engaging key opinion leaders at the R/Pharma conference in August. If you have experience in R validation and are interested in working with us on this project, please contact taylorlyn@prahs.com.

R Documentation Task Force

Beta packages with limited functionality are being prepared for release.

 

Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results

By | Blog, R Consortium Project

Based on our Fall 2017 survey, where the R Consortium asked about opportunities, concerns, and issues facing the R community, the R Consortium conducted a new survey this past month to solicit feedback on using the Linux Foundation (LF) Core Infrastructure Initiative (CII) Best Practices Badge Program for R packages. The R Consortium will base its recommendation on using the CII on your feedback. Your feedback will also help us and the Linux Foundation evolve the CII with the needs of the R Community, and FLOSS projects in general, in mind.

Introduction

With over 12,000 R packages on CRAN alone, the choice of which package to use for a given task is challenging. While summary descriptions, documentation, download counts and word-of-mouth may help direct selection, a standard assessment of package quality can greatly help identify the suitability of a package for a given need – commercial, academic, or otherwise. Providing the R Community of package users an easily recognized badge indicating the level of quality achievement would make it easier for users to know the quality of a package along several dimensions. In addition, providing R package authors and maintainers a checklist of “best practices” can help guide package development and evolution, as well as help package users know what to look for in a package.

The R Consortium has been exploring the pros and cons of recommending that R package authors, contributors, and maintainers adopt the Linux Foundation (LF) Core Infrastructure Initiative (CII) “best practices” badge. This badge provides a means for Free/Libre and Open Source Software (FLOSS) projects to highlight to what extent package authors follow best software practices, while enabling individuals and enterprises to assess quickly a package’s strengths and weaknesses across a range of dimensions. The CII Best Practices Badge Program is a voluntary self-certification program; there is no cost to submit a questionnaire and earn a badge. An easy-to-use web application guides users through the process, even automating some of the steps.

More information on the CII Best Practices Badging Program is available: the criteria (available on GitHub), project statistics, criteria statistics, and videos. The projects page shows participating projects and supports queries (e.g., you can see projects that have a passing badge).

What did we learn?

Will the CII Best Practices Badge Program provide value to the R Community’s package developers or package users? 90% of survey respondents say ‘yes’, with 77% saying it has benefit for both developers and users. Perhaps not surprisingly, 95% of respondents had never heard of the CII before, but 74% would be willing to try it. This is according to 41 respondents, 56% of whom have been developing R packages for 4 years or more, and over 60% of whom have developed two or more packages.

Of the six categories covered by the CII – licensing, documentation, change control, software quality, security, code analysis – over 55% of respondents found all criteria to be somewhat or highly beneficial. Over 80% found documentation and software quality criteria to be somewhat or highly beneficial. The details are provided in the table below.

Table: Expected degree of benefit for each CII criteria category

Using an open-ended question, we asked respondents why the CII is good for the R Community. Here is a summary of the responses. The CII…

  • helps users discover and select R packages that adhere to software development best practices.
  • shows R developers through the badge criteria what is possible or desirable for FLOSS, especially if developers do not have a software engineering background.
  • provides an additional degree of assurance to the user community around package quality and provides a way for developers to assert more formally that they follow such best practices.
  • gathers and presents lessons learned from other FLOSS projects so developers don’t need to re-discover them.
  • creates an incentive to adopt a consistent set of practices throughout the R ecosystem.

While respondents were generally very positive about the use of the CII, concerns did arise:

  • Does the CII Badge have the “correct” or “best” set of criteria?
  • Achieving a badge does not necessarily mean a given package is well designed or implemented.
  • How does the CII help to ensure the validity of self-certification, e.g., through automated tools?
  • Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.
  • Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?
  • Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?
  • A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.
  • Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.
  • Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Suggestions from the respondents on how best to take advantage of the CII Badge Program include:

  • The CII should be sure to reflect the existing quality criteria provided through CRAN.
  • Integrate the CII with CRAN or Bioconductor, e.g., display badges on respective package CRAN pages to give CII more visibility and so that users can identify more easily which package to use.
  • Use the CII to encourage package developers to train themselves in best practices.
  • Develop an automatic framework that will create/enforce all the criteria whenever possible.
  • Make the security criterion conditional based on what the package does. If a package never goes outside the R session, does it need a dedicated security expert?
  • Require packages implementing a statistical method be backed up by a peer-reviewed article.
  • Make it easier to recognize which criteria categories passed and by what percentage in a high level visual representation, perhaps incorporated into the badge itself.
  • “Not use it at all, it creates false impressions and discriminates against good domain packages in disciplines that simply use software rather than seek rewards.”
  • “Encourage R-Core to adopt these practices for R itself. Also, loosen the approach to LICENSE files on CRAN so as to make compliance easier.”
  • Keep it simple.

As you can see, there is quite a range of sentiment expressed regarding introducing such a badging program. Some concerns seem to be based on misunderstandings, for example, the badging process does not require a “dedicated security expert,” and there already is some degree of automation in the process. The R Consortium is grateful to the respondents for taking the time to provide their insightful and thoughtful responses. We will continue to work with the CII team to explore addressing the issues raised above, including clarifying misunderstandings where we can do so.

Since initiating this survey, however, multiple packages have already taken the plunge to try the CII badge program:

foghorn R package to summarize CRAN Check Results in the Terminal https://github.com/fmichonneau/foghorn
osrm Shortest Paths and Travel Time from OpenStreetMap with R https://rgeomatic.hypotheses.org/category/osrm
R_Matrix R package for Sparse and Dense Matrix Classes and Methods: a rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, … http://matrix.r-forge.r-project.org
base64enc R tools for base64 encoding https://github.com/s-u/base64enc
ggplot2 An implementation of the Grammar of Graphics in R https://ggplot2.tidyverse.org
covr Test coverage reports for R https://github.com/r-lib/covr
datastructures Implementation of core data structures for R. https://dirmeier.github.io/datastructures
madrid.air R package to parse air quality data published by http://datos.madrid.es/. https://github.com/nramon/madrid.air
pandas Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical… http://pandas.pydata.org
ciTools An R Package for Quick Uncertainty Intervals: ciTools is an R package that makes working with model uncertainty as easy as possible. It gives the user easy access to confidence or prediction intervals… https://github.com/jthaman/ciTools
dodgr Distances on Directed Graphs in R https://ATFutures.github.io/dodgr
netReg Network-penalized generalized linear models in R and C++. https://dirmeier.github.io/netReg
DBI A database interface (DBI) definition for communication between R and RDBMSs http://dbi.r-dbi.org

 

If you’re a package developer, we hope you’ll join the package developers above and start your own CII Best Practices Badge. The survey will remain open to collect your feedback on the experience.

Interview with R/Medicine 2018 conference organizer Michael Kane

By | Blog, Events

The first annual R/Medicine conference is being held September 7-8 in New Haven, CT, and is a collaboration between R Consortium and the Yale School of Public Health Biostatistics Department. As the first public activity of the R in Medicine working group, it’s set to be a key event to bring together the medical community that leverages R in medical research and clinical practice.

Leading the conference committee is Michael Kane, who is an Assistant Professor in Yale University’s Biostatistics Department. With the conference coming soon he agreed to answer a few questions about the R community and the conference itself.

Tell us about the medicine industry’s use of R?

Michael: My first exposure to medicine using R came from my internship at Revolution Computing (now Microsoft R). At the time most companies used SAS and Revolution had started providing validated versions of R and some packages, similar to SAS’s validation process, which would allow R to be used in submissions to the FDA. Since then, the rules have changed, and R sees a lot more use in this space because it provides inexpensive access to powerful tools for designing and analyzing health, genetic, and clinical data. 
We are currently using and developing these tools to find subtypes in immunotherapy studies for treating cancer. Patients can respond very differently to cancer therapies depending on which stage of cancer they have, how many previous treatments they’ve had, and the diversity of the tumor environment. By understanding how factors like these are related to prognostic heterogeneity, we can do a better job prescribing people with cancer the most effective possible treatments.

What drove you to create an event to bring together the R medicine community?

Michael: This conference was inspired by R/Finance. The committee does a fantastic job of providing an entertaining and informative conference. The richness and diversity of the talk subjects show how vast finance is and, at the same time, the speakers and other attendees are completely accessible. We want to bring that same sense of inclusiveness and collaboration to medicine, where sometimes practices become siloed. We hope people realize that we, in medicine, are also part of a large and rich area of research, and we hope the conference helps to jell the community.

What are the organizing committee’s goals and measures of success for this first event?

Michael: Our goal for the first year is to better understand the community as a whole. We are expecting submissions from the clinical trials community, the genetics and omics community, and the epidemiology community. We are hoping we get submissions from both academia and industry. We want to see how people are using R to advance human health. I’ll consider the conference a success if attendees find at least one talk where they are surprised, entertained, and delighted by a use of R that hadn’t occurred to them.
Our other goal is to reinvest in the conference. If we are successful, and we are able to secure enough sponsorship, then we would like to make it easier for people to attend the conference. This would include providing more awards for travel, particularly for students.

How do you see working with R Consortium as critical for driving consensus and critical mass in the medicine community?

Michael: The R Consortium has become the umbrella for the entire R community. Their approval lets the community know that this is the conference to go to if you are using R in Medicine.

We thank Michael for his time, and hope that, if you are in the medical community using R, you will look to attend this event.

R Consortium is soliciting your feedback on R package best practices

By | Blog, R Consortium Project

With over 12,000 R packages on CRAN alone, the choice of which package to use for a given task is challenging. While summary descriptions, documentation, download counts and word-of-mouth may help direct selection, a standard assessment of package quality can greatly help identify the suitability of a package for a given (non-)commercial need. Providing the R Community of package users an easily recognized “badge” indicating the level of quality achievement will make it easier for users to know the quality of a package along several dimensions. In addition, providing R package authors and maintainers a checklist of “best practices” can help guide package development and evolution, as well as help package users know what to look for in a package.

The R Consortium Infrastructure Steering Committee (ISC) is exploring the benefits of recommending that R package authors, contributors, and maintainers adopt the Linux Foundation (LF) Core Infrastructure Initiative (CII) Best Practices Badge. This badge provides a means for Free/Libre and Open Source Software (FLOSS) projects to highlight to what extent package authors follow best software practices, while enabling individuals and enterprises to assess quickly a package’s strengths and weaknesses across a range of dimensions. The CII Best Practices Badge Program is a voluntary self-certification program, available at no cost. An easy-to-use web application guides users through the process.

More information on the CII Best Practices Badging Program is available: the criteria (available on GitHub), project statistics, and criteria statistics. The projects page shows participating projects and supports queries (e.g., you can see projects that have a passing badge).

As a potential initiative for the R Community, we encourage community feedback on the CII for R packages. Also, consider going through the process for a package you authored or maintain. Your feedback will help us and the Linux Foundation evolve the CII to further benefit the needs of the R Community, and FLOSS projects in general. Please provide feedback using this survey.

On conduct and diversity in the R Community

By | Announcement, Blog, Events, News, R Consortium Project

An explicit goal of the R Consortium is to help create a welcoming space for everyone, no matter their race, ethnicity, gender, gender identity and expression, socio-economic status, nationality, citizenship, religion, sexual orientation, ability, or age. Diversity and inclusion are essential to foster true collaboration, move ideas forward, and create a long-term sustainable community.

R Consortium recently sponsored R/Finance 2018, where it was found that there were insufficient diversity and inclusion practices, including the absence of a prominently displayed Code of Conduct. This illuminated shortcomings with our existing processes for sponsoring conferences. We are troubled and disappointed to have sponsored a conference that does not reflect our core beliefs in diversity and inclusion.

The Infrastructure Steering Committee (ISC) has approved the creation of a new working group to address diversity and inclusion issues in the R community. The R Community Diversity and Inclusion Working Group (RCDI-WG), which will include members from R Community groups that promote diversity, such as R-Ladies and FORWARDS, event organizers, and key industry members, will focus on three areas:

  • Work with conference organizers to ensure diversity is addressed as a priority in both their program committees and speaker lineups.
  • Establish recommended Code of Conduct and Diversity Guidelines for R Community events, which will be adopted by the R Consortium and required for any event that the R Consortium participates in.
  • Have an ongoing conversation on opportunities to drive diversity and inclusion across the R Community.

This group is open to any member of the R community, and you can join by signing up for the mailing list. The group plans to have a kickoff meeting soon to work on the Code of Conduct and Diversity Guidelines, with the goal to have them established later in summer 2018. Look for updates on progress on the R Consortium blog.

Announcing the R Consortium ISC Funded Project grant recipients for Spring 2018

By | Announcement, Blog, News

The R Consortium supports the R Community through investments in sustainable infrastructure, community programs and collaborative projects. Through the Funded Project Program, now in its fourth year, the R Consortium has invested more than $650,000 USD in over 30 projects that impact the over 2 million R users worldwide.

We are pleased to announce the Spring 2018 grant recipients. We will provide updates on these projects throughout the year. Congratulations to all grant recipients! We look forward to our session at useR! 2018 this July, where many of our funded projects will showcase their work and share tips for leveraging the grant program to drive open collaboration.

Maintaining DBI

Grantee: Kirill Müller

DBI, R’s database interface, is a set of methods declared in the DBI R package. Communication with the database is implemented by DBI backends, packages that import DBI and implement its methods. A common interface is helpful for both users and backend implementers.

The Maintaining DBI Project, which follows up on two previous projects supported by the R Consortium, will provide ongoing maintenance and support for DBI, the DBItest test suite, and the three backends to open-source databases (RSQLite, RMariaDB and RPostgres).
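
To make the split between the DBI interface and its backends concrete, here is a minimal sketch, assuming the DBI and RSQLite packages are installed. The in-memory database and the use of R’s built-in mtcars data set are illustrative choices for this example, not something the project prescribes.

```r
# Minimal sketch: DBI generics called against one backend (RSQLite).
# Assumes the DBI and RSQLite packages are installed.
library(DBI)

# The backend supplies the driver object; ":memory:" is an illustrative
# choice that creates a temporary in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a database table.
dbWriteTable(con, "mtcars", mtcars)

# Send SQL and collect the result as a data frame.
res <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")
print(res)

# Release the connection when finished.
dbDisconnect(con)
```

With another backend such as RMariaDB or RPostgres, the calls after dbConnect() should look much the same; mainly the driver and connection arguments would differ, which is the benefit of a common interface.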

Ongoing infrastructural development for R on Windows and MacOS

Grantee: Jeroen Ooms

The majority of R users rely on precompiled installers and binary packages for Windows and MacOS that are made available through CRAN. This project seeks to improve and maintain tools for providing such binaries. On Windows we will upgrade the Rtools compiler toolchain, and provide up-to-date Windows builds for the many external C/C++ libraries used by CRAN packages. For MacOS we will expand the R-Hub homebrew-cran with formulas that are needed by CRAN packages but not available from upstream homebrew-core. Eventually, we want to lay the foundation for a reproducible build system that is low maintenance, automated as much as possible, and which could be used by CRAN and other R package repositories.

Developing Tools and Templates for Teaching Materials

Grantee: François Michonneau

The first-class implementation of literate programming in R is one of the reasons for its success. While the seamless integration of code and text made possible by Sweave, knitr, and R Markdown was designed for writing reproducible reports and documentation, it has also enabled the creation of teaching materials that combine text, code examples, exercises and solutions. However, while people creating lessons in R Markdown are familiar with R, they often do not have a background in education or UX design. Therefore, they must not only assemble curriculum, but also find a way to present the content effectively and accessibly to both learners and instructors. As the model of open source development is being adapted to the creation of open educational resources, the difficulty of sharing materials due to a lack of consistency in their construction hinders the collaborative development of these resources.

This project will develop an R package that will facilitate the development of consistent teaching resources. It will encourage the use of tools and lesson structure that support and improve learning. By providing the technical framework for developing quality teaching materials, we seek to encourage collaborative lesson development by letting authors focus on the content rather than the formatting, while providing a more consistent experience for the learners.

PSI application for collaboration to create online R package validation repository

Grantee: Lyn Taylor (on behalf of PSI AIMS SIG)

The documentation available for R packages currently varies widely. The Statisticians in the Pharmaceutical Industry (PSI) Application and Implementation of Methodologies in Statistics (AIMS) Special Interest Group (SIG) will collaborate with the R Consortium and representatives from pharmaceutical companies on setting up an online repository / web portal, where validation which is of regulatory standard for R packages can be submitted and stored for free use. Companies (or individual R users) would still be liable to make their own assessment on whether the validation is suitable for their own use; however, the online repository would serve as a portal for sharing existing regulatory standard validation documentation.

A unified platform for missing values methods and workflows

Grantees: Julie Josse and Nicholas Tierney

The objective is to create a reference platform on the theme of missing data management and to federate contributors. This platform will be an opportunity to list the existing packages, the available literature, and the tutorials that show how to analyze data with missing values. New work on the subject can be easily integrated, and we will create examples of analysis workflows with missing data. Anyone who would like to contribute to this exciting project can contact us.

histoRicalg — Preserving and Transferring Algorithmic Knowledge

Grantee: John C Nash

Many of the algorithms making up the numerical building-blocks of R were developed several decades ago, particularly in Fortran. Some were translated into C for use by R. Only a modest proportion of R users today are fluent in these languages, and many original authors are no longer active. Yet some of these codes may have bugs or need adjustment for new system capabilities. The histoRicalg project aims to document and test such codes that are still part of R, possibly creating all-R reference codes, ideally by teaming older and younger workers so knowledge can be shared for the future. Our initial task is to establish a Working Group on Algorithms Used in R and add material to a website/wiki.

Interested workers are invited to contact John Nash.

Proposal to Create an R Consortium Working Group Focused on US Census Data

Grantee: Ari Lamstein

The Proposal to Create an R Consortium Working Group Focused on US Census Data aims to make life easier for R programmers who work with data from the US Census Bureau. It will create a working group where R users working with census data can cooperate under the guidance of the Census Bureau. Additionally, it will publish a guide to working with Census data in R that aims to help R programmers a) select packages that meet their needs and b) navigate the various data sets that the Census Bureau publishes.

 

Wanted: Your input on the next generation of R-Hub

By | Announcement, Blog, R Consortium Project

R-Hub, which was originally conceived as a useful tool for R package developers to build and test R packages on a variety of platforms, was the first project funded by the R Consortium. The initial version was released in June 2016. Now that the capabilities of R-Hub have progressed well beyond the proof of concept stage, the R Consortium is looking for ideas from the R community on how we can make it even more useful for R users.

We would like to know how you think we could improve existing functionality and what new features you would like to see. So far, we have come up with the following list of future goals for R-Hub. We welcome comments and suggestions:

  • Enable organizations to deploy repositories and build infrastructure locally for use in controlled corporate environments.
  • Provide a system to manage source code, builds, and binary packages in a repository that offers confidence and trust to R users.
  • Enable end-users to use packages with confidence by providing tools to assess code pedigree, license, quality, security, and package maintenance for individual packages.
  • Encourage and enable package developers to provide metadata for their packages to help end users discover packages.
  • Provide package authors and maintainers a broad testing matrix that works on multiple architectures, operating systems, and R runtime engines.
  • Provide package developers with feedback required to assess and ensure broad compatibility for their packages.

We would very much appreciate comments on this vision for future development along with your assessment of the current system, including your answers to such questions as:

  • What value does R-Hub provide you today?
  • What does R-Hub not do well?
  • What other aspects of package development should R-Hub add?
  • How could R-Hub best serve the corporate package development, deployment, and management process?
  • Is there anything that CRAN isn’t providing that you would like to have?

Please send your comments to the following email address:  isc@r-consortium.com

Note that you may try R-Hub here.

What’s new with R Consortium funded projects in Q1 2018

By | Blog, R Consortium Project

In an effort to provide greater transparency with respect to R Consortium activities, the ISC is initiating a process to provide quarterly updates for all R Consortium funded projects. The following is our update for Q1 2018.

Quantities for R

The r-quantities project has reached the first milestone with the design and implementation of an initial working prototype, which can be downloaded and tested from GitHub. Further details about the integration process that was necessary for the units and errors packages, as well as the next steps, were published in r-spatial.

stars: Scalable, spatiotemporal tidy arrays for R

The last full update was in November 2017. Recent activity includes work on merging datasets. Check out the project progress on GitHub.

Interactive data manipulation in mapview

The project is waiting for Barret Schloerke and the RStudio team to complete updating the leaflet package to leafletjs 1.3.1, which will enable major updates to mapedit. Once this is done, the project will update mapedit accordingly and add new features in response to the leaflet update.

Refactoring and updating the SWIG R module

Planning documents are available on our website.

R User Group Support program

So far this year, the R User Group Support program has disbursed nearly $27,000 in grants to 60 user groups and 9 small conferences. The option to participate in the R Consortium’s meetup.com PRO account has proved to be a very popular benefit. 40 groups have elected to participate so far. You can keep up with the activities of these groups on our Meetup page.

The RUGs program will run through September 30th. Look here for details on how to participate.

Establishing DBI

The “Establishing DBI” project is about to be completed. Schema support in DBI is perhaps the most exciting news. Almost all packages have been updated on CRAN; only a few final technicalities need to be resolved. Expect a blog post soon on the new project page.

Joint profiling of native and R code

Unfortunately, pprof (and therefore also gprofiler) was not accepted by CRAN due to missing Go binaries on the build machines. Nevertheless, there has been some adoption by the community: for instance, one user was able to use joint profiling to understand a performance problem in the tidyselect package. Work on the project will resume soon. This will include adding support for OS X and adapting the packages so that they will be accepted by CRAN.

R Documentation Task Force

This project still needs help on implementing methods. To join, send an email to Andrew dot Redd at hsc dot utah dot edu, expressing your interests, skills or expertise as they relate to R documentation. Also email if you have ideas or concerns but do not wish to play an active role.

Conference Management System for R Consortium Supported Conferences

The project has completed a thorough evaluation of different open source solutions for managing R conferences, and is now compiling a report to facilitate next steps.

SatRdays

The second SatRday conference was recently held in Cape Town. New conferences are being scheduled.

R-Ladies

R-Ladies expanded by 20 new groups (7 in the US, 4 in Latin America, 4 in Europe, 4 in Africa and 1 in Asia) in the first quarter of 2018, increasing to more than 90 R-Ladies chapters worldwide.

 

Package Licensing: Would the R Community like some help? Feedback from the trenches

By | Blog, R Consortium Project

Editor’s Note: This post comes from Mark Hornick, who leads the Code Coverage WG and serves on the Board of Directors

In the Fall of 2017, the R Consortium surveyed the R Community to understand opportunities, concerns, and issues facing the community. Taking into account that feedback, the R Consortium recently surveyed package authors and maintainers on a number of topics surrounding R package licensing. Questions revolved around motivations for choice of license, comfort level in understanding license meaning and implications, importance of corporate adoption of R, and whether guidance on licensing from the R Consortium would be valuable.

While a significant number of people in the R community report that they understand and intentionally choose the license(s) they apply to their package software, a much larger group is unclear about which license to choose and what the implications of that choice are. These implications affect not only the individual package, but also the R Community and the corporate, government, and academic users of those packages. Of roughly 7400 invitations to complete the survey, the R Consortium received more than 1100 responses – a response rate over 14%.

In this blog post, we summarize that feedback and offer next steps that the R Consortium and R Community may take based on this feedback.

Who responded to the survey

Of respondents, 42% are relatively new to R package development with 3 or fewer years of experience, 31% have 4-6 years, and 27% have more than 7 years of experience. As the following table shows, the majority of package authors have been working with R for less than 6 years and have written up to 5 packages.

The largest subgroup of responders (44%) have produced one package over their career. However, 39% of responders have not pushed a package to CRAN over the past year.

The most popular license used among respondents is ‘GPL-3’ at 35% with ‘GPL-2 | GPL-3’ a close second at 34%, ‘GPL-2’ next at 24%, and ‘MIT’ at 21%. However, there is a mix of other licenses cited, including LGPL, BSD, Apache2, and Creative Commons, among others.

What do I want others to be able to do with my package?

When it comes to open source software, there are many ways to think about how software could be used. For example, you may want everyone to be able to freely use your software by its API, but have concerns about what happens if the underlying code is modified – derivative works. On the other hand, you may want to impose licensing requirements on the software that uses your software as well, e.g., software that uses my package must be licensed in the same way as my package. The license choice can significantly affect how and whether a given package can be used in corporate, academic, or government environments.

From the survey, 60% of respondents want other software developers to be able to use their package(s) without imposing license requirements on the software that uses their package (via API), with only 15% disagreeing.

The majority of respondents were neutral as to whether they wanted to ensure that software using their package(s) must apply the same license that they chose, with 29% agreeing and 19% disagreeing.

As expected, respondents want to ensure that derivative works of their package(s) remain open source, with 74% agreeing. However, only 25% agree that derivative works should require the same license as the package used.

How do you choose a license for your package?

In the survey, we asked which factors contribute to the choice of package license.  Sixteen percent of respondents indicated license choice defaulted to the license of dependent packages, whether used exclusively through their API or if they borrowed code or header definitions. A sizeable 65% indicate that it is a conscious choice based on their understanding of open source and other license terms. But this is tempered by responses described in the next section regarding comfort level with understanding open source licenses.

The open comments section for this question revealed more details, e.g., some respondents consult websites, blogs, and books for license recommendations, or get advice from package reviewers. Some respondents admit they haven’t thought deeply about the choice of license and don’t understand the differences between licenses since the choices and legalese can be overwhelming. Some use what other respected package authors have used (without necessarily understanding why a given license was chosen for such a package) or as determined by corporate or government dictates or requirements. Yet other respondents indicated making an arbitrary or random choice since R package submission requires that some license must be chosen.

The open comments also highlighted some potential misconceptions, such as the belief that a package author who chooses GPL-2 for their package is unable to change it to a more permissive license later. The ability to change a license depends on multiple factors, e.g., licenses of dependent packages or lifted code, whether all authors give their consent, etc. Some respondents state they want licensing that enables more users of their code rather than fewer. Others see GPL as a way to ensure commercial usage of their packages occurs fairly. Some respondents choose BSD as it provides the most freedom to package users.

 

Open Source License knowledge

For the R Consortium to understand whether resources should be applied to the problem of licensing, we asked package developers about the level of understanding of open source licenses. While 12% outright stated they do not feel comfortable interpreting or applying open source licenses, 62% find license details and options confusing – even if they understand the basic premise of open source licenses.

Only 23% felt confident in choosing the right open source license(s) for their packages, while about 1% claim to have access to Legal Counsel to guide their choice of open source licenses. Another 1% claim to have sufficient legal background to choose the appropriate license(s) for their packages.

While licensing is important when trying to use software in corporate settings, only 24% of respondents consider the license of an R package important in determining whether or not they use it – 35% are neutral and 40% don’t think it’s important.

A majority (56%) of respondents believe that corporate adoption of R technology (engine and packages) is important for the R Community – 36% are neutral while 8% feel corporate adoption is not important. Consistent with this, 56% of respondents feel the R ecosystem should make corporate use of R easy – 37% are neutral and 6% disagree.

Tools and Guidance

As open source communities and technology continue to evolve, there are more tools available to assist with license choice. For example, code scanning tools exist in other open source communities to identify potential licensing issues. While following the advice of such tools is optional, most if not all developers want to “do the right thing” with respect to licensing. Testament to this is that over 71% of respondents indicated they would welcome the availability of a license scanning tool to flag package license issues – only 3% disagreed.

With the objective of enabling package developers to make an informed choice of license, respondents were asked if they would like the R Consortium to provide guidance on open source license choices and implications. Over 89% indicated they would. One respondent put it best: “I want whatever is best for making sure the CRAN community thrives in the long-term.” This is the intent of the R Consortium as well.

The R Consortium thanks the respondents to this survey for taking the time to share their experience, concerns, and needs. As a next step, the R Consortium will work with the R Community to provide best practices for good “license hygiene.” If you would like to be part of this activity, please reach out to the R Consortium by responding to this post.

R Consortium welcomes R-Ladies as a top level project

By | Announcement, Blog, News, R Consortium Project

In 2016, R-Ladies started their effort for a global expansion with help from the R Consortium. Back then there were only 4 active chapters (San Francisco, Taipei, Twin Cities and London) and the goal was to expand to 5-10 cities within the next year. The enthusiasm within the R community for local R-Ladies chapters far exceeded any possible expectations! As of March 2018, the organization has over 90 chapters and almost 19,000 members.

There are R-Ladies chapters in 45 countries around the globe, with many chapters hosting monthly events.  

With this fast growth, it became apparent to the R Consortium ISC that this project needs long-term investment for success. The diversified voice of R-Ladies, speaking not only as a group representing gender minorities in tech but also as a group attracting new R users, aligned with the R Consortium’s defined Code of Conduct and its desire to build a more diverse and inclusive R community.

R Consortium ISC is pleased to announce and welcome R-Ladies as a top level project. R-Ladies has shown a big commitment within the R community and becoming a top level project will provide them a longer term budget cycle (3 years instead of 1 year) to support their community. R-Ladies will also have a voting seat on the ISC (represented by Gabriela de Queiroz).

We invite all in the R community to congratulate R-Ladies on this milestone, and look forward to ensuring they have the infrastructure and funding to bring more diversity to the R community.