Category

Blog

R-users & Community: give us your feedback on an R Certification to teach & verify skilled R Professionals

By Announcement, Blog

In the past few years, we have seen an increase in the demand for R – both from employers looking for skilled R-users and professionals looking to further improve their skills. Due to this supply and demand gap, there have been various teaching channels created in an attempt to extend knowledge of the language. Even with the abundance of R teaching material, we still face a dearth of qualified, skilled R users. The inability to differentiate self-taught data scientists from qualified personnel creates confusion for employers and difficulties for quality professionals to separate themselves from the rest.

R Consortium started a working group that has identified an absence of a system to certify qualified R professionals as a cause for this problem. As a response to this, the group is working to create a certification for R that will allow professionals and students to acquire fundamental skills and knowledge of the language. Creation of this certification also aims to help recruiters identify and assess the skills of potential recruits. This group will be driven by the needs of the current R professionals and data science recruiters. More information about this initiative can be found here.

For this working group to create a valuable certification, we encourage community feedback on this initiative. Your feedback will help the working group evolve this certification to best serve the needs of the R community. Please respond to this survey to help in the creation of this certification.

Progress on driving diversity standards in the R community – an update on RCDI-WG

By Blog, R Consortium Project

R Consortium announced four months ago the formation of the R Community Diversity and Inclusion Working Group (RCDI-WG), a focal point to address diversity and inclusion issues in the R community. It’s been great to see the interest and participation from the community, with over 50 individuals representing working groups, events, meetups, and projects coming together to drive forward sustainable diversity standards and practices.

The group has begun working on the initial deliverables outlined.

Code of Conduct

One of the working group’s key deliverables was a sample code of conduct that organizers can use for their conferences, meetups, and other events. We consulted the Ada Initiative’s and Geek Feminism’s guides for codes of conduct, and based our work on the existing R-Ladies code of conduct, which itself is based on Geek Feminism’s code of conduct.

We encourage all of you to review the proposed code of conduct and provide feedback or comments via pull request or an email to the mailing list.

Speaker Diversity working group

Conferences are at their best when everyone participates. The Speaker Diversity group focuses on collecting and disseminating tips that can help conference and event organizers increase the diversity of their speaker lineup through conscious recruitment and retention. Through curated articles, conversations with event organizers and participants, and our collective experiences, we aim to tease out strategies that have worked in the past and highlight behaviors that can lead organizers away from achieving their speaker diversity goals.

Recently the group drafted an initial version of an article outlining five major areas that organizers can focus on to help increase the diversity of their event’s speaker lineup. In short, M.A.P.S.S. (Mission, Advertise, Pipeline, Selection, Sharing) outlines strategies that can be taken, at each phase of planning an event, to help achieve a diverse and inclusive speaker lineup. If you are interested in exploring how to increase speaker diversity at conferences and events, join in the conversation on the email list or follow our progress in the GitHub repository.

Conference best practices checklist

One challenge conference organizers highlighted is that, even when a conference is organized in good faith to drive diversity and inclusivity, some items are often forgotten or overlooked.

This inspired the workstream for driving the Conference best practices checklist. The aim of this checklist is for it to be a non-exhaustive but illustrative reference system for organisers of conferences (and even events). Conference organisers can use this to help them build out a tasklist, stimulate conversation within their organising team, or publish their alignment with it to help speakers, attendees, and sponsors make an informed decision about whether to attend.

Please review the list and send us suggestions for improving it via pull request or email.

 

It’s great to see progress on bringing the greater R community together to create a welcoming space for everyone, no matter their race, ethnicity, gender, gender identity and expression, socio-economic status, nationality, citizenship, religion, sexual orientation, ability, or age. We welcome everyone to review the meetings and deliverables, and to join the mailing list to further drive the conversation.

What’s new with R Consortium funded projects in Q3 2018

By Blog, R Consortium Project

In an effort to provide greater transparency with respect to R Consortium activities, the ISC provides quarterly updates for all R Consortium funded projects. The following is our update for Q3 2018.

R-hub

Most of the work on R-hub was maintenance, i.e. updating software, fixing failures, updating SSL certificates.

R-hub from January to June 2018, in numbers:

  • 18 platforms, including Windows, Linux, MacOS and Solaris builders
  • 9098 builds
  • 843 different packages
  • 450 users

Detailed work and issue tracking can be found here and here.

PSI application for collaboration to create online R package validation repository

The AIMS SIG is planning to create an online repository for R package validation which could be used when R is used for pharmaceutical industry regulatory analysis. Since the main hurdle for widespread use of R in late phase trials is ensuring adequate validation documentation, AIMS is designing a framework which will specify a set of requirements, including metadata and examples of tests, which together would form evidence of the quality of an R package. Initially we will use dplyr as an example, and will make this “evidence of validation” available to the wider community on GitHub.

Whether the evidence provided is sufficient will be the decision of the end user, but it can be a starting point for further testing or may be sufficient in itself depending on the user’s attitude to risk. After review by our peers, we will be calling on all R users to submit similar evidence of validation for other packages. By sharing this evidence, we hope to reduce the amount of re-work being done by multiple companies eager to use R, but fearful of doing so in a regulatory environment without documentation of validation.

The AIMS SIG presented at the 2018 PSI Conference. The presentation, titled “The Future is heRe”, aimed to demonstrate ways in which R can be useful in the pharmaceutical industry and to promote our plans for the online repository for the documentation of R validation.

AIMS also presented and held an R-validation brainstorming workshop at the R/Pharma Conference in August 2018. Volunteers from this meeting and the wider community will form a sub-group who will work towards the creation of the R-validation online repository. For the latest details on the project please look here. If you are interested in contributing to this project please contact: Lyn Taylor at taylorlyn@prahs.com.

Ongoing infrastructural development for R on Windows and MacOS

Follow the project here.

Quantities for R

The r-quantities project has been completed. The four milestones of the project were published in the following blog posts:

1. A first working prototype
2. Support for units and errors parsing
3. An analysis of data wrangling operations with quantities
4. Prospects on fitting linear models with quantities

Along the way, there have been multiple exciting improvements in both the ‘units’ and ‘errors’ packages to support all these features and make ‘quantities’ possible; the package is now ready for an imminent CRAN release. This project ends, but the r-quantities GitHub organization will continue to thrive and to provide the best tools for quantity calculus to the R community.

histoRicalg — Preserving and Transferring Algorithmic Knowledge

histoRicalg has generated some interest and activity in its first few months, with a working group established and some activities begun concerning older algorithms that R may or may not be using. Our GitLab repository, which now has a variety of materials, some of them codified into vignettes, is here. Our mailing list has grown, and new members are welcome to join it.

R User Group Support program

The 2018 R Consortium R User Group Support Program wrapped up for the year on September 30th. This year we made 3 Array level grants, 17 Matrix level grants and 96 Vector level grants, for a total of 116 RUGs funded. 78 groups elected to participate in the meetup.com program, in which the R Consortium pays their meetup.com dues. Additionally, we funded 14 small conferences. In total, the RUGs program awarded more than $50,000 in grants.

stars: Scalable, spatiotemporal tidy arrays for R

Follow the project here.

Forwards Workshops for Women and Girls

A package development workshop for women was held at eRum 2018. Thanks to the R Consortium funding, travel scholarships were provided for 8 women that attended. The course used material developed for the first workshop run in Auckland last year and these materials are now available on GitHub under a CC-BY-NC 3.0 license. A short (1.5-2 hours) version was also developed and run at Cardiff satRday, with accompanying RStudio cloud instance (details in the materials on the GitHub site). This opens the possibility of running the workshops with minimal set-up at R-Ladies or other user groups.

The next workshop planned is for high school girls in NYC, to be held on Saturday 27 October. Further details are on the Education page of the Forwards website.

Refactoring and updating the SWIG R module

Two significant PRs have been sent to swig maintainers, details of which are discussed in the report pages below. These PRs address:

  • Rewriting the enumeration wrapping framework in the R module, allowing arbitrarily complex definition of underlying integer values
  • Rewriting/refactoring of the accessor wrapping framework for the R module, which eliminates the use of logic driven by function names from this part of the module

The latter eliminates bugs in which some accessors were incorrectly defined due to errors in the string logic, and others were incorrectly ignored due to names matching the patterns expected of automatically generated bindings. Look here for details.

Developing Tools and Templates for Teaching Materials

We have set up a GitHub organization that includes a repository to gather feedback and ideas and a website to provide updates about the project. We started a small R package to extract, manipulate and modify the content of yml headers.
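To illustrate what extracting and modifying a header involves, here is a minimal sketch in Python. It is not the project's R package, and the function names `split_front_matter`, `parse_flat_header` and `set_field` are hypothetical. It assumes the header is fenced by `---` lines at the top of the file and only handles flat `key: value` pairs.

```python
def split_front_matter(text):
    """Return (header_lines, body) for text that starts with a '---' fenced header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        # '---' or '...' closes the header block.
        if lines[i].strip() in ("---", "..."):
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # no closing fence: treat the file as having no header


def parse_flat_header(header_lines):
    """Parse top-level 'key: value' pairs only; nested YAML is not handled."""
    fields = {}
    for line in header_lines:
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def set_field(text, key, value):
    """Replace a top-level field, or append it, leaving other header lines as-is."""
    header, body = split_front_matter(text)
    new_line = f"{key}: {value}"
    updated, found = [], False
    for line in header:
        is_top_level = not line.startswith((" ", "\t"))
        if not found and is_top_level and line.split(":", 1)[0].strip() == key:
            updated.append(new_line)
            found = True
        else:
            updated.append(line)
    if not found:
        updated.append(new_line)
    return "---\n" + "\n".join(updated) + "\n---\n" + body


doc = "---\ntitle: Intro to R\nauthor: A. Teacher\n---\n# Lesson 1\n"
fields = parse_flat_header(split_front_matter(doc)[0])
revised = set_field(doc, "title", "Intro to R, 2nd edition")
```

A real tool would rely on a full YAML parser so that lists and nested fields survive a round trip; the flat parsing here is only meant to show the extract and modify steps.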

Joint profiling of native and R code

No new updates since Q2. First thing to do will be to improve documentation and perhaps record a screencast, to lower the entry barrier.

Maintaining DBI

Kirill Müller presented DBI in Berlin, Amsterdam, and Zurich. Here are the slides. The DBI package now has a CII badge. Stay tuned for the upcoming blog posts, the first in early October. All posts will be available here.

R Documentation Task Force

The Beta versions will be released soon.

Sat R Days

satRdays (satrdays.org) continue to grow and flourish. In the next three months, we’re having our first conferences in the US (dc2018.satrdays.org) and South America (santiago2018.satrdays.org) as well as another conference in the European region (belgrade2018.satrdays.org).

We’re discussing more events in Africa and growing the number of events per region next year. We’re hoping for at least 20 satRdays to run around the world.

Folks can now register interest on our events calendar and read about running an event on our knowledgebase (knowledgebase.satrdays.org).

We’re looking forward to supporting people across the world in sharing their R knowledge via accessible and diverse events.

Conference Management System for R Consortium Supported Conferences

We have completed and supported the deployment of a satRdays conference template that is open source and forkable for other events. It allows the integration of common ticketing platforms and call for paper solutions. The template is here.

Our next steps are about building a similar solution that is UseR! branded and incorporates sciencesconf to make it easier for academic events to spin up quickly too.

R-Ladies

Growth: R-Ladies growth is exceeding all expectations. The goal for 2018 was to reach 65 chapters, and we are pleased to announce that, as of October 2018, we have already doubled that number: there are currently 131 chapters on meetup with ~31,500 R-Ladies members (signed up on meetup), distributed across 41 countries and 131 cities on 6 continents. Additionally, R-Ladies Remote, launched last quarter, is very active in organising virtual events for all the R-Ladies far from any chapter.

Improving infrastructure: The majority of R-Ladies chapters have been migrated to our meetup pro account, and a few more are scheduled for next quarter. This will help simplify the reimbursement of chapter expenses.

R-Ladies is now a non-profit organization incorporated in California (USA) – application for tax exempt status is in progress.

An Earth data processing backend for testing and evaluating stars

Follow the project here.

Fall 2018: ISC Call for Proposals

By Announcement, Blog

by Joseph Rickert

The second and final ISC Call for Proposals for 2018 is now open. We are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community. We are deliberately being a little vague here, but having awarded more than $650,000 in grants so far, we can show a substantial number of funded projects that provide examples.

If you are going to submit a proposal, “Think Big” but structure your proposal with intermediate milestones. The ISC is not likely to fund proposals that ask for large initial cash grants. We tend to be conservative with initial grants, preferring projects structured in such a way that significant initial milestones can be achieved with modest amounts of cash.

As with any proposed project, the more detailed and credible the project plan, and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes measurable objectives, intermediate milestones, a list of all team members who will be contributing work and a detailed accounting of how the grant money will be spent.

Also, if you think you are onto something but could use some help in finalizing the scope of a project, or you think implementing your idea would require achieving some level of consensus within the R Community, you might consider asking the ISC to help you establish a working group.

If you don’t think you have an idea that is fundable but want to get involved, you might want to explore getting involved with existing projects or put some thought into one of the perennial issues associated with finding one’s way through the R ecosystem. For example, could you build a package discovery system or recommender engine that spans CRAN, Bioconductor and GitHub, or implement and curate a calendar that automatically tracks R related events worldwide?

Our goal in calling for proposals is to stimulate creativity and help turn good ideas into tangible benefits for the R Community. What can you do to improve the R ecosystem and how can the R Consortium help you do it?

Note that proposals to sponsor conferences, workshops or meetups should be sent directly to the R Consortium’s R User Group and Small Conference Support Program. These are not funded as ISC proposals. Note that the deadline for applying for support under the 2018 program is coming up quickly. Requests for support under the 2018 program must be received by midnight, September 30, 2018. The 2019 program will launch sometime in January.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained pdf using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, Sunday October 31, 2018.

CII Best Practices Badge for R Packages – responding to concerns

By Blog, R Consortium Project

Our last post Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results summarized results from the CII Best Practices survey conducted this summer. A goal of the CII Best Practices program is to help improve open source software quality. Respondents shared several concerns to which David Wheeler, project lead for the Core Infrastructure Initiative (CII) with the Linux Foundation, and I wanted to respond.

Let’s dive in…

Concern #1: Does the CII Badge have the “correct” or “best” set of criteria?

The CII Badge criteria are the best general-purpose OSS project criteria that we, the OSS community, have developed to date.  The CII Badge criteria were developed based on the experience and recommendations of many experts, previous criteria developed by various organizations, and the examination of real-world successful OSS projects.  No doubt the criteria could be improved further, but the badge criteria are themselves open source and can be improved using the same process as any OSS code: simply propose changes for review!

Concern #2: Achieving a badge does not necessarily mean a given package is well-designed or well-implemented.

Many of the CII criteria can help push projects towards creating better- or well-designed and implemented packages.  In the “passing” level, the CII criteria include these requirements:

  • [warnings] criterion requires enabling compiler warning flags or similar
  • [static_analysis] requires the use of at least one static code analysis tool (assuming one exists)
  • [test] requires a test suite, which often nudges people towards better design and implementation
  • [test_policy] requires that you keep adding tests, especially as new functionality comes online
  • [know_secure_design] requires that at least one primary developer know how to design secure software. The best practices site explains what this means under the “details” of this criterion.  In summary, this criterion requires that at least one primary developer understands the 8 principles of Saltzer and Schroeder (as explained by the CII Best Practices site) and also knows to (1) limit the attack surface and (2) perform input validation with whitelists.  Software can be badly designed by knowledgeable people, but software is much more likely to be designed and implemented well if developers know the basics.
  • [know_common_errors] requires that at least one of the project’s primary developers must know of common kinds of errors that lead to vulnerabilities and at least one method to counter or mitigate each of them
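As an illustration of what “input validation with whitelists” means in practice, here is a minimal sketch in Python. The scenario, a build request naming a package and a platform, and the names `ALLOWED_PLATFORMS`, `PACKAGE_NAME` and `validate_build_request` are hypothetical and do not come from the CII criteria. The idea is to accept only input that matches a known-good set or pattern, instead of trying to enumerate bad input.

```python
import re

# Hypothetical fixed set of acceptable platform identifiers.
ALLOWED_PLATFORMS = {"windows", "linux", "macos", "solaris"}

# Assumed pattern for a package name: letters, digits and dots only,
# starting with a letter, not ending with a dot, at least two characters.
PACKAGE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]")


def validate_build_request(package, platform):
    """Return the inputs unchanged if both are on the allowlist, else raise."""
    if not PACKAGE_NAME.fullmatch(package):
        raise ValueError(f"rejected package name: {package!r}")
    if platform not in ALLOWED_PLATFORMS:
        raise ValueError(f"rejected platform: {platform!r}")
    return package, platform


accepted = validate_build_request("dplyr", "linux")
```

Anything not explicitly allowed is rejected, so a path-like value such as `../etc/passwd` fails the pattern check without needing a special rule for it.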

Higher badge levels (“silver” and “gold”) offer even more.

It’s true that a badge doesn’t guarantee that a package is well-designed or implemented by some measure, but part of the problem is that it’s difficult to unambiguously determine if something is well-designed or well-implemented.  Much depends on the purpose of the package!  So instead, many criteria focus on enabling mass peer review and managing improvements, so that problems are more likely to be detected and corrected.

In short, software normally undergoes change over time.  Instead of requiring that a project be perfect at one point in time, we focus on criteria that will help projects continuously improve over the long run.

Concern #3: How does the CII help to ensure the validity of self-certification, e.g., through automated tools?

We use automated tools and reject some answers that are clearly false.  We require that replies be public and that there be URLs for some answers; that makes it easy for anyone to check answers.  In the worst case, we can override false answers, though in practice we’ve almost never found that necessary.

Concern #4: Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.

Finding a desired package or the “best” one for a given task is largely orthogonal to improving package quality, though the two can be related. The badging process can help, because one of the criteria is “The project website MUST succinctly describe what the software does (what problem does it solve?)”.  Search engines are much more effective at finding relevant packages once that kind of information is available. Alternatively, if using packages that state adherence to the CII criteria is important to you or your organization, the search space may be significantly reduced – at least as a starting point.

Concern #5: Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?

Our current primary approach for streamlining is to automate criteria.  That said, if you have a specific idea for streamlining things further, please file an issue on GitHub here.

Concern #6: Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?

We already use automated tools to assist in completing the form.  We’d rather not require people to install tools to fill in information, because that would be a barrier for some.  If there are tools we aren’t using and should use, let us know!

Concern #7: A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.

We’ve worked hard to make the badge “passing” criteria doable for single-person projects.  Daniel Stenberg is the author and maintainer of cURL and libcurl, and he’s been especially influential in ensuring that the “passing” badge is doable for single-person projects.  If you have no tests, cannot automatically build your software (even though it requires building), or have never run a static analysis tool of any kind, then there is some work… but it’s better for users if these are addressed.

The top “gold” level requires multiple people in a project, e.g., because the project MUST have a “bus factor” of 2 or more.  That can be a challenge for developers, but it’s a big advantage for users – users would much rather depend on software where a single death doesn’t suddenly mean that there’s no one to update the software.  No one is required to get the gold level, however, and there are many ways to resolve this.

Concern #8: Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.

We’ve done our best to minimize the risk from additional burdens.  We automate some answers, and that helps.  We reduce the risk of duplicated evaluation processes by having a single set of criteria for all OSS.  Perhaps most importantly: the criteria were developed by examining real-world successful projects, so they require actions that other projects are already doing and finding helpful.

Perhaps more importantly, keep in mind that getting a CII best practices badge is optional – a package author can decide if the benefits of adhering to the CII criteria outweigh the costs.

Concern #9: Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Sure.  Naming conventions for tests are a common way to distinguish types of tests; you can also put different kinds of tests in different directories.  From the badge perspective, we don’t focus on that distinction. For “passing” the key is that your project must have a general policy that as major new functionality is added to the software produced by the project, tests of that functionality should be added to an automated test suite.  Passing doesn’t require a perfect test suite; instead, we require that you have a test suite and that you’re committed to improving it. Since OSS is visible to the user community, a potential user may want to examine the type and quality of tests performed.  The higher-level badges do require better test suites, as you might expect.
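The naming-convention approach can be sketched as follows, using Python's standard `unittest` module for illustration. The prefixes `test_numeric_` and `test_statistical_`, the class `MeanTests` and the helper `run_tests_with_prefix` are hypothetical choices, not something the badge criteria prescribe.

```python
import io
import math
import statistics
import unittest


class MeanTests(unittest.TestCase):
    # Numerical checks: does the computation return the expected number?
    def test_numeric_mean_matches_hand_computation(self):
        self.assertTrue(math.isclose(statistics.mean([1.0, 2.0, 3.0, 4.0]), 2.5))

    def test_numeric_mean_with_large_offset(self):
        data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
        self.assertTrue(math.isclose(statistics.mean(data), 1e9 + 2))

    # Statistical property: shifting the data shifts the mean by the same amount.
    def test_statistical_mean_is_shift_equivariant(self):
        data = [2.0, 4.0, 9.0]
        shifted = [x + 10 for x in data]
        self.assertTrue(
            math.isclose(statistics.mean(shifted), statistics.mean(data) + 10)
        )


def run_tests_with_prefix(prefix):
    """Run only the test methods whose names start with the given prefix."""
    loader = unittest.TestLoader()
    loader.testMethodPrefix = prefix
    suite = loader.loadTestsFromTestCase(MeanTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


numeric_result = run_tests_with_prefix("test_numeric_")
statistical_result = run_tests_with_prefix("test_statistical_")
```

Selecting by prefix lets a project run the fast numerical checks on every change and the statistical-property checks separately, while keeping both in one automated suite.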

 

We continue to receive valuable comments through the survey and are pleased to report that more R package authors are choosing to participate in the CII as evidenced by the surge in new R CII project entries.

What’s new with R Consortium funded projects in Q2 2018

By Blog, R Consortium Project

In an effort to provide greater transparency with respect to R Consortium activities, the ISC provides quarterly updates for all R Consortium funded projects. The following is our update for Q2 2018.

histoRicalg — Preserving and Transferring Algorithmic Knowledge

The HistoRicalg project is seeking participants — both active and in a review capacity — to help select issues in older algorithms that should be addressed.
We are setting up a working group to identify possible issues in older algorithms.
Some concerns have already been identified and we are starting to address them. See wiki for details.

Forwards Workshops for Women and Girls

foRwards is pleased to announce upcoming R workshops in Melbourne and Auckland. We thank the R Consortium for funding. See the GitHub page for details.

Code Coverage Tool for R

While the software development goals of this project have been achieved through the covr package re-released in summer 2017, we continue to make progress on the secondary goal to integrate package best practices into the R Community. We pursued this along two threads.

First, we conducted a survey on the understanding and use of open source licenses and their implications for the R Community. We blogged about the results here.

Second, we reviewed the Linux Foundation Core Infrastructure Initiative Best Practices Badge Program. Initially, we considered branching a version of the CII tailored to R; however, after further discussions, it appears the CII Best Practices Badge Program can be adopted as is. We are currently conducting a survey of the R Community soliciting feedback on having the CII Best Practices Badge Program be a recommended practice from the R Consortium, as well as to identify any necessary enhancements to the questionnaire.

Look here for refactoring and updating the SWIG R module

R User Group Support program

The RUGs program continues to enroll additional user groups. As of June 14th, 87 groups are participating in the program: 2 Array level, 14 Matrix level and 71 Vector level. Additionally, we have sponsored 10 small conferences since the beginning of the program. The RUGs program will run through September 30, 2018.

stars: Scalable, spatiotemporal tidy arrays for R

The stars project is underway. Look here for details and to get involved and here for examples and reports.

Sat R Days

satRdays is growing amazingly well!

We’ve got 8 events coming up in the next year, with plans to add more events too. At satRdays we’re baking in a commitment to diversity and it’s going amazingly well. The most recent event in Cardiff, UK had 11 of 14 speakers coming from under-represented groups.

We’re looking for more folks who want to organise satRdays events, particularly outside of the Europe region. If anyone is interested, you can read our growing docs, including what it’s all about, at knowledgebase.satrdays.org and chat to us on the global R User Group leaders Slack (bit.ly/ruglslack). You can also come on board to help with central stuff like the website, building up the documentation, marketing, and supporting new event organisers.

One of the next big areas for us to think about is the next wave of central funding from the Consortium and whether we should do anything to have an official entity for the central administration associated with these conferences.

Conference Management System for R Consortium Supported Conferences

After performing an extensive review, Odoo was identified as the most feature-rich platform for hosting events; however, there were limitations that particularly impacted academic-oriented events.

We discussed the option of extending Odoo as it was based on Python, but it was felt that trying to make a one size fits all solution was not the best approach.

The alternative we discussed and will now move forward with is a flexible and minimal solution in Hugo, the basis for blogdown.

Our revised proposal will see rapid development and iteration using satRdays as the test subject. This will mean other R events can leverage the solutions developed for satRdays, and the technology will be proven by the next UseR! iteration.

Quantities for R

The r-quantities project has reached the third milestone. The first prototype has been polished and aligned with recent developments in the units package. Efficient parsers have been implemented to read data with units and/or errors into quantities objects. The documentation has been extended to provide a comprehensive guide on working with quantities in two common data wrangling workflows. Further details about these developments can be found in the three articles published so far in r-spatial.org.

Proposal to Create an R Consortium Working Group Focused on US Census Data

The working group held its first meeting on August 8th. If you are interested in getting involved, write to us at rconsortium-isc@lists.r-consortium.org

Ongoing infrastructural development for R on Windows and MacOS

Work continues on the development of the new version of Rtools and the rebuilding of C++ libraries used by packages. We are now in the process of testing base-R and all CRAN packages with the new GCC 8.1 toolchain.

Developing Tools and Templates for Teaching Materials

We are in the planning phase at this stage. We’ll soon set up a GitHub repository and website for more visibility, to share planned features and progress, and to give the community an opportunity to provide feedback.

Joint profiling of native and R code

OS X support has been added and the main package has been renamed to jointprof to avoid a name clash with an existing package. Try it out, happy to take your feedback!

Maintaining DBI

The third DBI project is focused on technical and non-technical issues. We would like to present DBI at R meetups in Zurich and Berlin, and we have submitted a talk for the next satRday in Amsterdam. The renaming of duplicate columns in the output introduced in DBI 1.0.0 caused problems for RSQLite and will be reverted. The sqlr package by Nicolas Bennett aims at providing a backend-agnostic way to define the structure of a database, i.e., generate DML statements from R code similarly to SQLAlchemy for Python.

R-Ladies

Growth: The growth we saw at the start of 2018 has continued, with now 25,000 R-Ladies (members signed up on meetup). 17 new groups were added this quarter (5 in the US, 1 in Canada, 4 in Latin America, 3 in Europe, 2 in Australia, 1 in Asia and 1 Remote), bringing the total to more than 115 R-Ladies chapters worldwide (on meetup.com). Additionally, a new R-Ladies Remote group was launched to allow R-Ladies far from a chapter/city to be involved in an R-Ladies group.
Improving infrastructure: We are moving to the R-Ladies global meetup Pro account to help align chapter expenses, and new initiatives for the Slack community are in development.
Long term planning: Progress has been made on the charity set-up.
We are supporting the R Consortium and RStudio, along with R-Forward, to improve conference organisation and diversity requirements so that all future R conferences are inclusive.

PSI application for collaboration to create online R package validation repository

The PSI AIMS SIG will lead the creation of an online repository / web portal where regulatory-standard validation of R packages can be submitted and stored for free use. We will define a set of “Validation Criteria”, demonstrate it by applying it to the dplyr package, and then encourage R users to document validation of other packages and upload it to the shared free-access portal.

In June 2018, we attended the PSI Conference in Amsterdam to promote the idea and make contacts with potential future collaborators. Our next step will be to continue work on the Validation Criteria Framework, engaging key opinion leaders at the R/Pharma conference in August. If you have experience in R validation and are interested in working with us on this project, please contact taylorlyn@prahs.com.

R Documentation Task Force

Beta packages with limited functionality are being prepared for release.

 

Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results

By Blog, R Consortium Project

Based on our Fall 2017 survey, in which the R Consortium asked about opportunities, concerns, and issues facing the R community, the R Consortium conducted a new survey this past month to solicit feedback on using the Linux Foundation (LF) Core Infrastructure Initiative (CII) Best Practices Badge Program for R packages. The R Consortium will base its recommendation for using the CII on your feedback. Your feedback will also help us and the Linux Foundation evolve the CII with the needs of the R Community, and FLOSS projects in general, in mind.

Introduction

With over 12,000 R packages on CRAN alone, the choice of which package to use for a given task is challenging. While summary descriptions, documentation, download counts and word-of-mouth may help direct selection, a standard assessment of package quality can greatly help identify the suitability of a package for a given need – commercial, academic, or otherwise. Providing the R Community of package users an easily recognized badge indicating the level of quality achievement would make it easier for users to know the quality of a package along several dimensions. In addition, providing R package authors and maintainers a checklist of “best practices” can help guide package development and evolution, as well as help package users know what to look for in a package.

The R Consortium has been exploring the pros and cons of recommending that R package authors, contributors, and maintainers adopt the Linux Foundation (LF) Core Infrastructure Initiative (CII) “best practices” badge. This badge provides a means for Free/Libre and Open Source Software (FLOSS) projects to highlight to what extent package authors follow best software practices, while enabling individuals and enterprises to assess quickly a package’s strengths and weaknesses across a range of dimensions. The CII Best Practices Badge Program is a voluntary self-certification program; there is no cost to submit a questionnaire and earn a badge. An easy-to-use web application guides users through the process, even automating some of the steps.

More information on the CII Best Practices Badging Program is available on GitHub, including the criteria. Project statistics, criteria statistics, and videos are also available. The projects page shows participating projects and supports queries (e.g., you can see projects that have a passing badge).

What did we learn?

Will the CII Best Practices Badge Program provide value to the R Community’s package developers or package users? 90% of survey respondents say ‘yes’, with 77% saying it has benefit for both developers and users. Perhaps not surprisingly, 95% of respondents had never heard of the CII before, but 74% would be willing to try it. This is according to 41 respondents, 56% of whom have been developing R packages for 4 years or more, and over 60% of whom have developed two or more packages.

Of the six categories covered by the CII – licensing, documentation, change control, software quality, security, code analysis – over 55% of respondents found all criteria to be somewhat or highly beneficial. Over 80% found documentation and software quality criteria to be somewhat or highly beneficial. The details are provided in the table below.

Table: Expected degree of benefit for each CII criteria category

Using an open-ended question, we asked respondents why the CII is good for the R Community. Here is a summary of the responses. The CII…

  • helps users discover and select R packages that adhere to software development best practices.
  • shows R developers through the badge criteria what is possible or desirable for FLOSS, especially if developers do not have a software engineering background.
  • provides an additional degree of assurance to the user community around package quality, as well as a way for developers to assert more formally that they follow such best practices.
  • gathers and presents lessons learned from other FLOSS projects so developers don’t need to re-discover them.
  • creates an incentive to adopt a consistent set of practices throughout the R ecosystem.

While respondents were generally very positive about the use of the CII, concerns did arise:

  • Does the CII Badge have the “correct” or “best” set of criteria?
  • Achieving a badge does not necessarily mean a given package is well designed or implemented.
  • How does the CII help to ensure the validity of self-certification, e.g., through automated tools?
  • Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.
  • Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?
  • Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?
  • A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.
  • Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.
  • Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Suggestions from the respondents on how best to take advantage of the CII Badge Program include:

  • The CII should be sure to reflect the existing quality criteria provided through CRAN.
  • Integrate the CII with CRAN or Bioconductor, e.g., display badges on respective package CRAN pages to give CII more visibility and so that users can identify more easily which package to use.
  • Use the CII to encourage package developers to train themselves in best practices.
  • Develop an automatic framework that will create/enforce all the criteria whenever possible.
  • Make the security criterion conditional based on what the package does. If a package never goes outside the R session, does it need a dedicated security expert?
  • Require packages implementing a statistical method be backed up by a peer-reviewed article.
  • Make it easier to recognize which criteria categories passed and by what percentage in a high level visual representation, perhaps incorporated into the badge itself.
  • “Not use it at all, it creates false impressions and discriminates against good domain packages in disciplines that simply use software rather than seek rewards.”
  • “Encourage R-Core to adopt these practices for R itself. Also, loosen the approach to LICENSE files on CRAN so as to make compliance easier.”
  • Keep it simple.

As you can see, there is quite a range of sentiment expressed regarding introducing such a badging program. Some concerns seem to be based on misunderstandings, for example, the badging process does not require a “dedicated security expert,” and there already is some degree of automation in the process. The R Consortium is grateful to the respondents for taking the time to provide their insightful and thoughtful responses. We will continue to work with the CII team to explore addressing the issues raised above, including clarifying misunderstandings where we can do so.

Since initiating this survey, however, multiple packages have already taken the plunge to try the CII badge program:

foghorn R package to summarize CRAN Check Results in the Terminal https://github.com/fmichonneau/foghorn
osrm Shortest Paths and Travel Time from OpenStreetMap with R https://rgeomatic.hypotheses.org/category/osrm
R_Matrix R package for Sparse and Dense Matrix Classes and Methods. A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, … http://matrix.r-forge.r-project.org
base64enc R tools for base64 encoding https://github.com/s-u/base64enc
ggplot2 An implementation of the Grammar of Graphics in R https://ggplot2.tidyverse.org
covr Test coverage reports for R https://github.com/r-lib/covr
datastructures Implementation of core data structures for R. https://dirmeier.github.io/datastructures
madrid.air R package to parse air quality data published by http://datos.madrid.es/. https://github.com/nramon/madrid.air
pandas Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical… http://pandas.pydata.org
An R Package for Quick Uncertainty Intervals ciTools is an R package that makes working with model uncertainty as easy as possible. It gives the user easy access to confidence or prediction intervals… https://github.com/jthaman/ciTools
dodgr Distances on Directed Graphs in R https://ATFutures.github.io/dodgr
netReg Network-penalized generalized linear models in R and C++. https://dirmeier.github.io/netReg
DBI A database interface (DBI) definition for communication between R and RDBMSs http://dbi.r-dbi.org

 

If you’re a package developer, we hope you’ll join the package developers above and start your own CII Best Practices Badge. The survey will remain open to collect your feedback on the experience.

Interview with R/Medicine 2018 conference organizer Michael Kane

By Blog, Events

The first annual R/Medicine conference is being held September 7-8 in New Haven, CT, and is a collaboration between R Consortium and the Yale School of Public Health Biostatistics Department. As the first public activity of the R in Medicine working group, it’s set to be a key event to bring together the medical community that leverages R in medical research and clinical practice.

Leading the conference committee is Michael Kane, who is an Assistant Professor in Yale University’s Biostatistics Department. With the conference coming soon he agreed to answer a few questions about the R community and the conference itself.

Tell us about the medicine industry’s use of R?

Michael: My first exposure to medicine using R came from my internship at Revolution Computing (now Microsoft R). At the time most companies used SAS and Revolution had started providing validated versions of R and some packages, similar to SAS’s validation process, which would allow R to be used in submissions to the FDA. Since then, the rules have changed, and R sees a lot more use in this space because it provides inexpensive access to powerful tools for designing and analyzing health, genetic, and clinical data. 
We are currently using and developing these tools to find subtypes in immunotherapy studies for treating cancer. Patients can respond very differently to cancer therapies depending on which stage of cancer they have, how many previous treatments they’ve had, and the diversity of the tumor environment. By understanding how factors like these are related to prognostic heterogeneity, we can do a better job prescribing people with cancer the most effective possible treatments.

What drove you to create an event to bring together the R medicine community?

Michael: This conference was inspired by R/Finance. The committee does a fantastic job of providing an entertaining and informative conference. The richness and diversity of the talk subjects show how vast finance is and, at the same time, the speakers and other attendees are completely accessible. We want to bring that same sense of inclusiveness and collaboration to medicine, where sometimes practices become siloed. We hope people realize that we, in medicine, are also part of a large and rich area of research, and we hope the conference helps to jell the community.

What are the organizing committee’s goals and measures of success for this first event?

Michael: Our goal for the first year is to better understand the community as a whole. We are expecting submissions from the clinical trials community, the genetics and omics community, and the epidemiology community. We are hoping we get submissions from both academia and industry. We want to see how people are using R to advance human health. I’ll consider the conference a success if attendees find at least one talk where they are surprised, entertained, and delighted by a use of R that hadn’t occurred to them.
Our other goal is to reinvest in the conference. If we are successful, and we are able to secure enough sponsorship, then we would like to make it easier for people to attend the conference. This would include providing more awards for travel, particularly for students.

How do you see working with R Consortium as critical for driving consensus and critical mass in the medicine community?

Michael: The R Consortium has become the umbrella for the entire R community. Their approval lets the community know that this is the conference to go to if you are using R in Medicine.
We thank Michael for his time, and hope that if you are in the medical community using R, you will look to attend this event.

R Consortium is soliciting your feedback on R package best practices

By Blog, R Consortium Project

With over 12,000 R packages on CRAN alone, the choice of which package to use for a given task is challenging. While summary descriptions, documentation, download counts and word-of-mouth may help direct selection, a standard assessment of package quality can greatly help identify the suitability of a package for a given (non-)commercial need. Providing the R Community of package users an easily recognized “badge” indicating the level of quality achievement will make it easier for users to know the quality of a package along several dimensions. In addition, providing R package authors and maintainers a checklist of “best practices” can help guide package development and evolution, as well as help package users know what to look for in a package.

The R Consortium Infrastructure Steering Committee (ISC) is exploring the benefits of recommending that R package authors, contributors, and maintainers adopt the Linux Foundation (LF) Core Infrastructure Initiative (CII) Best Practices Badge. This badge provides a means for Free/Libre and Open Source Software (FLOSS) projects to highlight to what extent package authors follow best software practices, while enabling individuals and enterprises to assess quickly a package’s strengths and weaknesses across a range of dimensions. The CII Best Practices Badge Program is a voluntary self-certification program offered at no cost. An easy-to-use web application guides users through the process.

More information on the CII Best Practices Badging Program is available on GitHub, including the criteria. Project statistics and criteria statistics are also available. The projects page shows participating projects and supports queries (e.g., you can see projects that have a passing badge).

As a potential initiative for the R Community, we encourage community feedback on the CII for R packages. Also, consider going through the process for a package you authored or maintain. Your feedback will help us and the Linux Foundation evolve the CII to further benefit the needs of the R Community, and FLOSS projects in general. Please provide feedback using this survey.

On conduct and diversity in the R Community

By Announcement, Blog, Events, News, R Consortium Project

An explicit goal of the R Consortium is to help create a welcoming space for everyone, no matter their race, ethnicity, gender, gender identity and expression, socio-economic status, nationality, citizenship, religion, sexual orientation, ability, or age. Diversity and inclusion are essential to foster true collaboration, move ideas forward, and create a long-term sustainable community.

R Consortium recently sponsored R/Finance 2018, where it was found that there were insufficient diversity and inclusion practices, including the absence of a prominently displayed Code of Conduct. This illuminated shortcomings with our existing processes for sponsoring conferences. We are troubled and disappointed to have sponsored a conference that does not reflect our core beliefs in diversity and inclusion.

The Infrastructure Steering Committee (ISC) has approved the creation of a new working group to address diversity and inclusion issues in the R community. The R Community Diversity and Inclusion Working Group (RCDI-WG), which will include members from R Community groups that promote diversity, such as R-Ladies and FORWARDS, event organizers, and key industry members, will focus on three areas:

  • Work with conference organizers to ensure diversity is addressed as a priority in both their program committees and speaker lineups.
  • Establish a recommended Code of Conduct and Diversity Guidelines for R Community events, which will be adopted by the R Consortium and required for any event that the R Consortium participates in.
  • Have an ongoing conversation on opportunities to drive diversity and inclusion across the R Community.

This group is open to any member of the R community, and you can join by signing up for the mailing list. The group plans to have a kickoff meeting soon to work on the Code of Conduct and Diversity Guidelines, with the goal to have them established later in summer 2018. Look for updates on progress on the R Consortium blog.