R Community Explorer

By | Blog

by Ben Ubah, Claudia Vitolo and Rick Pack

Introduction

One of the most important qualities of the R Language is its thriving community. The R community has a reputation for being particularly friendly, welcoming and cohesive, which has enhanced its adoption and expansion. R user groups have accordingly flourished, especially in recent years.

In this year’s Google Summer of Code program, the proposal “Data-Driven Exploration of the R Community” was selected. For this, the project’s developer, Ben Ubah, thanks the project’s mentors, Claudia Vitolo and Rick Pack, for their contributions.

The primary motivation for this project was the need to have a consistent, data-driven, automated dashboard that provides a broad overview of global R User Groups and R-Ladies Groups.

The R Consortium and other stakeholders have invested in community expansion and sustenance initiatives like R-Ladies, the R User Group Support (RUGS) program, Event Sponsorship, RCDI-WG and SatRdays. These promote the learning and adoption of R in many under-represented regions. They have also significantly enhanced community engagement.

As the R community has grown, no way has emerged to track the inception and activity of its global user groups. Is there a way to find out which regions require more representation? How do we recognize the organizers who put in a lot of effort to run the events that sustain user groups? How do we easily locate and recognize the most active groups and perhaps learn from their successes? Could we somehow ascertain the impact of the initiatives set by the R Consortium and others on a global scale? Could there be a unified platform dedicated to exploring the R community in an open-ended, curiosity-driven fashion? These were the thoughts that inspired this project.

While this project is in its infancy, we have started seeing some encouraging results after the first coding phase of Google Summer of Code. We hope to share with you what we have achieved so far, and we welcome your feedback.

R-Ladies Groups

Since the R Consortium first funded the R-Ladies initiative, there has been a sporadic diffusion of their chapters and members globally. Perhaps partially as a result of having a consistent leadership composition and funding, R-Ladies groups are mostly managed on meetup.com, and share a common naming convention. This makes it quite easy to find them on meetup.com and explore their data from the meetup API.

Chart showing Growth of R-Ladies Groups over the years

In the first phase of Google Summer of Code, this project explored a way to track R-Ladies Groups globally from the meetup API, using the meetupr package developed by R-Ladies.

This exploration was intended to be completely data-driven and automated, but rendered via a static dashboard that would be hosted via GitHub Pages. R-Ladies already have a Shiny dashboard, which only runs on a Shiny Server. Inspired by that dashboard, we developed one with some useful differences such as faster loading, additional aesthetic features such as thematic coloring, and additional tabular displays, charts and counts.

What Has Been Achieved

For the R-Ladies dashboard, the following were achieved:

  1. We used the meetupr package to extract R-Ladies Chapters from Meetup.com
  2. Improved the existing find_groups() and get_events() functions in meetupr to meet our requirements
  3. Transformed the data from Meetup to required formats
  4. Persisted the data on GitHub
  5. Developed a static HTML dashboard interface based on an open-source Bootstrap template
  6. Rendered the persisted data via the dashboard interface.
  7. Automated the process
  8. Deployed it via GitHub Pages

The Tools We Used

To accomplish the above, we used a mix of the tools listed below:

  1. R, RStudio and the following packages: meetupr, curl, jsonlite and leafletR
  2. JavaScript and the following libraries: jquery.js, d3.js, echarts.js, leaflet.js and lodash.js
  3. Gentelella Admin Dashboard Bootstrap HTML template
  4. Travis CI to build the project, execute R scripts and bash commands
  5. Bash commands to call R scripts and commit modified files to GitHub

How We Achieved it

  1. We used the meetupr package to retrieve R-Ladies Groups from meetup.com with an R script.
  2. We further analyzed this data and computed several summaries out of it. We used the leafletR package to transform our data frame to GeoJSON. We used this GeoJSON file to create a leaflet map using leaflet.js. In this map, R-Ladies groups are separated into three groups with markers of three color categories: Active (purple), Inactive (dark-purple), and Unbegun (orange). Active groups have had an event in the past 180 days or have an upcoming event in the future. Inactive groups have not had an event in the past 180 days and do not have an upcoming event. Unbegun groups have not had an event in the past and none are planned for the future.
  3. We persisted all data and our summaries in CSV / JSON files. After each Travis build, the data and our summaries are updated straight from the Meetup API.
  4. We wrote bash commands to run our R scripts, and commit updated CSV / JSON files to GitHub after every Travis build.
  5. We set up Travis cron jobs to build this project daily and update our data.
  6. We then customized the Gentelella Admin Dashboard Bootstrap HTML template to our requirements.
  7. We rendered our summaries via widgets on this dashboard, and used JavaScript libraries to perform other, simpler summaries and to produce maps, charts and tables.
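The status rule described in step 2 can be sketched as follows. This is a minimal illustration in Python, not the project's R code: the function name `classify_group` and its parameters are hypothetical, and treating a date exactly 180 days ago as still inside the window is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Window taken from the text: an event in the past 180 days keeps a group active.
ACTIVITY_WINDOW = timedelta(days=180)

# Marker colors named in the text for each status.
STATUS_COLORS = {"Active": "purple", "Inactive": "dark-purple", "Unbegun": "orange"}


def classify_group(last_event: Optional[date], has_upcoming_event: bool, today: date) -> str:
    """Classify one group (hypothetical helper, not part of meetupr).

    last_event is the date of the most recent past event, or None if the
    group has never held one.
    """
    if has_upcoming_event:
        # An upcoming event makes a group active regardless of its history.
        return "Active"
    if last_event is None:
        # No past events and nothing planned.
        return "Unbegun"
    if today - last_event <= ACTIVITY_WINDOW:
        # Assumption: the 180-day boundary is inclusive.
        return "Active"
    return "Inactive"


if __name__ == "__main__":
    today = date(2019, 6, 30)
    # Made-up inputs covering each branch of the rule.
    samples = [
        (date(2019, 5, 1), False),
        (date(2018, 6, 1), False),
        (None, False),
        (None, True),
    ]
    for last_event, upcoming in samples:
        status = classify_group(last_event, upcoming, today)
        print(last_event, upcoming, status, STATUS_COLORS[status])
```

Checking for an upcoming event first matters because a group with no past events but one planned meets the Active definition rather than the Unbegun one.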

The Result

At the end we have an open-source dynamic dashboard for R-Ladies that is updated daily, but is built to be static and hosted via GitHub Pages. This could be seen as another approach to building information dashboards with R as a back-end technology, maintaining separation of business data-processing from data-presentation.

At the time of writing, there are 165 R-Ladies chapters composed of 50,000+ members, across 47 countries and 162 cities, with more than 1,580 past events and many upcoming. 71% of R-Ladies chapters are active, 13% are inactive, and 16% are unbegun. Unbegun groups have members but have not started organizing events yet. Our observation is that members are added to the R-Ladies community daily.

The pop-up markers in the leaflet map display important information about each R-Ladies chapter including a link to the group’s webpage, number of events, status, inactive months, and how to become an organizer for inactive/unbegun groups.
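As a rough illustration of the data behind such a pop-up, the sketch below builds a GeoJSON feature for one chapter, with the pop-up fields stored as feature properties. It is written in Python rather than with the leafletR package the project used. The field names and the example chapter are made up for illustration and are not the project's actual schema.

```python
import json


def chapter_to_feature(chapter: dict) -> dict:
    """Convert one chapter record into a GeoJSON Feature (hypothetical field names)."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [chapter["lon"], chapter["lat"]],
        },
        # Properties a map pop-up could display.
        "properties": {
            "name": chapter["name"],
            "url": chapter["url"],
            "past_events": chapter["past_events"],
            "status": chapter["status"],
            "inactive_months": chapter["inactive_months"],
        },
    }


def chapters_to_geojson(chapters: list) -> dict:
    """Wrap all chapter features in a FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [chapter_to_feature(c) for c in chapters],
    }


# Made-up example record; not real Meetup data.
EXAMPLE_CHAPTER = {
    "name": "R-Ladies Example City",
    "url": "https://www.meetup.com/example-group/",
    "lat": 48.85,
    "lon": 2.35,
    "past_events": 12,
    "status": "Active",
    "inactive_months": 0,
}

if __name__ == "__main__":
    print(json.dumps(chapters_to_geojson([EXAMPLE_CHAPTER]), indent=2))
```

A file in this shape can be loaded by leaflet.js, which can then read each feature's properties when building a marker's pop-up.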

Feedback

We are just starting this project and hope to expand its reach far beyond its current state. We would love to hear from you if you have any ideas or find issues. Feel free to Follow / Star the project at its GitHub repo: https://github.com/benubah/r-community-explorer/

Next

We have started working on general R user-groups and plan to report our progress soon with some lessons we have learned.

ISC Project Status

Refactoring and updating the SWIG R module
Richard Beare
This project is complete. See the project page for a summary.

stars: Scalable, spatiotemporal tidy arrays for R
Edzer Pebesma
The stars package is currently averaging approximately 9200 downloads per month, and user involvement through GitHub issues is rising. Look here for project status, and if you are going to useR! 2019, look for the stars tutorial.

An Earth data processing backend for testing and evaluating stars
Edzer Pebesma
Active development takes place on an AWS instance that has access to the multi-petabyte Sentinel-2 satellite image archive.  See the project page for status.

histoRicalg: Preserving and Transferring Algorithmic Knowledge
John C. Nash
histoRicalg continues to try to preserve and transfer knowledge of older algorithms that are part of R and other computational software. Our work is available  here.

Recent presentations on the project include:

Some collaborations have resulted from the project:

  • Work with Matthew Fidler on merging two CRAN packages for L-BFGS optimization and some preliminary
  • Work on a 40-year-old SVD method being used by NASA contractors to model Jupiter’s magnetic field.

RUGS
Joseph Rickert
So far this year the RUGS Program has awarded grants to 50 user groups and 15 small conferences totaling $31,000. We now have over 42,000 members participating in groups associated with the RUGS program.

Validation Hub (formerly called PSI application for collaboration to create online R package validation repository)
Lyn Taylor
The R Validation Hub team are now focused on designing a framework which could be used to assess package risk.  The repository would host risk metrics, examples of tests, and validation documentation which together would form evidence of the quality of an R package.  This documentation would be free to access and stored on a web based portal. The first version of the website went live in 2019 and we are also on GitHub. If you would like to be involved with the project please contact psi.aims.r.validation@gmail.com.  Representatives of the R Validation Hub and PSI AIMS SIG will also be presenting at the 2019 PSI conference in London to give an update on this initiative and their work using R in the regulatory environment.

A unified platform for missing values methods and workflows
Imke Mayer
Our website, which houses articles, tutorials, data sets and a first set of workflows around a small set of popular R packages, went live in January. We will continue to provide more tutorials, assistance in choosing and using existing R packages, and data sets as time goes on. To help make this project as robust as possible, we encourage authors to submit their articles or works, and reviewers to review their placement on the platform/website, either by contacting us via our website or by submitting changes directly in our GitHub repository.

Our goal is to create a benchmark of existing methods for different kinds of data (both synthetic and real data), missing values mechanisms, tasks to be fulfilled, etc. An important aspect is that this work should allow other researchers and data scientists to re-use/copy our R code to compare their own method to a maximum of existing methods without having to re-implement the comparisons every time themselves.

Ongoing infrastructural development for R on Windows and MacOS
Jeroen Ooms
Rtools has been updated to GCC 8.3, and several new C/C++ libraries have been added to the rtools-packages repository. The new toolchain was presented at rstudio::conf 2019.

R Hub
Gábor Csárdi
The R Hub project is in maintenance mode. You can follow activities on GitHub.

Conference Management System for R Consortium Supported Conferences
Steph Locke
We have delivered a Hugo template for UseR! events and a template for creating new SVG logos for the UseR! events.

R Ladies
Claudia Vitolo
We are pleased to announce that R-Ladies is now a non-profit organisation incorporated in California (United States) with 501(c)(3) tax-exempt status (a blog post will announce this publicly in the next few days). Since we can now accept donations, we have decided to join CommunityBridge, a Linux Foundation platform that allows for a transparent and traceable management of incoming donations and outgoing expenses.

R-Ladies is continuing to grow. As of March 2019, there are 142 chapters on meetup.com with 38,000+ R-Ladies members (signed up on meetup.com), distributed across 45 countries on 6 continents (see our Shiny dashboard). We have also recently launched a mentorship programme to provide help and support from experienced organisers to less experienced ones. The migration to the meetup.com pro account is progressing smoothly: the last group will be migrated in 2019-Q2.

Developing Tools and Templates for Teaching Materials
François Michonneau
We developed the alpha version of the R package checker to validate links and images in static websites such as those built with R Markdown and Jekyll. In addition to ensuring that there are no broken links, this package will encourage website authors to use best practices in accessibility by adding the metadata to links and images so they can be processed by screen-readers and other assistive technologies. We are starting to use this package to check some of The Carpentries lessons.

Future Minimal API: Specification with Backend Conformance Test Suite
Henrik Bengtsson
The R package future.tests has made it possible to add support for relaying messages and warnings in the future framework (a frequently requested feature) and release it in a non-breaking manner.

Strengthening of R in support of spatial data infrastructures management : geometa and ows4R R packages
Emmanuel Blondel
Milestones M1 to M5 were successfully delivered. These are identified in GitHub tickets with labels for each milestone.

M1 targeted provision of an INSPIRE metadata validator embedded into the geometa package. This feature has been tested by data managers in France, at the French observatory for universe science and at the French CNRS research units Dynafor and LETG.

M2 targeted the support of multi-lingual metadata encoding/decoding in geometa. All existing geometa classes subject to internationalization have been extended to support multiple languages. A battery of tests has been added in all class test files. In addition to geometa, this new feature required changes in the downstream packages ows4R and geonapi for the publication of multi-language metadata documents. Online documentation has been made available here.

M3 provides a generic metadata converter, which was planned for delivery in April 2019.

M4 provides an adapter for NetCDF-CF core metadata, and M5 provides an adapter for EML core metadata.

Online documentation has been made available on GitHub.

These features are being used by the IRD Marbec Research Unit and DynaFor.

Work has started on milestones M6 and M7, which aim to complete the coverage of the ISO/OGC 19115-1 and 19115-2 standards in geometa by adding all missing classes (planned for completion in summer 2019).

R Consortium Announces Event Sponsorships for 2019

The R Consortium is committed to the R Community. We support R projects, meetups and events, via grants and sponsorships. Over the last four years, the R Consortium has given more than $125,000 in support of R events both large and small.  We are excited to announce the events we are sponsoring in 2019.

This year we wanted to support a few events in large metro areas with active groups, a mix of geographies, and finally industries that are up and coming.  A big thanks to all the amazing R event organizers who are all working to promote, improve, and grow the R language and community.

2019 Sponsorship funding goes to:

deRSE19, a conference for research software developers in Germany, is taking place June 4-5 at the Albert Einstein Science Park in Potsdam. #deRSE19 welcomes scientists, but also people who finance, operate, develop, or maintain research software and do not usually attend conferences.

Cascadia R Conference, now in its third year, takes place on June 8th and serves the Pacific Northwest region of Oregon, Washington, and Vancouver BC. This event is the place to come together in the Pacific Northwest to discuss how people are solving everyday problems with the R language. Stay tuned for speaker announcements and follow them on Twitter @cascadiarconf.

BioConductor is a conference focused on providing insights and tools required for the analysis and comprehension of high-throughput genomic data. The event takes place in New York City June 24-27. Speakers include Rob Patro, Jeffrey Leek, Elli Papaemmanuil, Simina Boca, Lieven Clement, Lihua Julie Zhu, and Anshul Kundaje. Follow all the action on Twitter at #bioc2019.

UseR Toulouse. This global event, held July 9-12 in Toulouse, is the largest meeting of the R user and developer community. The program consists of both invited and user-contributed presentations. Invited keynote lectures cover a broad spectrum of topics ranging from technical and R-related computing issues to general statistical topics of current interest. Keynote speakers include Joe Cheng, CTO, RStudio; Julien Cornebise, Director of Research at Element AI (UK); Bettina Grün, Professor, Johannes Kepler Universität Linz (Austria); and Julie Josse, Professor, École Polytechnique (France), among others. In addition, R Consortium’s own Joe Rickert will be giving a talk on high-profile meetup groups and the work they are delivering. Follow the event on Twitter @UseR2019_Conf.

EARL Conference. The Enterprise Applications of the R Language Conference (EARL) is a cross-sector conference focusing on the commercial use of the R programming language and takes place in London on September 10-12. The conference is dedicated to the real-world usage of R with some of the world’s leading practitioners. Workshops for 2019 include Shiny for Production, Deep Learning with Keras for R, and Package Development in R, among others. Check the website for updates on speakers, join the mailing list, or follow them on Twitter @earlconf.

R/Medicine. The goal of the R/Medicine conference is to promote the use of the R programming environment and ecosystem in medical research and clinical practice. The event takes place September 12-14, 2019, in New Haven, CT. Topic areas for R/Medicine include clinical trial design, the analysis of clinical trial data, personalized medicine, the analysis of patient records, the analysis of genetic data, the visualization of medical data, and reproducible research. For more information follow them on Twitter @r_medicine.

satRday Chicago, a brand new event, is a community-led, regional conference to support collaboration, networking, and innovation within the R community. Tracks for the event ranged from academic and civic applications to industry applications, upskilling reproducibility, statistical methodology and more.

New York R Conference united R enthusiasts and data scientists to explore, share, and inspire ideas. This year’s event covered a wide variety of R language topics, from Machine Learning in R to GIS, to tidyverse and beyond, presented by some of the best-known data scientists in the community, including Andrew Gelman, Emily Robinson, Namita Nandakumar, Max Kuhn, Wes McKinney, Soumya Kalra, and David Madigan. For more about the community visit their website at nyhackr.org, and follow them on Twitter at @nyhackr and @rstatsnyc.

While our funding efforts are complete for 2019, we encourage the community to continue to share feedback on Twitter @Rconsortium about R events you’d like to see supported in the future. Let us know what conferences are important to you so we can continue to improve our processes and support for the community.

Census Academy Launches with Two R Courses

by Ari Lamstein

Ari Lamstein is an independent consultant and organizer of the Census Working Group.

The US Census Bureau recently launched Census Academy, an online platform focused on training the public to learn about Census data. R Enthusiasts will be excited to learn that Census Academy has launched with two R-specific courses:

If you have an interest in using R to analyze US Census Data, then, in addition to the above courses, you might also want to read A Guide to Working with Census Data in R. The Guide summarizes the most popular datasets that the Census Bureau publishes, as well as the most popular R packages for working with Census Data.

A Guide to Working with Census Data in R was created as part of the R Consortium’s Census Working Group, which you can learn more about here.

CII Best Practices – R Package Leaderboard

Since my last post on the Core Infrastructure Initiative CII Best Practices Badge for R Packages – responding to concerns, there have been many R language projects started – and completed – on the CII Best Practices site. In this post, we recognize the R projects that have achieved the CII Best Practices – Passing level, and note that several are well on their way to achieving silver level. In all, there are more than 50 CII projects related to R packages, with the popular ggplot2 package on the cusp of joining the group below with 97% completion as of this post.

Please congratulate these package owners for their achievement. If you’re a package developer, consider adding your package to the CII Best Practices ranks, and work your way through the levels of passing, silver, and gold.

Id | Name | Description | Owner
265 | madrid.air | Parse air quality data published by http://datos.madrid.es/ | Ramón Novoa
1882 | DBI | A database interface (DBI) definition for communication between R and RDBMSs | Kirill Müller
2011 | Delaporte | Provides the probability mass, distribution, quantile, random variate generation, and method of moments parameter estimation | Avraham Adler
2022 | lamw | Calculates the real-valued branches of the Lambert-W function | Avraham Adler
2033 | pade | Returns the numerator and denominator when given a vector of Taylor series coefficients of sufficient length as input | Avraham Adler
2041 | fixedWidth | Save fixed width files | Jeston
2053 | DataExplorer | Simplified Exploratory Data Analysis | Boxuan Cui
2054 | PKNCA | Perform all noncompartmental analysis (NCA) calculations for pharmacokinetic (PK) data | Bill Denney
2055 | BAS | Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling | Merlise Clyde
2083 | MortalityLaws | Fit and compare the most popular human mortality laws | Marius D. Pascariu
2135 | drake | A general-purpose workflow manager for data-driven tasks in R that rebuilds intermediate data objects when their dependencies | Will Landau
2136 | httptest | A Test Environment for HTTP Requests in R | Neal Richardson
468 | busdater | Business dates for R | Mick Mioduszewski
2527 | jtools | Summarize and visualize regressions with other helpful tools | Jacob Long

Package licensing and enterprise use

For enterprise users of R, licensing terms of open source software can occupy a significant share of Legal and Corporate Architecture departments’ time. In Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results, one survey topic touched on the licensing of R packages. In talking with various enterprise users of R, there were a few suggestions about how the R community could make leveraging R packages easier within enterprises, while allowing Legal and Corporate Architecture departments to get more sleep.

Getting approvals to use packages

Some of you may be familiar with the process that enterprise users of R packages go through for approvals to use R in their products. Third party software often needs to go through legal reviews, corporate architectural reviews, security reviews, and line of business approvals before they can find their way into use within an enterprise or in products that they produce.

One area of concern is the use of GPL licenses, and the potential impact they may have on proprietary software. See Why GPL still gives enterprises the jitters for more discussion. While there are varying debates about the true impact of a certain license designation, for example, GPL–2 versus GPL–3, in many large organizations, a more conservative interpretation is often applied. (Comparing license options.)

Perhaps less well known is that it is not just the license of the package in question that matters, but also the licenses of all the packages it depends on, recursively. For example, is a GPL–3 licensed package that uses a GPL–2 licensed package validly designated?
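To make the recursive nature of the problem concrete, here is a minimal sketch in Python that walks a dependency graph and collects the license of every package reached. The package names, licenses, and approved list are made up for illustration, and the helpers `transitive_licenses` and `flag_for_review` are hypothetical. The sketch only lists licenses for review and makes no judgment about which licenses are legally compatible.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "pkgA": ["pkgB", "pkgC"],
    "pkgB": ["pkgD"],
    "pkgC": [],
    "pkgD": [],
}

# Hypothetical license declared by each package.
LICENSE_OF: Dict[str, str] = {
    "pkgA": "GPL-3",
    "pkgB": "GPL-2",
    "pkgC": "MIT",
    "pkgD": "LGPL-3",
}


def transitive_licenses(root: str,
                        depends_on: Dict[str, List[str]],
                        license_of: Dict[str, str]) -> Dict[str, str]:
    """Return {package: license} for root and everything it depends on, recursively."""
    found: Dict[str, str] = {}
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in found:
            # Already visited; this also guards against cycles in the graph.
            continue
        found[pkg] = license_of.get(pkg, "UNKNOWN")
        stack.extend(depends_on.get(pkg, []))
    return found


def flag_for_review(licenses: Dict[str, str], approved: Set[str]) -> List[str]:
    """List packages whose license is not on a (hypothetical) pre-approved list."""
    return sorted(pkg for pkg, lic in licenses.items() if lic not in approved)


if __name__ == "__main__":
    licenses = transitive_licenses("pkgA", DEPENDS_ON, LICENSE_OF)
    for pkg in sorted(licenses):
        print(pkg, licenses[pkg])
    print("needs review:", flag_for_review(licenses, {"GPL-3", "MIT", "LGPL-3"}))
```

Even in this small example, approving pkgA means looking at four licenses rather than one, which is why each additional dependency adds to the review burden.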

What can we do?

When we ask representatives of enterprises who are responsible for approving the use of third-party software what would make their work easier, a few suggestions for package authors and maintainers arise concerning licensing:

  • Packages should not depend on other packages that have incompatibly licensed materials
  • Use the most permissive license possible for your package, for example, LGPL, GPL–3 or GPL>=2, as opposed to just GPL–2
  • Minimize the number of dependent packages whenever possible, since each one requires its own approval process which affects adoption
  • Avoid using packages with more restrictive licensing terms than you intend for your package

We encourage package authors and maintainers to review their dependent packages and look for opportunities to address the suggestions above. Where possible, encourage dependent package authors and maintainers to adopt more permissive licenses as well. Where not possible, ask whether the functionality provided by the dependent package is essential.

For enterprise users of open source software, ask your Legal departments to share their concerns with developers so more informed choices can be made in the future.

2019 Update One: R Consortium and ISC Announce the Newest Funded Projects for the R Community

By | Announcement, Blog

We are excited to announce a wide and diverse group of new R Consortium funded projects. If you are interested in finding out more about these projects, connect with the project owners via links provided below each project. 

New Projects include:

Strengthening of R in support of spatial data infrastructures management

Project Owner: Emmanuel Blondel

The project aims to strengthen the role of R in support of Spatial Data Infrastructures (SDI) management, through major enhancements of the geometa R package, which offers tools for reading and writing ISO/OGC geographic metadata, including ISO 19115, 19110, and 19119 through the ISO 19139 XML format. This also extends to the Geographic Markup Language (GML – ISO 19136) used for describing geographic data. The use of geometa in combination with publication tools such as ows4R and geosapi fosters the use of R software to ease the management and publication of metadata documents and related datasets in web catalogues, and makes it possible to move forward with a real R implementation of spatial data management plans based on FAIR (Findable, Accessible, Interoperable and Reusable) principles.

The work plan includes several activities such as working on the completeness of the ISO 19115 (ISO 19115-1 and 19115-2) data model in geometa, functions to read/write multilingual metadata documents, and an increased metadata validation capability with a validator targeting the EU INSPIRE directive. Finally, functions will be made available to convert between geometa ISO/OGC metadata objects and other known metadata objects such as NetCDF-CF and EML (Ecological Metadata Language) to foster metadata interoperability. By providing these R tools, we seek to facilitate the work of spatial data (GIS) managers, but also data scientists, whatever the thematic domain, whose daily tasks consist in handling data, describing them with metadata and publishing datasets.

Learn more about the project here

Catalyzing R-hub Adoption Through R Package Developer Advocacy

Project Owner: Maëlle Salmon

After the continuing technical progress of R-hub over the last two years, this project aims at catalyzing its adoption by R package developers of all levels through developer advocacy. Indeed, R-hub is currently a successful and very valuable project, but it is not documented thoroughly, which hinders its wider adoption by package developers. This project shall answer this concern by three main actions: improving R-hub documentation, making R-hub better known in the community and making the R-hub web site more attractive to, and easier to use by, R developers and users via the ingestion of METACRAN services and the creation of an R-hub blog.

Learn more about the project here

 

Licensing R – Guidelines and Tools

Project Owner: Colin Fay

Licensing is a vital part of Open Source. It provides guidelines for interacting with a program, and for making code accessible and reusable (or not). It provides a way to make code open source and share it the way one wants, while protecting how it will be used and reused. Licensing is also challenging and complex: there are a lot of available licenses, and the choice is influenced by how you import and interact with elements from other packages and/or programs.

With this project, we propose to explore and document the current state of open source licenses in R, and to decipher compatibility and incompatibility elements inside these licenses, to help developers choose the best-suited license for their project.

Learn more about the project here and here.

 

Data-Driven Discovery and Tracking of  R Consortium Activities

Project Owner: Benaiah Chibuokem Ubah

This project proposes an infrastructure that provides a data-driven approach to render the yearly activities of the R Consortium, by deploying web pages for discovering and tracking ISC Funded Projects, RUGS and Marketing activities. These pages are planned to appear like dashboards summarizing activities in interactive tables and charts, presenting several views, trends and insights into what R Consortium has achieved over time. The project hopes that presenting these achievements in a data-driven manner to the R community, the data science community and prospective R Consortium members will promote greater transparency, productivity and community inclusiveness around R Consortium activities.

Learn more about the project here.

 

serveRless

Project Owners: Christoph Bodner, Florian Schwendinger, Thomas Laber

R is a great language for rapid prototyping and experimentation, but putting an R model in production is still more complex and time-consuming than it needs to be. With the growing popularity of serverless computing frameworks such as AWS Lambda and Azure Functions we see a huge chance to allow R developers to more easily deploy their code into production. We want to create an R package that provides a common API for different Function-as-a-Service providers such as Azure Functions and AWS Lambda.  We will also look into integrating Docker-as-a-Service (e.g. Azure Container Services) if appropriate. Our main goal is to build a user-friendly cloud agnostic wrapper that can be extended to include additional cloud providers later on. We want to build on the work already done for deploying R functions to AWS Lambda by Philipp Schirmer and on the work already done by Neal Fultz and Gergely Daróczi on a gRPC client/server for R, which is necessary for Azure Functions.

If you like our idea and want to help us, feel free to reach out to us on GitHub here.

 

Next-Generation Text Layout in Grid and ggplot2

Project Owner: Claus Wilke

Text is a key component of any data visualization. We need to label axes and legends, we need to annotate or highlight specific data points, and we need to provide plot titles and captions. The R graphics package ggplot2 provides numerous features to customize the labeling and annotation of plots, but ultimately it is limited by the current capabilities of the underlying graphics library it uses, grid. Grid can draw simple text strings or mathematical expressions (via plotmath) in different colors, sizes, and fonts. However, it lacks functionality for changing formatting within a string (e.g., draw a single word in italics or in a different color), and it also cannot draw text boxes, where the text is enclosed in a box with defined margins, padding, or background color. This project will support the development of a new package, gridtext, that will alleviate these text formatting limitations. The project will also support efforts to make these new capabilities available from within ggplot2.

Learn more about the project here

 

Symbolic Formulae for Linear Mixed Models

Project Owner: Emi Tanaka

Symbolic model formulae define the structural component of a statistical model in easier and often more accessible terms for practitioners. The earlier instance of symbolic model formulae for linear models was applied in Genstat with further generalization by Wilkinson and Rogers (1973). Chambers and Hastie (1993) describe the symbolic model formulae implementation for linear models in the S language which remains much the same in the R language (Venables et al. 2018).

Linear mixed models (LMMs) are widely used across many disciplines (e.g. ecology, psychology, agriculture, finance, etc.) due to their flexibility in modeling complex, correlated structures in the data. While the symbolic formulae of linear models generally have a consistent representation and evaluation rule as implemented in stats::formula, this is not the case for LMMs. The inconsistency of symbolic formulae arises mainly in the representation of random effects, with the additional need to specify the variance-covariance structure of the random effects as well as the structure of the associated model matrix that governs how the random effects are mapped to (groups of) the observational units. The differences give rise to confusion about equivalent model specifications across different R packages.
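As a rough illustration of this inconsistency, the sketch below writes the same random-intercept model in the style expected by two widely used packages: lme4 embeds the random effect in a single formula passed to `lmer()`, while nlme takes it as a separate `random` formula passed to `lme()`. The variable names (`y`, `x`, `a`, `b`, `group`) are hypothetical, and the code only builds formula objects in base R; no model is fitted.

```r
# Ordinary linear model: one formula, expanded by the rules in stats::formula.
lm_formula <- y ~ a * b
attr(terms(lm_formula), "term.labels")   # "a" "b" "a:b"

# lme4 style (lme4::lmer): random intercept embedded in the formula.
lme4_formula <- y ~ x + (1 | group)

# nlme style (nlme::lme): fixed and random parts given as separate formulae.
nlme_fixed  <- y ~ x
nlme_random <- ~ 1 | group

# Both specifications involve the same variables, arranged differently.
all.vars(lme4_formula)                               # "y" "x" "group"
union(all.vars(nlme_fixed), all.vars(nlme_random))   # "y" "x" "group"
```

The fixed-effects part is identical in both styles; only the way the random effect is attached differs, which is where a unified representation would help.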

The lack of consistency in symbolic formulae and model representation across mixed model software motivates the need to formulate unified symbolic model formulae for LMMs with: (1) extension of the evaluation rules described in Wilkinson and Rogers (1973); and (2) ease of comprehension of the specified model for the user. These symbolic model formulae can be a basis for creating a common API to mixed models with wrappers to popular mixed model R packages, thereby achieving a feat similar to that of the parsnip R package (Kuhn 2018), which implements a tidy unified interface to many predictive modeling functions (e.g. random forest, logistic regression, survival models, etc.).

We would like to find out about your experiences with fitting linear mixed models in R! Please fill out this survey to help us understand your problems.

Learn more about the project here

 

Editorial Assistance for the R Journal

Project Owner: Di Cook  

This project supports the operation of the R Journal. There are two aspects. The first is to fund an editorial assistant to send reminders about reviews and to assist with typesetting and copyediting issues. The second is to explore updating the technical operations of the journal production.

Learn more about the project here

ISC Call for Proposals

By | Announcement, Blog

The March 2019 ISC Call for Proposals is now open. Once again, we are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community.  

Our goal in calling for proposals is to stimulate creativity and help turn good ideas into tangible benefits for the R Community. What can you do to improve the R ecosystem and how can the R Consortium help you do it?

We encourage you to “Think Big” but structure your proposal with intermediate milestones. The ISC is most likely to fund proposals that ask for modest initial grants. We tend to be conservative with initial grants, preferring projects structured in a way that significant early milestones can be achieved with a modest amount of financial support.

As with any proposed project, the more detailed and credible the project plan and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes the following:

  • Measurable objectives
  • Intermediate milestones
  • A list of all team members who will be contributing work
  • Detailed accounting of how the grant money will be spent

You may find that reviewing some previously funded projects will help stimulate your thinking. Notice that not all projects require software development. The Guide to using Census Data and the Missing Data Task View are work products from recent ISC-funded projects that focused on documentation.

If you are really thinking big, consider proposing an ambitious project such as the R Validation Hub, or the R / Pharma and R / Medicine conferences that are funded and organized as ISC working groups.

Please note that proposals to sponsor conferences, workshops or meetups should be sent directly to the R Consortium’s R User Group and Small Conference Support Program, or the R Consortium Marketing Committee.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained PDF using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, April 1, 2019.

RC RUGS 2019 Is Up and Running

By | Announcement, Blog, Uncategorized

The R Consortium’s 2019 R User Group and Small Conference Support program, which provides cash grants to R-focused user groups and small R-themed conferences, is now accepting applications for financial support.

R User Groups

Grants to R user groups are awarded in three categories that depend on the number of users who typically attend meetings, and the frequency with which the group meets.

Array Level: Large, established R user groups that held at least three meetings in the six-month period prior to applying that attracted more than 100 attendees may be eligible for $1,000 grants.

Matrix Level: R user groups that held at least three meetings in the six-month period prior to applying that attracted at least 50 attendees may be eligible for $500 grants.

Vector Level: Other groups, even very small ones just getting started, may be eligible for $150 grants.

In addition to the cash grants, R user groups accepted into the program are eligible to participate in the R Consortium’s meetup.com Pro program. Under this program, the R Consortium will pay a user group’s meetup.com dues for twelve months.

Small Conferences

Small conferences, typically those that expect to attract fewer than 200 people, may apply for cash grants of up to $1,000. To qualify, a conference must be either entirely devoted to the R language or applications using R, or have a significant amount of R content. To apply, conferences should have a public-facing web page with a code of conduct, information about the technical program and sponsorship information. Conferences will be evaluated and grants awarded on a case-by-case basis.

Details for RUGS, meetup.com Pro and Small Conference programs may be found here on the R Consortium website. To apply for support, please use the online form.

 

R-users & Community: give us your feedback on an R Certification to teach & verify skilled R Professionals

By | Announcement, Blog

In the past few years, we have seen an increase in the demand for R, both from employers looking for skilled R users and from professionals looking to further improve their skills. Due to this supply and demand gap, various teaching channels have been created in an attempt to extend knowledge of the language. Even with the abundance of R teaching material, we still face a dearth of qualified, skilled R users. The inability to differentiate self-taught data scientists from qualified personnel creates confusion for employers and makes it difficult for quality professionals to separate themselves from the rest.

The R Consortium started a working group that has identified the absence of a system to certify qualified R professionals as a cause of this problem. In response, the group is working to create a certification for R that will allow professionals and students to acquire fundamental skills and knowledge of the language. This certification also aims to help recruiters identify and assess the skills of potential recruits. The group will be driven by the needs of current R professionals and data science recruiters. More information about this initiative can be found here.

For this working group to create a valuable certification, we encourage community feedback on this initiative. Your feedback will help the working group evolve this certification to best serve the needs of the R community. Please respond to this survey to help in the creation of this certification.