
Blog

Call for Proposals

By Blog, R Consortium Project

by Hadley Wickham

The Infrastructure Steering Committee (ISC) is pleased to announce that it is now ready to accept proposals for the first round of funding in 2017. The ISC is broadly interested in projects that will make a difference to the R community. Don’t be afraid to think big! We have the budget to fund ambitious projects, and we want to fund infrastructure that can help large segments of the R community.

Infrastructure includes:

  • Ambitious technical projects (like R-hub), which require dedicated
    time to supply infrastructure that is currently missing in the R
    ecosystem.
  • Community projects (like R-ladies and SatRdays), which help catalyse
    and support the growth of the R community around the world.
  • Smaller projects to develop packages (like DBI and sf), which
    provide key infrastructure used by thousands of R programmers.

The deadline for submitting a proposal is midnight PST, Friday, February 10, 2017. For the mechanics of submitting a proposal, and for guidance on how to write a good one, see the Call for Proposals webpage. Also, if you have ideas for projects but you’re not sure you have the skills to carry them out yourself, file an issue with your idea on the wish list that the R Consortium maintains on GitHub.

 

Halfway through “Improving DBI”

By Blog, R Consortium Project, R Language

by Kirill Müller

In early 2016 the R Consortium partially accepted my “Improving DBI” proposal. An important part of it is the design and implementation of a testable DBI specification. Initially, I also proposed to make three DBI backends to open-source database engines (RSQLite, RMySQL, and RPostgres) compatible with the new DBI specification, but the funding covers work on only one DBI backend. I chose RSQLite for a number of reasons:

  • It is a very important package, judging by the number of reverse CRAN and Bioconductor dependencies
  • It’s easy to work with, because everything (including the database engine) is bundled with the package
  • It seemed to be the most advanced package, closest to the (yet to be completed) DBI specification
  • An informal Twitter poll supports this decision by a tiny margin

The project has reached an important milestone, with the release of RSQLite 1.1. This post reports the progress achieved so far, and outlines the next steps.

RSQLite

While the RSQLite API has changed very little (hence the minor version update), the release includes a complete rewrite of the original 1.0.0 sources in C++. This has considerably simplified the code, which makes future maintenance easier, and lets us take advantage of the more sophisticated memory management tools available in Rcpp, which help protect against memory leaks and crashes.

RSQLite 1.1 brings a number of improvements:

  • New strategy for prepared queries: Create a prepared query with dbSendQuery() or dbSendStatement() and bind values with dbBind(). This allows you to efficiently re-execute the same query/statement with different parameter values iteratively (by calling dbBind() several times) or in a batch (by calling dbBind() once with a data-frame-like object).
  • Support for inline parametrised queries via the params argument to dbSendQuery(), dbGetQuery(), dbSendStatement() and dbExecute(), which protects against SQL injection.
  • The existing methods dbSendPreparedQuery() and dbGetPreparedQuery() have been soft-deprecated, because the new API is more versatile, more consistent and stricter about parameter validation.
  • UTF-8 is used for queries and parameters: this means that non-English data should just work without any additional intervention.
  • Improved mapping between SQLite’s cell-types and R’s column-types.
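A minimal sketch of both query styles, written against the DBI/RSQLite API as it exists today (details may differ slightly from RSQLite 1.1). It assumes the DBI and RSQLite packages are installed and uses R’s built-in mtcars dataset in a throwaway in-memory database.

```r
library(DBI)  # assumes DBI and RSQLite are installed

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Prepared query: send the SQL once, then bind different parameter values
rs <- dbSendQuery(con, "SELECT COUNT(*) AS n FROM mtcars WHERE cyl = ?")
dbBind(rs, list(4))
n_cyl4 <- dbFetch(rs)$n
dbBind(rs, list(6))
n_cyl6 <- dbFetch(rs)$n
dbClearResult(rs)

# Inline parameters: values travel separately from the SQL text,
# so they are never pasted into the query string
heavy <- dbGetQuery(
  con,
  "SELECT mpg, wt FROM mtcars WHERE cyl = ? AND gear = ?",
  params = list(8, 3)
)

dbDisconnect(con)
```

Because the values are bound rather than interpolated, a malicious string passed as a parameter is treated as data, not as SQL.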

See the release notes for further changes.

The rewrite was implemented by Hadley Wickham before the “Improving DBI” project started, and has been available for a long time on GitHub. Nevertheless, the CRAN release has proven much more challenging than anticipated, because so many CRAN and Bioconductor packages import it. (Maintainers of reverse dependencies might remember multiple e-mails where I was threatening to release RSQLite “for real”.) My aim was to break as little existing code as possible. After numerous rounds of revdep-checking and improving RSQLite, I’m proud to report that the vast majority of reverse dependencies pass their checks just as well (and as quickly!) as they did with v1.0.0. Most tests from v1.0.0 are still present in the current codebase. This means that non-packaged code also has a good chance to work unchanged. I’m happy to work with package maintainers or users whose code breaks after the update.

DBI

I have also released several DBI updates to CRAN, mostly to introduce new generics such as dbBind() (for parametrized/prepared queries), and dbSendStatement() and dbExecute() (for statements that don’t return data). The definition of a formal DBI specification is part of the project; a formatted version is updated continuously.
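A small sketch of the statement-oriented generics, assuming DBI and RSQLite are installed; the fruit table is made up for illustration. dbExecute() returns the number of rows affected, while dbSendStatement() returns a result object that must be cleared.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# dbExecute() runs a statement that returns no rows and reports rows affected
dbExecute(con, "CREATE TABLE fruit (name TEXT, qty INTEGER)")
n_inserted <- dbExecute(
  con,
  "INSERT INTO fruit (name, qty) VALUES (?, ?)",
  params = list(c("apple", "pear", "plum"), c(3L, 5L, 0L))  # one row per element
)
n_updated <- dbExecute(con, "UPDATE fruit SET qty = qty + 1 WHERE qty > ?",
                       params = list(1L))

# dbSendStatement() is the lower-level counterpart that returns a result object
rs <- dbSendStatement(con, "DELETE FROM fruit WHERE qty = 0")
n_deleted <- dbGetRowsAffected(rs)
dbClearResult(rs)

remaining <- dbReadTable(con, "fruit")
dbDisconnect(con)
```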

DBItest

In addition to the textual specification in the DBI package, the DBItest package provides backend-independent tests for DBI packages. Package authors can easily use it to ensure that they follow the DBI specification. This is important because it allows you to take code that works with one DBI backend and easily switch to a different backend (provided that they both support the same SQL dialect). Literate programming techniques using advanced features of roxygen2 help keep the code and the textual specification in close proximity, so that amendments to the text can be easily traced back to changes in the test code, and vice versa.
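To illustrate what backend portability buys you, here is a sketch of a hypothetical helper, row_count(), that relies only on DBI generics. It is exercised here with RSQLite only; the assumption is that it would work unchanged on any backend that passes the DBItest suite.

```r
library(DBI)

# Hypothetical helper: uses only DBI generics, no backend-specific calls
row_count <- function(con, table) {
  # dbQuoteIdentifier() applies the backend's own quoting rules
  sql <- paste("SELECT COUNT(*) AS n FROM", dbQuoteIdentifier(con, table))
  dbGetQuery(con, sql)$n
}

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)
n_iris <- row_count(con, "iris")
dbDisconnect(con)
```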

Next steps

The rest of the project will focus on finalizing the specification in both code and text (mostly discussed on GitHub in the issue trackers for the DBI and DBItest projects). At least one new helper package (to handle 64-bit integer types) will be created, and DBI, DBItest, and RSQLite will see yet another release: The first two will finalize the DBI specification, and RSQLite will fully conform to it.

The development happens entirely on GitHub in repositories of the rstats-db organization. Feel free to try out development versions of the packages found there, and to report any problems or ideas at the issue trackers.

 

RL10N hits its first milestone

By Blog

by Richard Cotton and Thomas Leeper


R is gradually taking over the world (of data analysis).  However, proficiency in English remains a prerequisite for effectively working with R.  While R has a system for translating messages, warnings, and error messages into other languages, very few packages take advantage of this functionality.

Part of the problem is that it currently takes a lot of effort to create translations. The RL10N project aims to address two issues. First, the functionality contained in the tools package isn’t particularly easy to work with. Second, finding translators can be difficult.

The project has reached its first milestone, having released the poio package to CRAN. Translations of messages are stored in .pot master translation and .po language-specific translation files that are understood by the GNU gettext utility. poio provides functionality to read and write this file format.

Setting up Translations

The workflow to create translation infrastructure for a package is now reasonably straightforward.

First, a .pot master translation file is created using the xgettext2pot function from the tools package. The .pot file contains a few lines of metadata, consisting of name-value pairs.

"Project-Id-Version: R 3.3.1\n"
"Report-Msgid-Bugs-To: bugs.r-project.org\n"
"POT-Creation-Date: 2016-11-06 17:19\n"
...

After this, it contains message ID lines, along with blank message translation lines.

msgid "This is a message!"
msgstr ""
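To make the format concrete, here is a deliberately simplified base-R parser. It is a hypothetical helper, not poio’s implementation, and it only handles single-line entries; read_po handles the full format. In gettext files, the metadata lines shown above belong to a header entry whose msgid is empty.

```r
# Hypothetical helper, NOT poio::read_po: single-line msgid/msgstr pairs only
parse_po_pairs <- function(lines) {
  unquote <- function(x) sub('^"(.*)"$', "\\1", x)
  ids  <- grep("^msgid ", lines, value = TRUE)
  strs <- grep("^msgstr ", lines, value = TRUE)
  data.frame(
    msgid  = unquote(sub("^msgid ", "", ids)),
    msgstr = unquote(sub("^msgstr ", "", strs)),
    stringsAsFactors = FALSE
  )
}

pot_lines <- c(
  'msgid ""',                                # header entry holding the metadata
  'msgstr ""',
  '"Project-Id-Version: R 3.3.1\\n"',
  '',
  'msgid "This is a message!"',
  'msgstr ""'                                # blank: not yet translated
)

entries <- parse_po_pairs(pot_lines)
entries <- entries[entries$msgid != "", ]    # drop the header entry
```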

The second step is to read this file into R, using poio’s read_po function.  (The same function reads both .po and .pot files, automatically detecting which is which.)

pot <- read_po(pot_file)

The file created by xgettext2pot has some incorrect metadata values. These can be fixed by calling fix_metadata.

pot_fixed <- fix_metadata(pot)

Next, you need to choose some languages to translate your messages into. Each language is specified as a two- or three-letter ISO 639 code, such as “fr” for French or “zh” for Chinese, optionally followed by a country code for country-specific variations, like “pt_BR” for Brazilian Portuguese. The language_codes dataset shows all the available language and country codes.
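As a rough illustration of the code shape described above, here is a hypothetical check (not part of poio). It is a simplification: the language_codes dataset remains the authoritative list of valid values.

```r
# Hypothetical check, not part of poio: a two- or three-letter lowercase
# ISO 639 code, optionally followed by "_" and a two-letter country code
is_language_code <- function(x) {
  grepl("^[a-z]{2,3}(_[A-Z]{2})?$", x)
}

is_language_code(c("fr", "zh", "pt_BR", "fr_BE", "French"))
```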

For each language, you must generate a language-specific po object from the master translation, using generate_po_from_pot, then write it to a .po file using write_po.

for (lang in c("de", "fr_BE")) {
  # Create a language-specific po object from the master translation, then write it out
  po <- generate_po_from_pot(pot, lang)
  write_po(po)
}

That’s it! You are now ready to translate.

Next Steps

The msgtools package is currently under development; it has higher-level tools for managing and updating translations and for integrating translations into packages. The immediate next step is to integrate poio with msgtools and release the latter package to CRAN.

Beyond this, the RL10N project has a plan to tackle the second problem: finding translators.  This will involve integrating automated translation functionality from Google Translate and Microsoft Translator into msgtools, as well as providing assistance with getting human translators.

The start of satRdays

By Blog

by Gergely Daroczi, organizer

Almost 200 people from 19 countries registered for the first satRday conference, which was held last Saturday, September 3rd, in Budapest. The final count showed that nearly 170 R users spent 12 hours at the conference venue attending workshops, regular and lightning talks, social events and a data visualization challenge. If you missed the event, you can rewatch the live stream of the conference talks at any time. An abridged version of the video recordings will also be uploaded to the conference homepage, along with the slides and related materials, in the next couple of days.

There was intense interest in the conference from the beginning: registration opened at the end of June, just before the useR! conference, and 90% of the originally planned 150 tickets were gone within a month, when the early-bird period ended. To my great but pleasant surprise, it didn’t become a local Hungarian conference at all: on average, every third registration came from another country. The 50/50 ratio of academic to industry tickets was similarly stable from the beginning.

We sold around 130 tickets without sharing any details on the line-up of invited and contributed talks, although having already announced our two keynote speakers (Gabor Csardi and Jeroen Ooms) more or less guaranteed a high-quality conference. Fortunately, we received a good number of talk proposals and decided to have 25 speakers after all.

It took a while to finalize the conference program and to figure out how we would fund an inexpensive event for so many attendees (as the number of registered attendees continued to increase by one or two every day), but things were sorted out by the end of August, and we received a good amount of financial help (covering 75% of the overall conference expenses) from our sponsors. Thank you!


And the very early morning of September 3 arrived! I left home at 6am to arrive at the conference venue on time, and it was extremely exciting to see the first attendees arrive.

Registration took a bit longer than I hoped, but after a delay of around 10 minutes, all 6 workshops were ready to start. I’m extremely proud of the great line-up of workshop speakers, who provided free training to all attendees on the validate package, H2O, data.table, ggplot2 and shiny.

The conference started with the short delay noted above, but we managed to get back on track in the later sessions, as I forced myself to act as an extremely strict conference chair and pushed most of the questions to the coffee breaks. Thanks to all for your much appreciated cooperation with this!

Gabor Csardi soon proved that it was a very good idea to have him as our morning keynote speaker — he kicked off the conference with an exciting talk on fun stories from the past years of R and also introduced some of his wonderful and extremely useful projects to us. Please keep up the good work!

The R Infrastructure session started right after Gabor’s keynote talk with four presentations on networking, using R and Python, R in MSSQL and other tools, along with R for applications such as fighting fraud. Photos of these and other talks will soon be uploaded to the conference homepage; until then, you might want to check the #satRdays Twitter hashtag, where I posted a number of pics.

The first session ended after noon, so we headed for a quick lunch.

We soon started the next technical session, on different R packages: Arun on data.table, Mark on the validate package and Romain on dplyr. All of them did a fantastic job, not only in their work on the packages but also with their talks. And yet, one of the most exciting moments of the conference happened between the talks, when one of our speakers decided to ask one of the attendees a very important and personal question: Congratulations to Cecile and Romain!

Then came our first lightning talk (exactly 15 slides, each shown for 20 seconds), where Bo did a wonderful job and presented a lot of valuable information and summaries in such a short period of time. The session ended with our second keynote talk, where Jeroen shared some of his past R projects, showed some really impressive curl examples, and gave an inspiring intro to his cool new magick package for easy and advanced image manipulation on top of ImageMagick.

The afternoon sessions, both regular and lightning talks, covered a wide range of machine learning tools and use cases. In addition to the H2O machine learning tools, we learned about how R and ML are used at CERN, multivariate data analysis of time series, political parties, and Thomas Levine’s crazy tools for rendering data as music and virtual kebabs. (Variables were mapped to different spices.) It was a good mix!

Nor should I forget the talks on choosing the right tools for different use cases, such as catching all the Pokémon, visualizing geochemical models, and getting your boss and colleagues to love R; an inspiring proposal for an RUG Toolbox to enable networking among local R users; and a chance to learn how Friss builds complex, JS-heavy Shiny dashboards.

The conference ended with the Data Visualization Challenge, where 8 projects were presented in 3 minutes each and the audience voted for the best visualization while having a slice of pizza and some beers. It was great to see the very well prepared and creative dashboards and plots.

The formal event ended around 8:30 pm, more than 12 hours after the start of the morning workshops, with nearly 80 attendees walking 15 minutes to a nearby pub for some additional informal conversations. For me, this was the most rewarding moment of the event: seeing that all the hard work that Denes and I did during the past months (more on this in a follow-up post) paid off after all. People spent the whole satRday together in a fruitful environment, where new friendships and R package ideas were born.

Hope to see many similar events in the future!

The R Consortium Funds Three Projects in July

By Blog

by Joseph Rickert and Hadley Wickham

The Infrastructure Steering Committee (ISC) has approved funding for three of the thirteen proposed projects received during the most recent round of contributed proposals which closed on July 10th. The total amount awarded was just over $29,000. A brief description of each of these projects follows.

The R Documentation Task Force: The Next Generation R Documentation System

Andrew Redd received $10,000 to lead a new ISC working group, The R Documentation Task Force, which has a mission to design and build the next generation R documentation system. The task force will identify current issues with documentation, abstract the current Rd system into an R-compatible structure, and extend this structure to include new considerations that were not concerns when the Rd system was first implemented. The goal of the project is to create a system that allows documentation to exist as objects that can be manipulated inside R. This will make the process of creating R documentation much more flexible, enabling new capabilities such as porting documentation from other languages or creating inline comments. The new capabilities will add rigor to the documentation process and enable the system to operate more efficiently than any current methods allow. For more detail, have a look at the R Documentation Task Force proposal (Full Text).

The task force team hopes to complete the new documentation system in time for the International R Users Conference, useR! 2017, which begins July 4th 2017. If you are interested in participating in this task force, please contact Andrew Redd directly via email (andrew.redd@hsc.utah.edu). Outline your interest in the project, your experience with documentation, and any special skills you may have. The task force team is particularly interested in experience with documentation systems for languages other than R and C/C++.

Interactive data manipulation in mapview

The ISC awarded $9,100 to Tim Appelhans, Florian Detsch and Christoph Reudenbach, the authors of the Interactive data manipulation in mapview project (Proposal), which aims to extend the capabilities of R for visualizing geospatial data by implementing a two-way data exchange mechanism between R and JavaScript. The central idea is to enhance the user experience of interactively working with geospatial data. For example, although htmlwidgets has proven itself to be a powerful framework for enabling interactive, JavaScript-based data visualizations, data flow from R to JavaScript runs on a one-way street: there is currently no way to pass manipulated data back into the user’s R environment. This project aims first to develop a general framework that provides a bridge between htmlwidgets and R, enabling a workflow of R -> htmlwidgets -> R, and then to use this framework to implement standard interactive spatial data manipulation tools for the mapview and leaflet packages. The plan section of the project proposal provides considerable detail on the steps required to achieve the project’s goals.

If you would like to help and have strong R and JavaScript skills contact the authors directly via the email address provided in the links above.

R-Ladies Alignment and Global Expansion

The ISC awarded $10,000 to a team that includes members from both the London and San Francisco R-Ladies user groups (Gabriela de Queiroz, Chiin-Rui Tan, Alice Daish, Hannah Frick, Rachel Kirkham, Erin Ledell, Heather Turner, and Claudia Vitolo) to establish additional R-Ladies groups worldwide. The authors of the proposal (Full text) note that women are underrepresented in every role of the global R community: as leaders, package developers, conference speakers, conference participants, educators, and R users. They propose to address this issue through a series of practical actions that build on the success of the San Francisco and London R-Ladies groups in encouraging female participation. The team envisions the project unfolding in two phases. In the first phase, the team will identify the common elements contributing to the success of both existing R-Ladies groups, establish the “R-Ladies” brand and build a new centralized community infrastructure. The second phase will be devoted to managing the global expansion of the “R-Ladies” initiative through selective seeding of new groups around the world.

This is an ambitious project that will require a variety of technical skills (website design and development, Bookdown content development and graphic design, for example) as well as expertise in marketing, public relations, social media communications and event organization. The R-Ladies are looking for help. If you are interested in contributing your expertise, or maybe starting an “R-Ladies” group in your area, write to info@rladies.org.

Impact

Each of these projects has the potential to profoundly affect the R Community. R-Ladies will enlarge the community and strengthen the social fabric that binds it together. If successful, the R Documentation Task Force will improve the environment for R package development and enrich the experience of every R user. The interactive data manipulation project has the potential to increase the synergy between R and JavaScript and set the direction for the development of interactive visualizations.

First Public Version of the r-hub Builder

By Blog

The r-hub builder is the first major project of the R Consortium. It is an R package build and continuous integration service, open to all members of the R community.

Goals for r-hub include:

  • simplify the R package development process: creating a package, building binaries and continuous integration, publishing, distributing and maintaining it;
  • encourage community contributions; and
  • pre-test CRAN package submissions to ease burden on CRAN maintainers.

What’s available

  • Linux builders for uploaded R source packages. You can watch the package check process in real time. Currently Debian and Fedora builders are available. Builds are performed in Docker containers, and new builders can be added easily.
  • Automatic detection of system requirements. We built a system requirements database that allows us to automatically install system software needed for R packages. Note that the database needs constant improvements, and if it fails for your R package, please let us know. See below.
  • Flexible package dependencies. You don’t need to have all your package dependencies on CRAN in order to use r-hub. We support devtools-style Remotes fields in DESCRIPTION, so you can depend on packages hosted on GitHub, Bitbucket, etc. See more about this at https://cran.r-project.org/web/packages/devtools/vignettes/dependencies.html
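For illustration, a Remotes entry might look like the one below. The package and user names are hypothetical; in the devtools convention a bare “user/repo” defaults to GitHub, and a prefix such as github:: makes the source explicit. Since DESCRIPTION files use the Debian Control File format, base R’s read.dcf() can parse the snippet.

```r
# Hypothetical DESCRIPTION for "mypkg", which imports "somepkg"; somepkg is
# not on CRAN, so the Remotes field points devtools-style tools at GitHub
description <- "Package: mypkg
Version: 0.1.0
Imports: somepkg
Remotes: github::someuser/somepkg"

# read.dcf() parses DCF text from a file or a connection into a character matrix
fields <- read.dcf(textConnection(description))
remote <- fields[1, "Remotes"]
```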

Go to https://builder.r-hub.io to try the r-hub builder!

What’s coming?

Almost everything else that was promised in the proposal. The two major features that are coming soon are:

  • Windows builds, and
  • The r-hub CI. You’ll be able to trigger builds from your GitHub repositories.

You can help

R Consortium Outlook 2016

By Blog

2016 is already shaping up to be another banner year for the R Project. The project is as active as ever, with the new R 3.2.4 just released (and R 3.3.0 following up very soon), and the milestone of 8,000 contributed packages was passed just last month (coincidentally, on the 16th anniversary of the release of R 1.0.0). Meanwhile, the popularity of the R language continues unabated: R was ranked #6 in IEEE’s Top Programming Languages of 2015, and it’s one of the fastest-growing languages on StackOverflow.

This popularity makes the mission of the R Consortium as important as ever: to support the R community, the R Foundation, and everyone using, maintaining and distributing R software. The R Consortium was founded just last July, and 2015 was a year of “spinning up”: establishing a board, recruiting members, and setting a charter for the steering committee. With that groundwork behind us, 2016 will be a year of action. The Consortium will be distributing funds to community-nominated projects and soliciting more proposals, supporting the useR! 2016 conference at Stanford, and continuing to popularize R as the leading platform for data science research and applications.

The R Consortium recently welcomed Avant and Procogia as new Silver members. If your employer is a member of the R Consortium, we encourage you to show your support for our mission with this member badge. If not, please encourage your employer to become a member and provide their support as well.

We’ll have more news to share here in the blog in the coming weeks. In the meantime, we invite you to read the list of frequently-asked questions about the R Consortium, and follow the R Consortium on Twitter for the latest updates.

R Consortium Infrastructure Steering Committee (ISC) elects Chair

By Blog

by Joseph Rickert

This week, the Infrastructure Steering Committee (ISC) of the R Consortium unanimously elected Hadley Wickham as its chair, thereby also giving Hadley a seat on the R Consortium board of directors. Congratulations, Hadley!

This is a major step toward putting the R Consortium in business. Not only is the ISC the group that will decide which projects the R Consortium will undertake, but it will also be responsible for actually getting the work done. (Look here for the charter of the ISC.)

The whole process of funding, soliciting, selecting and executing projects will work something like this: The board of directors under the leadership of its chair, Richard Pugh of Mango Solutions, will establish a budget for projects. The ISC will solicit proposals for new projects both from R Consortium member companies and from the R Community at large.  With approval from the board, the ISC will decide which projects to fund. From there on, the ISC will assemble resources and manage the work. That’s the plan. The devil, of course, is in the details. There is much work to be done to put all of the necessary infrastructure in place, but Hadley’s election makes it possible for the ISC to begin bootstrapping the process.

So, while there is currently no formal proposal process in place, and the ISC and the R Consortium are not ready to begin the process of soliciting proposals from the public, it is not too early for the R Community to begin thinking about what work needs to be done. Now is the time to begin thinking on a grand scale; well, at least on a scale that might be a bit more ambitious than creating a single R package.

What type of project might make the cut? I don’t want to set up any constraints here, or limit possibilities. But, just to pick one application area, it seems to me that there were more than a few ideas kicked around in the HP Workshop on Distributed Computing in R held earlier this year that could be formulated into exciting and important projects. How about a unified interface for distributed computing in R?

If you have an idea for a project that you think would benefit the general R community but is more complicated than writing a simple package, please start thinking about how you would write up your ideas, elaborating on the benefits to the R Community, technical feasibility, required resources, etc. And stay tuned to R Consortium Announcements for information on when the proposal process will begin.

I’ll finish here by congratulating Hadley one more time and saying that I am very pleased to have the opportunity to work with him and the other members of the committee. I expect that, with Hadley’s technical leadership, the guidance of the board of directors, and the participation of committed R users, the R Consortium will become an effective advocate and source of support for the R Community.

You can write to the ISC at:  isc@r-consortium.org

Some facts about the R Consortium
Founded: June 19, 2015
Status: The R Consortium is organized as a Linux Foundation Collaborative Project
Member organizations: Alteryx, Google, Hewlett Packard, Oracle, Ketchum Trading, Mango Solutions, Microsoft, The R Foundation, RStudio and Tibco
Board of Directors: David Smith (Microsoft), Hadley Wickham (RStudio), John Chambers (R Foundation), J.J. Allaire (RStudio), Louis Bajuk-Yorgan (Tibco) and Chair, Richard Pugh (Mango Solutions)
ISC Members: Hadley Wickham (RStudio), Joseph Rickert (Microsoft), Luke Tierney (R Foundation) and Stephen Kaluzny (Tibco)

Best Practices for Using R Securely

By Blog

The R Consortium was formed to serve the interests of the R user community, and to that end the members of the R Consortium would like to share some best practices for using R securely and safely. These recommendations are not unique to R: you should follow similar practices for any software you download from the Internet.

If you download R (or R packages) using an unencrypted Internet connection, there is a possibility that a malicious actor could modify the code in transit (or substitute their own file), if they have access to the connection linking you and the CRAN server delivering the code. (This is possible, for example, when you download R using an unsecured Wi-Fi network.) This could potentially give an attacker the same rights you have to execute code on your system.

To eliminate the possibility of such an attack, the R Consortium recommends that all R users always download R and R packages over an encrypted HTTPS connection from a secure server. This document describes steps you can take to configure your existing or new R installations to adhere to best practices for secure R use.

1. Always download R installers from a CRAN server using HTTPS

Every time you download R, make sure you are connected to the download site using a secure HTTPS connection. Check that the URL of the web page you are using to download R begins with “https://” (not “http://”) and that your browser reports the site to be secure. (Here are some ways you can check: http://info.ssl.com/article.aspx?id=10068.)

If you are downloading R from CRAN, the following CRAN mirrors support HTTPS and we recommend using one of them:

The above list is complete as of August 12, 2015. Check the list of CRAN Mirrors for other HTTPS mirrors added since then.
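One way to see which mirrors currently offer HTTPS is to filter the mirror list from within R. This sketch uses utils::getCRANmirrors(); with local.only = TRUE it reads the copy shipped with your R installation (no network access needed), which may be out of date, so drop that argument to fetch the current list from CRAN.

```r
# Mirror list as a data frame; "Name" and "URL" are two of its columns
mirrors <- utils::getCRANmirrors(local.only = TRUE)

# Keep only mirrors whose URL uses HTTPS (as.character guards against factors)
is_https <- startsWith(as.character(mirrors$URL), "https://")
https_mirrors <- mirrors[is_https, c("Name", "URL")]
head(https_mirrors)
```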

2. Check the MD5 checksums of R before you begin the installation.

When you download R, the same webpage should also provide the “md5 checksum” for the installation. (It will be a long string of letters and digits. Here’s an example, but remember, it will be different for every version of R: 9578948a99ee6b74ff10b71b0891b94c.) After you download the file to install R, you should generate another md5 checksum for the file you downloaded, and make sure it matches the checksum provided on the download site. (Here are instructions for doing so on Windows, Linux, and Mac OS X.) If the checksums do not match, do not install R using that file.
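If you already have a trusted R installation (for example, when upgrading), the comparison can also be done from R with tools::md5sum(). The helper name verify_md5() and the file name in the usage comment are hypothetical; for a first-time install, use the operating-system tools referenced above.

```r
# Hypothetical helper: TRUE if the file's MD5 checksum matches the published one
verify_md5 <- function(path, expected) {
  actual <- unname(tools::md5sum(path))   # 32-character hex string
  identical(tolower(actual), tolower(trimws(expected)))
}

# Usage (file name is only an example):
# verify_md5("R-x.y.z-win.exe", "<checksum copied from the download page>")
```

Comparing case-insensitively and trimming whitespace avoids false mismatches when the checksum is copied from a web page.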

3. Configure R for secure file downloads

When downloading files over the Internet (including R packages), R must be configured such that a secure, HTTPS-enabled web server may be used.  To configure R appropriately, add code to your .Rprofile or Rprofile.site file. The instructions vary depending on the version of R and operating system you use. Note that this is the default configuration for R 3.2.2, so you do not need to take any action for R 3.2.2 or any later version of R.

R 3.2.0 and R 3.2.1

Windows:

options(download.file.method = "wininet")

OS X and Linux:

options(download.file.method = "libcurl")

R 3.1 and earlier

Windows:

utils::setInternet2(TRUE)

options(download.file.method = "internal")

OS X:

options(download.file.method = "curl")

Linux:

options(download.file.method = "wget")
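If you share one .Rprofile across machines, the settings listed above can be chosen automatically. This is a sketch with a hypothetical helper, secure_download_method(), that simply encodes the version and platform combinations given above; note that it does not call utils::setInternet2(TRUE), which Windows users of R 3.1 and earlier also need.

```r
# Hypothetical helper (not part of R): returns the download.file.method value
# listed above for a given R version and OS, or NULL when R's default is
# already secure (R 3.2.2 and later)
secure_download_method <- function(version = getRversion(),
                                   os = Sys.info()[["sysname"]]) {
  version <- as.numeric_version(version)
  if (version >= "3.2.2") return(NULL)
  if (version >= "3.2.0") {
    return(if (os == "Windows") "wininet" else "libcurl")
  }
  # R 3.1 and earlier: "Darwin" is OS X; anything else is treated as Linux
  switch(os, Windows = "internal", Darwin = "curl", "wget")
}

# Possible use in .Rprofile: only override the option when needed
method <- secure_download_method()
if (!is.null(method)) options(download.file.method = method)
```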

4. Always download CRAN packages from a secure mirror

The same cautions apply to R packages. Always make sure you are using a CRAN mirror that supports HTTPS, such as one from the list given in point 1 above.

To configure R to automatically use a secure mirror, add the following code to your .Rprofile or Rprofile.site file, using the mirror of your choice (beginning with “https://”) in the first line.

securemirror <- "https://cran.r-project.org/"

local({
  # Replace only the CRAN entry, keeping any other configured repositories
  r <- getOption("repos")
  r["CRAN"] <- securemirror
  options(repos = r)
})
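To double-check the result in a running session, a small hypothetical helper can inspect the repos option. It is a sketch, not an official check, and it only looks at the entry named CRAN.

```r
# Hypothetical helper: TRUE only if the CRAN entry of a repos vector uses HTTPS
uses_secure_cran <- function(repos = getOption("repos")) {
  if (is.null(repos) || !"CRAN" %in% names(repos)) return(FALSE)
  startsWith(unname(repos[["CRAN"]]), "https://")
}

uses_secure_cran()  # checks the current session's settings
```

An unset placeholder such as "@CRAN@" also counts as not secure, since R would then prompt you to pick a mirror.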

Note that you do not need to check md5 sums for packages: R automatically checks md5 checksums before it installs any package.

Summary

With these simple steps, you can eliminate one vector of attack for a malicious actor who can intercept your communications. The R Consortium recommends all R users follow this practice.