All Posts By josephrickert

Fall 2018: ISC Call for Proposals


by Joseph Rickert

The second and final ISC Call for Proposals for 2018 is now open. We are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community. We are deliberately being a little vague here, but having awarded more than $650,000 in grants so far, we can point to a substantial number of funded projects as examples.

If you are going to submit a proposal, “Think Big” but structure your proposal with intermediate milestones. The ISC is not likely to fund proposals that ask for large initial cash grants. We tend to be conservative with initial grants, preferring projects structured in such a way that significant initial milestones can be achieved with modest amounts of cash.

As with any proposed project, the more detailed and credible the project plan, and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes measurable objectives, intermediate milestones, a list of all team members who will contribute work and a detailed accounting of how the grant money will be spent.

Also, if you think you are onto something but could use some help finalizing the scope of a project, or you think implementing your idea would require reaching some level of consensus within the R Community, you might consider asking the ISC to help you establish a working group.

If you don’t think you have a fundable idea but want to get involved, you might explore contributing to existing projects, or put some thought into one of the perennial problems of finding one’s way through the R ecosystem. For example, could you build a package discovery system or recommender engine that spans CRAN, Bioconductor and GitHub, or implement and curate a calendar that automatically tracks R-related events worldwide?

Our goal in calling for proposals is to stimulate creativity and help turn good ideas into tangible benefits for the R Community. What can you do to improve the R ecosystem and how can the R Consortium help you do it?

Note that proposals to sponsor conferences, workshops or meetups should be sent directly to the R Consortium’s R User Group and Small Conference Support Program; these are not funded as ISC proposals. The deadline for applying for support under the 2018 program is approaching quickly: requests must be received by midnight, September 30, 2018. The 2019 program will launch sometime in January.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained PDF using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, October 31, 2018.

Wanted: Your input on the next generation of R-Hub


R-Hub, which was originally conceived as a tool for R package developers to build and test R packages on a variety of platforms, was the first project funded by the R Consortium. The initial version was released in June 2016. Now that the capabilities of R-Hub have progressed well beyond the proof-of-concept stage, the R Consortium is looking for ideas from the R community on how we can make it even more useful for R users.

We would like to know how you think we could improve existing functionality and what new features you would like to see. So far, we have come up with the following list of future goals for R-Hub. We welcome comments and suggestions:

  • Enable organizations to deploy repositories and build infrastructure locally for use in controlled corporate environments.
  • Provide a system to manage source code, builds, and binary packages in a repository that offers confidence and trust to R users.
  • Enable end-users to use packages with confidence by providing tools to assess code pedigree, license, quality, security, and package maintenance for individual packages.
  • Encourage and enable package developers to provide metadata for their packages to help end users discover packages.
  • Provide package authors and maintainers a broad testing matrix that works on multiple architectures, operating systems, and R runtime engines.
  • Provide package developers with feedback required to assess and ensure broad compatibility for their packages.

We would very much appreciate comments on this vision for future development along with your assessment of the current system, including your answers to such questions as:

  • What value does R-Hub provide you today?
  • What does R-Hub not do well?
  • What other aspects of package development should R-Hub add?
  • How could R-Hub best serve the corporate package development, deployment, and management process?
  • Is there anything that CRAN isn’t providing that you would like to have?

Please send your comments to the following email address: isc@r-consortium.com

Note that you may try R-Hub here.

R Consortium Call For Proposals: February 2018


by Joseph Rickert

The first ISC Call for Proposals for 2018 is now open. We are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community. However, we are not likely to fund proposals that ask for large initial cash grants. The ISC tends to be conservative with initial grants, preferring projects structured in such a way that significant initial milestones can be achieved with modest amounts of cash.

As with any proposed project, the more detailed and credible the project plan, and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes measurable objectives, intermediate milestones, a list of all team members who will contribute work and a detailed accounting of how the grant money will be spent.

But most importantly, don’t let this talk of large projects dampen your enthusiasm! We are looking for projects with impact, regardless of their size. With this call for proposals, we are hoping to stimulate creativity and help turn good ideas into tangible benefits. Look around your corner of the R Community: what needs doing, and how can the R Consortium help?

Please do not submit proposals to sponsor conferences, workshops or meetups. These requests should be sent directly to the R Consortium’s R User Group and Small Conference Support Program.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained PDF using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, Sunday April 1, 2018.

The 2018 R Consortium R User Group Support Program is Underway


In just one year, the R Consortium, through the R User Group Support program, sponsored 76 R user groups and 3 small conferences with cash grants totaling just under $30,000. This program aligns with the R Consortium’s mission of fostering the continued growth of the R community and the data science ecosystem, and it has already helped bring more people to R and to contributing to the community.

Coming off a successful 2017, we are pleased to announce the opening of the 2018 program today. While the 2018 program keeps last year’s structure, with its multiple levels of support, we have enhanced it based on feedback from last year’s funded user groups.

Complimentary Meetup.com Pro Account

After a year of supporting user groups, we’ve found that the primary cost for each group is maintaining a page on meetup.com or its own website (though the majority prefer the meetup.com platform). This leaves fewer funds available for things like meetup space, food, or even swag, and thus puts more of a burden on group leaders to attract people to the group.

This year we’ve leveraged our relationship with the Linux Foundation and will now provide each user group with a complimentary meetup.com Pro account. This not only removes one cost concern for group leaders but also better enables us to promote user groups through the many features the platform provides. For all the details of the program, the eligibility requirements for the three levels of user group grants, the schedule of grants and the details of signing up for the meetup.com Pro account, please see the R Consortium’s R User Group Support Program webpage.

Small Conference Support

We’ve also seen an increase in the number of smaller, regionally focused R conferences happening around the world. Grassroots events like these are critical for the sustainability of the R community, but they need financial support and community awareness to be successful. Several organizers reached out last year, and we funded them from excess funds in the program, with great results.

Because these events align perfectly with the mission of the R User Group Support program, we’re formally expanding it this year to provide cash grants in the $500 to $1,000 range to encourage small, R-focused conferences and meetings organized by non-profit or volunteer groups around the world. You can find out more about this new piece of the program on the R Consortium’s R User Group Support Program webpage.

Length of Program

The R Consortium will begin taking applications for both R User Group Support and Small Conference Support today. Applications will be accepted through September 30, 2018.

Apply to the 2018 RC RUGS program by filling out this form. You can email us at rugs@r-consortium.org with any questions around the program.

Recap of the uRos2017 conference


Editor’s Note: This post comes from Nicoleta Caragea, uRos2017 conference organizer. uRos2017 is a conference, held in November 2017, for collaboration around the use of R in Romania. Through the RUGS program, the R Consortium was honored to sponsor this event. If you have a smaller event you would like support for, stay tuned for the official program announcement in early 2018.


The International Conference “New Challenges for Statistical Software – The Use of R in Official Statistics” (uRos2017), the fifth in a series of events dedicated to the use of R in Romania and organized at the Romanian National Institute of Statistics (NIS), was held on 6-7 November 2017. The conference, which provides a public forum for researchers from academia and institutes of statistics, brought together over 60 participants from 20 countries (Austria, Canada, Colombia, Croatia, France, Germany, Italy, Iraq, Japan, Lithuania, Luxembourg, Morocco, the Netherlands, Norway, Poland, Romania, Spain, Switzerland and Turkey). Representatives from Eurostat and other international organizations (United Nations/UNIDO and FAO) also attended as guests.

Not only was uRos2017 an opportunity to develop new ideas and cooperation in the field of official statistics, but the event also demonstrated once again the significant role played by the National Institute of Statistics in official statistics, giving Romania a prominent spot on the map of useRs.


Throughout the five editions of the event, the international participation has increased exponentially.

Besides the presentations, the event hosted eight workshops led by prestigious professionals from official statistics and academia:

  • Mark van der Loo (Statistics Netherlands), Statistical data cleaning with R
  • Valentin Todorov (United Nations Industrial Development Organization), R in the statistical office: the UNIDO experience
  • Bernhard Meindl (Statistics Austria), Current developments in R-packages sdcMicro and sdcTable for statistical disclosure control
  • Marcello D’Orazio (Food and Agriculture Organization of The United Nations), Outlier detection in R: some remarks
  • Camelia Goga (Institut de Mathématiques de Bourgogne, Université de Bourgogne, France), Survey sampling techniques with R
  • Hervé Cardot (Institut de Mathématiques de Bourgogne, Université de Bourgogne, France), Fast robust center estimation, clustering and Principal Components Analysis with large samples in high dimension with R
  • Bogdan Oancea (National Institute of Statistics/University of Bucharest, Romania) and Ciprian Alexandru (National Institute of Statistics/Ecological University of Bucharest, Romania), From unstructured data to structured data – Web scraping for Official Statistics
  • Elena Druică (University of Bucharest, Department of Economic and Administrative Sciences, Romania), Working with the ‘pglm’ package in R. Explaining the number of nosocomial infections in Romanian hospitals


The proceedings of the conference, which took place in parallel sections and included 22 presentations and 8 thematic workshops, will be published in two issues of Romanian Statistical Review: no. 4/2017 and no. 1/2018. The first one has already been published and handed to the participants during the conference, and the second one will be released in March 2018.

A novelty of this year’s edition is that the conference joined with the “International Conference On Computing, Mathematics And Statistics 2017” (iCMS2017), held on Langkawi Island, Malaysia. Nicolaas Jan Dirk Nagelkerke, Matthias Templ and Martin Everett delivered keynote talks at uRos2017 Asia Pacific/iCMS2017.

As a satellite event of uRos2017, representatives from Japan, Austria (UN/UNIDO) and the Romanian NIS met on November 8 to exchange ideas and knowledge. The discussions covered the following subjects:

  • Modernization of Romanian Official Statistics
  • The use of R in statistical surveys
  • Data editing (outlier detection, imputation etc.)
  • Generation of statistical reports using R with Sweave/knitr
  • Online data collection for business statistics surveys

You can find more information about uRos2017 at the conference website.

The R-omanian team has agreed to organize uRos2018 together with our colleagues from CBS-Netherlands. Stay in touch at https://twitter.com/uRos2018.

ISC Proposal Submission Failure


by Joseph Rickert

The ISC has determined that an error in the ISC proposal submission process caused us to lose some, but not all, proposals. If you submitted a project proposal to the ISC and have received a confirmation email, you are fine. However, if you have not received a confirmation email, please email your proposal as a PDF attachment to proposal@r-consortium.org.

If you do not receive a confirmation within 24 hours please email hadley@rstudio.com. Once again, if you have already received a confirmation that your proposal was received you do not need to take further action.

The revised deadline for submitting proposals is now midnight PST, Sunday October 15th.

We apologize for the inconvenience.

R Consortium Call for Proposals: Summer 2017


by Joseph Rickert and Hadley Wickham

The second and final ISC Call for Proposals for 2017 is now open. In this round, with the intention of spreading the available funds as widely as possible, the ISC is encouraging the R community to submit proposals for projects that are smaller in scope than those solicited earlier this year. For this round, the total funds requested for an individual grant should be less than $10,000. Look at the Simple Features Project as an example of what can be achieved with this level of funding.

Note that the current funding cap should not discourage anyone with plans for a more ambitious project. The ISC tends to be conservative with initial grants for large projects. So, framing your initial proposal as a “proof of concept” or “initial objective” of a large project – with an estimate of the total project cost – will not necessarily slow down the work.

As always, proposals should clearly describe the problem that needs to be solved, and be likely to have an impact on a broad segment of the R Community. Keep in mind that the ISC generally does not fund projects that apply to a limited geographic region, or a very specialized domain.

Please do not submit proposals to sponsor conferences, workshops or meetups. For this purpose, the R Consortium is in the process of establishing a “Marketing Committee” that reports directly to the Board of Directors. Until the Marketing Committee establishes a more formal procedure, please send your request for conference or meeting sponsorship to me, joseph.rickert@rstudio.com, and I will see that it gets forwarded to the committee.

The R Consortium and ISC are proud to report that, so far, we have awarded nearly half a million dollars in grants. With your help, we can continue this pace in the future. We need solid, well thought-out proposals. Act now! Submit a proposal using this form. The current call for proposals will end at midnight PST on September 15, 2017.

Take the R Consortium’s Survey on R!


by Joseph Rickert and Hadley Wickham

Help us keep the conversation going: Take the R Consortium’s Survey. Let us know: What are you thinking? What do you make of the way R is developing? How do you use R? What is important to you? How could life be better? What issues should we be addressing? What does the big picture look like? We are looking for a few clues and we would like to hear from the entire R Community.

The R Consortium exists to promote R as a language, environment and community. To answer some of the questions above and to help us understand our mission better, we have put together the first of what we hope will be an annual survey of R users. This first attempt is a prototype. We don’t have any particular hypothesis or point of view. We would like to reach everyone who is interested in participating, so please take a few minutes to complete the survey yourself and help us spread the word. The survey adapts to your answers and takes about 10 minutes to complete.

The anonymized results of the survey will be made available to the community for analysis. Thank you for participating.

Take the survey now! (Also available in Chinese, Japanese, French and Spanish.)

Code Coverage Tool for R Working Group Achieves First Release


by Mark Hornick, Code Coverage Working Group Leader

The “Code Coverage Tool for R” project, proposed by Oracle and approved by the R Consortium Infrastructure Steering Committee, started just over a year ago. Project goals included providing an enhanced tool that determines code coverage upon execution of a test suite, and leveraging such a tool more broadly as part of the R ecosystem.

What is code coverage?

As defined in Wikipedia, “code coverage is a measure used to describe the degree to which the source code of a program is executed when a particular test suite runs. A program with high code coverage, measured as a percentage, has had more of its source code executed during testing which suggests it has a lower chance of containing undetected software bugs compared to a program with low code coverage.”
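As a concrete illustration of that percentage, here is a minimal Python sketch (not how covr works internally) that computes line coverage from per-line hit counts, the kind of data a coverage tool records while a test suite runs. The function name line_coverage and its input format, a mapping from each executable line number to its execution count, are assumptions made for this example.

```python
def line_coverage(hits):
    """Return (percent covered, sorted list of uncovered lines).

    `hits` maps each executable line number to how many times the test
    suite executed it (0 means never). This input format is a
    hypothetical simplification of what a coverage tool records.
    """
    if not hits:
        return 100.0, []  # no executable lines: treated as fully covered here
    uncovered = sorted(line for line, count in hits.items() if count == 0)
    covered = len(hits) - len(uncovered)
    return 100.0 * covered / len(hits), uncovered


# Lines 2 and 4 never ran, so half of the executable lines are covered.
percent, missed = line_coverage({1: 3, 2: 0, 3: 1, 4: 0})
print(f"{percent:.1f}% covered; uncovered lines: {missed}")
```

Treating a file with no executable lines as fully covered is a convention chosen for this sketch; other tools may report such files differently.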

Why code coverage?

Code coverage is an essential metric for understanding software quality. For R, developers and users alike should be able to see easily what percentage of an R package’s code has been tested and the status of those tests. Knowing that code is well tested gives users greater confidence in selecting CRAN packages. Further, automating test suite execution with code coverage analysis helps ensure that new package versions don’t unknowingly break existing tests and user code.

Approach and main features in release

After surveying the available code coverage tools in the R ecosystem, the working group decided to use the covr package, started by Jim Hester in December 2014, as a foundation and continue to build on its success. The working group has enhanced covr to support even more R language aspects and needed functionality, including:

  • Support for R6 methods
  • Support for parallel code coverage
  • Support for compiling R with the Intel compiler (ICC)
  • Enhanced documentation and vignettes
  • A tool for benchmarking covr and defining a canonical test suite for it
  • Cleanup of license conflicts with dependent packages, and a change of the covr license to GPL-3

CRAN Process

Today, code coverage is an optional part of R package development. Some package authors/maintainers provide test suites and leverage code coverage to assess code quality. As noted above, code coverage has significant benefits for the R community to help ensure correct and robust software. One of the goals of the Code Coverage project is to incorporate code coverage testing and reporting into the CRAN process. This will involve working with the R Foundation and the R community on the following points:

  • Encourage package authors and maintainers to develop, maintain, and expand test suites with their packages, and use the enhanced covr package to assess coverage
  • Enable automatic execution of provided test suites as part of the CRAN process: just as binaries of software packages are made available, test suites would be executed and code coverage computed for each package
  • Display code coverage results on each package’s CRAN web page, e.g., the overall coverage percentage and a detailed report showing coverage per line of source code.

Next Steps

The working group will assess additional enhancements for covr that will benefit the R community. In addition, we plan to explore with the R Foundation the inclusion of code coverage results in the CRAN process.

Acknowledgements

The following individuals are members of the Code Coverage Working Group:

  • Shivank Agrawal
  • Chris Campbell
  • Santosh Chaudhari
  • Karl Forner
  • Jim Hester
  • Mark Hornick
  • Chen Liang
  • Willem Ligtenberg
  • Andy Nicholls
  • Vlad Sharanhovich
  • Tobias Verbeke
  • Qin Wang
  • Hadley Wickham – ISC Sponsor

Improving DBI: A Retrospect


by Kirill Müller

The “Improving DBI” project, funded by the R Consortium and started about a year ago, includes the definition and implementation of a testable specification for DBI and making RSQLite fully compliant with the new specification. Besides working on the established DBI and RSQLite packages, I have spent a lot of time on the new DBItest package. Final updates to these packages will be pushed to CRAN at the end of May, which should give downstream maintainers some time to make accommodations. The follow-up project, “Establishing DBI”, will focus on fully DBI-compliant backends for MySQL/MariaDB and PostgreSQL, and on minor updates to the specification where appropriate.

DBItest: Specification

The new DBItest package provides a comprehensive, backend-agnostic test suite for DBI backends. When the project started, it was merely a collection of test cases. I have considerably expanded the test cases and provided a human-readable description for each, using literate programming techniques powered by roxygen2. The DBI package weaves these chunks of text into a single document, the textual DBI specification, which describes all test cases covered by the test suite. This approach ensures that further updates to the specification are reflected in both the automated tests and the text.
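The single-source idea can be sketched in a few lines of Python. This is only an analogy: DBItest and DBI do this in R with roxygen2, whereas this sketch uses a hypothetical spec_case decorator to register test cases, reuses each docstring as specification text, and uses Python’s built-in sqlite3 module as a stand-in backend.

```python
import sqlite3

SPEC_CASES = []  # registry shared by the test runner and the spec weaver


def spec_case(func):
    """Register a test case; its docstring doubles as specification text."""
    SPEC_CASES.append(func)
    return func


@spec_case
def connect_returns_usable_connection(connect):
    """Connecting returns an object that can run a trivial query."""
    con = connect()
    try:
        assert con.execute("SELECT 1").fetchone() == (1,)
    finally:
        con.close()


@spec_case
def empty_table_returns_no_rows(connect):
    """Querying an empty table returns an empty result rather than an error."""
    con = connect()
    try:
        con.execute("CREATE TABLE t (x INTEGER)")
        assert con.execute("SELECT x FROM t").fetchall() == []
    finally:
        con.close()


def weave_spec():
    """Build the human-readable specification from the registered cases."""
    return "\n".join(f"- {case.__name__}: {case.__doc__}" for case in SPEC_CASES)


def run_all(connect):
    """Run every registered case against one backend's connect function."""
    for case in SPEC_CASES:
        case(connect)


run_all(lambda: sqlite3.connect(":memory:"))  # raises AssertionError on failure
print(weave_spec())
```

Because the description and the check live in the same function, changing one without the other is immediately visible, which is the property the DBI specification relies on.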

DBItest is aimed at backend implementers, who can now check programmatically, with very little effort, whether their DBI backend conforms to the DBI specification. The check can be integrated into the automated tests that run as part of R’s package check mechanism, R CMD check. The odbc package, a new DBI-compliant interface to ODBC, has used DBItest from day one to enable test-driven development. The bigrquery package is another user of DBItest.

Because not all DBMS support all aspects of DBI, the DBItest package allows developers to restrict which parts of the specification are tested, and “tweak” certain aspects of the tests, e.g., the format of placeholders in parameterized queries. Adapting to other DBMS may require more work due to subtle differences in the implementation of SQL between various DBMS.
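The pattern of one backend-agnostic suite plus per-backend skips and tweaks might look like the following Python sketch. All names here (bind, run_suite, the placeholder_style tweak) are hypothetical and not part of DBItest; sqlite3 is used because it accepts both ? and :name placeholders, so the same checks can run under either setting.

```python
import sqlite3


def bind(style, values):
    """Return (placeholders, params) for a placeholder style."""
    if style == "qmark":  # e.g. SELECT ? + ?
        return ["?"] * len(values), tuple(values)
    if style == "named":  # e.g. SELECT :p1 + :p2
        names = [f"p{i}" for i in range(1, len(values) + 1)]
        return [f":{n}" for n in names], dict(zip(names, values))
    raise ValueError(f"unknown placeholder style: {style!r}")


def check_parameterized_select(con, tweaks):
    ph, params = bind(tweaks["placeholder_style"], [2, 3])
    row = con.execute(f"SELECT {ph[0]} + {ph[1]}", params).fetchone()
    assert row == (5,)


def check_roundtrip_text(con, tweaks):
    ph, params = bind(tweaks["placeholder_style"], ["héllo"])
    con.execute("CREATE TABLE t (s TEXT)")
    con.execute(f"INSERT INTO t VALUES ({ph[0]})", params)
    assert con.execute("SELECT s FROM t").fetchone() == ("héllo",)


# Backend-agnostic suite: every backend runs the same check functions.
CHECKS = {
    "parameterized_select": check_parameterized_select,
    "roundtrip_text": check_roundtrip_text,
}


def run_suite(connect, skip=(), **tweaks):
    """Run CHECKS against a backend, honoring a skip list and tweaks."""
    settings = {"placeholder_style": "qmark", **tweaks}  # defaults, then overrides
    results = {}
    for name, check in CHECKS.items():
        if name in skip:
            results[name] = "skipped"
            continue
        con = connect()
        try:
            check(con, settings)
        except AssertionError:
            results[name] = "failed"
        else:
            results[name] = "passed"
        finally:
            con.close()
    return results


connect = lambda: sqlite3.connect(":memory:")
print(run_suite(connect, placeholder_style="named"))
print(run_suite(connect, skip={"roundtrip_text"}))
```

A backend only overrides the tweaks that differ from the defaults, so most backends can run the suite unchanged.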

DBI: Definition

This package, which has been around since 2001, defines the actual DataBase Interface in R.

I have taken over maintenance and released versions 0.4-1, 0.5-1, and 0.6-1, with the release of version 0.7 pending. The most prominent change in this package is, of course, the textual DBI specification, which is included as an HTML vignette in the package. The documentation for the various methods defined by DBI is obtained directly from the specification. These help topics are combined in a sensible order into a single, self-contained document. This format is useful for both DBI users and implementers: users can look up the behavior of a method directly from its help page, and implementers can browse a comprehensive document that describes all aspects of the interface. I have also revised the description and the examples for all help topics. Other changes include:

  • the definition of new generics dbSendStatement() and dbExecute(), for backends that distinguish between queries that return a table and statements that manipulate data,
  • the new dbWithTransaction() generic and the dbBreak() helper function (thanks to Barbara Borges Ribero),
  • improved or new default implementations for methods like dbGetQuery(), dbReadTable(), dbQuoteString() and dbQuoteIdentifier(),
  • internal changes that allow methods without a meaningful return value to return silently,
  • translation of a helper function from C++ to R, to remove the dependency on Rcpp (thanks to Hannes Mühleisen).

Fortunately, none of the changes seem to have introduced any major regressions in downstream packages. The NEWS file contains a comprehensive list of changes.
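The distinction between queries and statements, and the commit, rollback and early-exit behavior that dbWithTransaction() and dbBreak() provide, can be mimicked with Python’s standard sqlite3 module. This is an analogy rather than DBI itself: execute, get_query, with_transaction and break_transaction are hypothetical helper names, and the sketch assumes the connection is opened with isolation_level=None so that the helpers, not sqlite3, control transactions.

```python
import sqlite3


class _Break(Exception):
    """Signal used by break_transaction() to request a silent rollback."""


def break_transaction():
    """Leave the enclosing with_transaction() early, rolling back (cf. dbBreak())."""
    raise _Break()


def execute(con, sql, params=()):
    """Run a data-manipulating statement; return the number of affected rows."""
    return con.execute(sql, params).rowcount


def get_query(con, sql, params=()):
    """Run a query that returns a table; return all of its rows."""
    return con.execute(sql, params).fetchall()


def with_transaction(con, code):
    """Run code(con) in a transaction: commit on success, roll back on error.

    A call to break_transaction() inside code rolls back and returns None
    instead of raising. Assumes con was opened with isolation_level=None,
    so sqlite3 does not open transactions implicitly.
    """
    con.execute("BEGIN")
    try:
        result = code(con)
    except _Break:
        con.execute("ROLLBACK")
        return None
    except BaseException:
        if con.in_transaction:
            con.execute("ROLLBACK")
        raise  # errors propagate after the rollback
    con.execute("COMMIT")
    return result


con = sqlite3.connect(":memory:", isolation_level=None)
execute(con, "CREATE TABLE t (x INTEGER)")
print(execute(con, "INSERT INTO t VALUES (1), (2)"))  # 2 rows affected


def insert_then_break(c):
    execute(c, "INSERT INTO t VALUES (3)")
    break_transaction()  # discards the insert above


with_transaction(con, insert_then_break)
with_transaction(con, lambda c: execute(c, "INSERT INTO t VALUES (4)"))
print(get_query(con, "SELECT x FROM t ORDER BY x"))  # [(1,), (2,), (4,)]
```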

RSQLite: Implementation

RSQLite 1.1-2 is a complete rewrite of the original C implementation. Before focusing on compliance with the new DBI specification, it was important to ensure compatibility with the more than 100 packages on CRAN and Bioconductor that use RSQLite. These packages revealed many usage patterns that were difficult to foresee. Most of these usage patterns are supported in version 1.1-2; the more esoteric ones (such as supplying an integer where a logical is required) trigger a warning.

Several rounds of “revdep checking” were necessary before most packages showed no difference in their check output compared to the original implementation. The downstream maintainers and the Bioconductor team were very supportive and helped spot functional and performance regressions during the release process. Two point releases were necessary to finally achieve a stable state.

Supporting 64-bit integers was also trickier than anticipated. R has no built-in way to represent 64-bit integers. The bit64 package works around this limitation by using a numeric vector as storage (which also happens to use 8 bytes per element) and by providing coercion functions. But when an integer column is fetched, it cannot be known in advance whether a 64-bit value will occur in the result, and smaller integers must use R’s built-in integer type. For this purpose, an efficient data structure for collecting vectors, which can change its data type on the fly, has been implemented in C++. This data structure will be useful for many other DBI backends that need to support a 64-bit integer data type, and it will be ported to the RKazam package in the follow-up project.
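The C++ data structure itself is part of RSQLite; the Python sketch below only illustrates the idea. A column buffer starts with 32-bit storage and converts once to 64-bit storage the first time a value falls outside the range of R’s integer type (where -2^31 is reserved for NA). The class name and the use of Python’s array module ('i', which is 32 bits on common platforms, and 'q' for 64 bits) are assumptions for illustration; RSQLite’s own 64-bit path goes through bit64, as described above.

```python
from array import array

# R's integer type is 32-bit and reserves -2**31 for NA_integer_,
# so only this range fits in a plain R integer vector.
R_INT_MIN = -(2**31) + 1
R_INT_MAX = 2**31 - 1


class IntegerColumnCollector:
    """Collect an integer column of unknown range, widening on the fly.

    Values are kept in 32-bit storage until one does not fit; then all
    values collected so far are copied once into 64-bit storage and
    collection continues there. (Hypothetical sketch, not RSQLite code.)
    """

    def __init__(self):
        self._data = array("i")

    @property
    def is_64bit(self):
        return self._data.typecode == "q"

    def append(self, value):
        if not self.is_64bit and not (R_INT_MIN <= value <= R_INT_MAX):
            # One-time widening; later values go straight into 64-bit storage.
            self._data = array("q", self._data.tolist())
        self._data.append(value)

    def values(self):
        return self._data.tolist()


col = IntegerColumnCollector()
for v in [1, 2, 3]:
    col.append(v)
print(col.is_64bit)  # False: everything fits in 32 bits
col.append(2**40)
print(col.is_64bit, col.values())  # True [1, 2, 3, 1099511627776]
```

Widening only once keeps the common case (all values fit in 32 bits) cheap, while a single large value still produces a correct 64-bit column.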

Once the DBI specification was completed, the process of making RSQLite compliant was easy: enable one of the disabled tests, fix the code, make sure all tests pass, rinse, and repeat. If you haven’t tried it, I seriously recommend test-driven development, especially when the tests are already implemented.

The upcoming release of RSQLite 2.0 will also require callers to adhere more strictly to the DBI specification. Where possible, I tried to maintain backward compatibility, but in some cases breaks were inevitable, because otherwise I would have had to introduce far too many exceptions and corner cases into the DBI spec. For instance, row names are no longer included by default when writing or reading tables. The original behavior can be re-enabled by calling pkgconfig::set_config(), so that packages or scripts that rely on row names continue to work as before. (The setting is active for the duration of the session, but only for the caller that called pkgconfig::set_config().) I’m happy to include compatibility switches for other breaking changes if necessary and desired, to achieve both adherence to the specs and compatibility with existing behavior.

A comprehensive list of changes can be found in the news.

Other bits and pieces

The RKazam package is a ready-to-use boilerplate for a DBI backend, named after the hypothetical DBMS used as example in a DBI vignette. It already “passes” all tests of the DBItest package, mostly by calling a function that skips the current test. Starting a DBI backend from scratch requires only copying and renaming the package’s code.

R has limited support for time-of-day data, and the hms package aims to fill this gap. It will be especially useful in the follow-up project: SQLite doesn’t have an intrinsic type for time-of-day data, but many other DBMS, including MySQL/MariaDB and PostgreSQL, do.
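Because SQLite has no time-of-day type, one common workaround is to store times as 'HH:MM:SS' text and convert them to seconds since midnight when reading. The Python sketch below shows that round trip with the sqlite3 module; the helper names and the seconds-since-midnight representation are assumptions for illustration, not the hms or RSQLite implementation.

```python
import sqlite3


def parse_time_of_day(text):
    """Convert 'HH:MM:SS' text to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    if not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid time of day: {text!r}")
    return hours * 3600 + minutes * 60 + seconds


def format_time_of_day(total_seconds):
    """Convert seconds since midnight back to 'HH:MM:SS' text."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (name TEXT, starts TEXT)")  # time stored as text
con.execute("INSERT INTO events VALUES (?, ?)", ("meetup", "18:30:00"))
(stored,) = con.execute("SELECT starts FROM events").fetchone()
print(parse_time_of_day(stored))  # 66600
```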

Next steps

The joint CRAN release of the three packages DBI, DBItest and RSQLite will take place in parallel with the startup phase of the “Establishing DBI” follow-up project. This project consists of:

  • Fully DBI-compatible backends for MySQL/MariaDB and Postgres
  • A backend-agnostic C++ data structure to collect column data in the RKazam package
  • Support for spatial data

In addition, it will include an update to the DBI specification, mostly concerning support for schemas and for querying the structure of the table returned by a query. Targeting three DBMS instead of one will help properly specify these two particularly tricky parts of DBI. I welcome further feedback from users and backend implementers on how to improve the DBI specification.

Acknowledgments

Many thanks to the R Consortium, which has sponsored this project, and to the many contributors who have spotted problems, suggested improvements, submitted pull requests, or otherwise helped make this project a great success. In particular, I’d like to thank Hadley Wickham, who suggested the idea, supported initial development of the DBItest package, and provided helpful feedback; and Christoph Hösler, Hannes Mühleisen, Imanuel Costigan, Jim Hester, Marcel Boldt, and @thrasibule for using it and contributing to it. I enjoyed working on this project, and I’m looking forward to “Establishing DBI”!