deposits R Package Delivers a Common Workflow for R Users

November 30, 2023 | December 5th, 2023 | Blog

Mark Padgham, a Software Research Scientist for rOpenSci, has decades of experience in R, C, and C++, and maintains many packages on CRAN. Mark is leading the development of the deposits R package. Mark has been supported throughout this project by rOpenSci staff.

Publicly depositing datasets associated with published research is becoming more common, partly because journals increasingly require data sharing and partly because of broader, ongoing cultural changes around data sharing. Yet data sharing is often seen as time-consuming, particularly when it comes to meeting the expectations of individual data repositories. While documentation and training can help familiarize users with data-sharing processes, browser-based data and metadata submission workflows can only be so fast, are not easily reproduced, and do not facilitate regular or automated data and metadata updates. Better programmatic tools can transform data sharing from a mountainous climb into a pit of success.

deposits is a universal client for depositing and accessing research data in online deposition services. It provides a unified interface to many different research data repositories and, like dplyr, works through “verbs” that behave identically across backend data repositories.

Currently supported services are Zenodo and Figshare. These two systems have fundamentally different APIs, and access to each has traditionally been enabled through its own software client. The deposits package aims to be a universal client offering access to a variety of deposition services, without users having to know any specific details of the APIs for each service.
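To make the idea of identical verbs over different backends concrete, here is a minimal base R sketch. It is an illustration only: the two backends are hypothetical in-memory stand-ins, and the names `new_client`, `deposit_new`, and `deposit_title` are invented for this example rather than taken from the deposits package or from the Zenodo or Figshare APIs.

```r
# Toy illustration only. Nothing here talks to a real service, and none of
# these names are claimed to match the deposits package.

# Hypothetical backend 1: integer ids, title nested under "metadata".
zenodo_like <- function() {
    store <- list()
    list(
        create = function(title) {
            id <- length(store) + 1L
            store[[id]] <<- list(metadata = list(title = title))
            id
        },
        title_of = function(id) store[[id]]$metadata$title
    )
}

# Hypothetical backend 2: string ids, title stored at the top level.
figshare_like <- function() {
    store <- list()
    list(
        create = function(title) {
            id <- paste0("art-", length(store) + 1L)
            store[[id]] <<- list(title = title)
            id
        },
        title_of = function(id) store[[id]]$title
    )
}

# The client exposes the same verbs whichever backend sits underneath.
new_client <- function(service) {
    backends <- list(zenodo = zenodo_like, figshare = figshare_like)
    if (!service %in% names(backends)) {
        stop("Unsupported service: ", service)
    }
    backend <- backends[[service]]()
    list(
        deposit_new = function(title) backend$create(title),
        deposit_title = function(id) backend$title_of(id)
    )
}

# The calling code is identical for both services.
for (service in c("zenodo", "figshare")) {
    cli <- new_client(service)
    id <- cli$deposit_new("My dataset")
    cat(service, ":", cli$deposit_title(id), "\n")
}
```

The point of the sketch is that the loop body never branches on the service name: the differences in id format and record layout stay inside each backend.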

The deposits package works seamlessly with the “frictionless” data workflow, to enable unified documentation of all aspects of datasets in one place. 

Outside of his work at rOpenSci, Mark has a passion for urban environments and understanding how cities can be improved. He is the lead developer of the Urban Analyst platform, ‘a platform for open-source, open-access interactive visualizations of urban structure and function.’ Mark says, “Cities cannot learn; therefore, I built a data platform for cities to learn from one another.”

RC: Doesn’t data sharing take too much time and too much effort? What is Deposits and what does it do? What problem are you solving?

Data sharing takes time and effort; everyone sharing from different places makes it hard to sync up. However, the deposits R package creates a common workflow for R users. It aims to streamline the data-sharing process. 

It addresses the issue of disparate data-sharing locations by creating a standardized workflow, simplifying the process for researchers. All deposits are initiated on the nominated services as “private” deposits, meaning they can only be viewed by the deposit owner until the owner decides to publish them. A deposit can only be publicly viewed once it has been published. The process of using deposits to prepare one or more datasets for publication involves multiple stages of editing and updating.
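The private-then-published lifecycle described above can be sketched in a few lines of base R. This is a toy model under stated assumptions, not the package's implementation: the functions `new_deposit`, `update_metadata`, `publish`, and `visible_to` are hypothetical names made up for this example, and a deposit is just an in-memory list.

```r
# Toy model of the lifecycle: created private, edited in stages, then published.
# All function names are hypothetical and exist only for this illustration.

new_deposit <- function(title) {
    # Every deposit starts out private.
    list(metadata = list(title = title), published = FALSE)
}

update_metadata <- function(deposit, ...) {
    # Merge new or changed fields into the existing metadata.
    deposit$metadata <- utils::modifyList(deposit$metadata, list(...))
    deposit
}

publish <- function(deposit) {
    deposit$published <- TRUE
    deposit
}

visible_to <- function(deposit, viewer, owner) {
    # Private deposits are visible to the owner only.
    deposit$published || identical(viewer, owner)
}

dep <- new_deposit("Bird counts 2023")
dep <- update_metadata(dep, description = "Weekly counts")
dep <- update_metadata(dep, title = "Bird counts, 2023")

visible_to(dep, viewer = "anyone", owner = "me") # FALSE while private
dep <- publish(dep)
visible_to(dep, viewer = "anyone", owner = "me") # TRUE once published
```

Several rounds of `update_metadata` before a single `publish` mirror the "multiple stages of editing and updating" that come before a dataset is made public.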

RC: How far along are you on the project? Currently supported services are Zenodo and Figshare. Will you be adding more?

Currently, the project provides support for Zenodo (CERN’s Data Centre) and Figshare (Open Data) as the initial services. There are plans to expand and include more repositories in the future. 

The team is working on integrating additional services, and there is a possibility of securing further funding to support the Harvard Dataverse system, which operates as a federated rather than a centralized system. Because Dataverse is federated, integrating it is more complex: the API handling has to be more flexible, and that may make it harder to fit into the common workflow.

RC: Have users contributed their own plugins to extend functionality to other repositories?

deposits is implementing a modular/plugin system to enable users to contribute their own plugins to extend functionality to other repositories. Users will be able to authenticate, prepare data and metadata, and finally submit, fetch, and browse data.

Actual activity around plugins has been a little slow so far. We are writing a JSON Schema for the system which will improve the process. We will be seeking people to build plugins after more adaptations are done and documented. Actually, I would not recommend regular users try to extend deposits to other repositories yet. But that is coming soon!

The main barrier to sharing is preparing the data and its metadata. deposits is made to run that workflow, including documenting all the columns in a data table. With deposits, there will be a single, completely painless update function.

RC: What was your experience working with the R Consortium? Would you recommend applying for a grant to others?

Yes! The application process was painless and straightforward. In fact, I got a second grant recently. I’m very thankful for the support.

The only off-putting part of the process was the lack of guidance on how much to ask for. You are fully enabled to submit an application on your own. This is good, and I appreciate the outcome of getting financial support. But giving applicants better overall guidance would be very helpful. The R Consortium should make the application process more inclusive with more consultation.