Webinar: Discover the Future of R in Regulatory Submissions

By Blog, Events

Are you an analyst, statistician, or data science enthusiast keen on understanding how open source software like R shapes the future of regulatory submissions? This webinar is for you!

“R and Shiny in Regulatory Submission”

  • When: Dec 11, 2023
  • Time: 3:00 p.m. – 4:30 p.m. EST
  • Duration: 1.5 hours
  • Where: Zoom – Virtual Event
  • Hosts: FDA Statistical Association and R Consortium

Agenda Highlights:

  • Opening Remarks by Ning Leng, People and Product Lead in Product Development Data Sciences at Roche.
  • Presentation on Open Source Software for Regulatory Submissions by Paul Schuette, FDA.
  • Dive into R Consortium R Submission Pilot 2 with Eric Nantz from Eli Lilly. Discover how an R-based submission with a Shiny component unfolded.
  • Reviewing Experience of R-based Submissions with Hye Soo Cho, FDA. Understand the nuances and challenges faced during the review process.
  • Interactive Panel Discussion moderated by Ning Leng. Join Paul Schuette, Hye Soo Cho, and Eric Nantz as they delve deep into the R adoption journey and discuss practical challenges and solutions.

Why Should You Attend?

The data science world is rapidly evolving, and R is at the forefront of this transformation. With a robust open source community backing it, R offers many cutting-edge statistical tools, a standout feature. R Shiny offers unparalleled flexibility and interactivity, revolutionizing how data scientists operate. Recently, the R Consortium broke new ground by introducing a Shiny component in a submission package, a pivotal moment marking the fusion of open source capabilities with formal regulatory processes.

In the upcoming webinar, the FDA and industry speakers will share their unique experiences with R-based and Shiny-based submissions. Whether you’re an industry professional or an aspiring data scientist, this is an opportunity to stay ahead of the curve.


Integrating open source software in regulatory processes represents a leap toward more transparent, efficient, and adaptable systems. Don’t miss this golden opportunity to learn, interact, and contribute to this transformative journey.

Register for the Webinar Now!

About the R Consortium R Adoption Series

The R Consortium R Adoption Series is a curated set of webinars focusing on the growing adoption of the R programming language in the data science community. Each webinar provides insights through a compelling case study and offers an interactive platform for attendees to pose questions and learn. This series is a collaborative effort by the R Consortium, PHUSE, and PSI. Dive deep into the series here: R-Adoption Webinar Series.

PHUSE Connect EU 2023 – Clinical Data Science Conference – Coming in Early November

By Blog, Events

Blog post for R Consortium partner PHUSE

Join PHUSE in Birmingham, UK, from November 5-8 for an event that promises to energise your learning through connection with the PHUSE Community and hearing from our experts on the topics that are most important to industry today.

The R Consortium is participating directly in one session on Tuesday, November 7, starting at 11am local time: “Let’s Discuss Open Source Openly: A New Path in Pharma,” a panel discussion that includes Mehar Pratap Singh, Chairman of the Board of Directors at the R Consortium, Director Sumesh Kalappurakal, and other open source representatives. Dive deep into the world of open source in pharma, understanding its implications and the potential it holds for the future. This is an excellent opportunity to engage in a meaningful discussion and gain insights from leading voices in the industry.

It’s not too late to register your place or secure a sponsorship opportunity! View full information via the links below and explore the agenda to see what’s in store for attendees.


Event Information: 


R-Ladies Morelia, Mexico, hosts First Anniversary Event on July 31, 2023

By Blog, Events

R-Ladies Morelia is celebrating its first anniversary on the 31st of July 2023 and hosting a hybrid event to mark the occasion. In this event, they plan on providing the Center for Mathematical Sciences at UNAM with an analysis of its recruitment, graduation, and research data.

Nelly Sélem, co-founder and organizer of the group, discussed the group’s rapid growth over the course of a year. She also shared how she uses R in her work as a bioinformatics researcher.

Please share about your background and involvement with the RUGS group.

I am a professor at the Center for Mathematical Sciences at UNAM in Morelia, Mexico. I earned a degree in Mathematics from the University of Guanajuato and a master’s degree from CIMAT. Then, I did a Ph.D. and a Post-doctorate in Integrative Biology at the Evolution of Metabolic Diversity lab at Langebio-Cinvestav. I care about teaching. I have taught at prestigious Mexican universities: UNAM, ITESM, IPN, and CINVESTAV. I contributed to the educational community by developing a metagenomics open-source lesson in “The Carpentries Incubator.” I’m a founding member of BetterLab, a biotechnology and software startup, and I’m also a member of the Mexican SARS-CoV-2 Genomic Surveillance Consortium.

As a scientist, I have proposed and developed bioinformatics solutions to biological problems of comparative genomics of microorganisms. I am interested in the genome evolution of Archaea, Bacteria, and Fungi. 

I founded the R-Ladies Morelia chapter with Haydee and Claudia last year. We try to organize meetings roughly every month. This year, on our first anniversary, we plan to hold a big annual meeting where we will get to meet more people.

Can you share what the R community is like in Mexico?

I can only talk about the R-Ladies chapters in Mexico, as I am more familiar with them. We have several chapters in Mexico, and each year there is an annual meeting for all cities.

The Mexico City and Cuernavaca chapters are rather big. I would say, overall, there is a lot of interest on social media and members of R-Ladies chapters are inviting other girls to learn to code. 

Our chapter is also growing rapidly: we started with four members, and now we have a stable community. On the best days, we have up to 90 people attending our events, but on average we have between 12 and 20 attendees.

Most of the R-Ladies chapters in Mexico are being run by people from academia and sponsored by universities. I do know that some of us work in the area of bioinformatics and Bioconductor. 

We are also close to the international R community because we are following the R Champions program. I think it’s for Latin America and we are trying to get connected with that program.

You have a Meetup on “Graphics for the Center of Mathematical Science,” can you share more on the topic covered? Why this topic? 

For this meetup, which is also our first anniversary, we plan on giving the Center for Mathematical Sciences at the National University of Mexico an evaluation. It will compare the number of students graduating each year and the quality of its researchers against mathematics programs at other universities in Mexico and Latin America. The center has sponsored us for the past year, and it is going through the process of becoming a bigger institute.

With this event, we are trying to give back to the center with a data analysis of its basic statistics. The audience will learn to use data frames and ggplot to visualize data. We will be working in teams to teach basic ggplot visualizations. On the second day, we will be giving small workshops and sharing our work with each other. All our events are hybrid, so this one will be too, and people will attend both in person and virtually.

We hope to grow our community through this event and also contribute to the annual report of the Center for Mathematical Sciences.

Any techniques you recommend using for planning for or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?   

Meetup has been very helpful for keeping everything organized, and we use Zoom for our virtual meetings. We also share code through our GitHub repo and people can go back to it after meetings. For communication between organizers, we mostly use WhatsApp chat. 

At the start of the semester, we plan events for that semester with dates, speakers, and topics to be covered. We work in teams, so we can help each other. Sometimes we go through chapters of a book, or we just go for an R package. We consider ourselves a community of practice. Even if people don’t know a lot, we do some data analysis and share the code on the meeting day.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

I would like to mention MetaEvoMining, a project one of my undergrad students is working on for his thesis. We are analyzing metagenomic data to look for gene families that are undergoing expansions. These expansions may lead to the recruitment of genes into antibiotic-producing gene families, so we are looking for distinctive signals in gene families that may be recruited into new antibiotic gene families. This has been researched in genomes but not in metagenomes, and there is a lot more data available in metagenomes. We want to develop an R package for this purpose. For the project, we are using Posit (RStudio), packages like ggplot and RString, and the tidyverse in general.

A Memorable Experience: Attending the NYC-R Conference and Key Takeaways

By Blog, Events

This is a guest post by Joseph Korszun, Senior Manager of Data Solutions at ProCogia. ProCogia is a member of the R Consortium. Joe is a data scientist with a background in mathematics and engineering. He is passionate about using statistical analysis to improve business decisions by developing scalable and flexible solutions that solve complex problems.


New York City, known for its vibrant energy and thriving tech scene, became the epicenter of data and analytics during the recent NYC-R Conference. As an avid data enthusiast, I couldn’t resist the opportunity to immerse myself in this bustling conference and gain valuable insights into the world of R and Python programming.

I had the pleasure of representing ProCogia and the R Consortium. The experts who stood before us showcased their deep knowledge and dedication to advancing data science. It was remarkable to see people so passionate about the R programming language and its applications in the field. Throughout the event, they engaged the audience with informative presentations and interactive workshops, sparking insightful discussions among attendees. The positive reception from the crowd highlighted the significance of collaboration and knowledge-sharing in the data science community. I was inspired by their expertise and left the conference with a renewed enthusiasm for data science and the possibilities it offers. The event provided a fantastic opportunity to connect with like-minded professionals and learn from the best in the industry. I am grateful for the experience and eagerly look forward to seeing more contributions in the future. In this blog post, I will share my experiences attending the NYC-R Conference and highlight some key takeaways that left a lasting impression.

Pre-Conference: Workshops at NYC-R

The NYC-R Conference’s workshops offered an exploration of diverse data science topics using R. Attendees delved into essential areas of data science, including time series forecasting, machine learning, Bayesian data analysis, and causal inference. Led by industry experts, these workshops offered a unique opportunity to expand data science expertise and harness the potential of R in driving data-driven innovation. Some of the workshops held during the first two days of the NYC-R Conference:

  • Tidy Time Series and Forecasting in R by Mitchell O’Hara-Wild
  • Machine Learning in R by Max Kuhn
  • Bayesian Data Analysis and Stan by Jonah Gabry
  • Causal Inference in R by Malcolm Barrett and Lucy D’Agostino McGowan

Day 1: An Exciting Kickoff

The conference commenced with an invigorating address, highlighting the growing significance of R in the industry and the importance of fostering its continued development. The vibrant atmosphere was infectious as I was surrounded by like-minded individuals who shared the same passion for data science.

The day was packed with informative sessions covering a variety of topics, from advanced data visualization techniques to machine learning algorithms.

Day 1 of the NYC-R Conference featured diverse and insightful presentations, showcasing the remarkable potential of the R programming language in data science. Attendees explored various topics, including transitioning to Quarto for interactive data reports, building R packages with LLMs, and making impactful design decisions for statistical software visualizations. The presentations also delved into data-driven marketing channel attribution, the power of OpenAI’s Embeddings API, and the art of creating captivating presentations through Slidecraft. Experts from NFL Next Gen Stats revealed the many models powering sports analytics, underscoring the transformative role of data science in the sports industry. The conference left attendees inspired and equipped with valuable skills to drive data-driven innovation in their fields.

The Importance of Continuous Learning

Day 1 of the NYC-R Conference was a remarkable showcase of the importance of continued learning and the incredible potential of the R programming language. As data enthusiasts gathered, the conference provided a platform for exploring various facets of R and its impact on data-driven decision-making.

The NYC-R Conference became a hub of knowledge sharing and collaboration, where data professionals engaged in vibrant discussions and exchanged ideas. This collaborative environment emphasized the significance of staying updated with the latest trends in data science to remain at the forefront of innovation.

Day 2: Unlocking Data Insights through Advanced Analytics

Day 2 of the NYC-R Conference was a captivating journey into the forefront of data science. Attendees were treated to lectures and presentations that showcased the latest advancements in the field. The exploration of Bayesian Boosting revealed its potential for predictive modeling, offering a fresh perspective on data analysis techniques.

In an enlightening presentation, a renowned data science expert delved into the importance of democratizing data access in the session “An Ode to Permissionless Data Science.” This inspiring talk encouraged attendees to foster a more inclusive and collaborative data science community, empowering data professionals to drive innovation together.

Participants were enthralled by demonstrations of LLM use, equipping them with practical skills to build robust R packages. The “How to Make Decisions with Data” session empowered attendees to derive meaningful insights, ensuring data-driven strategies and informed decision-making.

The day continued with captivating lectures that covered various data science aspects, concluding with a live episode of the SuperDataScience Podcast. The podcast provided invaluable industry insights and sparked engaging discussions, leaving attendees inspired and eager to apply their newfound knowledge in their data-driven endeavors. Day 2 at the NYC-R Conference left participants with a deeper understanding of data science’s evolving landscape, motivating them to make a lasting impact in the dynamic world of data-driven innovation.

Language Wars Still at Large

Wes McKinney, the brilliant mind behind pandas, addressed the ever-lingering “Language Wars” in the data science realm. With a focus on breaking down barriers and fostering interoperability, McKinney unveiled how Apache Arrow and the Python Polars library are revolutionizing the data stack. Attendees were enthralled by McKinney’s insights on harnessing the power of these cutting-edge tools to streamline data operations, improve performance, and enable seamless data exchange across programming languages. As the discussion unfolded, it became evident that the quest for data-driven excellence continues, and the open-source community remains at the forefront of bridging the gap between programming languages for the betterment of data science.

The Power of Community

The conference highlighted the power of community in the world of data science. Interacting with professionals from diverse backgrounds provided fresh perspectives and insights, fostering an environment of collaborative learning and growth. As a sponsor member of the R Consortium, ProCogia extends its heartfelt gratitude to the consortium for its invaluable support in making this event possible. Its commitment to advancing the R programming language and data science community has been instrumental in creating a vibrant platform for knowledge sharing and networking. The connections made during the NYC-R Conference are a testament to the strength of this community, forming the foundation for future collaborations and knowledge sharing that will undoubtedly drive data-driven innovations for years to come. ProCogia is proud to be part of this thriving community and looks forward to continuing its involvement in fostering growth and innovation within the R community.


Attending the NYC-R Conference was an exhilarating and enlightening experience. The conference reiterated the widespread adoption of R as a powerful tool in data science. Numerous presenters showcased their impressive projects and highlighted the versatility of R in data analysis, modeling, and visualization. It became evident that R is not just a programming language but an entire ecosystem that supports data-driven decision-making across various domains.

The conference showcased the immense potential of R in data science, emphasized the importance of continuous learning, and highlighted the value of community and collaboration. As I left the conference with a wealth of new knowledge and connections, I felt inspired to apply what I had learned in my own data-driven endeavors. The NYC-R Conference not only expanded my horizons but also reinforced my passion for the exciting world of data science.

R Validation Hub Community Meeting – June Recap

By Blog, Events

The R Validation Hub recently had its community meeting after a brief hiatus. The team shared announcements, discussed common challenges, and brainstormed ideas for possible future projects through the R Validation Hub. Here are some highlights of their meeting. Stay tuned for the next monthly community meeting, dates to be announced!

📢 Announcements

We have some announcements about R Val Hub leadership and structure! Doug Kelkhoff, Principal Data Scientist / Statistical Software Engineer at Roche, will be taking over the R Validation lead role from Andy Nicholls.

Doug Kelkhoff has been supporting the adoption of R in the pharmaceutical space for the past 6 years. During his time at Roche, he has pushed the adoption of R through pilot clinical trials, showcasing the benefits of using R by crafting internal tools and building services that embed the R Validation Hub’s guidance as part of our software development lifecycle. He is passionate about making the use of open source tools in a regulated setting a viable path not just for large pharmaceutical companies, but for lean startups and the public sector by addressing challenges through open initiatives. We welcome Doug as the new lead!

Andy Nicholls, Senior Director, Head of Data Science at GSK, has been the lead of the R Validation team for over four years. He has greatly contributed to the working group, including his work with the R Adoption Series, presenting eight recent case studies covering how the R Validation Hub guidance is being put into practice across many of our industry partners. These case studies covered building a GxP framework with R: 

Thank you, Andy, for the contributions and leadership you brought to the R Validation Hub! 🏅

Interested in supporting the R Validation Hub with its communication workstream? We need volunteers to help us improve the consistency of our branding, communication channels, and year-long planning. Contact us:

Call To Action for the Community 👋

During the community meeting, the team discussed how people use the riskmetric package in their evaluation processes and which scores they deem too high-risk.

💬 Continue the conversation and participate in this riskmetric survey:

Other Topics Covered During the Discussion Rounds 

💡 CDISC data should be the standard for add-on unit testing; a common repo from the R Validation Hub would be very welcome.

💡 Finally, we discussed who reviews R packages, including the roles of software engineers, statisticians, and clinical experts. Join us in the next meeting to share your thoughts!

Check out the Meeting Slides

Additional Resources 

Coming in July! 🔥 We Have a 20% off Promo Code – Data Scientists & Data Professionals at New York R Conference

By Announcement, Blog, Events

The R Consortium is sponsoring the New York R Conference, presented by Lander Analytics! This in-person and virtual conference will be running from July 13-14, with workshops (tickets sold separately) from July 11-12. The New York R Conference grew out of the New York Open Statistical Programming Meetup (also known as the New York R Meetup), which currently has over 14,000 members. Topics from the meetup include data science, visualization, machine learning, deep learning, and so much more. The New York R Conference is where enthusiasts and data scientists gather!

Locations for the conference are at Columbia University and FIAF Manhattan.

Will you be attending? Let us know! Use promo code RSTATS20 for 20% off conference & workshop tickets.

The conference gathers data scientists and professionals from all over the world. This year the conference will include a series of talks covering topics like creating beautiful maps, using OpenAI Embeddings API, data-driven approaches to marketing, using data for journalism, and much more!

There will be workshop sessions from July 11-12. Workshops are a way of generating revenue for open source developers. Workshops will include:

🎙 Join Jon Krohn with special guest Chris Wiggins, Chief Data Scientist at The New York Times and Associate Professor of Applied Mathematics at Columbia University, for the SuperDataScience live podcast.

To learn more about the agenda, speaker lineup, and workshops, visit the conference website. Also, follow @rstatsai on Twitter to stay up to date with all conference details.

Learnings and Reflection from Case Studies: What is Next for the R Validation Hub?

By Blog, Events

Join us for an update from the R Consortium’s R Validation Hub, which supports the use of R within regulated industries. The community meeting will be on June 27, 12 PM ET / 9 AM PT!


Last year, the R Validation Hub initiated a three-part presentation series on “case studies.” Eight pharma companies presented their implementation of the risk assessment framework. We will briefly summarize common themes and differences in the approaches. For the majority of the meeting, we want to discuss common challenges and brainstorm ideas for possible future projects by the R Validation Hub.

Additional Resources: 

Previous Case Studies:

Experiences Building a GxP framework with R (Part 1): Roche, Novartis, Merck and GSK

Using R in a GxP environment (Part 2)

Using R in a GxP Environment (Part 3)

R/Adoption Series: Learnings and Reflection from R Validation Case Studies

Join the community call! (Microsoft Teams meeting)

Announcing R/Medicine 2023!

By Announcement, Blog, Events

Join us at R/Medicine 2023! The 6th annual conference will be fully virtual from June 5 through 9 and feature two days of workshops followed by a day of demos, a Hackathon, and a poster session. The last two days will be filled with speaking sessions, presentations, and lightning talks. This year’s keynotes include Jeff Leek, Vice President and Chief Data Officer at Fred Hutch Cancer Center, and Neale Batra, President of Applied Epi.

The R/Medicine conference provides a forum for sharing R-based tools and approaches used to analyze and gain insights from health data. Conference workshops provide a way to learn and develop your R skills. Midweek demos allow you to try out new R packages and tools, and our hackathon provides an opportunity to learn how to develop new R tools. The conference talks share new packages and successes in analyzing health, laboratory, and clinical data with R and Shiny, with vigorous ongoing discussion with speakers in the chat (talks are pre-recorded).

Check out some highlights from the 2022 conference on our YouTube channel!

Here’s a glimpse of the 2023 R/Medicine workshops:

  • Using REDCap and R to rapidly produce biomedical publications
  • R/Medicine 101: Introduction to Clinical Data Analysis with R

🐦 Early Bird Registration is now open until May 5th, so sign up for the conference now! We are accepting proposals for 30-minute talks, 30-minute panel discussions, and 10-minute lightning talks.

📣 Interested in sponsoring R/Medicine? Please take a look at our sponsorship brochure.

2022 Government & Public Sector R Conference

By Blog, Events

We are proud sponsors of the 2022 Government & Public Sector R Conference hosted by Lander Analytics! This year’s conference will take place on December 1st & 2nd with workshops on November 30th! You can attend either in-person at Georgetown University or virtually online from anywhere in the world.

You don’t want to miss out on the fun! You’ll hear from speakers, such as:

And many, many more!

Also, you can attend the full-day interactive workshop on November 30th:

  • Introduction to Natural Language Processing for Public Policy Research with William E. J. Doane

Use code RSTATS20 to receive 20% off conference & workshop tickets!

To learn more about the speaker lineup, workshops, and agenda, visit the conference website. Also, follow @rstatsai on Twitter to stay up to date with all conference details.

If your organization is interested in being a sponsor, please contact Lander Analytics at

A Survey of Changes around the Tidyverse Package in R

By Blog, Events

Date: Friday, 28th October, 2022 • Time: 4:00pm – 5:00pm (WAT)

Register Here:

Join the Osun RUG, Nigeria, at their event on “A Survey of Changes around the Tidyverse Package in R” with special guest Hadley Wickham, Chief Scientist at RStudio. The core tidyverse includes the packages that you’re likely to use in everyday data analyses. As of tidyverse 1.3.0, the following packages are included in the core tidyverse:

  • ggplot2
  • dplyr
  • tidyr
  • readr
  • purrr
  • tibble
  • stringr
  • forcats

The tidyverse also includes many other packages with more specialized usage. Attend this webinar for an in-depth discussion with the man who invented the tidyverse itself … Prof. Hadley Wickham.

Speaker: Prof. Hadley Wickham

Hadley is Chief Scientist at RStudio, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website:

Organizers: Osun RUG, Nigeria (