
A Memorable Experience: Attending the NYC-R Conference and Key Takeaways

By Blog, Events

This is a guest post by Joseph Korszun, Senior Manager of Data Solutions at ProCogia. ProCogia is a member of the R Consortium. Joe is a data scientist with a background in mathematics and engineering. He is passionate about using statistical analysis to improve business decisions by developing scalable and flexible solutions that solve complex problems.

Introduction

New York City, known for its vibrant energy and thriving tech scene, became the epicenter of data and analytics during the recent NYC-R Conference. As an avid data enthusiast, I couldn’t resist the opportunity to immerse myself in this bustling conference and gain valuable insights into the world of R and Python programming.

I had the pleasure of representing ProCogia and the R Consortium. The speakers showed deep knowledge of data science and dedication to advancing it, and it was remarkable to see people so passionate about the R programming language and its applications. Throughout the event, they engaged the audience with informative presentations and interactive workshops that sparked insightful discussions among attendees. The positive reception from the crowd highlighted the value of collaboration and knowledge-sharing in the data science community. I was inspired by their expertise and left the conference with renewed enthusiasm for data science and the possibilities it offers. The event was a fantastic opportunity to connect with like-minded professionals and learn from the best in the industry, and I look forward to seeing more of their contributions. In this blog post, I share my experience attending the NYC-R Conference and highlight some key takeaways that left a lasting impression.

Pre-Conference: Workshops at NYC-R

The NYC-R Conference’s workshops offered an in-depth exploration of diverse data science topics using R. Attendees delved into essential areas of data science, including time series forecasting, machine learning, Bayesian data analysis, and causal inference. Led by industry experts, these workshops offered a unique opportunity to expand data science expertise and harness the potential of R for data-driven innovation. The workshops below were among those offered during the two workshop days that opened the NYC-R Conference:

  • Tidy Time Series and Forecasting in R by Mitchell O’Hara-Wild
  • Machine Learning in R by Max Kuhn
  • Bayesian Data Analysis and Stan by Jonah Gabry
  • Causal Inference in R by Malcolm Barrett and Lucy D’Agostino McGowan

Day 1: An Exciting Kickoff

The conference commenced with an invigorating address, highlighting the growing significance of R in the industry and the importance of fostering its continued development. The vibrant atmosphere was infectious as I was surrounded by like-minded individuals who shared the same passion for data science.

The day was packed with informative sessions covering a variety of topics, from advanced data visualization techniques to machine learning algorithms.

Day 1 of the NYC-R Conference featured diverse and insightful presentations showcasing the potential of the R programming language in data science. Attendees explored topics including transitioning to Quarto for interactive data reports, building R packages with LLMs, and making impactful design decisions for statistical software visualizations. The presentations also covered data-driven marketing channel attribution, the power of OpenAI’s Embeddings API, and the art of creating captivating presentations through Slidecraft. Experts from NFL Next Gen Stats revealed the many models powering sports analytics, underscoring the transformative role of data science in the sports industry. The conference left attendees inspired and equipped with valuable skills to drive data-driven innovation in their fields.

The Importance of Continuous Learning

Day 1 of the NYC-R Conference was a remarkable showcase of the importance of continued learning and the incredible potential of the R programming language. As data enthusiasts gathered, the conference provided a platform for exploring various facets of R and its impact on data-driven decision-making.

The NYC-R Conference became a hub of knowledge sharing and collaboration, where data professionals engaged in vibrant discussions and exchanged ideas. This collaborative environment emphasized the significance of staying updated with the latest trends in data science to remain at the forefront of innovation.

Day 2: Unlocking Data Insights through Advanced Analytics

Day 2 of the NYC-R Conference was a captivating journey into the forefront of data science. Attendees were treated to lectures and presentations that showcased the latest advancements in the field. The exploration of Bayesian Boosting revealed its potential for predictive modeling, offering a fresh perspective on data analysis techniques.

In an enlightening presentation, a renowned data science expert delved into the importance of democratizing data access in the session “An Ode to Permissionless Data Science.” This inspiring talk encouraged attendees to foster a more inclusive and collaborative data science community, empowering data professionals to drive innovation together.

Participants were enthralled by demonstrations of LLM use that equipped them with practical skills for building robust R packages. The “How to Make Decisions with Data” session showed attendees how to derive meaningful insights from data and turn them into informed, data-driven decisions.

The day continued with captivating lectures that covered various data science aspects, concluding with a live episode of the SuperDataScience Podcast. The podcast provided invaluable industry insights and sparked engaging discussions, leaving attendees inspired and eager to apply their newfound knowledge in their data-driven endeavors. Day 2 at the NYC-R Conference left participants with a deeper understanding of data science’s evolving landscape, motivating them to make a lasting impact in the dynamic world of data-driven innovation.

Language Wars Still at Large

Wes McKinney, the brilliant mind behind pandas, addressed the ever-lingering “Language Wars” in the data science realm. With a focus on breaking down barriers and fostering interoperability, McKinney unveiled how Apache Arrow and the Python Polars library are revolutionizing the data stack. Attendees were enthralled by McKinney’s insights on harnessing the power of these cutting-edge tools to streamline data operations, improve performance, and enable seamless data exchange across programming languages. As the discussion unfolded, it became evident that the quest for data-driven excellence continues, and the open-source community remains at the forefront of bridging the gap between programming languages for the betterment of data science.

The Power of Community

The conference highlighted the power of community in the world of data science. Interacting with professionals from diverse backgrounds provided fresh perspectives and insights, fostering an environment of collaborative learning and growth. As a sponsor member of the R Consortium, ProCogia extends its heartfelt gratitude to the R Consortium for its invaluable support in making this event possible. The Consortium’s commitment to advancing the R programming language and the data science community has been instrumental in creating a vibrant platform for knowledge sharing and networking. The connections made during the NYC-R Conference are a testament to the strength of this community and form the foundation for future collaborations that will drive data-driven innovation for years to come. ProCogia is proud to be part of this thriving community and looks forward to continuing its involvement in fostering growth and innovation within the R community.

Conclusion

Attending the NYC-R Conference was an exhilarating and enlightening experience. The conference reiterated the widespread adoption of R as a powerful tool in data science. Numerous presenters showcased their impressive projects and highlighted the versatility of R in data analysis, modeling, and visualization. It became evident that R is not just a programming language but an entire ecosystem that supports data-driven decision-making across various domains.

The conference showcased the immense potential of R in data science, emphasized the importance of continuous learning, and highlighted the value of community and collaboration. As I left the conference with a wealth of new knowledge and connections, I felt inspired to apply what I had learned in my own data-driven endeavors. The NYC-R Conference not only expanded my horizons but also reinforced my passion for the exciting world of data science.

Use of R in Non-Profit Social Policy Research in New York

By Blog

Dorota Rizik of R-Ladies New York recently talked to the R Consortium about the diverse R community in New York. She also discussed how her current job at a non-profit organization involves training her colleagues to use R, and she shared details of some of the packages they have developed for internal use.

Dorota earned her Bachelor’s in Psychology from Northwestern University and her Master’s in Applied Statistics from New York University. She currently works as a Technical Research Analyst at MDRC.

Please share your background and involvement with the RUGS group.

My background is in psychology and policy research. I currently work at a nonprofit social policy research organization. I learned R during my Master’s in Applied Statistics and then I joined my current organization. When I joined, MDRC was primarily using SAS and over the past couple of years they’ve been transitioning to R because it’s open source. My role has been to train my colleagues in the R programming language. So I train them on how to use R and also how to do specific data-related tasks like data cleaning, data analysis, or data visualization in R. 

Our focus has been on building both training and guidance for using R for various use cases. We have been working on developing internal functions and packages that staff can use to help automate certain tasks. We have been trying to replace all of the macros and functions we previously developed in SAS, so we have been translating a lot of code from SAS to R.

It’s just been a wonderful experience to help organize these meetups for R-Ladies New York City. I’ve learned how to organize and communicate, as well as how to adapt to a changing audience and community. So it’s been a very challenging but rewarding experience.

Can you share what the R community is like in New York?

In our group, we have people from both the private and public sectors. We have someone who works at 1-800-Flowers. We have someone who works in the political space, doing analysis for political campaigns or getting voters registered. There are folks who work in media and journalism and in the pharmaceutical industry. Some people have also earned advanced degrees.

It’s just such a wide range, and it’s been very eye-opening to interact with all these people who come from such different backgrounds. It has also been eye-opening in the sense that I have realized I can take my skill set and apply it to any industry. R programming, programming knowledge in general, and data analysis knowledge can be used in many different industries.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

We are currently wrapping up two major packages, one focuses on analysis, and the other focuses on tabling. The packages are for internal use and not publicly available. Our staff prefers having all the analysis results in one place, so we have been working on that. We have different basic statistical functions like linear regression, chi-square, etc., but we have written them together so that they can produce a nice table of results. The package focused on tabling will interact with the analysis package to create a table of results.
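The “one place, one table” idea can be made concrete with a small design sketch. This is hypothetical: MDRC’s packages are internal R packages that are not publicly available, so nothing below reflects their actual code. The names `TestResult`, `chi_square_2x2` and `to_table` are invented, and Rust is used only to keep this page’s examples in one language. The point is that every analysis function returns the same result shape, so the tabling layer only needs to understand that one shape.

```rust
/// Hypothetical common shape for every analysis result, so a single
/// tabling function can render chi-square tests, regressions, etc. alike.
#[derive(Debug, Clone, PartialEq)]
pub struct TestResult {
    pub test: String,
    pub statistic: f64,
    pub df: u32,
}

/// Pearson chi-square statistic for a 2x2 contingency table of counts.
pub fn chi_square_2x2(t: [[f64; 2]; 2]) -> TestResult {
    let rows = [t[0][0] + t[0][1], t[1][0] + t[1][1]];
    let cols = [t[0][0] + t[1][0], t[0][1] + t[1][1]];
    let n = rows[0] + rows[1];
    let mut stat = 0.0;
    for i in 0..2 {
        for j in 0..2 {
            // Expected count under independence: row total * column total / N.
            let expected = rows[i] * cols[j] / n;
            stat += (t[i][j] - expected).powi(2) / expected;
        }
    }
    TestResult { test: "Chi-square".to_string(), statistic: stat, df: 1 }
}

/// Renders any list of results as one plain-text table.
pub fn to_table(results: &[TestResult]) -> String {
    let mut out = format!("{:<12} {:>10} {:>4}\n", "Test", "Statistic", "df");
    for r in results {
        out.push_str(&format!("{:<12} {:>10.3} {:>4}\n", r.test, r.statistic, r.df));
    }
    out
}
```

Adding another analysis (a regression, say) would mean writing one more function that returns `TestResult`. `to_table` would not change.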

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

Recently there’s been more talk about ethics and reproducibility. AI has made a huge impact and has been a major consideration for people within my company as we are talking about how to train people in R programming. We want to be mindful that some folks will probably use AI for coding help, but it doesn’t necessarily give you the most efficient answer. So, a major trend in our meetups has been AI and the ethical considerations of relying on AI for developing your code. 

Any techniques you recommend using for planning for or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?   

We use Zoom for our meetups and Zoom has been incredibly helpful in the sense that it has allowed us to tap into a wider audience. I’ve noticed we have people who don’t live very close to New York City but are still in the relative area and they join our meetups. So that’s been a significant benefit. 

The only downside has been that we get some uninvited people joining and disrupting our meetups. So I would suggest that groups hosting online meetups try to come up with ways to minimize the possibility of strangers joining their meetups. We’ve taken the approach of having a separate sign-up form where you have to provide your first and last name and your email. We’ve also tried not sharing the Zoom link until the last minute before the meetup. So a mix of those two approaches has been helpful.

I’ve noticed more of an interest in physical meetups now. We’ve been online for a while at this point. So we’re trying to find the right balance of which meetups are more appropriate for physical versus online.


Get Involved!

The 2023 RUGS Program is currently taking applications and will close at midnight PST on September 30, 2023. 

These grants do not include support for software development or technical projects. Grants to support the R ecosystem’s technical infrastructure are awarded and administered through the ISC Grant Program which issues a call for proposals two times each year.

R Consortium Funded Project Extendr Provides Rust Extensions for R

By Blog

Andy Thomason, code performance consultant and lecturer at the University of London covering programming, physics and AI courses focused on game development, created an open source project to add Rust’s performance, reliability, and productivity to R. Andy created the Extendr package, a safe and user-friendly R extension interface for using Rust. The project was supported by a grant from the R Consortium.

Extendr is a toolkit for building extensions to R using Rust. R users can add high performance features to R without having to worry about crashes due to segmentation faults and other problems associated with C and C++. Extendr manages the lifetime of wrapped R objects so that they are available for garbage collection when they reach the end of their life.
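Conceptually, this kind of lifetime management is Rust’s RAII pattern. A wrapper marks an object as in use when it is created and releases it in its `Drop` implementation, so the release can never be forgotten. The sketch below illustrates only that pattern. It is not Extendr’s implementation: `PreserveList` and `Handle` are hypothetical names, and a real binding would call into R’s C API rather than update a `HashMap`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical registry of objects currently shielded from garbage
/// collection: object id -> number of live handles referring to it.
#[derive(Default)]
pub struct PreserveList {
    counts: HashMap<u64, usize>,
}

impl PreserveList {
    pub fn is_protected(&self, id: u64) -> bool {
        self.counts.contains_key(&id)
    }

    pub fn len(&self) -> usize {
        self.counts.len()
    }
}

/// Hypothetical wrapper: protects the object on creation, releases it on drop.
pub struct Handle {
    id: u64,
    list: Rc<RefCell<PreserveList>>,
}

impl Handle {
    pub fn new(id: u64, list: &Rc<RefCell<PreserveList>>) -> Self {
        // Register one more live reference to `id`.
        *list.borrow_mut().counts.entry(id).or_insert(0) += 1;
        Handle { id, list: Rc::clone(list) }
    }
}

impl Clone for Handle {
    // A copy of the handle is another live reference, so it registers again.
    fn clone(&self) -> Self {
        Handle::new(self.id, &self.list)
    }
}

impl Drop for Handle {
    // Runs automatically when a handle goes out of scope. The last handle
    // for an id removes it, leaving the object free for the collector.
    fn drop(&mut self) {
        let mut list = self.list.borrow_mut();
        let remaining = match list.counts.get_mut(&self.id) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return,
        };
        if remaining == 0 {
            list.counts.remove(&self.id);
        }
    }
}
```

The object stays protected as long as any copy of the handle is alive. It becomes collectable as soon as the last copy goes out of scope, without an explicit "unprotect" call anywhere in user code.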

Rust is extremely user-friendly and has been voted one of the most loved languages in the Stack Overflow Developer Survey for several years.

Extendr is available on crates.io.


By joining R with Rust, Extendr makes it possible to write packages and small functions in Rust. What problem are you solving by doing this?

The integration of R and Rust through Extendr addresses a specific challenge that people with an R background face when writing code in languages like C and C++. While R users often struggle with C or C++ code, Rust offers a more accessible and streamlined alternative, with strong support for safe multi-threaded programs. The package is meant to simplify moving from R to Rust.
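The multi-threading point can be illustrated with general Rust rather than Extendr code. The sketch below splits a slice across threads with `std::thread::scope` (stable since Rust 1.63). The scoped threads borrow the data directly, and the borrow checker would reject any attempt to modify `data` while they run. That is the kind of data race a C or C++ compiler would accept. `parallel_sum` is a hypothetical helper name.

```rust
use std::thread;

/// Sums `data` by splitting it into roughly `n_threads` chunks, each summed
/// on its own scoped thread. All threads are joined before `scope` returns.
pub fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    let n = n_threads.max(1);
    // Ceiling division; `chunks` panics on a chunk size of 0, so keep it >= 1.
    let chunk = ((data.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().sum::<f64>()))
            .collect();
        // Combine the partial sums; a panicking worker propagates here.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

For example, `parallel_sum(&v, 4)` with `v` holding 1.0 through 100.0 returns 5050.0. Floating-point addition is not associative, so on general data the result can differ from a sequential sum in the last few bits.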

What sort of use cases provided by R users helped with the design requirements?

I originally started working with a biotech start-up in Oxford, where most of their business logic was written in R. They were working on solving tricky problems with Bayesian Inference in particular. Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence or information becomes available. I started coding C and C++ extensions to R to solve these problems using improved exponential functions.

For Bayesian inference, you work in log space, because multiplying many small probabilities quickly underflows ordinary floating-point numbers. Initially, I was using C++, but later the company I was working for started using Rust instead, and I came to like Rust. Using Rust, I was much more productive than with C++.
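In log space, products of probabilities become sums of logarithms. The delicate step is adding probabilities that are stored as logarithms, for example when normalizing a posterior. The standard remedy is the log-sum-exp trick: with m = max_i x_i, compute log Σ exp(x_i) as m + log Σ exp(x_i − m). Every exponent is then at most 0, and the largest term is exactly 1. Below is a minimal Rust sketch of that textbook technique. It is not the code Andy wrote for the Oxford start-up, and `log_sum_exp` and `normalize_log_weights` are illustrative names.

```rust
/// ln(sum_i exp(x_i)), computed stably by factoring out the largest term.
pub fn log_sum_exp(xs: &[f64]) -> f64 {
    let m = xs.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if !m.is_finite() {
        // Empty input or all terms -inf (zero probability): the log of the
        // sum is -inf. A +inf term dominates everything, so return it as-is.
        return m;
    }
    // Each (x - m) <= 0, so exp never overflows and at least one term is 1.
    let s: f64 = xs.iter().map(|&x| (x - m).exp()).sum();
    m + s.ln()
}

/// Turns unnormalized log-weights (e.g. log prior + log likelihood) into
/// probabilities that sum to 1, without leaving log space until the end.
pub fn normalize_log_weights(log_w: &[f64]) -> Vec<f64> {
    let z = log_sum_exp(log_w);
    log_w.iter().map(|&lw| (lw - z).exp()).collect()
}
```

With two log-weights of -1000.0, exponentiating directly underflows to 0.0 and the logarithm becomes -inf. By contrast, `log_sum_exp(&[-1000.0, -1000.0])` returns -1000 + ln 2 (about -999.307).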

My interest then led me to study R internals to the point where I could start writing something like the Extendr package. More people joined the community and brought things to it, and it has grown since!

How far along are you on the project? 

It works! We have a strong team committing every day (andy-thomason, clauswilke, yutannihilation, multimeric, CGMossa, Ilia-Kosenkov). There are regular contributors to the project, and they work constantly to improve Extendr. 

As yet, we don’t have a large number of applications. It would be great if more people knew about and were involved in using this useful language. 

If you want to consider using or contributing to Extendr, please contact us through Discord.

How did you get involved?

Having dedicated several years to working with Sony and the PlayStation consoles, I gained valuable experience in compiler development. Interestingly, my spouse, who is also a scientist, uses R. She often collaborates with colleagues learning to use Rust. Furthermore, we have friends who use R for drug trial designs, with a shared objective of cost reduction – drug trials are notoriously expensive. 

All of this meant I looked into both programming languages, R and Rust. By using R, significant time and money can be saved. This inspired us to build a bridge between Rust and R that leverages the strengths of each.

Rust excels when dealing with vast amounts of data, and it proved exceptional for storing and retrieving multi-terabyte datasets. Rust acted as a powerhouse for data processing, while R empowered statisticians to draw meaningful conclusions and perform clustering analyses. In this symbiotic relationship, R served as the cognitive component and Rust as the practical execution framework. Rust’s meticulous engineering ensured that everything fit together, allowing R to integrate seamlessly into this cooperative ecosystem.

What are the next steps with the project?

Our contributors Hiroaki Yutani and Claus Wilke have put quite a lot of effort into plotting – there are also plans for 3D plotting. Ilia Kosenkov in Helsinki is the current maintainer and the project is very active. We also hope to release documentation in book form. 

What do you do for your day job?

I am a consultant for a number of different industries. I also teach Rust programming to people around the world, including employees of Google and Fastly. Recently I have been consulting with a blockchain company about data organization and data-related problems; they use R to analyze stock prices. I am also writing a book on Rust performance: making Rust go fast.

What non-project hobbies do you have?

Outside of my projects, I enjoy spending time with my kids, who provide plenty of entertainment. I also have a fascination with Soviet brutalist architecture, which I share with my son. We also find pleasure in exploring areas on the Thames near Oxford.

What was your experience working with the R Consortium? Would you recommend applying for a grant to others?

The process of working with the R Consortium was brilliant! It was a great project to dive into during my transition from employee to consultant. Some tasks were structured, while others allowed for experimentation and refining the project’s structure. Over the past year, we’ve seen increased interest, especially in the collaboration between Rust and R. 

The project has a vibrant Discord group (Extendr – R Adventures in Rust) where scientists seek help and support. 

I highly recommend others apply for grants through the R Consortium to contribute to meaningful projects and join a supportive community.


About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure. 

Change in R Consortium Leadership – Thank you, Joseph Rickert!

By Announcement, Blog

After serving as chair of the board for the R Consortium and being involved in multiple R Consortium committees both on the technical and community development sides, as well as participating directly in R-focused events, webinars, and countless other R activities, Joseph Rickert is stepping away from the position. The board will be conducting elections to decide the next chair.

Joseph has been with the R Consortium since it was first conceived in 2014.  His initial role was as Community Officer. In 2018, he began to also serve as Secretary. In 2019, he was elected chair of the board. 

Thank you for your tireless interest in promoting R and supporting developers and user groups around the world who are working to improve the R programming language.

Thank you, Joseph!

R Validation Hub Community Meeting – June Recap

By Blog, Events

The R Validation Hub recently had its community meeting after a brief hiatus. The team discussed announcements, common challenges, and brainstormed ideas for possible future projects through the R Validation Hub. Here are some highlights of their meeting. Stay tuned for the next monthly community meeting, dates to be announced!

📢 Announcements

We have some announcements about R Val Hub leadership and structure! Doug Kelkhoff, Principal Data Scientist / Statistical Software Engineer at Roche, will be taking over the R Validation lead role from Andy Nicholls.

Doug Kelkhoff has been supporting the adoption of R in the pharmaceutical space for the past 6 years. During his time at Roche, he has pushed the adoption of R through pilot clinical trials, showcasing the benefits of using R by crafting internal tools and building services that embed the R Validation Hub’s guidance as part of our software development lifecycle. He is passionate about making the use of open source tools in a regulated setting a viable path not just for large pharmaceutical companies, but for lean startups and the public sector by addressing challenges through open initiatives. We welcome Doug as the new lead!

Andy Nicholls, Senior Director, Head of Data Science at GSK, has been the lead of the R Validation team for over four years. He has contributed greatly to the working group, including through the R Adoption Series, which presented eight recent case studies on how the R Validation Hub guidance is being put into practice across many of our industry partners. These case studies covered building a GxP framework with R: https://www.pharmar.org/casestudies/

Thank you, Andy, for the contributions and leadership you brought to the R Validation Hub! 🏅

Interested in supporting the R Validation Hub’s communication workstream? We need volunteers to help us improve branding consistency, communication channels, and year-long planning. Contact us: https://www.pharmar.org/contact/

Call To Action for the Community 👋

During the community meeting, the team discussed how people use the riskmetric package in their evaluation processes and which risk scores they consider too high.

💬 Continue the conversation and participate in this riskmetric survey: https://bit.ly/risk_survey

Other Topics Covered During the Discussion Rounds 

💡 CDISC data should be the standard for add-on unit testing, and a common repository from the R Validation Hub would be very welcome.

💡 Finally, we discussed who reviews R packages, including what is the role of software engineers, statisticians, and clinical experts. Join us in the next meeting to share your thoughts! 

Check out the Meeting Slides

Additional Resources 

R for Predictive Modeling and Data Visualization in Turkey

By Blog

Mustafa Cavus, organizer of the Eskisehir R User Group in Turkey, discussed the diverse and thriving R community in Eskisehir. He shared details of a four-day event hosted by the group, which covered beginner-level talks as well as advanced topics for expert users. He also shared some useful techniques for hosting successful events.

mi2.ai at Warsaw University of Technology, Poland

Please share your background and involvement with the RUGS group.

I am working as an Assistant Professor at Eskisehir Technical University, Department of Statistics. My teaching work focuses on Machine Learning, Data Visualization, and Statistical Hypothesis Testing using R. Additionally, I am actively researching Explainable AI at mi2.ai.

I have a Ph.D. in Applied Statistics and have gained post-doctoral research experience at the Faculty of Mathematics and Information Science, Warsaw University of Technology. My journey with R started during my undergraduate years, and it has greatly advanced my academic career.

Inspired by the productivity and impact of local R groups and events worldwide, my colleague Ugur Dar and I decided to establish a local R user group in Eskisehir in 2019. Our aim was to foster a vibrant user community within our region.

Can you share what the R community is like in Eskisehir? 

The first R users in Eskisehir came primarily from academia, focusing on teaching and research. This group consists of undergraduate students from various disciplines who want to strengthen their skills in data-related fields, as well as graduate students and researchers engaged in analytical research. In recent years, interest has grown in diverse industries, including finance and medicine. The group’s areas of interest include data visualization, descriptive analysis, and predictive modeling.

Eskisehir Technical University Data Science Society in Eskisehir, Turkey

You hosted a Learn R meetup. Can you share more about the topics covered? Why these topics?

We organized a four-day event on Learn R, which took place on November 13th and 26th, as well as December 17th and 19th. This event was a collaboration with the Eskisehir Technical University Data Science Society. Throughout the event, we delved into various aspects of R, starting from the basics and progressing toward designing user interfaces using R.

On the first day, we commenced with the session “Introduction to R Programming,” covering topics such as understanding the essence of R, effectively handling error messages, and exploring basic data types, data structures, operators, loops, and functions in R. We provided hands-on examples to reinforce learning.

Introduction to R Programming Tutorial by Mustafa Cavus

The subsequent session, “Data Manipulation with R,” focused on the practical applications of the {dplyr} package and its functions for manipulating data.

Once participants had gained a glimpse of R and acquired useful data manipulation skills, we proceeded to advanced topics such as “Data Visualization with {ggplot2}” and “Designing User Interfaces with {Shiny}.” These sessions were led by Ugur, an expert in these fields, who actively applies these techniques in the banking sector at Visbanking.

Data Visualization with {ggplot2} Tutorial by Ugur Dar

Our primary goal in selecting these topics was to enhance the skills of people who were new to R or had only a beginner-level understanding. We also aimed to provide initial training to meet the growing demand for skilled professionals in our country, particularly in areas such as data visualization and user interface design.

Who is the target audience for attending this event? 

As you know, the COVID-19 pandemic forced a shift to online events. Fortunately, online events allow us to reach a wider audience. Initially, we had planned for this event to be a face-to-face gathering primarily for undergraduate students in Eskisehir. However, in response to numerous requests, we switched to an online format. This change allowed not only participants from Eskisehir but also people from all across Turkey who are interested in learning R to access the event easily.

With the aim of attracting a diverse range of participants, we have carefully prepared comprehensive content that caters to both beginners and those seeking more advanced knowledge. Our curriculum covers fundamental concepts and progressively delves into advanced topics.

Any techniques you recommend using for planning for or during the event? (Github, Zoom, other) Can these techniques be used to make your group more inclusive to people unable to attend physical events in the future?   

Certainly! We now have access to many online platforms and tools for hosting events, and the options available continue to expand rapidly. It is crucial to consider the familiarity of the target audience with the chosen tools to ensure a positive experience.

For our event, we chose Zoom, which is widely used in Turkey. We also made the training materials available in a public repository on GitHub. Additionally, we uploaded the recordings to YouTube so that those who missed a session could catch up.

We think that comprehensive documentation of an event enhances its impact and longevity. The content we developed for this particular event is expected to serve as a valuable resource for individuals interested in learning R for years to come.

Lastly, in order to gauge the effectiveness of an event, we encourage participants to provide feedback using tools such as Google Forms. This valuable feedback will aid us in planning future events more efficiently.

Please share any additional details you would like included in the blog. 

Thanks. I would like to express my gratitude to Ugur for his invaluable support in our R initiatives. Additionally, I extend my appreciation to Gizem Altun, Seyma Gunonu, and Zeynep Afra Sezer for their exceptional organizational contributions.

Mustafa and Ugur in Gdansk, Poland

Our city and country are home to numerous enthusiastic R users, and the opportunity to meet and exchange experiences with them is an integral aspect of R users’ culture. Finally, I extend an invitation to all those who are interested to reach out to us and actively participate in our events. We firmly believe that knowledge becomes more useful when shared. 

Greetings from Eskisehir to the R community! 👋


How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!

Coming in July! 🔥 We Have a 20% off Promo Code – Data Scientists & Data Professionals at New York R Conference

By Announcement, Blog, Events

The R Consortium is sponsoring the New York R Conference, presented by Lander Analytics! This in-person and virtual conference will run from July 13-14, with workshops (tickets sold separately) from July 11-12. The New York R Conference grew out of the New York Open Statistical Programming Meetup (also known as the New York R Meetup), which currently has over 14,000 members. Topics from the meetup include data science, visualization, machine learning, deep learning, and much more. The New York R Conference is where enthusiasts and data scientists gather!


The conference will be held at Columbia University and FIAF Manhattan.

Will you be attending? Let us know! Use promo code RSTATS20 for 20% off conference & workshop tickets.

The conference gathers data scientists and professionals from all over the world. This year's talks will cover topics such as creating beautiful maps, using the OpenAI Embeddings API, data-driven approaches to marketing, using data for journalism, and much more!

Workshop sessions will run from July 11-12. Workshops help generate revenue for open-source developers. Workshops will include:

🎙 Join Jon Krohn with special guest Chris Wiggins, Chief Data Scientist at The New York Times and Associate Professor of Applied Mathematics at Columbia University, for the SuperDataScience live podcast.

To learn more about the agenda, speaker lineup, and workshops, visit rstats.ai/nyr. Also, follow @rstatsai on Twitter to stay up to date with all conference details.

Renewing the R Consortium Census Working Group

By Announcement, Blog

Guest author: Ari Lamstein

Ari Lamstein is a Director of Analytics at MarketBridge, a marketing analytics consultancy.

Back in 2018, when I was most involved in developing choroplethr, Joe Rickert, R Consortium Director, recommended that I submit a proposal to the R Consortium to create a Working Group related to Census data. I was intrigued to see what a group like that could accomplish and submitted a proposal. The proposal was accepted, and the group’s main accomplishment was publishing A Guide to Working with Census Data in R. After publishing the Guide, however, the group went dormant.

Fast-forward five years: last month I published a new package related to Census data, zctaCrosswalk. When I discussed the package with Joe, he mentioned that he had recently spoken to some leaders at Census who were interested in revitalizing the R Consortium Census Working Group. He asked if I would like to meet them, and of course I said yes!

We just had a preliminary meeting, and several exciting ideas were discussed. One is to write a new version of 'A Guide to Working with Census Data in R.' We are still in the early stages of deciding what a new version might look like, but one goal would be to remove links to deprecated software such as American FactFinder. More broadly, the original guide was written as a survey of both Census data and CRAN packages. Perhaps the Guide would be more useful if it were written as a tutorial on the most popular datasets and packages.

We also discussed ideas for longer-term projects. For example, we could apply for funding from the R Consortium and direct that funding toward developing R packages that improve the R ecosystem for Census data.

If you have an interest in following our work or participating in our next meeting, please sign up below.

Learnings and Reflection from Case Studies: What is Next for the R Validation Hub?

By Blog, Events

Join us for an update from the R Consortium’s R Validation Hub, which supports the use of R within regulated industries. The community meeting will be on June 27, 12 PM ET / 9 AM PT!

Summary:

Last year, the R Validation Hub launched a three-part presentation series on "case studies," in which eight pharma companies presented their implementations of the risk assessment framework. We will briefly summarize the common themes and the differences between their approaches. For most of the meeting, we want to discuss common challenges and brainstorm ideas for possible future R Validation Hub projects.

Additional Resources: 

Previous Case Studies:

Experiences Building a GxP framework with R (Part 1): Roche, Novartis, Merck and GSK

Using R in a GxP environment (Part 2)

Using R in a GxP Environment (Part 3)

R/Adoption Series: Learnings and Reflection from R Validation Case Studies

Join the community call! (Microsoft Teams meeting)

Use of R for Meta-Research in Zürich

By Blog

The R Consortium recently talked to Rachel Heyard of the Zürich R User Group about the vibrant R community in Zürich. The group collaborates with different companies in Zürich to host events, providing networking opportunities for the R community.


Rachel currently works as a postdoctoral researcher at the Center for Reproducible Science at the University of Zürich. She uses R in her research and in her course on good research practices.


Please tell us about your background and your involvement with the RUGS group.

I received my Master's in Statistics from the University of Strasbourg, France, and my Ph.D. in Biostatistics from the University of Zürich, Switzerland. I started using R during my Master's for different projects and assignments. During my Ph.D. I became more proficient in it and also wrote my first packages. After my Ph.D., I left academia and worked as a statistician for the Swiss National Science Foundation in Bern.

I used R to analyze data on how the Swiss National Science Foundation distributes funding to research projects, and I also got to do some research, this time research on research. I became very interested in what we call meta-science and decided to go back to academia. Since October 2021, I have been a postdoc at the Center for Reproducible Science at the University of Zürich. I do a lot of teaching: I teach good research practices to Ph.D. students and postdocs from different disciplines.

I joined the R User Group organizing committee at the end of my Ph.D. Before Covid, we met regularly, every six weeks to two months. Different companies hosted our events, and we had nice sponsored apéros. We got a little stuck during Covid and decided against hosting online events. There were already a lot of online events happening, and we felt we couldn't add much. Our meetups are more about networking and community and less about the talks. We also have a format where people can pitch a job opening, or job seekers can pitch themselves. It really brings people together at the apéro. We felt that this community aspect would be missing from online meetups.

After things settled down a bit, we tried organizing a few meetups, but we struggled to gain momentum in the team. However, we have now planned three sponsored events for this year: one in July, one in September, and one in November. We are hopeful that it will work out and that the meetups will gain momentum again.

Can you share what the R community is like in Zürich?

It is actually very diverse, with people from many different domains. Many come from academia, since Zürich has two big universities, the University of Zürich and ETH. Data journalists attend as well, along with people from reinsurance, official statistics, pharma, and other fields.

What industry are you currently in? How do you use R in your work?

I currently work in academia, and I teach a block course on Good Research Practices. We teach the steps of good, reproducible research, starting with writing a study protocol and then registering it. We also teach participants the steps they can take to avoid questionable research practices such as p-hacking.

Part of this course is two hours on dynamic reporting with R and another hour on using Git for version control. The course is very interesting because the participants are early-career researchers doing their Ph.D. or postdoc. They come from very different fields; some already have experience with R, while others are using it for the first time. So you never know what's going to happen in the course. Sometimes it is easier to teach because the participants are familiar with R. Other times it's really difficult because they come from different backgrounds and some have never used R before. So you have to be very dynamic in your teaching and adapt to the participants' level. The goal is not to make people who have never used R or R Markdown proficient in R. It is to let them experience it once, see the benefit, and perhaps become interested enough to use it or learn more about it in the future.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

I need to follow the latest trends more closely. For example, I am still teaching R Markdown, but we have been thinking about switching to Quarto. That might be a big development because some of our participants are Python users, and Quarto is easier to use with Python.

When I did my Ph.D. I was a base R user and at the university, people are still using a lot of base R. When I left academia to join the Swiss National Science Foundation, I started using tidyverse and also got proficient in it. It is so much easier to handle messy data and do data processing with tidyverse. 

Now that I am teaching R, I also teach tidyverse. For people who, unlike me, are not statisticians and have to do a lot of data handling, tidyverse is much easier to get into than base R. It is no longer a very recent development, but for me it is one of the biggest changes I have seen in my own work.

Any techniques you recommend using for planning for or during the event? (GitHub, Zoom, other) Can these techniques be used to make your group more inclusive to people who are unable to attend physical events in the future?

I think Slack is a really great tool for organizers. It's an asynchronous conversation that you can always go back to. We also collect ideas for speakers in a Google Doc. Whenever we talk to somebody who might be interested in giving a talk, we quickly write down their name and contact details. So whenever we need a speaker, we go back to this document.

I also find it very productive for the organizers to meet in person over a drink or coffee, because it feels more urgent and is great for discussing future meetups. Slack is great, but when we get busy, we sometimes forget about conversations there.

