Jan 16

Satellite Data with R: Unveiling Earth’s Surface Using the ICESat2R Package

By R Consortium Blog

The R Consortium recently connected with Lampros Sp. Mouselimis, the creator of the ICESat2R package, discussing the ICESat-2 mission, a significant initiative in understanding the Earth’s surface dynamics. This NASA mission, utilizing the Advanced Topographic Laser Altimeter System (ATLAS), provides in-depth altimetry data, capturing Earth’s topography with unparalleled precision.

Mouselimis’ contribution, the ICESat2R package, is an R-based tool designed to streamline the analysis of ICESat-2 data. It simplifies accessing, processing, and visualizing the vast datasets generated by ATLAS, which emits 10,000 laser pulses per second to measure aspects like ice sheet elevation, sea ice thickness, and global vegetation biomass. This package enables users to analyze complex environmental changes such as ice-sheet elevation change, sea-ice freeboard, and vegetation canopy height more efficiently and accurately. The R Consortium funded this project.

Lampros Sp. Mouselimis is an experienced Data Analyst and Programmer who holds a degree in Business Administration and has received post-graduate training in Data Processing, Analysis, and Programming. His preferred programming language is R, but he can also work with Python and C++. He is an open-source developer, and his work can be found on GitHub. With over a decade of experience in data processing using programming, he mainly works as a freelancer and runs his own business, Monopteryx, based in Greece. Outside of work, Lampros enjoys swimming, cycling, running, and tennis. He also takes care of two small agricultural fields that are partly filled with olive trees.

You built an R package called ICESat2R using the ICESat-2 satellite. Do you consider your ICESat2R project a success?

IceSat2R has 7,252 downloads, which, considering the smaller group of researchers who focus on using ICESat-2 data, qualifies it as a popular tool. It’s not as popular as some other remote sensing packages, but I believe it’s been a success based on two main points:

Contribution to the R users community: I hope that the R programmers who use the IceSat2R R package are now able to process altimetry data without any issues. If any issues do arise, I’ll be able to resolve them by updating the code in the GitHub and CRAN repositories.

Personal and Professional achievement: I applied for a grant to the R Consortium, and my application was accepted. Moreover, I implemented the code by following the milestone timelines. Seeing a project through and providing it publicly is a success, I believe.

Who uses ICESat2R, and what are the main benefits? Any unique benefits compared to the Python and Julia interfaces?

The users of the ICESat2R package can be professionals, researchers, or R programming users in general. I assume that these users could be:

Ice scientists, ecologists, and hydrologists (to name a few) who would be interested in the altimeter data to perform their research
Public authorities or military personnel, who, for instance, would like to process data related to high-risk events such as floods
Policy and decision-makers (the ICESat-2 data can be used, for instance, in resource management)
R users that would like to “get their hands dirty” with altimeter data

I am aware of the Python and Julia interfaces, and to tell the truth, I looked at the authors’ code bases before implementing the code, mainly because I wanted to find out the exact source they used to download the ICESat-2 data.

Based on the current implementation, I would say that the benefits of the ICESat2R package are the following:

The R programming users can use NASA’s OpenAltimetry interface, which, as of December 2023, doesn’t require any credentials
The R package includes 3 Vignettes (Articles) and detailed documentation (Reference) for the implemented code

What is an interesting example of using ICESat2R?

There are many examples where the ICESat2R package can be used. For instance, a potential use case would be to display differences between a Digital Elevation Model (Copernicus DEM) and land-ice-height ‘ICESat-2’ measurements. The next image shows the ICESat-2 land-ice-height in winter (green) and summer (orange) compared to a DEM.

*From the package Vignette: ‘ICESat-2 Atlas Products’*

More detailed explanations related to this use case exist in the Vignette ICESat-2 Atlas Products of the package.

Were there any issues using OpenAltimetry API (the “cyberinfrastructure platform for discovery, access, and visualization of data from NASA’s ICESat-2 mission”)? (NOTE: Currently, the OpenAltimetry API website appears to be down?)

At the beginning of October 2023, I was informed that the OpenAltimetry website (previously https://openaltimetry.org/) has migrated to https://openaltimetry.earthdatacloud.nasa.gov/. I then contacted the support of the National Snow & Ice Data Center, which informed me about the migration of the API interface.

Currently, I have an open issue in my GitHub repo related to this migration. Once the OpenAltimetry API becomes functional again, I’ll submit the updated version of the ICESat2R package to CRAN.

In your blog post for the copernicusDEM package, you showed a code snippet showing how it loads files, iterates over the files, and uses a for-loop to grab all the data. Can you provide something similar for ICESat2R?

Whenever I submit an R package to CRAN, I include one (or more) vignettes that explain the package’s functionality. Once the package is accepted, I also upload one of the vignettes to my personal blog. This was the case for the CopernicusDEM R package, but also for the ICESat2R package.

The current version of IceSat2R on CRAN (https://CRAN.R-project.org/package=IceSat2R) is 1.04. Are you still actively supporting IceSat2R? Are you planning to add any major features?

Yes, I still actively support IceSat2R. I always respond to issues related to the package and fix potential bugs or errors. The NEWS page of the package includes the updates since the first upload of the code base to GitHub.

I don’t plan to add any new features in the near future, but I’m open to pull requests in the GitHub repository if a user would like to include new functionality that could benefit the R programming community.

About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure.

Learn more

Jan 15

Rebooting the Warsaw R Community in 2024: Insights from Kamil Sijko’s Journey and Future Aspirations

By R Consortium Blog

Recently, Kamil Sijko of the Warsaw R User Group discussed with the R Consortium his transition from academia to leading data science in the business sector. He noted the current dormancy of Warsaw’s R community and the eagerness to revive its dynamic, pre-COVID meetups. The group’s latest meeting explored new, interactive formats to engage its diverse membership better.

Please share about your background and involvement with the RUGS group.

During my early academic years at the University of Social Sciences in Warsaw, we explored several interesting projects, one of which was ‘webiR’ in 2009. This project was an attempt to blend R’s capabilities with web application development, which was not very common at the time. We developed webiR a few years before the advent of Shiny in 2012 with the idea of making R more accessible to non-technical users.

While webiR might not be widely remembered today, unlike the widely successful Shiny, it represented our early efforts to simplify data analysis. The application allowed users to choose survey questions they were interested in, and then it would automatically select suitable analyses through a set of heuristics. This approach aimed to eliminate the need for users to understand the underlying R functions, making data analysis more approachable.

Although webiR wasn’t a major success, it was a valuable learning experience and a stepping stone in exploring how R could be used innovatively, especially in web development. These kinds of exploratory projects contribute to the ongoing evolution and versatility of R, which we continue to see today.

Later, I transitioned to working at research institutes, including a government-funded Polish Educational Research Institute. Now, I’m in the business sector. I serve as the Head of Data Science at Transition Technologies Science, a company that operates in the medical industry. We collaborate with pharmaceutical companies, universities, and medical scientists. My role involves leveraging data science in various aspects of the medical field.

Can you share what the R community is like in Warsaw, Poland?

The situation is dormant, but it’s good timing for a reboot. No activities have been revived since the pandemic ended. Before COVID, though, this was a hot topic of discussion. There were frequent meetups, including Python and data science gatherings. These meetups were unique, and I found them slightly unconventional in a good way. For example, Python meetups often focused on deep learning and applications in risk management or insurance.

But with R meetups, there was a broader spectrum of topics, often venturing far beyond conventional subjects. I found this diversity particularly refreshing, especially as many academics were involved, exploring a wide range of innovative applications.

One of the things that stood out was the involvement of women from the Warsaw University of Technology, who ran the ‘R Ladies’ in Warsaw. They organized numerous workshops, which were quite popular. These workshops offered an accessible entry point into data science for those looking to change careers. One interesting observation made was that R is often seen as more approachable as a first language for newcomers from different backgrounds.

We also have a strong scientific group in Warsaw led by Professor Biecek, a fervent advocate of R and leader of MI2.AI. His work in Explainable AI is cutting-edge, making us feel connected to a vibrant local scene. Another point raised was the curiosity about local technological developments, not just the global cutting-edge advancements.

I recall an initiative named ‘PoweR’ – a three-week crash course in data science that attracted about 500 participants. I didn’t participate myself, but it was impressive. Also, the fields of science like medicine, statistics, econometrics, spatial sciences, and humanities were highlighted. R is extremely popular in these areas, allowing for exploration of unique and diverse topics.

It’s clear there’s a strong desire to revive these meetups and initiatives, as they foster a unique learning environment and community spirit.

You had a Meetup on December 11th, 2023. Can you share more on the topic covered? Why this topic?

In our recent meeting, we deviated from the usual format of workshops and lectures, opting for a more unique approach that we may not repeat. Instead, we engaged in a peer-to-peer discussion, which was feasible due to the small number of attendees. We focused on two main topics. The first was understanding what people miss most about our meetings, as I aim to incorporate these elements when I reboot them. The second topic was exploring future directions for our meetings.

We delved into the different types of participants attending our meetings. One group comprises those familiar with R and eager to learn about advanced techniques, for whom lectures are ideal. Another group includes individuals transitioning from other fields to data science. We also considered students, particularly those favoring Python over R, and I believe it’s important to dispel any misconceptions about career prospects in R.

Additionally, we discussed members of the open source community around Warsaw, recognizing their contributions during events like hackathons. Another interesting aspect was the companies’ involvement, not just in recruitment but also in sharing their work with the community.

An unaddressed yet intriguing aspect was attendees transitioning within the data science field, seeking insights into new companies and trends. I also want to focus more on social interactions beyond just having pizza and experiment with ideas like speed dating or extended interactions with lecture presenters.

Lastly, we considered the language of our meetings. Operating in Poland, we debated whether to conduct some sessions in English, stream them, or post them on YouTube to reach a broader audience. I’m excited to experiment with these ideas, which could significantly enhance our meetings.

Who is the target audience for attending this event?

Up to this point, our focus has primarily been on individuals who are already interested in R and seeking to deepen their knowledge with expert insights. That’s been our main audience. The other significant group consists of those completely new to the field who are looking to be introduced to data science through R. These are the two main types of participants we usually have.

We aim to be more inclusive; of course, there’s the ‘R Ladies’ initiative. The ‘R Ladies’ essentially engage in the same activities as the rest of our groups, but they cater to a different audience. The content and structure of their sessions are similar to what we offer to other participants. Still, they focus on creating an inclusive environment for women interested in data science and R.

Any techniques you recommend using for planning for or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?

There were various opinions, but one perspective really resonated with me. COVID took away our in-person meetups, and although there was an attempt to transition them to a virtual environment, it wasn’t the same. We miss face-to-face interactions and being in the same physical space together. That’s something special.

There were instances where, despite people already gathering in the room, we had to announce that the expert wouldn’t be able to come and would instead join via Zoom. This often led to disappointment, with some attendees leaving the room immediately, as they weren’t interested in a virtual presentation. After all, there’s plenty of similar material available online.

One comment struck me: even though we could have experts from RStudio (now Posit) or other places speak to us from across the ocean about their latest developments, this information is already accessible on platforms like YouTube. The experience is likely to be similar. In terms of using Zoom or similar virtual platforms, we’re leaning towards not pursuing that path for future meetups.

We would like to get to know you more on the personal side. Can you please tell me about yourself? For example, hobbies/interests or anything you want to share about yourself.

International Volunteer Day at CoderDojo

A fun fact about me is my deep involvement in an initiative focused on teaching children creative computer skills. I’ve found it incredibly rewarding to help kids learn how to use technology creatively. It’s a lot of fun, both for me and the children. For instance, I recently prepared workshops on creating Electronic Dance Music (EDM). These workshops cover aspects like sampling and looping. I find this work enjoyable and immensely fulfilling, as it combines my passion for technology with the joy of teaching and engaging with children.

Additionally, in my work with CoderDojo, I’ve had the opportunity to engage children in programming projects, including a special focus on encouraging a group of girls. We utilized ‘Kodu Game Lab’ for these sessions, a platform that offers a more immersive, video game-like environment for coding. This platform enabled the children to learn programming concepts in a playful manner, such as coding a robot to follow or avoid objects and even creating their own simple games.

A key moment came when the girls highlighted a significant limitation: the lack of relatable characters in the games, noting the predominance of robots and other figures but a conspicuous absence of princesses or characters they could identify with. This feedback was invaluable and led us to adapt our approach. We creatively worked around this limitation by incorporating an object—a ‘tag’—which we collectively imagined as a princess needing rescue. This improvisation turned into a unique game by the end of the day.

This experience was not just fun but also enlightening, underscoring the importance of CoderDojo’s approach in offering unique insights into how different groups perceive technology. It highlighted the need to understand and address diverse perspectives and requirements in technology, especially when introducing young minds to the world of programming.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Learn More

Jan 12

From Vision to Action: Pfizer’s R Adoption Odyssey – Join the Webinar on February 8, 2024

By R Consortium Blog, Events

In the rapidly evolving sphere of pharmaceutical data analysis, a significant transition is taking place – the shift from traditional SAS to the versatile R programming language. Pfizer, a trailblazer in the pharmaceutical industry, is leading this change. We are excited to invite you to an exclusive webinar that will cover details about how Pfizer has succeeded and what the benefits are: “From Vision to Action: The Pfizer R Center of Excellence-led Journey to R Adoption,” scheduled for February 8th, 2024, at 3:00 pm ET.

Register Now!

Pfizer’s Progressive Shift to R:

At the heart of Pfizer’s data analysis revolution is the adoption of R – a language known for its robust community-driven development and open-source nature. This move is not just about changing tools; it’s about embracing a culture of innovation and collaboration.

The journey began with an internal query at Pfizer: How many of our colleagues are proficient in R? The answer led to the unveiling of a latent community of R users, eager yet unconnected. In 2022, an internal survey highlighted the presence of over 1,500 R users, a clear sign of a burgeoning community within Pfizer.

In response, Pfizer established the R Center of Excellence (CoE) in 2022. This initiative marked a shift from scattered individual efforts to a cohesive, strategic approach to R adoption. The CoE, celebrating its first anniversary in 2023, has become a linchpin in nurturing Pfizer’s vibrant R community.

Webinar Highlights:

This upcoming webinar, hosted by the R Consortium, is more than just a case study. It’s a treasure trove of insights for fostering an engaged R community. The session will cover:

Pfizer’s journey in building a robust R community.
Practical strategies applicable across various industries.
Understanding the critical role of an engaged R community in data analysis.

Join Us for the Webinar:

This is an unmissable opportunity for anyone interested in data science, R programming, or community building within large organizations. By attending this webinar, you will gain firsthand insights into how Pfizer successfully integrated R into its data analysis practices and how you can apply these learnings to your organization.

Don’t miss this opportunity to learn from Pfizer’s experience and expertise. Register now for the webinar on February 8, 2024, at 3:00 pm ET and be a part of the conversation shaping the future of pharmaceutical data analysis.

Register Now!

Jan 10

R for Public Health Data Analysis in Karachi, Pakistan

By R Consortium Blog

The Karachi R User Group, Pakistan, hosted its second event, “Unveiling the Power of R Shiny Dashboards,” on December 30, 2023. The R Consortium spoke with Uzair Aslam, the group’s founder, about the challenges of starting an R User Group in a budding R community. He also discussed his data analysis project for studying the health deficiencies experienced by the Pakistani population.

Please share about your background and your involvement in the R Community.

My name is Uzair Aslam, and I did my BSc in Economics and Mathematics from the Institute of Business Administration (IBA), Karachi. I have a keen interest in data science, statistics, and econometrics. After graduating, I co-founded a consulting firm called StatDevs. I work with two developers to develop R and Shiny applications for our clients.

At StatDevs, we solve complex problems using data science solutions and data analytics. R is a core language for us, and we’re experienced in Python, too. However, we are focused on R because of its strengths in data analysis, data visualization, and the development of Shiny applications.

My motivation for starting this group came from watching online events of R user groups in the USA and Europe. I attended the presentations and listened to what R is capable of and how they are bringing R to their communities. I noticed much R activity on that side of the world, but nothing was happening on the Asian side. That is when I wanted to make people realize that they could use R for their data analysis in academia and industry so they can solve more problems.

R User Group Distribution Around the World, from Ben Ubah’s R Community Explorer repo using the meetupr package to query Meetup API

Currently, regarding R users, there is a lack of community concept in Pakistan. Tech communities are not nurtured properly, not built properly, and they are not contained properly.

I contacted the R Consortium and shared my story of wanting to establish an R user group as the organizer to promote the language.

Can you share what the R community is like in Pakistan?

I have observed that R is used in academia, but not to the extent it should be. I have seen a couple of professors at IBA and some in Islamabad who use R but also use Stata and Excel for their academic purposes and data analysis. In terms of industry, Power BI and Excel are used extensively. This is because not many people know R’s data analysis and analytics capabilities. The acceptance of R is not realized due to the lack of awareness. Some academic researchers use R but may need more training to get the most out of what R offers them. Karachi R User Group aims to narrow down this gap.

Are there any particular challenges you have faced in organizing this RUG?

Indeed, getting people to participate in this R user group has been a challenge. I held our first meetup myself in November, and only 4 or 5 people attended. I prepared for the meetup for about two weeks because I wanted an excellent introduction and everything, but fewer people showed up than I had hoped. Of those five people, one was my co-founder, and two were participating from the US and Brazil. There was only one person from Pakistan. This happens when you introduce something new in a place where people are unaware of it. My job is to continue this effort and tell people about the possibilities and opportunities of data analysis and consulting using R.

As we approach our second meetup, more people are showing interest, and the number is growing daily. I am not active on Instagram and not very active on Twitter. However, I use LinkedIn and Facebook as my platforms to reach people. On Facebook, I have joined multiple groups, so I share information about the meetups in these groups. Lately, I have been realizing that I should use Twitter as well because I have seen more people promoting their R events on Twitter.

You have a Meetup on “Unveiling the Power of R Shiny Dashboards,” can you share more on the topic covered? Why this topic?

Currently, we have 100 members in our user group, and the upcoming meetup is titled “Unveiling the Power of R Shiny Dashboards.” Jehangeer Aswani is the speaker for this event. Jehangeer is a professional freelancer on Upwork and is based in Islamabad. Due to his motivation and my idea, we started this R user group. He is one of the people I look to for motivation. He has a bachelor’s degree in Statistics and provides R Shiny consulting services.

This meetup is about the fundamental concepts of R Shiny. One may wonder why R Shiny is relevant when we have Power BI and Excel. Jehangeer will provide a hands-on experience with R Shiny applications. This will help participants understand why R Shiny is a better tool. In addition, this meetup will unlock the potential to transform data into captivating visualizations. Participants will also learn how to build R Shiny dashboards. They will get hands-on experience with a real-world application that can be used to solve a business case.

Please share about a project you are working on or have worked on using the R language. Goal/reason, result, anything interesting, especially related to your industry?

I used R for micro-analysis of the Public Health domain. I collaborated with a consultant in Karachi, Pakistan, named Jaweid Ishaque. We worked on a data analysis project for Indus Hospital and Health Networks, a large network of hospitals. The problem statement of the project was to create a broader understanding of the health deficiencies experienced by the Pakistani population, particularly in Punjab, Sindh, and Balochistan. This was a funded study that we conducted.

I utilized a variety of data sets in this study. One of the data sets was the 2017 census data. Another data set was the Pakistan Social Living Measurement (SLM) 2019-2020 data set. I also used data from the Pakistan Maternal Mortality Survey (PMMS) and the Pakistan Demographic and Health Survey (PDHS). I obtained these data sets from the Pakistan Bureau of Statistics and open sources. I analyzed and explored the exact status of public health delivery and public health care at the country and provincial levels.

I worked as a data analyst on this project. The consultant guided me throughout the study. I summarized and presented the current status of health parameters in terms of mortality, disease, incidence, and prevalence. We also compared these parameters to those of other countries, such as Bangladesh, India, Sri Lanka, and Nepal. With the help of R and its packages, I could extract, process, and clean the data sets from multiple sources using dplyr. I used ggplot2 to visualize the data. Finally, out of the 141 total districts, I identified the most disadvantaged districts in Pakistan in terms of Public Healthcare Delivery (PHC), Social Living Measurements (SLM), and Incidence Of Diseases (IOD). Our rigorous analysis narrowed the list of disadvantaged districts to around 35 districts in Pakistan. There were eighteen districts in lower Balochistan, ten in Sindh, and seven in Punjab. This study helped Indus Hospital and Health Networks deploy mobile health clinics to remote areas of Pakistan.
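To make the clean, score, and rank workflow described above more concrete, here is a minimal sketch in R using dplyr and ggplot2. Everything in it is an assumption made for illustration: the district labels, the province labels, the `phc_score` and `slm_score` columns, the simple averaging into a composite, and the cut-off of three districts are all hypothetical and are not taken from the study, which does not describe its scoring method here.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: names and scores are invented for illustration only.
districts <- data.frame(
  district  = c("A", "B", "C", "D", "E", "F"),
  province  = c("P1", "P1", "P2", "P2", "P3", "P3"),
  phc_score = c(0.2, 0.8, 0.4, NA, 0.1, 0.6),
  slm_score = c(0.3, 0.7, 0.5, 0.6, 0.2, 0.9)
)

# Clean: drop districts with missing indicators.
# Score: average the indicators into one composite (an assumed method).
# Rank: sort so the lowest (most disadvantaged) districts come first.
ranked <- districts |>
  filter(!is.na(phc_score), !is.na(slm_score)) |>
  mutate(composite = (phc_score + slm_score) / 2) |>
  arrange(composite)

# Keep the lowest-scoring districts (cut-off of 3 chosen arbitrarily).
most_disadvantaged <- ranked |> slice_head(n = 3)

# Visualize the ranking as a horizontal bar chart.
p <- ggplot(ranked, aes(x = reorder(district, composite), y = composite, fill = province)) +
  geom_col() +
  coord_flip() +
  labs(x = "District", y = "Composite score (lower = more disadvantaged)")
```

In a real analysis, the toy data frame would be replaced by the survey and census tables, and the composite would follow whatever weighting the study team agreed on.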

I wrote and executed all of the analytical scripts for the data cleaning and analysis of the provided surveys in R. This allowed me to gain an overview and insights into the data, which I then reported to the stakeholders. I presented Indus Hospital and Health Networks with a comprehensive overview of our seven to eight months of research. I generated Pakistan’s population parameters in these analyses, including birthplaces, provincial distributions, mortality rates, and stillbirth rates by provinces and districts.

In addition to the above, I have also started offering R training. I delivered an online course on R one year ago titled “R for Economics and Finance.” I instructed over 15 students from IBA and all over Pakistan in this online training course, which was solely based on R.

Students were delighted to learn about the practical applications of their economic and financial models, as they had previously only been taught theoretical courses in Universities. I conducted this training last year and will now conduct several R trainings in industry and academia.

I will be conducting one of these trainings in February. This training will be titled “R for Data Science,” and students and industry professionals will attend it. I have begun working on this training to promote R as much as possible through our efforts.

As my commitment to advancing the use of R in data analysis and data science grows, I express gratitude to the R Consortium for their support on this transformative journey. Envisioning a significant impact on Pakistan, I am dedicated to constructing a vibrant open source community. The fruits of my efforts will manifest as I realize my vision: fostering open source data analytics and collaboration throughout Pakistan.

How do I Join?

Learn more

Jan 08

Financial Assistance is Available for Your R User Group in 2024!

By R Consortium Announcement, Blog

The R Consortium is excited to open the doors to the 2024 RUGS Program, starting January 8th, 2024! We are committed to supporting R User Groups (RUGS) worldwide in their efforts to organize, share information, and strengthen their local communities. We’re now inviting applications for our program.

Apply now!

What’s New in 2024?

This year, the RUGS Program is structured around three distinct categories of grants:

User Group Grants: Tailored for groups seeking support to enhance user experience or develop user-centric projects.
Conference Grants: Ideal for those looking to host or attend conferences that align with the program’s objectives.
Special Projects Grants: Designed for innovative and unconventional projects that require a boost to get off the ground.

For full details and to submit your proposals, visit here. Your contribution can significantly support the ongoing evolution of the R language.

R User Groups: Global Impact

With 74 active R User Groups (RUGS) worldwide and 67,458 members, R communities welcome individuals of all backgrounds and skill levels, from beginners to advanced users.

Community Spotlight

Explore our blog for interviews with R User Group organizers from various industries, offering insights into their experiences and impacts.

Examples of User Group Grants:

R Conference 2023: Malaysia’s Largest Face-to-Face Annual R Conference

Highlights from R-Ladies Paris Hybrid Meetup Empowering Community Outreach

Examples of conferences:

https://rr2023.sciencesconf.org/

https://latin-r.com/

Get Involved: 2024 RUGS Program

The application period for the 2024 RUGS Program opens on January 8th, 2024, and will close at midnight PST on September 30, 2024. Note that these grants do not cover software development or technical projects. For such initiatives, consider the ISC Grant Program, which opens for proposals twice a year. You can learn more about the ISC Grant Program here.

Join Us in Strengthening the R Community

Your participation can significantly contribute to the development and cohesion of the R community. Apply starting January 8th, 2024, and be part of this exciting journey of growth and collaboration!

Apply now!

Jan 04

Harness the Power of R for Survival Analysis: Join Our {ggsurvfit} Webinar!

By R Consortium Blog, Events

Hello, Data Enthusiasts!

Are you ready to take your data visualization skills to the next level in the fascinating world of survival analysis? We’re thrilled to invite you to our exclusive webinar: “Visualizing Survival Data with the {ggsurvfit} R Package.” Mark your calendars for January 25, 2024, at 7:00 PM Eastern Time (ET)!

Register here!

Why This Webinar is a Must-Attend

The {ggsurvfit} package is designed for both beginners and seasoned data scientists. It streamlines the process of generating publication-quality time-to-event (survival analysis) graphs. And the best part? It’s built on the backbone of the beloved {ggplot2} package, marrying simplicity with sophistication.

What You Will Learn

Our interactive session will dive deep into how the {ggsurvfit} functions, like add_confidence_interval() and add_risktable(), seamlessly integrate as {ggplot2} ‘geoms.’ This means you can spruce up your plots using the familiar {ggplot2} toolkit sans the headache of mastering new coding syntax.
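As a rough sketch of what that layering looks like, the snippet below fits Kaplan-Meier curves and adds a confidence band and a risk table with the `+` operator. The choice of the `lung` dataset from the {survival} package, the `sex` grouping, and the axis labels are assumptions made for illustration; they are not taken from the webinar material.

```r
library(survival)   # provides Surv() and the example `lung` data
library(ggsurvfit)  # provides survfit2(), ggsurvfit(), and the add_*() helpers

# Fit Kaplan-Meier curves by sex (illustrative choice of data and grouping)
fit <- survfit2(Surv(time, status) ~ sex, data = lung)

# Build the plot, then layer on extras with `+`, as with any {ggplot2} plot
p <- ggsurvfit(fit) +
  add_confidence_interval() +
  add_risktable() +
  ggplot2::labs(x = "Days", y = "Overall survival probability")

p  # printing the object draws the figure
```

Because the result is an ordinary {ggplot2} object, themes, scales, and labels can be added in the same way.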

It’s not just about learning a new tool – it’s about enriching your data storytelling capabilities.

The Perks of Joining

Interactive Learning: Engage with experts and peers in an interactive online setting.
Skill Enhancement: Elevate your data visualization prowess, specifically in survival analysis.
Network Building: Connect with fellow data enthusiasts and professionals from diverse fields.

Register Now!

See you there, where data meets innovation!

Jan 02

How Inclusivity, Women in STEM, and Trips to the Pub Together Enrich the Manchester R User Group in the UK

By R Consortium Blog

Zac Nash and Abbie Brookes, co-founders of MancR (left), and Jeremy Horne (right), Director of Datacove

The R Consortium recently reached out to Abbie Brookes, Senior Analyst and AI Consultant at Datacove, co-founder and organizer of the Manchester R User Group. During the conversation, Abbie discussed her active participation in the R community and the rapid growth of the community in different parts of the United Kingdom, particularly in Manchester, London, Brighton, and Bristol.

Zac Nash, the other co-founder of Manchester R and a Data Scientist, began his career as a PhD Researcher in Computer Science at Bangor University, publishing his research paper ‘Tracking the Fine Scale Movements of Fish using Autonomous Maritime Robotics: A Systematic State of the Art Review’ (Nash et al. 2021, available on ScienceDirect). Zac started a job as a Data Scientist at Datacove in 2022, developing machine learning and AI products for many different clients. Zac still actively attends and engages in the Manchester R community, and now works as a Senior Data Engineer at Fresh Egg, a digital marketing consultancy in Worthing, West Sussex. There he continues to develop new skills within the data space and participates in the R and Python communities, both at his home in Manchester and near his work in West Sussex.

Please share your background and involvement with the RUGS group

Abbie Brookes, co-founder and organizer of the Manchester R User Group

I started working for Datacove in the summer of 2022. Datacove is a Data and Analytics Consultancy Team. They work across multiple sectors, specializing in customer and marketing analytics, reporting and visualization techniques, web analytics, R and Python training, and much more! Our company director, Jeremy Horne, who is well-known in the Brighton R community on the south coast, has been deeply involved in the R scene since late 2005. His journey began in London, where he made many friends and connections, inspiring him to create Brighton R.

Zac Nash, far left, presenting at the June 2023 Meetup

When I joined Datacove, I took on the role of co-organizing Brighton R. We operated as a remote company with team members on the south coast and the north of the UK. My ex-colleague Zac, who is based in Manchester, pointed out the lack of significant tech communities and idea-sharing groups. Inspired by our success in Brighton, we established a similar initiative in Manchester. Despite the challenge of the long commute, Zac and I worked together to rejuvenate the R scene in Manchester, starting with our first major event in June 2023, followed by another one in October. Our expansion from Brighton to Manchester highlights the growth and impact of our community-driven R initiatives.

Can you share what the R community is like in Manchester, United Kingdom?

October 2023 Meetup

The Manchester R User Group was considered our largest R community until London R! We consistently attract a lot of attendees, more so than our other groups. It’s an incredibly lively and passionate group, and the level of engagement is just fantastic.

What I find particularly rewarding about the Manchester group is its inclusivity. We see diverse individuals from various backgrounds and minorities in the data field. This diversity is important to me, especially as a woman in a STEM field. Women in tech or data fields are often in the minority, and it’s refreshing to see a different dynamic in Manchester.

Our attendees range from complete beginners who have never engaged with R before to seasoned professionals in the field since the early 2000s. It’s this mix of experience levels that makes our meetups so enriching.

Knife dancers at Manchester R User Group October 2023

We’ve also been fortunate to receive strong support from local companies. A tech recruitment firm, Better Placed, sponsored our last event in October and is sponsoring our next one in March. They provide an amazing venue with a stunning bar adorned with palm trees, serving champagne, craft beers, and cocktails, plus the usual pizzas and chips because everyone loves that at a meetup.

After our events, we often take the group to a pub, adding to the experience as a community. Manchester’s nightlife is vibrant, with many young professionals and great bars. Our last meetup even had impromptu dancers at the bar, making it a unique and quirky experience. It’s a fun and unusual meetup, but that’s what makes it so memorable and enjoyable.

You have a Manchester R Meetup scheduled for March 2024. Can you share more on the topic covered? Why this topic?

Our next event is scheduled for the 14th of March 2024 (Manchester R Meetup! March 2024). Unfortunately, I can’t share much information about the speakers yet; you’ll just have to join the group to find out! We typically announce the speakers about one to two months before the event, but I assure you, we have a great lineup planned. We’ve hosted many interesting talks in the past.

For instance, we had a fascinating geospatial talk at our last event in October. We’ve also had presentations from people within our company, including our director. Additionally, we’ve had speakers from Posit and various maintainers of R packages. One notable talk was about the ‘Arrow’ package, which was fantastic.

So, for now, we’ll just have to wait and see. We will inform the R Consortium when everything is confirmed and ready to be announced.

Who is the target audience for attending this event?

Our target audience is incredibly broad and inclusive. To join our events, all you need is an interest in data and a desire to enjoy time with others who share similar passions. We welcome absolutely anyone, regardless of their career stage or academic background. Whether you’re a complete novice in the field or possess extensive knowledge, our events are designed to cater to all levels.

We carefully curate our talks to ensure they’re enjoyable and accessible, regardless of the attendees’ expertise. We provide introductory talks for beginners, and for those with more experience, we offer content that delves into deeper expertise. Even in our more advanced talks, we include steps on getting involved and starting, ensuring nobody feels left out.

An essential part of our events is the interactive aspect. There’s always an opportunity to ask questions; we often share example code. This approach ensures that everyone, no matter their knowledge level, feels welcomed and can benefit from our events. We’re dedicated to creating an inclusive environment where everyone can learn, share, and grow.

Of course, I’d happily explain how we accommodate those who can’t attend our events in person. Firstly, we always ensure that our event venues are disability-friendly and accessible. However, we have alternatives for those who prefer not to attend in person.

For our Brighton events, which have been running for over three years, we livestream them on YouTube every time. This allows anyone, regardless of location or ability to attend physically, to participate in our events. We’re planning to implement the same for our Manchester events soon. It’s still in its early stages, so we haven’t set up live streaming yet, but it’s definitely on our agenda. The only challenge is ensuring we have the right technical equipment to provide a high-quality streaming experience.

Additionally, we accommodate speakers who can’t be present physically. Often, speakers join us via Zoom, and we broadcast their presentations to the attendees in the room. We understand that not all speakers are able or willing to come in person, but we still want to include their valuable insights. A good talk can be delivered over Zoom, and we believe in leveraging technology to make our events as inclusive and engaging as possible, regardless of physical presence.

Please share any additional details you would like included in the blog.

Datacove 2022 company photo with Laura (Brighton Py organizer and Data Scientist), Zac (Manchester R co-founder and Senior Data Engineer), Jeremy (BrightonR co-founder and Company Director), and Abbie (Manchester R co-founder and organizer and Senior Analyst and AI Consultant). Other members present: Sarah (director), Nathaniel (Jeremy’s wonderful son!), and Yagyansh.

The most interesting thing happening right now is the expansion of our meetup groups. As I’ve mentioned, it’s becoming complex because we’re growing rapidly. We started with Brighton R, which has been running for over three years. From there, we expanded to Manchester R, which has been active for about half a year.

Our director loves the R community; therefore, we take pride in organizing Brighton R, Brighton Py, and Manchester R – with our new ‘Shiny’ editions of London R and Bristol R to come soon, hopefully! Plus, keep your eyes peeled for some huge announcements coming soon. I can’t reveal it just yet. It’s scheduled for next year, and I can barely contain my excitement about it. Our expansion has a lot of momentum, and seeing our community grow is incredibly exciting.

We would like to get to know you more on the personal side. Can you please tell me about yourself? For example, hobbies/interests or anything you want to share about yourself.

My hobbies and interests initially leaned more towards academia, which is surprising to most people, considering I’m now a senior data analyst. I pursued a psychology degree, primarily fascinated by its research, statistics, and coding aspects. This interest led me to conduct a significant research project during my degree, coincidentally during the pandemic. This timing posed challenges, as all my lectures were online and difficult to navigate.

My research project focused on at-home interventions for managing anxiety related to the COVID-19 pandemic. I used the programming language R extensively for this project. Interestingly, during this project, I realized my preference for data and coding over psychology. As a result, I transitioned into the data field, leaving most of the psychology aspects behind.

My path to data science has been unconventional. Unlike many who may go directly from a mathematics or computer science degree into data, I took a unique and somewhat unexpected route from psychology to data science. It’s an odd path, but I’ve embraced it.

How do I Join?

Learn more

Dec 27

Salt Lake City R User Group’s Success Story: Blending In-Person and Online Events

By R Consortium Blog

Last year, Julia Silge, co-organizer of the Salt Lake City R User Group, discussed with the R Consortium the group’s plans to meld in-person and online activities. This year, Andrew Redd, founder of the group, provided an update on the group’s recent and upcoming events. The group has successfully implemented its plan, with online presentations coupled with in-person networking events. Andrew also discussed his work with Veterans Affairs and the trending topics being discussed at the group’s events.

Andrew is a Biostatistician and works as an Assistant Professor at the University of Utah School of Medicine. He also works as a Research WOC at the Department of Veterans Affairs. Andrew is also an R expert for VINCI.

Please share about your background and involvement with the RUGS group.

I was initially introduced to the R programming language during my time in graduate school at Texas A&M University. I was a member of the statistics department there and was working on my PhD. Even though Texas A&M has a close affiliation with Stata, R was the language of choice for that year. I quickly took to the language, as I already had a background in programming in C. I had also done extensive work in Mathematica during my undergraduate studies, as well as in Visual Basic and various other languages. I found R to be a pleasant language to use and am a firm supporter of open source software. As a result, I quickly became proficient in the language. I have since made a career out of working with R and have published a few early packages. One thing that brought me early recognition was my NppToR package, which is still available online. I abandoned this package when RStudio became sufficiently developed to fully utilize the capabilities that I relied on with Notepad++ as my primary editor.

I arrived at the University of Utah in 2010 and founded the Utah R Users Group shortly thereafter. The group was originally called the University of Utah and Salt Lake City R Users Group as it was centered around the university and its users. We later expanded beyond the university and now have members from all over the world. Our meetings where we present material are now fully online. We supplement these meetings with social gatherings that are not centered on presentations. Instead, we meet at local venues such as bars or ice cream shops and simply talk about R. These gatherings allow us to meet new people, see what others are doing with R, and network in a more casual setting.

Would you like to tell us about some recent and upcoming events from your group?

The next meetup we have is in January. January is always a great meetup, as we have a tradition of doing lightning talks as our first meeting of the year. Our lightning talk series aims to highlight our local members. We prioritize our local members and give them five minutes to present an interesting project they have completed, such as a cool analysis or a new package. These presentations are low-stress and brief.

Our February event will focus on package development and the latest developments from RStudio and the R Consortium regarding package development and maintenance. As for recent events, we had an event titled “Slide Crafting with Quarto” in November and another meetup titled “Fairness and Machine Learning” in December. We strive to provide a wide range of topics for all levels of programming experience.

Any techniques you recommend using for planning for or during the event?

Meetup has been extremely beneficial. It did not exist when we founded the group, or at least I was not aware of it when we first organized the R users’ group. Thanks to the R Consortium grant, which pays for the Meetup page, it has proven to be a very useful tool. We are currently in a hybrid format, with all our presentations being held online. We usually host them on Zoom, which simulcasts to YouTube, where the recordings are saved. This allows anyone who wishes to do so to view our previous meetings.

We have achieved significant success, and many of our presentations have garnered a considerable number of views. While these views are not viral by YouTube standards, they are a significant number for our community.

However, I must admit that we have always operated our organization in a manner that differs somewhat from other user groups. We have a unique culture here in Utah that makes it easier for us to meet during the day, whereas most other user groups meet in the evening. We have found a schedule that works for us and have stuck with it. I believe that the most important thing for anyone trying to organize a user group is to find a schedule that works and stick with it.

Please share about a project you are currently working on or have worked on in the past using the R language?

I would like to discuss my work with Veterans Affairs. The Veterans Health Administration has the VA Informatics Computing Infrastructure (VINCI), a secure remote desktop environment for conducting research. With this infrastructure, we have access to all the VA records. Once we have approval and access, we can use tools such as R, SAS, or Stata, along with various other tools, to perform all the data analysis that we need. This is a very useful resource that I have been working with for about 10 years. I am the R expert for VINCI, so I receive a lot of questions regarding R.

Most of the questions asked of us are related to connecting R with databases. In particular, I rely heavily on dbplyr, DBI, and odbc, since the VA is SQL Server based. My compliments to the team behind the DBI and odbc packages, as they have saved me from many difficult situations.
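For readers unfamiliar with this stack, the sketch below shows roughly how the three packages fit together: DBI and odbc open the connection, and dbplyr translates dplyr verbs into SQL Server's dialect. It is not VINCI code. The table, its columns, and the DSN name in the comments are hypothetical, and `simulate_mssql()` stands in for a live connection so the example runs without a database.

```r
library(dplyr)
library(dbplyr)

# With a real server you would connect through DBI and odbc, for example:
#   con    <- DBI::dbConnect(odbc::odbc(), dsn = "my_dsn")  # "my_dsn" is a placeholder
#   visits <- tbl(con, "visits")                            # hypothetical table name
# Here simulate_mssql() stands in for that connection so the sketch runs offline.
visits <- lazy_frame(
  patient_id = c(1L, 2L),
  visit_year = c(2022L, 2023L),
  con = simulate_mssql()
)

# Ordinary dplyr verbs; nothing is computed in R at this point
query <- visits %>%
  filter(visit_year >= 2023L) %>%
  count(patient_id)

# Inspect the SQL that dbplyr would send to SQL Server
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

With a live connection, calling `collect()` on the query would run it on the server and return only the result to R.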

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

Everything is trending towards tidyverse and tidy principles. This has been a trend for several years now, with everything trying to be more uniform in the way it is done. This is done by treating data as data, which I really appreciate. It makes it easier to program and extend.

Our group has also had a lot of topics that are not just about R, but also about statistical analysis. For example, I will point out the meeting on fairness in machine learning, which is a very important topic. If you have biases in your data going into a machine learning model, those biases can easily be propagated through the model. Sensitivity to this is something that we should all be aware of. So, we are not only talking about the hows of programming and how to work with R, but also a lot of best practices for programming. Just because something works does not mean it is necessarily the best way to do it. At least in our group, we have been very mindful of these things.

How do I Join?

Learn more

Dec 19

In-Person Shiny in Production Conference Hosted by North East Data Science Group in Newcastle, UK

By R Consortium Blog

The North East Data Scientists group hosts the annual Shiny in Production Conference. Colin Gillespie shared the details of the event, how it’s grown in its second year, and how planning is proceeding for next year. We talked with Colin a year and a half ago. We wanted to find out about recent activities with the North East Data Science group and ask about the Shiny in Production Conference. Colin also talked about R’s prevailing use in academia and Newcastle’s industry.

Colin holds a PhD in statistics from the University of Strathclyde. He is a senior lecturer at Newcastle University and also CTO of Jumping Rivers. He is the author of the book “Efficient R Programming” published by O’Reilly Media.

Please share about your background and involvement with the RUGS group.

I’m Colin Gillespie, and I’ve been using R for a long time. I started using it in 1999-2000 during my PhD, so I’ve used it for about 23-24 years. After my PhD, I did the usual academic career stuff. Then, in 2016, I co-founded Jumping Rivers, which does R consultancy.

In terms of events, we established the R user group in 2015 or 2016. However, the R user group lasted a few years before we rebranded to North East Data Science, which now covers a mixture of R and other data science topics. Sometimes, the topics are more specific to R, while at other times, they are more general. The benefit of changing the name was that five times more people attended the event. We went from 10 people to perhaps 30 or 40 – a significant increase. These events have been running continuously since 2016. We had a meetup two months ago, and another one is scheduled for January.

Can you share what the R community is like in Newcastle?

Newcastle is a city in the northeast of England. The city has three universities with strong academic involvement in R. All three universities use R for undergraduate teaching. They also use it extensively for postgraduate teaching. In terms of businesses, several government agencies and banks are located in the city, and these organizations also use R. In addition, a few other companies in the city have adopted R.

You hosted the Shiny in Production Conference in October 2023. Why focus on Shiny?

Shiny in Production, 2023 Speakers

The motivation behind starting this conference was that no in-person Shiny conference was being run. There are more generic R and Posit Conferences that are hosted every year. We do a lot of Shiny and did not see an in-person conference with tutorials and talks.

It’s the second time we have hosted this conference. We held it last year in October and held it again this year on the 12th and 13th of October. We have also set dates for Shiny in Production 2024, which will also be in October.

This year, we had a variety of speakers. We had three tutorials: one on Python and Shiny, another on React for Shiny apps, and another on testing Shiny apps. The following day, we had talks, a combination of invited and submitted abstracts. We had over 100 people attend, and they represented a variety of industries, including pharma, banking, insurance, tech startups, academia, and marketing companies.

The talks are recorded, and the recordings are available on YouTube but take a little while before being released. We spend substantial time editing videos and adding subtitles to ensure they are fully accessible. The process takes several weeks to complete.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

For many attendees at the conference, the focus was on improving an existing application. These data scientists had learned a little bit about Shiny and were able to create something quite nice quickly. However, a point is reached where they need to take it to the next level – for example, ensuring that the app deploys consistently and has tests.
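One low-friction way to start adding tests is Shiny's built-in `testServer()`, which exercises server logic without launching a browser. The sketch below is a minimal illustration only; the server function and its input `n` are hypothetical and not taken from any conference talk.

```r
library(shiny)

# Hypothetical server logic: doubles a numeric input
server <- function(input, output, session) {
  doubled <- reactive(input$n * 2)
  output$result <- renderText(doubled())
}

# testServer() evaluates the expression inside the server's environment,
# so reactives and outputs can be checked directly
testServer(server, {
  session$setInputs(n = 5)
  stopifnot(doubled() == 10)
  stopifnot(output$result == "10")
})
```

Checks like these can run in continuous integration, which helps catch regressions before an app is redeployed.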

One area that we have been working on with several companies is making Shiny apps accessible. This involves adhering to the WCAG 2.1 guidelines, which help to ensure accessibility. For example, the guidelines address whether a screen reader can navigate the app and whether keyboard shortcuts can be used. They also cover whether the colors are sensible and have sufficient contrast.
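The contrast question can be checked numerically. The sketch below computes the WCAG 2.1 contrast ratio between two colors in base R; the function names are my own, and the 4.5:1 threshold used in the last line is the level AA requirement for normal-sized text.

```r
# Relative luminance of a color, following the WCAG 2.1 definition
relative_luminance <- function(color) {
  srgb <- grDevices::col2rgb(color)[, 1] / 255
  linear <- ifelse(srgb <= 0.03928, srgb / 12.92, ((srgb + 0.055) / 1.055)^2.4)
  sum(c(0.2126, 0.7152, 0.0722) * linear)
}

# Contrast ratio between two colors, from 1 (identical) to 21 (black on white)
contrast_ratio <- function(foreground, background) {
  lum <- sort(c(relative_luminance(foreground), relative_luminance(background)),
              decreasing = TRUE)
  (lum[1] + 0.05) / (lum[2] + 0.05)
}

contrast_ratio("black", "white")            # 21
contrast_ratio("#777777", "white") >= 4.5   # FALSE: this grey on white falls just short of AA
```

A helper like this could be run over an app's color palette as part of an accessibility review, alongside manual checks with a screen reader and keyboard.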

How do I Join?

Learn more

Dec 18

Scaling R to Use with Enterprise Strength Databases

By R Consortium Blog

This blog is contributed by Mark Hornick, Senior Director, Oracle Machine Learning. Oracle is an R Consortium member.

Hey R users! You rely on R as a powerful language and environment for statistical analysis, data science, and machine learning. You can use R with a database to analyze and manipulate data. R provides various packages and tools for working with databases, allowing you to connect to a database, retrieve data, perform analyses, and even write data back to the database.

Many of you also work with data in, or extracted from, Oracle databases. As data volumes increase, moving data, scaling to those volumes, and deploying R-based solutions in enterprise environments can complicate your life, but Oracle Machine Learning for R can simplify it!

Oracle Machine Learning for R 2.0

With OML4R 2.0, we’ve expanded the set of built-in machine learning algorithms you can use from R with Oracle Database 19c and 21c. Now you also have in-database XGBoost on 21c and Neural Networks, Random Forests, and Exponential Smoothing on 19c and above. In-database algorithms allow you to build models and score data at scale without data movement.

In-database algorithms can now give you explanatory prediction details through the R interface – just like in the SQL and Python APIs. Prediction details allow you to see which predictors contribute most to individual predictions. Simply specify the number of top predictors you want and get their names, values, and weights.

> IRIS <- ore.create(iris, table='IRIS')     # create table from data.frame, return proxy object
> MOD  <- ore.odmRF(Species~., IRIS)         # build random forest model, return model proxy object
> summary(MOD)                               # display model summary details

Call:
ore.odmRF(formula = Species ~ ., data = IRIS)

Settings:
                                               value
clas.max.sup.bins                                 32
clas.weights.balanced                            OFF
odms.details                             odms.enable
odms.missing.value.treatment odms.missing.value.auto
odms.random.seed                                   0
odms.sampling                  odms.sampling.disable
prep.auto                                         ON
rfor.num.trees                                    20
rfor.sampling.ratio                               .5
impurity.metric                        impurity.gini
term.max.depth                                    16
term.minpct.node                                 .05
term.minpct.split                                 .1
term.minrec.node                                  10
term.minrec.split                                 20

Importance:
  ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_IMPORTANCE
1   Petal.Length              <NA>           0.65925265
2    Petal.Width              <NA>           0.68436552
3   Sepal.Length              <NA>           0.19704161
4    Sepal.Width              <NA>           0.09617351

> RESULT <- predict(MOD, IRIS, topN.attrs=3) # generate predictions w/details, return proxy object
> head(RESULT,3)                             # view result
  PREDICTION       NAME_1 VALUE_1 WEIGHT_1      NAME_2 VALUE_2 WEIGHT_2       NAME_3 VALUE_3 WEIGHT_3
1     setosa Petal.Length     1.4    6.717 Petal.Width    .200    5.932 Sepal.Length     5.1     .446
2     setosa Petal.Length     1.4    6.717 Petal.Width    .200    5.932 Sepal.Length     4.9     .446
3     setosa Petal.Length     1.3    6.717 Petal.Width    .200    5.932 Sepal.Length     4.7     .446

Figure 1: Code to build an in-database Random Forest model using a table proxy object and use it for predictions

With datastores, you can store, retrieve, and manage R and OML4R objects in the database. To simplify object management, you can also easily rename existing datastores and all their contained objects. And you can conveniently drop batches of datastore entries based on name patterns. The same holds with the R script repository for managing R functions – load and drop scripts in bulk by name pattern.

> x <- stats::runif(20)                   # create example R objects
> y <- list(a = 1, b = TRUE, c = 'value')
> z <- ore.push(x)                        # create temporary object in the database, return proxy object
> ore.save(x, y, z, name='myDatastore',   # save objects to datastore 'myDatastore' in user's schema
           description = 'my first datastore')
> ds <- ore.datastore()                   # list information about datastores in user's schema
> ore.move(name='myDatastore', newname='myNewDatastore') # rename a datastore
> ore.move(name='myNewDatastore',                        # rename objects within a datastore
           object.names=c('x', 'y'),
           object.newnames=c('x.new', 'y.new'))
> ore.datastoreSummary(name='myNewDatastore') # display datastore content

Figure 2: Code illustrating datastore functionality to store and manage R objects in the database

Some background

OML4R lets you tap into the power of Oracle Database for faster, scalable data exploration, transformation, and analysis using familiar R code. You can run parallelized in-database ML algorithms for modeling and inference without moving data around. And you can store R objects and user-defined functions right in the database for easy sharing and hand-off to application developers. You can even run R code from SQL queries and, on Autonomous Database, REST endpoints. OML4R now uses Oracle R Distribution 4.0.5 based on R 4.0.5.

Last December, as part of the R Consortium’s R/Database webinars, we presented Using R at Scale on Database Data. We highlighted the release of OML4R for Oracle Autonomous Database through the built-in notebook environment. But now you can use the same client package to connect to Autonomous Database, Oracle Database, and Oracle Base Database Service, too.

Getting started

Getting started is easy. Download and install the latest OML4R client package in your R environment and use your favorite IDE. If you’re working with Oracle Autonomous Database, you’re good to go. If you’re using Oracle Database or Oracle Base Database Service, install the OML4R server components as well. Links are below!

Get OML4R 2.0 here for use with Oracle Database and Autonomous Database and try OML4R on your data today. We think you’ll like it!

For more information

Online Workshop: Introduction to Oracle Machine Learning for R on Oracle Database
Online Workshop: Oracle Machine Learning Fundamentals on Oracle Autonomous Database Lab 4 on OML4R
Blog: Oracle Machine Learning for R v2.0 now available for Oracle Database
Blog: OML4R for Autonomous Database now available for Oracle Autonomous Database
Documentation: Oracle Machine Learning for R
Documentation: Oracle Machine Learning for R API Reference
Download: Oracle Machine Learning for R
Download: Oracle R Distribution