Paul Stewart of the Moffitt Cancer Center Bio-Data Club talked to the R Consortium about the group’s experience of shifting events online. Paul shared that despite his initial concerns they had a smooth transition to online events. He also shared some of the techniques he uses to keep the meetings inclusive and understandable for all the participants.
Can you tell us about your professional background?
I am an Assistant Member at the Moffitt Cancer Center in Tampa, Florida, which is the equivalent of an Assistant Professor. Moffitt is a non-profit cancer hospital and research institute. I work in the Department of Biostatistics and Bioinformatics, and my research is in bioinformatics. To put it simply, we use big molecular data to profile and understand cancer. We try to figure out what these tumors are doing and what the potential treatments are based on the molecular profile of individual patients, an approach also known as “personalized medicine”. My expertise, in particular, is in computational proteomics and metabolomics. People may have heard of the genome or genomics, which is the field that studies all of the genes expressed by an organism. Genes are the blueprints for the proteins that function inside all the cells in our bodies. Proteomics is the field that studies all of the proteins expressed by an organism. In practice, this means the big data and the analyses that go into profiling proteins. Similarly, in metabolomics you are looking at metabolites, small molecules in, say, blood and urine that can be biomarkers for the early detection of cancer.
How did you start this group? And how has your experience been so far?
I was introduced to the R Consortium several years ago through the Tampa Bay R Users Group. Our group is called the Moffitt Cancer Center Bio-Data Club and even though there is a Moffitt name in the title, we are open to the public. And we have people coming and joining us virtually from all over, all the time.
We started our group in 2018 and we have around 30-50 people attending our meetups. We also hold an annual hackathon which is very successful. There has been no shortage of speakers who want to give these “Hello World” talks that the audience can understand even if they are not hardcore programmers or statisticians. At Moffitt, we work at the intersection of cancer biology, computer science, statistics, machine learning, and mathematics. It is a challenge to be an expert in all of these topics at the same time, so having this venue has been really helpful.
I think we have done a good job keeping our group accessible given the wide range of backgrounds that we have. Attendees include lab technicians, data analysts, epidemiologists, medical doctors, bioinformaticians, and statisticians. Our meetings are designed such that everybody can benefit from them even if they don’t have a really great programming background.
What is the R community like in Tampa? Can you name a few industries using R in Tampa?
The R community in Tampa is great and constantly growing. As an academic, I tend to hang in academic circles, and locally a lot of the interactions I have are with colleagues working at other universities like the University of South Florida here in Tampa or the University of Florida in Gainesville to our north.
Besides the academic circles, it feels like the pandemic really helped to attract more data scientists, developers, and IT folks to Tampa. People realized that you can work from anywhere and Tampa is a great place to live. The weather is great, the beach is nearby, and Disney World is a short car drive away in Orlando. So now we have people from big tech companies like Microsoft and Google all the way down to smaller companies.
We have a robust R and data science community at the Moffitt Cancer Center thanks to our organizational structure. We were one of the first cancer centers to create a Division of Quantitative Sciences led by a VP-level data scientist (Dr. Dana Rollison), and our Division now includes my department (led by Dr. Brooke Fridley, a brilliant biostatistician and data scientist), the Department of Machine Learning, and the Integrated Mathematical Oncology Department. On the hospital side, they use data science for business intelligence and for guiding operational decisions. Many companies in the area are hiring in data science and machine learning, and as a quick plug, Moffitt is no exception. Right now we are looking for postdoctoral fellows for our Integrated Program in Cancer and Data Science (ICADS) as well as a Vice Chair for my department.
How has COVID affected your ability to connect with members?
At first, it was a bit of a struggle to get a feel for online meetings. Overall, I think we did a good job transitioning to virtual only. Initially, I didn’t really like the idea and felt that online meetings would be a poor substitute for in-person meetings, but now I totally see the benefit.
You are able to connect from anywhere in the world, and it’s very easy for the speakers to share their screens. During in-person meetings, speakers often had to use an unfamiliar computer and mostly shared just screenshots or code snippets. So even though our presentations are tutorial-focused, they were not truly interactive. With the transition to online, it is much easier to move from introductory slides to the actual tool, library, or package and show how it works. Now it is trivial to run some commands and show the group the output live.
On the audio-visual side too, I was worried that people would have trouble hearing. But we didn’t have any audio-visual issues during the meetings. I also had some apprehensions about participation and felt people would not attend because that sense of community would be missing in online meetings. But the numbers really didn’t take a hit, and with the amazing advances in software like Zoom, it has been great.
In a Zoom with 30-40 people, I understand it can be a bit intimidating to unmute yourself and ask questions. To overcome this, I try to help the audience by being the conversation facilitator, providing references to things the audience might be familiar with, or asking the speaker to explain a bit more. I think these efforts have helped the audience get more involved. Going online-only has certainly been a bit of a learning experience, but I think overall it’s been good. And for the time being, just because the logistics are much easier, we are keeping the meetings online for the near term.
In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?
For video conferencing, we have been using Zoom. We also use the messaging feature of Zoom a lot as it’s really easy to message other club members at Moffitt. We were using GitHub fairly extensively even prior to the pandemic, so the main change during the pandemic has been relying more on Zoom and on some of its features, like the option to annotate and draw on the screen.
We wanted to keep our club as open and accessible as possible from the beginning. So for every meeting, the accompanying slides and the links to the presenter go on GitHub. We do have an archive of all the meetings, and it is not something new for us.
A lot of the meetings are hosted at Moffitt, so if there is an internal speaker we don’t release a recording of that session to the public. We do have recordings, but they are password protected for privacy reasons. We do provide access to our members if they have a question or are confused. Due to these privacy issues, we are still working on setting up a YouTube channel.
I think we will definitely keep using these techniques in the future. We have had some amazing speakers from around the globe. Our two most recent speakers were Olivier Teytaud, the developer of the Python-based Nevergrad gradient-free optimization platform, and Zuguang Gu, author of the ComplexHeatmap Bioconductor package. These technologies also make it much easier for members to connect from anywhere, and they don’t have to deal with traffic and parking to get to Moffitt during the workday. This has definitely been helpful for people, and I think it has helped our numbers.
Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
My favorite presentation most recently was by Zuguang Gu, author of the ComplexHeatmap package. This package is an amazing data visualization tool. It allows you to take some data frame or matrix and, with a couple of lines of code, turn it into a publication-ready heatmap.
A lot of data I work with is high-dimensional, and there is often related information like clinical features, gene mutations, etc., and this software allows you to add these as annotations to the heatmap very easily. You can even have a plot on top of your plot. It is amazing software, and it’s been a very helpful tool in my work. He’s been developing this for years, and I highly recommend this package for heatmaps.
What trends do you see in R language affecting your organization over the next year?
I think it will be moving more and more into the tidyverse. I am slightly more old school as I learned R more than 10 years ago, and I still do a lot of work in base R. I am slowly working my way into some of the tidyverse packages. stringr is currently my favorite. I think I and others need to get on board because the tidyverse does make a number of data processing and transformation steps much easier.
Machine learning is also increasingly in demand in the field, and a lot of that work is done in Python. But I think there are some packages and libraries enabling efficient machine learning to be done within R. I think there will be more development in this area and there will be better ways for R to interact with other programming languages.
Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?
I think the New York Times has some wonderful examples of data journalism and they set a high bar with their data visualizations. We can learn a lot from them on how to take all these numbers and transform them into understandable results so that readers from all backgrounds can understand them.
Of the Funded Projects by the R Consortium, do you have a favorite project? Why is it your favorite?
I am a big fan of outreach and education (part of the reason why I organize our club), so my favorite funded project is “Setting up an R-Girls-Schools Network”.
Of the Active Working Groups, which is your favorite? Why is it your favorite?
My favorite active working group is the “R7 Package”. I really like the idea of a modernized successor to S3 and S4.
When is your next event? Please give details!
We meet on the third Thursday of the month at 3 pm New York time. We just had our annual hackathon, and I am up against the holidays, so I am still in the middle of arranging a speaker for next month. If you are interested in sharing a package you authored (or even a package that you like using), then please reach out to me.
How do I join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!