CCV·Con

Reproducible Science

Monday – October 28th

Session chairs: Ashok Ragavendran & Isabel Restrepo

3:10 – 3:15

CCV-Con Opening Remarks

Paul Stey

Director of Data Science and Scientific Computing, CCV

3:15 – 3:20

Session Intro and Welcome

Isabel Restrepo

Lead Data Scientist, CCV

3:20 – 3:40

Enhancing Reproducibility using Containers

Bradford Roarr

Senior Web Developer, CIS

An overview of software reproducibility with a focus on containers. Learn to avoid common pitfalls and leverage lesser used features when building containers.

3:40 – 3:50

Lightning Talk: Reproducible bioinformatics with Bioflows

Ashok Ragavendran

Lead Data Scientist, CCV

Reproducibility of computational analysis is fast becoming the norm, and bioinformatics analysis of next-generation sequencing data is no exception. We present bioflows, a tool for reproducible bioinformatics workflows that serves a heterogeneous user base, ranging from basic to advanced computational skills. Workflows are defined through a YAML interface, which allows jobs to be submitted on OSCAR with explicitly defined dependencies. All the tools associated with bioflows are provided as Conda packages, making it a self-contained, easy-to-install environment for running workflows. We will provide a quick overview of the current features.

3:50 – 4:00

Lightning Talk: Tools For Reproducible Analysis

August Guang

Genomics Data Scientist, CCV

We can all agree that reproducible analyses are incredibly important, yet they are not widely practiced. However, computational tools that enable reproducible research while integrating with exploratory data analysis are more widespread and easier to use than ever. I will talk about some of these tools and offer general guidelines for making your research workflow reproducible without slowing it down.

4:00 – 4:10

Lightning Talk: RefChef: Provenance of Genomic reference datasets

Joselynn Wallace

Genomics Data Scientist, CCV

Over the last 18 years, the approximate total cost of sequencing a genome has dropped from $100,000,000 to $1,000, leading to dramatic growth in the field of genomics and its many other ‘omics sequencing offshoots. The increased access to large amounts of sequencing data has created a new set of challenges around best practices for storing, managing, and sharing data. Here, we present RefChef, a reference sequence management system that records the provenance of reference sequences, indices, annotations, and their associated metadata. RefChef uses a master YAML file to store the references’ metadata and the commands used to download and process them, while Git (and optionally GitHub) version control tracks changes to the master YAML file as new references are added. RefChef also lets users view summary tables of available references, either on the command line or hosted on an external website easily accessible to collaborators. RefChef helps researchers adhere to data management plans, makes research more reproducible, and saves valuable time and space on computing resources by facilitating the maintenance of clearly documented, communal reference sequences for research groups.

4:10 – 4:20

Lightning Talk: Visualization and Web Intro

Mary McGrath¹ and Maura Driscoll²

1. Data Scientist, CCV 2. Web Development Intern, CCV

The internet is a powerful medium for storytelling with data (scientific and otherwise), but creating compelling, interactive graphics can be difficult. This talk will walk through a few approaches to building such visualizations and publishing them to the web. We’ll highlight the work of one of our CCV interns who has visualized CCV’s publication data.

4:20 – 4:30

Lightning Talk: Quantifying OCD-related behaviors via behavioral tasks in JavaScript

Nicole Provenza¹ and Fernando Gelin²

1. PhD Candidate - Neuromotion Lab 2. Research Software Engineer, CCV

Quantification of behaviors related to neuropsychiatric disorders presents a challenge for researchers: symptoms are not homogeneous across individuals with the same diagnosis, and they wax and wane over time. Psychophysical tasks provide a controlled setting to probe various behavioral and cognitive states, including reward evaluation, uncertainty, and error-monitoring. However, psychophysical tasks are typically deployed in the lab or the clinic, which makes repeated testing over time difficult. We have developed a code base for deploying behavioral tasks across various platforms (e.g. in clinic, at home, online) to study how behaviors related to obsessive-compulsive disorder (OCD) change over time and vary across individuals.

4:30 – 4:45

Reproducibility in the classroom with JupyterHub and GitHub Classroom: Introduction and Faculty Panel

Carsten Eickhoff¹, David Sheinberg², Isabel Restrepo (Moderator)³

1. Assistant Professor of Medical Science, Assistant Professor of Computer Science 2. Professor of Neuroscience 3. Lead Data Scientist, CCV

JupyterHub provides a convenient way to serve Jupyter Notebooks to multiple users in a pre-configured computing environment, so users do not need to worry about installing any software packages. With the goal of lowering barriers to computing, various teams in CIS have worked together to make JupyterHub available for courses and workshops at Brown. In this talk, we will briefly introduce the service, its history, and its roadmap. We will follow with a panel to learn directly from faculty about their experience using JupyterHub in the classroom.

4:45 – 5:10

CCV Showcase

Thomas Serre¹, Paul Stey², Linnea Wolfe³, Jill Pipher⁴

1. Faculty Director, CCV 2. Director of Data Science and Scientific Computing, CCV 3. Director of Operations, CCV 4. Elisha Benjamin Andrews Professor of Mathematics, Vice President for Research Faculty Provost's Office

We will discuss CCV’s role in research at Brown and the new services and resources we provide.

5:10 – 7:00

Faculty Reception

We invite all faculty to attend a meet and greet with CCV staff at the Carney Institute for Brain Science (164 Angell St, 4th Floor).