Dynamic collaboration

SORTEE H04 - Promoting the use of Github in EvoEco

2021-10-24

tags: Hackathon Github Ecology Evolution Additional resources
Session organizers:
Pedro Henrique P. Braga\(^{-*}\) and Katherine Hébert \(^{=*}\)

badge badge badge badge

\(^-\) Ph.D. Candidate in Biology at Concordia University, Montréal, Canada.
\(^=\) Ph.D. Candidate in Biology at Université de Sherbrooke, Sherbrooke, Canada.
\(^*\) Equally contributed.

Material status and resources:

hackmd-github-sync-badge

Welcome!

We are both coordinators of a student-run, and student-oriented series of R workshops for Ecology and Biodiversity Science: the Québec Centre for Biodiversity Science R Workshop Series (GitHub here, if you are curious!).

These workshops are updated annually by students and postdoctoral fellows, in two languages (English and French), according to participant feedback we receive each year. This means there are a lot of moving parts, and GitHub helps us track and manage this type of dynamic collaboration in several ways.

We are excited to talk about some of the tools we use for the workshop series, and how both of us and other people use them for research in ecology and evolutionary biology, and beyond!

We will show and discuss some of the available resources to manage collaborations dynamically, with transparency and traceability.

  • Managing teams that work on several projects (or repositories) via GitHub Organisations.
  • Collecting, addressing, and tracking feedback and contributions through issues and pull requests.
  • Discussing topics relevant to research development using Github Discussions.
  • Tracking and showcasing contributions to projects with GitHub Insights.
  • Using GitHub Actions to apply continuous integration during document development to safely add and test contributed changes from many users (who use different systems).

What do we need to prepare for this session?

  1. Sign up for a Github account;
  2. (Optional) Github Desktop.

Part 1: Managing collaborations

Collaborations are rarely static: when people work on projects together, it is a continuously developing process of contributing ideas, giving and getting feedback, and addressing this feedback with changes until a goal is achieved, whether it’s a manuscript, a tutorial, a package, etc.

These different steps often overlap in some way across contributors, meaning people are actively changing the project at the same time.

The dynamic nature of these changes makes it very hard (and very confusing!) to work on a document that gets passed around the group, especially when versions are tracked by updating a file name with initials or a date.

This is where GitHub comes in!

GitHub Organisation

  • Collaborate with a team (or several teams) on a larger project with several repositories.

    • Great for big projects that involve several outputs or sub-projects, so everyone can see each others work without necessarily writing to the same repository
    • Teams can help to assign roles to groups of people, to keep everyone’s goals clear
    • Each team can have different access to certain repositories

    For example:

Issues and pull-requests

Issues allow authors to get user/community feedback to keep their code or document(s) up to date with the literature, address issues, add features that other users want to use for their research.

  • Issues are public (on a public repository), so these conversations are transparent.
  • Issues also allow tracking of questions, bug reporting, suggestions and feature requests, which are often overlooked as contributions.
  • You can also assign specific people to address issues as they arise, which helps to manage who works on what.

Pull requests allow for contributions from within the team or from other contributors (in a traceable way).

  • Issues and pull requests are easily linked to each other (e.g. “This PR addresses #1”).
  • Pull requests are checked before including the changes into the “main” version of the project, to avoid “breaking” things accidentally. You can assign certain people to validate pull requests before they can be merged into the project.
  • This opens the door to external contributors, who can submit code (or writing).

Issues and pull-requests are at the heart of GitHub’s appeal for collaboration and open science.

For our workshop series, we use issues to invite feedback from our community, and to create open task lists for invited developers to address when they update workshop material.
For example: https://github.com/QCBSRworkshops/workshop08/issues.

Developers then make changes to the material, submit a pull request, which is then reviewed by the coordination team. This allows us to approve the changes, or give feedback to address any issues in the changes.
For example: https://github.com/QCBSRworkshops/workshop08/pull/9.

Issues and feature requests are often used in ecology and evolutionary biology research, in particular for packages. A few examples below:

Hands-on exercise (15 mins):

This exercise’s objective is to get a feel for how issues and pull requests work to allow for many changes to occur, even when timelines overlap.

Prompt: This document is public, and open to contributions - from you! If you don’t have a GitHub account, feel free to make one. You will need an account to write an issue or make a pull-request.

  • Option 1: Open an issue in our repository to make a suggestion about the following passage, or about anything in the document.

  • Option 2: Correct errors or add to the following passage, and submit a pull request.

  • Option 3: There is a small list of resources at the end of this document. If you know of interesting resources about this topic, add one via a pull request, or open an issue to let us know which resource should be added!

Here is a passage with some errors:

This hakkathon has shown me that gitHUB is noT useful at all for collaborrations. It is especially NOT useful for the following things: .

You can find this information in an issue on our repository.

Part 2: Tracking (and showcasing) collaborations

GitHub Insights

GitHub has several built-in display functions to track and see contributions from different users.

You can also find tools for automatically showcasing contributions

Github Discussions (beta)

This is a new tool that is being introduced to GitHub, which is intended as a way to keep broader conversations open. For example, people can discuss a project at a conceptual level, have more in-depth discussions about what outputs mean (or don’t mean), and more. This is another way of tracking and showcasing contributions outside of code or writing, which are often left out because they happen in private email chains or are simply hard to keep track of.

You can take a peek on the Discussions tab for this repository. But, here are some examples of Github Discussions:

Part 3: Integrating contributions dynamically

Continuous Integration through GitHub Actions

Continuous integration (CI) is a practice where developers establish a consistent and automated way to build, package, and test applications and thus integrate their code changes early and often to the main branch or code repository.

Committing code more often detects errors sooner and reduces the amount of code a developer needs to debug when finding the source of an error.

Github Actions in a nutshell

Github Actions can help with task automation within your research development life cycle.

Its basic workflow is consisted of events that trigger jobs, which are sent to runners and have specific steps with actions it must complete.

Let us begin with the following example!

The instructions to run the above workflow is set in a .yaml file stored within the .github/workflow directory of your Github repository.

The YAML file for the above workflow can be written like this:

on: [push]

name: learn-github-actions

jobs:
  check-bats-version:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2
        
      - name: Install node
        uses: actions/setup-node@v1
      
      - name: Install bats
        run: npm install -g bats
        
      - name: Run bats  
        run: bats -v

Specifically, the workflow can be detailed as:

  1. name sections: which provide names to the entire workflow and its steps;

  2. on: indicates the events that trigger this workflow. In this case, it will run whenever one pushes a commit to the repository;

  3. jobs: defines the jobs that will be run when trigger by the on: event. In this case, check-bats-version (a user-defined name) runs-on the latest version of the Ubuntu OS;

  4. steps: contains all actions that will be executed by the job. Our case contains four actions, two uses: predefined Github Actions, and two run:s commands that are executed directly in the OS terminal;

    1. The first step uses: the action hosted at actions/checkout on the branch or tag v2. This action allows one to check out the repository so the other actions can access and work at your repository. We can see more about this action here;
    2. The second uses: the action hosted at the actions/setup-node repository on the v1 branch or tag. This action installs and set up Node.js runtime environment within the runner;
    3. The third step run: a command that will install bats (which stands for Bash Automated Testing System);
    4. The final fourth step run: the command bats -v to check the version of bats.

We have stored this action within the .github/workflow directory of our Github repository.

Status of a Github Action workflow

Implemented Github Actions workflows can be found within the Actions directory of your Github repository.

For instance, the action workflow learn-github-actions that contains check-bats-version can be found at SORTEE-Hackathon-Dynamic-Collaboration/actions.

We can also see obtain badges for our action: learn-github-actions

Advanced examples

Example #1: This .Rmd document

To produce this document, we first have written it in RMarkdown and then converted it into HTML using the parsers knitr and pandoc.

Markdown is a type of light-weight markup language, where instead of editing text with What-You-See-Is-What-You-Get software (e.g. Microsoft Word, Google Docs), we combine data, code, and narrative in a single file and add formatting elements using plain-text.

For instance, the above text is written as:

Markdown is a type of light-weight markup language, where instead of editing text with 
[**W**hat-**Y**ou-**S**ee-**I**s-**W**hat-**Y**ou-**G**et](https://en.wikipedia.org/wiki/WYSIWYG) 
software (e.g. Microsoft Word, Google Docs), we combine *data*, *code*, and
*narrative* in a single file and add formatting elements using plain-text. 

With this, each of the components of your project can be tied together and easily re-run when data are updated or changes need to be made to other steps in the research workflow.

Alternatives to RMarkdown exist in several languages, including Weave.jl and Python-Markdown. For simplicity, we will use R here.

To obtain the HTML file we are looking at, we are required to render our RMarkdown document using the R function called rmarkdown::render().

Every time we make a change to the dynamic_collaboration_material.Rmd document, we must go to our directory docs/ and run rmarkdown::render() in our R console:

rmarkdown::render("dynamic_collaboration_material.Rmd")

This will create a .html file within /docs/ called dynamic_collaboration_material.html. However, this will only work if we have all packages required to do it, including rmarkdown and rmdformats (which contains the downcute theme)!

Requirement #1: Any user that would like to work on this document needs to have all required libraries and software installed within their computer, including R, (usually) RStudio, and the libraries rmarkdown and rmdformats, at least;

Requirement #2: To ensure a standardized output, all collaborating users (in this case, Katherine and Pedro) must have the same versions of the packages and software;

Requirement #3+: Discuss!

To contour these requirements, we created a Github Action workflow named render-rmd-material.yaml that will add some continuous integration to our document development process!

Let us see what our action does!

Example #2: R Packages and Python Libraries

We can bring this to a more academic and research-oriented application!

Take a look into the Github Actions workflow for the PyEI Python library to perform Ecological Inference!

Take a look into the Github Actions workflow for the ade4 (Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences) R package!

Hands-on-exercise #2: Modifying a Github Action! (~10 mins)

It is fun-time #2!

We have compiled data on penguins inhabiting the Palmer archipelago (Antarctica) and we created this document (open it!) to report the results of our analysis!