Mining Software Repositories (MSR) is an established research approach for extracting generalizable knowledge from code-hosting platforms (e.g., GitHub and GitLab) and associated tools (e.g., issue trackers such as Jira, and mailing lists). Because MSR studies operate at scale (e.g., investigating thousands of repositories), tooling is central to any method: it reduces an already very time-consuming set of tasks. However, reusing tools across studies is not trivial and seldom happens in practice.
This project aims to build an MSR data collection infrastructure that can evolve and scale with minimal interruption. Although not a requirement per se, the project originated from the idea of leveraging recent technologies such as GrimoireLab.
Ideally, the intended requirements for this infrastructure include: