We are excited to share the newly released SoFAIR Workflow documentation: A workflow for the management of the software assets lifecycle (guide for developers). The SoFAIR workflow is a specification for developing an AI-assisted service that identifies and tracks research software. It consists of three key parts: 1) AI-assisted identification and extraction of software mentions from research manuscripts (facilitated by CORE), 2) validation of those mentions with the publication’s authors (facilitated by an open repository) and 3) archival of the software asset (facilitated by Software Heritage) to support its enduring availability as part of the academic record. The documentation is a living repository that will evolve as the workflow is implemented.
The SoFAIR workflow (Figure 1) depicts how services, tools and infrastructures work together to facilitate the identification, validation and archival of research software mentioned in research manuscripts. It consists of the following 8 steps: An author deposits a piece of research software in a code repository [1]. The author then deposits a manuscript that contains either explicit or implicit mentions of that software [2]. CORE then harvests the research paper from the repository and extracts software mentions from its full text [3]. Via the CORE Repository Dashboard, a request to validate the extracted mentions is made available to the repository [4] and, with the authorisation of the repository manager, routed to the author (e.g. by means of an email notification), who validates this request [5]. Once the mentions are validated, the repository issues an asset registration request to Software Heritage [6], which permanently archives the new software asset [7], issues a permanent identifier for it and sends this identifier back to the repository [8].
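To make the order of the steps concrete, here is a minimal sketch that models steps 3 to 8 as a strictly linear sequence of stages for a single paper. The names `Stage` and `next_stage` are hypothetical and used only for illustration; they are not part of the SoFAIR specification, and a real implementation would also need to handle cases such as an author declining a request.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages; each value is the workflow step number it mirrors."""
    MENTIONS_EXTRACTED = 3    # CORE harvests the paper and extracts mentions
    VALIDATION_REQUESTED = 4  # request made available via the CORE Repository Dashboard
    AUTHOR_VALIDATED = 5      # author confirms the extracted mentions
    ARCHIVE_REQUESTED = 6     # repository sends a registration request to Software Heritage
    ARCHIVED = 7              # Software Heritage archives the software asset
    IDENTIFIER_RETURNED = 8   # permanent identifier sent back to the repository


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current` (assumes a strictly linear flow)."""
    if current is Stage.IDENTIFIER_RETURNED:
        raise ValueError("workflow already complete")
    return Stage(current.value + 1)


# Walk one paper through the whole sequence.
stage = Stage.MENTIONS_EXTRACTED
history = [stage]
while stage is not Stage.IDENTIFIER_RETURNED:
    stage = next_stage(stage)
    history.append(stage)
```

After the loop, `history` holds the six stages in order, from extraction to the returned identifier.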
More detailed technical description:
1. Software development: A research team develops software on platforms like GitHub or GitLab, maintaining a dedicated code repository linked to their article.
2. Manuscript and metadata deposition: The corresponding author deposits the manuscript and its metadata into Open Access repositories. This may include a moderation process to ensure uniqueness and cohesion.
3. Indexing by CORE: The Open Access repository metadata is indexed by CORE using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting), making the resources discoverable on a larger scale.
4. Processing by CORE Systems: CORE processes the metadata and manuscript, extracting software mentions with machine learning models, and returns a TEI/XML file with these mentions.
5. Validation module: CORE sends the TEI/XML file and manuscript to a validation module, which sends a validation request to the Open Access repository.
6. Validation by the corresponding author: The Open Access repository notifies the corresponding author of the validation request, allowing them to confirm the accuracy of the software mentions.
7. Archival through Software Heritage: After validation, the Open Access repository archives the software through Software Heritage, which assigns a SoftWare Hash IDentifier (SWHID).
8. Exposing the SWHID: The SWHID is exposed by the Open Access repository, making the software discoverable and citable, enhancing the visibility of the research team’s contributions.
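Two of the steps above rely on public formats that are easy to illustrate. The sketch below builds an OAI-PMH `ListRecords` request URL of the kind a harvester could send in step 3, and checks that an identifier received in steps 7 and 8 has the shape of a core SWHID (`swh:1:<type>:<40 hex digits>`). The function names, the base URL `https://repository.example.org/oai` and the all-zero hash are assumptions made for illustration; this is not CORE's or Software Heritage's actual code, and SWHID qualifiers (the optional `;origin=...` suffixes) are deliberately not handled.

```python
import re
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Core SWHID: scheme, version 1, object type, 40 lowercase hex digits.
SWHID_CORE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


def parse_swhid(swhid: str) -> tuple[str, str]:
    """Return (object_type, hex_hash) or raise ValueError if malformed."""
    match = SWHID_CORE.fullmatch(swhid)
    if match is None:
        raise ValueError(f"not a core SWHID: {swhid!r}")
    return match.group(1), match.group(2)


# Hypothetical endpoint and placeholder identifier, for illustration only.
url = list_records_url("https://repository.example.org/oai")
object_type, hex_hash = parse_swhid("swh:1:dir:" + "0" * 40)
```

Here `url` is `https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc`, and `object_type` is `dir`, the type used for a source code directory. A repository could run a check like `parse_swhid` before exposing an identifier in its metadata.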
To learn more and provide feedback:
- Register for the webinar here
- Read the full documentation
We invite the community (researchers, developers, repository managers, and open science advocates) to contribute to the SoFAIR initiative and help us refine and expand this workflow. Your input and feedback are crucial in ensuring that our tools and processes meet the diverse needs of the scholarly community. Whether through sharing your experiences, suggesting improvements, or collaborating on new how-to guides, your contributions will drive the advancement of research software management and enhance the visibility and reproducibility of scientific work. Let’s work together to make research software a first-class citizen in the academic world!
You can always reach us through https://sofair.org/contact.