For users to find and understand your software, documentation that describes its function needs to be accessible to both people and search engines. Structured and meaningful documentation directly supports the reuse and citation of your software.
How?
Pick one or more of the following options (#1-5) for documenting your tool or workflow. These options are listed in order of increasing complexity and completeness.
1. Ontology terms and tags
Ontologies are standardised dictionaries for specific domains, and make sure that researchers are using the same names to describe the same concepts. EDAM is a good choice for bioscience (Ison et al., 2013). It is under constant development and is used by registries (e.g. bio.tools (Ison et al., 2019), WorkflowHub (Goble et al., 2021)).
If EDAM is not suitable, you can use FAIRsharing to locate additional ontologies (Sansone et al., 2019). You can also add custom tags or keywords to your software. Terms commonly used in your research community or domain would be a good start.
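As a rough illustration of what ontology terms and tags can look like in practice, the sketch below annotates an imaginary alignment tool with EDAM term IDs and free-text keywords. The tool name, the record layout and the helper `to_uris` are hypothetical. The term IDs are given as examples and should be checked in the EDAM browser before reuse. The ID pattern assumes the usual four-digit EDAM numbering.

```python
import re

# Base URI used by EDAM concept identifiers.
EDAM_BASE = "http://edamontology.org/"

# Assumed shape of an EDAM term ID: a branch name plus four digits.
EDAM_ID = re.compile(r"^(topic|operation|data|format)_\d{4}$")

# Hypothetical annotation record for an imaginary tool.
annotation = {
    "name": "example-aligner",          # placeholder tool name
    "topics": ["topic_0080"],           # Sequence analysis
    "operations": ["operation_0292"],   # Sequence alignment
    "inputs": ["format_1930"],          # FASTQ
    "outputs": ["format_2572"],         # BAM
    "keywords": ["short reads", "genomics"],  # custom community tags
}


def to_uris(record):
    """Expand the EDAM term IDs in a record into full concept URIs."""
    uris = []
    for field in ("topics", "operations", "inputs", "outputs"):
        for term_id in record.get(field, []):
            if not EDAM_ID.match(term_id):
                raise ValueError(f"not an EDAM-style ID: {term_id!r}")
            uris.append(EDAM_BASE + term_id)
    return uris


if __name__ == "__main__":
    for uri in to_uris(annotation):
        print(uri)
```

Keeping the ontology terms separate from the free-text keywords makes it easier to validate the former while leaving the latter open to community vocabulary.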
2. Standard metadata files
The next level of annotation is to include metadata files alongside your software that are both human- and machine-readable. Good options include codemeta.json (Jones et al., 2017) and CITATION.cff (Druskat et al., 2021) files.
3. Long form text descriptions
Additional long form descriptions can be added directly to websites, platforms or registries where your software can be accessed (Lee, 2018; Hermann & Fehr, 2022).
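One way to keep a long form description consistent with the metadata files from option 2 is to generate both from a single record. The sketch below is a minimal example under that assumption: the record, its values (including the all-zero ORCID and the example.org URL) and the helper names are placeholders, and only a small subset of the CodeMeta and Citation File Format fields is shown.

```python
import json
from pathlib import Path

# Hypothetical software record; every value is a placeholder.
record = {
    "name": "example-aligner",
    "version": "0.1.0",
    "description": "Aligns short sequencing reads to a reference genome.",
    "given_name": "Ada",
    "family_name": "Example",
    "orcid": "https://orcid.org/0000-0000-0000-0000",
    "repository": "https://example.org/example-aligner",
}


def build_codemeta(rec):
    """Return a small CodeMeta 2.0 document as a dictionary."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": rec["name"],
        "version": rec["version"],
        "description": rec["description"],
        "codeRepository": rec["repository"],
        "author": [
            {
                "@type": "Person",
                "@id": rec["orcid"],
                "givenName": rec["given_name"],
                "familyName": rec["family_name"],
            }
        ],
    }


def build_citation_cff(rec):
    """Return CITATION.cff text; json.dumps supplies YAML-safe double quoting."""
    q = json.dumps
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(rec['name'])}",
        f"version: {q(rec['version'])}",
        f"abstract: {q(rec['description'])}",
        f"repository-code: {q(rec['repository'])}",
        "authors:",
        f"  - family-names: {q(rec['family_name'])}",
        f"    given-names: {q(rec['given_name'])}",
        f"    orcid: {q(rec['orcid'])}",
    ]
    return "\n".join(lines) + "\n"


def write_metadata(rec, out_dir):
    """Write codemeta.json and CITATION.cff into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "codemeta.json").write_text(json.dumps(build_codemeta(rec), indent=2) + "\n")
    (out / "CITATION.cff").write_text(build_citation_cff(rec))


if __name__ == "__main__":
    print(json.dumps(build_codemeta(record), indent=2))
    print(build_citation_cff(record))
```

Calling `write_metadata(record, ".")` from the repository root would place both files next to the code. The same description text can then be pasted into registry entries so that all copies stay in step.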
4. Complete documentation sets
Long form text descriptions, which should cover purpose, scope and requirements, can be extended to create complete documentation (covered well in this Ten Simple Rules article (Lee, 2018)).
- Reuse a template that is used in your community. Did you know that by using a blank community template (e.g. nf-core "Creating a new pipeline", Australian BioCommons documentation guidelines) you can address almost all of the steps described here?
- Build a README template using readme.so (Oelsner, n.d.).
- Use automated documentation tools, as described in rule number eight of Ten Simple Rules for documenting scientific software (Lee, 2018).
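To give a feel for what automated documentation means, the sketch below builds a small Markdown API reference directly from function signatures and docstrings using only the Python standard library. The two example functions and the helper `render_api_reference` are hypothetical, and dedicated documentation generators do considerably more than this.

```python
import inspect


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(sequence.upper()))


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def render_api_reference(functions):
    """Build Markdown text from each function's signature and docstring."""
    lines = ["## API reference", ""]
    for func in functions:
        signature = inspect.signature(func)
        doc = inspect.getdoc(func) or "No description yet."
        lines.append(f"### `{func.__name__}{signature}`")
        lines.append("")
        lines.append(doc)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_api_reference([reverse_complement, gc_content]))
```

Because the reference is generated from the code itself, it stays current as long as the docstrings are maintained alongside the functions they describe.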
5. Software publication
The final, most time-consuming, and arguably most valuable avenue is to document your software in a peer-reviewed journal publication (Romano & Moore, 2020). Aside from documentation, the literature is the most obvious place to look for software purpose and function, as publications explain the context in which the software was created. If you’re not sure where to publish, a list of target journals is available from the Software Sustainability Institute (Chue Hong, n.d.). When publishing, don’t forget to use your ORCID to identify yourself as an author (see step #6).
Check FAIRness
- FAIRsoft Evaluator (Del Pico et al., 2022)
- Self-assessment for FAIR research software (Spaaks et al., n.d.)
- FAIRshare - the FAIR-BioRS guidelines appear in the manuscript (Patel et al., 2023)
- FAIR software recommendations (“FAIR Research Software,” n.d.)
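The tools listed above carry out much fuller assessments. As a quick local check before using them, a script along the following lines could confirm that the documentation files discussed in this section are present in a repository. It is not part of any of the listed tools, and the checklist contents and function name are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical checklist: label -> file names that would satisfy it.
EXPECTED_FILES = {
    "README": ("README", "README.md", "README.rst", "README.txt"),
    "citation metadata": ("CITATION.cff",),
    "CodeMeta metadata": ("codemeta.json",),
}


def check_repository(repo_dir):
    """Report, per checklist label, whether a matching file exists."""
    repo = Path(repo_dir)
    return {
        label: any((repo / name).is_file() for name in candidates)
        for label, candidates in EXPECTED_FILES.items()
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that contains only a README.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "README.md").write_text("# example-aligner\n")
        for label, present in check_repository(tmp).items():
            print(f"{label}: {'found' if present else 'missing'}")
```

A check like this only confirms that files exist. Whether their content is meaningful still needs one of the assessments above or a human reader.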
Examples
We have included examples here that follow one or more annotation and documentation best practices:
- PacBio HiFi genome assembly using hifiasm workflow (Price & Farquharson, 2022).
- rnaseq nf-core workflow (Harshil Patel et al., 2023).
- bio-cwl-tools collection: includes tools and workflows that embed metadata into the CWL code. For example, Kraken2 (Wood et al., 2019) in this collection uses Schema.org (Guha et al., 2015) metadata annotations.
References
- Ison, J., Kalaš, M., Jonassen, I., Bolser, D., Uludag, M., McWilliam, H., Malone, J., Lopez, R., Pettifer, S., & Rice, P. (2013). EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics, 29(10), 1325–1332. https://doi.org/10.1093/bioinformatics/btt113
- Ison, J., Ienasescu, H., Chmura, P., Rydza, E., Ménager, H., Kalaš, M., Schwämmle, V., Grüning, B., Beard, N., Lopez, R., Duvaud, S., Stockinger, H., Persson, B., Vařeková, R. S., Raček, T., Vondrášek, J., Peterson, H., Salumets, A., Jonassen, I., … Brunak, S. (2019). The bio.tools registry of software tools and data resources for the life sciences. Genome Biology, 20(1), 164. https://doi.org/10.1186/s13059-019-1772-6
- Goble, C., Soiland-Reyes, S., Bacall, F., Owen, S., Williams, A., Eguinoa, I., Droesbeke, B., Leo, S., Pireddu, L., Rodríguez-Navas, L., Fernández, J. M., Capella-Gutierrez, S., Ménager, H., Grüning, B., Serrano-Solano, B., Ewels, P., & Coppens, F. (2021). Implementing FAIR Digital Objects in the EOSC-Life Workflow Collaboratory. https://doi.org/10.5281/ZENODO.4605654
- Sansone, S.-A., McQuilton, P., Rocca-Serra, P., Gonzalez-Beltran, A., Izzo, M., Lister, A. L., Thurston, M., & the FAIRsharing Community. (2019). FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology, 37(4), 358–367. https://doi.org/10.1038/s41587-019-0080-8
- Druskat, S., Spaaks, J. H., Chue Hong, N., Haines, R., Baker, J., Bliven, S., Willighagen, E., Pérez-Suárez, D., & Konovalov, A. (2021). Citation File Format. https://doi.org/10.5281/ZENODO.5171937
- Jones, M. B., Boettiger, C., Mayes, A. C., Smith, A., Slaughter, P., Niemeyer, K., Gil, Y., Fenner, M., Nowak, K., Hahnel, M., Coy, L., Allen, A., Crosas, M., Sands, A., Hong, N. C., Cruse, P., Katz, D., & Goble, C. (2017). CodeMeta: an exchange schema for software metadata. KNB Data Repository. https://doi.org/10.5063/SCHEMA/CODEMETA-2.0
- Lee, B. D. (2018). Ten simple rules for documenting scientific software. PLOS Computational Biology, 14(12), e1006561. https://doi.org/10.1371/journal.pcbi.1006561
- Hermann, S., & Fehr, J. (2022). Documenting research software in engineering science. Scientific Reports, 12(1), 6567. https://doi.org/10.1038/s41598-022-10376-9
- Oelsner, K. (n.d.). readme.so. Retrieved October 27, 2023, from https://readme.so/
- Romano, J. D., & Moore, J. H. (2020). Ten simple rules for writing a paper about scientific software. PLOS Computational Biology, 16(11), e1008390. https://doi.org/10.1371/journal.pcbi.1008390
- Chue Hong, N. (n.d.). In which journals should I publish my software? https://www.software.ac.uk/top-tip/which-journals-should-i-publish-my-software
- Del Pico, E. M., Gelpi, J. L., & Capella-Gutiérrez, S. (2022). FAIRsoft - A practical implementation of FAIR principles for research software [Preprint]. Bioinformatics. https://doi.org/10.1101/2022.05.04.490563
- Spaaks, J. H., Honeyman, T., & Verhoeven, S. (n.d.). FAIR software checklist. Retrieved February 12, 2024, from https://github.com/ardc-fair-checklist/ardc-fair-checklist.github.io
- Patel, B., Soundarajan, S., Ménager, H., & Hu, Z. (2023). Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool. Scientific Data, 10(1), 557. https://doi.org/10.1038/s41597-023-02463-x
- Price, G., & Farquharson, K. (2022). PacBio HiFi genome assembly using hifiasm v2.1. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.221.3
- Harshil Patel, Ewels, P., Peltzer, A., Botvinnik, O., Sturm, G., Moreno, D., Pranathi Vemuri, Garcia, M. U., Silviamorins, Pantano, L., Binzer-Panchal, M., Nf-Core Bot, Syme, R., Zepper, M., Kelly, G., Hanssen, F., Yates, J. A. F., Cheshire, C., Rfenouil, … Di Tommaso, P. (2023). nf-core/rnaseq: nf-core/rnaseq v3.12.0 - Osmium Octopus. Zenodo. https://doi.org/10.5281/ZENODO.7998767
- Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0
- Guha, R. V., Brickley, D., & MacBeth, S. (2015). Schema.org: Evolution of Structured Data on the Web: Big data makes common schemas even more necessary. Queue, 13(9), 10–37. https://doi.org/10.1145/2857274.2857276