A common concern within the Computational Science (CS) community is that computational scientists are not trained in Software Engineering (SE) and hence might create poor-quality software. This concern is unfounded.
Computational scientists study micro-level events within atoms that add up to predictable properties of macro-level materials (e.g., the sun). Similarly, empirical software engineers study micro-level patterns within software projects to learn predictable properties of those projects.
Hypothesis: Integrating SE practices will help computational scientists produce better science (e.g. more reliable, more reproducible, & more efficient).
DR. TIM MENZIES: Email / LinkedIn / Website
In our study of 21 CS projects, we found that for the majority of them (15), adapted SE methods dominate traditional SE methods for defect prediction.
In our study of more than 3,000 developers across 59 CS projects, we found that non-experts and novices can contribute to software development without introducing many more bugs than the existing "heroes": non-hero CS developers introduced only slightly more bugs than hero developers (1.09-1.15 times as many), compared with 1.3-1.9 times as many in SE projects.
In our study of more than 3,000 developers across 59 CS projects, compared against 1,000+ SE projects on GitHub, we observed that CS developers commit more, close more issues, deploy/release more, and tag more, even though their projects are shorter in duration.
Which of the following is useful to you?
- A friend who tells you where this question has been answered before
- Or... what else should we do?
We collected basic measures for over 100 projects, and test-case and build-time metrics for over 50.
In this paper we seek quantitative evidence (from dozens of Computational Science projects housed on GitHub) for 13 previously published conjectures about scientific software development. In all, we explore three groups of beliefs about (1) the nature of scientific challenges; (2) the implications of the limitations of computer hardware; and (3) the cultural environment of scientific software development. We find that four of the conjectures cannot be assessed with respect to the GitHub data. Of the others, only three can be endorsed. Our conclusion is that the nature of Computational Science software development is changing; hence the tools we develop to support software-based science need to change as well.
Standard automatic methods for recognizing problematic development commits can be greatly improved via the incremental application of human+artificial expertise.
In this approach, called EMBLEM, an AI tool first explores the software development history to find the commits most likely to be problematic. Humans then apply their expertise to check those labels (with each human decision perhaps updating the support vectors within the AI's SVM learner).
The source code we used for better labeling can be found here: SillyWalk
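To make the loop concrete, here is a minimal sketch of an EMBLEM-style human+AI labeling cycle. It assumes scikit-learn's SVC as the learner and uncertainty sampling (querying the commit nearest the decision boundary) as the selection heuristic; the function names and the `ask_human` hook are illustrative assumptions, not the actual SillyWalk code.

```python
# Illustrative EMBLEM-style loop: the SVM proposes the commit it is least
# sure about, a human checks it, and retraining updates the support vectors.
# ASSUMPTIONS: scikit-learn SVC, uncertainty sampling, numpy feature arrays.
import numpy as np
from sklearn.svm import SVC

def ask_human(commit_features):
    """Placeholder for the human-in-the-loop step: an expert inspects the
    commit and returns 1 (problematic) or 0 (not problematic)."""
    raise NotImplementedError("connect this to your review interface")

def emblem_loop(X_seed, y_seed, X_pool, budget=20):
    """Incrementally refine commit labels with human + AI expertise."""
    X_train, y_train = list(X_seed), list(y_seed)
    pool = list(range(len(X_pool)))          # indices of unlabeled commits
    model = SVC(kernel="linear")
    for _ in range(budget):
        model.fit(np.array(X_train), np.array(y_train))
        # Query the commit closest to the separating hyperplane,
        # i.e. the one the learner is least certain about.
        margins = np.abs(model.decision_function(X_pool[pool]))
        pick = pool[int(np.argmin(margins))]
        label = ask_human(X_pool[pick])      # human verifies the AI's candidate
        X_train.append(X_pool[pick])
        y_train.append(label)
        pool.remove(pick)                    # retrain next round with the new label
    return model
```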
We collected data on 678 computational science projects and identified 59 good ones that met our selection criteria for analysis:
| Criterion | Threshold |
| --- | --- |
| Commits | > 20 |
| Duration | > 1 year |
| Issues | > 10 |
| Programmers | > 7 |
| Releases | > 0 |
Details for the rest of the 678 projects can be found here.
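To make the thresholds above concrete, here is a small sketch of the filtering step; the dictionary field names are assumptions about how each mined project record might be stored, not the exact schema we used.

```python
# Illustrative filter for the selection criteria above. Field names
# (commits, duration_years, issues, programmers, releases) are assumed.
def passes_criteria(project):
    """Return True only if a project meets all five selection thresholds."""
    return (project["commits"] > 20
            and project["duration_years"] > 1
            and project["issues"] > 10
            and project["programmers"] > 7
            and project["releases"] > 0)

# Example: applying the filter to two mined project records.
candidates = [
    {"commits": 500, "duration_years": 3.2, "issues": 42,
     "programmers": 12, "releases": 4},   # passes every threshold
    {"commits": 15, "duration_years": 0.5, "issues": 3,
     "programmers": 2, "releases": 0},    # fails several thresholds
]
good = [p for p in candidates if passes_criteria(p)]
print(len(good))  # -> 1
```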
We used the following tools in our work:
Git Miner is a Python-based application for mining GitHub repositories (a minimal mining sketch appears after this list).
Understand provides pertinent information about your code: quickly see all the information on functions, classes, variables, etc., and how they are used, called, modified, and interacted with.
This tool is used to view automatically generated reports of a repository's commits
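For readers who want to reproduce the mining step, the sketch below pulls commit metadata through GitHub's public REST API with the `requests` library. It is a stand-in illustrating the kind of data Git Miner collects, not Git Miner's actual code; an OAuth token is optional but raises the API rate limit.

```python
# Minimal sketch: mine commit metadata for one repository via the
# GitHub REST API (GET /repos/{owner}/{repo}/commits), with pagination.
import requests

def fetch_commits(owner, repo, token=None, max_pages=5):
    """Yield commit records for owner/repo, following paginated results."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    for page in range(1, max_pages + 1):
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/commits",
            headers=headers,
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:                      # walked past the last page
            break
        for item in batch:
            yield {
                "sha": item["sha"],
                "author": item["commit"]["author"]["name"],
                "date": item["commit"]["author"]["date"],
                "message": item["commit"]["message"],
            }

# Example usage:
# for c in fetch_commits("numpy", "numpy"):
#     print(c["date"], c["message"].splitlines()[0])
```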