Billinge Group standards

We present here standards of coding and coding workflows that have been adopted in the Billinge research group over multiple years to make our code more uniform and maintainable by a diverse and shifting team of students and post-docs. They are mostly adopting what we have learned to be standards in the Python community, with some lessons learned along the way. This is not the only way to do things, but since it has taken us lots of trial and error to develop them over time, we share them here in case they are useful to you.

GitHub workflow

Commit messages and issue titles

For commit messages and issue titles, we add prefixes adopted from https://www.conventionalcommits.org:

feat: A new feature has been added.
fix: A bug has been fixed.
docs: Documentation changes only.
style: Changes that don't affect code functionality (e.g., formatting, whitespace).
refactor: Code changes that neither fix a bug nor add a feature.
perf: Performance improvements.
test: Adding missing tests or correcting existing ones.
chore: Updates to the build process or auxiliary tools.
build: Changes that affect the build artifact or external dependencies.
ci: Updates to CI configuration files and scripts.
revert: Reverts a previous commit.
release: A new package version is being prepared.
skpkg: Using scikit-package to create a new package or maintain an existing package.
  • Example 1: “feat: create a DiffractionObject.morph_to() method”

  • Example 2: “bug: handle divide by zero error in DiffractionObject.scale_to

Please see an example here: https://github.com/scikit-package/scikit-package/issues. There are a few benefits to adding prefixes to GitHub issue titles. First, it helps us prioritize tasks from the GitHub notifications. Second, it helps reference issues in a comment within an issue or pull request and organize tasks.

Attention

How do I write a good Git commit message?

A commit message is written for PR reviewers and for debuggers. Avoid verbosity for a quick overview. An ideal commit message communicates file(s) of interest, the reason for the modification, and what modifications were made. e.g., “chore: move all files from docs to doc for scikit-packaging.”

Attention

How do I write a good GitHub issue title?

The issue is where we discuss and determine whether to implement via a pull request. It’s much easier for the project maintainer(s) to recall the problem/intended behavior and the technique in that order. Here are some examples:

e.g., “feat: set custom default prompt values after package create <command> using hidden config file.”

e.g., “feat: preview rendered documentation in each PR with GitHub Actions”

If you have a suggestion, you can also create an issue with a question mark:

e.g., “refactor: how can we dynamically retrieve python min/max default Python versions in documentation?”

e.g., “refactor: can we standarlize the way we write CHANGELOG (news.rst file)?”

Pull request practices

  1. Have a theme for each PR to reduce cognitive overload for the reviewer.

  2. Make PRs small with the possibility of rejection.

  3. Write closes #<issue-number> in the PR comment to automatically close the issue when the PR is merged. See https://github.com/scikit-package/scikit-package/pull/350.

  4. A PR should close an issue on GitHub. If there is no issue, make one. If the issue is big, consider breaking it down into smaller issues.

  5. Review your own PR. Start as a draft PR, click Files changed, add comments, and then request a review. In-line comments are needed if the changes are not obvious for the reviewer. See https://github.com/scikit-package/scikit-package/pull/310.

  6. If another commit was pushed after “@username ready for review”, write another comment “@username ready for review after fixing ____” so that the reviewer is directed to the PR, not the file changes by the new commit.

  7. Use the pull request template provided in .github/PULL_REQUEST_TEMPLATE. In the PR comment, highlight inputs and outputs of the changes (screenshots/outputs).

  8. Address all in-line comments made by the reviewer(s) before asking for another round of review. If you have seen a comment and agree with it but no action is needed, a thumbs-up emoji is sufficient.

  9. Use > to quote the reviewer’s sentence(s) and write your response below it. If there are multiple comments, tag the reviewer(s) with @username.

  10. During PR review, when reviewers propose new ideas or suggestions beyond the scope of the PR, create an issue using the issue template and include a link to the specific PR comment URL (not the PR itself). Then, reply to the reviewer(s) to confirm that the issue has been created.

  11. PR from a new branch if it contains a meaningless commit history.

  12. Do not force push. Use git revert to unwind the previous commit.

  13. If you’ve made a mistake but have not used git add, use git restore <file-name>.

  14. Before CI is integrated, include local test passing results in each PR to save time for the reviewer.

  15. For migrating files from one folder to another folder, use git mv.

  16. For deleting files generated by the OS such as .DS_Store use git rm instead of git add to also remove from the Git index (staging area).

  17. When a PR is closed for any reason, add a single sentence in the comment explaining why the PR is being closed. If a new PR is created, add the new PR link in the comment.

Writing news items for release notes

Why do I need a news file for each PR?

We want to write a good CHANGELOG.rst for each release version. These news items are of interest to both developers and technical users looking for specific keywords. We can streamline the process of writing CHANGELOG.rst for each release by compiling the news items from the news directory.

Here is an example CHANGELOG.rst: https://github.com/scikit-package/scikit-package/blob/main/CHANGELOG.rst

How do I create a news file?

  1. In your repository, confirm that the news/TEMPLATE.rst file exists.

  2. Make a copy of news/TEMPLATE.rst and name it news/<branch-name>.rst. You can automate the process of creating the news file and adding new items by running the commands:

    package add news --add -m "<Describe the main feature added in the PR.>"
    

    Note

    As shown above, you may use -a or --add. The second flag, -m, stands for message.

  3. Confirm that your news/<branch-name>.rst has the following content:

    **Added:**
    
    * <Describe the main feature added in the PR.>
    
    **Changed:**
    
    * <news item>
    
    **Deprecated:**
    
    * <news item>
    
    **Removed:**
    
    * <news item>
    
    **Fixed:**
    
    * <news item>
    
    **Security:**
    
    * <news item>
    

    See also

    Once you are happy with the news item(s) that you have added, you don’t need to modify anything in news/<branch-name>.rst or news/TEMPLATE.rst. scikit-package will automatically parse your news items and add them to the CHANGELOG.rst file when you initiate a release.

    Note

    Although it is generally not recommended, if two tasks were completed on the same branch, you can still add another news item by running the package add news command again.

Where do I place my news item in <branch-name>.rst?

  • **Added:** (-a, --add) includes features or functionality of interest to users and developers, such as support for a new Python version or the addition of a useful feature.

  • **Changed:** (-c, --change) includes modifications that affect end users or developers, such as API changes or replaced dependencies.

  • **Fixed:** (-f, --fix) includes bug fixes or refactoring.

  • **Deprecated:** (-d, --deprecated) includes methods, classes, or workflows that will no longer be supported in a future release.

  • **Removed:** (-r, --remove) is the opposite of the “Added” section, referring to features or functionality that have been removed.

  • **Security:** (-s, --security) includes fixes or improvements related to vulnerabilities, authentication, or access control.

What are good practices for writing news items?

  • Maintain consistency when a new release note is created and added to the CHANGELOG.rst file:

    1. Start with an active verb.

    2. Capitalize the first word.

    3. End with a period.

    4. Use the rst style of backquotes instead of the markdown style. For example, use ``scikit-package`` instead of `scikit-package`.

    e.g., Add automatic linting of .md, .yml, and .rst files via a prettier hook in pre-commit.

My PR has trivial fixes that shouldn’t be in CHANGELOG.rst. Should I still create a news file?

Yes. In this case, we want to communicate to the reviewer(s) the reason it should not be in the CHANGELOG.rst. You can simply run the following:

package add news -n -m "<brief-reason>."

You will see a new file has been created: news/<branch-name>.rst

**Added:**

* No news: <brief-reason>

...

Writing tutorials

  1. In general, we want to provide step-by-step instructions, including CLI commands and expected outputs. However, to avoid “reinventing the internet”, we should provide recommended workflows and tools. For example, in scikit-package, we recommend using Git for Windows for Windows users so that everyone, including macOS and Linux users, can also follow the same steps.

Unit tests

Why do we write unit tests?

We write tests to document the intended behavior of the code. When refactoring tests, consider what we “want” to happen, rather writing tweaking the existing test functions to pass.

Group and organize test cases

The following practices have been developed to ensure consistency in writing tests:

  1. Comment starts with a uppercase letter (PEP8 standard) unless it’s a name starting with a lowercase letter like a function name.

  2. Include a high-level test function comment e.g., # Test conversion of q to tth with q and wavelength

  3. Use C1: Start with a capital letter... or Case 1: Start... for each condition under @pytest.mark.parametrize.

  4. If applicable, group similar test conditions under a single case. Numerate each test condition.

  5. Divide a test case comment into two parts: x, y, z (conditions), expect.... Ensure there is a expect keyword after the test conditions provided.

  6. Use descriptive yet concise variable names for expected values (e.g., expected_xarrays instead of expected)

  7. Order test cases from the most general to edge cases. This helps readers understand the basic function behavior first before utilizing or encountering unusual features or behaviors.

  8. Consider moving reusable code objects to conftest.py. See warning messages and objects defined in https://github.com/diffpy/diffpy.utils/blob/main/tests/conftest.py available in each test function in https://github.com/diffpy/diffpy.utils/blob/main/tests/test_diffraction_objects.py/

Pytest example 1

@pytest.mark.parametrize(
    "xtype, expected_xarray",
    [
        # Test whether on_xtype returns the correct xarray values
        # C1: tth to tth, expect no change in xarray value
        # 1. "tth" provided, expect tth
        # 2. "2theta" provided, expect tth
        ("tth", np.array([30, 60])),
        ("2theta", np.array([30, 60])),
        # C2: "q" provided, expect q converted from tth
        ("q", np.array([0.51764, 1])),
        # C3: "d" provided, expect d converted from tth
        ("d", np.array([12.13818, 6.28319])),
    ],
)
def test_on_xtype(xtype, expected_xarray, do_minimal_tth):
    pass

Pytest example 2 - multi-line arguments

  • Add # C1: inside within ( … ). More examples here.

@pytest.mark.parametrize(
    "do_args_1, do_args_2, expected_equality, wavelength_warning_expected",
    [
        # Test when __eq__ returns True and False
        (  # C1: Identical args, expect equality
            {
                "name": "same",
                "scat_quantity": "x-ray",
                "wavelength": 0.71,
                "xtype": "q",
                "xarray": np.array([1.0, 2.0]),
                "yarray": np.array([100.0, 200.0]),
                "metadata": {"thing1": 1},
            },
            {
                "name": "same",
                "scat_quantity": "x-ray",
                "wavelength": 0.71,
                "xtype": "q",
                "xarray": np.array([1.0, 2.0]),
                "yarray": np.array([100.0, 200.0]),
                "metadata": {"thing1": 1},
            },
            True,
            False,
        ),
        (  # C2: Different names, expect inequality
            {
                "name": "something",
                "xtype": "tth",
                "xarray": np.empty(0),
                "yarray": np.empty(0),
                "metadata": {"thing1": 1, "thing2": "thing2"},
            },
            {
                "name": "something else",
                "xtype": "tth",
                "xarray": np.empty(0),
                "yarray": np.empty(0),
                "metadata": {"thing1": 1, "thing2": "thing2"},
            },
            False,
            True,
        ),
    ],
)
def test_equality(do_args_1, do_args_2, expected_equality, wavelength_warning_expected):
    pass

Comment starts with a uppercase letter (PEP8 standard) unless it’s a name starting with a lowercase letter like a function name.

Docstrings

Please bookmark the following:

In the group, we follow the NumPy standard:

  1. A one-line summary that does not use variable names or the function name is added before a full description.

  2. Use “Return a dict” instead of “Returns a dict”. Comments are instructions.

  3. “The” is used for the starting description of attribute/parameter/return.

  4. Full docstrings are not required for private functions.

For examples, please refer to https://github.com/diffpy/diffpy.utils/blob/main/src/diffpy/utils/diffraction_objects.py.

Naming conventions

When should we use hyphens - or underscores _ in file and folder names?

Use hyphens - for project names, package names, GitHub repositories, folder names, branch names, and static files like .rst, .md, .yml, and .png. For CLI, also use hyphens - for args like gh pr list --author "@sbillinge" Check an example documentation of scikit-package here: https://github.com/scikit-package/scikit-package/tree/main/docs/source.

Use underscores _ in the following two cases:

  1. Python files, e.g., tests/test_diffraction_objects.py.

  2. Project directory names, e.g., src/<project_directory_name>. Modules and packages are imported with spaces replaced by underscores, like import bg_mpl_stylesheets. Here is an example project: https://github.com/scikit-package/bg-mpl-stylesheets/tree/main/src/bg_mpl_stylesheets.

Note

scikit-package automatically creates a folder with underscores _ for the project directory name and .py files. We recommend using a single word for folder names that contain Python scripts, e.g., src/example_package/parsers, so that it can be imported as from example_package.parsers import <module>. This follows Python conventions.

Warning

There are still cases where we do not strictly follow the above conventions, typically for configuration files. In such cases, we adhere to the naming conventions of the respective tool. For example, .codespell/ignore_lines.txt is a configuration file for the codespell tool. .github/ISSUE_TEMPLATE is the designated folder for storing GitHub issue templates. If you are unsure, please feel free to open an issue in the scikit-package GitHub repository.

Error message

Divide an error message into two sections: (1) reason for error, (2) what to do to fix it. Ex) “Both release and pre-release specified. Please re-run the command specifying either release or pre_release.” Error messages are for users. Consider users without programming knowledge.

Deprecating functions

As codebases evolve, function and method names are sometimes updated to follow improved naming conventions. When this happens, we want to warn users that the old name will be removed in a future release while keeping their existing code working in the meantime. This process is called deprecation.

This guide describes the simplest and most common case: renaming a function in place, without changing its location or API.

Example:

  • Old name: diffpy.package.MyClass.myFunction

  • New name: diffpy.package.MyClass.my_function

Import the deprecation utilities

For Python versions < 3.13, we use DiffPy’s internal deprecation decorator. (The syntax matches Python’s built-in decorator in newer versions.)

Add the following import near the top of the module,

from diffpy.utils._deprecator import deprecated, build_deprecation_message

Define the deprecation message

Define a single deprecation message near the top of the file. This message should clearly state,

  • what is deprecated

  • when it will be removed

  • what to use instead

For this example,

base = "diffpy.package.MyClass"
removal_version = "4.0.0"

myFunction_deprecation_msg = build_deprecation_message(
    base,
    "myFunction",
    "my_function",
    removal_version,
)

Because the function is not moving, new_base is omitted.

Add the new function

Create the new function with its updated name. This should be the canonical implementation going forward.

class MyClass:

    def my_function(self, x):
        """Perform an important calculation."""
        return x * 2

Mark the old function as deprecated

Keep the old function name, but,

  1. Decorate it with @deprecated

  2. Replace its body with a call to the new function

  3. Update its docstring to clearly state the deprecation

To generate a consistent deprecation docstring, use the helper command,

package add deprecation my_function 4.0.0 -n diffpy.package.MyClass

This command prints a ready-to-use docstring indicating the removal version and the replacement function. Copy and paste the output into the deprecated function.

The deprecated method should then look like this,

class MyClass:

    def my_function(self, x):
        """Perform an important calculation."""
        return x * 2

    @deprecated(myFunction_deprecation_msg)
    def myFunction(self, x):
        """This function has been deprecated and will be removed in
        version 4.0.0.

        Please use diffpy.package.MyClass.my_function instead.
        """
        return self.my_function(x)

Duplicate tests for the new function

Next, ensure that both the deprecated function and the new function are covered by tests.

Locate the existing tests that exercise the old function name myFunction. Copy these tests and update them to call the new function name my_function instead. This ensures that:

  • The new function behaves identically to the old one

  • Refactoring does not accidentally change behavior

  • Both code paths are validated during the deprecation period

Ideally, when you run tests in the following steps, only one deprecation warning should be emitted per function that has been deprecated. This is to keep the deprecation task organize.

For example, if the original test looks like this,

def test_myFunction():
    obj = MyClass()
    assert obj.myFunction(2) == 4

Duplicate it for the new function:

def test_my_function():
    obj = MyClass()
    assert obj.my_function(2) == 4

At this stage, both tests should pass and exercise the same underlying logic.

Do not remove the tests for the deprecated function until the removal release. These tests act as a safeguard to ensure the deprecated wrapper continues to work as expected.

Update internal usage

Now that both the new and deprecated function names exist, update the codebase to use the new function name everywhere internally.

Do a global search for the old name myFunction and replace it with my_function in all source files except for,

  • The deprecated wrapper function itself

  • Test cases that explicitly test the deprecated behavior

  • API documentation where the function is intentionally marked as @deprecated

The goal is to ensure that,

  • All internal code paths exercise the new implementation

  • The deprecated function exists only as a thin compatibility layer for external users

  • No new internal code is added that depends on the deprecated name

After this step, the only remaining references to myFunction should be the deprecated method definition and its associated tests.

This ensures that removing the deprecated function in the future will not require any additional internal refactoring.

Verify the deprecation warning

Install the package locally and run the test suite,

pip install .
pip install pytest
pytest

If done correctly, pytest should pass and emit a warning when using myFunction. The message should look something like,

'diffpy.package.MyClass.myFunction' is deprecated and will be removed
in version 4.0.0. Please use 'diffpy.package.MyClass.my_function' instead.

Check the documentation

To check if the documentation is updated with the deprecation message, build the documentation locally and open the index page,

make html docs/ && open docs/build/html/index.html

Note, this command might vary depending on your project’s documentation setup.

Removal

In the removal release (e.g. 4.0.0),

  • Delete myFunction

  • Remove the deprecation message

  • Remove the tests for myFunction

  • Keep my_function as the sole implementation

Congrats, you have successfully deprecated a function!

Other considerations for maintaining group infrastructure

  1. Be extremely careful with changes that are visible to users.

  2. Try not to pass down technical debt to future members. Do the extra work so that others can save time. Ex) making a PR to the scikit-package repo once an issue has been identified in a project standardlized with scikit-package.

  3. Design code to save developer time in the long run rather than focusing solely on reducing compute time, especially when computing resources are not the primary constraint.

  4. It is easier to remove things (e.g., dependencies) we don’t want than to add things that are needed in certain circumstances.