5. Use detect-secrets for secret scanning

Date: 21/08/2025

Status

Accepted

Context

The SmartEM Decisions project requires robust secret scanning to protect sensitive research data, database credentials, API keys, and Kubernetes cluster secrets. Because the project is part of the Diamond Light Source facility infrastructure, it must meet high security standards whilst supporting scientific computing workflows.

The development team evaluated secret scanning tools for integration into the existing pre-commit and CI/CD pipeline (Python 3.12+, ruff, pyright). The organisational cybersecurity team recommended Gitleaks as the standard tool across projects.

Key requirements included:

  • Integration with the Python 3.12+ ecosystem and the existing toolchain

  • Handling scientific computing patterns (chemical formulas, gene sequences, scientific notation) without excessive false positives

  • Support for high-throughput processing (1000+ images/hour) without workflow disruption

  • Enterprise-grade baseline management for research environments

Three tools were evaluated:

  • Gitleaks: High-performance Go implementation and the organisational preference, but a higher false-positive rate in scientific contexts

  • TruffleHog: Advanced entropy analysis, but resource-intensive with SaaS dependencies

  • detect-secrets: Python-native, superior false positive handling, sophisticated baseline management
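Several of the trade-offs above turn on entropy-based detection, which treats strings that look statistically random as candidate secrets. The following is a minimal sketch of that idea, not the implementation used by any of the three tools; the function names, the threshold of 4.0, and the sample strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(text: str, threshold: float = 4.0) -> bool:
    """Flag `text` when its entropy exceeds `threshold` (a hypothetical value)."""
    return shannon_entropy(text) > threshold


# Hypothetical sample strings, chosen only for illustration.
samples = {
    "gene sequence": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "scientific notation": "6.02214076e23",
    "random-looking token": "q7Zp2LxV9rTb4KmY8sWd3HfN6cJg1AeU",
}

for label, value in samples.items():
    entropy = shannon_entropy(value)
    print(f"{label}: {entropy:.2f} bits/char, flagged={looks_random(value)}")
```

Real scanners combine entropy checks with pattern-based detectors, so how a given tool treats scientific strings depends on its detectors and configuration rather than on a single threshold like the one assumed here.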

Decision

We will use detect-secrets as the primary secret scanning tool, integrated into both pre-commit hooks and CI/CD pipelines, despite the organisational preference for Gitleaks standardisation.
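The baseline approach behind this decision can be pictured as a set difference: findings that have already been reviewed are recorded, and a hook or CI step fails only when a scan reports something outside that record. The sketch below illustrates the concept under stated assumptions; the `Finding` structure, the `fingerprint` and `new_findings` helpers, and the sample data are hypothetical and do not reproduce the detect-secrets baseline file format or API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One potential secret reported by a scan (hypothetical structure)."""

    filename: str
    detector: str
    secret_hash: str  # a hash of the value, so the raw secret is never stored


def fingerprint(raw_value: str) -> str:
    """Hash a raw value so the baseline never holds the secret itself."""
    return hashlib.sha256(raw_value.encode("utf-8")).hexdigest()


def new_findings(current: set[Finding], baseline: set[Finding]) -> set[Finding]:
    """Return the findings that are absent from the reviewed baseline."""
    return current - baseline


# Hypothetical data: one reviewed false positive and one fresh finding.
reviewed = Finding(
    "docs/sample_ids.md", "high-entropy", fingerprint("example-sample-identifier")
)
fresh = Finding(
    "config/settings.py", "keyword", fingerprint("hypothetical-password")
)

baseline = {reviewed}
current = {reviewed, fresh}

unreviewed = new_findings(current, baseline)
for finding in sorted(unreviewed, key=lambda f: f.filename):
    print(f"unreviewed finding in {finding.filename} ({finding.detector})")

# A pre-commit hook or CI step would fail while anything unreviewed remains.
exit_code = 1 if unreviewed else 0
```

Because only the difference against the baseline is reported, previously reviewed scientific strings do not block commits again, while any new finding still does.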

Consequences

Positive:

  • Native Python integration with the existing development workflow

  • Superior false positive management for scientific computing patterns

  • Enterprise-grade baseline system for managing known safe patterns

  • Faster CI/CD execution through an incremental scanning approach

  • Flexible plugin architecture for research-specific customisation

Negative:

  • Divergence from organisational tooling standardisation

  • Potential knowledge silos between teams using different tools

  • Responsibility for maintaining tool-specific expertise within the team

Mitigation:

  • Comprehensive documentation of configuration and workflows