Development Tools#
A collection of utilities for developing, testing, and maintaining the SmartEM Decisions project.
XML Formatting Tools#
Format XML Files for Human Readability#
Transform single-line XML and .dm files into human-readable format with proper indentation:
# Reformat all .xml and .dm files in a directory recursively
python tools/format_xml.py <directory_path> -r
# Process multiple directories
python tools/format_xml.py -r \
../smartem-decisions-test-datasets/metadata_Supervisor_20250114_220855_23_epuBSAd20_GrOxDDM \
../smartem-decisions-test-datasets/metadata_Supervisor_20241220_140307_72_et2_gangshun \
../smartem-decisions-test-datasets/metadata_Supervisor_20250108_101446_62_cm40593-1_EPU
# Display all available options
python tools/format_xml.py --help
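For readers who want to see what this kind of reformatting involves, the following is a minimal sketch using only the Python standard library. It is not the implementation of `tools/format_xml.py`; the function name `pretty_print_xml` and the two-space indent are assumptions made for illustration.

```python
from xml.dom import minidom


def pretty_print_xml(text: str, indent: str = "  ") -> str:
    """Return `text` re-indented with one element per line.

    Hypothetical helper for illustration; the real tool may behave differently.
    """
    document = minidom.parseString(text)
    pretty = document.toprettyxml(indent=indent)
    # toprettyxml can emit whitespace-only lines when the input already
    # contained whitespace between elements, so drop those lines.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    print(pretty_print_xml("<a><b>1</b><c/></a>"))
```

The sketch loads the whole document into memory, which may be unsuitable for very large files.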
Data Analysis and Debugging Tools#
Find Foil Hole Manifest Duplicates#
Identify duplicate foil hole manifests within directory structures to detect data inconsistencies:
# Display help and usage information
tools/find_foilhole_duplicates.py --help
# Example: Search for duplicates in test data
tools/find_foilhole_duplicates.py ./tests/testdata/bi37708-28
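The idea behind a duplicate search can be sketched as grouping manifests by foil hole ID and reporting every ID that appears more than once. This is an illustration only, not the logic of `tools/find_foilhole_duplicates.py`: the filename pattern `FoilHole_<id>_<date>_<time>.xml`, the function name, and the definition of "duplicate" used here are all assumptions.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention (not confirmed by this page):
# FoilHole_<id>_<date>_<time>.xml
FOILHOLE_NAME = re.compile(r"^FoilHole_(\d+)_\d+_\d+\.xml$")


def find_foilhole_duplicates(root: str) -> dict[str, list[Path]]:
    """Map each foil hole ID to its manifests when more than one exists.

    Hypothetical helper: treats two files with the same ID as duplicates.
    """
    by_id: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("FoilHole_*.xml")):
        match = FOILHOLE_NAME.match(path.name)
        if match:
            by_id[match.group(1)].append(path)
    # Keep only IDs that occur in more than one file.
    return {fid: paths for fid, paths in by_id.items() if len(paths) > 1}
```

Calling `find_foilhole_duplicates("./tests/testdata/bi37708-28")` would return an empty dictionary if no ID is repeated under that directory.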
File Size Analysis#
List files matching specific patterns, sorted by size for storage analysis:
# Find GridSquare files sorted by size (largest first)
# sort -h (GNU sort) compares the human-readable sizes printed by ls -h
rg --files -g 'GridSquare_*.dm' ./tests/testdata/bi37708-28 \
| xargs -d '\n' ls -lh | sort -k5,5 -rh | awk '{print $9, $5}'
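Where GNU `xargs` and `sort` are not available, a similar listing can be approximated in Python. This is a sketch under stated assumptions: `files_by_size` is a hypothetical helper, and unlike `rg` it does not honour `.gitignore` rules or skip hidden files.

```python
from pathlib import Path


def files_by_size(root: str, pattern: str) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) pairs under `root`, largest first."""
    sized = [
        (path, path.stat().st_size)
        for path in Path(root).rglob(pattern)
        if path.is_file()
    ]
    # Sort on the byte count so that ordering is exact rather than
    # dependent on rounded, human-readable sizes.
    return sorted(sized, key=lambda item: item[1], reverse=True)
```

For example, `files_by_size("./tests/testdata/bi37708-28", "GridSquare_*.dm")` yields the same set of GridSquare files as the pipeline above, provided none of them are ignored or hidden.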
Test Dataset Management#
File Extension Analysis#
Analyse the composition of test datasets by file type:
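# Recursively find all distinct file extensions with counts.
# sed keeps the text after the last dot; grep then drops any result that
# still contains a slash, which means the file had no extension.
find . -type f |
sed -E 's/.*\.([^.]+)$/\1/' |
grep -v "/" |
sort |
uniq -c |
sort -nr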
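The same tally can be sketched in Python with `collections.Counter`, which may be easier to reuse from a test or script. The function name `count_extensions` is an assumption. One behavioural difference: dotfiles such as `.gitignore` are treated here as having no extension, whereas the `sed` pipeline counts them under `gitignore`.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root: str) -> list[tuple[str, int]]:
    """Return (extension, count) pairs for files under `root`, most common first."""
    counts = Counter(
        path.suffix.lstrip(".")
        for path in Path(root).rglob("*")
        # Skip directories and files without an extension.
        if path.is_file() and path.suffix
    )
    return counts.most_common()
```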
Dataset Size Reduction#
Reduce test dataset storage requirements whilst maintaining directory structure:
# Empty image and data files whilst preserving metadata structure
find . -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.mrc" \) -exec truncate -s 0 {} \;
Warning: This command irreversibly truncates every matching file under the current directory to zero bytes. Run it only inside a test dataset, never on production data.
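Because the operation cannot be undone, a variant with a dry-run default can be useful. The following is a sketch, not part of the project's tooling; `truncate_images`, `IMAGE_SUFFIXES`, and the `dry_run` parameter are hypothetical names.

```python
from pathlib import Path

# Same case-sensitive extensions as the find command above.
IMAGE_SUFFIXES = {".jpg", ".png", ".mrc"}


def truncate_images(root: str, dry_run: bool = True) -> list[Path]:
    """Empty matching files under `root` and return the paths affected.

    With dry_run=True (the default) nothing is modified.
    """
    targets = sorted(
        path
        for path in Path(root).rglob("*")
        # Regular files only; symlinks are skipped, as with find -type f.
        if path.is_file() and not path.is_symlink() and path.suffix in IMAGE_SUFFIXES
    )
    if not dry_run:
        for path in targets:
            # Opening in "wb" mode truncates the file to zero bytes.
            path.open("wb").close()
    return targets
```

Calling `truncate_images(".")` first lists what would be emptied; passing `dry_run=False` then performs the truncation.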
Development Monitoring#
Directory Growth Monitoring#
Monitor directory metrics during data acquisition or processing:
# Watch directory size and file count with 1-second updates
# cut -f1 keeps only the size column of the du output
watch -n 1 'echo "Size: $(du -sh . | cut -f1)"; echo "Files: $(find . -type f | wc -l)"'
This command is particularly useful for monitoring EPU data acquisition progress or debugging processing pipeline performance.
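On systems without `watch`, the same two metrics can be gathered with a short Python function. This is a sketch: `directory_metrics` is a hypothetical name, and it sums apparent file sizes, so the total can differ from `du`, which reports disk usage.

```python
import os


def directory_metrics(root: str) -> tuple[int, int]:
    """Return (total size in bytes, file count) for regular files under `root`."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only, mirroring find -type f.
            if os.path.isfile(path) and not os.path.islink(path):
                total_bytes += os.path.getsize(path)
                file_count += 1
    return total_bytes, file_count
```

Calling it in a loop with `time.sleep(1)` between calls approximates the one-second refresh of the `watch` command.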
Additional Development Commands#
Pre-commit Workflow#
Maintain code quality during development:
# Run pre-commit checks on specific files
pre-commit run --files <file1> <file2>
# Run all pre-commit checks
pre-commit run --all-files
Testing and Quality Assurance#
# Run the full test suite
pytest
# Type checking with pyright
pyright src tests
# Lint the code, then format it
ruff check
ruff format