Run SmartEM Agent (EPU Agent)
The SmartEM Agent is a data collection service that monitors the output directories of EPU (Thermo Fisher Scientific's automated data acquisition software for cryo-EM) and sends acquisition data to the backend service in real time.
Overview
The EPU agent runs on EPU workstations, either as a Python script or as a bundled Windows binary. EPU workstations are typically Windows machines isolated from the main network. Their outbound connectivity is limited to specific destinations, which are reached through a proxy and configured via an allow-list. The agent's primary purpose is to parse EPU output from the filesystem and send the resulting data and events to the core backend component.
An EPU data directory is generated by the closed-source EPU software and represents one acquisition session on a cryo-electron microscope. The SmartEM Agent provides the following capabilities for processing this data:
Core Capabilities
Real-time monitoring: Watch EPU directories for file changes during active acquisitions
Comprehensive parsing: Extract data from all EPU file types (sessions, atlases, grid squares, foil holes, micrographs)
Data validation: Verify EPU directory structure and completeness
Backend integration: Communicate with SmartEM backend via REST API and Server-Sent Events (SSE)
Connection health: Automatic heartbeat monitoring for reliable data transmission
Flexible deployment: Run in development, testing, or production modes
Agent Modes
The agent handles three scenarios, depending on when it is launched relative to EPU data acquisition:
Pre-acquisition mode: the watcher is launched before EPU starts writing; only real-time monitoring is needed
Mid-acquisition mode: the watcher is launched after EPU has started writing; it parses the existing files and then continues with real-time monitoring
Post-acquisition mode: the watcher is launched after EPU has finished; it parses the complete dataset and then monitors for changes
Quick Start
For comprehensive parameter documentation, see the CLI Reference. For troubleshooting, see the CLI Troubleshooting Guide.
Basic Directory Monitoring
```bash
# Monitor a directory with default settings
python -m smartem_agent watch /path/to/epu/data

# Monitor with verbose output
python -m smartem_agent watch /path/to/epu/data --verbose

# Dry run for testing (no API calls)
python -m smartem_agent watch /path/to/epu/data --dry-run --verbose
```
Production Deployment with Backend Integration
```bash
# Full production setup with real-time communication
python -m smartem_agent watch /data/microscopy/active_session \
  --api-url https://smartem-backend.facility.ac.uk \
  --agent-id microscope-titan-01 \
  --session-id session-20240115-001 \
  --heartbeat-interval 45 \
  --verbose
```
Command Categories
The SmartEM Agent CLI is organised into three main command categories:
1. Parse Commands
Extract and analyse data from EPU files without backend communication. Useful for development, debugging, and data validation.
2. Validate Commands
Check EPU directory structure for completeness and compliance with expected formats.
3. Watch Commands
Monitor directories in real-time for file changes with full backend integration.
Parsing Operations
Parse commands extract and analyse data from EPU files without communicating with the backend API. These commands are ideal for development, debugging, data validation, and understanding EPU data structures.
Complete Directory Parsing
Parse entire EPU directories containing multiple grids or complete acquisition sessions:
```bash
# Parse a complete EPU session directory
python -m smartem_agent parse dir \
  ../smartem-decisions-test-datasets/metadata_Supervisor_20250108_101446_62_cm40593-1_EPU

# Parse different session types
python -m smartem_agent parse dir \
  ../smartem-decisions-test-datasets/metadata_Supervisor_20250114_220855_23_epuBSAd20_GrOxDDM \
  --verbose

python -m smartem_agent parse dir \
  ../smartem-decisions-test-datasets/metadata_Supervisor_20241220_140307_72_et2_gangshun \
  --verbose --verbose  # Debug-level output
```
Individual Component Parsing
Parse specific EPU file types to understand data structures and debug issues:
Session Files
```bash
# Parse the EPU session manifest
python -m smartem_agent parse session \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250129_134723_36_bi37708-28_grid7_EPU/EpuSession.dm \
  --verbose
```
Atlas Files
```bash
# Parse atlas overview data
python -m smartem_agent parse atlas \
  ../smartem-decisions-test-datasets/bi37708-28-copy/atlas/Supervisor_20250129_111544_bi37708-28_atlas/Atlas/Atlas.dm \
  --verbose
```
Grid Square Files
```bash
# Parse a grid square manifest (XML format)
python -m smartem_agent parse gridsquare \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250404_164354_31_EPU_nr27313-442/metadata_Supervisor_20250404_164354_31_EPU_nr27313-442/Images-Disc1/GridSquare_3568837/GridSquare_20250404_171012.xml

# Parse grid square metadata (DM format)
python -m smartem_agent parse gridsquare-metadata \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250404_164354_31_EPU_nr27313-442/metadata_Supervisor_20250404_164354_31_EPU_nr27313-442/Metadata/GridSquare_3568837.dm \
  --verbose

# Alternative dataset example
python -m smartem_agent parse gridsquare-metadata \
  ./tests/testdata/bi37708-28/Supervisor_20250129_134723_36_bi37708-28_grid7_EPU/Metadata/GridSquare_29273435.dm
```
Foil Hole Files
```bash
# Parse foil hole positioning data
python -m smartem_agent parse foilhole \
  tests/testdata/epu-dir-example/Images-Disc1/GridSquare_8999138/FoilHoles/FoilHole_9015889_20250108_154725.xml \
  --verbose

# Parse micrograph acquisition data recorded at a foil hole (stored under Data/)
python -m smartem_agent parse micrograph \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250404_164354_31_EPU_nr27313-442/metadata_Supervisor_20250404_164354_31_EPU_nr27313-442/Images-Disc1/GridSquare_3568837/Data/FoilHole_3595930_Data_3590445_56_20250405_084025.xml
```
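The example paths above follow a recognisable layout. The hypothetical helper `suggest_parse_subcommand` maps a file path to the `parse` subcommand these examples use for that kind of file, as a rough aid when choosing one. Its patterns are inferred only from the example paths in this guide. They are an assumption, not the agent's own matching rules, and real EPU output may contain files they do not cover.

```python
import re
from typing import Optional

# Hypothetical patterns inferred from the example paths in this guide;
# they are NOT taken from the smartem_agent source code.
_PATTERNS = [
    ("session", re.compile(r"(^|/)EpuSession\.dm$")),
    ("atlas", re.compile(r"(^|/)Atlas/Atlas\.dm$")),
    ("gridsquare-metadata", re.compile(r"(^|/)Metadata/GridSquare_\d+\.dm$")),
    ("gridsquare", re.compile(r"(^|/)GridSquare_\d+/GridSquare_[\d_]+\.xml$")),
    ("foilhole", re.compile(r"(^|/)FoilHoles/FoilHole_[\d_]+\.xml$")),
    ("micrograph", re.compile(r"(^|/)Data/FoilHole_\d+_Data_[\d_]+\.xml$")),
]


def suggest_parse_subcommand(path: str) -> Optional[str]:
    """Return the `parse` subcommand matching `path`, or None if unrecognised."""
    # EPU workstations run Windows, so normalise backslashes before matching.
    normalised = path.replace("\\", "/")
    for subcommand, pattern in _PATTERNS:
        if pattern.search(normalised):
            return subcommand
    return None
```

For example, `suggest_parse_subcommand(r"C:\EPU\Metadata\GridSquare_3568837.dm")` returns `"gridsquare-metadata"`.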
Validation Operations
Validation commands check EPU directory structure for completeness and compliance with expected formats.
Examples with Expected Outcomes
Invalid Directories (Expected to Fail)
These examples demonstrate directories with structural issues:
```bash
# Incomplete or malformed directories
python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250129_114842_73_bi37708-28_grid7_EPU \
  --verbose

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_105058_11 \
  --verbose

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_145409_68

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_150924_1grid3
```
Valid Directories (Expected to Pass)
These examples show properly structured EPU directories:
```bash
# Complete, well-formed directories
python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250129_134723_36_bi37708-28_grid7_EPU \
  --verbose

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_133418_68apoferritin \
  --verbose

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_143856_44Practice

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_145409_68practice2
```
Understanding Validation Results
Successful validation returns exit code 0 and confirms that the directory structure is valid:
```text
EPU project dir is structurally valid
```
Failed validation returns exit code 1 and lists the specific issues found:
```text
Invalid EPU project dir. Found the following issues:
- Missing required file: EpuSession.dm
- Invalid directory structure: Images-Disc1 not found
- Incomplete atlas data: Atlas/Atlas.dm missing
```
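Because the exit codes are documented, validation is easy to script. The sketch below runs the documented `python -m smartem_agent validate` command over several directories with `subprocess` and labels each result. The helper names `validate_many` and `summarise` are hypothetical. Treating any exit code other than 0 or 1 as an error (for example, a crash) is an assumption, since this guide documents only those two codes.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Documented invocation of the validator; exit code 0 = valid, 1 = invalid.
DEFAULT_COMMAND: List[str] = [sys.executable, "-m", "smartem_agent", "validate"]


def validate_many(
    directories: Iterable[Path], command: Optional[List[str]] = None
) -> Dict[Path, int]:
    """Validate each directory and return its exit code."""
    base = DEFAULT_COMMAND if command is None else command
    results: Dict[Path, int] = {}
    for directory in directories:
        # Capture output so per-directory issue listings do not interleave.
        completed = subprocess.run(
            [*base, str(directory)], capture_output=True, text=True
        )
        results[directory] = completed.returncode
    return results


def summarise(results: Dict[Path, int]) -> List[str]:
    """Label 0 as PASS, 1 as FAIL, and anything else (assumed: a crash) as ERROR."""
    labels = {0: "PASS", 1: "FAIL"}
    return [f"{labels.get(code, 'ERROR')}  {path}" for path, code in results.items()]
```

For instance, `for line in summarise(validate_many(Path(p) for p in paths)): print(line)` prints one PASS/FAIL line per directory from the lists above.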
Real-Time Monitoring (Watch Operations)
The watch command provides real-time monitoring of EPU directories, automatically processing new files and communicating with the SmartEM backend. This is the primary operational mode for production deployments.
Basic Watch Operations
```bash
# Simple directory monitoring (development mode)
python -m smartem_agent watch ../test-dir --log-file output.log

# Monitor with detailed logging
python -m smartem_agent watch /data/microscopy/active_session \
  --log-file /var/log/smartem/session.log \
  --log-interval 5.0 \
  --verbose

# Dry run for testing (no backend communication)
python -m smartem_agent watch ../test-dir \
  --dry-run \
  --verbose --verbose
```
Production Monitoring with Backend Integration
```bash
# Full production deployment
python -m smartem_agent watch /data/microscopy/active_session \
  --api-url https://smartem-backend.facility.ac.uk \
  --agent-id microscope-titan-01 \
  --session-id session-20240115-001 \
  --heartbeat-interval 45 \
  --sse-timeout 60 \
  --log-interval 10.0 \
  --verbose

# High-frequency monitoring setup
python -m smartem_agent watch /data/high_throughput \
  --api-url http://backend:8000 \
  --agent-id facility-workstation-03 \
  --session-id batch-processing-session \
  --heartbeat-interval 30 \
  --log-interval 5.0 \
  --sse-timeout 120
```
Watch Operation Modes
The watch command is designed to handle different timing scenarios relative to EPU data acquisition:
Pre-acquisition mode: the watcher is launched before EPU starts writing. Only real-time monitoring is necessary, and files are processed as they are created. This is the most efficient mode for active acquisitions.
Mid-acquisition mode: the watcher is launched after EPU has started writing. It automatically detects and processes pre-existing files, then transitions to real-time monitoring.
Post-acquisition mode: the watcher is launched after EPU has finished. It parses the complete dataset and then keeps monitoring for potential updates, which makes it useful for processing archived or completed datasets.
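All three scenarios can share one code path: process whatever already exists, then watch for new files. In pre-acquisition mode the backlog is simply empty. The minimal sketch below shows that pattern with standard-library polling. It is an assumption about the general approach, not the agent's implementation, which may use filesystem event notifications instead. `watch`, `snapshot` and the `handle` callback are hypothetical names. The sketch does not detect in-place modifications and does not wait for partially written files.

```python
import time
from pathlib import Path
from typing import Callable, Optional, Set


def snapshot(root: Path) -> Set[Path]:
    """Return every regular file currently under `root`."""
    return {p for p in root.rglob("*") if p.is_file()}


def watch(
    root: Path,
    handle: Callable[[Path], None],
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,  # None means poll forever
) -> None:
    # 1. Backlog: process files that already exist (none in pre-acquisition mode).
    seen = snapshot(root)
    for path in sorted(seen):
        handle(path)
    # 2. Live phase: anything created after the snapshot, including files written
    #    while the backlog was being processed, is picked up by the next poll.
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        current = snapshot(root)
        for path in sorted(current - seen):
            handle(path)
        seen = current
        polls += 1
```

Taking the snapshot before processing the backlog means that no file falls into a gap between the two phases.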
Testing with Simulated EPU Data
For development and testing, use the fsrecorder tool to simulate realistic EPU file-writing patterns:
Recording EPU Patterns
```bash
# Record filesystem events from an existing EPU dataset
python tools/fsrecorder/fsrecorder.py record \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250326_145351_30_nt33824-10_grid2_1in5dil \
  ../test-recording.tar.gz
```
Replaying EPU Patterns
```bash
# Replay recorded events with accelerated timing
python tools/fsrecorder/fsrecorder.py replay \
  ../test-recording.tar.gz \
  ../test-dir \
  --fast

# Monitor the replayed data
python -m smartem_agent watch ../test-dir \
  --dry-run \
  --verbose --verbose \
  --log-interval 2.0
```
Quick Testing Alternative
```bash
# For rapid testing, copy the data in one go (all files appear at once, so timing is unrealistic)
cp -r "../smartem-decisions-test-datasets/epu-Supervisor_20250326_145351_30_nt33824-10_grid2_1in5dil/"* ../test-dir/

# Monitor the copied data
python -m smartem_agent watch ../test-dir --dry-run
```
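A middle ground between a bulk `cp` and a full fsrecorder replay is to copy files one at a time with a short pause between them. The hypothetical `trickle_copy` helper below does that, ordering files by modification time as a rough stand-in for the order in which EPU wrote them. That ordering is an assumption and only an approximation. fsrecorder remains the more realistic option because it replays recorded event timing and ordering.

```python
import shutil
import time
from pathlib import Path
from typing import List


def trickle_copy(src: Path, dst: Path, delay_seconds: float = 0.5) -> List[Path]:
    """Copy every file under src into dst one at a time, pausing between files."""
    copied: List[Path] = []
    # Oldest first: an approximation of the original write order (assumption).
    files = sorted(
        (p for p in src.rglob("*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    for source in files:
        target = dst / source.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        copied.append(target)
        time.sleep(delay_seconds)
    return copied
```

Run it against `../test-dir` while the watcher from the previous block is already running in another terminal, so that files arrive incrementally.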
Important Considerations
Critical file: The `EpuSession.dm` file is essential for proper operation because it:
Provides references to atlas data
Triggers instantiation of new grid entities in the datastore
Contains acquisition metadata required for processing
Missing `EpuSession.dm`: without this file, grids are not instantiated and the data cannot be processed correctly.
fsrecorder tool: The `tools/fsrecorder/` utility accurately simulates EPU file-writing patterns, preserving their timing and ordering, which makes it well suited to development and testing.
Real-Time Communication Features
When the `--agent-id` and `--session-id` parameters are supplied, the agent establishes real-time communication with the backend:
Server-Sent Events (SSE): Receives instructions and commands from the backend
Heartbeat Monitoring: Sends periodic heartbeats to maintain connection health
Automatic Reconnection: Handles connection failures with exponential backoff
Instruction Acknowledgment: Confirms receipt and processing of backend instructions
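The SSE and reconnection features listed above can be illustrated with two small mechanisms. This is a hedged sketch, not the agent's code. `parse_sse` follows the field rules of the standard `text/event-stream` format: `event`/`data`/`id` fields, `:` comment lines, and a blank line that dispatches the event. `backoff_delay` computes capped exponential delays. The `SSEEvent` class, the function names and the 1 s base / 60 s cap are hypothetical. The agent's real event names and retry parameters are not documented here.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str] = None


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from decoded SSE lines (line terminators already stripped)."""
    event_type = ""
    data_lines: List[str] = []
    last_id: Optional[str] = None  # persists across events, as in the spec
    for line in lines:
        if line == "":
            # A blank line dispatches the buffered event, if it carried data.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip a single leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            last_id = value
        # Other fields (e.g. "retry") are ignored in this sketch.


def backoff_delay(
    attempt: int, base: float = 1.0, cap: float = 60.0, jitter: bool = False
) -> float:
    """Delay before reconnect attempt `attempt` (0-based): base * 2**attempt, capped."""
    delay = min(cap, base * (2 ** attempt))
    # Optional "full jitter" spreads reconnects from many agents over time.
    return random.uniform(0, delay) if jitter else delay
```

With the default values, the delays for attempts 0 to 7 are 1, 2, 4, 8, 16, 32, 60 and 60 seconds.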
Performance Tuning
Adjust parameters to suit your deployment:
High-frequency acquisitions: lower `--log-interval` (1-5 seconds) and lower `--heartbeat-interval` (15-30 seconds)
Stable networks: standard `--sse-timeout` (30-60 seconds)
Unstable networks: higher `--sse-timeout` (120+ seconds) and higher `--heartbeat-interval` (60+ seconds)
Development/testing: use `--dry-run` to avoid backend communication
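To keep these choices consistent across workstations, a launcher script could store them as named presets. The sketch below builds the `python -m smartem_agent watch` argument list using only flags documented in this guide. The `PROFILES` names and the specific values are hypothetical, picked from within the ranges suggested above.

```python
import sys
from typing import Dict, List, Optional

# Hypothetical presets; values are taken from the ranges suggested above.
PROFILES: Dict[str, Dict[str, str]] = {
    "high-frequency": {"--log-interval": "5.0", "--heartbeat-interval": "30"},
    "stable-network": {"--sse-timeout": "60"},
    "unstable-network": {"--sse-timeout": "120", "--heartbeat-interval": "60"},
}


def build_watch_command(
    directory: str,
    profile: str,
    api_url: Optional[str] = None,
    agent_id: Optional[str] = None,
    session_id: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argv list for `python -m smartem_agent watch`."""
    cmd = [sys.executable, "-m", "smartem_agent", "watch", directory]
    for flag, value in (
        ("--api-url", api_url),
        ("--agent-id", agent_id),
        ("--session-id", session_id),
    ):
        if value is not None:
            cmd += [flag, value]
    for flag, value in PROFILES[profile].items():
        cmd += [flag, value]
    if dry_run:
        cmd.append("--dry-run")  # development/testing: no backend communication
    return cmd
```

Pass the result to `subprocess.run(..., check=True)` to start the agent. Building an argument list rather than a shell string avoids quoting problems with Windows paths.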