Run SmartEM Agent (EPU Agent)

The SmartEM agent is the data collection agent that monitors EPU output directories and communicates acquisition data to the backend service.

The EPU agent runs on EPU workstations, either as a Python script or as a bundled Windows binary. EPU workstations are Windows machines isolated from the main network; connectivity to specific services goes through a proxy and is configured via an allow-list. The primary purpose of the EPU agent is to parse EPU software output from the filesystem and communicate data and events to the core backend component.

An EPU data directory is generated by the closed-source EPU software and represents an acquisition session on a cryo-electron microscope. The agent can:

  • Parse data out of specific types of files typically found in EPU directories

  • Validate a finished EPU directory for correct structure and completeness

  • Parse a complete acquisition dataset from a finished EPU directory

  • Parse an incomplete acquisition dataset from an unfinished EPU directory

  • Run in filesystem watcher mode to incrementally update the acquisition dataset as the EPU directory is written to

  • Run in default mode, which combines the above so that data intake can safely start at any point relative to EPU execution
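As a rough illustration of what a structural check might look like, the sketch below tests a directory for a handful of entries that appear in the example paths on this page (`EpuSession.dm`, `Metadata/`, `Images-Disc1/`). The function name `missing_epu_entries` and the choice of required entries are assumptions made for illustration; the agent's actual validation rules are not documented here and may well be stricter.

```python
from pathlib import Path

# Assumed minimal layout, inferred from the example paths on this page.
# The real validator in smartem_agent may check more (or different) things.
REQUIRED_FILES = ("EpuSession.dm",)
REQUIRED_DIRS = ("Metadata", "Images-Disc1")


def missing_epu_entries(epu_dir) -> list:
    """Return the names of the assumed-required entries absent from epu_dir."""
    root = Path(epu_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the directory passes this (deliberately shallow) check; anything else names what is missing.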

Agent Parse and Validate Operations

# parse complete EPU directory
python -m smartem_agent parse dir \
  ../smartem-decisions-test-datasets/metadata_Supervisor_20250108_101446_62_cm40593-1_EPU

python -m smartem_agent parse dir \
  ../smartem-decisions-test-datasets/metadata_Supervisor_20250114_220855_23_epuBSAd20_GrOxDDM

python -m smartem_agent parse dir \
  ../smartem-decisions-test-datasets/metadata_Supervisor_20241220_140307_72_et2_gangshun

# parse individual files from an EPU directory
python -m smartem_agent parse session \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250129_134723_36_bi37708-28_grid7_EPU/EpuSession.dm

python -m smartem_agent parse atlas \
  ../smartem-decisions-test-datasets/bi37708-28-copy/atlas/Supervisor_20250129_111544_bi37708-28_atlas/Atlas/Atlas.dm

python -m smartem_agent parse gridsquare \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250404_164354_31_EPU_nr27313-442/metadata_Supervisor_20250404_164354_31_EPU_nr27313-442/Images-Disc1/GridSquare_3568837/GridSquare_20250404_171012.xml

python -m smartem_agent parse gridsquare-metadata \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250404_164354_31_EPU_nr27313-442/metadata_Supervisor_20250404_164354_31_EPU_nr27313-442/Metadata/GridSquare_3568837.dm

python -m smartem_agent parse gridsquare-metadata \
  ./tests/testdata/bi37708-28/Supervisor_20250129_134723_36_bi37708-28_grid7_EPU/Metadata/GridSquare_29273435.dm

python -m smartem_agent parse foilhole \
  tests/testdata/epu-dir-example/Images-Disc1/GridSquare_8999138/FoilHoles/FoilHole_9015889_20250108_154725.xml

python -m smartem_agent parse gridsquare-metadata \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250404_164354_31_EPU_nr27313-442/metadata_Supervisor_20250404_164354_31_EPU_nr27313-442/Images-Disc1/GridSquare_3568837/Data/FoilHole_3595930_Data_3590445_56_20250405_084025.xml

# Validate EPU project dirs (expect failure):
python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250129_114842_73_bi37708-28_grid7_EPU

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_105058_11

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_145409_68

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_150924_1grid3

# Validate EPU project dirs (expect success):
python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250129_134723_36_bi37708-28_grid7_EPU

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_133418_68apoferritin

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_143856_44Practice

python -m smartem_agent validate \
  ../smartem-decisions-test-datasets/bi37708-28-copy/Supervisor_20250130_145409_68practice2
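The single-file parse examples above follow a pattern: the subcommand goes with the file's name and its parent directory. The sketch below encodes that pattern as inferred from the example paths only. `guess_parse_subcommand` is a hypothetical helper, not part of `smartem_agent`, and its rules are assumptions that may not cover every file EPU writes; files under `Data/` are intentionally left unmapped.

```python
from pathlib import PureWindowsPath


def guess_parse_subcommand(path: str):
    """Map an EPU file path to the `parse` subcommand used in the examples.

    Returns None when the path matches none of the assumed patterns.
    """
    # PureWindowsPath accepts both "/" and "\" as separators, which suits
    # paths copied from either a Windows workstation or a POSIX shell.
    p = PureWindowsPath(path)
    name, parent = p.name, p.parent.name
    if name == "EpuSession.dm":
        return "session"
    if name == "Atlas.dm":
        return "atlas"
    if name.startswith("GridSquare_") and p.suffix == ".dm" and parent == "Metadata":
        return "gridsquare-metadata"
    if name.startswith("GridSquare_") and p.suffix == ".xml":
        return "gridsquare"
    if name.startswith("FoilHole_") and p.suffix == ".xml" and parent == "FoilHoles":
        return "foilhole"
    return None
```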

Agent Watch Operations

The agent monitors EPU output directories for changes:

# Launch the watcher:
python -m smartem_agent watch ../test-dir --log-file output.log

# For testing incremental file writes, use the fsrecorder tool to simulate EPU behavior:
# First, record from an existing EPU dataset:
python tools/fsrecorder/fsrecorder.py record \
  ../smartem-decisions-test-datasets/epu-Supervisor_20250326_145351_30_nt33824-10_grid2_1in5dil \
  ../test-recording.tar.gz

# Then replay it to your test directory with accelerated timing:
python tools/fsrecorder/fsrecorder.py replay ../test-recording.tar.gz ../test-dir --fast

# Alternatively, for quick testing, copy data manually:
cp -r "../smartem-decisions-test-datasets/epu-Supervisor_20250326_145351_30_nt33824-10_grid2_1in5dil/"* ../test-dir/

Note: The fsrecorder tool (tools/fsrecorder/) simulates EPU file-writing patterns accurately, preserving timing and ordering. A missing EpuSession.dm file is effectively a show-stopper: that file provides the references to the atlas and triggers instantiation of a new grid entity in the internal datastore.
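A middle ground between a full fsrecorder replay and a bulk `cp` is to copy files one at a time, oldest first, with a short pause between them. The sketch below does this with the standard library only. `copy_incrementally` and its `delay` parameter are hypothetical, and ordering by modification time is only an approximation of how EPU writes files, not a substitute for fsrecorder's recorded timing.

```python
import shutil
import time
from pathlib import Path


def copy_incrementally(src, dst, delay=0.1):
    """Copy files from src to dst one by one, oldest modification time first.

    Returns the relative paths (POSIX style) in the order they were copied.
    """
    src, dst = Path(src), Path(dst)
    files = sorted(
        (p for p in src.rglob("*") if p.is_file()),
        key=lambda p: (p.stat().st_mtime, str(p)),  # path breaks mtime ties
    )
    copied = []
    for f in files:
        rel = f.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
        time.sleep(delay)  # give the watcher a chance to see each write
    return copied
```

Called with the dataset directory as `src` and `../test-dir` as `dst`, this feeds the running watcher a stream of individual writes rather than one burst.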

A watch operation is designed to handle any of the following invocation scenarios gracefully:

  1. Watcher launched before EPU starts writing to the filesystem: only the watcher is necessary

  2. Watcher launched after EPU starts writing to the filesystem: both parser and watcher are necessary, to pick up pre-existing and new writes

  3. Watcher launched after EPU finishes writing to the filesystem: only the parser is necessary
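The three scenarios reduce to two questions: does the directory already contain files, and are more writes expected? A minimal sketch of that decision follows. `plan_intake` and its boolean inputs are hypothetical; how the agent actually detects these conditions is not described here.

```python
def plan_intake(has_existing_files: bool, more_writes_expected: bool) -> list:
    """Return the intake steps needed, in order, for a given starting point."""
    steps = []
    if has_existing_files:
        steps.append("parse")  # catch up on what EPU has already written
    if more_writes_expected:
        steps.append("watch")  # follow writes that are still to come
    return steps
```

Scenario 1 corresponds to `plan_intake(False, True)`, scenario 2 to `plan_intake(True, True)`, and scenario 3 to `plan_intake(True, False)`.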