SmartEM CLI Troubleshooting Guide#

This guide provides solutions for common issues encountered when using the SmartEM Agent command-line interface. For comprehensive parameter documentation, see the CLI Reference.

Quick Diagnostics#

Check CLI Installation#

# Verify the CLI is accessible
python -m smartem_agent --help

# Confirm the package can be imported
python -c "import smartem_agent; print('SmartEM Agent available')"
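
The import check above can be extended to report where the package was loaded from and, if one is recorded, its installed version. The following is a minimal sketch using only the standard library. It assumes the distribution is published under the same name as the module (`smartem_agent`), which may not match your installation.

```python
from importlib import metadata, util


def describe_module(name: str) -> dict:
    """Report whether a module is importable, where from, and its version."""
    spec = util.find_spec(name)
    info = {"name": name, "importable": spec is not None, "origin": None, "version": None}
    if spec is None:
        return info
    info["origin"] = spec.origin
    try:
        # Assumption: the distribution name matches the module name.
        info["version"] = metadata.version(name)
    except metadata.PackageNotFoundError:
        pass  # Importable (e.g. via PYTHONPATH) but no installed distribution found
    return info


print(describe_module("smartem_agent"))
```

An `origin` outside the expected source tree or virtual environment usually means a different copy of the package is being picked up.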

Test Basic Functionality#

# Test with a known good directory
python -m smartem_agent validate /path/to/test/data

# Exercise the watcher offline (--dry-run) with verbose output
python -m smartem_agent watch /tmp --dry-run --verbose

Common Issues and Solutions#

1. Command Not Found Errors#

Error: `No module named 'smartem_agent'`#

Cause: The SmartEM Agent package is not installed, or is not on the Python path.

Solutions:

# Install in development mode
pip install -e .

# Install with all dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[all]"

# Verify installation
pip list | grep smartem

Alternative: Point PYTHONPATH at the source tree instead of installing:

PYTHONPATH=/path/to/smartem-decisions/src python -m smartem_agent --help

Error: `python: can't open file 'smartem_agent'`#

Cause: Trying to run as a script instead of a module.

Solution: Use the module syntax:

# Correct
python -m smartem_agent watch /data

# Incorrect
python smartem_agent watch /data

2. Directory and File Access Issues#

Error: `Directory /path/to/data does not exist`#

Diagnosis:

# Check if directory exists
ls -la /path/to/data

# Check parent directory
ls -la /path/to/

# Verify current working directory
pwd

Solutions:

- Use absolute paths: /full/path/to/directory
- Verify directory spelling and case sensitivity
- Check directory permissions: ls -ld /path/to/data
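
The checks above can be combined into one helper that resolves the path the way Python sees it and reports the deepest ancestor that actually exists, which often pinpoints a misspelled component. This is a sketch only; the function name `diagnose_path` is hypothetical and not part of SmartEM Agent.

```python
import os
from pathlib import Path


def diagnose_path(path: str) -> dict:
    """Summarise why a directory argument might be rejected."""
    resolved = Path(path).expanduser().resolve()  # Absolute, symlinks followed
    # Walk upwards to find the deepest ancestor that actually exists
    existing = resolved
    while not existing.exists() and existing != existing.parent:
        existing = existing.parent
    return {
        "resolved": str(resolved),
        "exists": resolved.exists(),
        "is_dir": resolved.is_dir(),
        # Listing a directory needs both read and execute permission
        "readable": os.access(resolved, os.R_OK | os.X_OK),
        "nearest_existing": str(existing),
    }
```

For example, `diagnose_path("/path/to/data")` returning a `nearest_existing` of `/path` indicates that the `to` component is the first one missing.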

Error: `Permission denied`#

Cause: Insufficient permissions to read directories or write log files.

Solutions:

# Check directory permissions
ls -la /path/to/data

# Check write permissions for log file location
touch /path/to/logfile.log && rm /path/to/logfile.log

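# Last resort: run with elevated privileges (use sparingly)
sudo python -m smartem_agent watch /data

# Better: fix ownership and permissions on the data directory
# (capital X adds execute only to directories and already-executable files)
chown -R "$USER":"$(id -gn)" /path/to/data
chmod -R u+rwX,go+rX /path/to/data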

Error: Files not being detected during watch#

Diagnosis:

# Test with verbose output
python -m smartem_agent watch /data --dry-run -vv

# Check file patterns match
find /data -name "EpuSession.dm" -o -name "*.xml"

# Monitor filesystem events manually
inotifywait -m -r /data

Solutions:

- Verify file patterns match EPU naming conventions
- Check for symbolic links that might not be followed
- Ensure files are completely written before processing
- Consider filesystem-specific issues (NFS, network drives)
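
To see what a pattern-based scan would find independently of filesystem events, the directory can be walked directly. The sketch below uses the same two patterns as the find command above and flags symlinks and files modified within the last few seconds, which may still be being written. The function name and the five-second threshold are illustrative assumptions, not agent behaviour.

```python
import time
from fnmatch import fnmatch
from pathlib import Path

# Patterns taken from the find command above; adjust to your data layout.
PATTERNS = ("EpuSession.dm", "*.xml")


def scan_candidates(root: str, patterns=PATTERNS, recent_seconds: float = 5.0) -> list:
    """List matching files, flagging symlinks and very recent modifications."""
    now = time.time()
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not any(fnmatch(path.name, pat) for pat in patterns):
            continue
        if not path.is_file():  # Skips directories and broken symlinks
            continue
        results.append({
            "path": str(path),
            "is_symlink": path.is_symlink(),
            # A file modified moments ago may still be being written
            "recently_modified": now - path.stat().st_mtime < recent_seconds,
        })
    return results
```

If this scan lists files that the watcher never reports, the cause is more likely event delivery (for example on NFS) than file naming.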

3. API Connection Problems#

Error: `API at http://127.0.0.1:8000 is not reachable`#

Diagnosis:

# Test basic connectivity
curl -v http://127.0.0.1:8000/status

# Check if service is running
netstat -tlnp | grep 8000

# Test from Python
python -c "import requests; print(requests.get('http://127.0.0.1:8000/status').json())"

Solutions:

Start the backend API:

cd /path/to/smartem-decisions
python -m smartem_backend.api_server

Use correct API URL:

# Local development
python -m smartem_agent watch /data --api-url http://localhost:8000

# Remote server
python -m smartem_agent watch /data --api-url https://smartem.example.com/api

Network connectivity issues:
- Check firewall settings
- Verify proxy configuration
- Test with different network interfaces
- Use --dry-run for offline testing
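
The curl and requests checks above can also be run from the standard library, which helps on hosts where neither is installed. The sketch below probes the `/status` path used throughout this guide and separates "the server answered with an error" from "nothing answered at all". The function name `probe_status` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def probe_status(base_url: str, timeout: float = 5.0) -> tuple:
    """Return (reachable, detail) for GET <base_url>/status."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            try:
                return True, json.loads(body)
            except json.JSONDecodeError:
                return True, body  # Reachable, but the body is not JSON
    except urllib.error.HTTPError as exc:
        # The server answered, so the network path itself works
        return True, f"HTTP {exc.code} {exc.reason}"
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)
```

Note that `urllib` honours the `http_proxy`, `https_proxy`, and `no_proxy` environment variables, so a result that differs from a direct curl call can itself point to a proxy misconfiguration.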

Error: `SSE connection failed` or `Failed to send heartbeat`#

Diagnosis:

# Test SSE endpoint manually
curl -H "Accept: text/event-stream" \
     http://127.0.0.1:8000/agent/test-agent/session/test-session/instructions/stream

# Check heartbeat endpoint
curl -X POST \
     -H "Content-Type: application/json" \
     http://127.0.0.1:8000/agent/test-agent/session/test-session/heartbeat

Solutions:

Verify agent and session IDs:

# Ensure IDs exist in backend
python -m smartem_agent watch /data \
  --agent-id valid-agent-id \
  --session-id valid-session-id

Adjust timeout settings:

# Increase timeouts for unstable connections
python -m smartem_agent watch /data \
  --agent-id agent-01 \
  --session-id session-01 \
  --sse-timeout 120 \
  --heartbeat-interval 30

Check backend logs:
- Monitor backend API logs for connection errors
- Verify database connectivity
- Check for authentication issues
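
When testing the endpoints by hand, IDs that contain spaces or slashes silently change the URL path. The sketch below builds the stream and heartbeat URLs following the path layout shown in the curl commands above and percent-encodes the IDs. Whether the backend accepts percent-encoded IDs is an assumption, and `agent_endpoints` is a hypothetical helper.

```python
from urllib.parse import quote


def agent_endpoints(base_url: str, agent_id: str, session_id: str) -> dict:
    """Build the stream and heartbeat URLs used in the curl checks above."""
    root = "{}/agent/{}/session/{}".format(
        base_url.rstrip("/"),
        quote(agent_id, safe=""),    # Escape slashes, spaces, etc. in IDs
        quote(session_id, safe=""),
    )
    return {
        "stream": root + "/instructions/stream",
        "heartbeat": root + "/heartbeat",
    }
```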

4. Parsing and Validation Errors#

Error: `Grid data dir is structurally invalid`#

Diagnosis:

# Get detailed validation errors
python -m smartem_agent validate /path/to/grid -vv

# Check directory structure (parentheses make -type f apply to both patterns)
find /path/to/grid -type f \( -name "*.dm" -o -name "*.xml" \) | head -20

Common Issues:

- Missing EpuSession.dm file
- Incorrect directory naming conventions
- Incomplete or corrupted files
- Wrong directory structure (not EPU format)

Solutions:

- Verify the directory contains valid EPU data
- Check for required files: EpuSession.dm, Atlas/Atlas.dm
- Ensure directory structure matches EPU conventions
- Use the `parse dir` command to identify specific parsing issues
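
A quick pre-check for the required files named above can rule out the simplest cause before running the full validator. This sketch checks only the two paths listed in this guide; the real validation performed by `smartem_agent validate` may be stricter, and `missing_required` is a hypothetical helper.

```python
from pathlib import Path

# Required files as listed in this guide; the real validator may check more.
REQUIRED = ("EpuSession.dm", "Atlas/Atlas.dm")


def missing_required(grid_dir: str, required=REQUIRED) -> list:
    """Return the required relative paths that are absent or empty."""
    root = Path(grid_dir)
    missing = []
    for rel in required:
        candidate = root / rel
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(rel)
    return missing
```

An empty list means both files are present and non-empty; it does not confirm that their contents are valid.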

Error: `Could not extract instrument info` or parsing failures#

Diagnosis:

# Test individual file parsing
python -m smartem_agent parse session /path/to/EpuSession.dm -vv

# Check file integrity
file /path/to/EpuSession.dm
hexdump -C /path/to/EpuSession.dm | head -5

Solutions:

- Verify files are not corrupted or partially written
- Check file permissions and accessibility
- Ensure files are in the expected format and have not been truncated or corrupted in transfer
- Look for encoding issues or special characters in paths
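
One way to tell a partially written file from a corrupted one is to check whether it is still changing. The sketch below compares size and modification time across a short interval. It is a heuristic under the assumption that the acquisition software writes continuously; a writer that pauses for longer than `wait` will look stable.

```python
import os
import time


def is_stable(path: str, wait: float = 1.0) -> bool:
    """Return True if the file is non-empty and unchanged after `wait` seconds."""
    before = os.stat(path)
    time.sleep(wait)
    after = os.stat(path)
    unchanged = (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
    return unchanged and after.st_size > 0
```

If `is_stable("/path/to/EpuSession.dm")` returns False repeatedly, retry the parse once the file has stopped changing before treating it as corrupted.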

5. Performance and Resource Issues#

Issue: High memory usage or slow processing#

Diagnosis:

# Monitor resource usage (-d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f smartem_agent)"
htop

# Check directory size
du -sh /path/to/data
find /path/to/data -type f | wc -l

Solutions:

Optimise logging settings:

# Reduce logging frequency
python -m smartem_agent watch /data --log-interval 30.0

# Reduce verbosity
python -m smartem_agent watch /data  # No -v flags

Process smaller datasets:

# Process single grids instead of entire sessions
python -m smartem_agent watch /data/Grid_001

System resource limits:

# Increase file descriptor limits
ulimit -n 4096

# Monitor disk space
df -h /path/to/data

Issue: `OSError: [Errno 24] Too many open files`#

Solutions:

# Check current limits
ulimit -a

# Increase file descriptor limit temporarily
ulimit -n 4096

# Persist for future interactive shells (add to ~/.bashrc; cannot exceed the hard limit)
echo "ulimit -n 4096" >> ~/.bashrc
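
The same limit can be inspected and raised from inside a Python process using the standard `resource` module (Unix only). The sketch below raises the soft limit towards a target without exceeding the hard limit. It affects only the current process and its children, and `raise_nofile_limit` is a hypothetical helper rather than an agent feature.

```python
import resource


def raise_nofile_limit(target: int = 4096) -> int:
    """Raise the soft open-file limit towards `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # The soft limit cannot exceed the hard limit
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # Some platforms reject the request; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

If the returned value is still below what you need, the hard limit has to be raised by an administrator, for example through `/etc/security/limits.conf` or the service manager configuration.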

6. Logging and Output Issues#

Issue: No log output or missing log files#

Diagnosis:

# Check log file permissions
ls -la fs_changes.log

# Verify log directory exists and check its permissions
ls -ld "$(dirname /path/to/custom.log)"

# Test with different log location
python -m smartem_agent watch /data --log-file /tmp/test.log

Solutions:

- Ensure log directory exists and is writable
- Use absolute paths for log files
- Check disk space availability
- Verify SELinux/AppArmor policies if applicable
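
The first three checks in the list above can be scripted. The sketch below returns a list of problems for a given log path; an empty list means none of the checked conditions failed. The 10 MiB free-space threshold is an arbitrary illustrative value, and the function does not detect SELinux or AppArmor denials.

```python
import os
import shutil
from pathlib import Path


def check_log_target(log_file: str, min_free_bytes: int = 10 * 1024 * 1024) -> list:
    """Return a list of problems that would prevent writing to log_file."""
    path = Path(log_file).expanduser().resolve()
    parent = path.parent
    problems = []
    if not parent.is_dir():
        problems.append(f"directory does not exist: {parent}")
        return problems  # Remaining checks need the directory
    if not os.access(parent, os.W_OK | os.X_OK):
        problems.append(f"directory is not writable: {parent}")
    if path.exists() and not os.access(path, os.W_OK):
        problems.append(f"file is not writable: {path}")
    if shutil.disk_usage(parent).free < min_free_bytes:
        problems.append(f"less than {min_free_bytes} bytes free on {parent}")
    return problems
```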

Issue: Verbose output not showing#

Solutions:

# Ensure correct verbose syntax
python -m smartem_agent watch /data -v      # INFO level
python -m smartem_agent watch /data -vv     # DEBUG level

# Capture both stdout and stderr, in case log messages go to stderr
python -m smartem_agent watch /data -v 2>&1 | tee output.log

7. Signal Handling and Process Management#

Issue: Process doesn’t stop gracefully with Ctrl+C#

Solutions:

# Send SIGTERM for graceful shutdown
pkill -TERM -f "smartem_agent"

# Force kill if necessary
pkill -9 -f "smartem_agent"

# Use timeout for automatic termination
timeout 3600 python -m smartem_agent watch /data  # Stop after 1 hour
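
When the agent is launched from a Python wrapper script, the same escalation (SIGTERM first, SIGKILL only if needed) can be applied to the child process. This is a generic sketch for any `subprocess.Popen` handle on POSIX systems; how quickly the agent itself reacts to SIGTERM is not something this guide specifies, so the grace period is an assumption to tune.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace: float = 10.0) -> int:
    """Send SIGTERM, wait up to `grace` seconds, then force-kill if needed."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught by the process
        return proc.wait()
```

On POSIX the return value is the negative signal number when the process was ended by a signal, so a result of -9 indicates the forced kill was needed.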

Issue: Background process monitoring#

Solutions:

# Run in background with nohup
nohup python -m smartem_agent watch /data > smartem.log 2>&1 &

# Use screen or tmux for persistent sessions
screen -S smartem
python -m smartem_agent watch /data

# Monitor process status (pgrep avoids matching the grep command itself)
pgrep -af smartem_agent

Advanced Troubleshooting#

Debug Mode Activation#

Enable maximum debugging output:

python -m smartem_agent watch /data \
  --dry-run \
  --verbose --verbose \
  --log-interval 1.0 \
  --heartbeat-interval 10 \
  --sse-timeout 10

Network Debugging#

Test connectivity chain:

# 1. Basic network connectivity
ping backend-host

# 2. Port connectivity
telnet backend-host 8000

# 3. HTTP connectivity
curl -v http://backend-host:8000/status

# 4. SSE connectivity
curl -v -H "Accept: text/event-stream" \
     http://backend-host:8000/agent/test/session/test/instructions/stream

File System Monitoring Debug#

Manual file monitoring:

# Install inotify-tools (Linux)
sudo apt-get install inotify-tools

# Monitor directory changes
inotifywait -m -r --format '%w%f %e %T' --timefmt '%Y-%m-%d %H:%M:%S' /data

# Compare with agent detection
python -m smartem_agent watch /data --dry-run -vv

Database Connectivity Issues#

Test database connection (if using persistent storage):

# Check database connectivity
psql -h database-host -U username -d smartem_db -c "SELECT 1;"

# Verify tables exist
psql -h database-host -U username -d smartem_db -c "\\dt"

# Check for connection pooling issues
netstat -an | grep :5432

Environment-Specific Issues#

Windows Specific#

Path separator issues:

# Use forward slashes, or quote paths that contain backslashes
python -m smartem_agent watch "C:/data/microscopy"
python -m smartem_agent watch "C:\data\microscopy"
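
When a Windows path is built in a Python script rather than typed at a prompt, both separator styles refer to the same location, and a raw string prevents backslashes from being read as escape sequences. A short illustration using the standard library's `PureWindowsPath`, which runs on any operating system:

```python
from pathlib import PureWindowsPath

forward = PureWindowsPath("C:/data/microscopy")
backslash = PureWindowsPath(r"C:\data\microscopy")  # Raw string keeps backslashes literal

print(forward == backslash)   # True
print(str(forward))           # C:\data\microscopy
print(forward.as_posix())     # C:/data/microscopy
```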

Service account permissions:

- Ensure service account has appropriate file system access
- Check Windows Defender exclusions for monitoring directories
- Verify no Group Policy restrictions on file access

Linux/Unix Specific#

Permission issues:

# Check SELinux status
sestatus

# Check AppArmor status
sudo apparmor_status

# Verify no systemd restrictions
systemctl status user@$(id -u).service

NFS mounted directories:

# Check mount options
mount | grep nfs

# Test with local directory first
python -m smartem_agent watch /tmp/test_data --dry-run

Container/Docker Environments#

Volume mounting issues:

# Verify volume mounts
docker inspect -f '{{ json .Mounts }}' container-id
docker exec container-id ls -la /data

# Check the user the container runs as and the ownership of the mount point
docker exec container-id id
docker exec container-id ls -ld /data

Network connectivity in containers:

# Test from within container
docker exec container-id curl http://backend:8000/status

# Check container networking
docker network ls
docker inspect network-name

Getting Additional Help#

Collecting Diagnostic Information#

Create a diagnostic script:

#!/bin/bash
echo "=== System Information ==="
uname -a
python --version

echo "=== SmartEM Agent Status ==="
python -c "import smartem_agent; print('Available')" 2>&1

echo "=== Directory Information ==="
ls -la /path/to/data

echo "=== Network Connectivity ==="
curl -s http://127.0.0.1:8000/status 2>&1 || echo "API not reachable"

echo "=== Process Information ==="
ps aux | grep smartem

echo "=== Resource Usage ==="
free -h
df -h

Log Analysis#

Extract relevant log entries:

# Filter for errors
grep -i error fs_changes.log

# Filter by time range (assumes each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp)
awk '$0 >= "2024-01-15 14:00:00" && $0 < "2024-01-15 15:00:00"' fs_changes.log

# Analyse patterns
grep "Heartbeat sent" fs_changes.log | tail -20
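
For repeated analysis, the time-range filter can be written in Python, which parses the timestamps instead of comparing text. The sketch below assumes every relevant log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp. That layout is inferred from the example above, not confirmed, so adjust `TIMESTAMP_FORMAT` to match your actual log file.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # Assumed log timestamp layout


def lines_between(lines, start: str, end: str):
    """Yield lines whose leading timestamp falls in [start, end)."""
    lo = datetime.strptime(start, TIMESTAMP_FORMAT)
    hi = datetime.strptime(end, TIMESTAMP_FORMAT)
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # No parseable timestamp at the start of the line
        if lo <= stamp < hi:
            yield line
```

A typical call would be `list(lines_between(open("fs_changes.log"), "2024-01-15 14:00:00", "2024-01-15 15:00:00"))`.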

Reporting Issues#

When reporting issues, include:

- Command used: Full command line with parameters
- Error message: Complete error output
- Environment: OS, Python version, installation method
- Directory structure: Sample of the directory being processed
- Logs: Relevant log entries with timestamps
- Network information: API URLs, connectivity status
- System resources: Available memory, disk space

Example issue report:

Command: python -m smartem_agent watch /data/Grid_001 --agent-id microscope-01 --session-id session-123 -v

Error: SSE connection failed: Connection refused

Environment:
- OS: Ubuntu 20.04 LTS
- Python: 3.12.1
- Installation: pip install -e .[all]

Directory: /data/Grid_001 (contains EpuSession.dm, 15 GridSquare directories)

API Status: curl http://127.0.0.1:8000/status returns {"status": "ok"}

Logs: (attach relevant log entries)

The steps in this guide cover the most common CLI issues. For backend-specific problems, consult the SmartEM Backend documentation.
