SmartEM CLI Troubleshooting Guide#
This guide provides solutions for common issues encountered when using the SmartEM Agent command-line interface. For comprehensive parameter documentation, see the CLI Reference.
Quick Diagnostics#
Check CLI Installation#
# Verify the CLI is accessible
python -m smartem_agent --help
# Check version and dependencies
python -c "import smartem_agent; print('SmartEM Agent available')"
Test Basic Functionality#
# Test with a known good directory
python -m smartem_agent validate /path/to/test/data
# Test API connectivity
python -m smartem_agent watch /tmp --dry-run --verbose
Common Issues and Solutions#
1. Command Not Found Errors#
Error: No module named 'smartem_agent'
#
Cause: SmartEM Agent package is not installed or not in Python path.
Solutions:
# Install in development mode
pip install -e .
# Install with all dependencies
pip install -e .[all]
# Verify installation
pip list | grep smartem
Alternative: Use the full path to the module:
PYTHONPATH=/path/to/smartem-decisions/src python -m smartem_agent --help
Error: python: can't open file 'smartem_agent'
#
Cause: Trying to run as a script instead of a module.
Solution: Use the module syntax:
# Correct
python -m smartem_agent watch /data
# Incorrect
python smartem_agent watch /data
2. Directory and File Access Issues#
Error: Directory /path/to/data does not exist
#
Diagnosis:
# Check if directory exists
ls -la /path/to/data
# Check parent directory
ls -la /path/to/
# Verify current working directory
pwd
Solutions:
Use absolute paths:
/full/path/to/directory
Verify directory spelling and case sensitivity
Check directory permissions:
ls -ld /path/to/data
Error: Permission denied
#
Cause: Insufficient permissions to read directories or write log files.
Solutions:
# Check directory permissions
ls -la /path/to/data
# Check write permissions for log file location
touch /path/to/logfile.log && rm /path/to/logfile.log
# Run with appropriate permissions
sudo python -m smartem_agent watch /data # Use sparingly
# Better: Fix directory permissions
chmod -R 755 /path/to/data
chown -R $USER:$GROUP /path/to/data
Error: Files not being detected during watch#
Diagnosis:
# Test with verbose output
python -m smartem_agent watch /data --dry-run -vv
# Check file patterns match
find /data -name "EpuSession.dm" -o -name "*.xml"
# Monitor filesystem events manually
inotifywait -m -r /data
Solutions:
Verify file patterns match EPU naming conventions
Check for symbolic links that might not be followed
Ensure files are completely written before processing
Consider filesystem-specific issues (NFS, network drives)
3. API Connection Problems#
Error: API at http://127.0.0.1:8000 is not reachable
#
Diagnosis:
# Test basic connectivity
curl -v http://127.0.0.1:8000/status
# Check if service is running
netstat -tlnp | grep 8000
# Test from Python
python -c "import requests; print(requests.get('http://127.0.0.1:8000/status').json())"
Solutions:
Start the backend API:
cd /path/to/smartem-decisions python -m smartem_backend.api_server
Use correct API URL:
# Local development python -m smartem_agent watch /data --api-url http://localhost:8000 # Remote server python -m smartem_agent watch /data --api-url https://smartem.example.com/api
Network connectivity issues:
Check firewall settings
Verify proxy configuration
Test with different network interfaces
Use
--dry-run
for offline testing
Error: SSE connection failed
or Failed to send heartbeat
#
Diagnosis:
# Test SSE endpoint manually
curl -H "Accept: text/event-stream" \
http://127.0.0.1:8000/agent/test-agent/session/test-session/instructions/stream
# Check heartbeat endpoint
curl -X POST \
-H "Content-Type: application/json" \
http://127.0.0.1:8000/agent/test-agent/session/test-session/heartbeat
Solutions:
Verify agent and session IDs:
# Ensure IDs exist in backend python -m smartem_agent watch /data \ --agent-id valid-agent-id \ --session-id valid-session-id
Adjust timeout settings:
# Increase timeouts for unstable connections python -m smartem_agent watch /data \ --agent-id agent-01 \ --session-id session-01 \ --sse-timeout 120 \ --heartbeat-interval 30
Check backend logs:
Monitor backend API logs for connection errors
Verify database connectivity
Check for authentication issues
4. Parsing and Validation Errors#
Error: Grid data dir is structurally invalid
#
Diagnosis:
# Get detailed validation errors
python -m smartem_agent validate /path/to/grid -vv
# Check directory structure
find /path/to/grid -type f -name "*.dm" -o -name "*.xml" | head -20
Common Issues:
Missing
EpuSession.dm
fileIncorrect directory naming conventions
Incomplete or corrupted files
Wrong directory structure (not EPU format)
Solutions:
Verify the directory contains valid EPU data
Check for required files:
EpuSession.dm
,Atlas/Atlas.dm
Ensure directory structure matches EPU conventions
Use
parse dir
command to identify specific parsing issues
Error: Could not extract instrument info
or parsing failures#
Diagnosis:
# Test individual file parsing
python -m smartem_agent parse session /path/to/EpuSession.dm -vv
# Check file integrity
file /path/to/EpuSession.dm
hexdump -C /path/to/EpuSession.dm | head -5
Solutions:
Verify files are not corrupted or partially written
Check file permissions and accessibility
Ensure files are in expected format (not binary corrupted)
Look for encoding issues or special characters in paths
5. Performance and Resource Issues#
Issue: High memory usage or slow processing#
Diagnosis:
# Monitor resource usage
top -p $(pgrep -f "smartem_agent")
htop
# Check directory size
du -sh /path/to/data
find /path/to/data -type f | wc -l
Solutions:
Optimise logging settings:
# Reduce logging frequency python -m smartem_agent watch /data --log-interval 30.0 # Reduce verbosity python -m smartem_agent watch /data # No -v flags
Process smaller datasets:
# Process single grids instead of entire sessions python -m smartem_agent watch /data/Grid_001
System resource limits:
# Increase file descriptor limits ulimit -n 4096 # Monitor disk space df -h /path/to/data
Issue: OSError: [Errno 24] Too many open files
#
Solutions:
# Check current limits
ulimit -a
# Increase file descriptor limit temporarily
ulimit -n 4096
# Increase permanently (add to ~/.bashrc)
echo "ulimit -n 4096" >> ~/.bashrc
6. Logging and Output Issues#
Issue: No log output or missing log files#
Diagnosis:
# Check log file permissions
ls -la fs_changes.log
# Verify log directory exists
ls -la $(dirname /path/to/custom.log)
# Test with different log location
python -m smartem_agent watch /data --log-file /tmp/test.log
Solutions:
Ensure log directory exists and is writable
Use absolute paths for log files
Check disk space availability
Verify SELinux/AppArmor policies if applicable
Issue: Verbose output not showing#
Solutions:
# Ensure correct verbose syntax
python -m smartem_agent watch /data -v # INFO level
python -m smartem_agent watch /data -vv # DEBUG level
# Check if output is being redirected
python -m smartem_agent watch /data -v 2>&1 | tee output.log
7. Signal Handling and Process Management#
Issue: Process doesn’t stop gracefully with Ctrl+C#
Solutions:
# Send SIGTERM for graceful shutdown
pkill -TERM -f "smartem_agent"
# Force kill if necessary
pkill -9 -f "smartem_agent"
# Use timeout for automatic termination
timeout 3600 python -m smartem_agent watch /data # Stop after 1 hour
Issue: Background process monitoring#
Solutions:
# Run in background with nohup
nohup python -m smartem_agent watch /data > smartem.log 2>&1 &
# Use screen or tmux for persistent sessions
screen -S smartem
python -m smartem_agent watch /data
# Monitor process status
ps aux | grep smartem_agent
Advanced Troubleshooting#
Debug Mode Activation#
Enable maximum debugging output:
python -m smartem_agent watch /data \
--dry-run \
--verbose --verbose \
--log-interval 1.0 \
--heartbeat-interval 10 \
--sse-timeout 10
Network Debugging#
Test connectivity chain:
# 1. Basic network connectivity
ping backend-host
# 2. Port connectivity
telnet backend-host 8000
# 3. HTTP connectivity
curl -v http://backend-host:8000/status
# 4. SSE connectivity
curl -v -H "Accept: text/event-stream" \
http://backend-host:8000/agent/test/session/test/instructions/stream
File System Monitoring Debug#
Manual file monitoring:
# Install inotify-tools (Linux)
sudo apt-get install inotify-tools
# Monitor directory changes
inotifywait -m -r --format '%w%f %e %T' --timefmt '%Y-%m-%d %H:%M:%S' /data
# Compare with agent detection
python -m smartem_agent watch /data --dry-run -vv
Database Connectivity Issues#
Test database connection (if using persistent storage):
# Check database connectivity
psql -h database-host -U username -d smartem_db -c "SELECT 1;"
# Verify tables exist
psql -h database-host -U username -d smartem_db -c "\\dt"
# Check for connection pooling issues
netstat -an | grep :5432
Environment-Specific Issues#
Windows Specific#
Path separator issues:
# Use forward slashes or raw strings
python -m smartem_agent watch "C:/data/microscopy"
python -m smartem_agent watch C:\\data\\microscopy
Service account permissions:
Ensure service account has appropriate file system access
Check Windows Defender exclusions for monitoring directories
Verify no Group Policy restrictions on file access
Linux/Unix Specific#
Permission issues:
# Check SELinux status
sestatus
# Check AppArmor status
sudo apparmor_status
# Verify no systemd restrictions
systemctl status user@$(id -u).service
NFS mounted directories:
# Check mount options
mount | grep nfs
# Test with local directory first
python -m smartem_agent watch /tmp/test_data --dry-run
Container/Docker Environments#
Volume mounting issues:
# Verify volume mounts
docker exec container-id ls -la /data
# Check container permissions
docker exec container-id id
docker exec container-id ls -la /data
Network connectivity in containers:
# Test from within container
docker exec container-id curl http://backend:8000/status
# Check container networking
docker network ls
docker inspect network-name
Getting Additional Help#
Collecting Diagnostic Information#
Create a diagnostic script:
#!/bin/bash
echo "=== System Information ==="
uname -a
python --version
echo "=== SmartEM Agent Status ==="
python -c "import smartem_agent; print('Available')" 2>&1
echo "=== Directory Information ==="
ls -la /path/to/data
echo "=== Network Connectivity ==="
curl -s http://127.0.0.1:8000/status 2>&1 || echo "API not reachable"
echo "=== Process Information ==="
ps aux | grep smartem
echo "=== Resource Usage ==="
free -h
df -h
Log Analysis#
Extract relevant log entries:
# Filter for errors
grep -i error fs_changes.log
# Filter by time range
awk '/2024-01-15 14:00:00/,/2024-01-15 15:00:00/' fs_changes.log
# Analyse patterns
grep "Heartbeat sent" fs_changes.log | tail -20
Reporting Issues#
When reporting issues, include:
Command used: Full command line with parameters
Error message: Complete error output
Environment: OS, Python version, installation method
Directory structure: Sample of the directory being processed
Logs: Relevant log entries with timestamps
Network information: API URLs, connectivity status
System resources: Available memory, disk space
Example issue report:
Command: python -m smartem_agent watch /data/Grid_001 --agent-id microscope-01 --session-id session-123 -v
Error: SSE connection failed: Connection refused
Environment:
- OS: Ubuntu 20.04 LTS
- Python: 3.12.1
- Installation: pip install -e .[all]
Directory: /data/Grid_001 (contains EpuSession.dm, 15 GridSquare directories)
API Status: curl http://127.0.0.1:8000/status returns {"status": "ok"}
Logs: (attach relevant log entries)
This comprehensive troubleshooting guide should resolve most common CLI issues. For backend-specific problems, consult the SmartEM Backend documentation.