A healthcare AI platform needed to ingest ECG recordings from clinic exam rooms into a cloud processing system. The recordings arrived through three different paths — clinician web uploads, SMB file shares on clinic networks, and EHR integration triggers. Clinical device ingestion operates under fundamentally different constraints from standard data pipelines: clinic networks expose SMB file shares as the only programmatic interface, devices write files without any notification mechanism, network interruptions produce partial uploads, and the same recording frequently arrives through multiple paths. A configuration error that routed one organization's clinical data to another's storage would be both a compliance violation and a contractual breach.
When the same recording arrives through three paths and any misconfiguration leaks clinical data across organizations, the ingestion pipeline is the compliance boundary — not just a data pipe.
The platform was already processing web uploads and EHR-triggered ingestion through separate code paths. Adding SMB as a third parallel system would have created divergent processing logic, inconsistent audit trails, and bugs that only appeared when the same recording arrived through multiple paths. ML LABS built the device-to-cloud pipeline that unified all three ingestion paths onto a single processing queue, handled partial uploads without losing files, enforced per-organization data isolation at the storage level, and replaced manual operational scripts with event-driven scheduling — all deployed across test, US production, and UK production environments.
SMB File Share Infrastructure
Clinic devices write ECG recordings to a shared folder on the local network. The pipeline must discover those files, retrieve them, and deliver them to the cloud processing system — without losing files and without processing files that are still being written.
Cross-Organization Isolation
Each healthcare organization's file share sync operates with per-org credentials, per-org cloud storage destinations, and per-org processing configuration — enforcing isolation at the infrastructure layer rather than relying on application-level routing. When onboarding a new organization, automated provisioning creates the organization's storage, configures credentials, and registers the configuration, eliminating manual setup errors.
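One way to make "a single setup script, not a manual checklist" concrete is to derive every per-organization resource from one identifier in one function, so no field can be forgotten. A minimal sketch; the resource names, fields, and naming scheme here are illustrative assumptions, not the platform's actual configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrgConfig:
    """Everything one organization's sync needs, produced in a single
    step so a manual checklist cannot skip a field."""
    org_id: str
    storage_bucket: str
    credential_secret: str
    queue_name: str

def provision_org(org_id: str, env: str) -> OrgConfig:
    # Derive every per-org resource name from one identifier, so a
    # typo cannot route one organization's data into another's storage.
    prefix = f"{env}-{org_id}"
    return OrgConfig(
        org_id=org_id,
        storage_bucket=f"{prefix}-ecg-ingest",
        credential_secret=f"{prefix}-smb-credentials",
        queue_name=f"{env}-ingress",  # queue is shared; storage is per-org
    )
```

In a real provisioning flow this object would drive the actual storage and credential creation; the point is that isolation boundaries come from one derivation, not from hand-entered values.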
The pipeline validates file completeness before processing — intermittent network connections had been producing corrupt records downstream when incomplete files were ingested before the write finished.
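When a device gives no write-complete signal, a common completeness heuristic is to treat a file as finished only once its size holds steady across consecutive polls. A minimal sketch of that idea, with illustrative poll counts and intervals rather than the platform's actual tuning:

```python
import os
import time

def is_stable(path: str, checks: int = 2, interval: float = 5.0) -> bool:
    """Treat a file as complete only if its size is unchanged across
    consecutive polls; a device still writing will keep growing it."""
    last = os.path.getsize(path)
    for _ in range(checks):
        time.sleep(interval)
        size = os.path.getsize(path)
        if size != last:
            return False  # still being written; retry on the next sweep
        last = size
    return True
```

A sync sweep would call this before retrieving each candidate file, leaving unstable files on the share for a later pass instead of ingesting a truncated record.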
File Handling and Archive
The pipeline handles cross-platform file naming inconsistencies that surfaced in production — such as case sensitivity differences between device output and the processing host. Successfully ingested files are archived to prevent re-ingestion while preserving the clinic's local copy for retention requirements. Archive behavior is configurable per organization.
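The case-sensitivity and archive behavior described above can be sketched in a few lines. This is an illustrative simplification, assuming the archive is a subfolder on the same share (so the clinic retains its copy) and that name comparisons must be case-insensitive because device output and the processing host may disagree on case:

```python
from pathlib import Path

def already_ingested(filename: str, archive_dir: Path) -> bool:
    """Compare names case-insensitively: the device may emit
    'ECG_001.DAT' while the host later sees 'ecg_001.dat'."""
    wanted = filename.lower()
    return any(p.name.lower() == wanted for p in archive_dir.iterdir())

def archive(src: Path, archive_dir: Path) -> Path:
    """Move an ingested file into an archive folder on the share:
    the sync skips archived files, preventing re-ingestion, while
    the clinic's copy stays local for retention requirements."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    src.rename(dest)
    return dest
```

Per-organization configuration would then decide the archive location and whether archiving is enabled at all.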
Ingress Queue Unification
Web uploads and EHR integrations previously used separate processing paths. Building SMB as a third parallel system would have tripled the surface area for processing bugs and made cross-path deduplication impossible.
```mermaid
graph TD
    A["Web Upload"] --> D["Unified Ingress<br/>Queue"]
    B["SMB File Share"] --> D
    C["EHR Integration"] --> D
    D --> E["Serverless<br/>Processing"]
    E --> F["AI Model<br/>Inference"]
    F --> G["Clinical Results"]
    style A fill:#1a1a2e,stroke:#0f3460,color:#fff
    style B fill:#1a1a2e,stroke:#0f3460,color:#fff
    style C fill:#1a1a2e,stroke:#0f3460,color:#fff
    style D fill:#1a1a2e,stroke:#ffd700,color:#fff
    style E fill:#1a1a2e,stroke:#ffd700,color:#fff
    style F fill:#1a1a2e,stroke:#16c79a,color:#fff
    style G fill:#1a1a2e,stroke:#16c79a,color:#fff
```

Every path — web upload, file share, EHR integration — now produces an identical message on a single processing queue. The downstream processor handles all recordings identically regardless of source, eliminating path-specific bugs and making deduplication work across all three paths without special handling. Every file that enters the system gets a log entry with the ingestion source, timestamp, and processing outcome.
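The single-queue design hinges on every path emitting the same message shape. A sketch of what such a message might contain; the field names are illustrative assumptions, not the platform's actual schema:

```python
import hashlib
from datetime import datetime, timezone

def ingress_message(org_id: str, source: str,
                    file_bytes: bytes, filename: str) -> dict:
    """Build the one message shape every path (web, SMB, EHR) emits.
    The content hash doubles as a dedup key: the same recording
    arriving via two paths collapses to one processing job."""
    return {
        "org_id": org_id,
        "source": source,  # "web" | "smb" | "ehr", kept for the audit log
        "filename": filename,
        "content_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "received_at": datetime.now(timezone.utc).isoformat(),
    }
```

Because the hash is computed from file content rather than filename or path, cross-path deduplication needs no source-specific logic, and the `source` field survives purely for the audit trail.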
Presigned Uploads and Large Files
Clinical files vary dramatically in size — standard ECG recordings, PDF reports, large waveform archives, and video files. Routing multi-megabyte uploads through application servers creates unnecessary latency and timeout failures.
ML LABS implemented presigned URL endpoints for direct-to-cloud-storage upload. The application server generates a time-limited presigned URL scoped to the organization's storage, and the client uploads directly without the file transiting through the backend. Cross-storage utilities handle the mechanics of moving data between organization-specific storage when ingestion and processing are separated.
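The presigned flow can be sketched as follows. This assumes a boto3-style `generate_presigned_url` call (as on AWS S3); the object-key layout is an illustrative assumption, and any object store with presigned PUT support follows the same pattern:

```python
def presigned_upload_url(storage_client, org_bucket: str, filename: str,
                         expires_seconds: int = 900) -> str:
    """Return a time-limited URL scoped to one organization's bucket.
    The client uploads directly to storage with this URL, so the file
    never transits the application server."""
    return storage_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": org_bucket, "Key": f"uploads/{filename}"},
        ExpiresIn=expires_seconds,
    )
```

Scoping the URL to the organization's own bucket means the isolation guarantee rides along with the upload path: even a misbehaving client can only write into its own organization's storage.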
Operational Automation
ML LABS replaced manual backup scripts with event-driven scheduling and serverless automation. Scheduled rules trigger serverless functions that perform backups without human initiation, and failures route to automated alarms rather than going unnoticed until someone checks a GitHub Actions run. CDN deployment is configured across all three environments, with environment-specific settings handling regional differences so changes are validated in test before reaching production clinical workflows.
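The shape of such a scheduled function is simple. A hedged sketch: `run_backup` and `send_alarm` are hypothetical injected callables standing in for the real backup routine and alerting integration, and the timer rule that invokes the handler is configured in the cloud platform, not shown here:

```python
def scheduled_backup_handler(run_backup, send_alarm) -> dict:
    """Body of a scheduled serverless function: a timer rule invokes
    it with no human initiation, and any failure raises an alarm
    instead of silently vanishing."""
    try:
        run_backup()
        return {"status": "ok"}
    except Exception as exc:
        send_alarm(f"backup failed: {exc}")
        raise  # re-raise so platform failure metrics also fire
```

Re-raising after alerting matters: the platform's own error metrics and retry policies stay accurate, while the alarm guarantees a human hears about it.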
When Complexity Exceeds Capacity
This architecture is tractable for a bounded number of clinic sites with consistent device types. It becomes categorically harder when the platform must support dozens of sites with heterogeneous device populations or when regulatory requirements span multiple jurisdictions with different data residency rules.
The SMB sync, per-org storage, queue infrastructure, and cross-storage utilities form a tightly coupled operational surface where a misconfiguration propagates silently until clinical data appears in the wrong place or fails to appear at all. The gap between "we have a working pipeline" and "the pipeline runs unattended across all sites" is where most clinical AI initiatives stall.
First Steps
- Unify your ingress queue. If your platform processes web uploads, file share data, and EHR integrations through separate code paths, converge them on a single queue with a common message format — this eliminates path-specific processing bugs that compound as you add sites.
- Automate per-organization isolation at provisioning time. Storage resources, sync credentials, and queue configuration for a new organization should be created by a single setup script, not a manual checklist — one missed step means a data isolation gap that may not surface until audit.
- Replace manual operational scripts with event-driven automation. Any recurring task that depends on someone remembering to run it — backups, sync monitoring, environment deployments — should run on a schedule with automated failure alerting.
Practical Solution Pattern
Build a unified device-to-cloud pipeline where every ingestion path produces an identical message on a single processing queue, with per-organization storage isolation enforced at provisioning time, partial upload recovery at the ingestion layer, and event-driven automation replacing manual operational scripts.
This works because ingress unification eliminates the path-specific processing branches that diverge and fail in untested combinations, while automated provisioning removes the human configuration errors that produce data isolation gaps. Organizations that build this infrastructure fastest concentrate the full operational picture — networking, storage, queuing, and deployment — in a single experienced operator rather than distributing it across teams that each understand the shared infrastructure partially. If your organization needs to scope a device-to-cloud pipeline, a Strategic Scoping Session can map ingestion paths, isolation requirements, and automation gaps before the build begins.