If you try to run a model on a GPU with way more memory than the host system, you're probably going to run into the problem that nobody anticipated that setup. I'm not sure many execution frameworks can load weights straight from disk to GPU RAM without staging them through host memory first. Storage speed for loading the model might also be an issue on an SoC that boots off e.g. an SD card.
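For what it's worth, the closest thing I know of is safetensors-style memory-mapped loading, which copies tensors to the GPU one at a time instead of materializing the whole model in host RAM first. A rough sketch, assuming a safetensors checkpoint and a working PyTorch + CUDA install (the filename is a placeholder):

```python
# Rough sketch: load a checkpoint without holding the full model in host RAM.
# safetensors memory-maps the file and copies each tensor to the GPU as it
# goes, so peak host usage is roughly one tensor, not the whole model.
from safetensors.torch import load_file

# "model.safetensors" is a made-up path, not a real file from this thread.
state_dict = load_file("model.safetensors", device="cuda:0")
```

Even then, the OS page cache and the copy path still touch host memory, so it's not a true zero-copy disk-to-GPU transfer; that would need something like GPUDirect Storage, which very few inference frameworks support.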
An eGPU dock should do CUDA just as well as an internal GPU, as far as I know, but you'd need the drivers installed on the host.
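Once the drivers are in place, a quick sanity check that CUDA actually sees the eGPU (assuming PyTorch is installed; nvidia-smi works just as well):

```python
# Sanity check that the driver and CUDA runtime can see the eGPU.
import torch

print(torch.cuda.is_available())           # True if a CUDA-capable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # should print the eGPU's name
```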