Running the Model for Inference
Running a trained model for inference is also a CPU-intensive process, especially if it has to serve multiple users at the same time. A good solution is to run it on Google Cloud as a Docker container with autoscaling.
Signing up for Google Cloud does require payment information, but the first $300 of usage is covered by free credit, so you can do a lot without paying anything. More information on this is available at https://cloud.google.com/ai-platform/training/docs/getting-started-pytorch
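The commands below assume an existing project with the ID my-translate (the ID that appears in the container image paths) and that the Artifact Registry and AI Platform APIs are enabled. A rough sketch of that one-time setup, in case it has not been done yet:

#one-time project setup (sketch): select the project and enable the required APIs
gcloud config set project my-translate
gcloud services enable artifactregistry.googleapis.com ml.googleapis.com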
Google Cloud Platform provides a built-in Cloud Shell in the browser. Run the following commands there to build and push the Docker image, create the AI Platform model, and deploy a version of it.
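The docker build step below expects a Dockerfile in the working directory. Its exact contents depend on how the trained model was exported; since the /ping and /predictions/en-ti routes used further down are TorchServe defaults, a minimal sketch assuming a TorchServe model archive named en-ti.mar could look like this (hypothetical, not the exact file used here):

# Hypothetical Dockerfile for the serve-en-ti image, assuming a TorchServe archive
# built with torch-model-archiver and named en-ti.mar
FROM pytorch/torchserve:latest

# Put the model archive where TorchServe looks for models in this base image
COPY en-ti.mar /home/model-server/model-store/

# Serve the en-ti model on the default inference port 8080,
# which exposes /ping and /predictions/en-ti as used by the gcloud flags below
CMD ["torchserve", "--start", "--foreground", \
     "--ts-config", "/home/model-server/config.properties", \
     "--model-store", "/home/model-server/model-store", \
     "--models", "en-ti=en-ti.mar"]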
#create an Artifact Registry repository for the serving image
gcloud beta artifacts repositories create en-ti-repo --repository-format=docker --location=europe-west1
#build the serving image and push it to the repository
docker build --tag=europe-west1-docker.pkg.dev/my-translate/en-ti-repo/serve-en-ti .
#one-time: authenticate Docker with the Artifact Registry host before the first push
#gcloud auth configure-docker europe-west1-docker.pkg.dev
docker push europe-west1-docker.pkg.dev/my-translate/en-ti-repo/serve-en-ti
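Before creating the model, the pushed image can optionally be smoke-tested locally or in Cloud Shell; this only assumes the container serves its health route on port 8080, as configured in the deployment flags below.

#optional: run the image locally and check the health route before deploying
docker run -d --rm -p 8080:8080 --name serve-en-ti-test europe-west1-docker.pkg.dev/my-translate/en-ti-repo/serve-en-ti
curl http://localhost:8080/ping
docker stop serve-en-ti-test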
#create model
gcloud beta ai-platform models create enTiModel --region=europe-west1 --enable-logging --enable-console-logging
#create version: run only one of the two commands below, the first serves on CPU only and the second also attaches a GPU
gcloud beta ai-platform versions create v1 --region=europe-west1 --model=enTiModel --machine-type=n1-highcpu-8 --image=europe-west1-docker.pkg.dev/my-translate/en-ti-repo/serve-en-ti --ports=8080 --health-route=/ping --predict-route=/predictions/en-ti
gcloud beta ai-platform versions create v1 --region=europe-west1 --model=enTiModel --machine-type=n1-highcpu-8 --image=europe-west1-docker.pkg.dev/my-translate/en-ti-repo/serve-en-ti --ports=8080 --health-route=/ping --predict-route=/predictions/en-ti --accelerator=count=1,type=nvidia-tesla-k80
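The test request below posts the contents of instances.json. AI Platform online prediction wraps the payload in an instances array; what goes inside each instance depends on the model's handler, so the field name in this example is only a guess:

#create a hypothetical request body; the "data" field depends on the model's handler
cat > instances.json <<'EOF'
{"instances": [{"data": "How are you today?"}]}
EOF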
#test the deployed version with an online prediction request
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json; charset=utf-8" -d @instances.json https://europe-west1-ml.googleapis.com/v1/projects/my-translate/models/enTiModel/versions/v1:predict
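To check that the version finished deploying, and to tear everything down once it is no longer needed (a version deployed on a dedicated machine type is typically billed for as long as it is up), something like the following can be used:

#check the deployment status
gcloud beta ai-platform versions describe v1 --model=enTiModel --region=europe-west1
#delete the version and the model when no longer needed
gcloud beta ai-platform versions delete v1 --model=enTiModel --region=europe-west1
gcloud beta ai-platform models delete enTiModel --region=europe-west1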