2020-11-19 13:11:57 Starting - Starting the training job...
2020-11-19 13:12:00 Starting - Launching requested ML instances......
2020-11-19 13:13:15 Starting - Preparing the instances for training......
2020-11-19 13:14:23 Downloading - Downloading input data......
2020-11-19 13:15:14 Training - Downloading the training image..Docker entrypoint called with argument(s): train
Running default environment configuration script
[11/19/2020 13:15:38 INFO 139720084928320] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-conf.json: {u'index_metric': u'L2', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'_log_level': u'info', u'feature_dim': u'auto', u'faiss_index_ivf_nlists': u'auto', u'epochs': u'1', u'index_type': u'faiss.Flat', u'_faiss_index_nprobe': u'5', u'_kvstore': u'dist_async', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000'}
[11/19/2020 13:15:38 INFO 139720084928320] Merging with provided configuration from /opt/ml/input/config/hyperparameters.json: {u'sample_size': u'105', u'feature_dim': u'4', u'predictor_type': u'classifier', u'k': u'3'}
[11/19/2020 13:15:38 INFO 139720084928320] Final configuration: {u'index_metric': u'L2', u'predictor_type': u'classifier', u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'_log_level': u'info', u'feature_dim': u'4', u'faiss_index_ivf_nlists': u'auto', u'sample_size': u'105', u'epochs': u'1', u'index_type': u'faiss.Flat', u'_faiss_index_nprobe': u'5', u'_kvstore': u'dist_async', u'_num_kv_servers': u'1', u'mini_batch_size': u'5000', u'k': u'3'}
[11/19/2020 13:15:38 WARNING 139720084928320] Loggers have already been setup.
[11/19/2020 13:15:38 INFO 139720084928320] Launching parameter server for role scheduler
[11/19/2020 13:15:38 INFO 139720084928320] {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'ECS_CONTAINER_METADATA_URI_V4': 'http://169.254.170.2/v4/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'AWS_REGION': 'us-east-1', 'SAGEMAKER_METRICS_DIRECTORY': '/opt/ml/output/metrics/sagemaker', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'iris-job-20201119131155', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '2', 'HOSTNAME': 'ip-10-0-230-142.ec2.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/01d241cf-c2d1-4de1-b9a6-8a96a56664a4', 'PWD': '/', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:us-east-1:130636039461:training-job/iris-job-20201119131155', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/19/2020 13:15:38 INFO 139720084928320] envs={'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'ECS_CONTAINER_METADATA_URI_V4': 'http://169.254.170.2/v4/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_NUM_WORKER': '1', 'DMLC_PS_ROOT_PORT': '9000', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.230.142', 'AWS_REGION': 'us-east-1', 'SAGEMAKER_METRICS_DIRECTORY': '/opt/ml/output/metrics/sagemaker', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'iris-job-20201119131155', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '2', 'HOSTNAME': 'ip-10-0-230-142.ec2.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/01d241cf-c2d1-4de1-b9a6-8a96a56664a4', 'DMLC_ROLE': 'scheduler', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:us-east-1:130636039461:training-job/iris-job-20201119131155', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/19/2020 13:15:38 INFO 139720084928320] Launching parameter server for role server
[11/19/2020 13:15:38 INFO 139720084928320] {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'ECS_CONTAINER_METADATA_URI_V4': 'http://169.254.170.2/v4/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'AWS_REGION': 'us-east-1', 'SAGEMAKER_METRICS_DIRECTORY': '/opt/ml/output/metrics/sagemaker', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'iris-job-20201119131155', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '2', 'HOSTNAME': 'ip-10-0-230-142.ec2.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/01d241cf-c2d1-4de1-b9a6-8a96a56664a4', 'PWD': '/', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:us-east-1:130636039461:training-job/iris-job-20201119131155', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/19/2020 13:15:38 INFO 139720084928320] envs={'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'ECS_CONTAINER_METADATA_URI_V4': 'http://169.254.170.2/v4/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_NUM_WORKER': '1', 'DMLC_PS_ROOT_PORT': '9000', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'SAGEMAKER_HTTP_PORT': '8080', 'HOME': '/root', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.230.142', 'AWS_REGION': 'us-east-1', 'SAGEMAKER_METRICS_DIRECTORY': '/opt/ml/output/metrics/sagemaker', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'iris-job-20201119131155', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '2', 'HOSTNAME': 'ip-10-0-230-142.ec2.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/01d241cf-c2d1-4de1-b9a6-8a96a56664a4', 'DMLC_ROLE': 'server', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:us-east-1:130636039461:training-job/iris-job-20201119131155', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
[11/19/2020 13:15:38 INFO 139720084928320] Environment: {'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'ECS_CONTAINER_METADATA_URI_V4': 'http://169.254.170.2/v4/5b076316-bd0e-4444-aa13-a25ae3b4e862', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION': '2', 'DMLC_PS_ROOT_PORT': '9000', 'DMLC_NUM_WORKER': '1', 'SAGEMAKER_HTTP_PORT': '8080', 'PATH': '/opt/amazon/bin:/usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/amazon/bin:/opt/amazon/bin', 'PYTHONUNBUFFERED': 'TRUE', 'CANONICAL_ENVROOT': '/opt/amazon', 'LD_LIBRARY_PATH': '/opt/amazon/lib/python2.7/site-packages/cv2/../../../../lib:/usr/local/nvidia/lib64:/opt/amazon/lib', 'LANG': 'en_US.utf8', 'DMLC_INTERFACE': 'eth0', 'SHLVL': '1', 'DMLC_PS_ROOT_URI': '10.0.230.142', 'AWS_REGION': 'us-east-1', 'SAGEMAKER_METRICS_DIRECTORY': '/opt/ml/output/metrics/sagemaker', 'NVIDIA_VISIBLE_DEVICES': 'void', 'TRAINING_JOB_NAME': 'iris-job-20201119131155', 'HOME': '/root', 'PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION': 'cpp', 'ENVROOT': '/opt/amazon', 'SAGEMAKER_DATA_PATH': '/opt/ml', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'NVIDIA_REQUIRE_CUDA': 'cuda>=9.0', 'OMP_NUM_THREADS': '2', 'HOSTNAME': 'ip-10-0-230-142.ec2.internal', 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/01d241cf-c2d1-4de1-b9a6-8a96a56664a4', 'DMLC_ROLE': 'worker', 'PWD': '/', 'DMLC_NUM_SERVER': '1', 'TRAINING_JOB_ARN': 'arn:aws:sagemaker:us-east-1:130636039461:training-job/iris-job-20201119131155', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2'}
Process 60 is a shell:scheduler.
Process 69 is a shell:server.
Process 1 is a worker.
[11/19/2020 13:15:38 INFO 139720084928320] Using default worker.
[11/19/2020 13:15:39 INFO 139720084928320] Checkpoint loading and saving are disabled.
[11/19/2020 13:15:39 INFO 139720084928320] nvidia-smi took: 0.0252599716187 secs to identify 0 gpus
[11/19/2020 13:15:39 INFO 139720084928320] Create Store: dist_async
[11/19/2020 13:15:39 ERROR 139720084928320] nvidia-smi: failed to run (127): /bin/sh: nvidia-smi: command not found
[11/19/2020 13:15:39 INFO 139720084928320] Using per-worker sample size = 105 (Available virtual memory = 15140818944 bytes, GPU free memory = 0 bytes, number of workers = 1). If an out-of-memory error occurs, choose a larger instance type, use dimension reduction, decrease sample_size, and/or decrease mini_batch_size.
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Batches Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Records Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Batches Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Records Seen": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Max Records Seen Between Resets": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Reset Count": {"count": 1, "max": 0, "sum": 0.0, "min": 0}}, "EndTime": 1605791740.078393, "Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "AWS/KNN"}, "StartTime": 1605791740.078323}
[11/19/2020 13:15:40 INFO 139720084928320] push reservoir to kv... 1 num_workers 0 rank
[11/19/2020 13:15:40 INFO 139720084928320] ...done (105)
[11/19/2020 13:15:40 INFO 139720084928320] #progress_metric: host=algo-1, completed 100 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Batches Since Last Reset": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Number of Records Since Last Reset": {"count": 1, "max": 105, "sum": 105.0, "min": 105}, "Total Batches Seen": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "Total Records Seen": {"count": 1, "max": 105, "sum": 105.0, "min": 105}, "Max Records Seen Between Resets": {"count": 1, "max": 105, "sum": 105.0, "min": 105}, "Reset Count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1605791740.103201, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "AWS/KNN", "epoch": 0}, "StartTime": 1605791740.07875}
[11/19/2020 13:15:40 INFO 139720084928320] #throughput_metric: host=algo-1, train throughput=4267.74994428 records/second
[11/19/2020 13:15:40 INFO 139720084928320] pulled row count... worker 0 rows 105
[11/19/2020 13:15:40 INFO 139720084928320] pulled... worker 0 data (105, 4) labels (105,) nans 0
[11/19/2020 13:15:40 INFO 139720084928320] calling index.train...
[11/19/2020 13:15:40 INFO 139720084928320] ...done calling index.train
[11/19/2020 13:15:40 INFO 139720084928320] calling index.add...
[11/19/2020 13:15:40 INFO 139720084928320] ...done calling index.add
#metrics {"Metrics": {"epochs": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "model.serialize.time": {"count": 1, "max": 2.8769969940185547, "sum": 2.8769969940185547, "min": 2.8769969940185547}, "finalize.time": {"count": 1, "max": 4.1141510009765625, "sum": 4.1141510009765625, "min": 4.1141510009765625}, "initialize.time": {"count": 1, "max": 866.8332099914551, "sum": 866.8332099914551, "min": 866.8332099914551}, "update.time": {"count": 1, "max": 24.17898178100586, "sum": 24.17898178100586, "min": 24.17898178100586}}, "EndTime": 1605791740.110562, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/KNN"}, "StartTime": 1605791739.211127}
[11/19/2020 13:15:40 INFO 139720084928320] Test data is not provided.
#metrics {"Metrics": {"totaltime": {"count": 1, "max": 1694.9079036712646, "sum": 1694.9079036712646, "min": 1694.9079036712646}, "setuptime": {"count": 1, "max": 37.4448299407959, "sum": 37.4448299407959, "min": 37.4448299407959}}, "EndTime": 1605791740.111786, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "AWS/KNN"}, "StartTime": 1605791740.110652}
2020-11-19 13:15:51 Uploading - Uploading generated training model
2020-11-19 13:15:51 Completed - Training job completed
Training seconds: 88
Billable seconds: 88