{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# SageMaker Inference Recommender for a scikit-learn model\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "\n", "## Contents\n", "[1. Introduction](#1.-Introduction) \n", "[2. Download the Model & payload](#2.-Download-the-Model-&-payload) \n", "[3. Machine Learning model details](#3.-Machine-Learning-model-details) \n", "[4. Register Model Version/Package](#4.-Register-Model-Version/Package) \n", "[5. Create a SageMaker Inference Recommender Default Job](#5:-Create-a-SageMaker-Inference-Recommender-Default-Job) \n", "[6. Instance Recommendation Results](#6.-Instance-Recommendation-Results) \n", "[7. Create a SageMaker Inference Recommender Advanced Job](#7.-Custom-Load-Test) \n", "[8. Describe result of an Advanced Job](#8.-Custom-Load-Test-Results) " ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 1. Introduction\n", "\n", "SageMaker Inference Recommender is a new capability of SageMaker that reduces the time required to get machine learning (ML) models in production by automating performance benchmarking and load testing models across SageMaker ML instances. You can use Inference Recommender to deploy your model to a real-time inference endpoint that delivers the best performance at the lowest cost. \n", "\n", "Get started with Inference Recommender on SageMaker in minutes while selecting an instance and get an optimized endpoint configuration in hours, eliminating weeks of manual testing and tuning time.\n" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "To begin, let's update the required packages i.e. SageMaker Python SDK, `boto3`, `botocore` and `awscli`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": {}, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!pip install -U sagemaker" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": {}, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install sagemaker botocore boto3 awscli --upgrade\n", "!pip install --upgrade pip awscli botocore boto3 --quiet" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 2. Download the Model & payload \n", "\n", "In this example, we are using a pre-trained scikit-learn model, trained on the California Housing dataset, present in Scikit-Learn: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html. The California Housing dataset was originally published in:\n", "\n", "> Pace, R. Kelley, and Ronald Barry. \"Sparse spatial auto-regressions.\" Statistics & Probability Letters 33.3 (1997): 291-297." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from sagemaker import get_execution_role, Session, image_uris\n", "from sklearn.datasets import fetch_california_housing\n", "from sklearn.model_selection import train_test_split\n", "import pandas as pd\n", "import boto3\n", "import datetime\n", "import time\n", "import os\n", "\n", "region = boto3.Session().region_name\n", "role = get_execution_role()\n", "sagemaker_session = Session()\n", "\n", "print(region)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "export_dir = \"./model/\"\n", "\n", "if not os.path.exists(export_dir):\n", " os.makedirs(export_dir)\n", " print(\"Directory \", export_dir, \" Created \")\n", "else:\n", " print(\"Directory \", export_dir, \" already exists\")\n", "\n", "model_archive_name = \"sk-model.tar.gz\"\n", "sourcedir_archive_name = \"sourcedir.tar.gz\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!aws s3 cp s3://aws-ml-blog/artifacts/scikit_learn_bring_your_own_model/model.joblib {export_dir}" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Tar the model and code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!cd model && tar -cvpzf ../{model_archive_name} *" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!cd code && tar -cvpzf ../{sourcedir_archive_name} *" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Download the payload " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "payload_location = \"./sample-payload/\"\n", "\n", "if not os.path.exists(payload_location):\n", " os.makedirs(payload_location)\n", " print(\"Directory \", payload_location, \" Created \")\n", "else:\n", " print(\"Directory \", payload_location, \" already exists\")\n", "\n", "payload_archive_name = \"sk_payload.tar.gz\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "data = fetch_california_housing()\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " data.data, data.target, test_size=0.25, random_state=42\n", ")\n", "\n", "# we don't train a model, so we will need only the testing data\n", "testX = pd.DataFrame(X_test, columns=data.feature_names)\n", "# Save testing data to CSV\n", "testX[data.feature_names].head(10).to_csv(\n", " os.path.join(payload_location, \"test_data.csv\"), header=False, index=False\n", ")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Tar the payload" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!cd ./sample-payload/ && tar czvf ../{payload_archive_name} *" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Upload to S3\n", "\n", "We now have a model archive ready. We need to upload it to S3 before we can use it with Inference Recommender, so we will use the SageMaker Python SDK to handle the upload.\n", "\n", "We need to create an archive that contains individual files that Inference Recommender can send to your SageMaker Endpoints. Inference Recommender will randomly sample files from this archive so make sure it contains a similar distribution of payloads you'd expect in production. Note that your inference code must be able to read in the file formats from the sample payload." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "\n", "import os\n", "import boto3\n", "import re\n", "import copy\n", "import time\n", "from time import gmtime, strftime\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "\n", "# S3 bucket for saving code and model artifacts.\n", "# Feel free to specify a different bucket and prefix\n", "bucket = sagemaker.Session().default_bucket()\n", "\n", "prefix = \"sagemaker/scikit-learn-inference-recommender\"\n", "\n", "sample_payload_url = sagemaker.Session().upload_data(\n", " payload_archive_name, bucket=bucket, key_prefix=prefix + \"/inference\"\n", ")\n", "sourcedir_url = sagemaker.Session().upload_data(\n", " sourcedir_archive_name, bucket=bucket, key_prefix=prefix + \"/california_housing/sourcedir\"\n", ")\n", "model_url = sagemaker.Session().upload_data(\n", " model_archive_name, bucket=bucket, key_prefix=prefix + \"/california_housing/model\"\n", ")\n", "\n", "\n", "print(sample_payload_url)\n", "print(sourcedir_url)\n", "print(model_url)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 3. Machine Learning model details\n", "\n", "Inference Recommender uses information about your ML model to recommend the best instance types and endpoint configurations for deployment. You can provide as much or as little information as you'd like and Inference Recommender will use that to provide recommendations.\n", "\n", "Example ML Domains: `COMPUTER_VISION`, `NATURAL_LANGUAGE_PROCESSING`, `MACHINE_LEARNING`\n", "\n", "Example ML Tasks: `CLASSIFICATION`, `REGRESSION`, `OBJECT_DETECTION`, `OTHER`\n", "\n", "Note: Select the task that is the closest match to your model. Chose `OTHER` if none apply.\n", "\n", "Example Model name: `resnet50`, `yolov4`, `xgboost` etc\n", "\n", "Use list_model_metadata API to fetch the list of available models. This will help you to pick the closest model for better recommendation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import boto3\n", "import pandas as pd\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "list_model_metadata_response = client.list_model_metadata()\n", "\n", "domains = []\n", "frameworks = []\n", "framework_versions = []\n", "tasks = []\n", "models = []\n", "\n", "for model_summary in list_model_metadata_response[\"ModelMetadataSummaries\"]:\n", " domains.append(model_summary[\"Domain\"])\n", " tasks.append(model_summary[\"Task\"])\n", " models.append(model_summary[\"Model\"])\n", " frameworks.append(model_summary[\"Framework\"])\n", " framework_versions.append(model_summary[\"FrameworkVersion\"])\n", "\n", "data = {\n", " \"Domain\": domains,\n", " \"Task\": tasks,\n", " \"Framework\": frameworks,\n", " \"FrameworkVersion\": framework_versions,\n", " \"Model\": models,\n", "}\n", "\n", "df = pd.DataFrame(data)\n", "\n", "pd.set_option(\"display.max_rows\", None)\n", "pd.set_option(\"display.max_columns\", None)\n", "pd.set_option(\"display.width\", 1000)\n", "pd.set_option(\"display.colheader_justify\", \"center\")\n", "pd.set_option(\"display.precision\", 3)\n", "\n", "\n", "display(df.sort_values(by=[\"Domain\", \"Task\", \"Framework\", \"FrameworkVersion\"]))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "In this example, as we are predicting California Housing Prices with `scikit-learn`, we select `MACHINE_LEARNING` as the Domain, `REGRESSION` as the Task, `SAGEMAKER-SCIKIT-LEARN` as the Framework, and `sagemaker-scikit-learn` as the Model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "ml_domain = \"MACHINE_LEARNING\"\n", "ml_task = \"REGRESSION\"\n", "ml_framework = \"SAGEMAKER-SCIKIT-LEARN\"\n", "framework_version = \"1.2-1\"\n", "model = \"sagemaker-scikit-learn\"" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Container image URL\n", "\n", "If you don’t have an inference container image, you can use [Prebuilt Amazon SageMaker Docker Images for Scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-docker-containers-scikit-learn-spark.html) provided by AWS to serve your ML model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from sagemaker import image_uris\n", "\n", "# ML model details\n", "model_name = \"scikit-learn-california-housing\" + datetime.datetime.now().strftime(\n", " \"%Y-%m-%d-%H-%M-%S\"\n", ")\n", "sagemaker_program = \"inference.py\"\n", "\n", "\n", "inference_image = image_uris.retrieve(\n", " framework=\"sklearn\",\n", " region=region,\n", " version=framework_version,\n", " py_version=\"py3\",\n", " instance_type=\"ml.m5.large\",\n", ")\n", "\n", "print(inference_image)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 4. Register Model Version/Package\n", "\n", "Inference Recommender expects the model to be packaged in the model registry. Here, we are creating a model package group and a model package version. The model package version which takes container, model `URL` etc. will now allow you to pass additional information about the model like `Domain`, `Task`, `Framework`, `FrameworkVersion`, `NearestModelName`, `SamplePayloadUrl`\n", "You specify a list of the instance types that are used to generate inferences in real-time in`SupportedRealtimeInferenceInstanceTypes` parameter. This list of instance types is key for the inference recommender feature. For inference on tabular data, e.g. with `scikit-learn`, or `XGBoost` models you'll probably want to use standard instances or compute optimized ones. For deep learning models, you will probably want to use accelerated computing instances (GPU).\n", "\n", "As `SamplePayloadUrl` and `SupportedContentTypes` parameters are essential for benchmarking the endpoint, we also highly recommend that you specify `Domain`, `Task`, `Framework`, `FrameworkVersion`, `NearestModelName` for better inference recommendation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import boto3\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "model_package_group_name = \"scikit-learn-california-housing-\" + str(round(time.time()))\n", "print(model_package_group_name)\n", "model_package_group_response = client.create_model_package_group(\n", " ModelPackageGroupName=str(model_package_group_name),\n", " ModelPackageGroupDescription=\"My sample California housing model package group\",\n", ")\n", "\n", "print(model_package_group_response)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "model_package_version_response = client.create_model_package(\n", " ModelPackageGroupName=str(model_package_group_name),\n", " ModelPackageDescription=\"scikit-learn Inference Recommender Demo\",\n", " Domain=ml_domain,\n", " Task=ml_task,\n", " SamplePayloadUrl=sample_payload_url,\n", " InferenceSpecification={\n", " \"Containers\": [\n", " {\n", " \"ContainerHostname\": \"scikit-learn\",\n", " \"Image\": inference_image,\n", " \"ModelDataUrl\": model_url,\n", " \"Framework\": ml_framework,\n", " \"NearestModelName\": model,\n", " \"Environment\": {\n", " \"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\",\n", " \"SAGEMAKER_PROGRAM\": sagemaker_program,\n", " \"SAGEMAKER_REGION\": region,\n", " \"SAGEMAKER_SUBMIT_DIRECTORY\": sourcedir_url,\n", " },\n", " },\n", " ],\n", " \"SupportedRealtimeInferenceInstanceTypes\": [\n", " \"ml.c5.large\",\n", " \"ml.c5.xlarge\",\n", " \"ml.c5.2xlarge\",\n", " \"ml.m5.xlarge\",\n", " \"ml.m5.2xlarge\",\n", " ],\n", " \"SupportedContentTypes\": [\"text/csv\"],\n", " \"SupportedResponseMIMETypes\": [\"text/csv\"],\n", " },\n", ")\n", "\n", "print(model_package_version_response)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Alternative Option: ContainerConfig\n", "\n", "If you are missing mandatory fields to create an inference recommender job in your model package version like so (this `create_model_package` is missing `Domain`, `Task`, and `SamplePayloadUrl`):\n", "\n", "```\n", "client.create_model_package(\n", " ModelPackageGroupName=str(model_package_group_name),\n", " ModelPackageDescription=\"scikit-learn Inference Recommender Demo\",\n", " InferenceSpecification={\n", " \"Containers\": [\n", " {\n", " \"ContainerHostname\": \"scikit-learn\",\n", " \"Image\": inference_image,\n", " \"ModelDataUrl\": model_url,\n", " \"Framework\": ml_framework,\n", " \"NearestModelName\": model,\n", " \"Environment\": {\n", " \"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\",\n", " \"SAGEMAKER_PROGRAM\": sagemaker_program,\n", " \"SAGEMAKER_REGION\": region,\n", " \"SAGEMAKER_SUBMIT_DIRECTORY\": sourcedir_url,\n", " },\n", " },\n", " ],\n", " \"SupportedRealtimeInferenceInstanceTypes\": [\n", " \"ml.c5.large\",\n", " \"ml.c5.xlarge\",\n", " \"ml.c5.2xlarge\",\n", " \"ml.m5.xlarge\",\n", " \"ml.m5.2xlarge\",\n", " ],\n", " \"SupportedContentTypes\": [\"text/csv\"],\n", " \"SupportedResponseMIMETypes\": [\"text/csv\"],\n", " },\n", ")\n", "```\n", "\n", "You may define the fields `Domain`, `Task`, and `SamplePayloadUrl` in the optional field `ContainerConfig` like so:\n", "\n", "```\n", "payload_config = {\n", " \"SamplePayloadUrl\": sample_payload_url,\n", "}\n", "\n", "container_config = {\n", " \"Domain\": ml_domain,\n", " \"Task\": ml_task,\n", " \"PayloadConfig\": payload_config,\n", "}\n", "```\n", "\n", "And then provide it directly within `create_inference_recommendations_job()` API like so:\n", "\n", "```\n", "default_response = client.create_inference_recommendations_job(\n", " JobName=str(default_job),\n", " JobDescription=\"\",\n", " JobType=\"Default\",\n", " RoleArn=role,\n", " InputConfig={\n", " \"ModelPackageVersionArn\": model_package_arn,\n", " \"ContainerConfig\": container_config\n", " },\n", ")\n", "```\n", "\n", "For more information on what else can be provided via `ContainerConfig` please refer to the `CreateInferenceRecommendationsJob` doc here: [CreateInferenceRecommendationsJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateInferenceRecommendationsJob.html)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 5: Create a SageMaker Inference Recommender Default Job\n", "\n", "Now with your model in Model Registry, you can kick off a 'Default' job to get instance recommendations. This only requires your `ModelPackageVersionArn` and comes back with recommendations within an hour. \n", "\n", "The output is a list of instance type recommendations with associated environment variables, cost, throughput and latency metrics." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import boto3\n", "from sagemaker import get_execution_role\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "role = get_execution_role()\n", "default_job = \"scikit-learn-basic-recommender-job-\" + datetime.datetime.now().strftime(\n", " \"%Y-%m-%d-%H-%M-%S\"\n", ")\n", "default_response = client.create_inference_recommendations_job(\n", " JobName=str(default_job),\n", " JobDescription=\"scikit-learn Inference Basic Recommender Job\",\n", " JobType=\"Default\",\n", " RoleArn=role,\n", " InputConfig={\"ModelPackageVersionArn\": model_package_version_response[\"ModelPackageArn\"]},\n", ")\n", "\n", "print(default_response)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### 6. Instance Recommendation Results\n", "\n", "The inference recommender job provides multiple endpoint recommendations in its result. The recommendation includes `InstanceType`, `InitialInstanceCount`, `EnvironmentParameters` which includes tuned parameters for better performance. We also include the benchmarking results like `MaxInvocations`, `ModelLatency`, `CostPerHour` and `CostPerInference` for deeper analysis. The information provided will help you narrow down to a specific endpoint configuration that suits your use case.\n", "\n", "Example: \n", "\n", "If your motivation is overall price-performance, then you should focus on `CostPerInference` metrics \n", "If your motivation is latency/throughput, then you should focus on `ModelLatency` / `MaxInvocations` metrics" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Running the Inference recommender job will take ~35 minutes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "\n", "import boto3\n", "import pprint\n", "import pandas as pd\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "ended = False\n", "while not ended:\n", " inference_recommender_job = client.describe_inference_recommendations_job(\n", " JobName=str(default_job)\n", " )\n", " if inference_recommender_job[\"Status\"] in [\"COMPLETED\", \"STOPPED\", \"FAILED\"]:\n", " ended = True\n", " else:\n", " print(\"Inference recommender job in progress\")\n", " time.sleep(300)\n", "\n", "if inference_recommender_job[\"Status\"] == \"FAILED\":\n", " print(\"Inference recommender job failed \")\n", " print(\"Failed Reason: {}\".inference_recommender_job[\"FailedReason\"])\n", "else:\n", " print(\"Inference recommender job completed\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Detailing out the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "data = [\n", " {**x[\"EndpointConfiguration\"], **x[\"ModelConfiguration\"], **x[\"Metrics\"]}\n", " for x in inference_recommender_job[\"InferenceRecommendations\"]\n", "]\n", "df = pd.DataFrame(data)\n", "dropFilter = df.filter([\"VariantName\"])\n", "df.drop(dropFilter, inplace=True, axis=1)\n", "pd.set_option(\"max_colwidth\", 400)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "By `MaxInvocations` - The maximum number of requests per minute expected for the endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "df.sort_values(by=[\"MaxInvocations\"], ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "By `ModelLatencyThresholds` - The interval of time taken by a model to respond as viewed from SageMaker. The interval includes the local communication time taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "df.sort_values(by=[\"ModelLatency\"]).head()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Optional: ListInferenceRecommendationsJobSteps\n", "To see the list of subtasks for an Inference Recommender job, simply provide the `JobName` to the `ListInferenceRecommendationsJobSteps` API. \n", "\n", "To see more information for the API, please refer to the doc here: [ListInferenceRecommendationsJobSteps](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListInferenceRecommendationsJobSteps.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "list_job_steps_response = client.list_inference_recommendations_job_steps(JobName=str(default_job))\n", "print(list_job_steps_response)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 7. Custom Load Test\n", "\n", "With an 'Advanced' job, you can provide your production requirements, select instance types, tune environment variables and perform more extensive load tests. This typically takes 2 hours depending on your traffic pattern and number of instance types. \n", "\n", "The output is a list of endpoint configuration recommendations (instance type, instance count, environment variables) with associated cost, throughput and latency metrics.\n", "\n", "In the below example, we aim to limit the latency requirement to 50 ms. The goal is to find the best performance in the sense of the maximum number of requests per minute expected for the endpoint for a `ml.m5.2xlarge` instance.\n", "We specify `DurationInSeconds`, how long traffic phase should be, to be 120, and the maximum duration of the job, in seconds `JobDurationInSeconds` to 7200." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import boto3\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "role = get_execution_role()\n", "advanced_job = \"scikit-learn-advanced-recommender-job-\" + datetime.datetime.now().strftime(\n", " \"%Y-%m-%d-%H-%M-%S\"\n", ")\n", "advanced_response = client.create_inference_recommendations_job(\n", " JobName=advanced_job,\n", " JobDescription=\"scikit-learn Inference Advanced Recommender Job\",\n", " JobType=\"Advanced\",\n", " RoleArn=role,\n", " InputConfig={\n", " \"ModelPackageVersionArn\": model_package_version_response[\"ModelPackageArn\"],\n", " \"JobDurationInSeconds\": 7200,\n", " \"EndpointConfigurations\": [{\"InstanceType\": \"ml.m5.2xlarge\"}],\n", " \"TrafficPattern\": {\n", " \"TrafficType\": \"PHASES\",\n", " \"Phases\": [{\"InitialNumberOfUsers\": 1, \"SpawnRate\": 1, \"DurationInSeconds\": 120}],\n", " },\n", " },\n", " StoppingConditions={\n", " \"MaxInvocations\": 500,\n", " \"ModelLatencyThresholds\": [{\"Percentile\": \"P95\", \"ValueInMilliseconds\": 50}],\n", " },\n", ")\n", "\n", "print(advanced_response)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### 8. Custom Load Test Results\n", "\n", "Inference recommender runs benchmarks on both of the endpoint configurations. Below is the result.\n", "\n", "Running the Inference recommender job will take ~15 minutes.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%time\n", "\n", "import boto3\n", "import pprint\n", "import pandas as pd\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "ended = False\n", "while not ended:\n", " inference_recommender_job = client.describe_inference_recommendations_job(\n", " JobName=str(advanced_job)\n", " )\n", " if inference_recommender_job[\"Status\"] in [\"COMPLETED\", \"STOPPED\", \"FAILED\"]:\n", " ended = True\n", " else:\n", " print(\"Inference recommender job in progress\")\n", " time.sleep(300)\n", "\n", "if inference_recommender_job[\"Status\"] == \"FAILED\":\n", " print(\"Inference recommender job failed \")\n", " print(\"Failed Reason: {}\".inference_recommender_job[\"FailedReason\"])\n", "else:\n", " print(\"Inference recommender job completed\")" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Detailing out the result\n", "\n", "Analyzing load test result, we can see that to achieve 50 ms latency, we will need two `ml.m5.2xlarge` instances, with `MaxInvocations` (The maximum number of requests per minute expected for the endpoint) of ~736." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "data = [\n", " {**x[\"EndpointConfiguration\"], **x[\"ModelConfiguration\"], **x[\"Metrics\"]}\n", " for x in inference_recommender_job[\"InferenceRecommendations\"]\n", "]\n", "df = pd.DataFrame(data)\n", "dropFilter = df.filter([\"VariantName\"])\n", "df.drop(dropFilter, inplace=True, axis=1)\n", "pd.set_option(\"max_colwidth\", 400)\n", "df.head()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-inference-recommender|sklearn-inference-recommender|sklearn-inference-recommender.ipynb)\n" ] } ], "metadata": { "instance_type": "ml.g4dn.xlarge", "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }