{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a7e2542d-af8f-497d-9413-e13ce69db3ba",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "\n",
    "# Fine-tuning and deploying the Mixtral 8x7B LLM In SageMaker with Hugging Face, using QLoRA Parameter-Efficient Fine-Tuning\n",
    "\n",
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.\n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "---\n",
    "\n",
    "The Mixtral 8x7B Large Language Model by Mistral AI has 46.7 billion parameters, but uses only 12.9 billion per token, thanks to its Mixture of Experts architecture. The model masters 5 languages ​​(French, Spanish, Italian, English and German) and outperforms the much larger Llama 2 70B model from Meta. An instruct version of the model, trained to follow instructions is also available.\\\n",
    "QLoRA is a parameter-efficient fine-tuning technique that allows for fine-tuning LLMs in less memory, without changing the weights of the model, but by adding to them. This not only leads to good performance, but it mitigates the risk of [Catastrophic Forgetting](https://en.wikipedia.org/wiki/Catastrophic_interference) that comes with regular full fine-tuning. QLoRA:\n",
    "\n",
    "1. Freezes model weights, and quantizes the pretrained model to 4 bits.\n",
    "2. Attaches additional trainable adapter layers.\n",
    "3. Fine-tunes these layers, without changing the frozen, quantized model (while using it as context).\n",
    "\n",
    "In this notebook, you will learn how to fine-tune the 8x7B model using Hugging Face on Amazon SageMaker. You'll use the Hugging Face Transformers framework and the Hugging Face extension to the SageMaker Python SDK to fine-tune Mixtral with QLoRA on an example instruction dataset, and run the tuned model in a Hugging Face Deep-Learning Container (DLC) on a SageMaker real-time inference endpoint. This notebook can be run from an Amazon SageMaker Studio notebook or a SageMaker notebook instance, and outside SageMaker (for example on your laptop/development machine). In the latter case you'll need to handle authentication to SageMaker and other AWS services used in the notebook. When you run the notebook on SageMaker this will be handled for you.\n",
    "\n",
    "\n",
    "## Files\n",
    "\n",
    "scripts/run_clm.py: The entry point script that'll be passed to the Hugging Face estimator later in this notebook when launching the QLoRA fine-tuning job (from [here](https://github.com/philschmid/sagemaker-huggingface-llama-2-samples/blob/master/training/scripts/run_clm.py))\\\n",
    "scripts/requirements.txt: This takes care of installing some dependencies for the fune-tuning job, like Hugging Face Transformers and the PEFT library.\n",
    "\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "You need to create an S3 bucket to store the input data for training. This bucket must be located in the same AWS Region that you choose to launch your training job. To learn how to create a S3 bucket, see [Create your first S3 bucket in the Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). You can also just use the default bucket for the SageMaker session you create without specifying a specific bucket name.\n",
    "\n",
    "\n",
    "## Launching Environment\n",
    "### Amazon SageMaker Notebook\n",
    "\n",
    "You can run the notebook on an Amazon SageMaker Studio notebook, or a SageMaker notebook instance without manually setting your aws credentials.\n",
    "\n",
    "Create a new SageMaker notebook instance and open it.\n",
    "Zip the contents of this folder & upload to the instance with the Upload button on the top-right.\n",
    "Open a new terminal with New -> Terminal.\n",
    "Within the terminal, enter the correct directory and unzip the file.\n",
    "cd SageMaker && unzip <your-zip-name-here>.zip\n",
    "\n",
    "### Locally\n",
    "\n",
    "You can run locally by launching a Jupyter notebook server with Jupyter notebook. This requires you to set your aws credentials in the environment manually. See [Configure the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) for more details."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d8a4bcb-0c94-426a-ad3d-61cc7f70d10d",
   "metadata": {},
   "source": [
    "\n",
    "#### Amazon SageMaker Initialization\n",
    "Run the following cell to upgrade the SageMaker SDK and the Transformers framework to the latest version. You may need to restart the notebook kernel for the changes to take effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a255daf-f1d0-45fa-b686-82e1db0916b7",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install --quiet --upgrade transformers datasets sagemaker s3fs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fa1db96-2121-4652-98c9-cd114afcf73e",
   "metadata": {},
   "source": [
    "import SageMaker modules and retrieve information of your current SageMaker work environment, such as the AWS Region and the ARN of your Amazon SageMaker execution role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4de5bdd7-25a8-42e0-89b8-209f2162238b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "\n",
    "# gets role\n",
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except ValueError:\n",
    "    iam = boto3.client(\"iam\")\n",
    "    role = iam.get_role(RoleName=\"AmazonSageMaker-ExecutionRole-20231209T154667\")[\"Role\"][\"Arn\"]\n",
    "\n",
    "print(f\"sagemaker role arn: {role}\")\n",
    "print(f\"sagemaker bucket: {sess.default_bucket()}\")\n",
    "print(f\"sagemaker session region: {sess.boto_region_name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5868ae66-610f-4e1a-815e-b1d4cbe2d6ad",
   "metadata": {},
   "source": [
    "Here we load the [Dolly-15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k). This is a high-quality set of prompt/response pairs, human-generated; perfect for instruction fine-tuning LLMs like Mixtral."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ac93506-6a7a-4db6-8a00-44928acfe113",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from random import randrange\n",
    "\n",
    "# Load dataset from the hub\n",
    "dataset = load_dataset(\"databricks/databricks-dolly-15k\", split=\"train\")\n",
    "\n",
    "print(f\"dataset size: {len(dataset)}\")\n",
    "print(dataset[randrange(len(dataset))])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88736dc9-6f11-4613-82a3-ba62a53f249d",
   "metadata": {},
   "source": [
    "Formatting function to convert our data into task prompts. The function takes a sample of the dataset and outputs a prompt string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8338589-c8c3-426e-95b5-9938272a10ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "def format_dolly(sample):\n",
    "    instruction = f\"### Instruction\\n{sample['instruction']}\"\n",
    "    context = f\"### Context\\n{sample['context']}\" if len(sample[\"context\"]) > 0 else None\n",
    "    response = f\"### Answer\\n{sample['response']}\"\n",
    "    # join all the parts together\n",
    "    prompt = \"\\n\\n\".join([i for i in [instruction, context, response] if i is not None])\n",
    "    return prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54e7a30b-d547-4352-b606-5295fb30d2e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import randrange\n",
    "\n",
    "print(format_dolly(dataset[randrange(len(dataset))]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd648fae-a59d-4b4f-909a-56453124874f",
   "metadata": {},
   "source": [
    "Now, we load the tokenizer from the pre-trained Mixtral model, add an EOS token to each sample, tokenize the data and pack it in chunks of 2048 tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1f19c30-0680-43d8-987c-2c76139fe392",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "model_id = \"mistralai/Mixtral-8x7B-v0.1\"  # sharded weights\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "tokenizer.pad_token = tokenizer.eos_token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "742a0bbc-76e8-4dba-a40a-49a17ff8ec6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import randint\n",
    "from itertools import chain\n",
    "from functools import partial\n",
    "\n",
    "\n",
    "# template dataset to add prompt to each sample\n",
    "def template_dataset(sample):\n",
    "    sample[\"text\"] = f\"{format_dolly(sample)}{tokenizer.eos_token}\"\n",
    "    return sample\n",
    "\n",
    "\n",
    "# apply prompt template per sample\n",
    "dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))\n",
    "# print random sample\n",
    "print(dataset[randint(0, len(dataset))][\"text\"])\n",
    "\n",
    "# empty list to save remainder from batches to use in next batch\n",
    "remainder = {\"input_ids\": [], \"attention_mask\": [], \"token_type_ids\": []}\n",
    "\n",
    "\n",
    "def chunk(sample, chunk_length=2048):\n",
    "    # define global remainder variable to save remainder from batches to use in next batch\n",
    "    global remainder\n",
    "    # Concatenate all texts and add remainder from previous batch\n",
    "    concatenated_examples = {k: list(chain(*sample[k])) for k in sample.keys()}\n",
    "    concatenated_examples = {\n",
    "        k: remainder[k] + concatenated_examples[k] for k in concatenated_examples.keys()\n",
    "    }\n",
    "    # get total number of tokens for batch\n",
    "    batch_total_length = len(concatenated_examples[list(sample.keys())[0]])\n",
    "\n",
    "    # get max number of chunks for batch\n",
    "    if batch_total_length >= chunk_length:\n",
    "        batch_chunk_length = (batch_total_length // chunk_length) * chunk_length\n",
    "\n",
    "    # Split by chunks of max_len.\n",
    "    result = {\n",
    "        k: [t[i : i + chunk_length] for i in range(0, batch_chunk_length, chunk_length)]\n",
    "        for k, t in concatenated_examples.items()\n",
    "    }\n",
    "    # add remainder to global variable for next batch\n",
    "    remainder = {\n",
    "        k: concatenated_examples[k][batch_chunk_length:] for k in concatenated_examples.keys()\n",
    "    }\n",
    "    # prepare labels\n",
    "    result[\"labels\"] = result[\"input_ids\"].copy()\n",
    "    return result\n",
    "\n",
    "\n",
    "# tokenize and chunk dataset\n",
    "lm_dataset = dataset.map(\n",
    "    lambda sample: tokenizer(sample[\"text\"]),\n",
    "    batched=True,\n",
    "    remove_columns=list(dataset.features),\n",
    ").map(\n",
    "    partial(chunk, chunk_length=2048),\n",
    "    batched=True,\n",
    ")\n",
    "\n",
    "# Print total number of samples\n",
    "print(f\"Total number of samples: {len(lm_dataset)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c72b9f23-326e-4f7c-a39c-3d972e290205",
   "metadata": {},
   "source": [
    "Save our processed data to S3 for use in the training job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e43275e-8cf2-41ae-ad28-c38e2ddcc714",
   "metadata": {},
   "outputs": [],
   "source": [
    "import s3fs\n",
    "\n",
    "# save train_dataset to s3\n",
    "training_input_path = f\"s3://{sess.default_bucket()}/processed/mixtral/dolly/train\"\n",
    "lm_dataset.save_to_disk(training_input_path)\n",
    "\n",
    "print(\"uploaded data to:\")\n",
    "print(f\"training dataset to: {training_input_path}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dda60887-1c65-4dbf-935c-d1c020ba130e",
   "metadata": {},
   "source": [
    "run_clm.py is the entrypoint script for the training job. It implements QLoRA using PEFT to train our model. It merges the fine-tuned LoRA weights into the model weights after training, so you can use the resulting model as normal. Don't forget to add the requirements.txt into your source_dir folder - that way SageMaker will install the needed libraries, including peft (provides the LoRA API), and bitsandbytes for quantization of the pre-trained model to use in the QLoRA training job.\n",
    "\n",
    "We use a single g5.24xlarge instance (with 4 24 GB A10G GPUs) for the training job. The quantization that QLoRA provides reduces the memory requirements for the job such that it fits on that instance and doesn't need an instance type with 8 GPUs. Training for 3 epochs took 9 1/2 hours in my case. If you're in a hurry and just want to see proof of the concept, reducing the number of epochs helps.\n",
    "\n",
    "These large GPU instances aren't available in every AWS region, so make sure that you're in an AWS region that has g5.24xlarge instances (and you have the quota in your AWS account to use one additional)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abd2da80-eee1-43cf-ac68-fade87d4bc38",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from sagemaker.huggingface import HuggingFace\n",
    "\n",
    "# define Training Job Name\n",
    "job_name = f'mixtral-8x7b-qlora-{time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.localtime())}'\n",
    "\n",
    "# hyperparameters, which are passed into the training job\n",
    "hyperparameters = {\n",
    "    \"model_id\": model_id,  # pre-trained model\n",
    "    \"dataset_path\": \"/opt/ml/input/data/training\",  # path where sagemaker will save training dataset\n",
    "    \"epochs\": 3,  # number of training epochs\n",
    "    \"per_device_train_batch_size\": 2,  # batch size for training\n",
    "    \"lr\": 2e-4,  # learning rate used during training\n",
    "    \"merge_weights\": True,  # wether to merge LoRA into the model (needs more memory)\n",
    "}\n",
    "\n",
    "# create the Estimator\n",
    "huggingface_estimator = HuggingFace(\n",
    "    entry_point=\"run_clm.py\",  # train script\n",
    "    source_dir=\"scripts\",  # directory which includes the entrypoint script and the requirements.txt for our training environment\n",
    "    instance_type=\"ml.g5.24xlarge\",  # instances type used for the training job\n",
    "    instance_count=1,  # the number of instances used for training\n",
    "    base_job_name=job_name,  # the name of the training job\n",
    "    role=role,  # Iam role used in training job to access AWS ressources, e.g. S3\n",
    "    volume_size=300,  # the size of the EBS volume in GB\n",
    "    transformers_version=\"4.28\",  # the transformers version used in the training job\n",
    "    pytorch_version=\"2.0\",  # the pytorch_version version used in the training job\n",
    "    py_version=\"py310\",  # the python version used in the training job\n",
    "    hyperparameters=hyperparameters,  # the hyperparameters passed to the training job\n",
    "    environment={\n",
    "        \"HUGGINGFACE_HUB_CACHE\": \"/tmp/.cache\"\n",
    "    },  # set env variable to cache models in /tmp\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b125c7c2-6064-4abc-814b-5cb261dc5db1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# define a data input dictonary with our uploaded s3 uris\n",
    "data = {\"training\": training_input_path}\n",
    "\n",
    "# starting the train job with our uploaded datasets as input\n",
    "huggingface_estimator.fit(data, wait=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d88eaf9-c544-47dd-8fe9-c1469b74e6c0",
   "metadata": {},
   "source": [
    "Load the Hugging Face [LLM inference container](https://aws.amazon.com/blogs/machine-learning/announcing-the-launch-of-new-hugging-face-llm-inference-containers-on-amazon-sagemaker/) that will run the model as a real-time SageMaker inference endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "294d5849-473b-4ab7-9758-cbbdbfacba79",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 1.3.3 is not yet available in the SDK as of January 2024, when it becomes available, use this rather than static image mapping\n",
    "# from sagemaker.huggingface import get_huggingface_llm_image_uri\n",
    "\n",
    "## retrieve the llm image uri\n",
    "# llm_image = get_huggingface_llm_image_uri(\n",
    "#  \"huggingface\",\n",
    "#  version = \"1.3.3\"\n",
    "# )\n",
    "\n",
    "region_mapping = {\n",
    "    \"af-south-1\": \"626614931356\",\n",
    "    \"il-central-1\": \"780543022126\",\n",
    "    \"ap-east-1\": \"871362719292\",\n",
    "    \"ap-northeast-1\": \"763104351884\",\n",
    "    \"ap-northeast-2\": \"763104351884\",\n",
    "    \"ap-northeast-3\": \"364406365360\",\n",
    "    \"ap-south-1\": \"763104351884\",\n",
    "    \"ap-south-2\": \"772153158452\",\n",
    "    \"ap-southeast-1\": \"763104351884\",\n",
    "    \"ap-southeast-2\": \"763104351884\",\n",
    "    \"ap-southeast-3\": \"907027046896\",\n",
    "    \"ap-southeast-4\": \"457447274322\",\n",
    "    \"ca-central-1\": \"763104351884\",\n",
    "    \"cn-north-1\": \"727897471807\",\n",
    "    \"cn-northwest-1\": \"727897471807\",\n",
    "    \"eu-central-1\": \"763104351884\",\n",
    "    \"eu-central-2\": \"380420809688\",\n",
    "    \"eu-north-1\": \"763104351884\",\n",
    "    \"eu-west-1\": \"763104351884\",\n",
    "    \"eu-west-2\": \"763104351884\",\n",
    "    \"eu-west-3\": \"763104351884\",\n",
    "    \"eu-south-1\": \"692866216735\",\n",
    "    \"eu-south-2\": \"503227376785\",\n",
    "    \"me-south-1\": \"217643126080\",\n",
    "    \"me-central-1\": \"914824155844\",\n",
    "    \"sa-east-1\": \"763104351884\",\n",
    "    \"us-east-1\": \"763104351884\",\n",
    "    \"us-east-2\": \"763104351884\",\n",
    "    \"us-gov-east-1\": \"446045086412\",\n",
    "    \"us-gov-west-1\": \"442386744353\",\n",
    "    \"us-iso-east-1\": \"886529160074\",\n",
    "    \"us-isob-east-1\": \"094389454867\",\n",
    "    \"us-west-1\": \"763104351884\",\n",
    "    \"us-west-2\": \"763104351884\",\n",
    "}\n",
    "\n",
    "llm_image = f\"{region_mapping[sess.boto_region_name]}.dkr.ecr.{sess.boto_region_name}.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0\"\n",
    "\n",
    "print(f\"llm image uri: {llm_image}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5f50a5a-6ef9-4ce3-8dfd-99389a973866",
   "metadata": {},
   "source": [
    "Now take the instruct-tuned model from S3, and deploy it. Make sure that you're in an AWS region that has g5.48xlarge instances (and you have the quota in your AWS account to use one additional)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3958d80a-f569-40ab-aa08-8008da6c1e40",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3_uri = huggingface_estimator.model_data\n",
    "print(s3_uri)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45a56cba-135a-4b57-ac32-66db93da2d1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from sagemaker.huggingface import HuggingFaceModel\n",
    "\n",
    "# sagemaker config\n",
    "instance_type = \"ml.g5.48xlarge\"\n",
    "number_of_gpu = 8\n",
    "health_check_timeout = 300\n",
    "\n",
    "# Define Model and Endpoint configuration parameter\n",
    "config = {\n",
    "    \"HF_MODEL_ID\": \"/opt/ml/model\",\n",
    "    \"SM_NUM_GPUS\": json.dumps(number_of_gpu),  # Number of GPU used per replica\n",
    "    \"MAX_INPUT_LENGTH\": json.dumps(24000),  # Max length of input text\n",
    "    \"MAX_BATCH_PREFILL_TOKENS\": json.dumps(32000),  # Number of tokens for the prefill operation.\n",
    "    \"MAX_TOTAL_TOKENS\": json.dumps(32000),  # Max length of the generation (including input text)\n",
    "    \"MAX_BATCH_TOTAL_TOKENS\": json.dumps(\n",
    "        512000\n",
    "    ),  # Limits the number of tokens that can be processed in parallel during the generation\n",
    "}\n",
    "\n",
    "# create HuggingFaceModel with the image uri\n",
    "llm_model = HuggingFaceModel(model_data=s3_uri, role=role, image_uri=llm_image, env=config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf28ae58-257e-4edf-b743-7a2cad914624",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Deploy model to an endpoint\n",
    "# https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy\n",
    "\n",
    "endpoint_name = sagemaker.utils.name_from_base(\"Mixtral-8x7B\")\n",
    "\n",
    "llm = llm_model.deploy(\n",
    "    endpoint_name=endpoint_name,\n",
    "    initial_instance_count=1,\n",
    "    instance_type=instance_type,\n",
    "    container_startup_health_check_timeout=health_check_timeout,  # 10 minutes to be able to load the model\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfffb3c4-9b32-4c73-9333-24d280c74e6b",
   "metadata": {},
   "source": [
    "Let's send a prompt! The resulting completion is well-aligned to instructions, quite accurate and concise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bcabf4b6-518c-48ab-836d-b1b325eeb750",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prompt to generate\n",
    "prompt = \"What is Amazon SageMaker?\"\n",
    "\n",
    "# Generation arguments\n",
    "payload = {\n",
    "    \"do_sample\": True,\n",
    "    \"top_p\": 0.6,\n",
    "    \"temperature\": 0.1,\n",
    "    \"top_k\": 50,\n",
    "    \"max_new_tokens\": 1024,\n",
    "    \"repetition_penalty\": 1.03,\n",
    "    \"return_full_text\": False,\n",
    "    \"stop\": [\"</s>\"],\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e25a212-e2ef-4b09-b075-2a57f79d6bae",
   "metadata": {},
   "outputs": [],
   "source": [
    "chat = llm.predict({\"inputs\": prompt, \"parameters\": payload})\n",
    "\n",
    "print(chat[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "170ad729-af2d-427a-b12c-62725e5e2d3e",
   "metadata": {},
   "source": [
    "Finally, cleanup. Delete the SageMaker model and endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e546e685-272f-437e-8579-0aa152cdd058",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm.delete_model()\n",
    "llm.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cd68f59-ce15-4e64-bb0e-c86c0177a26c",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/introduction_to_applying_machine_learning|mixtral_tune_and_deploy|mixtral-8x7b.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}