

Deploy CogVLM to AWS EC2


How to Deploy CogVLM to AWS EC2

In this guide, we are going to show how to deploy a CogVLM model to AWS using Roboflow Inference. Inference is a high-performance inference server with which you can run a range of vision models, from YOLOv8 to CLIP to CogVLM.

To deploy a CogVLM model to AWS, we will:

1. Set up our computing environment
2. Download the Roboflow Inference Server
3. Try out our model on an example image

Let's get started!

In this guide, we are going to show how to deploy a CogVLM model to AWS using the Roboflow Inference Server. This SDK works with CogVLM models trained on both Roboflow and in custom training processes outside of Roboflow.

To deploy a CogVLM model to AWS, we will:

1. Train a model on (or upload a model to) Roboflow
2. Download the Roboflow Inference Server
3. Install the Python SDK to run inference on images
4. Try out the model on an example image

Let's get started!

Train a Model on or Upload a Model to Roboflow

If you want to upload your own model weights, first create a Roboflow account and create a new project. When you have created a new project, upload your project data, then generate a new dataset version. With that version ready, you can upload your model weights to Roboflow.

Install the Roboflow Python SDK:

pip install roboflow

Then, use the following script to upload your model weights:

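import os
from roboflow import Roboflow

home = "/path/to/project/folder"
PROJECT_VERSION = 1  # placeholder: replace with your dataset version number

# Read the API key from the environment rather than hard-coding it
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])
project = rf.workspace().project("PROJECT_ID")

# model_path points at the folder that holds the trained weights
project.version(PROJECT_VERSION).deploy(model_type="yolov5", model_path=f"{home}/yolov5/runs/train/")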

You will need your project name, version, API key, and model weights. The following documentation shows how to retrieve your API key and project information:

- Retrieve your Roboflow project name and version
- Retrieve your API key

Change the path in the script above to the path where your model weights are stored.

When you have configured the script above, run the code to upload your weights to Roboflow.
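If the upload fails, two common causes are a missing API key and a wrong weights path. The following is a minimal pre-flight sketch, not part of the Roboflow SDK: the function name `preflight` is hypothetical, and it assumes the API key lives in the `ROBOFLOW_API_KEY` environment variable, as in the upload script.

```python
import os
from pathlib import Path


def preflight(weights_dir: str, env_var: str = "ROBOFLOW_API_KEY") -> list[str]:
    """Return a list of problems found; an empty list means nothing obvious is wrong.

    This is a hypothetical helper for illustration, not a Roboflow API.
    """
    problems = []
    # The upload script reads the key from the environment, so it must be set.
    if not os.environ.get(env_var):
        problems.append(f"environment variable {env_var} is not set")
    # The weights folder passed as model_path must exist on disk.
    if not Path(weights_dir).is_dir():
        problems.append(f"weights directory not found: {weights_dir}")
    return problems


if __name__ == "__main__":
    for problem in preflight("/path/to/project/folder/yolov5/runs/train/"):
        print("Problem:", problem)
```

Run it before the upload script; no output means both checks passed.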

Now you are ready to start deploying your model.

Set up an AWS Virtual Machine

First, we need to create an AWS EC2 instance. EC2 is Amazon’s compute product that you can use to deploy virtual machines. In the AWS console, search for “EC2” in the search bar and navigate to EC2.

On the EC2 homepage, click the “Launch instance” button:


This button will take you to a page where you can configure the machine to create.

We recommend choosing the Amazon Linux operating system, which has been optimized for use in AWS. CogVLM requires a GPU, so you will need to run Inference on a GPU instance. We recommend choosing a machine image optimized for deep learning, such as the Deep Learning Base GPU AMI. This image comes with tooling preinstalled that minimizes GPU setup. If you want to deploy other models on a CPU device, the standard Amazon Linux operating system is recommended.

Once you have configured the virtual machine, you can launch the instance.
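The console steps above can also be scripted. The following is a minimal sketch using boto3, assuming AWS credentials are already configured. The AMI ID is a placeholder you must look up for your region, and the `g5.xlarge` instance type, the region, and the helper names are illustrative assumptions, since this guide does not name them.

```python
from typing import Optional


def build_launch_params(ami_id: str, instance_type: str = "g5.xlarge",
                        key_name: Optional[str] = None) -> dict:
    """Build keyword arguments for the EC2 run_instances call.

    instance_type is an assumed GPU instance type; pick one that fits
    your budget and has enough GPU memory for CogVLM.
    """
    params = {
        "ImageId": ami_id,          # e.g. the Deep Learning Base GPU AMI for your region
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        params["KeyName"] = key_name  # name of an existing EC2 key pair, used for SSH
    return params


def launch(params: dict, region: str = "us-east-1") -> str:
    """Launch one instance and return its instance ID."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # "ami-PLACEHOLDER" and "my-key" are hypothetical values; replace them.
    print(launch(build_launch_params("ami-PLACEHOLDER", key_name="my-key")))
```

Keeping parameter construction separate from the API call makes it easy to review what will be launched before any resources are created.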

Next, sign in to your server with SSH. Read the AWS EC2 SSH instructions to learn more.
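Once signed in, it may be worth confirming that the GPU is visible before installing anything else. The sketch below assumes the NVIDIA driver and its `nvidia-smi` tool are installed, which is typically the case on deep learning machine images. The function name `gpu_names` is hypothetical.

```python
import shutil
import subprocess


def gpu_names() -> list[str]:
    """Return the names of GPUs reported by nvidia-smi, or an empty list."""
    # If nvidia-smi is not on PATH, the driver is likely not installed.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually means the driver cannot talk to a GPU.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = gpu_names()
    print("GPUs found:", names if names else "none")
```

An empty result suggests the instance type has no GPU or the driver is missing, in which case CogVLM will not run.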


Download the Roboflow Inference Server

The Roboflow Inference Server allows you to deploy computer vision models to a range of devices, including AWS.

The Inference Server relies on Docker to run. If you don't already have Docker installed on the device(s) on which you want to run inference, install it by following the official Docker installation instructions.

Once you have Docker installed, run the following commands to install and start the Roboflow Inference Server on your AWS instance:


pip install inference inference-cli
inference server start

Now that you have the Roboflow Inference Server running, you can use your model on AWS.
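The server can take a while to start accepting connections after the start command returns. The following is a minimal sketch that waits until the port is open, assuming the server listens on port 9001 as in the request example in this guide. It only checks that the TCP port accepts connections, not that a model has loaded, and `wait_for_port` is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 9001,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port():
        print("Inference server port is open")
    else:
        print("Timed out waiting for the inference server")
```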

Install the Roboflow Python SDK

The Roboflow Inference Server provides an HTTP API with a range of methods you can use to query your model and various popular models (e.g., SAM, CLIP). You can read more about all of the API methods available on the Roboflow Inference Server in the Inference Server documentation.

The Roboflow Python SDK provides convenience methods for interacting with the HTTP API. You can also query the HTTP API itself, which is what the CogVLM example in this guide does using the requests library.

To install the Python SDK, run the following command:

pip install roboflow

Run Inference on an Image

Create a new Python file and add the following code:


import base64

import requests

PORT = 9001
API_KEY = ""  # your Roboflow API key
IMAGE_PATH = "forklift.png"


def encode_base64(image_path):
    """Return the file at image_path as a base64-encoded ASCII string."""
    with open(image_path, "rb") as image:
        return base64.b64encode(image.read()).decode("ascii")


prompt = "Is there a forklift close to a conveyor belt?"

infer_payload = {
    "image": {
        "type": "base64",
        "value": encode_base64(IMAGE_PATH),
    },
    "api_key": API_KEY,
    "prompt": prompt,
}

# Send the image and prompt to the CogVLM route on the local Inference server
results = requests.post(
    f"http://localhost:{PORT}/llm/cogvlm",
    json=infer_payload,
)

print(results.json())

This code will make an HTTP request to the /llm/cogvlm route on your Inference installation. This route accepts text and images, which will be sent to CogVLM for processing. The route returns a JSON object with the text response from the model.
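For repeated queries, the request can be wrapped in a small function. The following is a sketch under stated assumptions: the helper names `build_payload` and `ask_cogvlm` are hypothetical, the 120-second timeout is an arbitrary choice, and the `response` key is taken from the example output shown in this guide.

```python
import base64


def build_payload(image_bytes: bytes, prompt: str, api_key: str) -> dict:
    """Build the JSON body in the same shape as the request in this guide."""
    return {
        "image": {
            "type": "base64",
            "value": base64.b64encode(image_bytes).decode("ascii"),
        },
        "api_key": api_key,
        "prompt": prompt,
    }


def ask_cogvlm(image_path: str, prompt: str, api_key: str,
               host: str = "localhost", port: int = 9001,
               timeout: float = 120.0) -> str:
    """Send one image and prompt to /llm/cogvlm and return the model's text."""
    import requests  # imported lazily so build_payload works without requests

    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt, api_key)
    resp = requests.post(
        f"http://{host}:{port}/llm/cogvlm", json=payload, timeout=timeout
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()["response"]


if __name__ == "__main__":
    print(ask_cogvlm("forklift.png", "Is there a forklift close to a conveyor belt?", ""))
```

Setting an explicit timeout and calling `raise_for_status()` keeps a stalled or failing server from being mistaken for a model answer.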

Above, replace:

1. API_KEY with your Roboflow API key. Learn how to retrieve your Roboflow API key.

2. IMAGE_PATH with the path to the image that you want to use to make a request.

3. prompt with the question you want to ask.

Let’s run the code on the following image of a forklift and ask the question “Is there a forklift close to a conveyor belt?”:

Here is an example output from the script above:


{
    'response': 'yes, there is a forklift close to a conveyor belt, and it appears to be transporting a stack of items onto it.',
    'time': 12.89864671198302
}



Learn how to deploy models to other devices

Below, you can find our guides on how to deploy CogVLM models to other devices.

Deploy CogVLM to Azure Virtual Machines

Deploy CogVLM to GCP Compute Engine


Documentation

The following resources are useful reference material for working with your model using Roboflow and the Roboflow Inference Server.