Many libraries and learning resources exist for image processing. This page collects references and short code snippets, mainly using OpenCV and Pillow.

  • OpenCV: OpenCV is an open-source computer vision and machine learning software library. Tutorial; GitHub
  • Pillow: Pillow is the friendly PIL fork by Alex Clark and Contributors. PIL is the Python Imaging Library by Fredrik Lundh and Contributors.

Read and Write Images

  • Using Pillow

    from PIL import Image

    # Read an image
    image = Image.open('path_to_image.jpg')

    # Convert a NumPy array to a PIL Image
    image = Image.fromarray(img_np_array)

    # Save an image
    image.save('path_to_save_image.jpg')
  • Using OpenCV

    import cv2

    # Read an image
    image = cv2.imread('path_to_image.jpg')

    # Convert an RGB array (e.g. one obtained from Pillow) to BGR, the channel order OpenCV uses
    bgr_image = cv2.cvtColor(rgb_image_array, cv2.COLOR_RGB2BGR)

    # Save an image
    cv2.imwrite('path_to_save_image.jpg', image)
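
The RGB-to-BGR conversion above matters when mixing the two libraries: Pillow produces RGB arrays, while cv2.imread and cv2.imwrite work with BGR. As a minimal sketch, assuming a 3-channel H x W x 3 NumPy array, the same channel swap can be written with plain NumPy slicing. The helper name swap_rgb_bgr is hypothetical and not part of either library.

```python
import numpy as np


def swap_rgb_bgr(image):
    """Reverse the channel order of an H x W x 3 array (RGB <-> BGR).

    Hypothetical helper for illustration; for 3-channel images it gives the
    same result as cv2.cvtColor(image, cv2.COLOR_RGB2BGR).
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Reversing the last axis swaps the first and third channels.
    # ascontiguousarray makes a real copy instead of a negative-stride view.
    return np.ascontiguousarray(image[:, :, ::-1])


# A 2 x 2 image that is pure red in RGB order.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255

bgr = swap_rgb_bgr(rgb)
```

The swap is its own inverse, so the same function converts a BGR array loaded by OpenCV into RGB before it is passed to Image.fromarray.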

Image Resizing

  • Using Pillow

    # Resize image
    resized_image = image.resize((new_width, new_height))

    # Thumbnail (maintains aspect ratio; modifies the image in place and returns None)
    image.thumbnail((new_width, new_height))
  • Using OpenCV

    # Resize image
    resized_image = cv2.resize(image, (new_width, new_height))
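
Unlike Pillow's thumbnail, resize in both libraries stretches the image to exactly the requested size. The sketch below shows one way to compute a target size that fits inside a bounding box while keeping the aspect ratio. The function name fit_within and its rounding behavior are assumptions made for illustration; it is not Pillow's exact thumbnail algorithm.

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height) that fits inside the box.

    Hypothetical helper: keeps the aspect ratio and never enlarges the image.
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Use the tighter of the two scale factors, capped at 1.0 (no upscaling).
    scale = min(max_width / width, max_height / height, 1.0)
    # Keep at least one pixel per side after rounding.
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height


landscape = fit_within(1920, 1080, 640, 640)
portrait = fit_within(1000, 2000, 500, 500)
small = fit_within(100, 50, 640, 640)
```

The result can be passed directly as the size argument of image.resize or cv2.resize. Note that an OpenCV image's shape is (height, width, channels), whereas both resize calls expect (width, height).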

Image Feature Extraction

  • DINO: Self-Supervised Vision Transformers with DINO. Code; Paper
  • DINO v2: DINOv2: Learning Robust Visual Features without Supervision. Code; Paper

Image Segmentation

  • Segment Anything Model (SAM): SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training. Code; Paper
  • Segmentation Models.pytorch: The main features of this library are: (1) a high-level API (just two lines to create a neural network); (2) 9 model architectures for binary and multi-class segmentation (including the well-known U-Net); (3) 124 available encoders (and 500+ encoders from timm); (4) pre-trained weights for all encoders, for faster and better convergence; (5) popular metrics and losses for training routines. Code

Open-Vocabulary Image Detection

Open-Vocabulary Image Segmentation

  • GroundedSAM: A demo that combines Grounding DINO with Segment Anything to detect and segment objects from text prompts. Its authors state that they will continue to improve it and build further demos on this foundation. Code