Today, since I need to store a large dataset and access it as if it were on my local machine, I explored AWS S3. After creating a bucket, I had a look at the official boto3 guide. The guide suggests downloading the files to your local hard drive and then processing them. This leads to continuously downloading and deleting files, which is not healthy for your hard drive. So, I found a way to load the images into a NumPy variable directly through OpenCV. First of all, we have to install the following library (I assume you already have OpenCV installed):

pip install boto3

Boto3 is the library responsible for accessing and exploring your AWS bucket. Now, I’ll show you my solution:

import boto3
import cv2
import numpy as np

# AWS CREDENTIALS
s3 = boto3.resource(
    service_name='s3',
    region_name='<BUCKET-REGION-NAME>',
    aws_access_key_id='<AWS-ACCESS-KEY-ID>',
    aws_secret_access_key='<AWS-SECRET-ACCESS-KEY>'
)

# BUCKET NAME
bucket_name = "<BUCKET-NAME>"
bucket = s3.Bucket(bucket_name) 

for obj in bucket.objects.all():
    # BYTES OF THE IMAGE, TEMPORARILY HELD IN MEMORY (NO LOCAL FILE)
    img = obj.get()['Body'].read()
    # DECODING THE BYTES INTO A NUMPY ARRAY
    nparray = cv2.imdecode(np.frombuffer(img, dtype=np.uint8), cv2.IMREAD_COLOR)
    # SHOWING IT
    cv2.imshow("image", nparray)
    cv2.waitKey(0)

As you can see, it is very simple: you store the bytes of your image in a temporary variable and then use the OpenCV method cv2.imdecode to decode them into a NumPy array.
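The same trick also works in reverse: cv2.imencode turns a NumPy array back into encoded bytes in memory, and put_object uploads them to the bucket, again without touching your hard drive. Here is a minimal sketch; the images/ and processed/ prefixes are just placeholder key names for your own layout, and bucket.objects.filter(Prefix=...) limits the loop to one "folder" instead of the whole bucket:

import boto3
import cv2
import numpy as np

s3 = boto3.resource(
    service_name='s3',
    region_name='<BUCKET-REGION-NAME>',
    aws_access_key_id='<AWS-ACCESS-KEY-ID>',
    aws_secret_access_key='<AWS-SECRET-ACCESS-KEY>'
)
bucket = s3.Bucket("<BUCKET-NAME>")

# ITERATE ONLY OVER THE KEYS UNDER A GIVEN PREFIX ("FOLDER")
for obj in bucket.objects.filter(Prefix='images/'):
    # DECODE THE OBJECT'S BYTES STRAIGHT INTO A NUMPY ARRAY
    img = cv2.imdecode(np.frombuffer(obj.get()['Body'].read(), dtype=np.uint8),
                       cv2.IMREAD_COLOR)
    if img is None:
        continue  # THE KEY WAS NOT A DECODABLE IMAGE
    # EXAMPLE PROCESSING STEP: CONVERT TO GRAYSCALE
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # ENCODE BACK TO JPEG BYTES IN MEMORY AND UPLOAD THEM
    ok, buf = cv2.imencode('.jpg', gray)
    if ok:
        bucket.put_object(Key='processed/' + obj.key, Body=buf.tobytes())

Keep in mind that put_object silently overwrites an existing key, so choose the output prefix carefully.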
