Iterate Objects in S3 Buckets and Find Latest Modified Date

As your AWS infrastructure ages, it’s a good idea to review its storage requirements and configuration. Doing this periodically can often free up resources for other business needs.

The area I recommend reviewing first and foremost is storage. Perhaps you can move objects from S3 buckets, or even whole buckets, to Glacier. To do so, however, you need to know something about all of the data they contain. The S3 web console doesn’t offer a very intuitive way to view object metadata in bulk, so I wrote this script using Boto3.

For each bucket in the account, the script iterates through every object, keeping track of the latest modification date. Once it has found the most recently modified object, it writes that information to a CSV file for manual review.
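The heart of the script is a running maximum over each object's timestamp. As a minimal sketch, assuming each object has been reduced to a `(key, last_modified)` pair with a timezone-aware `datetime` (boto3 does return `last_modified` as an aware `datetime`), the same "keep the newest so far" bookkeeping can be written with `max()`. The function name `latest_object` and the sample keys are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def latest_object(
    objects: Iterable[Tuple[str, datetime]],
) -> Optional[Tuple[str, datetime]]:
    """Return the (key, last_modified) pair with the newest timestamp, or None if empty."""
    # max() with a key function makes the same single pass as a manual
    # "remember the newest so far" loop; default=None covers an empty bucket.
    return max(objects, key=lambda pair: pair[1], default=None)


# Hypothetical listing standing in for one bucket's (key, LastModified) pairs.
sample = [
    ("logs/2019-01.gz", datetime(2019, 1, 31, tzinfo=timezone.utc)),
    ("reports/q3.csv", datetime(2021, 10, 2, 8, 30, tzinfo=timezone.utc)),
    ("img/logo.png", datetime(2020, 5, 17, tzinfo=timezone.utc)),
]

print(latest_object(sample))
print(latest_object([]))
```

Comparing the aware `datetime` values directly avoids converting them to strings and parsing them back, and returning `None` for an empty bucket avoids needing a made-up starting date.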

```python
#!/usr/bin/env python3
import csv

import boto3

# KEY_ID / ACCESS_KEY are placeholders. If you omit them, boto3 falls back to
# its default credential chain (environment variables, ~/.aws/credentials, IAM role).
session = boto3.Session(
    aws_access_key_id='KEY_ID',
    aws_secret_access_key='ACCESS_KEY',
)
s3 = session.client('s3')
s3_res = session.resource('s3')  # create the resource once, not once per bucket

# Open the report once and write the header once, instead of once per bucket.
with open('s3buckets.csv', 'w', newline='') as csvfile:
    fieldnames = ['bucket', 'filename', 'modification_date']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for bucket in s3.list_buckets()['Buckets']:
        print("Iterating through bucket", bucket['Name'])
        s3_bucket = s3_res.Bucket(bucket['Name'])

        latest_modified_date = None  # stays None if the bucket is empty
        latest_modified_file = ""

        # objects.all() pages through the whole bucket automatically.
        for obj in s3_bucket.objects.all():
            # last_modified is already a timezone-aware datetime, so compare it
            # directly instead of round-tripping through a formatted string.
            if latest_modified_date is None or obj.last_modified > latest_modified_date:
                latest_modified_date = obj.last_modified
                latest_modified_file = obj.key

        if latest_modified_date is None:
            print("Bucket is empty")
        else:
            print("Last modified file:", latest_modified_file, "- modified on:", latest_modified_date)

        # One row per bucket: bucket name, newest key, and its timestamp.
        writer.writerow({
            'bucket': bucket['Name'],
            'filename': latest_modified_file,
            'modification_date': (latest_modified_date.isoformat()
                                  if latest_modified_date is not None else ''),
        })
```
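Once you have each bucket's latest modification date, a natural next step for the Glacier review is to flag buckets that nothing has touched in a while and draft a transition rule for them. The sketch below assumes the results are available as `(bucket_name, latest_datetime_or_None)` pairs; the helper names `stale_buckets` and `glacier_lifecycle_config`, the 365-day staleness threshold, the 30-day transition, and the sample bucket names are all hypothetical. It only builds data and prints it, so nothing in AWS is changed.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


def stale_buckets(
    latest_by_bucket: Iterable[Tuple[str, Optional[datetime]]],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return names of buckets whose newest object is older than max_age.

    Empty buckets (latest date of None) are skipped: there is nothing to archive.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - max_age
    return [name for name, latest in latest_by_bucket
            if latest is not None and latest < cutoff]


def glacier_lifecycle_config(days: int, prefix: str = "") -> dict:
    """Build a LifecycleConfiguration dict that transitions objects to GLACIER.

    An empty prefix matches every object in the bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"glacier-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical per-bucket results, standing in for what the script reports.
results = [
    ("old-backups", datetime(2016, 3, 1, tzinfo=timezone.utc)),
    ("active-app", datetime(2024, 6, 1, tzinfo=timezone.utc)),
    ("empty-bucket", None),
]
fixed_now = datetime(2024, 7, 1, tzinfo=timezone.utc)  # fixed so output is repeatable

for name in stale_buckets(results, timedelta(days=365), now=fixed_now):
    print(name, glacier_lifecycle_config(days=30))
```

The dict has the shape that boto3's `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` accepts. That call replaces any lifecycle configuration already on the bucket, so merge with existing rules and review the list manually before applying anything.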
