Uploading an Excel into DynamoDB

--

How I spent an entire day and 4 cents

I’m new to AWS and I find the abundance of options and the endless configuration daunting. But when things finally work, it all seems so easy.

For context, I’m currently building a Chrome extension that, based on what’s shown on the web page, pulls data out of a DynamoDB table via a Lambda function. I just spent an entire Sunday on the seemingly simple task of getting data from an Excel file into a DynamoDB table. I’m sharing what didn’t work and then what did.

Pre-step: convert the Excel file to CSV. Easy.
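
If you’d rather script that conversion than do it by hand, here’s a minimal sketch using pandas (the file names are placeholders, and I’m assuming the data sits in the first sheet of the workbook):

import pandas as pd

# read the first sheet of the workbook (reading .xlsx files requires openpyxl)
df = pd.read_excel('data.xlsx')

# write it out as a plain CSV, without the index column
df.to_csv('file.csv', index=False)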

Now, if you want to skip the war stories, just scroll down to the Lambda section below.

Typically I find AWS documentation accurate, thorough and easy to follow. So I started with this: Import and Export DynamoDB Data Using AWS Data Pipeline

Oh my, so many inaccuracies! You get to step 2, save the pipeline, and get an error. I would expect the documentation to be complete and not lead to errors. Plus, you would think the link would take you to the place where you can configure ‘Terminate After’. Wrong.

Well, it does, in a way: you need to “Add an optional field…” and pick “Terminate After” from the dropdown.

I won’t bore you with the whole slew of errors. Suffice it to say that I never managed to get past the cluster being stuck in the CREATING status, I was never able to see any logs on S3 no matter what I did, and in the end I went looking for alternatives.
My bill for all of my attempts: $0.04.
I feel like I should be compensated for trying.

Enter the CData Excel Add-In for Amazon DynamoDB. Sounds promising. I was able to install and configure it in no time, but when it ran a query against the database it got totally stuck and never recovered. It didn’t uninstall smoothly either. I’m sure it’s a great add-in, really. Probably all of the security features on my laptop hindered its capabilities. So scratch that, look for something else.

Lambda

Good old reliable Lambda, my friend. Out of all the AWS services I’ve used so far, I think Lambda has the easiest interface for configuring, developing, and testing code.

This is a list of some of the resources I used as reference to set this up:
Read files from Amazon S3 bucket using Python
Load csv to AWS dynamoDB
AWS Documentation

These are the steps I followed:

  1. Upload the CSV file to an S3 bucket. I named mine file.csv.
  2. Create a table in DynamoDB (I’m in us-east-2, your region may be different). Make sure you understand the Partition Key vs. the Sort Key; those are the only keys you need to put in the table definition, even if your data has more key-value pairs. The rest of the attributes will be created during upload.
  3. After the table is created, in the Overview tab I configured the Stream by clicking the ‘Manage Stream’ button, picking the ‘New and old images — both the new and the old images of the item’ option, and clicking Enable. (If you prefer to script steps 1-3 instead of clicking through the console, there’s a sketch right after the Lambda code below.)
  4. Security: in IAM, create a role of the type ‘Allows Lambda functions to call AWS services on your behalf’ with the following policies (this step is also sketched in code after the Lambda function below):
    AmazonDynamoDBFullAccess
    AmazonS3ReadOnlyAccess
    AWSLambdaBasicExecutionRole
    You’ll need this role for your Lambda function.
  5. In the Lambda console, create a new function:
    Pick the ‘Author from Scratch’ option
    Put in your function name
    Pick Python 3.8 for the Runtime
    In Permissions, expand ‘Change default execution role’, pick ‘Use an existing role’, and select the role you created in step #4 above.
    And then click the ‘Create Function’ button
  6. Scroll down to the function code block and put in the following (the placeholder values, such as the bucket name, file name, table name, and column names, need to be changed to your own):
import boto3
import csv

def lambda_handler(event, context):
    region = 'us-east-2'
    try:
        # get a handle on s3
        session = boto3.Session(region_name=region)
        s3 = session.resource('s3')
        dyndb = boto3.client('dynamodb', region_name=region)
        bucket = s3.Bucket('YourBucketName')
        obj = bucket.Object(key='file.csv')
        # get the object
        response = obj.get()
        # read the contents of the file
        lines = response['Body'].read().decode('utf-8').splitlines()

        firstrecord = True
        csv_reader = csv.reader(lines, delimiter=',', quotechar='"')
        for row in csv_reader:
            # skip the header row
            if firstrecord:
                firstrecord = False
                continue
            FirstCol = row[0]
            SecondCol = row[1]
            ThirdCol = row[2]
            FourthCol = row[3]
            # write one item per CSV row
            response = dyndb.put_item(
                TableName='YourTable',
                Item={
                    # 'S' for type String, 'N' for Number
                    'FirstCol': {'S': str(FirstCol)},
                    'SecondCol': {'S': str(SecondCol)},
                    'ThirdCol': {'S': str(ThirdCol)},
                    'FourthCol': {'S': str(FourthCol)},
                }
            )
        result = 'Put succeeded'
    except Exception as err:
        result = format(err)
    return {
        'body': result
    }
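
For reference, here is roughly what steps 1-3 look like if you script them with boto3 instead of clicking through the console. The bucket name, table name, and key attribute names are placeholders, and I’m assuming a string partition key plus a string sort key with on-demand billing:

import boto3

region = 'us-east-2'

# Step 1: upload the CSV to S3 (bucket name and file name are placeholders)
s3 = boto3.client('s3', region_name=region)
s3.upload_file('file.csv', 'YourBucketName', 'file.csv')

# Steps 2 and 3: create the table with a partition key, a sort key,
# and a stream that captures both new and old images
dyndb = boto3.client('dynamodb', region_name=region)
dyndb.create_table(
    TableName='YourTable',
    AttributeDefinitions=[
        {'AttributeName': 'FirstCol', 'AttributeType': 'S'},   # partition key
        {'AttributeName': 'SecondCol', 'AttributeType': 'S'},  # sort key
    ],
    KeySchema=[
        {'AttributeName': 'FirstCol', 'KeyType': 'HASH'},    # partition key
        {'AttributeName': 'SecondCol', 'KeyType': 'RANGE'},  # sort key
    ],
    BillingMode='PAY_PER_REQUEST',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES',
    },
)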
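
Step 4 can be sketched in code as well. The role name here is a placeholder; the trust policy lets the Lambda service assume the role, and the three managed policies are the same ones listed in step 4:

import json
import boto3

iam = boto3.client('iam')

# trust policy that lets the Lambda service assume this role
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'lambda.amazonaws.com'},
        'Action': 'sts:AssumeRole',
    }],
}

iam.create_role(
    RoleName='csv-to-dynamodb-role',  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# attach the three managed policies from step 4
for arn in [
    'arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess',
    'arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess',
    'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole',
]:
    iam.attach_role_policy(RoleName='csv-to-dynamodb-role', PolicyArn=arn)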

Deploy — click the orange ‘Deploy’ button.

Running and Testing — On the top banner, select ‘Configure test events’ to define a test event. You can use the default ‘Hello World’ template; you don’t even need to change the JSON. From then on you can run that test by clicking the ‘Test’ button at the top right.

In the ‘Execution Result’ tab beneath the Code block you should see a successful result.
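
If you’d rather trigger the function from code instead of the console’s Test button, a minimal boto3 invoke sketch (the function name is a placeholder) looks like this:

import json
import boto3

lam = boto3.client('lambda', region_name='us-east-2')

# synchronously invoke the function with an empty test event
response = lam.invoke(
    FunctionName='your-function-name',
    Payload=json.dumps({}),
)
print(json.loads(response['Payload'].read()))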

You should also be able to see the logs in CloudWatch. Look for the log group named /aws/lambda/ followed by your function name.
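
You can also pull recent log events programmatically. This sketch assumes the default log group that Lambda creates, /aws/lambda/ followed by the function name (the name here is a placeholder):

import boto3

logs = boto3.client('logs', region_name='us-east-2')

# fetch the most recent events from the function's log group
events = logs.filter_log_events(
    logGroupName='/aws/lambda/your-function-name',
    limit=50,
)
for event in events['events']:
    print(event['message'], end='')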

By default, Lambda functions have a timeout of 3 seconds. If your dataset is large, the upload will take longer than that. In the Configuration tab, scroll down to Basic settings and click the ‘Edit’ button. I changed mine to 9 minutes after I ran some tests to confirm that my function works (the hard cap for a Lambda function is 15 minutes).
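
The same timeout change can be made from code if you prefer. This sketch assumes a placeholder function name; the timeout is given in seconds (540 seconds = 9 minutes):

import boto3

lam = boto3.client('lambda', region_name='us-east-2')

# raise the timeout so large CSV files have time to finish loading
lam.update_function_configuration(
    FunctionName='your-function-name',
    Timeout=540,
)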

And that’s it! I really hope I wrote everything down correctly. If you see any mistakes, please let me know.

In my next story I’ll be tackling CORS and Amazon API Gateway. Fun stuff!
