**This is what my team did while building our serverless architecture:**
We wanted to build a serverless data pipeline for coding medical charts using NLP. However, we didn't want it to be real-time, as most serverless systems are. So we used a queue together with alarms from a monitoring system (AWS CloudWatch) to build a serverless batch-processing pipeline.
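As a rough illustration of the alarm-driven trigger, here is a minimal sketch of a Lambda handler that starts a batch run only when a CloudWatch alarm fires. It assumes the alarm notifies an SNS topic that the function is subscribed to; the alarm name `chart-queue-backlog` is hypothetical, and the actual batch launch is left as a placeholder.

```python
import json


def should_start_batch(event, alarm_name="chart-queue-backlog"):
    """Return True if an SNS-delivered CloudWatch alarm reports a backlog.

    `alarm_name` is a hypothetical name used only for this sketch.
    """
    for record in event.get("Records", []):
        # SNS wraps the CloudWatch alarm payload as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        if (message.get("AlarmName") == alarm_name
                and message.get("NewStateValue") == "ALARM"):
            return True
    return False


def lambda_handler(event, context):
    if not should_start_batch(event):
        return {"started": False}
    # Placeholder: the real pipeline would launch the batch run here.
    return {"started": True}
```

Because the alarm is evaluated on queue depth over a period, work accumulates in the queue and is processed in batches rather than message by message.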
Next, we wanted to make it a serverless, distributed batch pipeline, so we used Ansible to build a master-workers architecture. However, AWS Lambda has a time limit of 5 minutes, while our entire NLP pipeline takes 30 minutes to complete.
So we hit upon the idea of creating a master server via Lambda and running Ansible under nohup. We learned some very important lessons while monitoring the nohup process.
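One way this could look, as a sketch rather than our actual setup: the Lambda function launches the master instance and hands it a user-data script that starts Ansible under nohup, so the function returns well within its time limit. The playbook path, log path, event keys, and default instance type below are all hypothetical, and the AMI is assumed to already have Ansible and the playbook installed.

```python
import shlex


def build_user_data(playbook="/opt/pipeline/pipeline.yml",
                    log_path="/var/log/pipeline.log"):
    """Build a boot script that starts Ansible detached from the shell.

    Both default paths are hypothetical placeholders.
    """
    command = "nohup ansible-playbook {} > {} 2>&1 &".format(
        shlex.quote(playbook), shlex.quote(log_path)
    )
    return "#!/bin/bash\n" + command + "\n"


def lambda_handler(event, context):
    # Imported lazily so build_user_data can be tried without AWS access.
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        ImageId=event["image_id"],  # hypothetical event key
        InstanceType=event.get("instance_type", "t3.medium"),
        MinCount=1,
        MaxCount=1,
        UserData=build_user_data(),
    )
    # Lambda returns immediately; the playbook keeps running on the master.
    return {"master_instance_id": response["Instances"][0]["InstanceId"]}
```

Redirecting output to a log file matters here: with nohup there is no attached terminal, so the log is the only place to see how far the playbook got.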
We then realized that Ansible can terminate the workers once the tasks are completed, but we also wanted to delete the master. So we reworked the Ansible playbook so that the master terminates itself once the workers are terminated.
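The same "master cleans up after itself" logic can be sketched in Python with boto3 instead of a playbook. This is not the playbook we wrote, only an illustration of the idea; the tag filter and instance ID in the usage note are hypothetical, and pagination of `describe_instances` is ignored for brevity.

```python
def all_workers_terminated(describe_response):
    """True when every instance in a describe_instances response is terminated.

    An empty response also counts as done: terminated instances eventually
    disappear from the listing.
    """
    states = [
        instance["State"]["Name"]
        for reservation in describe_response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]
    return all(state == "terminated" for state in states)


def terminate_master_if_done(ec2, worker_filters, master_instance_id):
    """Terminate the master only after all workers are gone.

    `ec2` is a boto3 EC2 client; returns True if termination was requested.
    """
    response = ec2.describe_instances(Filters=worker_filters)
    if not all_workers_terminated(response):
        return False
    ec2.terminate_instances(InstanceIds=[master_instance_id])
    return True
```

On the master, this might be called as `terminate_master_if_done(boto3.client("ec2"), [{"Name": "tag:Role", "Values": ["worker"]}], "i-0123456789abcdef0")`, where the tag and instance ID are made up for the example.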
We also built a serverless API for querying the results of the data pipeline, using AWS Lambda and API Gateway. All of this had to be built with HIPAA compliance in mind, which means that the data needs to be encrypted both at rest and in transit.
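A minimal sketch of what such a query endpoint might look like, assuming API Gateway's Lambda proxy integration. The `chart_id` parameter and the in-memory `_RESULTS` dictionary are hypothetical stand-ins; a real deployment would read from an encrypted data store instead.

```python
import json

# Hypothetical stand-in for the encrypted results store (made-up sample data).
_RESULTS = {"chart-001": {"codes": ["CODE-A", "CODE-B"]}}


def _response(status, body):
    """Build a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    # queryStringParameters is null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    chart_id = params.get("chart_id")
    if chart_id is None:
        return _response(400, {"error": "chart_id is required"})
    result = _RESULTS.get(chart_id)
    if result is None:
        return _response(404, {"error": "no results for this chart"})
    return _response(200, result)
```

API Gateway endpoints are served over HTTPS, which covers encryption in transit for this hop; encryption at rest depends on how the backing store is configured.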
Along the way, we hit a lot of gotchas, failures, huge wins, and pitfalls that taught us very important lessons. This talk will briefly walk through them after the hands-on demo.
This talk will include a quick hands-on spin-up of a simple serverless Lambda function in AWS, an explanation of the concept of serverless, and the gotchas and pitfalls to keep in mind when opting for a serverless architecture.
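For a sense of scale, a Lambda function of the kind used in such a demo can be as small as the sketch below. The `name` event key is a hypothetical input chosen for illustration.

```python
def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes with the event and a context object."""
    name = event.get("name", "world")  # hypothetical input field
    return {"message": "Hello, {}!".format(name)}
```

If this is saved as `lambda_function.py`, the handler setting would be `lambda_function.lambda_handler`.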
Raj works as a Senior Data Scientist.
His job includes building ML algorithms, architecting data pipelines, staring at endless Linux logs and building the DevOps team.
Raj is the author of the Julia lang cookbook and the DevOps moderator at StackOverflow.