DataGenic Goes Serverless! An interview with Kowshik, a DataGenic senior software developer

Cloud computing has matured into the cornerstone of enterprise technology innovation. The ability for an organization to host its own data centers, applications, and processes online has transformed today’s business environment. As part of a continued process to automate capabilities, increase scalability, and leverage cloud computing, DataGenic has begun to transition its data ingestion processes from running Java code on internal hardware to utilizing Lambda functions in a serverless environment. Kowshik Nandagudi Sudheendra Rao, a senior software developer for DataGenic, describes the reason for this transition and the benefits DataGenic and its clients will realize from leveraging this technology.

1. How did you initially learn about Lambda serverless technology?

I first heard about Lambda when Colin, our CTO, went to Amazon’s re:Invent conference in 2014. Since then, AWS Lambda has evolved in terms of scaling, triggers, automation, security and the languages it supports. I was intrigued by the thought of leveraging Lambda’s serverless framework as a way to gain operational efficiencies and reduce the cost of provisioning and managing hardware. Because Lambda is a Functions-as-a-Service platform, we only pay for the actual resources used, rather than paying for pre-purchased server capacity.

2. How are you using Lambda?

We use Lambda for our data ingestion process, which automates extraction, transforms the data, enriches it with metadata and distributes our data sets. We are migrating the Java functions that perform these steps into Lambda and orchestrating them with Step Functions to complete the same flow and distribute files.

Lambda Workflow for CME Data Ingestion

Our developers take Java, C# or Python scripts, upload them into AWS Lambda using visual workflows, and then specify settings so that each function has the minimum computing resources it requires and runs at a time that matches a user’s workflow. The Lambda functions are then joined using Step Functions, a service that connects Lambdas, executes each step, and creates a workflow that automates the data ingestion process. This makes our data ingestion process more efficient and reduces manual errors.
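To make the orchestration idea concrete, here is a minimal sketch of how a four-step ingestion flow could be described in Amazon States Language, the JSON format Step Functions uses. It is not DataGenic's actual workflow: the step names, the region, the account number and the function names in the ARNs are all assumptions chosen for illustration.

```python
import json

# Hypothetical account, region, and function names; none come from the interview.
ARN_PREFIX = "arn:aws:lambda:eu-west-1:123456789012:function:"
STEPS = ["Extract", "Transform", "Enrich", "Distribute"]


def build_state_machine(steps, arn_prefix):
    """Chain one Task state per step, in order, as an Amazon States Language dict."""
    states = {}
    for i, name in enumerate(steps):
        # Each Task state points at the (assumed) Lambda function for that step.
        state = {"Type": "Task", "Resource": arn_prefix + name.lower()}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1]  # hand off to the following step
        else:
            state["End"] = True  # the last step terminates the workflow
        states[name] = state
    return {
        "Comment": "Illustrative data ingestion workflow",
        "StartAt": steps[0],
        "States": states,
    }


definition = build_state_machine(STEPS, ARN_PREFIX)
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of definition that could be supplied when creating a state machine. A real workflow would likely also declare retries and error handling for each step.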

3. What benefits do you see from using serverless technology?

The most important benefits are that it helps us scale our business and deliver data faster to clients. DataGenic has approximately 2,500 data ingestion jobs that all run in the same time window at the end of the day. Because these jobs require dedicated hardware, we must scale up hardware capacity to match our peak processing time, yet most of that capacity goes unused during non-peak times. Serverless technology saves us money on hardware, physical space, staff hours, electricity and other costs associated with physical servers.

Our peak processing time is currently after trading closes in London. As our business grows and the number of processes increases, we have two options to scale: we can buy more servers or we can stagger our processes to run sequentially. However, these options lead to increased costs or slower delivery of data. With Lambda, we no longer have to shift the processing queue to meet priorities because we can simultaneously run as many processes as needed, allowing us to release information in a more timely manner. This will benefit our clients as well as us.
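The trade-off between staggering jobs and running them simultaneously can be illustrated with some simple arithmetic. The sketch below uses the 2,500 jobs mentioned above. The two-minute job duration and the slot counts are assumptions picked purely for illustration, and in practice Lambda concurrency is also subject to account limits.

```python
import math

JOBS = 2500        # figure quoted in the interview
JOB_MINUTES = 2    # assumed average job duration, purely illustrative


def wall_clock_minutes(jobs, concurrent_slots, job_minutes):
    """Time to finish all jobs when at most `concurrent_slots` run at once."""
    waves = math.ceil(jobs / concurrent_slots)  # number of back-to-back batches
    return waves * job_minutes


# Compare a small fixed pool, a larger one, and fully parallel execution.
for slots in (50, 500, 2500):
    print(slots, "slots:", wall_clock_minutes(JOBS, slots, JOB_MINUTES), "minutes")
```

Under these assumed numbers, 50 slots need 100 minutes, 500 slots need 10 minutes, and running every job at once needs only 2 minutes, which is the effect described above: delivery time stops depending on how much hardware was bought in advance.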

4. Are there security concerns?

Given that we operate in energy data management for commodity trading and risk management, security is the cornerstone of our business. Serverless technology still provides a secure environment through AWS’s Virtual Private Cloud (VPC). This means that the function a user codes can be associated with DataGenic’s specific virtual cloud. Instead of a dedicated physical data center, it is now a dedicated virtual data center.

5. You mentioned to me that by mid-2018 you anticipate having all data ingestion in the cloud. Is there another evolution of the processes after that point? What is next?

Yes, by mid-2018 we look forward to processing data out to our clients in seconds versus hours. That will be very exciting in and of itself! I’ve also begun testing the use of Alexa in hopes of tying voice recognition software to our serverless environment. Rather than monitoring a dashboard to see when a job needs to be run and then logging into the application to run the job, I can simply tell Alexa to run my job. I’m also testing Alexa’s ability to read out statistics on my feeds so I don’t even need to view the dashboard.
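As a rough idea of what the voice-triggered job might look like, here is a minimal sketch of a Lambda handler that responds to an Alexa intent request. The intent name `RunJobIntent`, the slot name `jobName` and the `start_job` helper are all hypothetical; the interview does not describe how the actual prototype is built.

```python
def start_job(job_name):
    """Hypothetical stand-in for whatever actually triggers an ingestion job."""
    return f"Started job {job_name}"


def lambda_handler(event, context):
    """Handle an Alexa request; 'RunJobIntent' and the 'jobName' slot are assumed names."""
    request = event.get("request", {})
    speech = "Sorry, I did not understand that."
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "RunJobIntent":
            # Slot values arrive nested under the intent in the request JSON.
            job_name = intent.get("slots", {}).get("jobName", {}).get("value")
            if job_name:
                speech = start_job(job_name)
    # Alexa expects a versioned response containing the text to speak.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }


sample_event = {
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "RunJobIntent", "slots": {"jobName": {"value": "settlements"}}},
    }
}
print(lambda_handler(sample_event, None)["response"]["outputSpeech"]["text"])
```

A similar intent could read feed statistics aloud by returning them in the `outputSpeech` text instead of starting a job.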