Data Lake Solution Accelerator For Low Latency Data Processing

Business Problem

  • The requirement was to develop a data platform to collect, organize, and process data, provide insights into business and operational aspects, and enable development of customer-facing, data-driven product features and dashboards.
  • Raw data was stored with no oversight of its contents.
  • The platform needs defined mechanisms to catalog and secure data; without these elements, data cannot be found or trusted, resulting in a "data swamp".


Solution

  • Real-time streaming of data from source systems (batch load scripts are also in place).
  • Connectors developed for Oracle DB (PeopleSoft, Banner) and Workday.
  • Uses the Oracle DB Streams feature to identify changes from redo logs and stream them to Kafka.
  • Data access is controlled with views set up in Apache Hive, which is connected to the data lake.
  • ETLs run in a loop, identify changed files (via Hive), and update the Report Mart. Sample ETL scripts and reports were developed for HR diversity data; PostgreSQL acts as the Report Mart.
  • Data changes from the source system are reflected in the reports within two minutes.
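The change-driven refresh loop described above can be sketched in Python. This is a minimal, self-contained illustration of the idea only: `detect_changes` stands in for polling Hive for files whose modification timestamps have advanced, and `upsert_report_mart` stands in for an `INSERT ... ON CONFLICT DO UPDATE` against the PostgreSQL Report Mart. All function names, the `employee_id` key, and the in-memory data structures are hypothetical, not part of the actual accelerator's code.

```python
def detect_changes(last_seen, current_listing):
    """Return files that are new or modified since the last poll.

    Both arguments map file path -> last-modified timestamp, mirroring
    what a metadata query against Hive over the data lake might return.
    """
    return {path: ts for path, ts in current_listing.items()
            if last_seen.get(path) != ts}


def upsert_report_mart(mart, rows, key="employee_id"):
    """Merge changed rows into the report mart, keyed by a business id.

    Stands in for an upsert (INSERT ... ON CONFLICT DO UPDATE) into the
    PostgreSQL Report Mart; `mart` is just an in-memory dict here.
    """
    for row in rows:
        mart[row[key]] = row
    return mart


# One iteration of the loop: detect changed files, then upsert their rows.
last_seen = {"hr/diversity_2023.parquet": 100}
current = {"hr/diversity_2023.parquet": 100,
           "hr/diversity_2024.parquet": 105}
changed = detect_changes(last_seen, current)      # only the 2024 file
mart = upsert_report_mart({}, [{"employee_id": 1, "dept": "HR"}])
```

In the real pipeline this loop would run continuously, which is how source-system changes surface in reports within the stated two-minute window.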


Benefits

  • Easier and quicker to populate, as no transformation is involved at ingest.
  • Allows any volume of data to be imported as it arrives in real time.
  • Allows organizations to generate different types of insights, including reporting on historical data.
  • Ability to store all types of structured and unstructured data.
  • Elimination of data silos.
  • Democratized access to data via a single, unified view.
