Howard University - Data Lake Solution Accelerator for Low Latency Data Processing

Business Problem

The requirement was to develop a data platform that collects, organizes, and processes data, provides insights into business and operational aspects, and enables the development of value-added, data-driven customer product features and dashboards.

Raw data was stored with no oversight of its contents.

The platform needed defined mechanisms to catalog and secure data. Without them, data can be neither found nor trusted, and the lake turns into a “data swamp”.


Solution

Real-time streaming of data from source systems (batch load scripts are also in place).

Connectors were developed for Oracle Database sources (PeopleSoft, Banner) and for Workday.

The Oracle Streams feature identifies changes in the redo logs and streams them to Kafka.
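
To illustrate the Kafka side of this pipeline, here is a minimal Python sketch of how one captured change might be packaged as a Kafka message. The event fields (scn, op, table, pk, after) and the table-plus-primary-key message key are assumptions made for illustration, not the accelerator's actual format. Only the standard library is used, and the actual send through a Kafka producer client is left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One row-level change captured from the redo logs (hypothetical shape)."""
    scn: int                      # Oracle system change number, orders the changes
    op: str                       # "INSERT", "UPDATE" or "DELETE"
    table: str                    # fully qualified source table, e.g. "HR.EMPLOYEES"
    pk: str                       # primary-key value of the changed row
    after: Optional[dict] = None  # row image after the change (None for DELETE)


def to_kafka_message(event: ChangeEvent) -> tuple[bytes, bytes]:
    """Build the (key, value) pair a producer would send to Kafka.

    Keying by table and primary key routes every change to the same row
    to the same partition, where Kafka preserves write order.
    """
    if event.op not in {"INSERT", "UPDATE", "DELETE"}:
        raise ValueError(f"unknown operation: {event.op}")
    key = f"{event.table}:{event.pk}".encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value
```

For example, `to_kafka_message(ChangeEvent(scn=7001, op="UPDATE", table="HR.EMPLOYEES", pk="1001", after={"DEPT": "FIN"}))` yields the key `b"HR.EMPLOYEES:1001"` and a JSON value. Keying by row is a common choice because Kafka only guarantees ordering within a partition, and the default partitioner sends records with the same key to the same partition.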

Data access is controlled through views defined in Apache Hive, which is connected to the data lake.

ETL jobs run in a loop, identify changed files (via Hive), and update the report mart, which is hosted in PostgreSQL. Sample ETL scripts and reports were developed for HR diversity data.
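
The polling ETL loop can be sketched in Python as a watermark-based incremental load. This is a hypothetical illustration, not the accelerator's ETL code. It tracks changes with a row-level `changed_at` value instead of changed files. It uses the standard-library `sqlite3` module as a stand-in for both the Hive view and the PostgreSQL report mart. The names `hr_view`, `hr_diversity`, `emp_id`, `dept`, `gender` and `changed_at` are invented for the example.

```python
import sqlite3
import time


def run_once(source: sqlite3.Connection, mart: sqlite3.Connection, watermark: int) -> int:
    """One pass of the loop: copy rows changed since `watermark` and return the new watermark.

    `hr_view` (source) and `hr_diversity` (mart) are hypothetical table names.
    """
    rows = source.execute(
        "SELECT emp_id, dept, gender, changed_at FROM hr_view "
        "WHERE changed_at > ? ORDER BY changed_at",
        (watermark,),
    ).fetchall()
    for emp_id, dept, gender, changed_at in rows:
        # Upsert so a changed row overwrites its earlier version in the mart.
        mart.execute(
            "INSERT INTO hr_diversity (emp_id, dept, gender) VALUES (?, ?, ?) "
            "ON CONFLICT(emp_id) DO UPDATE SET dept = excluded.dept, gender = excluded.gender",
            (emp_id, dept, gender),
        )
        watermark = max(watermark, changed_at)
    mart.commit()
    return watermark


def run_forever(source: sqlite3.Connection, mart: sqlite3.Connection, interval_s: float = 30.0) -> None:
    """Poll indefinitely; a short interval keeps report latency low."""
    watermark = 0
    while True:
        watermark = run_once(source, mart, watermark)
        time.sleep(interval_s)
```

The `INSERT ... ON CONFLICT ... DO UPDATE` upsert syntax is also accepted by PostgreSQL 9.5 and later, so the statement would carry over to a real report mart, although the placeholder style changes from `?` to `%s` with a driver such as psycopg2.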

Changes in the source systems are reflected in the reports within two minutes.


Benefits

Easier and quicker to populate, as no transformation is involved

Can ingest any volume of data, including data arriving in real time

Lets organizations generate different types of insights, including reports on historical data

Can store all types of data, structured and unstructured

Elimination of data silos

Democratized access to data through a single, unified view
