Data Engineer

Job Type: Full-time
No. of Vacancies: 1
Experience: 5 Years
Riseup Labs

 


Job Context:

We are seeking a senior Data Engineer to design, build, and operate scalable data pipelines and curated data assets that enable AI-driven workflows and business-critical automation. This role focuses on production-grade delivery, data quality, governance, and integration across enterprise systems, supporting outcomes such as cycle-time reduction, productivity gains, cost reduction, and improved operational decision-making. The Data Engineer will work closely with architecture, platform engineering, AI engineers, and business stakeholders to ensure data is accessible, reliable, secure, and fit for operational and AI-enabled use cases.

Job Responsibilities:

  • Design and implement scalable data pipelines (batch and near real-time) to support AI and automation use cases.
  • Build and maintain robust ingestion processes across structured, semi-structured, and unstructured sources.
  • Implement data transformation, cleansing, validation, and quality controls aligned with enterprise standards.
  • Provide curated, reusable datasets and interfaces to enable AI use cases, including ingestion and indexing flows used for retrieval-augmented generation (RAG) patterns where applicable.
  • Define and enforce data contracts, schema management, and versioning to support reliable downstream consumption.
  • Implement data preparation logic for RAG, including semantic chunking strategies, metadata enrichment, and approaches for handling incremental updates and re-indexing (an illustrative sketch follows this list).
  • Collaborate with AI Engineers to define indexing schemas and metadata strategies that optimize retrieval performance.
  • Collaborate with AI Engineers to ensure source data and retrieval pipelines are production-ready, measurable, and aligned with delivery objectives.
  • Implement observability for pipelines (freshness, completeness, accuracy, lineage signals where applicable).
  • Troubleshoot pipeline failures and performance bottlenecks and ensure stable production operations.
  • Ensure alignment with internal governance requirements (access control, data privacy constraints, auditability).
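
To make the RAG-preparation responsibility above more concrete, here is a minimal, illustrative Python sketch of paragraph-based chunking with metadata enrichment and a content hash that supports incremental re-indexing. The splitting strategy, chunk size, overlap, and metadata fields shown are assumptions for illustration only, not a prescribed implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A retrieval-ready unit of text plus the metadata used for indexing and re-indexing."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(doc_id: str, text: str, source: str,
                   max_chars: int = 1200, overlap: int = 200) -> list[Chunk]:
    """Split a document on paragraph boundaries, pack paragraphs up to max_chars,
    and carry a small character overlap between chunks to preserve context."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for para in paragraphs:
        if buffer and len(buffer) + len(para) + 2 > max_chars:
            chunks.append(buffer)
            buffer = buffer[-overlap:] if overlap > 0 else ""  # trailing context carried forward
        buffer = f"{buffer}\n\n{para}".strip() if buffer else para
    if buffer:
        chunks.append(buffer)

    enriched = []
    for i, chunk_text in enumerate(chunks):
        content_hash = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
        enriched.append(Chunk(
            text=chunk_text,
            metadata={
                "doc_id": doc_id,              # stable document identifier
                "source": source,              # system or file the text came from
                "chunk_index": i,              # position within the document
                "content_hash": content_hash,  # unchanged hash => skip re-embedding on incremental runs
            },
        ))
    return enriched


# Example: two chunks from a small document, each carrying its own metadata.
chunks = chunk_document("doc-001", "First paragraph.\n\nSecond paragraph.",
                        source="wiki", max_chars=20, overlap=0)
print(len(chunks), chunks[0].metadata["chunk_index"])  # 2 0
```

In an incremental run, a chunk whose content hash is unchanged can be skipped rather than re-embedded and re-indexed, which is the point of carrying the hash in the metadata.
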

Educational Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Software Engineering, Information Systems, Data Engineering, or a related field.
  • Experience with data pipelines, SQL, and cloud platforms is essential.

Required Skills and Experience:

  • Strong experience building and maintaining data pipelines in production environments.
  • Expert-level SQL skills and solid understanding of enterprise data modeling concepts.
  • Advanced Python experience for pipeline development and automation.
  • Experience processing semi-structured and unstructured data (for example PDFs, JSON payloads, Markdown/text corpora), including preparing data for consumption by LLM-enabled applications and retrieval systems.
  • Experience working with APIs, enterprise systems, and heterogeneous data sources.
  • Strong understanding of data quality patterns (validation rules, anomaly detection, automated checks).
  • Ability to operate with high ownership and accountability for data reliability in production.
  • Strong troubleshooting skills for performance, failures, and operational constraints.

Nice to Have (Senior Differentiators):

  • Understanding of retrieval and indexing workflows supporting AI applications, including vector-ready ingestion patterns and retrieval enablement.
  • Experience supporting vectorization pipelines and embeddings-based retrieval workflows.
  • Experience with distributed data processing and performance optimization (for example Spark/Delta Lake-style processing), particularly where high volume and concurrency are required.
  • Familiarity with orchestration and workflow tooling (for example Airflow, Dagster, or Data Factory-style patterns).
  • Experience with cloud-native implementations (Azure-first where applicable) and enterprise governance constraints.
  • Familiarity with data governance practices (access policies, audit requirements, lineage approaches).
  • Experience with production monitoring patterns, including service-level objectives (SLOs) for data freshness and pipeline stability (a brief illustrative check follows this list).
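
As an illustration of the freshness SLOs mentioned above, the sketch below shows a simple check a monitoring job might run against a pipeline's last successful load time. The four-hour threshold and the returned fields are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Illustrative freshness SLO: the latest successful load must be no older than
# the agreed threshold; a breach should raise an alert for the on-call engineer.
FRESHNESS_SLO = timedelta(hours=4)  # assumed threshold, for illustration only


def check_freshness(last_loaded_at: datetime, now: datetime | None = None) -> dict:
    """Return a small status record that a monitoring job could emit as a metric or alert."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_loaded_at
    return {
        "lag_minutes": round(lag.total_seconds() / 60, 1),
        "slo_minutes": FRESHNESS_SLO.total_seconds() / 60,
        "breached": lag > FRESHNESS_SLO,
    }


# Example: a table loaded five hours ago breaches a four-hour freshness SLO.
print(check_freshness(datetime.now(timezone.utc) - timedelta(hours=5)))
# -> {'lag_minutes': 300.0, 'slo_minutes': 240.0, 'breached': True}
```
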

Technology Stack (Indicative):
(Final stack and tooling will follow internal standards and may vary by domain.)

Languages

  • Python (advanced)
  • SQL (expert)

Data Platforms / Storage (examples)

  • Azure SQL and enterprise databases
  • Data lake / lakehouse patterns (Delta Lake-style where applicable)
  • Document storage and indexing patterns supporting unstructured data

Orchestration / Processing (examples)

  • Airflow, Dagster, or Azure Data Factory-style orchestration
  • Distributed processing frameworks (Spark-style processing where applicable)

Integration & Messaging (examples, where applicable)

  • API-based ingestion
  • Event streaming / messaging patterns (Kafka/Event Hubs-style where applicable)

Operations

  • CI/CD pipelines and release management
  • Observability and monitoring for pipelines and services

Cross-Cutting Expectations:

  • Delivery mindset with focus on measurable outcomes and operational stability.
  • High ownership for reliability, quality, and performance.
  • Ability to collaborate across engineering, architecture, and business stakeholders.
  • Clear documentation, structured communication, and predictable execution.

Success Measures (Examples):

  • Data pipelines deliver reliable, validated data aligned to agreed freshness and quality expectations.
  • Reduced manual effort and operational friction through stable automation and data availability.
  • Production issues are detected early through monitoring and resolved with structured root cause analysis.
  • AI and automation teams are enabled by consistent, reusable datasets and retrieval-ready ingestion flows.
  • Implementation of automated data quality gates (“quality as code”) that prevent broken or incomplete data from reaching downstream AI workflows (including retrieval/vector indexing), reducing manual cleanup and operational rework (a minimal sketch follows this list).
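
The “quality as code” gate described above can be pictured as declarative rules evaluated before data is promoted downstream. The rules, field names, and 1% failure budget in this Python sketch are assumptions for illustration only.

```python
import math

# Illustrative "quality as code" rules, evaluated before data is promoted to
# downstream AI workflows (including retrieval/vector indexing).
RULES = {
    "id_not_null": lambda row: row.get("id") is not None,
    "amount_non_negative": lambda row: isinstance(row.get("amount"), (int, float))
                                       and not math.isnan(row["amount"])
                                       and row["amount"] >= 0,
    "text_not_empty": lambda row: bool(str(row.get("text", "")).strip()),
}


def quality_gate(rows: list[dict], max_failure_rate: float = 0.01) -> dict:
    """Evaluate every rule against every row; block promotion if failures exceed the budget."""
    failures = {name: 0 for name in RULES}
    for row in rows:
        for name, rule in RULES.items():
            if not rule(row):
                failures[name] += 1
    total_checks = max(len(rows) * len(RULES), 1)
    failure_rate = sum(failures.values()) / total_checks
    return {
        "failures": failures,
        "failure_rate": failure_rate,
        "passed": failure_rate <= max_failure_rate,
    }


# Example: one bad row out of two blows the 1% failure budget, so promotion is blocked.
result = quality_gate([
    {"id": 1, "amount": 10.0, "text": "ok"},
    {"id": None, "amount": -5.0, "text": ""},
])
print(result["passed"])  # False
```

Rows that fail the budget are held back rather than flowing into retrieval or vector indexing, which is what reduces manual cleanup and operational rework.
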

Workplace: 

  • Remote

Working Hours:

  • 2:00 PM to 11:00 PM (BD Time)

Salary: 

  • Negotiable (Based on experience and skills)

Benefits:

  • General Leave: 10 days
  • Festival Bonus: 2 (each 100% of basic salary)
  • Two weekly holidays (Saturday & Sunday)
  • Annual Salary Review
  • PTO Benefits
  • Resignation and/or Termination Benefits: 1 month

The Application Process:

  • Telephone Round.
  • Interview with the Client.
  • Final Interview with the People & Culture team.
  • Job Offer.

NB: Only shortlisted candidates will be contacted during the recruitment process.

Apply Now