Principal Backend Development Engineer
About Us
- Multi-stage Engine Development: Own the development and refactoring of high-concurrency, low-latency recommendation serving engines, powering the full pipeline of multi-channel recall (two-tower/collaborative filtering/ANN vector retrieval) → coarse ranking → fine ranking → re-ranking.
- Compute Tiering & Strategy Engine: Implement dynamic compute trimming and degradation mechanisms for dynamic UI personalization and global strategy dispatch, ensuring core engine stability under extreme traffic spikes.
- Real-time Feature Pipeline: Build high-throughput, low-latency real-time feature streams on Kafka/Flink, enabling minute-level and second-level updates of user behavioral features and dynamic sliding-window aggregations.
- Feature Store: Contribute to the development of unified online/offline feature storage with a stream-batch convergence architecture; systematically address industrial-grade pain points such as "online/offline feature inconsistency" and "feature time-travel leakage" to improve online/offline feature consistency.
- Vector Retrieval System: Own the construction and optimization of large-scale vector retrieval systems (Faiss/Milvus/NSW) supporting candidate pools from tens of thousands to millions, and lead index structure and parameter tuning to achieve P99 retrieval latency < 100ms.
- Multi-source Heterogeneous Indexing: Build unified Embedding Pipelines and high-performance inverted index services covering trading products, news, KOL content, on-chain signals, and other heterogeneous data sources.
- Deep Model Engineering: Own high-performance online deployment and operator optimization of complex deep ranking models (e.g., DIN/SIM sequential models, MMoE/PLE multi-objective deep models).
- Inference Graph & Compute Governance: Solve compute explosion in multi-task/multi-objective online inference through inference optimization (quantization, graph optimization, batching strategies), targeting fine-ranking P99 latency < 200ms.
- High Availability & SLO: Ensure global recommendation service P99 latency < 200ms and system availability > 99.9%; write comprehensive overload protection, thread isolation, and disaster recovery degradation code.
- Global Distributed Tracing: In complex multi-region/multi-site/multi-language cross-border IDC environments, build and maintain end-to-end distributed tracing and monitoring systems (e.g., Jaeger/Prometheus), establishing minute-level root cause attribution and fault localization mechanisms.
- Experimentation Platform Engineering: Contribute to A/B experiment traffic-splitting platform upgrades, implementing CUPED variance reduction and sequential testing mechanisms to reduce sample size requirements and accelerate algorithm iteration pipelines.
- Industrial-grade Hands-on Experience: 5+ years of recommendation system engineering at consumer-scale internet companies; must have deeply participated in or led architecture refactoring or launches of real-time recommendation systems serving tens of millions of users. Proven experience building recommendation systems from 0→1 (not just maintaining mature systems) is strongly preferred.
- Mastery of Engineering Foundations: Exceptionally solid low-level CS fundamentals; proficient in at least one of Go/Java/C++ (Go preferred given current stack; C++ experience a plus for future engine optimization); familiar with PyTorch/TensorFlow model online inference and deployment optimization.
- Deep Expertise in Big Data & Retrieval: Hands-on with the Spark/Flink/Kafka stack, with real experience solving stream computing latency and data backlog issues; proficient in Milvus/Faiss cluster deployment and tuning.
- Engineering Perspective on Algorithms: Deep understanding of the computational complexity and online bottlenecks of core algorithms (collaborative filtering, two-tower recall, multi-objective optimization with MMoE/PLE); ability to work smoothly with algorithm teams to translate models into high-quality, efficient engineering implementations.
- Excellent System Design Capability: Proven experience in actual development and core module design of recommendation platforms, feature platforms, experimentation platforms, or high-performance RPC frameworks at top-tier companies.
Why Join Us
At Bybit, we are committed to fostering a supportive and enriching work environment.
Our benefits include:
- Study Growth Fund: We support your professional development and continuous learning.
- Internal Events: Participate in regular team-building activities, workshops, and events designed to promote collaboration and innovation.
- Global Collaboration: Be part of a diverse, international team, working alongside colleagues from around the world.
- Career Advancement: Access opportunities for growth and advancement within a rapidly expanding global company.
- Internal Mobility: Grow with us. Your long-term development is important to us, and we offer internal job opportunities to help you build your career path.