Site Reliability Engineer
About The Role
The NEAR AI engineering team is developing decentralized and confidential machine learning infrastructure to power user-owned AI. We currently focus on building infrastructure to enable private and confidential inference that works across different compute providers, as well as a blockchain-based coordination layer that incentivizes compute providers to join the decentralized inference network.
You will own various components and drive critical decisions throughout their life cycles, including architecture, implementation, and maintenance. You will collaborate with highly knowledgeable and skilled colleagues who are passionate about solving hard problems that can disrupt the industry.
What You'll Be Doing:
- Own infrastructure end to end (for handling telemetry data, for performing testing, etc.)
- Design and implement infrastructure components that manage GPU clusters with special configurations
- Tune and optimize performance
- Create and maintain runbooks that support the on-call rotation
- Participate in the on-call rotation
- Support code releases and delivery
- Plan and implement infrastructure cost and security strategies
- Plan and implement effective CI/CD pipelines to facilitate development processes
What We're Looking For:
- Agility to quickly learn new programming languages and technologies
- Ability to write clean and efficient code
- Ability to transform ambiguous problems into tangible solutions or prototypes
- Linux systems proficiency
- Experience with software concurrency or parallelism
- Experience building, operating, and scaling cloud infrastructure (GCP, AWS, etc.)
- Experience with data visualization and observability tooling (Grafana, Graphite, Zabbix, etc.)
- Detail-oriented mindset with a focus on setting priorities and progressing towards objectives
- Excellent communication and teamwork skills
- Bachelor's Degree in Computer Science or a related field
We'd Love If You Have:
- Experience with NEAR or other blockchain internals
- Experience with GPUs
- Experience with Trusted Execution Environments
- Experience debugging and troubleshooting complex concurrent systems
- Professional experience with Rust
Location: Onsite, San Francisco office