Data Engineer

Job description

Running a flexible Machine Learning engine at scale is hard. We must ingest and process large volumes of data without interruption and store it in a scalable manner. The data needs to be prepared and served to hundreds of models constantly. All the predictions of the models, as well as the output of other data pipelines, must be stored and reachable for our web application(s) to present the generated insights to our customers.
We work on the system that delivers this functionality and also allows the Machine Learning engineers to deliver new and improved models with ease, manage and monitor existing models, and perform many other interactions, all of which are crucial to day-to-day operations.
You will be working and interacting with a wide array of technologies that constitute Jungle's core systems (data handling/processing, serving ML models, etc.) and building the backend systems that provide access to all this functionality. You will have the possibility to work on and enhance the different stages of an end-to-end Machine Learning system at scale.

Who we are

Jungle develops and applies Artificial Intelligence to increase the uptime and performance of renewable energy sources. Built on existing sensors and data streams, the company’s technology enables solar and wind energy owners to squeeze more out of their assets, accelerating the world’s transition to renewable energy sources.

We have productised our services into a web application and are continuously improving it to ensure that our best analyses and visualisations help our users get the maximum energy out of their assets. We operate at a large scale - millions of data points per day - providing always-on predictive models, alarms and metrics visualisations for some of the largest and most sophisticated customers in the global renewable energy space. This is not your average dashboard: we’re talking about intelligently handling large quantities of data to drive performant visualisations and functionality.

Why do we need you?

  • You’ll make use of modern open-source technologies in a practical use case to improve usability, performance and robustness of our internal system.
  • You’ll work together with the engineering team to maintain and improve existing systems, and overcome difficulties arising from scaling up our systems to more and more data. Some examples:
    • Contribute to the improvement of the new data backend that is used to efficiently serve large amounts of data to our product.
    • Improve our workflow orchestration and make it more reactive to live data ingestion events.
    • Improve observability of our systems to make sure that data flowing through them is in perfect condition, and that the team is notified as early as possible when it is not.
  • You’ll make architectural decisions on how to solve our engineering challenges and keep us future-proof.
  • You’ll research new (upcoming) technologies that will considerably improve the user experience and/or development time of our products.

Why work with us?

  • Join a funded start-up.
  • Work with modern technologies (both in ML and software engineering).
  • You'll work on the last-mile delivery of our products, ensuring they have the best impact with our customers.
  • You have the opportunity to use your skills to create a meaningful change in this world.
  • Become part of a warm and skilled group of people, committed to each other's success.
  • We care about your growth and assign you a personal mentor to help you achieve this.
  • As a remote-first company, we offer you a flexible work schedule, holiday policy, and work location.


Requirements

  • Demonstrable experience (2+ years) in developing and maintaining software tools, backend services and ETL pipelines; ideally experience in Python.
  • You are knowledgeable of computer algorithms, data structures and statistics.
  • You have experience in building and maintaining scalable systems and infrastructure.
  • You have experience dealing with large volumes of data (data sets, data format, data storage).
  • You have worked with RDBMSs (PostgreSQL, MySQL) or non-relational databases (Cassandra, MongoDB, Redis).
  • You have experience with distributed systems, like message queues (RabbitMQ, Kafka), distributed computing engines (Spark, Dask, Ray) and/or orchestration tools (Kubernetes).
  • You have experience with workflow orchestration tools (Airflow, Argo, Dagster, etc.).
  • You have a firm knowledge of Linux-based systems (or similar), ideally in a server/headless environment, wielding your shell as a weapon that imposes fear on GUI users.
  • You have experience in agile environments and development workflows using Git or similar tools, and CI/CD tools such as GitLab CI or Jenkins.

Preferred Requirements (extra)

  • (Preferred) You have experience with Prometheus and Grafana.
  • (Preferred) You have experience with cloud-based environments such as Amazon Web Services, Azure or GCP.
  • (Bonus) You have experience with Time Series data.
  • (Bonus) You understand what a Machine Learning model is and have experience with MLOps.
  • (Bonus) Your knowledge of networking goes beyond the basics, understanding routing, proxies, VPNs, NAT, service meshes and other more complex networking concepts.
  • (Bonus) You have experience using Terraform.
  • (Bonus) You have experience with configuration management tools like Ansible, Chef, Puppet, SaltStack, or similar.

About you

  • You are curious and won't stop searching until you find the answer.
  • You work meticulously. People around you trust your work results, and rightly so.
  • You're pragmatic; you know when to dive deep and when a quick fix will do.