AH aharon.haravon / senior engineering manager
Portfolio · 2026 · v05

Aharon
Haravon.

Senior Engineering Manager (DevOps). Twenty-six years of infrastructure — from Unix deployment tooling in 1999 to Kubernetes autoscaling and AI/LLM platforms today.

Tel Aviv · hybrid
Currently: K Health · Infrastructure & DevOps
Target: AWS inference infrastructure
Aharon Haravon on stage at HeapCon

Infrastructure for probabilistic systems — where p99 latency and GPU cost are the only metrics that matter.

01
Kubernetes autoscaling
KEDA, HPA, Cluster Autoscaler, Spot.io, GPU-aware scheduling
02
AI / LLM platforms
GPU workloads, Langfuse, LLM observability, Claude-based automation
03
Cloud & on-prem K8s
AWS · GKE · Kops · Kubeadm · custom operators (Operator SDK)
04
Storage & data
Ceph, Rook, Vitess, Cassandra, Kafka, PostgreSQL
05
Engineering leadership
Hands-on architect & people manager; regulated medtech & analytics SaaS

A deep-dive on the autoscaling primitive that most directly applies to LLM serving infrastructure.

Featured article · KEDA

Horizontal Autoscaling in Kubernetes

A walkthrough of KEDA-driven HPA: event-sourced scaling, custom scalers for queue depth, and the pitfalls that bite you once the autoscaler starts making decisions faster than humans can audit them.

KEDA · HPA · Custom metrics · Scale-to-zero · Event-driven
Read on Medium

Two HeapCon sessions — one on autoscaling in production, one on the shape of modern information systems.

■ HeapCon talk · 01 YouTube ↗

Auto-Scaling: The Force Awakens — and Nods Off

■ HeapCon talk · 02 YouTube ↗

Guide to the Galaxy — Structure of Modern Information Systems


Twenty-six years. From ERP deployment tooling in 1999 to GPU autoscalers on Kubernetes today.

Jun 2025 — Present

Senior Engineering Manager (DevOps)

K Health · AI/LLM-powered clinic platform

Lead globally distributed DevOps team of four (Israel · NJ · Bulgaria). On-prem (GKE) components — Langfuse, Flagsmith, Hush Security. Strengthened DR for GPU instances, Cloud SQL, GCS, Redis, GKE. LLM observability and GPU workload optimization. Built reusable Claude skills for DB queries, Redis triage, architecture scans, K8s scans, Datadog APM trace analysis.

GKE · GCP · Langfuse · Flagsmith · Datadog · Claude
2020 — Jun 2025

Principal Software Architect & SRE Lead

Nuvo Group · Remote pregnancy monitoring (medical device)

Led a team of architects and SRE engineers. Owned end-to-end system architecture — extensive POCs to integrate CNCF components, custom Kubernetes operators, infrastructure tools. Hands-on RCA across all cloud environments; focus on reliability, scalability, and information security.

AWS · Kops · KEDA · Ceph/Rook · Vitess · Kafka · Prometheus
2017 — 2020

Chief Architect / Director of Engineering

Elminda · Brain disorder diagnostics (medical device)

Productized MATLAB algorithm specifications into scale-out C++ implementations. Implemented H2O, TensorFlow, Kubeflow. Recruited and restructured engineering (dev, automation, DevOps). Cost-optimized ALM via on-prem Kubernetes CI/CD. Introduced architecture changes for cloud-agnostic, geo-distributed, regulatory-compliant systems.

Kubernetes · Kubeflow · TensorFlow · C++ · AWS · Ansible
2010 — 2017

Director of Analytics · Senior Team Manager

Clicktale (CX analytics SaaS) · AGT International / 3i-MIND (law-enforcement analytics)

At Clicktale: three dream-teams building SaaS behaviour analytics; performance/cost improvements enabling larger deals; ad-hoc real-time big data (HP Vertica), NewSQL POCs. At AGT OpenMind: core team building advanced analytics — Spark, Cassandra, graph analysis, ML/NLP — for investigative use.

Spark · Cassandra · Vertica · Kafka · ELK · Akka
1999 — 2010

Senior Dev Manager · CTO · Infrastructure Lead

Top Image Systems (NASDAQ: TISA) · Data-Mall · Aviv Advanced Solutions (TASE: AVSO)

Earlier roles across enterprise software. .NET single-page + RESTful platform for mail-room automation with SAP R/3 invoice integration. CTO at Data-Mall: structured CMS with ERP integrations and WYSIWYG SPA query/report designer. Cross-platform (Unix + Windows NT) packaging/deployment system for AvivERP.

.NET · SAP · Java · Perl · Unix/Linux

What I build in my own time — infrastructure, PKI, home-lab Kubernetes, and companion demos for the writing above.

▸ AUTOSCALING
Companion demo for the Medium article and the HeapCon talk on auto-scaling.
▸ CLOUD · TERRAFORM
Personal AWS + Cloudflare infrastructure, Terraform-managed end-to-end.
▸ PKI
Simple Certificate Authority — PKI made easy, with YubiKey support.
▸ ARM K8S HOME LAB
ARM Kubernetes cluster at home — Rook-Ceph storage, IPsec VPN, PKI, all Ansible-driven.
▸ NETWORKING
Ansible collection for OpenWRT routers — VPN, firewall, ACME, DDNS.
▸ FULL-STACK PRODUCT
FastAPI image API · YOLO/CNN edge detection · Android client · Next.js frontend.

Things I've learned the expensive way.

01

Treat autoscaling as a control system, not a config file.

02

Latency is a distribution. Design for p99, not the mean.

03

The cheapest GPU is the one you didn't spin up.

04

Observability before optimization. Always.

05

A hands-on architect who can no longer read a stack trace is just a diagram-maker.
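
Principle 02 can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the latency samples are hypothetical, and the `percentile` helper is a simple nearest-rank implementation written for this example, not taken from any monitoring tool.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical request latencies in seconds: 98 fast requests and 2 slow ones.
latencies = [0.2] * 98 + [5.0] * 2

mean = sum(latencies) / len(latencies)
p99 = percentile(latencies, 99)

print(f"mean={mean:.3f}s p99={p99:.1f}s")
```

With these made-up numbers the mean stays close to 0.3 s while the p99 lands on the 5 s tail, which is the gap the principle warns about.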

Education, languages, and what else.

▸ EDUCATION
Theoretical Physics
B.Sc. studies, Belgrade University · 1992 — 1999
▸ AWARD · 1992
Gold Medal
Serbian National Competition in Programming (high-school).
▸ LANGUAGES
EN · HE · SR
English (fluent) · Hebrew (fluent) · Serbian (native).
▸ OFF-HOURS
Guitar · Ableton · 3D printing
Electronic music production on Ableton Live and Push. 3D-printing enthusiast.
~/portfolio $ whoami --verbose
Aharon Haravon
file: identity.jpg
encoding: grayscale
subject: aharon haravon
tag: senior-eng-mgr

Aharon Haravon

senior_engineering_manager // devops // ai_platform

Twenty-six years of infrastructure — from Unix deployment tooling in 1999 to Kubernetes autoscaling and LLM platforms today. Still hands-on.

OPEN TO ANTHROPIC · AWS INFERENCE INFRA
// identity.yaml HEAD
name: Aharon Haravon
location: Tel Aviv, IL · hybrid
experience: 26+ years
current: K Health · Sr. Eng. Manager (DevOps)
stack_now: GKE · Langfuse · Flagsmith · Datadog · Claude
stack_prev: AWS · Kops · KEDA · Ceph · Vitess · Kafka
languages: EN · HE · SR
edu: B.Sc. studies, Theoretical Physics, Belgrade
[ exp_01 ]
Kubernetes
autoscaling
KEDA, HPA, Cluster Autoscaler, Spot.io, GPU-aware scheduling
[ exp_02 ]
AI / LLM
platforms
GPU workloads, Langfuse, LLM observability, Claude automation
[ exp_03 ]
Cloud &
on-prem K8s
AWS · GKE · Kops · Kubeadm · custom operators
[ exp_04 ]
Storage
& data
Ceph, Rook, Vitess, Cassandra, Kafka, PostgreSQL
[ exp_05 ]
Engineering
leadership
Hands-on architect & people manager; medtech & SaaS

# technical_work

01 / article · 01 / talk
keda-scaledobject.yaml READ-ONLY
# The control-system idea from the Medium article,
# distilled down to the config that makes it real.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-serve
spec:
  scaleTargetRef:
    name: vllm-gateway
  minReplicaCount: 2
  maxReplicaCount: 80
  pollingInterval: 5
  cooldownPeriod: 120
  triggers:
    # queue depth drives scale-out faster than CPU can
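    - type: prometheus
      metadata:
        # placeholder address; point this at your own Prometheus service
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(vllm_queue_depth)
        threshold: "8"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        # PromQL has no p99(); this assumes vllm_latency_seconds is a histogram
        query: histogram_quantile(0.99, sum(rate(vllm_latency_seconds_bucket[1m])) by (le))
        threshold: "1.2"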
▸ Featured · Article

Horizontal Autoscaling in Kubernetes

Event-sourced scaling, custom scalers for queue depth, and the pitfalls that bite once the autoscaler makes decisions faster than humans can audit.

MEDIUM FEATURED · kube.today #74
read article ↗
▸ Relevance to AWS · Anthropic

Event-driven autoscaling is the exact primitive an inference platform uses to absorb bursty, uneven traffic without overprovisioning GPUs.
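
As a rough illustration of how a queue-depth trigger turns into a replica count, the sketch below applies the average-value rule that the Kubernetes HPA uses for external metrics: total metric divided by the per-replica target, rounded up, then clamped. The function name and defaults are assumptions for this example; the defaults mirror the threshold and replica bounds in the config above. The real controller also applies a tolerance band and stabilization windows, which are left out here.

```python
import math

def desired_replicas(total_queue_depth: float,
                     threshold: float = 8.0,
                     min_replicas: int = 2,
                     max_replicas: int = 80) -> int:
    """Approximate replica target for an average-value external metric."""
    raw = math.ceil(total_queue_depth / threshold)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, raw))

for depth in (0, 40, 41, 1000):
    print(depth, desired_replicas(depth))
```

An empty queue still holds the floor of 2 replicas, a burst of 1000 queued requests is capped at 80, and everything in between scales with demand rather than with a fixed GPU headroom.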

# speaking_&_writing

02 / recorded_talks
REC · HeapCon · 01 HD · YT
▸ HEAPCON · AUTOSCALING

Auto-Scaling: The Force Awakens — and Nods Off

Event-driven autoscaling in production: the failure modes, the cost math, and what changes when your pods are GPU-bound instead of CPU-bound.

venue
HeapCon
topic
autoscaling
audience
platform eng · SRE
format
technical deep-dive
watch on youtube ↗
REC · HeapCon · 02 HD · YT
▸ HEAPCON · ARCHITECTURE

Guide to the Galaxy — Structure of Modern Information Systems

A field guide to how modern information systems actually fit together: the layers, the seams, and the tradeoffs that decide whether a platform survives real traffic.

venue
HeapCon
topic
architecture
audience
architects · eng leaders
format
field guide
watch on youtube ↗

# experience.log

05 / roles · 26+ yrs
2025-06 → now
Senior Engineering Manager (DevOps) / K Health
Lead distributed DevOps team of 4 (IL · NJ · BG). GKE on-prem components (Langfuse, Flagsmith, Hush Security), GPU workload optimization, LLM observability. Reusable Claude skills for DB queries, Redis triage, K8s and Datadog scans.
GKE · GCP · Langfuse · Flagsmith · Datadog · Claude
2020 → 2025-06
Principal Software Architect & SRE Lead / Nuvo Group
Regulated medical-device cloud. Led architects and SRE team. End-to-end architecture, CNCF integration, custom K8s operators, hands-on RCA across all cloud environments.
AWS · Kops · KEDA · Ceph/Rook · Vitess · Kafka · Prometheus
2017 → 2020
Chief Architect / Director of Engineering / Elminda
Brain disorder diagnostics. MATLAB → scale-out C++ productization. H2O, TensorFlow, Kubeflow. On-prem Kubernetes CI/CD. Cloud-agnostic, geo-distributed, regulatory-compliant.
Kubernetes · Kubeflow · TensorFlow · C++ · AWS · Ansible
2010 → 2017
Director of Analytics · Senior Team Manager / Clicktale · AGT International (3i-MIND)
Clicktale: three teams building CX analytics SaaS; HP Vertica, NewSQL POCs. AGT OpenMind: core team for law-enforcement advanced analytics with Spark, Cassandra, graph, ML/NLP.
Spark · Cassandra · Vertica · Kafka · ELK · Akka
1999 → 2010
Senior Dev Mgr · CTO · Infrastructure Lead / Top Image Systems (TISA) · Data-Mall · Aviv (AVSO)
.NET SPA + REST platform with SAP R/3 invoice integration. CTO at Data-Mall: structured CMS with ERP integrations. Cross-platform (Unix + NT) packaging/deployment system for AvivERP.
.NET · SAP · Java · Perl · Unix/Linux

# projects.git

github.com/aharonh · github.com/harley-systems
aharonh
k8s-autoscaling-demo / companion to the article & talk
Runnable demo for the Horizontal Autoscaling in Kubernetes article and the HeapCon auto-scaling talk.
KEDA · HPA · Shell
harley-systems
cloud-infra / personal AWS + Cloudflare
Terraform-managed AWS VPC, EC2, S3, IAM, Route53, plus Cloudflare zones & DNS. CI/CD via GitHub Actions self-hosted runner.
Terraform · AWS · Cloudflare · GitHub Actions
harley-systems
lab-infra / ARM Kubernetes home lab
Physical ARM cluster with Rook-Ceph storage, IPsec VPN to cloud gateway, full Ansible lifecycle, PKI-backed everything.
Kubernetes · Rook-Ceph · StrongSwan · Ansible
harley-systems
sca / simple certificate authority
PKI CA in Bash — keys, CSRs, signing, YubiKey-backed hardware tokens. Designed for small-team PKI without the operational weight.
Bash · OpenSSL · YubiKey · PKCS#11
harley-systems
magnet-tiles · suite / API · ML · Android · Next.js
Full product stack for magnet-tile board detection: FastAPI image API, YOLO + CNN edge detection, Android client, Next.js frontend.
FastAPI · YOLO · Kotlin · Next.js
harley-systems
ansible-collection-openwrt / router automation
Ansible roles for OpenWRT: VPN, firewall, DHCP/DNS, ACME certificates, DDNS, StrongSwan IPsec.
Ansible · OpenWRT · StrongSwan

01 Technical Work

Featured · Writing

02 Speaking

02 recorded talks · HeapCon
■ HEAPCON · 01 YT ↗
HEAPCON · AUTOSCALING

Auto-Scaling: The Force Awakens — and Nods Off

Event-driven autoscaling in production: the failure modes, the cost math, and what changes when your pods are GPU-bound instead of CPU-bound. Written for platform engineers who have to make the scaler behave under real traffic.

Watch on YouTube ↗
■ HEAPCON · 02 YT ↗
HEAPCON · ARCHITECTURE

Guide to the Galaxy — Structure of Modern Information Systems

A field guide to how modern information systems actually fit together: the layers, the seams, and the tradeoffs that decide whether a platform survives real traffic.

Watch on YouTube ↗

03 Experience

26+ years · 05 rows
Jun 2025 — Now

Senior Engineering Manager (DevOps)

K Health · AI/LLM clinic platform

Lead distributed DevOps team of four (IL · NJ · BG). GKE on-prem components (Langfuse, Flagsmith, Hush Security), GPU workload optimization, LLM observability. Built reusable Claude skills for DB queries, Redis triage, K8s and Datadog scans.

GKE · GCP · Langfuse · Flagsmith · Datadog · Claude
2020 — Jun 2025

Principal Software Architect & SRE Lead

Nuvo Group · remote pregnancy monitoring (medical device)

Led architects and SRE engineers. End-to-end architecture — CNCF integration, custom Kubernetes operators, infrastructure tools. Hands-on RCA across all cloud environments; focus on reliability, scalability, security.

AWS · Kops · KEDA · Ceph/Rook · Vitess · Kafka
2017 — 2020

Chief Architect / Director of Engineering

Elminda · brain disorder diagnostics (medical device)

MATLAB → scale-out C++ algorithm productization. H2O, TensorFlow, Kubeflow. On-prem Kubernetes CI/CD. Architecture for cloud-agnostic, geo-distributed, regulatory-compliant systems.

Kubernetes · Kubeflow · TensorFlow · C++ · AWS
2010 — 2017

Director of Analytics · Senior Team Manager

Clicktale (CX analytics SaaS) · AGT International / 3i-MIND

Clicktale: three teams building SaaS behaviour analytics; HP Vertica, NewSQL POCs. AGT OpenMind: core team for law-enforcement advanced analytics with Spark, Cassandra, graph, ML, NLP.

Spark · Cassandra · Vertica · Kafka · ELK · Akka
1999 — 2010

Senior Dev Mgr · CTO · Infrastructure Lead

Top Image Systems (TISA) · Data-Mall · Aviv (AVSO)

.NET SPA + REST platform with SAP R/3 invoice integration. CTO at Data-Mall: structured CMS with ERP integrations and WYSIWYG SPA designer. Cross-platform packaging/deployment system for AvivERP (Unix + NT).

.NET · SAP · Java · Perl · Unix/Linux

04 Open-source & personal projects

github.com/aharonh · github.com/harley-systems
aharonh

k8s-autoscaling-demo

companion to the article & HeapCon talk

Runnable demo for the Horizontal Autoscaling in Kubernetes article and the HeapCon auto-scaling talk.

KEDA · HPA · Shell
harley-systems

cloud-infra

personal AWS + Cloudflare infrastructure

Terraform-managed AWS VPC, EC2, S3, IAM, Route53, plus Cloudflare zones & DNS. Self-hosted GitHub Actions runner for CI/CD.

Terraform · AWS · Cloudflare
harley-systems

lab-infra

ARM Kubernetes home lab

Physical ARM cluster with Rook-Ceph storage, IPsec site-to-site VPN to a cloud gateway, PKI-backed everything, full Ansible lifecycle.

Kubernetes · Rook-Ceph · StrongSwan · Ansible
harley-systems

sca · simple certificate authority

PKI made easy, YubiKey-backed

Small-team PKI without the operational weight: keys, CSRs, signing, hardware tokens via PKCS#11.

Bash · OpenSSL · YubiKey
harley-systems

magnet-tiles · API · ML · Android · Next.js

full-stack product exercise

Magnet-tile board detection — FastAPI image processing API, YOLO + CNN edge detection, Android client, Next.js frontend.

FastAPI · YOLO · Kotlin · Next.js
harley-systems

ansible-collection-openwrt

router automation

Ansible roles for OpenWRT routers: VPN, firewall, DHCP/DNS, ACME certificates, DDNS, StrongSwan IPsec.

Ansible · OpenWRT · StrongSwan

05 Operating Principles

Things I've learned the expensive way
§ 01 Treat autoscaling as a control system, not a config file.
§ 02 Latency is a distribution. Design for p99, not the mean.
§ 03 The cheapest GPU is the one you didn't spin up.
§ 04 Observability before optimization. Always.
§ 05 A hands-on architect who can't read a stack trace is just a diagram-maker.
