Commit graph

10 commits

Author SHA1 Message Date
Felix Wolf d65181de78 fix(ocis-backup): Fix S3 backup permissions and update config IDs
Adds `fsGroup` to the S3 backup cronjob's security context to ensure proper volume ownership. Increases the SSH key secret's `defaultMode` to grant group read access, resolving permission failures when reading the SSH key.
2026-05-03 02:16:02 +02:00
Felix Wolf 33c52be1c5 feat(pss): drop 5 namespaces from PSS privileged to restricted
argocd, cert-manager, cloudnative-pg already compliant — label flip only.
ocis: add overlay injecting seccompProfile=RuntimeDefault, drop ALL caps,
allowPrivilegeEscalation=false across all chart Deployments/CronJobs;
patch idm initContainer; harden custom precheck Job; refactor s3-backup
to rclone/rclone image (avoids apk-add-as-root).
victoria-metrics-single: overlay sets full restricted SC on the StatefulSet
that ships with empty securityContext: {}.

forgejo, traefik, kube-system stay privileged (hostPort / CSI driver).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 01:24:59 +02:00
Felix Wolf 85b8fec6b3 feat: replace secret-init Jobs with mittwald operator + cert-manager
Migrate ~180 LOC of openssl/kubectl init Jobs to declarative Secret
manifests reconciled by mittwald/kubernetes-secret-generator (random
strings, SSH keypair) and cert-manager Certificates (RSA private key +
self-signed CA chain). mittwald only fills empty fields, so existing
populated Secrets keep their current values across the migration.

Changes:

- New prototype kubernetes-secret-generator (chart 3.4.1, mittwald helm
  repo). Cluster-wide informer reconciler, no webhook -> cold-bootstrap
  safe via ArgoCD retries.
- New cert-manager selfsigned ClusterIssuer (in-cluster trust root).
  letsencrypt remains for public-DNS endpoints.
- forgejo: admin-secret Job replaced with a mittwald-annotated Secret
  (hex-encoded 24-char password). Deploy-key Job split: mittwald
  ssh-keypair Secret + slim Job that uploads pubkey to Forgejo and
  copies privkey into the argocd repo Secret.
- ocis: 13 Secrets / 16 random fields now mittwald-managed (UUIDs
  replaced with opaque random hex; ocis treats user-id as opaque). IDP
  RSA signing key, LDAP self-signed CA, and LDAP server cert produced
  by cert-manager. Per-Deployment ytt overlay remaps volume key paths
  (tls.crt -> ldap-ca.crt, tls.key -> private-key.pem, etc.) since the
  ocis chart mounts Secrets raw without items support. Old multi-secret
  s3-secret-job replaced with a slim external-secret precheck Job that
  only validates pre-created Hetzner S3/Storage Box credentials.
- Application sync-wave -10 on cert-manager and kubernetes-secret-
  generator so they install before consumers. ArgoCD selfHeal handles
  any residual races.

CLAUDE.md: remove the "all namespaces use privileged PodSecurity"
convention. Existing namespaces still carry the label and will be
audited separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:00:07 +02:00
Felix Wolf 9112153e8a fix(ocis): resolve large file upload timeouts and enable stale upload cleanup
Increase Traefik readTimeout from 600s to 3600s to prevent connection drops during large uploads, and enable the suspended cleanUpExpiredUploads CronJob so stale TUS sessions are automatically purged.
2026-04-24 20:12:24 +02:00
Felix Wolf f57d29d1d3 chore: update service resource requests and identifiers
Increases memory requests for the IDM and NATS services to enhance stability and performance.
Updates application, service account, and storage UUIDs in configuration maps, reflecting a re-initialization or re-rendering of OIDC settings.
2026-04-12 18:26:47 +02:00
Felix Wolf f442255833 feat: configure storageusers resources and anti-affinity
Assigns specific CPU and memory requests and limits to the storageusers service to ensure stable operation and efficient resource utilization.

Introduces pod anti-affinity for storageusers to prevent it from being scheduled on the same node as victoria-metrics-single, improving resilience and preventing potential resource contention.
2026-04-06 16:39:24 +02:00
Felix Wolf 1122c3f0e2 feat: Implement S3 to Storage Box backup
Introduces a daily Kubernetes CronJob that utilizes rclone to perform compressed backups of oCIS S3 data to a Hetzner Storage Box via SFTP.

This new backup mechanism requires the manual creation of an 'ocis-storagebox-credentials' secret, which holds the Storage Box host, user, and SSH private key. A check is added to the secret initialization job to ensure this essential external secret exists.
2026-04-06 15:24:14 +02:00
Felix Wolf a3143ac33c feat: Configure Ocis for Hetzner Cloud storage
Sets `hcloud-volumes` as the default storage class for Ocis components including storageusers, storagesystem, and idm.
2026-04-06 14:25:35 +02:00
Felix Wolf 4e48df73d3 feat(ocis): Transition to oCIS and enhance deployment
Removes the full Nextcloud stack (PostgreSQL/CNPG, Valkey, Caddy) and
  deploys oCIS at drive.tr1ceracop.de. oCIS is self-contained — no
  external database or cache needed.

  Key design decisions:
  - S3ng storage backend on Hetzner Object Storage (ocis-tr1ceracop)
  - Chart fetched via vendir git source (not published to a Helm repo)
  - All secrets generated in-cluster via PreSync init Job (never in git)
  - Memory requests on all pods to prevent node overcommit
  - Persistence on local-path for metadata (idm, nats, search, storage)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 14:01:55 +02:00
Felix Wolf ffa171bfb0 feat: Replace Nextcloud with oCIS (ownCloud Infinite Scale)
Removes the full Nextcloud stack (PostgreSQL/CNPG, Valkey, Caddy sidecar)
and replaces it with oCIS at drive.tr1ceracop.de. oCIS is self-contained
(no external DB/cache needed) with S3ng storage backend on Hetzner Object
Storage (bucket: ocis-tr1ceracop). Chart sourced from git via vendir since
it is not published to a Helm repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 20:19:54 +02:00