# Scaling and Data Storage

This document describes where critical data (including user credentials) is stored and how to scale the system for larger institutions (e.g. ~10,000 users). Treat this as a critical reference for infrastructure and security.

---

## 1. Where data is stored today

### 1.1 User credentials (authentication)

**Location: database only.**

| What | Where | Notes |
|------|--------|------|
| **Passwords** | `users` table, `password` column | **Hashed** only (Laravel `Hash::make()`; bcrypt by default, Argon2 if configured). Plain-text passwords are never stored. |
| **Remember token** | `users` table, `remember_token` | Used for “remember me” sessions. |
| **Email (login identity)** | `users` table, `email` | Unique; used for login. |
| **Roles, status, profile** | `users` table (+ related tables) | All in the database. |

- **Hashing:** Passwords are hashed in application code (e.g. `Hash::make($request->password)`) before being saved. They are never written to files or logs.
- **Visibility:** The `User` model lists `password` and `remember_token` in `$hidden`, so they are omitted whenever the model is serialized to an array or JSON (including API responses).
- **Scaling:** Credentials stay in the database as you scale. Ensure the database is backed up, access is restricted, and (if required) consider encryption at rest and TLS in transit.

**Summary:** User credentials live **only in the database**; passwords are stored as one-way hashes, not plain text.
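To make "one-way hash" concrete: Laravel's bcrypt hasher is a thin wrapper around PHP's built-in `password_hash()` and `password_verify()`. The sketch below uses those functions directly so it runs without the framework. The sample password and the cost of 12 are illustrative assumptions, not values taken from this project.

```php
<?php

// Illustrative password only; in the app this comes from the request.
$plain = 'correct horse battery staple';

// Hash once at registration or password change; store only the hash.
// The cost factor (12 here) is an assumed value, not the project's setting.
$hash = password_hash($plain, PASSWORD_BCRYPT, ['cost' => 12]);

// At login, compare the submitted password against the stored hash.
$loginOk = password_verify($plain, $hash);
$wrongOk = password_verify('wrong password', $hash);

// The stored value is a salted hash, so the original password
// cannot be read back from the database column.
var_dump(strlen($hash), $loginOk, $wrongOk);
```

Because each hash embeds its own random salt, hashing the same password twice yields two different strings; verification still succeeds against either one.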

---

### 1.2 Documents and media (files)

**Location: filesystem (local storage), not in the database.**

| What | Where | In database |
|------|--------|-------------|
| POE evidence / submission files | `storage/app/public/` (e.g. under `poe-evidence/`) | Only **paths** (e.g. `file_path`, `file_name`) in `poe_evidence` (or similar) table. |
| Completed assessor tool PDFs | `storage/app/public/assessor-tools/completed/` | Path in `assignment_submissions.completed_assessor_tool_path`. |
| Assignment attachments / tools | `storage/app/public/` | Paths in `assignments.attachments`, `assessor_tool`, `candidate_tool` (arrays of paths). |
| Student documents (ID, certificates) | `storage/app/public/` | Paths in `users` (e.g. `id_document_path`, `kcpe_certificate_path`, etc.). |
| Temp / merge PDFs | `storage/app/` (local disk) | Not persisted long-term. |

- **Config:** Default disks are in `config/filesystems.php`: `local` → `storage/app`, `public` → `storage/app/public`. The app uses `Storage::disk('public')` for most user and generated files.
- **URLs:** Public files are served through the `public/storage` symlink, which points to `storage/app/public` (created with `php artisan storage:link`).

**Summary:** File **contents** are on **disk**; the **database** stores only **paths and metadata**. No document or media binary is stored in the database.

---

## 2. Scaling to ~10,000 users (institution-level)

The following keeps the same architecture (credentials in DB, files elsewhere) while making the system suitable for ~10,000 users.

### 2.1 File storage: move to object storage (S3 or compatible)

**Why:** Files written to one server's local disk are not visible to the other app servers, and that disk is a single point of failure.

**What to do:**

- Use **object storage** (e.g. AWS S3 or S3-compatible) for all user and generated files.
- The project already defines an S3 disk in `config/filesystems.php`. In production, set `FILESYSTEM_DISK=s3` (or use a dedicated disk name) and configure AWS (or compatible) credentials in `.env`.
- **Application code:** Prefer a single disk for uploads and generated files (e.g. `Storage::disk(config('filesystems.default'))` or a named disk) so switching from `public` to S3 is a config change. Generate download/preview links with the same disk's `url()` method (e.g. `Storage::url($path)` for the default disk) so the same code works for local and S3 storage.
- **Database:** No change to where credentials or paths are stored; only the path values will point to S3 keys instead of local paths.

**Result:** Files scale independently, survive server loss, and can sit behind a CDN later.
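A minimal sketch of what disk-agnostic application code could look like inside the Laravel app. The helper names `storeEvidenceFile()` and `evidenceFileUrl()` are hypothetical; the `poe-evidence` directory comes from the table in section 1.2. It needs the framework to run and is meant as an illustration, not as the project's actual upload code.

```php
<?php

use Illuminate\Http\UploadedFile;
use Illuminate\Support\Facades\Storage;

/**
 * Store an uploaded file on the configured default disk and return the
 * relative path (a local path today, an S3 key after the switch).
 * Hypothetical helper, not part of the existing codebase.
 */
function storeEvidenceFile(UploadedFile $file): string
{
    $disk = config('filesystems.default');

    // putFile() generates a unique file name and returns the stored path,
    // or false if the write failed.
    $path = Storage::disk($disk)->putFile('poe-evidence', $file);

    if ($path === false) {
        throw new RuntimeException('Could not store the uploaded file.');
    }

    // Save only this path in the database, never the file contents.
    return $path;
}

/**
 * Build a link for a stored file using the same disk it was written to.
 */
function evidenceFileUrl(string $path): string
{
    return Storage::disk(config('filesystems.default'))->url($path);
}
```

With code shaped like this, moving to S3 is limited to changing `FILESYSTEM_DISK` and the credentials in `.env`; existing rows still need their files copied to the bucket under the same relative paths.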

---

### 2.2 User credentials at scale

- **Remain in the database.** No need to move passwords or tokens to another store for 10k users.
- **Recommendations:**
  - Use **connection pooling** (e.g. PgBouncer for PostgreSQL, ProxySQL for MySQL) so many app instances do not exhaust DB connections.
  - Add a **read replica** for read-heavy operations (e.g. dashboards, reports); keep login and write operations on the primary.
  - Ensure **backups** and **access control** for the DB; consider encryption at rest and TLS in transit as per policy.
  - Keep **password hashing** as-is (Laravel default); ensure `password` and `remember_token` stay in `$hidden` and are never logged or exposed.

---

### 2.3 Application servers (horizontal scaling)

- Run **multiple stateless** Laravel instances behind a **load balancer** (e.g. 2–4 for ~10k users).
- **No local file storage** for user data; all files on S3 (or similar).
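- **Shared state:** Keep sessions in Redis or the database, not the file driver, so any instance can serve any user. Point the cache and queue at Redis as well:
  - `SESSION_DRIVER=redis` (or `database`)
  - `CACHE_STORE=redis` (`CACHE_DRIVER=redis` on Laravel 10 and earlier)
  - `QUEUE_CONNECTION=redis`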

---

### 2.4 Database

- **Primary + read replica:** Writes and auth on primary; heavy reads (reports, progress, portfolios) on replica.
- **Indexes:** Ensure indexes on foreign keys and frequently filtered columns (e.g. `user_id`, `class_id`, `term_id`, `unit_id`, `assignment_id`, `status`, `created_at`).
- **Connection pooling:** Between app and DB to limit connections per instance.
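Laravel supports a primary/replica split directly in `config/database.php` through `read` and `write` keys on a connection. The fragment below is a sketch only: the `mysql` driver and the `DB_READ_HOST` / `DB_WRITE_HOST` variable names are assumptions, and the project's real connection block may differ.

```php
// config/database.php (fragment of the 'connections' array)
'mysql' => [
    'driver' => 'mysql', // assumed; use the driver the project actually runs on

    // SELECT queries go to the replica host(s).
    'read' => [
        'host' => [env('DB_READ_HOST', '127.0.0.1')], // hypothetical env name
    ],

    // INSERT, UPDATE, DELETE and transactions go to the primary.
    'write' => [
        'host' => [env('DB_WRITE_HOST', '127.0.0.1')], // hypothetical env name
    ],

    // After a write, later reads in the same request use the primary,
    // which avoids reading stale rows from a lagging replica.
    'sticky' => true,

    'port' => env('DB_PORT', '3306'),
    'database' => env('DB_DATABASE'),
    'username' => env('DB_USERNAME'),
    'password' => env('DB_PASSWORD'),
],
```

When a pooler sits in front of the database, the host values would point at the pooler rather than at the database servers themselves.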

---

### 2.5 Heavy work (PDFs, exports) in queues

- **PDF generation** (assessor tools, merged “units + practical checklists” PDF, etc.) should be dispatched to **queues** so HTTP requests return quickly.
- Use **Redis** (or another queue backend) and run **queue workers** (`php artisan queue:work`) on one or more servers.
- Two options for “download PDF”: (a) generate the file in a job, store it in S3, and then redirect or link to it; or (b) generate it on the first request and cache the result in S3 for repeat downloads.
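The queued approach could be sketched as a Laravel job along the following lines. The class name `GenerateAssessorToolPdf` and the `renderPdf()` placeholder are hypothetical; the storage directory and the `completed_assessor_tool_path` column are the ones listed in section 1.2. The job assumes an `s3` disk is configured and only runs inside the Laravel application.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Storage;

// Hypothetical job: builds one PDF outside the HTTP request cycle.
class GenerateAssessorToolPdf implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public int $submissionId)
    {
    }

    public function handle(): void
    {
        // Placeholder for the project's existing PDF rendering logic.
        $pdfBytes = $this->renderPdf($this->submissionId);

        // Write the result to object storage, not to the worker's disk.
        $path = "assessor-tools/completed/{$this->submissionId}.pdf";
        Storage::disk('s3')->put($path, $pdfBytes);

        // Record only the path so the web tier can link to the file.
        DB::table('assignment_submissions')
            ->where('id', $this->submissionId)
            ->update(['completed_assessor_tool_path' => $path]);
    }

    private function renderPdf(int $submissionId): string
    {
        throw new \LogicException('Replace with the real PDF renderer.');
    }
}
```

A controller would then call `GenerateAssessorToolPdf::dispatch($submission->id);` and return immediately, leaving the work to a `queue:work` process.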

---

### 2.6 Caching

- **Redis** for sessions, cache, and queues (as above).
- Cache expensive, mostly-read data (e.g. active term, aggregated stats) with short TTLs.
- Cache computed progress/statistics per user or class where appropriate to reduce DB load.
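As an illustration of short-TTL caching, `Cache::remember()` returns the cached value if present and otherwise runs the closure and stores its result. The `terms` table, the `is_active` column, the cache key, and the 300-second TTL below are all assumptions made for the example.

```php
<?php

use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;

/**
 * Look up the active term at most once every five minutes.
 * Hypothetical helper; table and column names are assumed.
 */
function activeTermId(): ?int
{
    return Cache::remember('terms.active_id', 300, function () {
        $id = DB::table('terms')->where('is_active', true)->value('id');

        // value() returns null when no row matches.
        return $id === null ? null : (int) $id;
    });
}
```

A short TTL keeps the logic simple because stale values expire on their own; data that must update immediately would also need an explicit `Cache::forget('terms.active_id')` wherever the active term changes.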

---

### 2.7 Infrastructure sketch for ~10k users

| Component       | Recommendation |
|----------------|----------------|
| **App**        | 2–4 Laravel instances behind a load balancer |
| **Web server** | Nginx (or similar) in front of PHP-FPM / Octane |
| **Database**   | 1 primary + 1 read replica; connection pooling |
| **Files**      | S3 (or compatible) for all documents and media |
| **Cache/queue**| Redis (single instance or small cluster) |
| **Queue workers** | 2–4 workers for PDF generation and other heavy jobs |

---

## 3. Quick reference

| Data | Stored in | Notes |
|------|-----------|--------|
| **User passwords** | Database (`users.password`) | Hashed only; never plain text. |
| **User email, role, profile** | Database (`users` + related) | Credentials and identity in DB. |
| **Documents / media files** | Filesystem (today: `storage/app/public`) | Scale by moving to S3; DB holds paths only. |
| **Sessions (at scale)** | Redis or database | Not file-based when using multiple app servers. |
| **Cache / queues (at scale)** | Redis | For performance and decoupling. |

---

## 4. Security reminders

- **Credentials:** Store passwords only as hashes; never log or expose password hashes or remember tokens.
- **Files:** Restrict S3 (or storage) buckets; use signed or private URLs if documents are not public.
- **Database:** Limit access, use strong credentials (in env, not code), backup regularly.
- **TLS:** Use HTTPS for all client traffic and TLS for database connections in production.
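For private documents on S3, Laravel's filesystem can issue time-limited signed links with `temporaryUrl()`. The helper name and the 10-minute lifetime below are illustrative assumptions, and the sketch presumes an `s3` disk backed by a private bucket.

```php
<?php

use Illuminate\Support\Facades\Storage;

/**
 * Return a signed link to a private file that stops working after
 * ten minutes. Hypothetical helper; run authorization checks on the
 * current user before calling it.
 */
function privateDocumentUrl(string $path): string
{
    return Storage::disk('s3')->temporaryUrl($path, now()->addMinutes(10));
}
```

Links produced this way can be handed to the browser directly, so downloads do not pass through the app servers.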

This document should be updated when storage or auth architecture changes.
