# Job Scraper
Monitor job openings from 14 privacy-focused and open-source companies (1Password, DuckDuckGo, GitLab, and others) across the Greenhouse, Lever, and Ashby job boards. A daily run stores listings in SQLite to detect changes, filters them by engineering job titles and location preferences, and generates a static HTML dashboard with search and filtering.
## Quick Start (Local)
```bash
# Create venv and install deps
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run once
python main.py
# View dashboard (use xdg-open on Linux)
open data/dashboard.html
```
## Deploy to Debian Server
### 1. Install Docker
```bash
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect
# Install the Docker Compose plugin (the install script may already include it)
sudo apt install docker-compose-plugin
```
### 2. Clone/Copy the project
```bash
# Copy project to server
scp -r job-scraper user@your-server:~/
# Or clone from git if you pushed it
git clone <your-repo> ~/job-scraper
```
### 3. Run with Docker Compose
```bash
cd ~/job-scraper
# Run scraper once to populate data
docker compose run --rm scraper
# Start dashboard + scheduled scraper
docker compose up -d scraper-scheduled dashboard
# View logs
docker compose logs -f
```
### 4. Access the dashboard
Open `http://your-server:8080` in your browser.
### Optional: Use a reverse proxy
If you want HTTPS or a custom domain, add nginx/caddy in front:
```bash
# Example with Caddy (auto HTTPS)
sudo apt install caddy
echo "jobs.yourdomain.com {
    reverse_proxy localhost:8080
}" | sudo tee /etc/caddy/Caddyfile
sudo systemctl reload caddy
```
## Commands
```bash
# Run scraper once
docker compose run --rm scraper
# Run scraper on a schedule (daily at 9 AM)
docker compose up -d scraper-scheduled
# Start web dashboard
docker compose up -d dashboard
# View all jobs
docker compose run --rm scraper python main.py --list
# Stop everything
docker compose down
# View logs
docker compose logs -f scraper-scheduled
```
## Configuration
Edit `config.yaml` to:
- Add/remove companies
- Change location filters
- Configure email/Slack notifications
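The exact schema is defined by the repo's `config.yaml`; as a rough illustration only, a configuration covering those three areas might look something like this (all key names below are assumptions, not the project's actual schema):

```yaml
# Hypothetical sketch -- key names are illustrative, check config.yaml for the real schema
companies:
  - name: GitLab
    platform: greenhouse      # greenhouse | lever | ashby
    board: gitlab             # board/company slug used by the platform API
  - name: 1Password
    platform: lever
    board: 1password

filters:
  titles: ["engineer", "developer", "sre"]
  locations: ["remote", "canada", "berlin"]

notifications:
  email: you@example.com      # optional
  slack_webhook: ""           # optional
```

After editing, re-run the scraper (e.g. `docker compose run --rm scraper`) so the new settings take effect.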
## Dashboard Features
- Dark theme, monospace font
- Filter jobs by typing (press `/` to focus, `Esc` to clear)
- Color-coded tags: `remote`, `canada`, `berlin`
- Jump-to-company links
- Updates automatically when scraper runs
## Project Structure
```
job-scraper/
├── main.py             # CLI entry point
├── db.py               # SQLite database
├── dashboard.py        # HTML generator
├── notify.py           # Notifications
├── scrapers/           # Platform scrapers
│   ├── base.py         # Base class
│   ├── greenhouse.py   # Greenhouse API
│   ├── lever.py        # Lever API
│   └── ashby.py        # Ashby API
├── config.yaml         # Company list & settings
├── Dockerfile
├── docker-compose.yaml
└── data/
    ├── jobs.db         # SQLite database
    └── dashboard.html  # Generated dashboard
```
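Each module under `scrapers/` wraps one job-board API behind the shared base class. The actual interface lives in `scrapers/base.py`; the snippet below is only a standalone sketch of the kind of fetch the Greenhouse scraper performs, using Greenhouse's public job-board endpoint (the function name and the returned dict shape are assumptions, not the repo's API):

```python
import requests


def fetch_greenhouse_jobs(board_token: str) -> list[dict]:
    """Illustrative sketch only -- not the interface defined in scrapers/base.py.

    Greenhouse exposes a public job-board API; a single GET returns every open
    posting for the company identified by its board token (e.g. "gitlab").
    """
    url = f"https://boards-api.greenhouse.io/v1/boards/{board_token}/jobs"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return [
        {
            "title": job["title"],
            "location": job.get("location", {}).get("name", ""),
            "url": job["absolute_url"],
        }
        for job in resp.json().get("jobs", [])
    ]


if __name__ == "__main__":
    for job in fetch_greenhouse_jobs("gitlab"):
        print(f'{job["title"]} ({job["location"]})')
```

The real scrapers additionally normalize the results, write them to `data/jobs.db` via `db.py`, and let `dashboard.py` render the diff into `data/dashboard.html`.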