# Job Scraper

Monitor job openings from privacy-focused and open-source companies. Runs daily and shows changes.
## Quick Start (Local)

```bash
# Create venv and install deps
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run once
python main.py

# View dashboard (use xdg-open instead of open on Linux)
open data/dashboard.html
```
## Deploy to Debian Server

### 1. Install Docker

```bash
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect

# Install the Docker Compose plugin
sudo apt install docker-compose-plugin
```
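To confirm the installation, check the versions once you have logged back in:

```bash
# Verify Docker and the Compose plugin are available
docker --version
docker compose version
```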
### 2. Clone/Copy the project

```bash
# Copy project to server
scp -r job-scraper user@your-server:~/

# Or clone from git if you pushed it
git clone <your-repo> ~/job-scraper
```
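If you already created a local `venv/` or `data/` directory, an `rsync` with excludes avoids copying them to the server (optional sketch; assumes rsync is installed on both machines):

```bash
# Optional alternative to scp: skip the local venv and data directories
rsync -av --exclude venv --exclude data job-scraper/ user@your-server:~/job-scraper/
```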
### 3. Run with Docker Compose

```bash
cd ~/job-scraper

# Run scraper once to populate data
docker compose run --rm scraper

# Start dashboard + scheduled scraper
docker compose up -d scraper-scheduled dashboard

# View logs
docker compose logs -f
```
### 4. Access the dashboard

Open `http://your-server:8080` in your browser.
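Depending on how your firewall is configured, you may need to open the port first. A minimal sketch, assuming `ufw` is in use:

```bash
# Allow the dashboard port through ufw (skip if you use a different firewall)
sudo ufw allow 8080/tcp
```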
### Optional: Use a reverse proxy

If you want HTTPS or a custom domain, put nginx or Caddy in front:

```bash
# Example with Caddy (automatic HTTPS)
sudo apt install caddy
echo "jobs.yourdomain.com {
    reverse_proxy localhost:8080
}" | sudo tee /etc/caddy/Caddyfile
sudo systemctl reload caddy
```
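An equivalent nginx setup is sketched below. Unlike Caddy, nginx does not issue certificates automatically, so add certbot (or similar) if you need HTTPS; the site name and file paths here are only examples:

```bash
# Example with nginx (HTTP only; use certbot for HTTPS)
sudo apt install nginx
echo 'server {
    listen 80;
    server_name jobs.yourdomain.com;
    location / {
        proxy_pass http://localhost:8080;
    }
}' | sudo tee /etc/nginx/sites-available/jobs
sudo ln -s /etc/nginx/sites-available/jobs /etc/nginx/sites-enabled/
sudo systemctl reload nginx
```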
## Commands

```bash
# Run the scraper once
docker compose run --rm scraper

# Run the scraper on a schedule (daily at 9 AM)
docker compose up -d scraper-scheduled

# Start the web dashboard
docker compose up -d dashboard

# List all stored jobs
docker compose run --rm scraper python main.py --list

# Stop everything
docker compose down

# View logs
docker compose logs -f scraper-scheduled
```
## Configuration

Edit `config.yaml` to:

- Add/remove companies
- Change location filters
- Configure email/Slack notifications
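Edits should be picked up on the next scraper run; to apply them right away, run the scraper once manually:

```bash
# Re-run the scraper after editing config.yaml
docker compose run --rm scraper

# Or, when running locally
python main.py
```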
## Dashboard Features

- Dark theme, monospace font
- Filter jobs by typing (press `/` to focus, `Esc` to clear)
- Color-coded tags: `remote`, `canada`, `berlin`
- Jump-to-company links
- Updates automatically when the scraper runs
## Project Structure

```
job-scraper/
├── main.py               # CLI entry point
├── db.py                 # SQLite database
├── dashboard.py          # HTML generator
├── notify.py             # Notifications
├── scrapers/             # Platform scrapers
│   ├── base.py           # Base class
│   ├── greenhouse.py     # Greenhouse API
│   ├── lever.py          # Lever API
│   └── ashby.py          # Ashby API
├── config.yaml           # Company list & settings
├── Dockerfile
├── docker-compose.yaml
└── data/
    ├── jobs.db           # SQLite database
    └── dashboard.html    # Generated dashboard
```
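The SQLite file in `data/jobs.db` can be inspected directly with the `sqlite3` CLI if you have it installed; the exact table names depend on `db.py`:

```bash
# Peek at the scraped data (table names come from db.py)
sqlite3 data/jobs.db ".tables"
sqlite3 data/jobs.db ".schema"
```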