RAG Job Search

Human Architected • AI-Augmented Development

AI-powered semantic job matching with RAG, vector search, and intelligent enrichment

3,500+ Lines of Code
85% Production Ready
$3-5 Monthly Cost
90% Cost Reduction

Project Overview

Built a production-ready AI job matching system that uses RAG (Retrieval-Augmented Generation), semantic search, and LLM-powered reasoning to intelligently match job seekers with AI/ML roles. The system goes beyond keyword matching by understanding context, culture fit, and market compensation.

Key Achievement: Optimized costs from $50-100/month to $3-5/month through smart caching, model selection (GPT-3.5-Turbo instead of GPT-4), and selective LLM analysis, achieving a 90% cost reduction while maintaining high-quality matches.

The Challenge

Traditional job search relies on keyword matching, which misses nuanced requirements such as culture fit, work-arrangement preferences, and market-competitive compensation. The project needed an intelligent system that understands those signals rather than just matching keywords.

Architecture

RAG Pipeline Flow

1. Ingestion

Job postings → Text chunks → Vector embeddings

2. Retrieval

User profile → Similarity search → Top matches

3. Enrichment

Culture matching + Salary analysis

4. Reasoning

GPT-3.5 explains why jobs match
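In miniature, the four stages above can be sketched end to end. The bag-of-words `embed` stub below is purely illustrative: it stands in for the real OpenAI embeddings, and the enrichment/reasoning stages are only indicated, since they call external services.

```python
from dataclasses import dataclass
from typing import List

# Stub embedding: bag-of-words over a tiny vocabulary (illustrative only;
# the real system uses OpenAI's text-embedding-3-small).
VOCAB = ["ml", "engineer", "scientist", "data", "remote"]

def embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Job:
    title: str
    vector: List[float] = None

# 1. Ingestion: job postings -> vectors
jobs = [Job("ML Engineer remote"), Job("Data Scientist")]
for job in jobs:
    job.vector = embed(job.title)

# 2. Retrieval: user profile -> similarity search -> ranked matches
profile = embed("remote ml engineer")
ranked = sorted(jobs, key=lambda j: cosine(profile, j.vector), reverse=True)

# 3/4. Enrichment and LLM reasoning run on ranked[:5] in the real system
```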

Key Components

Vector Database

ChromaDB for persistent vector storage with OpenAI embeddings (text-embedding-3-small)

LLM Orchestration

LangChain for RAG pipelines, embeddings, and LLM chains with GPT-3.5-Turbo

API Layer

FastAPI with async endpoints, lifespan management, and middleware for logging/monitoring

Automation

APScheduler for background job scraping (twice daily at 6am/6pm UTC)

Web Scraping

Playwright + BeautifulSoup for company career pages and job boards
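In the real pipeline, Playwright renders JavaScript-heavy career pages and BeautifulSoup parses the resulting HTML. This stdlib-only sketch shows the same extraction step against hypothetical static markup (the `job-link` class and URL are invented for illustration):

```python
from html.parser import HTMLParser
from typing import List, Tuple

class JobLinkParser(HTMLParser):
    """Collect (title, url) pairs from anchors marked class="job-link"."""

    def __init__(self):
        super().__init__()
        self.in_job_link = False
        self._url = ""
        self.jobs: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "job-link":
            self.in_job_link = True
            self._url = attrs.get("href", "")

    def handle_data(self, data):
        if self.in_job_link and data.strip():
            self.jobs.append((data.strip(), self._url))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_job_link = False

# Hypothetical career-page snippet
html = '<ul><li><a class="job-link" href="/jobs/42">ML Engineer</a></li></ul>'
parser = JobLinkParser()
parser.feed(html)
```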

Caching

Diskcache for 6-hour match result caching (90% cost reduction on repeat searches)
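The 6-hour TTL behavior that diskcache provides can be sketched with an in-memory stand-in (illustrative only; the real layer persists entries to disk):

```python
import hashlib
import json
import time
from typing import Any, Dict, Optional, Tuple

class TTLCache:
    """Minimal in-memory stand-in for the project's diskcache layer."""

    def __init__(self, ttl_seconds: int = 21600):  # 6 hours
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(profile: dict) -> str:
        # MD5 over a canonical JSON form so identical searches share a key
        return hashlib.md5(
            json.dumps(profile, sort_keys=True).encode()
        ).hexdigest()

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.time() + self.ttl, value)

cache = TTLCache()
key = TTLCache.key_for({"role": "ML Engineer", "remote": True})
cache.set(key, ["match-1", "match-2"])
```

Sorting the JSON keys makes the cache key independent of dict ordering, so two logically identical profiles always collapse to the same entry.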

Core Features

🎯 Semantic Matching

Vector similarity search finds jobs based on meaning, not just keywords. Understands "ML Engineer" relates to "Machine Learning Scientist".

🧠 LLM Reasoning

GPT-3.5-Turbo analyzes top 5 matches and explains why each job fits the candidate's profile with specific highlights.

🎨 Culture Matching

Scores companies (0-1) based on alignment with user's culture preferences (Research-driven, Fast-paced, etc.).

💰 Salary Analysis

Compares listed compensation against market data from levels.fyi with variance analysis (above/below/competitive).
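The variance classification can be sketched as follows; the ±10% "competitive" band is an illustrative threshold, not the project's exact cutoff:

```python
def classify_salary(listed: int, market_median: int, band: float = 0.10) -> dict:
    """Compare a listed salary against a market median (e.g. from levels.fyi).

    Within +/-band of the median counts as "competitive".
    """
    variance = (listed - market_median) / market_median
    if variance > band:
        verdict = "above"
    elif variance < -band:
        verdict = "below"
    else:
        verdict = "competitive"
    return {"variance_pct": round(variance * 100, 1), "verdict": verdict}
```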

🔄 Automated Scraping

Background scheduler scrapes 13 sources (10 company sites + 3 job boards) twice daily, with auto-ingestion.

⚡ Smart Caching

MD5-keyed cache with a 6-hour TTL. Identical searches hit the cache instantly, cutting API costs by 90%.

🛡️ Rate Limiting

Protects against abuse with 20 searches per 15-min window. Tracks usage and enforces monthly cost limits.
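A sliding-window limiter enforcing 20 searches per 15-minute window might look like this (a sketch, not the project's exact implementation):

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict

class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit: int = 20, window: float = 900.0):  # 15 minutes
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject without recording
        hits.append(now)
        return True

limiter = SlidingWindowLimiter(limit=20, window=900.0)
# Simulate 25 requests at one-second intervals: 20 pass, 5 are rejected
results = [limiter.allow("client-a", now=float(i)) for i in range(25)]
```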

📊 Production Logging

Structured JSON logs with request IDs, performance tracking, and daily rotation for debugging and monitoring.
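The structured-logging setup can be sketched with the stdlib `logging` module. In production the handler would be a `TimedRotatingFileHandler` (e.g. `when="midnight"`) for daily rotation; a stream handler keeps the sketch self-contained:

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line (request_id passed via `extra`)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

stream = io.StringIO()  # stand-in for a daily-rotating log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("rag_job_search")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("search completed", extra={"request_id": "req-123"})
line = json.loads(stream.getvalue().strip())
```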

Technical Deep Dive

Service-Based Architecture

Refactored from a monolithic design to a service-oriented architecture with dependency injection for maintainability and testability:

# Service layer with protocol-based interfaces
from typing import List, Protocol

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document

class ISearchService(Protocol):
    def search(self, query: str, k: int) -> List[Document]: ...
    def ingest(self, jobs: List[Job]) -> IngestionStats: ...

class SearchService(BaseService):
    """Vector search using ChromaDB + OpenAI embeddings."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        self.vectorstore = Chroma(
            persist_directory="./vectorstore",
            embedding_function=self.embeddings,  # wire embeddings into the store
        )

    def search(self, query: str, k: int = 20) -> List[Document]:
        # Semantic similarity search over the job index
        return self.vectorstore.similarity_search(query, k=k)

class JobMatcherService:
    """Main orchestration service."""

    def __init__(self, search: ISearchService, enrichment: IEnrichmentService):
        self.search = search
        self.enrichment = enrichment
        self.cache = CacheService(ttl=21600)  # 6 hours

Cost Optimization Strategy

Achieved the 90% cost reduction through four complementary strategies:

import hashlib
import json

from langchain_openai import ChatOpenAI

# 1. Model selection: GPT-3.5-Turbo (roughly 90% cheaper than GPT-4)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)

# 2. Selective analysis: only the top 5 jobs get LLM reasoning
top_jobs = ranked_jobs[:5]
for job in top_jobs:
    reasoning = llm.invoke(analysis_prompt)

# 3. Smart caching: 6-hour TTL with MD5 hash keys
#    (sort_keys keeps the key stable across dict orderings)
cache_key = hashlib.md5(json.dumps(profile, sort_keys=True).encode()).hexdigest()
if cached := cache.get(cache_key):
    return cached  # ~90% of repeat searches hit the cache

# 4. Incremental embeddings: only embed new jobs
existing_ids = set(vectorstore.get()["ids"])
new_jobs = [j for j in jobs if j.id not in existing_ids]

Enrichment Pipeline

Culture and salary data enrichment with confidence scoring:

import json
from functools import lru_cache
from typing import Dict, List

@lru_cache(maxsize=1)  # Singleton pattern: load the dataset once per process
def load_culture_data() -> Dict:
    with open("data/company_culture.json") as f:
        return json.load(f)

def match_culture(company: str, preferences: List[str]) -> CultureMatch:
    culture_db = load_culture_data()
    company_data = culture_db.get(company, {})

    # Calculate trait alignment score
    matching_traits = set(preferences) & set(company_data.get("traits", []))
    score = len(matching_traits) / len(preferences) if preferences else 0

    return CultureMatch(
        score=score,
        matching_traits=list(matching_traits),
        confidence="High" if company in culture_db else "Low",
    )

Background Automation

Automated job discovery with scheduler integration:

from contextlib import asynccontextmanager

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from fastapi import FastAPI

scheduler = AsyncIOScheduler()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: initialize services and scheduler
    matcher = JobMatcherService()

    # Schedule scraping twice daily (06:00 and 18:00 UTC)
    scheduler.add_job(scrape_and_ingest, CronTrigger(hour=6))
    scheduler.add_job(scrape_and_ingest, CronTrigger(hour=18))
    scheduler.start()

    yield

    # Shutdown: clean up background work
    scheduler.shutdown()
    matcher.cleanup()

Production Readiness

Error Handling & Monitoring

Retry Logic & Resilience

import requests
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=10),
    retry=retry_if_exception_type((RateLimitError, TimeoutError)),
)
def fetch_with_retry(url: str) -> str:
    # Exponential backoff on transient failures (up to 3 attempts)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

Configuration Management
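A minimal sketch of environment-driven configuration; the variable names and defaults here are illustrative, not the project's actual settings:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Immutable settings loaded once at startup from environment variables."""
    openai_api_key: str
    model: str = "gpt-3.5-turbo"
    cache_ttl_seconds: int = 21600
    monthly_cost_limit_usd: float = 5.0

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(
            openai_api_key=os.environ["OPENAI_API_KEY"],  # required, no default
            model=os.environ.get("LLM_MODEL", "gpt-3.5-turbo"),
            cache_ttl_seconds=int(os.environ.get("CACHE_TTL_SECONDS", "21600")),
            monthly_cost_limit_usd=float(os.environ.get("MONTHLY_COST_LIMIT", "5.0")),
        )

os.environ.setdefault("OPENAI_API_KEY", "sk-test")  # demo value only
settings = Settings.from_env()
```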

Skills Demonstrated

AI/ML Architecture

RAG pipeline design, vector databases, embedding strategies, LLM orchestration

Backend Development

FastAPI, async/await patterns, dependency injection, service-oriented architecture

Data Engineering

ETL pipelines, web scraping, data enrichment, incremental updates

Cost Optimization

Smart caching, model selection, selective processing, rate limiting

Production Engineering

Structured logging, error handling, monitoring, performance optimization

System Design

Scalable architecture, background jobs, lifespan management, clean code patterns