AI-powered semantic job matching with RAG, vector search, and intelligent enrichment
Built a production-ready AI job matching system that uses RAG (Retrieval-Augmented Generation), semantic search, and LLM-powered reasoning to intelligently match job seekers with AI/ML roles. The system goes beyond keyword matching by understanding context, culture fit, and market compensation.
Key Achievement: Optimized from $50-100/month to $3-5/month through smart caching, model selection (GPT-3.5-Turbo instead of GPT-4), and selective LLM analysis, a 90% cost reduction while maintaining high-quality matches.
Traditional job search relies on keyword matching, which misses nuanced requirements like culture fit, work-arrangement preferences, and market-competitive compensation. This called for an intelligent system that understands meaning rather than matching strings. The matching pipeline:
Job postings → Text chunks → Vector embeddings
User profile → Similarity search → Top matches
Culture matching + Salary analysis
GPT-3.5 explains why jobs match
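A minimal sketch of the ingestion half of this pipeline, assuming the LangChain split packages (import paths vary by version; the posting field names here are illustrative, not the project's schema):

from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def ingest_postings(postings: list[dict]) -> Chroma:
    # Wrap each posting as a Document, keeping structured fields as metadata
    docs = [
        Document(page_content=p["description"],
                 metadata={"title": p["title"], "company": p["company"]})
        for p in postings
    ]
    # Chunk long descriptions so each embedding stays focused on one topic
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
    return Chroma.from_documents(
        chunks,
        OpenAIEmbeddings(model="text-embedding-3-small"),
        persist_directory="./vectorstore",
    )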
ChromaDB for persistent vector storage with OpenAI embeddings (text-embedding-3-small)
LangChain for RAG pipelines, embeddings, and LLM chains with GPT-3.5-Turbo
FastAPI with async endpoints, lifespan management, and middleware for logging/monitoring
APScheduler for background job scraping (twice daily at 6am/6pm UTC)
Playwright + BeautifulSoup for company career pages and job boards
Diskcache for 6-hour match result caching (90% cost reduction on repeat searches)
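A sketch of how these pieces compose at the API layer; UserProfile and get_matcher are illustrative names, not the project's actual API:

from fastapi import Depends, FastAPI
from pydantic import BaseModel

class UserProfile(BaseModel):
    skills: list[str]
    culture_preferences: list[str] = []

app = FastAPI(lifespan=lifespan)  # lifespan wires up services and the scheduler (shown later)

def get_matcher() -> JobMatcherService:
    return app.state.matcher  # created during startup

@app.post("/match")
async def match_jobs(profile: UserProfile,
                     matcher: JobMatcherService = Depends(get_matcher)):
    return matcher.match(profile)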
Vector similarity search finds jobs based on meaning, not just keywords; it understands that "ML Engineer" relates to "Machine Learning Scientist".
GPT-3.5-Turbo analyzes top 5 matches and explains why each job fits the candidate's profile with specific highlights.
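The exact prompt isn't reproduced here; an illustrative version of the analysis chain might look like:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

analysis_prompt = ChatPromptTemplate.from_template(
    "You are a career advisor. Explain in 2-3 sentences why this job fits the "
    "candidate, citing specific highlights.\n\nProfile: {profile}\n\nJob: {job}"
)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)
explain = analysis_prompt | llm  # LCEL chain: prompt -> model
# reasoning = explain.invoke({"profile": profile_text, "job": job_text})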
Scores companies (0-1) based on alignment with user's culture preferences (Research-driven, Fast-paced, etc.).
Compares listed compensation against market data from levels.fyi with variance analysis (above/below/competitive).
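A sketch of the variance classification, assuming a simple percentage band around the market median (the 10% threshold is illustrative):

def classify_salary(listed: float, market_median: float, band: float = 0.10) -> tuple[str, float]:
    # Variance as a signed fraction of the market median
    variance = (listed - market_median) / market_median
    if variance > band:
        return "above", variance
    if variance < -band:
        return "below", variance
    return "competitive", variance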
Background scheduler scrapes 13 sources (10 company sites + 3 job boards) twice daily, with auto-ingestion.
MD5-based cache with 6-hour TTL. Identical searches hit cache instantly, reducing API costs by 90%.
Protects against abuse with 20 searches per 15-min window. Tracks usage and enforces monthly cost limits.
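A minimal in-memory sliding-window limiter in that spirit (names are illustrative; a real deployment would likely back this with Redis):

import time
from collections import defaultdict

WINDOW_SECONDS, LIMIT = 15 * 60, 20
_hits: dict[str, list[float]] = defaultdict(list)

def allow_request(client_id: str) -> bool:
    now = time.time()
    # Drop timestamps that have aged out of the window
    _hits[client_id] = [t for t in _hits[client_id] if now - t < WINDOW_SECONDS]
    if len(_hits[client_id]) >= LIMIT:
        return False
    _hits[client_id].append(now)
    return True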
Structured JSON logs with request IDs, performance tracking, and daily rotation for debugging and monitoring.
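A stdlib-only sketch of that logging setup (the field names are illustrative; the project may use a dedicated logging library instead):

import json
import logging
from logging.handlers import TimedRotatingFileHandler

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),  # injected by middleware
        })

handler = TimedRotatingFileHandler("logs/app.jsonl", when="midnight", backupCount=14)
handler.setFormatter(JsonFormatter())
logging.getLogger("matcher").addHandler(handler)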
Refactored from a monolithic design to a service-oriented architecture with dependency injection for maintainability and testability:
from typing import List, Protocol

class ISearchService(Protocol):
    def search(self, query: str, k: int) -> List[Document]: ...
    def ingest(self, jobs: List[Job]) -> IngestionStats: ...

class SearchService(BaseService):
    """Vector search using ChromaDB + OpenAI embeddings."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        # Pass the embedding function so queries and documents share one model
        self.vectorstore = Chroma(
            persist_directory="./vectorstore",
            embedding_function=self.embeddings,
        )

    def search(self, query: str, k: int = 20) -> List[Document]:
        # Semantic similarity search over the embedded job postings
        return self.vectorstore.similarity_search(query, k=k)
class JobMatcherService:
"""Main orchestration service"""
def __init__(self, search: ISearchService, enrichment: IEnrichmentService):
self.search = search
self.enrichment = enrichment
self.cache = CacheService(ttl=21600) # 6 hours
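    # The orchestration itself, sketched; helper names like as_query and
    # rank_and_explain are illustrative, not the project's actual API.
    def match(self, profile: UserProfile) -> MatchResult:
        # Cache hit: identical profile searched within the last 6 hours
        if cached := self.cache.get(profile.cache_key()):
            return cached
        docs = self.search.search(profile.as_query(), k=20)
        enriched = [self.enrichment.enrich(doc, profile) for doc in docs]
        result = rank_and_explain(enriched, profile)  # LLM reasoning on top 5 only
        self.cache.set(profile.cache_key(), result)
        return result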
Achieved 90% cost reduction through multiple strategies:
# 1. Model Selection: GPT-3.5-Turbo instead of GPT-4 for match explanations
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)
# 2. Selective Analysis: Only top 5 jobs get LLM reasoning
top_jobs = ranked_jobs[:5]
for job in top_jobs:
    # One LLM call per top match; analysis_prompt is rendered from the job + profile
    reasoning = llm.invoke(analysis_prompt)
# 3. Smart Caching: 6-hour TTL with MD5 hash keys
cache_key = hashlib.md5(json.dumps(profile, sort_keys=True).encode()).hexdigest()  # sort_keys keeps keys deterministic
if cached := cache.get(cache_key):
return cached # 90% of searches hit cache
# 4. Incremental Embeddings: Only embed new jobs
existing_ids = set(vectorstore.get()['ids'])
new_jobs = [j for j in jobs if j.id not in existing_ids]
Culture and salary data enrichment with confidence scoring:
import json
from dataclasses import dataclass
from functools import lru_cache
from typing import Dict, List

@dataclass
class CultureMatch:
    # Result type; fields mirror the constructor call below
    score: float
    matching_traits: List[str]
    confidence: str

@lru_cache(maxsize=1)  # Singleton pattern: load the culture dataset once
def load_culture_data() -> Dict:
    with open("data/company_culture.json") as f:
        return json.load(f)

def match_culture(company: str, preferences: List[str]) -> CultureMatch:
    culture_db = load_culture_data()
    company_data = culture_db.get(company, {})
    # Trait alignment: fraction of the user's preferred traits the company exhibits
    matching_traits = set(preferences) & set(company_data.get("traits", []))
    score = len(matching_traits) / len(preferences) if preferences else 0.0
    return CultureMatch(
        score=score,
        matching_traits=list(matching_traits),
        confidence="High" if company in culture_db else "Low",
    )
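Usage, with a placeholder company key:

match = match_culture("ExampleCo", ["Research-driven", "Fast-paced"])
print(match.score, match.matching_traits, match.confidence)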
Automated job discovery with scheduler integration:
from contextlib import asynccontextmanager

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from fastapi import FastAPI

scheduler = AsyncIOScheduler(timezone="utc")

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: initialize services and the background scheduler
    matcher = JobMatcherService()
    app.state.matcher = matcher  # expose to request handlers
    # Scrape twice daily at 06:00 and 18:00 UTC
    scheduler.add_job(scrape_and_ingest, CronTrigger(hour=6))
    scheduler.add_job(scrape_and_ingest, CronTrigger(hour=18))
    scheduler.start()
    yield
    # Shutdown: stop the scheduler and release resources
    scheduler.shutdown()
    matcher.cleanup()
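scrape_and_ingest itself, sketched under assumed names (ALL_SCRAPERS, scraper.fetch, and the logger are illustrative):

async def scrape_and_ingest() -> None:
    jobs: list[Job] = []
    for scraper in ALL_SCRAPERS:  # 13 sources: 10 company career pages + 3 job boards
        try:
            jobs.extend(await scraper.fetch())
        except Exception:
            logger.exception("Scraper failed: %s", scraper.name)
    # Ingestion is incremental: only unseen job ids get embedded
    matcher.ingest(jobs)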
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=10),
    # Retry on timeouts and HTTP errors (incl. 429 rate limits surfaced by raise_for_status)
    retry=retry_if_exception_type((requests.exceptions.Timeout, requests.exceptions.HTTPError)),
)
def fetch_with_retry(url: str) -> str:
    # Exponential backoff between attempts for transient scraper/API failures
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
RAG pipeline design, vector databases, embedding strategies, LLM orchestration
FastAPI, async/await patterns, dependency injection, service-oriented architecture
ETL pipelines, web scraping, data enrichment, incremental updates
Smart caching, model selection, selective processing, rate limiting
Structured logging, error handling, monitoring, performance optimization
Scalable architecture, background jobs, lifespan management, clean code patterns