Resume Matching Engine

The matching engine is the core of InnoHire.ai. Rather than scanning for keywords, it compares the meaning of what a candidate has done against what a role demands.

Why Keyword Matching Fails

Traditional applicant tracking systems (ATS) rank resumes by counting keyword occurrences. A resume that repeats the word "Python" five times outscores one that demonstrates Python through a complex distributed data pipeline, even though the second candidate is likely the stronger one. InnoHire.ai addresses this with semantic matching.

Stage 1 — Document Ingestion & Parsing

Supported input formats: PDF, DOCX, TXT, plain text paste.

The parser extracts the following structured fields from each resume:

Contact block — name, email, phone, LinkedIn, location
Work history — company, role, dates, bullet descriptions
Education — institution, degree, major, graduation year
Skills — hard skills, tools, languages, frameworks
Certifications — issuing body, credential name, validity
Projects — title, stack, description, impact metrics

A hybrid extraction approach is used: rule-based regex handles structured sections (phone numbers, dates, degree keywords), while a fine-tuned NER (Named Entity Recognition) model handles free-form sections like project descriptions and role responsibilities.
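The rule-based half of this approach can be sketched with a few regular expressions for the contact block. The patterns and the function name extract_contact_fields are illustrative assumptions, not InnoHire.ai's actual rules, and the NER model for free-form sections is out of scope here.

```python
import re

# Illustrative patterns only (assumptions); a production parser would need
# locale-aware phone handling and stricter validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
LINKEDIN_RE = re.compile(
    r"(?:https?://)?(?:www\.)?linkedin\.com/in/[A-Za-z0-9_-]+", re.IGNORECASE
)


def extract_contact_fields(text: str) -> dict:
    """Return the first email, phone number and LinkedIn URL found, or None."""

    def first(pattern: re.Pattern):
        match = pattern.search(text)
        return match.group(0) if match else None

    return {
        "email": first(EMAIL_RE),
        "phone": first(PHONE_RE),
        "linkedin": first(LINKEDIN_RE),
    }


if __name__ == "__main__":
    header = "Jane Doe\njane.doe@example.com | +1 (555) 010-2030 | linkedin.com/in/janedoe\n"
    print(extract_contact_fields(header))
```

Fields that regex cannot pin down reliably, such as role responsibilities, would be left to the NER model described above.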

Stage 2 — Normalisation

Raw extracted data contains inconsistencies that would corrupt scoring. The normalisation stage resolves these:

Skill synonyms — "JS" → "JavaScript", "ML" → "Machine Learning", "k8s" → "Kubernetes"
Date standardisation — "Jan '22 – Present" → 2022-01-01 to today
Experience duration — computed in months per role and cumulatively per skill domain
Title normalisation — "Sr. SWE" → "Senior Software Engineer" for clean comparison
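A minimal sketch of these normalisation steps follows. The lookup tables contain only the examples listed above, and the helper names (normalise_term, parse_range, months_between) and the rule that two-digit years mean 20xx are assumptions made for illustration.

```python
import re
from datetime import date

# Lookup tables seeded with the examples from the text; real tables would be far larger.
SKILL_SYNONYMS = {"js": "JavaScript", "ml": "Machine Learning", "k8s": "Kubernetes"}
TITLE_SYNONYMS = {"sr. swe": "Senior Software Engineer"}
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def normalise_term(raw: str, table: dict) -> str:
    """Map a raw skill or title to its canonical form; unknown terms pass through."""
    cleaned = raw.strip()
    return table.get(cleaned.lower(), cleaned)


def parse_month(token: str, today: date) -> date:
    """Parse tokens such as "Jan '22", "January 2022" or "Present"."""
    token = token.strip()
    if token.lower() in {"present", "current", "now"}:
        return today
    match = re.fullmatch(r"([A-Za-z]{3})[A-Za-z]*\.?\s+['\u2019]?(\d{4}|\d{2})", token)
    if not match or match.group(1).lower() not in MONTHS:
        raise ValueError(f"unrecognised date: {token!r}")
    year = int(match.group(2))
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    return date(year, MONTHS[match.group(1).lower()], 1)


def parse_range(text: str, today: date) -> tuple:
    """Split "start - end" on a hyphen or dash and parse both sides."""
    start, end = re.split(r"\s*[\u2013\u2014-]\s*", text, maxsplit=1)
    return parse_month(start, today), parse_month(end, today)


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)
```

Passing today explicitly keeps the duration calculation deterministic, which makes the stage easier to test.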

Stage 3 — Semantic Embedding

The job description and the normalised candidate summary are passed independently through a sentence-transformer model (based on the BERT architecture and fine-tuned on recruitment-domain data).

Each document is encoded into a 768-dimensional dense vector that captures semantic meaning rather than surface-level tokens. The model places "built REST APIs in Node.js" and "developed backend services using Express" close together in vector space, even though the two phrases share no keywords.

The job description embedding represents the ideal candidate profile, and the resume embedding represents the actual candidate profile. The cosine similarity between the two vectors forms the base match score.

Stage 4 — Cosine Similarity Scoring

Cosine similarity is the cosine of the angle between two vectors in high-dimensional space. A score of 1.0 means identical direction (perfect match) and 0.0 means orthogonal (no relationship). Negative values, down to -1.0, indicate opposing directions.
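For reference, the calculation itself is short. The sketch below uses plain Python lists for clarity; a production system would more likely rely on a vectorised library, and the function name is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the result depends only on direction, scaling either vector by a positive constant leaves the score unchanged.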

In practice, InnoHire.ai interprets scores in the following bands:

0.85–1.0 — Excellent match. Candidate likely covers >90% of role requirements.
0.70–0.85 — Strong match with some skill or experience gaps.
0.55–0.70 — Partial match. Notable gaps but transferable experience exists.
Below 0.55 — Poor alignment. Candidate is unlikely to be a strong fit.
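The bands above translate into a simple threshold lookup. Note that the listed ranges share their boundary values (0.85 and 0.70 each appear in two bands); this sketch assumes that each band includes its lower bound, and the function name and return labels are illustrative.

```python
def match_band(similarity: float) -> str:
    """Map a cosine similarity score to one of the four bands listed above."""
    # Assumption: a score exactly on a boundary belongs to the higher band.
    if similarity >= 0.85:
        return "excellent"
    if similarity >= 0.70:
        return "strong"
    if similarity >= 0.55:
        return "partial"
    return "poor"
```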

Stage 5 — Gap Detection

After the base score is computed, a secondary module compares the list of required skills extracted from the job description against the candidate's normalised skill inventory. Missing skills are flagged as gaps, each with a severity label (ranging from critical to nice-to-have) based on how prominently the skill appears in the job description.

This gap list is surfaced to recruiters alongside the match score, and is used to generate targeted screening questions designed to probe whether the candidate can compensate for each gap.
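One way to picture the gap module is as a set comparison followed by a prominence check. In the sketch below, prominence is approximated by counting mentions in the job description, which is an assumption; the function name, the critical_mentions threshold and the two-level labelling are likewise hypothetical.

```python
import re


def detect_gaps(required_skills, candidate_skills, jd_text, critical_mentions=2):
    """Return (matched, missing, severity) for a candidate against a job description."""
    have = {skill.lower() for skill in candidate_skills}
    matched = [s for s in required_skills if s.lower() in have]
    missing = [s for s in required_skills if s.lower() not in have]

    text = jd_text.lower()
    severity = {}
    for skill in missing:
        # Assumption: a skill mentioned repeatedly is more prominent, hence critical.
        mentions = len(re.findall(re.escape(skill.lower()), text))
        severity[skill] = "critical" if mentions >= critical_mentions else "nice-to-have"
    return matched, missing, severity


if __name__ == "__main__":
    jd = ("We need Python and Docker. Docker experience in production "
          "is essential. CI/CD is a plus.")
    print(detect_gaps(["Python", "Docker", "CI/CD"], ["python", "PostgreSQL"], jd))
```

Comparison is case-insensitive here because the skill inventory is expected to have passed through Stage 2 already.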

Output Format

The engine returns a structured result object per candidate:

{
  "candidateId": "abc123",
  "overall_match_score": 84,
  "semantic_similarity": 0.81,
  "matched_skills": ["Python", "PostgreSQL", "FastAPI"],
  "missing_skills": ["Docker", "CI/CD"],
  "gap_severity": { "Docker": "critical", "CI/CD": "moderate" },
  "experience_years_relevant": 4.5,
  "screening_questions": [...],
  "boolean_search_string": "...",
  "linkedin_outreach_draft": "..."
}
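As an illustration of how a consumer might use this object, the sketch below ranks a batch of results. It reads only fields shown above; the shortlist function, the min_score default and the assumption that overall_match_score is on a 0 to 100 scale are hypothetical.

```python
from typing import Any


def shortlist(results: list[dict[str, Any]], min_score: int = 70) -> list[str]:
    """Return candidate ids scoring at least min_score, best first.

    Ties on score are broken in favour of fewer critical gaps.
    """

    def critical_gaps(result: dict[str, Any]) -> int:
        severities = result.get("gap_severity", {}).values()
        return sum(1 for level in severities if level == "critical")

    eligible = [r for r in results if r["overall_match_score"] >= min_score]
    eligible.sort(key=lambda r: (-r["overall_match_score"], critical_gaps(r)))
    return [r["candidateId"] for r in eligible]
```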

Performance

Average processing time per resume: 1.4 seconds (end-to-end, covering parsing, embedding, scoring and generation). Bulk processing of 100 resumes completes in under 90 seconds via the async job queue.
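These figures imply concurrency: 100 resumes at 1.4 seconds each would take 140 seconds if processed one at a time, so finishing in under 90 seconds requires resumes to overlap. The sketch below shows one common pattern for this, bounded concurrency with asyncio. It is not the actual job queue; process_resume is a placeholder and max_concurrency is an assumed parameter.

```python
import asyncio


async def process_resume(resume_id: str) -> dict:
    """Placeholder for the real parse + embed + score + generate pipeline."""
    await asyncio.sleep(0.01)  # stands in for per-resume work
    return {"candidateId": resume_id}


async def process_batch(resume_ids, max_concurrency: int = 4) -> list:
    """Process resumes concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(resume_id: str) -> dict:
        async with semaphore:
            return await process_resume(resume_id)

    # gather returns results in the same order as the input ids
    return await asyncio.gather(*(worker(rid) for rid in resume_ids))


def run_batch(resume_ids, max_concurrency: int = 4) -> list:
    """Synchronous entry point for scripts and tests."""
    return asyncio.run(process_batch(resume_ids, max_concurrency))
```

Capping concurrency with a semaphore keeps a large batch from overwhelming the embedding model or any downstream generation service.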