Skills Gap Toolkit

ESCO Semantic Database

The ESCO Semantic Database supports semantic search and skills matching within the Companies for Tomorrow (C4T) platform. It integrates the ESCO (European Skills, Competences, Qualifications and Occupations) dataset into a hybrid database environment that combines relational storage with vector-based semantic search.

ESCO Dataset Acquisition

The process begins with the ESCO dataset, which contains structured information about:

Occupations
Skills
Relationships between occupations and skills

The complete ESCO dataset is used as the primary data source. It is exported as CSV files, a format that MariaDB can import directly into structured tables.
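A minimal sketch of reading one exported CSV file with Python's standard `csv` module. The column names `conceptUri`, `preferredLabel` and `description` are assumptions based on the usual ESCO CSV download and should be checked against the header row of the actual export; the function name and the sample row are hypothetical. `csv.DictReader` correctly handles quoted fields that contain line breaks, which long description texts may include.

```python
import csv
import io
from typing import Iterable, Iterator

# Assumed column names from the ESCO CSV download; check them against the
# header row of the actual export before relying on this.
REQUIRED_COLUMNS = ("conceptUri", "preferredLabel", "description")


def read_esco_concepts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per ESCO concept, keeping only the columns used later."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV export is missing columns: {missing}")
    for row in reader:
        # Empty or absent values become "" so later steps always receive a str.
        yield {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}


if __name__ == "__main__":
    # Hypothetical sample; the quoted description spans a line break.
    sample = io.StringIO(
        "conceptUri,preferredLabel,description\n"
        'urn:example:skill:1,manage budgets,"Plan, monitor\nand report on budgets."\n'
    )
    for concept in read_esco_concepts(sample):
        print(concept["preferredLabel"], "->", concept["description"])
```

For a real file, open it with `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends, and pass the file object to `read_esco_concepts`.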

Data Import into MariaDB

The exported ESCO CSV files are imported into MariaDB, where the data are stored in structured tables.

MariaDB serves as a hybrid database system, functioning both as:

A Relational Database Management System (RDBMS) for structured ESCO data tables
A Vector Database for storing semantic embeddings

The relational tables maintain the original ESCO structure, including occupations, skills, and their relationships.
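The import can be done in batches through any DB-API 2.0 driver. The sketch below assumes a driver that accepts `?` placeholders, as MariaDB Connector/Python does; the function name is hypothetical and the table and column names are supplied by the caller. For loading the full dataset, MariaDB's `LOAD DATA LOCAL INFILE` statement is a faster alternative.

```python
from itertools import islice
from typing import Any, Iterable, Sequence


def insert_rows(cursor: Any, table: str, columns: Sequence[str],
                rows: Iterable[Sequence[Any]], batch_size: int = 1000) -> int:
    """Insert rows in fixed-size batches via cursor.executemany; return the row count.

    Table and column names are interpolated into the SQL text, so they must come
    from trusted code, never from user input. Values are passed as parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    remaining = iter(rows)
    total = 0
    # Pull batch_size rows at a time so a large CSV never sits fully in memory.
    while batch := list(islice(remaining, batch_size)):
        cursor.executemany(sql, batch)
        total += len(batch)
    return total
```

With MariaDB Connector/Python this would be called as `insert_rows(conn.cursor(), "skill", ["concept_uri", "preferred_label", "description"], rows)` followed by `conn.commit()`, where `conn` comes from `mariadb.connect(...)` and `rows` yields tuples in column order.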

Text Preparation

After importing the data, a text preparation process is performed.

For each ESCO entity (occupation or skill), the following fields are combined:

Preferred Label
Description

These fields are merged into a single text representation, which serves as the input for embedding generation.

This step ensures that both the title (preferred label) and the description of each entity contribute to its vector representation.
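The merge can be as simple as joining the two fields into one string. The `label: description` layout and the function name below are illustrative assumptions, not a documented format; the sketch also collapses whitespace so line breaks inside descriptions do not reach the model.

```python
from typing import Optional


def build_embedding_text(preferred_label: str, description: Optional[str]) -> str:
    """Merge an entity's preferred label and description into one embedding input.

    The "label: description" layout is an illustrative assumption; whitespace,
    including line breaks inside descriptions, is collapsed to single spaces.
    """
    label = " ".join(preferred_label.split())
    desc = " ".join((description or "").split())
    # Fall back to the label alone when an entity has no description.
    return f"{label}: {desc}" if desc else label


print(build_embedding_text("manage budgets", "Plan, monitor\nand report on budgets."))
# prints: manage budgets: Plan, monitor and report on budgets.
```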

Embeddings Generation

The merged text fields are processed using the embeddinggemma model to generate semantic embeddings.

Key characteristics of this step:

Each occupation and skill is converted into a numerical vector representation
The embeddings capture semantic meaning and conceptual similarity
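The same embedding model must be used consistently, both for stored entities and for query text
Changing the embedding model requires regenerating all embeddings, because vectors produced by different models are not comparable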

The embeddings enable semantic comparison between skills and occupations.
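A minimal sketch of this step, assuming embeddinggemma is served by a local Ollama instance through its `/api/embed` endpoint. The URL, the constant and function names, and the 768-dimension setting (EmbeddingGemma's default output size) are assumptions to adapt to the actual deployment.

```python
import json
import urllib.request
from typing import Sequence

# Assumptions: embeddinggemma runs in a local Ollama instance and its default
# 768-dimensional output is stored. Adjust both if the setup differs.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"
EMBEDDING_MODEL = "embeddinggemma"
EMBEDDING_DIM = 768


def build_embed_request(texts: Sequence[str]) -> bytes:
    """JSON body for Ollama's /api/embed endpoint (a batch of input texts)."""
    return json.dumps({"model": EMBEDDING_MODEL, "input": list(texts)}).encode("utf-8")


def parse_embed_response(body: bytes, n_texts: int) -> list:
    """Extract the vectors and check their count and dimension before storage."""
    vectors = json.loads(body)["embeddings"]
    if len(vectors) != n_texts:
        raise ValueError(f"expected {n_texts} embeddings, got {len(vectors)}")
    for vector in vectors:
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected dimension {EMBEDDING_DIM}, got {len(vector)}")
    return vectors


def embed(texts: Sequence[str]) -> list:
    """Embed a batch of texts; use this same function for entities and queries."""
    request = urllib.request.Request(
        OLLAMA_EMBED_URL,
        data=build_embed_request(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_embed_response(response.read(), len(texts))
```

The dimension check is a cheap guard against accidentally mixing models. It cannot tell apart two models with the same output size, so recording the model name alongside the stored vectors is a useful complement.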

Relational and Vector Storage

The system stores the processed data in two complementary forms:

Relational Database Storage

The original ESCO data are stored in relational tables, including:

Occupation tables
Skills tables
Relationship tables

This supports structured queries and filtering operations.
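One possible shape for these tables, with a typical filtering query that lists the essential skills of one occupation. All table and column names are hypothetical; the `relation_type` values follow ESCO's distinction between essential and optional skills. The `?` placeholder assumes a driver such as MariaDB Connector/Python.

```python
from typing import Any

# Hypothetical schema mirroring the three ESCO table groups.
SCHEMA = (
    """CREATE TABLE occupation (
        concept_uri VARCHAR(255) PRIMARY KEY,
        preferred_label VARCHAR(512) NOT NULL,
        description TEXT
    )""",
    """CREATE TABLE skill (
        concept_uri VARCHAR(255) PRIMARY KEY,
        preferred_label VARCHAR(512) NOT NULL,
        description TEXT
    )""",
    """CREATE TABLE occupation_skill (
        occupation_uri VARCHAR(255) NOT NULL,
        skill_uri VARCHAR(255) NOT NULL,
        relation_type VARCHAR(20) NOT NULL,  -- 'essential' or 'optional'
        PRIMARY KEY (occupation_uri, skill_uri),
        FOREIGN KEY (occupation_uri) REFERENCES occupation (concept_uri),
        FOREIGN KEY (skill_uri) REFERENCES skill (concept_uri)
    )""",
)

ESSENTIAL_SKILLS_SQL = """
    SELECT s.concept_uri, s.preferred_label
    FROM occupation_skill os
    JOIN skill s ON s.concept_uri = os.skill_uri
    WHERE os.occupation_uri = ? AND os.relation_type = 'essential'
    ORDER BY s.preferred_label
"""


def create_schema(cursor: Any) -> None:
    """Create the tables, parents first so the foreign keys can resolve."""
    for statement in SCHEMA:
        cursor.execute(statement)


def essential_skills(cursor: Any, occupation_uri: str) -> list:
    """Return (skill_uri, preferred_label) pairs marked essential for an occupation."""
    cursor.execute(ESSENTIAL_SKILLS_SQL, (occupation_uri,))
    return list(cursor.fetchall())
```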

Vector Database Storage

The generated embeddings are stored in MariaDB using the VECTOR data type, available in MariaDB 11.7 and later.

This allows MariaDB to operate as a Vector Database, enabling efficient vector similarity calculations.
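A sketch of how the vectors might be stored, assuming 768-dimensional embeddings and a hypothetical `skill_embedding` table. It relies on MariaDB's `VECTOR(N)` column type, `VECTOR INDEX`, and the `VEC_FromText` function; the `DISTANCE=cosine` index option and exact syntax should be checked against the MariaDB version in use.

```python
import json
import math
from typing import Sequence

EMBEDDING_DIM = 768  # assumed embeddinggemma output size

# Hypothetical embeddings table kept separate from the relational ESCO tables.
# A vector index needs a NOT NULL column; DISTANCE=cosine is meant to match
# queries ordered by VEC_DISTANCE_COSINE.
CREATE_SKILL_EMBEDDING = f"""CREATE TABLE skill_embedding (
    concept_uri VARCHAR(255) PRIMARY KEY,
    embedding VECTOR({EMBEDDING_DIM}) NOT NULL,
    VECTOR INDEX (embedding) DISTANCE=cosine
)"""

# VEC_FromText converts array text such as "[0.1,0.2]" into a VECTOR value.
INSERT_SKILL_EMBEDDING = (
    "INSERT INTO skill_embedding (concept_uri, embedding) VALUES (?, VEC_FromText(?))"
)


def to_vector_text(vector: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Serialise an embedding as compact array text, e.g. "[0.5,1.0]"."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} values, got {len(vector)}")
    values = [float(x) for x in vector]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinite values")
    return json.dumps(values, separators=(",", ":"))
```

A row would then be written with `cursor.execute(INSERT_SKILL_EMBEDDING, (uri, to_vector_text(vector)))`. Keeping embeddings in their own table means that regenerating them after a model change touches only this table, not the relational ESCO data.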

MariaDB Hybrid Vector Database

MariaDB operates as a combined relational and vector database, where:

Structured ESCO data support traditional database queries
Vector embeddings support semantic similarity queries

This hybrid architecture enables both structured and semantic access to skills and occupations data.
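A hybrid query can use both storage forms in one statement: a relational join restricts candidates to the skills linked to a single occupation, and the cosine distance ranks them by closeness to a query embedding. Table and column names (`skill`, `skill_embedding`, `occupation_skill`) and the function name are hypothetical, and `?` placeholders assume MariaDB Connector/Python. With such a restrictive filter MariaDB may compute distances for the filtered rows instead of using the vector index, which is cheap when only a small set of rows remains.

```python
from typing import Any

# Hypothetical tables: skill, skill_embedding (VECTOR column), occupation_skill.
HYBRID_SQL = """
    SELECT s.concept_uri, s.preferred_label
    FROM skill_embedding e
    JOIN skill s ON s.concept_uri = e.concept_uri
    JOIN occupation_skill os ON os.skill_uri = e.concept_uri
    WHERE os.occupation_uri = ?
    ORDER BY VEC_DISTANCE_COSINE(e.embedding, VEC_FromText(?))
    LIMIT {limit}
"""


def closest_skills_of_occupation(cursor: Any, occupation_uri: str,
                                 query_vector_text: str, k: int = 5) -> list:
    """Rank an occupation's linked skills by cosine distance to a query embedding.

    query_vector_text is the embedding serialised as array text, e.g. "[0.1,0.2]".
    """
    # k is forced to int before formatting, so only a number reaches the SQL text.
    sql = HYBRID_SQL.format(limit=int(k))
    # Placeholder order follows the SQL text: WHERE first, then ORDER BY.
    cursor.execute(sql, (occupation_uri, query_vector_text))
    return list(cursor.fetchall())
```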

Semantic Search Engine

The stored embeddings are used by the Semantic Search Engine.

The search engine performs similarity search, allowing the system to identify conceptually related skills and occupations even when exact keywords do not match.

This enables:

Concept-based skill matching
Occupation similarity detection
Intelligent recommendations
Skills Passport functionality
Skills Gap analysis
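To make the underlying computation concrete, the following plain-Python sketch ranks vectors by cosine similarity. In the described system this ranking runs inside MariaDB, so the code only illustrates the math, not the engine itself. Note that MariaDB's `VEC_DISTANCE_COSINE` returns a distance, where smaller means more similar, rather than a similarity score.

```python
import heapq
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], items: Mapping[str, Sequence[float]],
          k: int = 5) -> list:
    """Return (item_id, similarity) for the k items closest to the query, best first."""
    scored = ((cosine_similarity(query, vector), key) for key, vector in items.items())
    return [(key, score) for score, key in heapq.nlargest(k, scored)]
```

For example, `top_k([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2)` ranks `a` first (similarity 1.0) and `c` second (about 0.707).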

Similarity Search (Concept-Based Matching)

The final stage of the process is semantic similarity search, where:

Query text is converted into an embedding with the same model used for the stored entities
Vector similarity calculations are performed
The most relevant occupations or skills are identified

This produces concept-based matching results, allowing users to discover relevant skills and occupations based on meaning rather than exact wording.
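The three steps listed above can be sketched as one short function. It rests on assumptions: `embed_fn` stands for whatever function produces embeddinggemma vectors (the same model used for the stored entities), the `skill` and `skill_embedding` tables are hypothetical, and the query uses the `ORDER BY VEC_DISTANCE_COSINE(...) LIMIT n` form that MariaDB's vector index is designed to serve.

```python
import json
from typing import Any, Callable, Sequence

# Hypothetical tables; occupations would use an analogous occupation_embedding table.
SEARCH_SQL = """
    SELECT s.concept_uri, s.preferred_label
    FROM skill_embedding e
    JOIN skill s ON s.concept_uri = e.concept_uri
    ORDER BY VEC_DISTANCE_COSINE(e.embedding, VEC_FromText(?))
    LIMIT {limit}
"""


def semantic_search(cursor: Any,
                    embed_fn: Callable[[Sequence[str]], list],
                    query: str, k: int = 10) -> list:
    """Return the k skills whose embeddings are closest to the query text."""
    # Step 1: embed the query with the same model used for the stored skills.
    [vector] = embed_fn([query])
    vector_text = json.dumps([float(x) for x in vector], separators=(",", ":"))
    # Steps 2 and 3: MariaDB computes cosine distances and returns the k nearest.
    cursor.execute(SEARCH_SQL.format(limit=int(k)), (vector_text,))
    return list(cursor.fetchall())
```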