Skills Gap Toolkit
ESCO Semantic Database
The ESCO Semantic Database supports semantic search and skills matching within the Companies for Tomorrow (C4T) platform. The system integrates the ESCO dataset into a hybrid database environment combining relational storage and vector-based semantic search capabilities.
- ESCO Dataset Acquisition
The process begins with the ESCO dataset, which contains structured information about:
- Occupations
- Skills
- Relationships between occupations and skills
The complete ESCO dataset is used as the primary data source. It is exported in CSV format so that it can be imported into the database system in a structured way.
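As a minimal sketch of reading such an export, the snippet below parses an ESCO-style CSV with Python's standard `csv` module. The column names (`conceptUri`, `preferredLabel`, `description`), the URIs, and the sample rows are assumptions for illustration; the headers of the files actually downloaded should be checked before relying on them.

```python
import csv
import io

# Hypothetical sample that mimics an ESCO skills export; real files are far larger.
SAMPLE_CSV = """conceptUri,preferredLabel,description
http://example.org/skill/1,manage musical staff,"Assign and manage staff tasks in areas such as scoring, arranging."
http://example.org/skill/2,use spreadsheets software,Use software tools to create and edit tabular data.
"""

def read_esco_csv(handle):
    """Return one dict per CSV row, keyed by the column headers."""
    return list(csv.DictReader(handle))

rows = read_esco_csv(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["conceptUri"], "->", row["preferredLabel"])
```

For a real export, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.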
- Data Import into MariaDB
The exported ESCO CSV files are imported into MariaDB, where the data are stored in structured tables.
MariaDB serves as a hybrid database system, functioning both as:
- A Relational Database Management System (RDBMS) for structured ESCO data tables
- A Vector Database for storing semantic embeddings
The relational tables maintain the original ESCO structure, including occupations, skills, and their relationships.
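The import step could look roughly like the sketch below, which works against any Python DB-API cursor. The table name `esco_skills`, its columns, and the `?` placeholder style are assumptions (some drivers expect `%s` instead); the `RecordingCursor` class is only a stand-in so the example runs without a database.

```python
# Hypothetical table layout; the real schema should follow the ESCO export columns.
CREATE_SKILLS_TABLE = """
CREATE TABLE IF NOT EXISTS esco_skills (
    concept_uri     VARCHAR(255) PRIMARY KEY,
    preferred_label VARCHAR(512) NOT NULL,
    description     TEXT
)
"""

INSERT_SKILL = (
    "INSERT INTO esco_skills (concept_uri, preferred_label, description) "
    "VALUES (?, ?, ?)"
)

def import_skills(cursor, rows):
    """Create the table if needed and insert one record per parsed CSV row."""
    cursor.execute(CREATE_SKILLS_TABLE)
    params = [
        # Empty descriptions are stored as NULL rather than as empty strings.
        (r["conceptUri"], r["preferredLabel"], r.get("description") or None)
        for r in rows
    ]
    cursor.executemany(INSERT_SKILL, params)
    return len(params)

class RecordingCursor:
    """Stand-in for a DB-API cursor that only records calls."""
    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append((sql, params))

    def executemany(self, sql, seq_of_params):
        self.statements.append((sql, list(seq_of_params)))

cursor = RecordingCursor()
inserted = import_skills(cursor, [
    {"conceptUri": "http://example.org/skill/1",
     "preferredLabel": "manage musical staff",
     "description": ""},
])
print(inserted, "row(s) prepared for insertion")
```

With a real connection, the cursor would come from the database driver and the transaction would be committed after `import_skills` returns.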
- Text Preparation
After importing the data, a text preparation process is performed.
For each ESCO entity (occupation or skill), the following fields are combined:
- Preferred Label
- Description
These fields are merged into a single text representation, which serves as the input for embedding generation.
This step ensures that both the title and semantic description of each entity are included in the vector representation.
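The merge can be illustrated with a short helper. The function name and the `". "` separator are assumptions; the source only states that the two fields are combined into a single text.

```python
def build_embedding_text(preferred_label, description):
    """Merge label and description into one string, skipping empty parts."""
    parts = [
        part.strip()
        for part in (preferred_label, description)
        if part and part.strip()
    ]
    # The ". " separator is an illustrative choice, not taken from the source.
    return ". ".join(parts)

print(build_embedding_text("data analyst", "Collects and interprets data."))
```

Entities without a description fall back to the label alone, so every entity still yields a non-empty input as long as it has a preferred label.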
- Embeddings Generation
The merged text fields are processed using the embeddinggemma model to generate semantic embeddings.
Key characteristics of this step:
- Each occupation and skill is converted into a numerical vector representation
- The embeddings capture semantic meaning and conceptual similarity
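- The same embedding model must be used for all stored entities and for incoming queries, because vectors produced by different models are not comparable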
- Changing the embedding model requires regeneration of all embeddings
The embeddings enable semantic comparison between skills and occupations.
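One way to make the consistency requirement checkable is to record the model name next to every vector. The sketch below assumes this design; `EmbeddedEntity`, `embed_entities`, and `stale_entities` are hypothetical names, and `toy_embed` is a trivial stand-in for the real embeddinggemma call so that the example runs without any model installed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

MODEL_NAME = "embeddinggemma"  # model identifier as named in the text

@dataclass
class EmbeddedEntity:
    concept_uri: str
    vector: List[float]
    model: str  # which model produced the vector

def embed_entities(
    items: Iterable[Tuple[str, str]],
    embed_fn: Callable[[str], Sequence[float]],
    model: str = MODEL_NAME,
) -> List[EmbeddedEntity]:
    """Embed (concept_uri, merged_text) pairs and tag each vector with its model."""
    return [EmbeddedEntity(uri, list(embed_fn(text)), model) for uri, text in items]

def stale_entities(entities: Iterable[EmbeddedEntity], current_model: str) -> List[EmbeddedEntity]:
    """Return entities whose vectors came from another model and need regeneration."""
    return [e for e in entities if e.model != current_model]

def toy_embed(text: str) -> List[float]:
    """Placeholder for the real model call; NOT a semantic embedding."""
    return [float(len(text)), float(text.count(" "))]

entities = embed_entities([("u1", "a b"), ("u2", "abc def ghi")], toy_embed)
print(len(stale_entities(entities, MODEL_NAME)))     # nothing to regenerate
print(len(stale_entities(entities, "other-model")))  # every vector is stale
```

After a model change, everything returned by `stale_entities` would be re-embedded before any similarity query is served.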
- Relational and Vector Storage
The system stores the processed data in two complementary forms:
Relational Database Storage
The original ESCO data are stored in relational tables, including:
- Occupation tables
- Skills tables
- Relationship tables
This supports structured queries and filtering operations.
Vector Database Storage
The generated embeddings are stored in MariaDB using the VECTOR data type, which is available from MariaDB 11.7 onward.
This allows MariaDB to operate as a Vector Database, enabling efficient vector similarity calculations.
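A possible layout for the vector side is sketched below as SQL strings built in Python. The table name `esco_skill_embeddings`, the `?` placeholder style, and the dimension of 768 are assumptions; the dimension in particular must match what the embedding model actually outputs. `VECTOR(n)`, `VECTOR INDEX`, and `VEC_FromText` are MariaDB features.

```python
import json

EMBEDDING_DIM = 768  # assumed vector size; verify against the model in use

def vector_table_ddl(dim=EMBEDDING_DIM):
    """Build DDL for a hypothetical embeddings table with a vector index."""
    return (
        "CREATE TABLE IF NOT EXISTS esco_skill_embeddings (\n"
        "    concept_uri VARCHAR(255) PRIMARY KEY,\n"
        f"    embedding   VECTOR({int(dim)}) NOT NULL,\n"
        "    VECTOR INDEX (embedding)\n"
        ")"
    )

# VEC_FromText converts a JSON-style array string into MariaDB's vector format.
INSERT_EMBEDDING = (
    "INSERT INTO esco_skill_embeddings (concept_uri, embedding) "
    "VALUES (?, VEC_FromText(?))"
)

def to_vector_literal(vector):
    """Serialize floats as the array text passed to VEC_FromText."""
    return json.dumps([float(x) for x in vector])

def embedding_params(concept_uri, vector):
    """Parameter tuple matching the placeholders in INSERT_EMBEDDING."""
    return (concept_uri, to_vector_literal(vector))

print(vector_table_ddl())
print(embedding_params("http://example.org/skill/1", [0.5, 1, -2.25]))
```

Keeping the embeddings in a separate table keyed by the same URI is one design option; adding a VECTOR column directly to the relational table would also work.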
- MariaDB Hybrid Vector Database
MariaDB operates as a combined relational and vector database, where:
- Structured ESCO data support traditional database queries
- Vector embeddings support semantic similarity queries
This hybrid architecture enables both structured and semantic access to skills and occupations data.
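To illustrate how both access paths can meet in one statement, the sketch below builds a query that restricts candidates through a relational join and ranks them by vector distance. All table and column names (`esco_occupation_skill_relations`, `esco_skills`, `esco_skill_embeddings`, and so on) are hypothetical, as is the `?` placeholder style; `VEC_DISTANCE_COSINE` and `VEC_FromText` are MariaDB functions.

```python
import json

def build_hybrid_query(limit=5):
    """Rank the skills linked to one occupation by distance to a query vector."""
    limit = int(limit)
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT s.concept_uri, s.preferred_label,\n"
        "       VEC_DISTANCE_COSINE(e.embedding, VEC_FromText(?)) AS distance\n"
        "FROM esco_occupation_skill_relations AS r\n"
        "JOIN esco_skills AS s ON s.concept_uri = r.skill_uri\n"
        "JOIN esco_skill_embeddings AS e ON e.concept_uri = s.concept_uri\n"
        "WHERE r.occupation_uri = ?\n"
        "ORDER BY distance\n"
        f"LIMIT {limit}"
    )

def hybrid_params(query_vector, occupation_uri):
    """Parameters in placeholder order: query vector text first, then the filter."""
    return (json.dumps([float(x) for x in query_vector]), occupation_uri)

print(build_hybrid_query(3))
print(hybrid_params([0.1, 0.2], "http://example.org/occupation/42"))
```

A smaller cosine distance means a closer match, so the ascending `ORDER BY` puts the most similar skills first.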
- Semantic Search Engine
The stored embeddings are used by the Semantic Search Engine.
The search engine performs similarity search, allowing the system to identify conceptually related skills and occupations even when exact keywords do not match.
This enables:
- Concept-based skill matching
- Occupation similarity detection
- Intelligent recommendations
- Skills Passport functionality
- Skills Gap analysis
- Similarity Search (Concept-Based Matching)
The final stage of the process is semantic similarity search, where:
- Query text is converted into embeddings
- Vector similarity calculations are performed
- The most relevant occupations or skills are identified
This produces concept-based matching results, allowing users to discover relevant skills and occupations based on meaning rather than exact wording.
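The three steps above can be sketched in plain Python, independent of the database. The vectors below are hand-made toy values, not real embeddings, and cosine similarity is used here as one common similarity measure; the source does not state which metric the platform applies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, stored, k=3):
    """Return the k (label, score) pairs most similar to the query, best first."""
    scored = [
        (label, cosine_similarity(query_vector, vector))
        for label, vector in stored.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors standing in for stored ESCO embeddings.
STORED = {
    "data analysis": [0.9, 0.1, 0.0],
    "statistics": [0.8, 0.3, 0.1],
    "welding": [0.0, 0.1, 0.95],
}

query = [1.0, 0.2, 0.0]  # would come from embedding the user's query text
results = top_k(query, STORED, k=2)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

In the toy data the query points in nearly the same direction as the two analytical skills, so they rank above "welding" even though no keywords are compared at all.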


