Full-text Search

Overview #

The goal is to give data engineers and business analysts an effective way to find metadata. The search is built on two pillars: field relevance (where to look) and match quality (how to look).

Operators #

Users can enter part of a word, a whole word, or several words. Every word entered is included in the search algorithm. The following operators can also be used:

| Operator | Description |
|----------|-------------|
| +term | Mandatory occurrence of a word |
| -term | Strict exclusion of a word |
| "term" | Search for the exact term |
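
As an illustration of how these operators might be separated before scoring, here is a minimal Python sketch. The `ParsedQuery` class and `parse_query` function are hypothetical names invented for this example, not part of MetaCroc, and the tokenization rules (whitespace-separated tokens, double quotes around exact terms) are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedQuery:
    """Hypothetical container for the four kinds of search terms."""
    required: List[str] = field(default_factory=list)  # +term
    excluded: List[str] = field(default_factory=list)  # -term
    exact: List[str] = field(default_factory=list)     # "term"
    plain: List[str] = field(default_factory=list)     # everything else


# A token is either a double-quoted phrase or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    # Assumption: typographic quotes are treated like straight double quotes.
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    for match in _TOKEN.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            parsed.exact.append(phrase)
        elif word.startswith("+") and len(word) > 1:
            parsed.required.append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            parsed.excluded.append(word[1:])
        else:
            parsed.plain.append(word)
    return parsed


if __name__ == "__main__":
    print(parse_query('+invoice -test "STG_SALES_D" cust'))
```

Keeping the four groups apart lets a later step filter on `required` and `excluded` before any ranking is done on the remaining terms.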

Scoring Algorithm (Ranking) #

The final score of an element is the product of the metadata field weight and the match quality coefficient.

A. Metadata Field Weights #

| Field | Weight | Meaning |
|-------|--------|---------|
| Technical name | 10 | Primary identifier (e.g., STG_SALES_D). |
| Business name | 8 | Human-readable name for analysts. |
| Technical column name | 2 | Primary column identifier. |
| Business column name | 2 | Human-readable column name for analysts. |
| Description | 5 | Official documentation of the element’s purpose. |
| Comment | 5 | Informal notes and additions. |
| Path (Folder) | 3 | Contextual location within the project. |
| Responsible Person | 2 | Owner identification. |

B. Match Quality Coefficients #

| Match Type | Coefficient | Example (searching for “Acc”) |
|------------|-------------|-------------------------------|
| Exact Match | 1.0 | Acc |
| Prefix | 0.7 | Account |
| Infix | 0.3 | GL_Account |
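
The weights and coefficients can be combined into a per-field score roughly as follows. This is a sketch only: the dictionary keys and the `match_type` and `field_score` functions are hypothetical, and it assumes case-insensitive comparison against the whole field value, which the documentation does not specify.

```python
from typing import Optional

# Field weights as listed in the documentation (keys are illustrative names).
FIELD_WEIGHTS = {
    "technical_name": 10,
    "business_name": 8,
    "technical_column_name": 2,
    "business_column_name": 2,
    "description": 5,
    "comment": 5,
    "path": 3,
    "responsible_person": 2,
}

# Match quality coefficients as listed in the documentation.
MATCH_COEFFICIENTS = {"exact": 1.0, "prefix": 0.7, "infix": 0.3}


def match_type(term: str, value: str) -> Optional[str]:
    """Classify how `term` matches `value` (assumed case-insensitive)."""
    term, value = term.lower(), value.lower()
    if not term or term not in value:
        return None
    if value == term:
        return "exact"
    if value.startswith(term):
        return "prefix"
    return "infix"


def field_score(term: str, field_name: str, value: str) -> float:
    """Field weight multiplied by match coefficient; 0.0 when there is no match."""
    kind = match_type(term, value)
    if kind is None:
        return 0.0
    return FIELD_WEIGHTS[field_name] * MATCH_COEFFICIENTS[kind]


if __name__ == "__main__":
    for value in ("Acc", "Account", "GL_Account", "Sales"):
        print(value, match_type("Acc", value), field_score("Acc", "technical_name", value))
```

With the search term `Acc`, the three example values from the table classify as exact, prefix, and infix respectively, and a value that does not contain the term scores zero.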

Multi-word Search Logic #

When multiple words are entered (e.g., Customer Invoice STG), the system calculates a score for each term separately and applies final modifiers:

Relevance Calculation Formula #

[Formula image: Search evaluation]

  1. Completion Bonus: The ratio of found words to searched words. It is intended to rank an element that contains all searched words above an element that contains only one, even if that one is an exact match.
  2. Proximity Bonus (1.5x): A bonus applied if the searched words are located right next to each other in the text.
  3. Cross-field Search: Allows combining technical and business parameters (for example, one word found in the technical name and another in the description).
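
The modifiers above can be sketched in Python as shown below. How exactly they combine is an assumption here: the sketch sums the per-term scores (which also covers cross-field matches), multiplies by the completion ratio, and applies the 1.5x factor when the words are adjacent. The function names and the word-prefix adjacency test are hypothetical.

```python
from typing import Dict, List

PROXIMITY_BONUS = 1.5  # factor named in the documentation


def completion_ratio(terms: List[str], term_scores: Dict[str, float]) -> float:
    """Ratio of searched words that were found (score > 0) to all searched words."""
    if not terms:
        return 0.0
    found = sum(1 for t in terms if term_scores.get(t, 0.0) > 0)
    return found / len(terms)


def terms_adjacent(terms: List[str], text: str) -> bool:
    """True if the terms appear as consecutive words in `text`, in query order.

    Assumption: words are split on whitespace and matched by prefix,
    case-insensitively.
    """
    words = text.lower().split()
    needle = [t.lower() for t in terms]
    n = len(needle)
    if n < 2:
        return False
    return any(
        all(words[i + k].startswith(needle[k]) for k in range(n))
        for i in range(len(words) - n + 1)
    )


def final_score(terms: List[str], term_scores: Dict[str, float], adjacent: bool) -> float:
    """Sum of per-term scores, scaled by completion and (optionally) proximity."""
    base = sum(term_scores.get(t, 0.0) for t in terms)
    score = base * completion_ratio(terms, term_scores)
    if adjacent:
        score *= PROXIMITY_BONUS
    return score


if __name__ == "__main__":
    terms = ["customer", "invoice"]
    both = {"customer": 7.0, "invoice": 5.0}  # e.g. prefix in tech. name + exact in description
    one = {"customer": 10.0}                  # a single exact match in tech. name
    print(final_score(terms, both, adjacent=False))
    print(final_score(terms, one, adjacent=False))
    print(terms_adjacent(terms, "Monthly customer invoice staging"))
```

In this example the element matching both words outranks the element with a single exact match, because the second element's score is halved by its completion ratio of 1/2.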

Evaluation Example #

Search Query: Client Address

| Element | Found in | Match Type | Calculation | Score |
|---------|----------|------------|-------------|-------|
| Table CLIENT_ADDR | Tech. name | Prefix | 10 x 0.7 x 2 (words) | 14.0 |
| Table D_CLIENT | Tech. name / Desc. | Prefix / Exact | (10 x 0.7) + (5 x 1.0) | 12.0 |
| API LOG_ADDR | Tech. name | Infix | 10 x 0.3 x 0.5 (completion bonus) | 1.5 |
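
The arithmetic in this example can be checked with a few lines of Python. The `row_score` helper is hypothetical, and it assumes that per-word scores are summed and then multiplied by the completion ratio. No proximity bonus is applied, because none appears in the calculations above.

```python
from typing import List, Tuple


def row_score(matches: List[Tuple[float, float]], searched_words: int) -> float:
    """Score one element.

    `matches` holds one (field weight, match coefficient) pair per found word.
    Assumption: the sum of weight x coefficient is scaled by the completion ratio.
    """
    base = sum(weight * coeff for weight, coeff in matches)
    completion = len(matches) / searched_words
    return base * completion


# Query "Client Address" has two searched words.
client_addr = row_score([(10, 0.7), (10, 0.7)], 2)  # both words: prefix in tech. name
d_client = row_score([(10, 0.7), (5, 1.0)], 2)      # prefix in tech. name + exact in description
log_addr = row_score([(10, 0.3)], 2)                # one word only: infix in tech. name

if __name__ == "__main__":
    for name, score in [("CLIENT_ADDR", client_addr), ("D_CLIENT", d_client), ("LOG_ADDR", log_addr)]:
        print(f"{name}: {score:.1f}")
```

Under these assumptions the three results agree with the Score column: 14.0, 12.0, and 1.5.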