Semantic Pipeline Introduction
Category: Semantic Pipeline
Discover how the Semantic Pipeline enhances your search experience by enriching metadata, recognizing entities, and transforming data for smarter, more intuitive results!
The Semantic Pipeline
The Semantic Pipeline enriches and enhances the documents in your index. You can extract additional metadata using vocabularies, patterns, and the Property Expression Language, and plugins can derive further information about each document.
Core Features
Out of the box, the Semantic Pipeline lets you:
- Translate metadata values
- Label documents using text classification
- Call LLMs
- Do much more
Precomputed Synthesized Metadata
One of the most powerful features in the Semantic Pipeline is Precomputed Synthesized Metadata. For instance:
- A numeric value such as a file size can be categorized into predefined groups, which makes it simpler to filter and browse.
- A flat filter facet, such as a URL, can be transformed into a hierarchical navigation structure, making exploration easier.
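The two examples above can be illustrated with a short Python sketch. It does not use the Property Expression Language; the size thresholds, group labels, and function names are hypothetical and only show the kind of values such precomputed metadata can hold.

```python
from urllib.parse import urlsplit

# Hypothetical size groups: (exclusive upper bound in bytes, label).
SIZE_GROUPS = [
    (100 * 1024, "Small (< 100 KB)"),
    (10 * 1024 * 1024, "Medium (< 10 MB)"),
]

def size_group(size_bytes: int) -> str:
    """Map a raw file size to one of a few predefined group labels."""
    for upper_bound, label in SIZE_GROUPS:
        if size_bytes < upper_bound:
            return label
    return "Large (>= 10 MB)"

def url_hierarchy(url: str) -> list[str]:
    """Turn a flat URL into cumulative path levels for hierarchical navigation."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    levels = [parts.netloc]
    for segment in segments:
        levels.append(levels[-1] + "/" + segment)
    return levels

print(size_group(2_500_000))  # Medium (< 10 MB)
print(url_hierarchy("https://example.com/products/audio/sm87a.html"))
# ['example.com', 'example.com/products', 'example.com/products/audio',
#  'example.com/products/audio/sm87a.html']
```

Each level of the returned list can serve as one node in a drill-down facet, instead of offering every full URL as a separate flat value.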
Entity Recognition
Entity Recognition lets us extract product names from unstructured and semi-structured documents. All identified product names are stored in a metadata field called ProductName, and the exact location of each match is recorded as well.
For example, by opening a brochure in the preview, we can see exactly where each product name has been identified.
CSV Transformation
CSV Transformation lets us attach columns from a CSV file to results in Mindbreeze. For example, for each result we can attach the role of its author, taken from the CSV file.
After the CSV Transformation is applied, a role field filled from the CSV data is available alongside the author field. To test this, select Educator, the role that corresponds to Keenan Whitney in the CSV file. Filtering by Educator now displays all documents created by Keenan Whitney.
Item Transformation
Item Transformation lets us group files by type efficiently. By default, file types are presented as a flat list; with a groups configuration, we can organize them into categories and navigate them more easily.
For example, clicking Documents reveals all the file types under this category. The same applies to other types, such as Spreadsheets, providing a more streamlined and intuitive experience.
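A groups configuration of this kind can be modeled as a mapping from group names to file types. The Python sketch below assumes that shape; the group names Documents and Spreadsheets come from the example above, while the other names and the extension lists are hypothetical, and the real Item Transformation configuration format may differ.

```python
from collections import defaultdict

# Hypothetical groups configuration: group name -> file types it contains.
GROUPS = {
    "Documents": ["pdf", "doc", "docx", "txt"],
    "Spreadsheets": ["xls", "xlsx", "csv"],
    "Presentations": ["ppt", "pptx"],
}

# Invert the configuration once so each file type resolves to its group.
TYPE_TO_GROUP = {ext: group for group, exts in GROUPS.items() for ext in exts}

def group_file_types(file_types: list[str]) -> dict[str, list[str]]:
    """Collect distinct file types under their configured group ("Other" if unmapped)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ext in file_types:
        ext = ext.lower()
        group = TYPE_TO_GROUP.get(ext, "Other")
        if ext not in grouped[group]:
            grouped[group].append(ext)
    return dict(grouped)

print(group_file_types(["pdf", "xlsx", "docx", "csv", "pdf", "zip"]))
# {'Documents': ['pdf', 'docx'], 'Spreadsheets': ['xlsx', 'csv'], 'Other': ['zip']}
```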
Language Detection
The Language Detector automatically identifies the language of each result by analyzing its content. The file does not need any metadata that tells Mindbreeze its language.
For example:
- Selecting "de" displays German results.
- Selecting "fr" brings up French results.
- Selecting "en" provides English results.
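How the Language Detector works internally is not described here. As a stand-in, this Python sketch uses a deliberately simple technique, counting common function words per language, to show how a language code such as "de", "fr", or "en" can be inferred from content alone. The word lists are tiny and hypothetical; practical detectors rely on much richer statistical models.

```python
import re
from typing import Optional

# Tiny, hypothetical function-word lists; real detectors use far richer models.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "ein"},
    "fr": {"le", "la", "les", "et", "est", "une", "pas", "avec"},
    "en": {"the", "and", "is", "of", "not", "with", "a", "to"},
}

def detect_language(text: str) -> Optional[str]:
    """Return the language code whose function words occur most often, or None if none occur."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

for sample in [
    "Das ist nicht der Preis, und die Lieferung ist schnell.",
    "Le produit est prêt et la livraison est rapide.",
    "The microphone is ready and the price is fair.",
]:
    print(detect_language(sample))  # prints de, then fr, then en
```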
Precomputed Synthesized Metadata and Property Expression Language
Precomputed Synthesized Metadata is among the most powerful tools in the Semantic Pipeline. It has its own scripting language, the Property Expression Language, which lets us manipulate and generate metadata through user-defined expressions. The expressions are evaluated in full during index inversion (for example, on a reinvert), so the resulting metadata is precomputed rather than calculated at query time.
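The idea of precomputation can be sketched in Python: user-defined expressions run once per document when it is inverted, and their results are stored as ordinary metadata that queries simply read. The expressions are written as Python functions purely for illustration; they are not Property Expression Language syntax, and the field names are hypothetical.

```python
from typing import Any, Callable

Document = dict[str, Any]

# Hypothetical user-defined expressions: target field -> function of the document's metadata.
EXPRESSIONS: dict[str, Callable[[Document], Any]] = {
    "extension": lambda doc: doc["filename"].rsplit(".", 1)[-1].lower(),
    "year": lambda doc: doc["modified"][:4],
}

def invert(doc: Document) -> Document:
    """Evaluate every expression once and store the results next to the original metadata."""
    enriched = dict(doc)
    for field, expression in EXPRESSIONS.items():
        enriched[field] = expression(doc)
    return enriched

index = [invert(d) for d in [
    {"filename": "Brochure.PDF", "modified": "2024-03-15"},
    {"filename": "prices.xlsx", "modified": "2023-11-02"},
]]

# At query time the derived values already exist; nothing is recomputed.
print([d["extension"] for d in index if d["year"] == "2024"])  # ['pdf']
```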
Entity Recognition Using Regex
Entity Recognition uses regex rules to extract information from semi-structured and unstructured data. For example, taking the product name of the SM87A microphone:
- The first Entity Recognition rule identifies and extracts one or more letters as complete words.
- The second rule focuses on numbers, extracting one or more digits.
- An optional suffix of one or more letters at the end is also recognized.
When combined, these rules accurately extract the product name and store it in a new metadata field labeled ProductName.
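The combined rule can be approximated with a single Python regular expression. This is a minimal sketch, assuming the three rules concatenate into one pattern (letters, then digits, then an optional letter suffix); the actual rule syntax and matching behavior in Mindbreeze may differ. The returned offsets play the role of the matching locations shown in the preview.

```python
import re

# Assumed combination of the three rules: letters, digits, optional letter suffix.
# \b keeps the match aligned to word boundaries.
PRODUCT_NAME = re.compile(r"\b[A-Za-z]+\d+[A-Za-z]*\b")

def extract_product_names(text: str) -> list[tuple[str, int, int]]:
    """Return each match with its start and end character offsets."""
    return [(m.group(), m.start(), m.end()) for m in PRODUCT_NAME.finditer(text)]

doc = "The new SM87A microphone replaces the SM58."
matches = extract_product_names(doc)
metadata = {"ProductName": [name for name, _, _ in matches]}
print(metadata)    # {'ProductName': ['SM87A', 'SM58']}
print(matches[0])  # ('SM87A', 8, 13)
```

Words without digits, such as "microphone", never match because the pattern requires at least one digit after the leading letters.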
CSV Transformation in Action
CSV Transformation enriches results within a Mindbreeze InSpire index. By mapping the Author field to the Name column of the CSV file, we link each author to their row in the CSV. This gives access to additional information, such as their role.
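The mapping can be sketched in Python as a join between the documents and the CSV rows, keyed on Author = Name. The CSV contents, document titles, and function names are hypothetical; only the pairing of Keenan Whitney with Educator comes from the example above.

```python
import csv
import io
from typing import Any

# Hypothetical CSV contents.
CSV_TEXT = """Name,Role
Keenan Whitney,Educator
Jane Doe,Engineer
"""

def load_rows(csv_text: str, key_column: str) -> dict[str, dict[str, str]]:
    """Index the CSV rows by the column used for matching (here: Name)."""
    return {row[key_column]: row for row in csv.DictReader(io.StringIO(csv_text))}

def enrich(docs: list[dict[str, Any]], rows: dict[str, dict[str, str]],
           match_field: str, columns: dict[str, str]) -> None:
    """Copy the mapped CSV columns into each document whose match_field has a matching row."""
    for doc in docs:
        row = rows.get(doc.get(match_field))
        if row is None:
            continue  # no matching row: the document keeps its original metadata
        for column, field in columns.items():
            doc[field] = row[column]

docs = [
    {"title": "Brochure", "Author": "Keenan Whitney"},
    {"title": "Spec sheet", "Author": "Jane Doe"},
    {"title": "Memo", "Author": "Someone Else"},
]
enrich(docs, load_rows(CSV_TEXT, "Name"), match_field="Author", columns={"Role": "role"})

# Filtering on the new role field, as in the Educator example.
print([d["title"] for d in docs if d.get("role") == "Educator"])  # ['Brochure']
```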
Item Transformers
Item Transformers are a series of plugins that are typically more complex and tailored to specific tasks. Examples include:
- Language Detection plugin
- I18n Translation plugin
Semantic Processing
Semantic Processing includes powerful features such as:
- Language detection
- Named Entity Recognition
Execution Phases in the Semantic Pipeline
The phases are executed in a fixed sequence, so a phase can only use metadata created by the phases before it:
- If you create metadata in the Entity Recognition phase, it will be available in the CSV Transformation phase.
- However, metadata created in the CSV Transformation phase cannot be used during Entity Recognition.
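The ordering rule can be modeled as a list of phases that each receive the output of the previous one. In this Python sketch the two phase functions are hypothetical stand-ins that simply add a fixed field; the point is which fields each phase can see when it runs.

```python
from typing import Any, Callable

Document = dict[str, Any]
Phase = Callable[[Document], Document]

def run_pipeline(doc: Document, phases: list[tuple[str, Phase]]) -> tuple[Document, dict[str, list[str]]]:
    """Run the phases in order and record which fields each phase could see when it started."""
    seen: dict[str, list[str]] = {}
    for name, phase in phases:
        seen[name] = sorted(doc)
        doc = phase(dict(doc))  # each phase receives the previous phase's output
    return doc, seen

def entity_recognition(doc: Document) -> Document:
    doc["ProductName"] = ["SM87A"]  # hypothetical stand-in for a real extraction
    return doc

def csv_transformation(doc: Document) -> Document:
    doc["role"] = "Educator"  # hypothetical stand-in for a CSV lookup
    return doc

result, seen = run_pipeline(
    {"Author": "Keenan Whitney"},
    [("Entity Recognition", entity_recognition), ("CSV Transformation", csv_transformation)],
)
print(seen["Entity Recognition"])  # ['Author']
print(seen["CSV Transformation"])  # ['Author', 'ProductName']
```

In this model, metadata only flows forward: the role field exists in the final result but was never visible to the Entity Recognition phase.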
Index Reinvert and Reindex
The Semantic Pipeline operates on documents in the index during the inversion phase. Every document goes through this phase when inserted or updated.
As a result, changes to the pipeline settings are not applied automatically to documents that were already in the index before the change. To apply new settings to all documents:
- Perform a reinvert, which processes all documents again.
- Alternatively, perform a reindex, which downloads all documents from the data source again (this takes longer).
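The difference between the two operations can be sketched with a toy Python model of an index that keeps each stored document alongside its inverted form. The class and its methods are hypothetical and are not based on how Mindbreeze actually stores data; they only show why a reinvert picks up new settings without contacting the data source, while a reindex fetches everything again.

```python
from typing import Any, Callable

Document = dict[str, Any]

class ToyIndex:
    """Hypothetical model: stores each fetched document plus its inverted (enriched) form."""

    def __init__(self, fetch: Callable[[], list[Document]],
                 pipeline: Callable[[Document], Document]) -> None:
        self.fetch = fetch        # downloads all documents from the data source
        self.pipeline = pipeline  # the pipeline settings currently in effect
        self.stored: list[Document] = []
        self.inverted: list[Document] = []

    def insert_all(self) -> None:
        # Every document passes through the pipeline when it is inserted.
        for doc in self.fetch():
            self.stored.append(doc)
            self.inverted.append(self.pipeline(dict(doc)))

    def reinvert(self) -> None:
        # Re-run the current pipeline over the stored documents; nothing is downloaded.
        self.inverted = [self.pipeline(dict(doc)) for doc in self.stored]

    def reindex(self) -> None:
        # Download everything again, then invert it (the slower option).
        self.stored, self.inverted = [], []
        self.insert_all()

fetch_calls = []

def fetch() -> list[Document]:
    fetch_calls.append(1)
    return [{"title": "Brochure"}]

index = ToyIndex(fetch, pipeline=lambda doc: doc)
index.insert_all()
index.pipeline = lambda doc: {**doc, "language": "en"}  # settings change after insertion
print(index.inverted[0])  # {'title': 'Brochure'}  (old settings still applied)
index.reinvert()
print(index.inverted[0])  # {'title': 'Brochure', 'language': 'en'}
print(len(fetch_calls))   # 1  (the reinvert did not contact the data source)
```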
A reinvert is also necessary when making metadata available for filtering or grouping results.
Advanced Filtering and Grouping
Only metadata that is aggregatable is available for advanced filtering, grouping, sorting, and mapping.