Version: v0.1.0

Search Test

In the Search tab of the Knowledge detail screen, you can search the collected document chunks directly to validate the quality of a Knowledge. Unlike AI Chat, Search Test does not use an LLM, so you see only the raw search engine results. This lets you check in advance whether the chunking strategy is appropriate and whether the desired information is retrieved correctly.

Retrieval quality strongly influences the quality of AI Chat answers: in the RAG pipeline, the LLM can generate accurate answers only if the retrieval stage returns good results. After collecting documents, run search tests before using AI Chat.


Search Modes

Three search modes are supported, each using a different search algorithm.

Mode | Engine | Description | Best For
VECTOR | Vector DB | Searches for semantically similar chunks based on cosine similarity between embedding vectors. | Conceptual questions, cases with many synonyms or similar expressions
TEXT | Text search engine (BM25) | Returns precise keyword matching results based on keyword frequency and document frequency. | Proper nouns, code names, exact term searches
HYBRID | Vector DB + Text search | Performs VECTOR and TEXT searches in parallel, then merges results with Reciprocal Rank Fusion (RRF). | Most general searches

Search Modes and Storage Targets

Available search modes depend on the storage targets set when creating the Knowledge. TEXT mode cannot be used for a Knowledge with only the VECTOR target configured, and vice versa. To use HYBRID mode, both VECTOR and TEXT targets must be enabled.
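The RRF merge that HYBRID mode is described as performing can be illustrated with a minimal sketch. This is not the product's implementation: the function name, the sample chunk IDs, and the constant k=60 (a commonly used default for RRF) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of chunk IDs into one fused ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant; 60 is a common choice and an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c3", "c1", "c7"]  # hypothetical VECTOR ranking
text_hits = ["c1", "c9", "c3"]    # hypothetical TEXT ranking
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print([chunk_id for chunk_id, _ in fused])
```

Because RRF uses only rank positions, it can combine cosine similarity and BM25 results even though their raw scores are on different scales. Chunks that appear in both lists (c1 and c3 here) rise above chunks found by only one engine.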


Search Parameters

The following parameters can be adjusted alongside the search query for fine-grained control over results.

Top-K

Specifies the maximum number of results to return.

Item | Value
Default | 10
Range | 1 to 100

Higher values return more results, but lower-ranked results tend to be less relevant.

Score Threshold

Sets the minimum similarity score (0.0 to 1.0) that a result must reach to be included. Results that score below the threshold are filtered out and not displayed.

Item | Value
Default | 0.5
Range | 0.0 to 1.0

Tip

Setting the Score Threshold too high can also filter out relevant results. Start in the 0.3 to 0.5 range and raise the threshold gradually if too many irrelevant results appear.
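The combined effect of Top-K and Score Threshold can be sketched as follows. The function and the sample data are hypothetical; only the defaults and ranges (Top-K 10, range 1 to 100; threshold 0.5, range 0.0 to 1.0) come from the tables above.

```python
def apply_search_parameters(scored_chunks, top_k=10, score_threshold=0.5):
    """Keep at most top_k chunks whose score reaches score_threshold.

    scored_chunks: list of (chunk_id, score) pairs, score assumed in 0.0..1.0.
    """
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0.0 and 1.0")
    # Drop results below the threshold, then keep the best top_k by score.
    kept = [(cid, score) for cid, score in scored_chunks if score >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


hits = [("a", 0.82), ("b", 0.41), ("c", 0.67), ("d", 0.55)]  # hypothetical scores
print(apply_search_parameters(hits, top_k=2, score_threshold=0.5))
```

With these sample scores, chunk b is removed by the threshold and chunk d is cut by Top-K, which shows why an empty or short result list can come from either parameter.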

Document Filter

Limits the search scope to specific documents. Select the checkboxes of the desired documents, and only chunks from those documents are searched. If no filter is set, all documents within the Knowledge are searched.
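The filter semantics described above, including the "no filter means all documents" rule, can be sketched like this. The chunk dictionaries and the document_id field are hypothetical names, not the product's data model.

```python
def filter_by_documents(chunks, selected_document_ids=None):
    """Return only the chunks that belong to the selected documents.

    An empty or missing selection means "no filter": every chunk is kept.
    """
    if not selected_document_ids:
        return list(chunks)
    selected = set(selected_document_ids)
    return [chunk for chunk in chunks if chunk["document_id"] in selected]


chunks = [
    {"chunk_id": "c1", "document_id": "doc-a"},  # hypothetical sample data
    {"chunk_id": "c2", "document_id": "doc-b"},
    {"chunk_id": "c3", "document_id": "doc-a"},
]
print([c["chunk_id"] for c in filter_by_documents(chunks, ["doc-a"])])
```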


Search Results Screen

When a search is executed, results are displayed sorted by relevance. Each result item includes the following information.

Result Item Structure

Item | Description
Rank Number | Rank by relevance (starting from 1)
Document Name | The name of the original document the chunk belongs to
Type Tag | Document source type (web crawling, file upload, manual input)
Similarity Score | Relevance score to the search query (percentage format, %)
Chunk Content | Text content of the matched chunk (search terms are highlighted)
Metadata | Additional information such as section name, Chunk ID
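For readers who want to reason about a result item as data, the structure above could be modeled roughly as follows. All field names are hypothetical, and the percentage formatting (a 0.0 to 1.0 score multiplied by 100 with one decimal place) is an assumption about how the displayed value relates to the underlying score.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResultItem:
    """One search result row; field names are illustrative only."""
    rank: int            # 1-based rank by relevance
    document_name: str   # original document the chunk belongs to
    source_type: str     # e.g. "web crawling", "file upload", "manual input"
    score: float         # assumed to be a similarity score in 0.0..1.0
    chunk_content: str   # text of the matched chunk
    metadata: dict = field(default_factory=dict)  # e.g. section name, chunk ID

    def score_percent(self) -> str:
        # Assumed display rule: score * 100, one decimal place.
        return f"{self.score * 100:.1f}%"


item = SearchResultItem(
    rank=1,
    document_name="install-guide.md",  # hypothetical document
    source_type="file upload",
    score=0.873,
    chunk_content="Run the installer and restart the service.",
    metadata={"section": "Installation", "chunk_id": "c1"},
)
print(item.rank, item.document_name, item.score_percent())
```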

Result Summary

The total result count is displayed at the top of search results. Results filtered by Score Threshold are not included in the total count.


Search Quality Improvement Guide

Try the following approaches when search test results don't meet expectations.

When Expected Results Don't Appear

Cause | Solution
Chunks are too large | Reduce chunk size for finer granularity
Chunks are too small | Increase chunk size to include sufficient context
Insufficient keyword matching | Switch to TEXT mode for precise keyword search
Insufficient semantic similarity | Switch to VECTOR mode for synonym/similar expression search
Relevant documents not collected | Collect additional documents containing the information

Comparing by Search Mode

Running the same query in all three modes and comparing the results helps you understand the search characteristics of the current Knowledge.

  1. VECTOR mode: Check if semantically related chunks are being retrieved correctly
  2. TEXT mode: Check if chunks containing key keywords are being retrieved without omissions
  3. HYBRID mode: Check if results from both modes are properly merged and showing optimal rankings

Tip

Use search testing to verify whether your chunking strategy is appropriate. If expected results don't appear, refer to the Chunking and Options document to adjust chunking settings.
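If you record the chunk IDs returned by each mode for the same query, the three-mode comparison described above can be summarized with a small helper. The function name and the sample rankings are hypothetical; the sketch only reports which chunks all modes share and which appear in a single mode.

```python
def compare_modes(results_by_mode):
    """Summarize overlap between modes.

    results_by_mode: dict mapping a mode name to a best-first list of chunk IDs.
    Returns (chunks found by every mode, dict of chunks unique to each mode).
    """
    sets = {mode: set(ids) for mode, ids in results_by_mode.items()}
    common = set.intersection(*sets.values()) if sets else set()
    only_in = {}
    for mode, ids in sets.items():
        # Union of every other mode's results.
        others = set().union(*(s for m, s in sets.items() if m != mode))
        only_in[mode] = ids - others
    return common, only_in


results = {  # hypothetical results for one query
    "VECTOR": ["c3", "c1", "c7"],
    "TEXT": ["c1", "c9", "c3"],
    "HYBRID": ["c1", "c3", "c9"],
}
common, only_in = compare_modes(results)
print(sorted(common), {mode: sorted(ids) for mode, ids in only_in.items()})
```

A chunk that appears only in VECTOR results (c7 here) suggests a semantic match with no keyword overlap, while a chunk found only by TEXT would suggest the opposite.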

Score Threshold Setting Guidelines

Setting the Score Threshold too high may filter out relevant results, so starting in the 0.3 to 0.5 range is recommended. Appropriate thresholds can differ by mode: in VECTOR mode, cosine similarity scores tend to cluster between 0.5 and 0.9, while in TEXT mode the normalized BM25 scores may follow a different distribution.
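One way to choose a per-mode threshold is to count how many results would survive at several candidate values. The sketch below assumes you have noted the scores from a search test by hand; the score lists and the candidate thresholds are invented for illustration.

```python
def count_surviving(scores, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return {threshold: number of scores at or above that threshold}."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


vector_scores = [0.88, 0.74, 0.69, 0.58, 0.52]  # hypothetical VECTOR scores
text_scores = [0.95, 0.48, 0.36, 0.22, 0.10]    # hypothetical TEXT scores

print("VECTOR", count_surviving(vector_scores))
print("TEXT  ", count_surviving(text_scores))
```

In this made-up data, a threshold of 0.5 keeps every VECTOR result but only one TEXT result, which illustrates why a single threshold may not suit both modes.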


Relationship Between Search Test and AI Chat

Search test results are directly linked to AI Chat answer quality.

Search Test Result | Impact on AI Chat
Relevant chunks rank at the top | LLM receives accurate context and generates quality answers
Relevant chunks are not retrieved | LLM answers without context, resulting in inaccurate or "information not found" responses
Irrelevant chunks rank at the top | LLM may reference wrong context and generate incorrect answers

Warning

If answer quality is poor in AI Chat, check whether retrieval stage results are adequate through search testing before changing LLM settings. Most RAG quality issues originate from the retrieval stage.
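To make the retrieval check repeatable, you can keep a small set of test queries with the chunks you expect to see and measure how often at least one expected chunk appears in the top results. The metric below (hit rate at k) is a common retrieval measure, not a feature of the product; the function name and test cases are hypothetical.

```python
def hit_rate_at_k(test_cases, k=5):
    """Fraction of queries with at least one expected chunk in the top k.

    test_cases: list of (retrieved_ids, expected_ids) pairs, where
    retrieved_ids is a best-first list noted from a search test.
    """
    if not test_cases:
        return 0.0
    hits = sum(
        1 for retrieved, expected in test_cases
        if set(retrieved[:k]) & set(expected)
    )
    return hits / len(test_cases)


cases = [  # hypothetical search test outcomes
    (["c1", "c4", "c9"], ["c1"]),        # expected chunk ranked first
    (["c7", "c2", "c5"], ["c5", "c6"]),  # expected chunk ranked third
    (["c8", "c3", "c0"], ["c2"]),        # expected chunk not retrieved
]
print(f"hit rate@3: {hit_rate_at_k(cases, k=3):.2f}")
```

A low hit rate points to the retrieval stage (chunking, search mode, or threshold) rather than the LLM settings as the place to start.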


Next Steps

  • AI Chat: Once search quality is confirmed, put the Knowledge to use through RAG-based AI chat
  • Chunking and Options: Adjust the chunking strategies that directly affect search quality
  • Settings: Change storage targets and embedding models