Search Test
In the Search tab of the Knowledge detail screen, you can search the collected document chunks directly to validate the Knowledge's quality. Unlike AI Chat, Search does not use an LLM, so you see only the raw search engine results. This lets you check in advance whether the chunking strategy is appropriate and whether the desired information is retrieved correctly.
Search test results are an indirect indicator of AI Chat answer quality. In the RAG pipeline, the LLM can generate accurate answers only if the retrieval stage returns good results. It is therefore recommended to run search tests after collecting documents and before using AI Chat.
Search Modes
Three search modes are supported, each using a different search algorithm.
| Mode | Engine | Description | Best For |
|---|---|---|---|
| VECTOR | Vector DB | Searches for semantically similar chunks based on cosine similarity between embedding vectors. | Conceptual questions, cases with many synonyms/similar expressions |
| TEXT | Text search engine (BM25) | Ranks chunks by exact keyword matches, scored using term frequency and inverse document frequency. | Proper nouns, code names, exact term searches |
| HYBRID | Vector DB + Text search | Performs VECTOR and TEXT searches in parallel, then merges results with Reciprocal Rank Fusion (RRF). | Most general searches |
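The RRF merge used by HYBRID mode can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name, the sample chunk IDs, and the constant `k = 60` (the value commonly used in the RRF literature) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is an assumed constant, not a documented
    product setting.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c3", "c1", "c7"]  # hypothetical VECTOR ranking
text_hits = ["c1", "c9", "c3"]    # hypothetical TEXT ranking
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF uses only rank positions, cosine similarity and BM25 scores do not need to be on the same scale before merging. A chunk that ranks high in both lists, like `c1` here, outranks one that is first in only one list.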
Available search modes depend on the storage targets set when creating the Knowledge. TEXT mode cannot be used for a Knowledge with only the VECTOR target configured, and vice versa. To use HYBRID mode, both VECTOR and TEXT targets must be enabled.
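The availability rule can be expressed as a small check. `available_modes` is a hypothetical helper for illustration only; the Knowledge screen applies this rule for you.

```python
def available_modes(vector_enabled: bool, text_enabled: bool) -> set:
    """Return the search modes a Knowledge supports, given its storage targets."""
    modes = set()
    if vector_enabled:
        modes.add("VECTOR")
    if text_enabled:
        modes.add("TEXT")
    if vector_enabled and text_enabled:
        modes.add("HYBRID")  # HYBRID needs both engines
    return modes


print(sorted(available_modes(vector_enabled=True, text_enabled=False)))  # ['VECTOR']
```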
Search Parameters
The following parameters can be adjusted alongside the search query for fine-grained control over results.
Top-K
Specifies the maximum number of results to return.
| Item | Value |
|---|---|
| Default | 10 |
| Range | 1 ~ 100 |
Higher values show more results, but lower-ranked results tend to be less relevant.
Score Threshold
Sets the minimum similarity score (0.0 ~ 1.0) for results to be included. Results below this score are filtered out and not displayed.
| Item | Value |
|---|---|
| Default | 0.5 |
| Range | 0.0 ~ 1.0 |
Setting the Score Threshold too high may also filter out relevant results. Start with a value in the 0.3~0.5 range and raise it gradually if too many irrelevant results appear.
Document Filter
Limits the search scope to specific documents. Select the desired documents, and only chunks from those documents are searched. If no filter is set, all documents within the Knowledge are searched.
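To see how the three parameters interact, the sketch below models them as a hypothetical request object and applies them to a list of scored hits. The class and function names and the `(document_id, chunk_id, score)` tuple layout are assumptions, and so is the order of operations: here the document filter and Score Threshold are applied before Top-K truncation.

```python
from dataclasses import dataclass, field


@dataclass
class SearchParams:
    """Hypothetical request model mirroring the documented parameters."""
    query: str
    top_k: int = 10               # documented default; range 1 ~ 100
    score_threshold: float = 0.5  # documented default; range 0.0 ~ 1.0
    document_ids: set = field(default_factory=set)  # empty = all documents

    def __post_init__(self):
        # Reject values outside the documented ranges.
        if not 1 <= self.top_k <= 100:
            raise ValueError("top_k must be between 1 and 100")
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0.0 and 1.0")


def apply_params(hits, params):
    """Filter, sort, and truncate (document_id, chunk_id, score) tuples.

    Assumption: document filter and threshold run before Top-K truncation.
    """
    kept = [
        hit for hit in hits
        if (not params.document_ids or hit[0] in params.document_ids)
        and hit[2] >= params.score_threshold
    ]
    kept.sort(key=lambda hit: hit[2], reverse=True)
    return kept[: params.top_k]


hits = [
    ("doc-a", "c1", 0.82),
    ("doc-b", "c2", 0.41),
    ("doc-a", "c3", 0.67),
    ("doc-c", "c4", 0.90),
]
params = SearchParams(query="refund policy", top_k=2,
                      document_ids={"doc-a", "doc-b"})
print(apply_params(hits, params))
# [('doc-a', 'c1', 0.82), ('doc-a', 'c3', 0.67)]
```

In this made-up example, `c4` is excluded by the document filter and `c2` by the default 0.5 threshold.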
Search Results Screen
When a search is executed, results are displayed sorted by relevance. Each result item includes the following information.
Result Item Structure
| Item | Description |
|---|---|
| Rank Number | Rank by relevance (starting from 1) |
| Document Name | The name of the original document the chunk belongs to |
| Type Tag | Document source type (web crawling, file upload, manual input) |
| Similarity Score | Relevance score to the search query (percentage format, %) |
| Chunk Content | Text content of the matched chunk (search terms are highlighted) |
| Metadata | Additional information such as section name, Chunk ID |
Result Summary
The total result count is displayed at the top of search results. Results filtered by Score Threshold are not included in the total count.
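The result items and summary count can be modeled as below. `ResultItem` and `build_results` are hypothetical names, the input is assumed to be a list of dicts with raw scores between 0.0 and 1.0, the sample data is made up, and the one-decimal percentage format is an assumption about how the score is displayed.

```python
from dataclasses import dataclass, field


@dataclass
class ResultItem:
    """Hypothetical shape of one row on the Search Results screen."""
    rank: int           # starts from 1
    document_name: str
    type_tag: str       # web crawling, file upload, or manual input
    score: float        # raw similarity score, 0.0 ~ 1.0
    content: str
    metadata: dict = field(default_factory=dict)

    def score_label(self) -> str:
        # Display the score as a percentage with one decimal place.
        return f"{self.score * 100:.1f}%"


def build_results(chunks, threshold):
    """Drop chunks below the threshold, rank the rest, and count them."""
    kept = sorted(
        (c for c in chunks if c["score"] >= threshold),
        key=lambda c: c["score"],
        reverse=True,
    )
    items = [
        ResultItem(rank, c["document_name"], c["type_tag"], c["score"],
                   c["content"], c.get("metadata", {}))
        for rank, c in enumerate(kept, start=1)
    ]
    # The summary count includes only results that passed the threshold.
    return len(items), items


total, items = build_results(
    [
        {"document_name": "FAQ", "type_tag": "file upload",
         "score": 0.874, "content": "Refunds are processed within 7 days."},
        {"document_name": "Blog", "type_tag": "web crawling",
         "score": 0.31, "content": "Our team offsite recap."},
    ],
    threshold=0.5,
)
print(total, items[0].rank, items[0].score_label())  # 1 1 87.4%
```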
Search Quality Improvement Guide
Try the following approaches when search test results don't meet expectations.
When Expected Results Don't Appear
| Cause | Solution |
|---|---|
| Chunks are too large | Reduce chunk size for finer granularity |
| Chunks are too small | Increase chunk size to include sufficient context |
| Insufficient keyword matching | Switch to TEXT mode for precise keyword search |
| Insufficient semantic similarity | Switch to VECTOR mode for synonym/similar expression search |
| Relevant documents not collected | Collect additional documents containing the information |
Comparing by Search Mode
Running the same query in all three modes and comparing results helps understand the search characteristics of the current Knowledge.
- VECTOR mode: Check if semantically related chunks are being retrieved correctly
- TEXT mode: Check if chunks containing key keywords are being retrieved without omissions
- HYBRID mode: Check if results from both modes are properly merged and showing optimal rankings
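When comparing modes, it helps to note where the chunks you expect to see land in each ranking. This sketch assumes you have copied the ranked Chunk IDs from each mode's results (shown in each result item's Metadata) into plain lists; `compare_modes` is a hypothetical helper, the IDs are made up, and a rank of `None` means the chunk was not returned.

```python
def compare_modes(results_by_mode, expected_ids):
    """For each mode, give the 1-based rank of each expected chunk, or None."""
    report = {}
    for mode, ranking in results_by_mode.items():
        positions = {cid: rank for rank, cid in enumerate(ranking, start=1)}
        report[mode] = {cid: positions.get(cid) for cid in expected_ids}
    return report


report = compare_modes(
    {
        "VECTOR": ["c3", "c1", "c7"],
        "TEXT": ["c1", "c9"],
        "HYBRID": ["c1", "c3", "c9", "c7"],
    },
    expected_ids=["c1", "c7"],
)
print(report)
# {'VECTOR': {'c1': 2, 'c7': 3}, 'TEXT': {'c1': 1, 'c7': None},
#  'HYBRID': {'c1': 1, 'c7': 4}}
```

Here TEXT mode misses `c7` entirely, which may indicate that `c7` matches the query semantically but does not contain its keywords.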
Use search testing to verify whether your chunking strategy is appropriate. If expected results don't appear, refer to the Chunking and Options document to adjust chunking settings.
Appropriate Score Threshold values may differ by mode. VECTOR mode scores tend to concentrate between 0.5~0.9 because of how cosine similarity behaves, while TEXT mode uses normalized BM25 scores and may show a different distribution.
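One way to choose a threshold per mode is to look at the spread of scores each mode produces for the same query, for example by running it with Score Threshold set to 0.0 and recording each result's Similarity Score (divided by 100 to return to the 0.0 ~ 1.0 scale). The helper below is a hypothetical sketch using only the standard library; the sample scores are made up.

```python
import statistics


def score_summary(scores):
    """Return min, median, and max of one mode's scores (0.0 ~ 1.0 scale)."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "min": min(scores),
        "median": statistics.median(scores),
        "max": max(scores),
    }


# Illustrative, made-up scores for the same query in two modes.
vector_scores = [0.62, 0.71, 0.78, 0.85, 0.88]
text_scores = [0.12, 0.25, 0.40, 0.61, 0.93]
for mode, scores in [("VECTOR", vector_scores), ("TEXT", text_scores)]:
    print(mode, score_summary(scores))
```

With these made-up numbers, a threshold of 0.5 keeps all five VECTOR results but only two TEXT results, so the same value filters the two modes very differently.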
Relationship Between Search Test and AI Chat
Search test results are directly linked to AI Chat answer quality.
| Search Test Result | Impact on AI Chat |
|---|---|
| Relevant chunks rank at the top | LLM receives accurate context and generates quality answers |
| Relevant chunks are not retrieved | LLM answers without context, resulting in inaccurate or "information not found" responses |
| Irrelevant chunks rank at the top | LLM may reference wrong context and generate incorrect answers |
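To apply the table above consistently across many test queries, each outcome can be classified automatically. This sketch is hypothetical: it assumes you know which Chunk IDs are relevant to each query, and it treats the first three results (`top_n=3`) as "the top", which is an arbitrary cutoff.

```python
def classify_outcome(ranked_ids, relevant_ids, top_n=3):
    """Map one query's ranked Chunk IDs to a row of the Search Test Result table."""
    if any(cid in relevant_ids for cid in ranked_ids[:top_n]):
        return "Relevant chunks rank at the top"
    if not any(cid in relevant_ids for cid in ranked_ids):
        return "Relevant chunks are not retrieved"
    # Relevant chunks exist, but only below the cutoff.
    return "Irrelevant chunks rank at the top"


print(classify_outcome(["c5", "c6", "c7", "c1"], relevant_ids={"c1"}))
# Irrelevant chunks rank at the top
```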
If AI Chat answers are poor, use search testing to check whether the retrieval stage returns adequate results before changing LLM settings. Most RAG quality issues originate in the retrieval stage.
Next Steps
- AI Chat — Once search quality is confirmed, utilize knowledge through RAG-based AI chat
- Chunking and Options — Adjust chunking strategies that directly affect search quality
- Settings — Change storage targets and embedding models