Hybrid Search - Re-ranking and Blending Searches
On this page
Hybrid Search combines multiple search methods in one query.
Example 1 provides a simple example of how hybrid search can combine full-text search, vector search, and a SQL filter.
Example 1 - Hybrid Search of Comments
This example demonstrates a hybrid search over a comments table.
Consider a table of comments that contains:
-
id
– anINT
id -
comment
– aTEXT
field which stores the text of the comment -
comment_
– aembedding VECTOR
, which stores a vector embedding that captures the meaning of the comment as a vector, as described in Working with Vector Data. -
category
– aVARCHAR
which stores a category assigned to the comment
Create the comments
table and insert three rows into that table.
CREATE TABLE comments(id INT,comment TEXT,comment_embedding VECTOR(4) not null,category VARCHAR(256));
INSERT INTO comments VALUES(1, "The cafeteria in building 35 has a great salad bar",'[0.45, 0.55, 0.495, 0.5]',"Food"),(2, "I love the taco bar in the B16 cafeteria.",'[0.01111, 0.01111, 0.1, 0.999]',"Food"),(3, "The B24 restaurant salad bar is quite good.",'[0.1, 0.8, 0.2, 0.555]',"Food");
Add a full-text index to the table and run OPTIMIZE TABLE
to ensure the values above are indexed.
ALTER TABLE comments ADD FULLTEXT ft_comment(comment);OPTIMIZE TABLE comments FULL;
The SQL query below first creates a query vector (@query_
).
The @query_
is cast from a VECTOR
to a BLOB
for performance reasons.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4):>BLOB;SELECT id, SUBSTRING(comment,0,20) AS comment,FORMAT(@query_vec <*> comment_embedding,4) AS vec_score,FORMAT(MATCH(comment) AGAINST ("restaurant"),4) AS ft_score,FORMAT((vec_score + ft_score) / 2,4) AS combined_scoreFROM commentsWHERE category = "Food"ORDER BY combined_score DESCLIMIT 2;
+------+----------------------+-----------+----------+----------------+
| id | comment | vec_score | ft_score | combined_score |
+------+----------------------+-----------+----------+----------------+
| 3 | The B24 restaurant s | 0.8993 | 0.1151 | 0.5072 |
| 1 | The cafeteria in bui | 0.9810 | 0.0000 | 0.4905 |
+------+----------------------+-----------+----------+----------------+
In the results, the row with id 3 has the word "restaurant" in it, so it gets a boost from the full-text score in this query and is the top-ranked result.
Also note that in this query, the comment string and the scores have been truncated for readability of the result.
The availability of vector search, full-text search, and standard SQL operations in SingleStore allows you to combine them as shown above.
Example 2 - Hybrid Search of Wikipedia Articles
SingleStore provides full-text search (Working with Full-Text Search) and Vector Indexing which can be used to speed up a hybrid search.
SingleStore has created a dataset of Wikipedia articles.
The example below uses just the data for video games, which is approximately 41,000 vectors based on 1,800 articles from Wikipedia.
Create a table to hold the vectors and paragraphs.
CREATE TABLE vecs(id BIGINT(20),url TEXT DEFAULT NULL,paragraph TEXT DEFAULT NULL,v VECTOR(1536) NOT NULL,SHARD KEY(id), KEY(id) USING HASH);
Load data into this table using the following pipeline.
-- since the bucket is open, you can leave the credentials clause as-isCREATE OR REPLACE PIPELINE wiki_pipeline ASload data S3's3://wikipedia-video-game-data/video-game-embeddings(1).csv'config '{"region":"us-west-1"}'credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'SKIP DUPLICATE KEY ERRORSINTO TABLE vecsFORMAT csvFIELDS TERMINATED BY ','ENCLOSED BY '"'LINES TERMINATED BY '\r\n';START PIPELINE wiki_pipeline FOREGROUND;
Run full-text and vector indexes on the table.
ALTER TABLE vecs ADD FULLTEXT ft_para(paragraph);ALTER TABLE vecs ADD VECTOR INDEX ivf_v(v)INDEX_OPTIONS '{"index_type":"IVF_FLAT"}';
Run the following command to optimize the index searches and to ensure the values above are indexed.
OPTIMIZE TABLE vecs FULL;
A hybrid search query that uses both full-text and vector indexes can be constructed this way:
-
Create a subquery that uses vector search
-
Create a subquery that uses keyword search for similar content
-
Join the two with a full outer join
-
Produce a final result that combines the vector search score and full-text search score, and re-ranks
The query below first creates a query vector (@v_
) and then follows this outline and does an indexed hybrid vector and full-text search.
/* Get the vector for the first paragraph about the Mario Kart Game.It's a good semantic query vector for Mario Kart. */SET @v_mario_kart = (SELECT v FROM vecsWHERE URL = "https://en.wikipedia.org/wiki/Super_Mario_Kart"ORDER BY id LIMIT 1);WITH fts AS (SELECT id, paragraph,MATCH(paragraph) AGAINST("mario kart") AS SCOREFROM vecsWHERE MATCH(paragraph) AGAINST("mario kart")ORDER BY SCORE descLIMIT 200),vs AS (SELECT id, paragraph, v <*> @v_mario_kart AS SCOREFROM vecsORDER BY score DESCLIMIT 200)SELECT vs.id, SUBSTRING(vs.paragraph,0,25),FORMAT(IFNULL(fts.score, 0) * .3+ IFNULL(vs.score, 0) * .7, 4) AS score,FORMAT(fts.score, 4) AS fts_s,FORMAT(vs.score, 4) AS vs_sFROM fts FULL OUTER JOIN vs ON fts.id = vs.idORDER BY score DESCLIMIT 5;
+------------------+------------------------------+--------+--------+--------+
| id | SUBSTRING(vs.paragraph,0,25) | score | fts_s | vs_s |
+------------------+------------------------------+--------+--------+--------+
| 1125899906848349 | Super Mario Kart is a kar | 0.9943 | 0.9811 | 1.0000 |
| 1125899906848372 | Nintendo re-released Supe | 0.9353 | 1.0000 | 0.9075 |
| 2251799813690085 | Mario Kart DS was the bes | 0.9239 | 1.0000 | 0.8912 |
| 1125899906848375 | Several sequels to Super | 0.9144 | 0.8918 | 0.9240 |
| 1125899906855047 | Mario Kart: Super Circuit | 0.9130 | 0.9521 | 0.8962 |
+------------------+------------------------------+--------+--------+--------+
The final ranked result (by score
) is different from the order for the full-text score (fts_
) or the vector similarity score (vs_
).
SQL is used here in a relatively compact way.
For more details on this example and performance results see our blog on ANN search.
Related Topics
Last modified: April 26, 2024