Common Usage of the Milvus Vector Database

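Milvus is an open-source vector database for large-scale similarity search and analytics. It is designed for efficient vector retrieval and fast data queries, and it is used in areas such as image and video recognition, natural language processing, and recommender systems.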

luxinfeng666 · Published 2023-07-02 22:14:34

Connecting and disconnecting the client

from pymilvus import connections
# Open a connection
connections.connect(
  alias="default",
  user='username',
  password='password',
  host='localhost',
  port='19530'
)

# Close the connection
connections.disconnect("default")

Managing collections

Creating a collection

from pymilvus import CollectionSchema, FieldSchema, DataType, Collection

# Define the fields of the collection
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=1024)
]
# Create the collection
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema)

Main parameters:

Parameter	Description	Option
using (optional)	By specifying the server alias here, you can choose in which Milvus server you create a collection.	N/A
shards_num (optional)	Number of the shards for the collection to create.	[1,16]
num_partitions (optional)	Number of logical partitions for the collection to create.	[1,4096]
**kwargs: collection.ttl.seconds (optional)	Collection time to live (TTL) is the expiration time of a collection. Data in an expired collection will be cleaned up and will not be involved in searches or queries. Specify TTL in the unit of seconds.	The value should be 0 or greater. 0 means TTL is disabled.
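
The ranges in the table can be checked on the client before a collection is created. The following is a minimal sketch, not part of pymilvus: `collection_options` is a hypothetical helper, and passing the TTL through a `properties` dictionary is an assumption modelled on the `set_properties` call used later in this article, so verify it against your pymilvus version.

```python
def collection_options(shards_num=None, num_partitions=None, ttl_seconds=None):
    """Validate optional settings against the documented ranges and
    return them as a dict of keyword arguments (hypothetical helper)."""
    options = {}
    if shards_num is not None:
        if not 1 <= shards_num <= 16:
            raise ValueError("shards_num must be in [1, 16]")
        options["shards_num"] = shards_num
    if num_partitions is not None:
        if not 1 <= num_partitions <= 4096:
            raise ValueError("num_partitions must be in [1, 4096]")
        options["num_partitions"] = num_partitions
    if ttl_seconds is not None:
        if ttl_seconds < 0:
            raise ValueError("collection.ttl.seconds must be 0 or greater")
        # Assumption: the TTL travels in a `properties` dict.
        options["properties"] = {"collection.ttl.seconds": ttl_seconds}
    return options


options = collection_options(shards_num=2, ttl_seconds=1800)
print(options)
```

The resulting dictionary could then be unpacked into the constructor, for example `Collection("hello_milvus", schema, using="default", **options)`, assuming the installed client accepts these keyword arguments.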

Renaming a collection

from pymilvus import utility
utility.rename_collection("old_collection", "new_collection") # Output: True

Modifying collection properties

collection.set_properties(properties={"collection.ttl.seconds": 1800})

Reading collection attributes

from pymilvus import Collection
collection = Collection("book")  # Get an existing collection.

collection.schema                # Return the schema.CollectionSchema of the collection.
collection.description           # Return the description of the collection.
collection.name                  # Return the name of the collection.
collection.is_empty              # Return the boolean value that indicates if the collection is empty.
collection.num_entities          # Return the number of entities in the collection.
collection.primary_field         # Return the schema.FieldSchema of the primary key field.
collection.partitions            # Return the list[Partition] object.
collection.indexes               # Return the list[Index] object.
collection.properties            # Return the expiration time of data in the collection.

Dropping a collection (all data in the collection is deleted)

from pymilvus import utility
utility.drop_collection("book")

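Managing partitions

Partitions make it possible to organize and query data more efficiently: data can be inserted into a specific partition, and a query can then load and search only that partition, which improves query efficiency and reduces resource usage.

Creating a partition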

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.create_partition("novel")

Checking whether a partition exists

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.has_partition("novel")

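Dropping a partition (release it first, then drop it)

from pymilvus import Collection
collection = Collection("book")           # Get an existing collection.
collection.partition("novel").release()   # Release the partition before dropping it.
collection.drop_partition("novel")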

Loading a partition

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.load(["novel"], replica_number=2)

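# Alternatively, load through a Partition object.
from pymilvus import Collection, Partition
collection = Collection("book")             # Get an existing collection.
partition = Partition(collection, "novel")  # Get an existing partition.
partition.load(replica_number=2)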

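Releasing a partition

from pymilvus import Collection, Partition
collection = Collection("book")             # Get an existing collection.
partition = Partition(collection, "novel")  # Get an existing partition.
partition.release()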

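Managing data

Inserting data

import random
from pymilvus import Collection

# Column-based data: one list per field, in schema order.
data = [
  [i for i in range(2000)],                                   # integer primary keys
  [str(i) for i in range(2000)],                              # string field
  [i for i in range(10000, 12000)],                           # integer field
  [[random.random() for _ in range(2)] for _ in range(2000)]  # 2-dimensional vectors
]

# Extra column for a fifth field; the collection schema must be able to accept it.
data.append(["dy" * i for i in range(2000)])

collection = Collection("book")      # Get an existing collection.
mr = collection.insert(data)
collection.flush()                   # Flush so the inserted data is persisted.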
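
The insert call above takes column-based data, meaning one list per field rather than one record per entity. When the application holds its records as rows, they need to be transposed first. The sketch below shows one way to do that; `rows_to_columns` is a hypothetical helper, and the three field names are only assumed to match the schema of the "book" collection.

```python
def rows_to_columns(rows, field_order):
    """Turn a list of row dicts into one list per field, in the given order."""
    return [[row[name] for row in rows] for name in field_order]


rows = [
    {"book_id": 0, "book_name": "0", "book_intro": [0.1, 0.2]},
    {"book_id": 1, "book_name": "1", "book_intro": [0.3, 0.4]},
]
columns = rows_to_columns(rows, ["book_id", "book_name", "book_intro"])
print(columns)
```

The resulting `columns` list has the same shape as `data` in the snippet above, so it could be passed to `collection.insert(columns)` if the field order matches the schema.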

Deleting data

expr = "book_id in [0,1]"
from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.delete(expr)
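
The delete expression is an ordinary string, so it can be built from a list of primary keys. This is a small sketch with a hypothetical helper, `build_in_expr`, that only handles integer keys; it is not part of pymilvus.

```python
def build_in_expr(field_name, values):
    """Build an expression such as 'book_id in [0,1]' from integer keys."""
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("only integer keys are supported in this sketch")
    return f"{field_name} in [{','.join(str(v) for v in values)}]"


expr = build_in_expr("book_id", [0, 1])
print(expr)
```

The string produced here is the same as the `expr` used in the delete call above and could be passed to `collection.delete(expr)`.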

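Managing indexes

A vector index is an organizational unit of metadata used to accelerate vector similarity search. Without an index built on the vectors, Milvus performs a brute-force search by default.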
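
To make the term concrete, a brute-force search compares the query with every stored vector and keeps the closest ones. The following pure-Python sketch illustrates the idea with Euclidean (L2) distance; it is an illustration only and says nothing about how Milvus implements it internally.

```python
import math


def brute_force_search(query, vectors, limit):
    """Return (index, L2 distance) pairs for the `limit` closest vectors."""
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:limit]


vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.5, 0.5]]
hits = brute_force_search([0.1, 0.2], vectors, limit=2)
print(hits)
```

Every query touches every vector, so the cost grows linearly with the size of the collection. An index such as IVF_FLAT reduces the number of vectors that have to be compared.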

Creating a vector index

index_params = {
  "metric_type":"L2",
  "index_type":"IVF_FLAT",
  "params":{"nlist":1024}
}

from pymilvus import Collection, utility
collection = Collection("book")      
collection.create_index(
  field_name="book_intro", 
  index_params=index_params
)

utility.index_building_progress("book")

Creating a scalar index

A scalar index needs neither an index type nor index parameters; it can be created directly.

from pymilvus import Collection

collection = Collection("book")   
collection.create_index(
  field_name="book_name", 
  index_name="scalar_index",
)
collection.load()

Dropping an index

Dropping the index removes all of the index files for the collection.

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.drop_index()

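Search and query

Vector similarity search

A vector similarity search in Milvus computes the distance between the query vector and the vectors in the collection using the specified similarity metric, and returns the most similar results.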

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.load()                    # Load the collection into memory before searching.

search_params = {"metric_type": "L2", "params": {"nprobe": 10}, "offset": 5}

results = collection.search(
    data=[[0.1, 0.2]],
    anns_field="book_intro",
    param=search_params,
    limit=10,
    expr=None,
    # Set the names of the fields you want to retrieve from the search result.
    output_fields=['title'],
    consistency_level="Strong"
)

results[0].ids          # IDs of the hits for the first query vector.

results[0].distances    # Distances of those hits.

hit = results[0][0]     # The top hit.
hit.entity.get('title')

# After the search, release the collection loaded in Milvus to reduce memory consumption.
collection.release()

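Key search parameters

Parameter	Description
data	Vectors to search with.
anns_field	Name of the field to search on.
param	Search parameters specific to the index. For details, see https://milvus.io/docs/index.md
offset	Number of results to skip in the returned set. The sum of this value and "limit" should be less than 16384.
limit	Number of the most similar results to return. The sum of this value and "offset" should be less than 16384.
expr	Boolean expression used to filter attributes. For details, see https://milvus.io/docs/boolean.md
partition_names (optional)	List of names of the partitions to search in.
output_fields (optional)	Names of the fields to return. Vector fields are not supported in the current version.
timeout (optional)	Duration in seconds to allow for the RPC. When set to None, the client waits until the server responds or an error occurs.
round_decimal (optional)	Number of decimal places in the returned distances.
consistency_level (optional)	Consistency level of the search.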
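
The constraint on "offset" and "limit" matters when results are paged. The sketch below turns a zero-based page number into an offset and limit pair and rejects windows that break the bound from the table. `MAX_WINDOW`, `check_window`, and `page_window` are hypothetical names introduced for illustration.

```python
MAX_WINDOW = 16384  # bound quoted in the parameter table


def check_window(offset, limit):
    """Raise ValueError unless offset + limit stays below MAX_WINDOW."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    if offset + limit >= MAX_WINDOW:
        raise ValueError("offset + limit should be less than 16384")
    return offset, limit


def page_window(page, page_size):
    """Translate a zero-based page number into an (offset, limit) pair."""
    return check_window(page * page_size, page_size)


offset, limit = page_window(2, 10)
print(offset, limit)
```

In the search example above, the offset goes into the search parameters dictionary and the limit is passed as the `limit` argument, so the pair could be applied as `search_params["offset"] = offset` and `limit=limit`.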

Scalar-filtered query

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.load()

res = collection.query(
  expr = "book_id in [2,4,6,8]",
  offset = 0,
  limit = 10, 
  output_fields = ["book_id", "book_intro"],
  consistency_level="Strong"
)
