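Learning spring-ai-alibaba, Jmanus (Part 2): Operating the Browser
This article walks through the implementation of the BrowserUseTool class, a browser automation tool built on Playwright. The class extends AbstractBaseTool and implements the ToolCallBiFunctionDef interface. Its core functionality consists of the run method, which executes operations, and getCurrentToolStateString, which reports state. It supports 15 browser operations, such as navigating, clicking, entering text, and taking screenshots. DriverWrapper is used to wrap Playwright…
The previous article analyzed Jmanus's /api/executor/executeByToolNameAsync endpoint, which is mainly used to execute business logic asynchronously.
The result is queried through the /api/executor/details/{planId} endpoint, which mainly looks up the execution result in the plan_execution_record table.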
BrowserUseTool
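The example in the previous article, which looked up Alibaba's stock price, relied mainly on the browser tool, that is, BrowserUseTool.
This class extends AbstractBaseTool, and AbstractBaseTool implements the ToolCallBiFunctionDef interface.
BrowserUseTool has two important methods: run and getCurrentToolStateString.
- run: corresponds to the "act" step of the ReAct pattern and performs the concrete operation;
- getCurrentToolStateString: corresponds to the "observe" step of the ReAct pattern and retrieves the result after the action has run.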
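To make the act/observe split concrete, here is a minimal sketch. The `ReActTool` interface and `FakeBrowserTool` class are hypothetical stand-ins invented for illustration; they are not the real ToolCallBiFunctionDef or BrowserUseTool signatures, and no real browser is involved.

```java
import java.util.Map;

// Hypothetical stand-in for the tool contract described above (not the real JManus interface).
interface ReActTool {
    String run(Map<String, Object> input);   // "act": perform the operation
    String getCurrentToolStateString();       // "observe": report the resulting state
}

// Toy tool that only remembers the last URL it was asked to open.
final class FakeBrowserTool implements ReActTool {
    private String currentUrl = "about:blank";

    @Override
    public String run(Map<String, Object> input) {
        if ("navigate".equals(input.get("action"))) {
            currentUrl = String.valueOf(input.get("url"));
            return "navigated to " + currentUrl;
        }
        return "unsupported action: " + input.get("action");
    }

    @Override
    public String getCurrentToolStateString() {
        return "current url: " + currentUrl;
    }
}
```

An agent loop would call `run` with the model's chosen action and then feed `getCurrentToolStateString()` back to the model as the observation for the next step.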
Method logic
run
Depending on the action, execution enters a different branch. The supported actions are listed below.
Supported operations include:
- 'navigate': Visit specific URL
- 'click': Click element by index
- 'input_text': Input text in element
- 'key_enter': Press Enter key
- 'screenshot': Capture screenshot
- 'get_html': Get HTML content of current page
- 'get_text': Get text content of current page
- 'execute_js': Execute JavaScript code
- 'scroll': Scroll page up/down
- 'refresh': Refresh current page
- 'new_tab': Open new tab
- 'close_tab': Close current tab
- 'switch_tab': Switch to specific tab
- 'get_element_position_by_name': Get element position by name
- 'move_to_and_click': Move to coordinates and click
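The branching on action can be sketched as a small dispatch table. `ActionDispatcher` and its error messages are hypothetical; the real run method may well use a switch or if/else chain instead, so treat this only as an illustration of the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: maps an action name to the handler that executes it.
final class ActionDispatcher {
    private final Map<String, Function<Map<String, Object>, String>> handlers = new HashMap<>();

    void register(String action, Function<Map<String, Object>, String> handler) {
        handlers.put(action, handler);
    }

    String dispatch(Map<String, Object> input) {
        Object action = input.get("action");
        if (action == null) {
            return "Error: 'action' is required";
        }
        Function<Map<String, Object>, String> handler = handlers.get(action.toString());
        if (handler == null) {
            return "Error: unsupported action '" + action + "'";
        }
        // Each handler receives the full input so it can read its own parameters.
        return handler.apply(input);
    }
}
```

Registering one handler per action keeps each branch small and makes an unknown action an explicit error rather than a silent no-op.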
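navigate: navigate to a URL:
- First obtain the DriverWrapper; if it is null, it must be initialized:
  - Create a Playwright instance:
    - If no installed browser is detected, the browser is downloaded and installed automatically
    - Then the driver is started and a connection is created
  - Use Playwright to launch a browser instance
  - Have the browser instance open a new page
  - Wrap the Playwright instance, the browser instance, the page, and so on in a DriverWrapper and return it
- Then use the DriverWrapper to get the current page
- Navigate the current page to the specified URL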
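The "get the DriverWrapper, initialize it if it is null" step is a lazy-initialization pattern, sketched below. `LazyDriverHolder` is a hypothetical name, and the factory is left abstract; with Playwright for Java it would presumably call `Playwright.create()`, launch a browser via `BrowserType.launch()`, and open a page with `Browser.newPage()` before wrapping them.

```java
import java.util.function.Supplier;

// Hypothetical holder: creates the driver on first use and reuses it afterwards.
final class LazyDriverHolder<T> {
    private final Supplier<T> factory;
    private T driver;

    LazyDriverHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // synchronized so that concurrent callers cannot create two drivers.
    synchronized T getDriver() {
        if (driver == null) {
            driver = factory.get();   // expensive: would start Playwright and the browser
        }
        return driver;
    }
}
```

Because the browser is only started on the first call, actions that never run pay no startup cost, and later actions such as click reuse the same page.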
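click: click an interactive element; the element index must be passed in
- The first step is the same as for navigate: obtain the DriverWrapper
- Get the interactive element registry from the DriverWrapper
- Look up the interactive element in the registry by its index
- Trigger the click through the element's Locator
input_text: enter text into the specified interactive element; the text and the element index must be passed in
- The first three steps are the same as for click: obtain the specified interactive element
- Fill in the given text through the element's Locator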
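The index-based lookup shared by click and input_text can be sketched as follows. `ElementActions` and `InteractiveElementRegistry` are hypothetical names: the interface stands in for the operations used on a Playwright Locator, and the zero-based indexing is an assumption rather than something the source states.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Locator operations the tool relies on (hypothetical).
interface ElementActions {
    void click();
    void fill(String text);
}

// Hypothetical registry that hands out an index for each interactive element.
final class InteractiveElementRegistry {
    private final List<ElementActions> elements = new ArrayList<>();

    // Returns the index under which the element can later be retrieved (assumed zero-based).
    int register(ElementActions element) {
        elements.add(element);
        return elements.size() - 1;
    }

    ElementActions getByIndex(int index) {
        if (index < 0 || index >= elements.size()) {
            throw new IllegalArgumentException("No interactive element with index " + index);
        }
        return elements.get(index);
    }
}
```

With this shape, click is `registry.getByIndex(i).click()` and input_text is `registry.getByIndex(i).fill(text)`, which is why the model only needs to supply an index rather than a selector.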
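get_text: get the text content of the current page
- The first two steps are the same as for navigate: obtain the current page
- For every frame of the current page, take the text of the body and concatenate the results into the return value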
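The concatenation step might look like the sketch below. `FrameTextCollector` is a hypothetical helper that takes the per-frame body texts as plain strings; with Playwright for Java they would presumably come from `Page.frames()` and `Frame.innerText("body")`. Skipping blank frames and joining with a newline are assumptions, not behavior confirmed by the source.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FrameTextCollector {
    // frameBodies: body text of each frame in page order (obtained elsewhere).
    static String joinFrameTexts(List<String> frameBodies) {
        return frameBodies.stream()
                .filter(text -> text != null && !text.trim().isEmpty()) // assumption: drop empty frames
                .map(String::trim)
                .collect(Collectors.joining("\n"));                     // assumption: newline separator
    }
}
```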
The remaining operations work in much the same way: they all go through the Playwright-based DriverWrapper.
getCurrentToolStateString
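Because this tool is a browser, the method is mainly used to capture the content of the browser page:
- The URL and title of the current page
- The URL and title of every tab
- Scroll information
- The interactive elements of the current tab, including index, title, URL, and so on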
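As a rough illustration of how these pieces could be assembled into one observation string, consider the sketch below. `BrowserStateFormatter` and its output layout are hypothetical; the actual format produced by getCurrentToolStateString is not shown in the source.

```java
import java.util.List;

// Hypothetical formatter; the layout is illustrative only.
final class BrowserStateFormatter {
    static String format(String url, String title, List<String> tabs,
                         String scrollInfo, List<String> elements) {
        StringBuilder sb = new StringBuilder();
        sb.append("Current page: ").append(url).append(" (").append(title).append(")\n");
        sb.append("Tabs:\n");
        for (int i = 0; i < tabs.size(); i++) {
            sb.append("  [").append(i).append("] ").append(tabs.get(i)).append('\n');
        }
        sb.append("Scroll: ").append(scrollInfo).append('\n');
        sb.append("Interactive elements:\n");
        // The bracketed number is the index the model passes back to click or input_text.
        for (int i = 0; i < elements.size(); i++) {
            sb.append("  [").append(i).append("] ").append(elements.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

Listing the elements with their indexes in the observation is what lets the next action refer to an element by number.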
Other
The browser type is determined by the BROWSER environment variable. webkit, firefox, and chromium are supported, and chromium is the default.
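The selection rule can be sketched as a small pure function. `BrowserTypeResolver` is a hypothetical name; ignoring case and falling back to chromium for unrecognized values are assumptions, since the source only states the default for a missing value.

```java
import java.util.Locale;

final class BrowserTypeResolver {
    // envValue is expected to be the value of the BROWSER environment variable, or null if unset.
    static String resolve(String envValue) {
        if (envValue == null) {
            return "chromium";                       // documented default
        }
        String value = envValue.trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "webkit":
            case "firefox":
            case "chromium":
                return value;
            default:
                return "chromium";                   // assumption: unknown values fall back to the default
        }
    }
}
```

A caller would pass `System.getenv("BROWSER")`; keeping the environment lookup outside the function makes the rule easy to test.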
Input format definition
{
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "navigate"
        },
        "url": {
          "type": "string",
          "description": "URL to navigate to"
        }
      },
      "required": ["action", "url"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "click"
        },
        "index": {
          "type": "integer",
          "description": "Element index to click"
        }
      },
      "required": ["action", "index"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "input_text"
        },
        "index": {
          "type": "integer",
          "description": "Element index to input text"
        },
        "text": {
          "type": "string",
          "description": "Text to input"
        }
      },
      "required": ["action", "index", "text"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "key_enter"
        },
        "index": {
          "type": "integer",
          "description": "Element index to press enter"
        }
      },
      "required": ["action", "index"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "screenshot"
        }
      },
      "required": ["action"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "get_html"
        }
      },
      "required": ["action"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "get_text"
        }
      },
      "required": ["action"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "execute_js"
        },
        "script": {
          "type": "string",
          "description": "JavaScript code to execute"
        }
      },
      "required": ["action", "script"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "scroll"
        },
        "direction": {
          "type": "string",
          "enum": ["up", "down"],
          "description": "Scroll direction"
        }
      },
      "required": ["action", "direction"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "switch_tab"
        },
        "tab_id": {
          "type": "integer",
          "description": "Tab ID to switch to"
        }
      },
      "required": ["action", "tab_id"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "new_tab"
        },
        "url": {
          "type": "string",
          "description": "URL to open in new tab"
        }
      },
      "required": ["action", "url"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "close_tab"
        }
      },
      "required": ["action"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "refresh"
        }
      },
      "required": ["action"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "get_element_position"
        },
        "element_name": {
          "type": "string",
          "description": "Element name to get position"
        }
      },
      "required": ["action", "element_name"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "const": "move_to_and_click"
        },
        "position_x": {
          "type": "integer",
          "description": "X coordinate to move to and click"
        },
        "position_y": {
          "type": "integer",
          "description": "Y coordinate to move to and click"
        }
      },
      "required": ["action", "position_x", "position_y"],
      "additionalProperties": false
    }
  ]
}
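Since every variant in the schema lists all of its properties as required and sets additionalProperties to false, the accepted key set for each action is exactly its required set. The sketch below checks only that rule; `BrowserInputValidator` is a hypothetical helper, it does not check value types or the scroll direction enum, and it is not a replacement for a real JSON Schema validator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical validator covering only required keys and additionalProperties=false.
final class BrowserInputValidator {
    private static final Map<String, Set<String>> REQUIRED = new HashMap<>();
    static {
        REQUIRED.put("navigate", Set.of("action", "url"));
        REQUIRED.put("click", Set.of("action", "index"));
        REQUIRED.put("input_text", Set.of("action", "index", "text"));
        REQUIRED.put("key_enter", Set.of("action", "index"));
        REQUIRED.put("screenshot", Set.of("action"));
        REQUIRED.put("get_html", Set.of("action"));
        REQUIRED.put("get_text", Set.of("action"));
        REQUIRED.put("execute_js", Set.of("action", "script"));
        REQUIRED.put("scroll", Set.of("action", "direction"));
        REQUIRED.put("switch_tab", Set.of("action", "tab_id"));
        REQUIRED.put("new_tab", Set.of("action", "url"));
        REQUIRED.put("close_tab", Set.of("action"));
        REQUIRED.put("refresh", Set.of("action"));
        REQUIRED.put("get_element_position", Set.of("action", "element_name")); // name as written in the schema
        REQUIRED.put("move_to_and_click", Set.of("action", "position_x", "position_y"));
    }

    // Returns an error message, or an empty Optional when the input passes both checks.
    static Optional<String> validate(Map<String, Object> input) {
        Object action = input.get("action");
        if (!(action instanceof String)) {
            return Optional.of("'action' must be a string");
        }
        Set<String> required = REQUIRED.get(action);
        if (required == null) {
            return Optional.of("unknown action: " + action);
        }
        for (String key : required) {
            if (!input.containsKey(key)) {
                return Optional.of("missing required property: " + key);
            }
        }
        for (String key : input.keySet()) {
            if (!required.contains(key)) {
                return Optional.of("unexpected property: " + key);
            }
        }
        return Optional.empty();
    }
}
```

For example, `{"action": "click", "index": 3}` passes, while `{"action": "click", "index": 3, "url": "x"}` is rejected because url is not a property of the click variant.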