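The previous post analyzed Jmanus's /api/executor/executeByToolNameAsync endpoint, which executes business logic asynchronously.

The result is retrieved through the /api/executor/details/{planId} endpoint, which mainly reads the execution result from the plan_execution_record table.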
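
As a rough illustration of the query side, the sketch below builds a GET request for the details endpoint with the JDK's java.net.http API. Only the path comes from the text; the base URL and the plan id are placeholder assumptions, and the response format is not covered here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: the base URL and plan id are placeholders; only the path is taken from the article.
class PlanDetailsRequestSketch {

    // Builds a GET request for /api/executor/details/{planId}.
    static HttpRequest detailsRequest(String baseUrl, String planId) {
        URI uri = URI.create(baseUrl + "/api/executor/details/" + planId);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = detailsRequest("http://localhost:8080", "plan-123");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), repeated until the plan reports completion.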

BrowserUseTool

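The example in the previous post, looking up Alibaba's stock price, relied mainly on the browser tool, BrowserUseTool.

This class extends AbstractBaseTool, which in turn implements the ToolCallBiFunctionDef interface.

BrowserUseTool has two important methods: run and getCurrentToolStateString.

  • run: corresponds to the action step of the ReAct pattern and performs the concrete operation;
  • getCurrentToolStateString: corresponds to the observation step of the ReAct pattern and returns the state that results from the action.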
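
The act/observe split can be pictured with a toy loop. This is a minimal sketch, not the Jmanus code: ReActToolSketch and FakeBrowserToolSketch are hypothetical stand-ins that only borrow the two method names, and the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ReActLoopSketch {
    // Runs each input as an action, then records the observation that follows it.
    static List<String> actAndObserve(ReActToolSketch tool, List<String> inputs) {
        List<String> observations = new ArrayList<>();
        for (String input : inputs) {
            tool.run(input);                                    // act
            observations.add(tool.getCurrentToolStateString()); // observe
        }
        return observations;
    }

    public static void main(String[] args) {
        System.out.println(actAndObserve(new FakeBrowserToolSketch(), List.of("https://example.com")));
    }
}

// Hypothetical stand-in for the two methods discussed above; not the real Jmanus interface.
interface ReActToolSketch {
    String run(String input);
    String getCurrentToolStateString();
}

// Toy tool that only remembers the last URL it was asked to open.
class FakeBrowserToolSketch implements ReActToolSketch {
    private String currentUrl = "about:blank";

    @Override
    public String run(String input) {
        currentUrl = input;
        return "navigated to " + input;
    }

    @Override
    public String getCurrentToolStateString() {
        return "current url: " + currentUrl;
    }
}
```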

Method logic

run

run branches on the value of action. The supported actions are:

    Supported operations include:
    - 'navigate': Visit specific URL
    - 'click': Click element by index
    - 'input_text': Input text in element
    - 'key_enter': Press Enter key
    - 'screenshot': Capture screenshot
    - 'get_html': Get HTML content of current page
    - 'get_text': Get text content of current page
    - 'execute_js': Execute JavaScript code
    - 'scroll': Scroll page up/down
    - 'refresh': Refresh current page
    - 'new_tab': Open new tab
    - 'close_tab': Close current tab
    - 'switch_tab': Switch to specific tab
    - 'get_element_position_by_name': Get element position by name
    - 'move_to_and_click': Move to coordinates and click
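
Branching on the action name might look roughly like the following. It is a sketch under assumptions: the input is modelled as a plain Map, only four actions are shown, and the returned strings are invented for illustration rather than taken from Jmanus.

```java
import java.util.Map;

class ActionDispatchSketch {
    // Maps an action name to a description of the branch that would handle it.
    static String dispatch(Map<String, Object> input) {
        String action = String.valueOf(input.get("action"));
        switch (action) {
            case "navigate":
                return "navigate to " + input.get("url");
            case "click":
                return "click element " + input.get("index");
            case "input_text":
                return "type '" + input.get("text") + "' into element " + input.get("index");
            case "get_text":
                return "read page text";
            default:
                // Unknown names are reported instead of silently ignored.
                return "unknown action: " + action;
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(Map.of("action", "navigate", "url", "https://example.com")));
    }
}
```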

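navigate: navigates to a URL:

  1. First obtain the DriverWrapper; if it is null, initialize it:
    1. Create a Playwright instance:
      1. If no browser is installed, one is downloaded and installed automatically
      2. Then start the driver and create a connection
    2. Use Playwright to launch a browser instance
    3. Have the browser instance open a new page
    4. Wrap the Playwright instance, the browser instance, the page, and so on in a DriverWrapper and return it
  2. Use the DriverWrapper to get the current page
  3. Navigate the current page to the given URL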
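
The "create on first use, then reuse" part of step 1 can be sketched with a generic holder. LazyDriverSketch is a hypothetical name, and the String "driver" is a stand-in; with Playwright for Java, the factory would plausibly chain Playwright.create(), a BrowserType's launch(), and Browser.newPage().

```java
import java.util.function.Supplier;

// Hypothetical holder that builds its driver lazily and caches it afterwards.
class LazyDriverSketch<T> {
    private final Supplier<T> factory;
    private T driver; // stays null until the first call to getDriver()

    LazyDriverSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the cached driver, creating it on the first call only.
    synchronized T getDriver() {
        if (driver == null) {
            driver = factory.get();
        }
        return driver;
    }

    public static void main(String[] args) {
        int[] created = {0};
        LazyDriverSketch<String> holder = new LazyDriverSketch<>(() -> "driver#" + (++created[0]));
        System.out.println(holder.getDriver());
        System.out.println(holder.getDriver());
    }
}
```

Synchronizing the accessor is one simple way to avoid launching two browsers if several threads ask for the driver at once.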

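click: clicks an interactive element; the element index must be supplied

  1. As with navigate, first obtain the DriverWrapper
  2. Get the interactive element registry from the DriverWrapper
  3. Look up the interactive element in the registry by its index
  4. Trigger the click through the element's Locator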
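
The index-to-element lookup can be illustrated with a small map-backed registry. Everything here is assumed for illustration: the class name, the zero-based indexing, the use of selector strings in place of real Locator objects, and the message texts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry: maps an element index to a selector string standing in for a Locator.
class ElementRegistrySketch {
    private final Map<Integer, String> selectorsByIndex = new LinkedHashMap<>();

    // Registers a selector and returns the index assigned to it (0, 1, 2, ...).
    int register(String selector) {
        int index = selectorsByIndex.size();
        selectorsByIndex.put(index, selector);
        return index;
    }

    // Resolves the index and reports what would be clicked, or that the index is unknown.
    String click(int index) {
        String selector = selectorsByIndex.get(index);
        if (selector == null) {
            return "Element with index " + index + " not found";
        }
        // Real code would trigger the click through the element's Locator here.
        return "Clicked " + selector;
    }

    public static void main(String[] args) {
        ElementRegistrySketch registry = new ElementRegistrySketch();
        int index = registry.register("button#search");
        System.out.println(registry.click(index));
    }
}
```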

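input_text: types text into a given interactive element; the text and the element index must be supplied

  1. The first 3 steps are the same as for click: obtain the target interactive element
  2. Fill in the given text through the element's Locator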

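get_text: gets the text content of the current page

  1. The first 2 steps are the same as for navigate: obtain the current page
  2. Iterate over all frames of the current page, read the text of each frame's body, and return the pieces concatenated together

The remaining actions work in much the same way: each goes through the Playwright-based DriverWrapper.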
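
Concatenating the body text of every frame reduces to a map-and-join. In this sketch FrameSketch is a hypothetical stand-in for a page frame, and the newline separator is an assumption; with Playwright for Java the per-frame text would plausibly come from Page.frames() and Frame.innerText("body").

```java
import java.util.List;
import java.util.stream.Collectors;

class FrameTextSketch {
    // Hypothetical stand-in for a page frame that can report the text of its body.
    interface FrameSketch {
        String bodyText();
    }

    // Joins the body text of all frames, in order, separated by newlines.
    static String collectText(List<FrameSketch> frames) {
        return frames.stream()
                .map(FrameSketch::bodyText)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<FrameSketch> frames = List.of(() -> "main frame text", () -> "iframe text");
        System.out.println(collectText(frames));
    }
}
```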

getCurrentToolStateString

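Because this tool is a browser, the method mainly reports the state of the browser page:

  • The URL and title of the current page
  • The URL and title of every tab
  • Scroll information
  • The interactive elements of the current tab, including index, title, URL, and so on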
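
How those four pieces might be assembled into one observation string is sketched below. The layout, labels, and parameter names are invented for illustration; the actual format produced by Jmanus is not shown in this post.

```java
import java.util.List;

class BrowserStateSketch {
    // Assembles a plain-text snapshot from the four kinds of information listed above.
    static String describe(String url, String title, List<String> tabs,
                           String scrollInfo, List<String> elements) {
        StringBuilder sb = new StringBuilder();
        sb.append("Current page: ").append(title).append(" (").append(url).append(")\n");
        sb.append("Tabs:\n");
        for (int i = 0; i < tabs.size(); i++) {
            sb.append("  [").append(i).append("] ").append(tabs.get(i)).append('\n');
        }
        sb.append("Scroll: ").append(scrollInfo).append('\n');
        sb.append("Interactive elements:\n");
        for (int i = 0; i < elements.size(); i++) {
            // The index printed here is what a later click or input_text would refer to.
            sb.append("  [").append(i).append("] ").append(elements.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("https://example.com", "Example",
                List.of("Example - https://example.com"),
                "0 px above, 1200 px below",
                List.of("<a>More information</a>")));
    }
}
```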

Other

The browser type is determined by the BROWSER environment variable. webkit, firefox, and chromium are supported, and chromium is the default.
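
A minimal sketch of that selection rule follows. The class name is hypothetical, and two behaviours are assumptions rather than facts from the source: matching is case-insensitive, and unsupported values fall back to chromium just like an unset variable.

```java
import java.util.Locale;
import java.util.Set;

class BrowserTypeSketch {
    private static final Set<String> SUPPORTED = Set.of("webkit", "firefox", "chromium");

    // Returns the browser name to use for a given BROWSER value (which may be null).
    static String resolve(String envValue) {
        if (envValue == null) {
            return "chromium"; // default when the variable is unset
        }
        String normalized = envValue.trim().toLowerCase(Locale.ROOT);
        // Assumption: anything outside the supported set also falls back to the default.
        return SUPPORTED.contains(normalized) ? normalized : "chromium";
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv("BROWSER")));
    }
}
```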

Input format definition

{
				    "oneOf": [
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "navigate"
				                },
				                "url": {
				                    "type": "string",
				                    "description": "URL to navigate to"
				                }
				            },
				            "required": ["action", "url"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "click"
				                },
				                "index": {
				                    "type": "integer",
				                    "description": "Element index to click"
				                }
				            },
				            "required": ["action", "index"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "input_text"
				                },
				                "index": {
				                    "type": "integer",
				                    "description": "Element index to input text"
				                },
				                "text": {
				                    "type": "string",
				                    "description": "Text to input"
				                }
				            },
				            "required": ["action", "index", "text"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "key_enter"
				                },
				                "index": {
				                    "type": "integer",
				                    "description": "Element index to press enter"
				                }
				            },
				            "required": ["action", "index"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "screenshot"
				                }
				            },
				            "required": ["action"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "get_html"
				                }
				            },
				            "required": ["action"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "get_text"
				                }
				            },
				            "required": ["action"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "execute_js"
				                },
				                "script": {
				                    "type": "string",
				                    "description": "JavaScript code to execute"
				                }
				            },
				            "required": ["action", "script"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "scroll"
				                },
				                "direction": {
				                    "type": "string",
				                    "enum": ["up", "down"],
				                    "description": "Scroll direction"
				                }
				            },
				            "required": ["action", "direction"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "switch_tab"
				                },
				                "tab_id": {
				                    "type": "integer",
				                    "description": "Tab ID to switch to"
				                }
				            },
				            "required": ["action", "tab_id"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "new_tab"
				                },
				                "url": {
				                    "type": "string",
				                    "description": "URL to open in new tab"
				                }
				            },
				            "required": ["action", "url"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "close_tab"
				                }
				            },
				            "required": ["action"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "refresh"
				                }
				            },
				            "required": ["action"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "get_element_position"
				                },
				                "element_name": {
				                    "type": "string",
				                    "description": "Element name to get position"
				                }
				            },
				            "required": ["action", "element_name"],
				            "additionalProperties": false
				        },
				        {
				            "type": "object",
				            "properties": {
				                "action": {
				                    "type": "string",
				                    "const": "move_to_and_click"
				                },
				                "position_x": {
				                    "type": "integer",
				                    "description": "X coordinate to move to and click"
				                },
				                "position_y": {
				                    "type": "integer",
				                    "description": "Y coordinate to move to and click"
				                }
				            },
				            "required": ["action", "position_x", "position_y"],
				            "additionalProperties": false
				        }
				    ]
				}
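
Since every variant in the schema lists all of its properties as required and sets additionalProperties to false, a valid input carries exactly one fixed set of keys per action. The sketch below checks only that; it is a hand-written illustration with a hypothetical class name, not a JSON Schema validator, and it ignores value types and the scroll direction enum.

```java
import java.util.Map;
import java.util.Set;

class BrowserInputCheckSketch {
    // Exact key sets per action, copied from the "required" arrays in the schema above.
    private static final Map<String, Set<String>> KEYS = Map.ofEntries(
            Map.entry("navigate", Set.of("action", "url")),
            Map.entry("click", Set.of("action", "index")),
            Map.entry("input_text", Set.of("action", "index", "text")),
            Map.entry("key_enter", Set.of("action", "index")),
            Map.entry("screenshot", Set.of("action")),
            Map.entry("get_html", Set.of("action")),
            Map.entry("get_text", Set.of("action")),
            Map.entry("execute_js", Set.of("action", "script")),
            Map.entry("scroll", Set.of("action", "direction")),
            Map.entry("switch_tab", Set.of("action", "tab_id")),
            Map.entry("new_tab", Set.of("action", "url")),
            Map.entry("close_tab", Set.of("action")),
            Map.entry("refresh", Set.of("action")),
            Map.entry("get_element_position", Set.of("action", "element_name")),
            Map.entry("move_to_and_click", Set.of("action", "position_x", "position_y")));

    // True when the action is known and the input has exactly the keys that variant allows.
    static boolean hasExpectedKeys(Map<String, Object> input) {
        Object action = input.get("action");
        if (!(action instanceof String)) {
            return false; // missing or non-string action matches no variant
        }
        Set<String> expected = KEYS.get(action);
        return expected != null && input.keySet().equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(hasExpectedKeys(Map.of("action", "navigate", "url", "https://example.com")));
        System.out.println(hasExpectedKeys(Map.of("action", "click")));
    }
}
```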
