FlexSim Reinforcement Learning
Results
This tutorial reproduces the reinforcement learning example from the latest official FlexSim 2022 documentation. Without further ado, here are the results before and after optimization (same playback speed): based on waiting times, the policy learns a good order in which to process products.
FlexSim reinforcement learning, before optimization
FlexSim reinforcement learning, after optimization
Dependencies
- FlexSim 2022
- Python 3, with the following libraries:
  - Gym
  - Stable-Baselines3
Model Setup
Building the FlexSim model
- Create a new model; drag in Source, Queue, Processor, and Sink objects, and connect them.
- In the Toolbox, add a Global Table. In the table's properties, rename it to ChangeoverTimes, set it to 5 rows and 5 columns, and fill in the changeover times. Entry (i, j) of the table is the time consumed when changing over from item type i to item type j.
- Click the Processor to edit its properties. For Setup Time, choose From/To Lookup Table from the dropdown menu, and select ChangeoverTimes as the Table.
- Click the Source to edit its properties. In Triggers, add an On Creation trigger and choose Data > Set Label and Color. Set the value to duniform(1, 5, getstream(current)), so that five product types are generated at random.
- Save the model as ChangeoverTimesRL.fsm. Running it, you can see the model generating random items.
- In the Toolbox, add Statistics > Model Parameter Table.
- Rename the Parameter1 table to Observations, and likewise create a second table named Actions.
- In Observations, rename Parameter2 to LastItemType; in that row's Value, set the type to integer with an upper bound of 5.
- In the Actions table, rename Parameter3 to ItemType; likewise constrain the value to an integer with an upper bound of 5, matching the 5 product types.
- Click the Processor. In Properties, click Pull, and in the Pull Strategy dropdown choose the Pull Best Item option.
- In the Label field that appears, choose Custom Value and enter
item.Type == Model.parameters["ItemType"].value
Save the model. When you run it, you can see that the matching (red) items are pulled first.
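The two pieces of logic configured above can be sketched in plain Python. The numbers below are hypothetical placeholders for whatever you entered in the ChangeoverTimes Global Table, and pull_best_item only illustrates what the Pull Best Item strategy does; it is not FlexSim's actual implementation:

```python
# Hypothetical changeover times; entry [i][j] is the time to switch
# from item type i+1 to item type j+1 (types are 1-based in FlexSim).
CHANGEOVER_TIMES = [
    [0, 10, 10, 10, 10],
    [10, 0, 10, 10, 10],
    [10, 10, 0, 10, 10],
    [10, 10, 10, 0, 10],
    [10, 10, 10, 10, 0],
]

def setup_time(last_type, next_type):
    """From/To lookup: setup time for switching from last_type to next_type."""
    return CHANGEOVER_TIMES[last_type - 1][next_type - 1]

def pull_best_item(queue_item_types, wanted_type):
    """Sketch of the Pull Best Item custom value: prefer the queued item
    whose Type equals the ItemType parameter, if one is waiting."""
    for index, item_type in enumerate(queue_item_types):
        if item_type == wanted_type:
            return index
    return 0  # otherwise fall back to the first item in the queue
```

Note that no changeover is charged when the next item has the same type as the last one, which is exactly what the agent can learn to exploit.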
Adding reinforcement learning to the model
1. In the Toolbox, add Connectivity > Reinforcement Learning.
2. For the Observation Space, choose Discrete and select LastItemType as the Observation parameter; for the Action Space, choose Discrete and select ItemType.
3. Click Apply. Then click the Processor; in the Setup Time picklist, click the Code button. You will see the variable "f_lastlabelval", which we will use shortly.
4. Back in the Reinforcement Learning properties window, add an On Observation trigger and choose Code Snippet. Change its description from Code Snippet to Set observation parameter.
5. Paste the following code into the field:
Model.parameters["LastItemType"].value = getvarnum(Model.find("Processor1"), "f_lastlabelval");
6. Back in the 3D view, click the Sink. In Labels, add a Number label and name it LastTime.
7. Add another label named Reward, check the Automatically Reset box, and save.
8. In Triggers, add On Entry and choose the Data > Increment Value option. In the Increment dropdown, select current.labels["Reward"]; for By, enter 10/(Model.time - current.LastTime).
9. Add a Data > Set Label option: set Object to current, Label to "LastTime", and Value to Model.time. This is our reward bookkeeping.
10. Back in the Reinforcement Learning properties, edit the reward function: rename it from Reward Function to Reward based on throughput, and paste in the code below.
double reward = Model.find("Sink1").Reward;
Model.find("Sink1").Reward = 0;
int done = (Model.time > 1000);
return [reward, done];
11. In On Request Action, add the Take a Random Action option.
12. In Decision Events, add a new event and choose the Pull Strategy option.
Save and run the model.
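Steps 6–10 above implement the reward signal. A minimal Python sketch of that bookkeeping, with plain attributes standing in for Model.time and the Sink's labels (this is an illustration, not FlexSim code):

```python
class SinkRewardTracker:
    """Stand-in for the Sink's LastTime and Reward labels."""

    def __init__(self):
        self.last_time = 0.0   # the "LastTime" label
        self.reward = 0.0      # the "Reward" label, reset on each observation

    def on_entry(self, model_time):
        # Increment Value: reward grows the faster items arrive
        self.reward += 10.0 / (model_time - self.last_time)
        # Set Label: remember when the last item arrived
        self.last_time = model_time

    def take_reward(self, model_time):
        # Mirrors the Reward Function: return the accumulated reward,
        # reset it, and end the episode after time 1000
        reward = self.reward
        self.reward = 0.0
        done = model_time > 1000
        return reward, done
```

The shorter the gap between arrivals (i.e. the less time lost to changeovers), the larger each 10/(Model.time - current.LastTime) increment, so maximizing the cumulative reward maximizes throughput.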
The Python side
Modify and run the scripts below in order; they test the connection, train a model, and run inference with the trained model, respectively.
The parts to modify are the FlexSim executable path and the model path.
flexsim_env.py
import gym
import os
import subprocess
import socket
import json
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np

class FlexSimEnv(gym.Env):
    metadata = {'render.modes': ['human', 'rgb_array', 'ansi']}

    def __init__(self, flexsimPath, modelPath, address='localhost', port=5005, verbose=False, visible=False):
        self.flexsimPath = flexsimPath
        self.modelPath = modelPath
        self.address = address
        self.port = port
        self.verbose = verbose
        self.visible = visible
        self.lastObservation = ""
        self._launch_flexsim()
        self.action_space = self._get_action_space()
        self.observation_space = self._get_observation_space()

    def reset(self):
        self._reset_flexsim()
        state, reward, done = self._get_observation()
        return state

    def step(self, action):
        self._take_action(action)
        state, reward, done = self._get_observation()
        info = {}
        return state, reward, done, info

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array([0, 0, 0])
        elif mode == 'human':
            print(self.lastObservation)
        elif mode == 'ansi':
            return self.lastObservation
        else:
            super(FlexSimEnv, self).render(mode=mode)

    def close(self):
        self._close_flexsim()

    def seed(self, seed=None):
        self.seedNum = seed
        return self.seedNum

    def _launch_flexsim(self):
        if self.verbose:
            print("Launching " + self.flexsimPath + " " + self.modelPath)
        args = [self.flexsimPath, self.modelPath, "-training", self.address + ':' + str(self.port)]
        if self.visible == False:
            args.append("-maintenance")
            args.append("nogui")
        self.flexsimProcess = subprocess.Popen(args)
        self._socket_init(self.address, self.port)

    def _close_flexsim(self):
        self.flexsimProcess.kill()

    def _release_flexsim(self):
        if self.verbose:
            print("Sending StopWaiting message")
        self._socket_send(b"StopWaiting?")

    def _get_action_space(self):
        self._socket_send(b"ActionSpace?")
        if self.verbose:
            print("Waiting for ActionSpace message")
        actionSpaceBytes = self._socket_recv()
        return self._convert_to_gym_space(actionSpaceBytes)

    def _get_observation_space(self):
        self._socket_send(b"ObservationSpace?")
        if self.verbose:
            print("Waiting for ObservationSpace message")
        observationSpaceBytes = self._socket_recv()
        return self._convert_to_gym_space(observationSpaceBytes)

    def _reset_flexsim(self):
        if self.verbose:
            print("Sending Reset message")
        resetString = "Reset?"
        if hasattr(self, "seedNum"):
            resetString = "Reset:" + str(self.seedNum) + "?"
        self._socket_send(resetString.encode())

    def _get_observation(self):
        if self.verbose:
            print("Waiting for Observation message")
        observationBytes = self._socket_recv()
        self.lastObservation = observationBytes.decode('utf-8')
        state, reward, done = self._convert_to_observation(observationBytes)
        return state, reward, done

    def _take_action(self, action):
        actionStr = json.dumps(action, cls=NumpyEncoder)
        if self.verbose:
            print("Sending Action message: " + actionStr)
        actionMessage = "TakeAction:" + actionStr + "?"
        self._socket_send(actionMessage.encode())

    def _socket_init(self, host, port):
        if self.verbose:
            print("Waiting for FlexSim to connect to socket on " + self.address + ":" + str(self.port))
        self.serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.serversocket.bind((host, port))
        self.serversocket.listen()
        (self.clientsocket, self.socketaddress) = self.serversocket.accept()
        if self.verbose:
            print("Socket connected")
            print("Waiting for READY message")
        message = self._socket_recv()
        if self.verbose:
            print(message.decode('utf-8'))
        if message != b"READY":
            raise RuntimeError("Did not receive READY! message")

    def _socket_send(self, msg):
        totalsent = 0
        while totalsent < len(msg):
            sent = self.clientsocket.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("Socket connection broken")
            totalsent = totalsent + sent

    def _socket_recv(self):
        chunks = []
        while True:
            chunk = self.clientsocket.recv(2048)
            if chunk == b'':
                raise RuntimeError("Socket connection broken")
            if chunk[-1] == ord('!'):
                chunks.append(chunk[:-1])
                break
            else:
                chunks.append(chunk)
        return b''.join(chunks)

    def _convert_to_gym_space(self, spaceBytes):
        paramsStartIndex = spaceBytes.index(ord('('))
        paramsEndIndex = spaceBytes.index(ord(')'), paramsStartIndex)
        type = spaceBytes[:paramsStartIndex]
        params = json.loads(spaceBytes[paramsStartIndex+1:paramsEndIndex])
        if type == b'Discrete':
            return gym.spaces.Discrete(params)
        elif type == b'Box':
            return gym.spaces.Box(np.array(params[0]), np.array(params[1]))
        elif type == b'MultiDiscrete':
            return gym.spaces.MultiDiscrete(params)
        elif type == b'MultiBinary':
            return gym.spaces.MultiBinary(params)
        raise RuntimeError("Could not parse gym space string")

    def _convert_to_observation(self, spaceBytes):
        observation = json.loads(spaceBytes)
        state = observation["state"]
        if isinstance(state, list):
            state = np.array(observation["state"])
        reward = observation["reward"]
        done = (observation["done"] == 1)
        return state, reward, done

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

def main():
    env = FlexSimEnv(
        flexsimPath = "C:/Program Files/FlexSim 2022/program/flexsim.exe",
        modelPath = "E:/刘一阳资料/Flexsim/demo/ChangeoverTimesRL.fsm",
        verbose = True,
        visible = True
        )
    for i in range(2):
        env.seed(i)
        observation = env.reset()
        env.render()
        done = False
        rewards = []
        while not done:
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            env.render()
            rewards.append(reward)
            if done:
                cumulative_reward = sum(rewards)
                print("Reward: ", cumulative_reward, "\n")
    env._release_flexsim()
    input("Waiting for input to close FlexSim...")
    env.close()

if __name__ == "__main__":
    main()
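The wire protocol this environment speaks is simple: FlexSim terminates each message with "!", and describes its spaces with strings such as Discrete(5) or MultiDiscrete([3, 3]). A gym-free sketch of the parsing done in _convert_to_gym_space (the helper name here is ours):

```python
import json

def parse_space_string(space_bytes):
    """Split a message like b'Discrete(5)' into its space type and
    JSON-decoded parameters, mirroring _convert_to_gym_space."""
    start = space_bytes.index(ord('('))
    end = space_bytes.index(ord(')'), start)
    space_type = space_bytes[:start].decode()
    params = json.loads(space_bytes[start + 1:end])
    return space_type, params
```

For our model both spaces come back as Discrete(5): five product types to observe and five to request.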
flexsim_training.py
import gym
from flexsim_env import FlexSimEnv
from stable_baselines3.common.env_checker import check_env
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

def main():
    print("Initializing FlexSim environment...")
    # Create a FlexSim OpenAI Gym Environment
    env = FlexSimEnv(
        flexsimPath = "C:/Program Files/FlexSim 2022/program/flexsim.exe",
        modelPath = "E:/刘一阳资料/Flexsim/demo/ChangeoverTimesRL.fsm",
        verbose = False,
        visible = False
        )
    check_env(env) # Check that an environment follows Gym API.

    # Training a baselines3 PPO model in the environment
    model = PPO("MlpPolicy", env, verbose=1)
    print("Training model...")
    model.learn(total_timesteps=50000)

    # save the model
    print("Saving model...")
    model.save("ChangeoverTimesModel")

    input("Waiting for input to do some test runs...")

    # Run test episodes using the trained model
    for i in range(4):
        env.seed(i)
        observation = env.reset()
        env.render()
        done = False
        rewards = []
        while not done:
            action, _states = model.predict(observation)
            observation, reward, done, info = env.step(action)
            env.render()
            rewards.append(reward)
            if done:
                cumulative_reward = sum(rewards)
                print("Reward: ", cumulative_reward, "\n")

    env._release_flexsim()
    input("Waiting for input to close FlexSim...")
    env.close()

if __name__ == "__main__":
    main()
flexsim_inference.py
import json
from stable_baselines3 import PPO
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import numpy as np

class FlexSimInferenceServer(BaseHTTPRequestHandler):
    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        self._handle_reply(params)

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        body = self.rfile.read(content_length)
        params = parse_qs(body)
        self._handle_reply(params)

    def _handle_reply(self, params):
        if len(params):
            observation = []
            if b'observation' in params.keys():
                observationBytes = params[b'observation'][0]
                observation = np.array(json.loads(observationBytes))
            elif 'observation' in params.keys():
                observationBytes = params['observation'][0]
                observation = np.array(json.loads(observationBytes))
            if isinstance(observation, list):
                observation = np.array(observation)

            action, _states = FlexSimInferenceServer.model.predict(observation)

            self.send_response(200)
            self.send_header("Content-type", "application/json")
            self.end_headers()
            self.wfile.write(bytes(json.dumps(action, cls=NumpyEncoder), "utf-8"))
            return

        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("", "utf-8"))

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

def main():
    print("Loading model...")
    model = PPO.load("ChangeoverTimesModel.zip")
    FlexSimInferenceServer.model = model

    # Create server object
    print("Starting server...")
    hostName = "localhost"
    serverPort = 8890
    webServer = HTTPServer((hostName, serverPort), FlexSimInferenceServer)
    print("Server started http://%s:%s" % (hostName, serverPort))

    # Start the web server
    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass
    webServer.server_close()
    print("Server stopped.")

if __name__ == "__main__":
    main()
Running the last script exposes a local HTTP endpoint.
Point the model at this address (replacing the Take a Random Action option in On Request Action).
Now running the model again uses the trained policy.
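For reference, here is a minimal sketch of the query the model sends to that endpoint, assuming the host, port, and observation parameter used by the server above (the helper names are ours):

```python
import json
from urllib.parse import urlencode

SERVER = "http://localhost:8890"  # hostName/serverPort from the server above

def build_query(observation):
    # The server reads an 'observation' query parameter holding a JSON array
    return SERVER + "/?" + urlencode({"observation": json.dumps(observation)})

def parse_action(reply_bytes):
    # The server replies with the JSON-encoded action from model.predict
    return json.loads(reply_bytes)

# With the server running, a request could look like:
#   urllib.request.urlopen(build_query([3])).read()
# and parse_action on the reply yields the chosen ItemType action.
```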