[22213] Ambari 3.0.0: fixing a rolling restart that only runs on one host
# I. Symptom: after a rolling update, only one host was executed
First, the screenshots: after the rolling update is triggered, the UI shows that only one host was updated.

After the rolling update:

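Symptom keywords
- The rolling restart / rolling update is triggered successfully
- Execution actually lands on only one host (for example, only the `DATANODE` on `dev1.test.com` is restarted)
- The follow-up task chain exits immediately and never moves on to the next batch of hosts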
# 1. Quick diagnosis: not an agent dispatch problem, but a broken scheduling chain
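The log shows that the command was still dispatched to the agent (`AgentCommandsPublisher.sendCommands`), so the problem is not that nothing was sent.
What is actually fatal is that the scheduler throws an exception while executing the `BatchRequest` chain and exits, so the rolling steps for the remaining hosts never get a chance to proceed. The order matters: the API request for the first batch (`DATANODE` on `dev1.test.com`) had already been accepted, and the failure happened while converting its response, which is exactly why one host ran and no others did.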
# II. Server log: ExecutionScheduleManager throws ClassCastException
Here is the core error (key excerpt):
```text
2026-02-16 20:19:49,706 INFO [ambari-client-thread-45] o.a.a.s.state.cluster.ClusterImpl:558 - Adding a new request schedule, clusterName = abc, id = 53, description = null
2026-02-16 20:19:49,706 INFO [ambari-client-thread-45] o.a.a.s.state.cluster.ClusterImpl:558 - Adding a new request schedule, clusterName = abc, id = 53, description = null
2026-02-16 20:19:49,772 INFO [ambari-client-thread-119] o.a.a.s.c.AmbariManagementControllerImpl:4152 - Received action execution request, clusterName=abc, request=isCommand :true, action :null, command :RESTART, inputs :{HAS_RESOURCE_FILTERS=true}, resourceFilters: [RequestResourceFilter{serviceName='HDFS', componentName='DATANODE', hostNames=[dev1.test.com]}], exclusive: false, clusterName :abc
2026-02-16 20:19:49,772 INFO [ambari-client-thread-119] o.a.a.s.c.AmbariManagementControllerImpl:4152 - Received action execution request, clusterName=abc, request=isCommand :true, action :null, command :RESTART, inputs :{HAS_RESOURCE_FILTERS=true}, resourceFilters: [RequestResourceFilter{serviceName='HDFS', componentName='DATANODE', hostNames=[dev1.test.com]}], exclusive: false, clusterName :abc
2026-02-16 20:19:49,813 INFO [ambari-client-thread-119] o.a.a.server.stageplanner.RoleGraph:175 - Detecting cycle graphs
2026-02-16 20:19:49,813 INFO [ambari-client-thread-119] o.a.a.server.stageplanner.RoleGraph:175 - Detecting cycle graphs
2026-02-16 20:19:49,814 INFO [ambari-client-thread-119] o.a.a.server.stageplanner.RoleGraph:176 - Graph:
(DATANODE, RESTART, 0)
2026-02-16 20:19:49,814 INFO [ambari-client-thread-119] o.a.a.server.stageplanner.RoleGraph:176 - Graph:
(DATANODE, RESTART, 0)
2026-02-16 20:19:49,843 ERROR [ExecutionScheduler_Worker-2] o.a.a.s.s.AbstractLinearExecutionJob:93 - Exception caught on execution of job LinearExecutionJobs.BatchRequestJob-53-1. Exiting linear chain...
org.apache.ambari.server.AmbariException: Exception occurred while performing request
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:683)
at org.apache.ambari.server.state.scheduler.BatchRequestJob.doWork(BatchRequestJob.java:82)
at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:91)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.ClassCastException: class org.glassfish.jersey.client.internal.HttpUrlConnector$1 cannot be cast to class java.lang.String (org.glassfish.jersey.client.internal.HttpUrlConnector$1 is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.convertToBatchRequestResponse(ExecutionScheduleManager.java:740)
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.performApiRequest(ExecutionScheduleManager.java:942)
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:671)
... 4 common frames omitted
2026-02-16 20:19:49,843 ERROR [ExecutionScheduler_Worker-2] o.a.a.s.s.AbstractLinearExecutionJob:93 - Exception caught on execution of job LinearExecutionJobs.BatchRequestJob-53-1. Exiting linear chain...
org.apache.ambari.server.AmbariException: Exception occurred while performing request
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:683)
at org.apache.ambari.server.state.scheduler.BatchRequestJob.doWork(BatchRequestJob.java:82)
at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:91)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.ClassCastException: class org.glassfish.jersey.client.internal.HttpUrlConnector$1 cannot be cast to class java.lang.String (org.glassfish.jersey.client.internal.HttpUrlConnector$1 is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.convertToBatchRequestResponse(ExecutionScheduleManager.java:740)
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.performApiRequest(ExecutionScheduleManager.java:942)
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:671)
... 4 common frames omitted
2026-02-16 20:19:49,855 INFO [ExecutionScheduler_Worker-2] org.quartz.core.JobRunShell:207 - Job LinearExecutionJobs.BatchRequestJob-53-1 threw a JobExecutionException:
org.quartz.JobExecutionException: org.apache.ambari.server.AmbariException: Exception occurred while performing request
at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:97)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: org.apache.ambari.server.AmbariException: Exception occurred while performing request
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:683)
at org.apache.ambari.server.state.scheduler.BatchRequestJob.doWork(BatchRequestJob.java:82)
at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:91)
... 2 common frames omitted
Caused by: java.lang.ClassCastException: class org.glassfish.jersey.client.internal.HttpUrlConnector$1 cannot be cast to class java.lang.String (org.glassfish.jersey.client.internal.HttpUrlConnector$1 is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.convertToBatchRequestResponse(ExecutionScheduleManager.java:740)
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.performApiRequest(ExecutionScheduleManager.java:942)
at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:671)
... 4 common frames omitted
2026-02-16 20:19:49,892 INFO [agent-command-publisher-0] o.a.a.s.e.p.AgentCommandsPublisher:173 - AgentCommandsPublisher.sendCommands: sending ExecutionCommand for host dev1.test.com, role DATANODE, roleCommand CUSTOM_COMMAND, and command ID 142-0, task ID 1103
2026-02-16 20:19:49,892 INFO [agent-command-publisher-0] o.a.a.s.e.p.AgentCommandsPublisher:173 - AgentCommandsPublisher.sendCommands: sending ExecutionCommand for host dev1.test.com, role DATANODE, roleCommand CUSTOM_COMMAND, and command ID 142-0, task ID 1103
2026-02-16 20:19:50,092 INFO [agent-message-monitor-0] o.a.a.server.events.MessageEmitter:218 - Schedule execution command emitting, retry: 0, messageId: 1
2026-02-16 20:19:50,092 INFO [agent-message-monitor-0] o.a.a.server.events.MessageEmitter:218 - Schedule execution command emitting, retry: 0, messageId: 1
2026-02-16 20:19:50,094 WARN [agent-message-retry-0] o.a.a.server.events.MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 1
2026-02-16 20:19:50,094 WARN [agent-message-retry-0] o.a.a.server.events.MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 1
2026-02-16 20:19:58,414 WARN [ambari-client-thread-45] o.glassfish.jersey.internal.Errors:168 - The following warnings have been detected: WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.ambari.server.api.services.TaskService.getTask(java.lang.String,javax.ws.rs.core.HttpHeaders,javax.ws.rs.core.UriInfo,java.lang.String), should not consume any entity.
WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.ambari.server.api.services.TaskService.getComponents(java.lang.String,javax.ws.rs.core.HttpHeaders,javax.ws.rs.core.UriInfo), should not consume any entity.
```
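Bottom line first
The real reason the rolling restart runs on only one host: the linear execution chain (`AbstractLinearExecutionJob`) throws an exception while executing `BatchRequestJob` → it exits the linear chain immediately → the remaining rolling steps are never executed.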
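To make the failure mode concrete, here is a simplified, hypothetical model of a linear chain; `LinearChainSketch` and its `submit` function are illustrative names, not Ambari code. Each batch is submitted first and its response is converted afterwards, so the first host's command is already out when the cast fails, and no later batch is reached. In Ambari, the "Exiting linear chain..." message suggests the next job is only scheduled after the current one succeeds rather than in a loop, but the outcome is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified model of a linear batch chain (not Ambari source code).
final class LinearChainSketch {

    /**
     * Runs one batch per host, in order. {@code submit} stands in for the API call
     * that creates the restart request and returns the raw response entity.
     * Returns the hosts whose batch was actually submitted.
     */
    static List<String> run(List<String> hosts, Function<String, Object> submit) {
        List<String> submitted = new ArrayList<>();
        for (String host : hosts) {
            Object rawEntity = submit.apply(host); // the restart command is already dispatched here
            submitted.add(host);
            try {
                String body = (String) rawEntity;  // blind cast, like the failing conversion step
                System.out.println("batch for " + host + " converted: " + body);
            } catch (ClassCastException e) {
                // As in the log: the job fails, the chain exits, later hosts are never reached.
                System.out.println("Exception on batch for " + host + ". Exiting linear chain...");
                break;
            }
        }
        return submitted;
    }
}
```

Calling `run(List.of("dev1.test.com", "dev2.test.com", "dev3.test.com"), h -> new java.io.ByteArrayInputStream(new byte[0]))` submits only `dev1.test.com`, while a `submit` that returns a `String` lets all three hosts run.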
# 1. Pinpointing the exception
| Aspect | Details |
|---|---|
| Triggering thread | `ExecutionScheduler_Worker-*` |
| Job type | `LinearExecutionJobs.BatchRequestJob-<scheduleId>-<seq>` |
| Entry method | `ExecutionScheduleManager.executeBatchRequest` |
| Root-cause method | `ExecutionScheduleManager.convertToBatchRequestResponse` |
| Exception type | `ClassCastException` (`HttpUrlConnector$1` -> `String`) |
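Why a ClassCastException?
Under Jersey 2.x, the object returned by `Response.getEntity()` is not guaranteed to be a `String`. The JAX-RS contract for `getEntity()` says that if the entity is represented by an unconsumed input stream, that input stream is returned.
If the code casts the entity to `String`, it fails as soon as it receives an internal type such as `HttpUrlConnector$1` (an anonymous class inside Jersey's `HttpUrlConnector`, presumably its connection stream wrapper), which is what happens in certain connector/stream scenarios.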
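The original patch is not reproduced in this post. With a Jersey 2 `javax.ws.rs.core.Response`, the standard JAX-RS way to get the body as text is `response.readEntity(String.class)` rather than casting `getEntity()`. The stdlib-only sketch below shows the same defensive idea against the kinds of objects `getEntity()` may return; the class name `EntityText` and the UTF-8 assumption are illustrative, not taken from Ambari.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns a response entity object into text without a blind (String) cast.
final class EntityText {

    static String asString(Object entity) {
        if (entity == null) {
            return null;                       // no entity body
        }
        if (entity instanceof String) {
            return (String) entity;            // already buffered as text
        }
        if (entity instanceof InputStream) {   // e.g. an unconsumed connection stream
            try (InputStream in = (InputStream) entity) {
                // Assumes the API returns UTF-8 (JSON); readAllBytes needs Java 9+.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException(
            "Unsupported entity type: " + entity.getClass().getName());
    }
}
```

The stream is closed after reading because an entity stream can only be consumed once; any later access must reuse the returned string.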
# III. The fix: replace ExecutionScheduleManager.convertToBatchRequestResponse
File modified in this change:
For the handling approach, refer to
# IV. Build and replace: the shortest path to production
# 1. Build command (skipping RAT / Checkstyle)
```shell
mvn -DskipTests -Drat.skip=true -Dcheckstyle.skip=true package
```
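Build note
If your local environment enforces strict checks such as Checkstyle or RAT, skipping them saves a lot of time; it is more efficient to bring the code up to standard after the fix has been verified.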
# 2. Replace the ambari-server artifacts in /usr/lib/ambari-server
After the build finishes, replace the relevant ambari-server jars on the target machine:

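Replacement tips (production practice)
- Back up first: `cp -a xxx.jar xxx.jar.bak.$(date +%F_%T)`
- After replacing, restart `ambari-server`, then verify
- Delete the backups only after verification passes, so that rolling back stays cheap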
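For anyone scripting the replacement in Java rather than shell, the backup step above can be sketched as a small helper. `JarBackup` and the sample path are hypothetical; the timestamp uses the same layout as `date +%F_%T`, colons in the file name assume Linux, and `COPY_ATTRIBUTES` only approximates `cp -a` (ownership preservation depends on the platform).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper mirroring: cp -a xxx.jar xxx.jar.bak.$(date +%F_%T)
final class JarBackup {

    // Same layout as `date +%F_%T`, e.g. 2026-02-16_20:19:49
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd_HH:mm:ss");

    /** Backup path next to the jar: <name>.bak.<timestamp>. */
    static Path backupPath(Path jar, LocalDateTime now) {
        return jar.resolveSibling(jar.getFileName() + ".bak." + now.format(STAMP));
    }

    /** Copies the jar to its backup path, keeping file attributes where supported. */
    static Path backup(Path jar, LocalDateTime now) throws IOException {
        return Files.copy(jar, backupPath(jar, now), StandardCopyOption.COPY_ATTRIBUTES);
    }
}
```

`Files.copy` fails if the backup already exists, which protects an earlier backup from being overwritten within the same second.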
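# V. Verification: after a retry, rolling execution works again
After the replacement, retry the operation; the rolling update now succeeds: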

The logs also show that normal behavior is restored:

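Verification checklist (tick these off)
- 1) After triggering a rolling restart, does it keep advancing to the next host?
- 2) Does `ClassCastException` still appear on `ExecutionScheduler_Worker-*`?
- 3) Does "Exiting linear chain..." still appear for `LinearExecutionJobs.BatchRequestJob-*`?
- 4) If debug logging is enabled, confirm that `Ambari API raw response` is printed (useful for future troubleshooting)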
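Checklist items 2 and 3 can be automated with a small log scanner. `RollingRestartLogCheck` is hypothetical and only matches the exact messages quoted in this post; it also extracts `<scheduleId>-<seq>` from the job name so a failure can be matched to the corresponding `Adding a new request schedule ... id = 53` line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks ambari-server log lines for the failure signatures quoted above.
final class RollingRestartLogCheck {

    // Matches "...BatchRequestJob-53-1. Exiting linear chain..." and captures scheduleId and seq.
    private static final Pattern CHAIN_EXIT = Pattern.compile(
        "LinearExecutionJobs\\.BatchRequestJob-(\\d+)-(\\d+)\\. Exiting linear chain");

    /** True if any line reports a java.lang.ClassCastException (checklist item 2). */
    static boolean hasClassCast(List<String> lines) {
        for (String line : lines) {
            if (line.contains("java.lang.ClassCastException")) {
                return true;
            }
        }
        return false;
    }

    /** Distinct "<scheduleId>-<seq>" pairs of batch jobs that exited their chain (item 3). */
    static List<String> exitedChains(List<String> lines) {
        List<String> jobs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = CHAIN_EXIT.matcher(line);
            if (m.find()) {
                String job = m.group(1) + "-" + m.group(2);
                if (!jobs.contains(job)) {
                    jobs.add(job);
                }
            }
        }
        return jobs;
    }
}
```

Feed it the lines of the server log, for example via `Files.readAllLines(...)` on `ambari-server.log` (the path depends on your installation). A healthy retry should yield `false` and an empty list.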