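[22213] Solution
# I. Symptom: after a rolling update, only one host is executed
Start with the screenshot of the symptom: after the rolling update is triggered, the page shows that only one host was updated.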

After the rolling update:

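Symptom keywords
- The rolling restart / rolling update is triggered successfully
- Actual execution lands on only one host (for example, only the DATANODE on `dev1.test.com` is restarted)
- The rest of the job chain exits immediately and the next batch of hosts is never processed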
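# 1. Quick diagnosis: agent dispatch is fine, the scheduling chain is broken
The log shows that the command was still dispatched to the agent (`AgentCommandsPublisher.sendCommands`), so this is not a case of the command never being sent.
The fatal part is that the scheduler's BatchRequest execution path throws an exception and exits, so the rolling steps for the remaining hosts never get a chance to run.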
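The failure mode described above can be illustrated with a small, self-contained sketch. It is not Ambari's scheduler code: `LinearChainSketch`, `BatchStep`, `runChain`, and the host `dev2.test.com` are hypothetical names chosen for illustration. The first step dispatches its restart and then fails while handling the response, so the chain stops before the second host is reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a linear chain of batch jobs; not Ambari's actual classes. */
final class LinearChainSketch {

    /** One step in the chain, e.g. "restart DATANODE on one host". */
    interface BatchStep {
        void run() throws Exception;
    }

    /**
     * Runs the steps in order and abandons the chain at the first exception.
     * Returns the number of steps that were started.
     */
    static int runChain(List<BatchStep> steps) {
        int started = 0;
        for (BatchStep step : steps) {
            started++;
            try {
                step.run();
            } catch (Exception e) {
                // The chain is abandoned here; later hosts are never reached.
                System.out.println("Exiting linear chain: " + e);
                break;
            }
        }
        return started;
    }

    public static void main(String[] args) {
        List<String> restarted = new ArrayList<>();
        List<BatchStep> steps = new ArrayList<>();
        steps.add(() -> {
            restarted.add("dev1.test.com"); // the restart itself is dispatched
            throw new ClassCastException("response conversion failed");
        });
        steps.add(() -> restarted.add("dev2.test.com")); // hypothetical second host, never runs
        int started = runChain(steps);
        System.out.println(started + " step(s) started, hosts restarted: " + restarted);
    }
}
```

The point of the sketch is that the first host's restart is not the problem; the exception thrown afterwards is what prevents every later step from being started.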
# II. Server log: ExecutionScheduleManager throws ClassCastException
Here is the core error (key excerpt):

    2026-02-16 20:19:49,706 INFO [ambari-client-thread-45] o.a.a.s.state.cluster.ClusterImpl:558 - Adding a new request schedule, clusterName = abc, id = 53, description = null
    2026-02-16 20:19:49,772 INFO [ambari-client-thread-119] o.a.a.s.c.AmbariManagementControllerImpl:4152 - Received action execution request, clusterName=abc, request=isCommand :true, action :null, command :RESTART, inputs :{HAS_RESOURCE_FILTERS=true}, resourceFilters: [RequestResourceFilter{serviceName='HDFS', componentName='DATANODE', hostNames=[dev1.test.com]}], exclusive: false, clusterName :abc
    2026-02-16 20:19:49,813 INFO [ambari-client-thread-119] o.a.a.server.stageplanner.RoleGraph:175 - Detecting cycle graphs
    2026-02-16 20:19:49,814 INFO [ambari-client-thread-119] o.a.a.server.stageplanner.RoleGraph:176 - Graph:
    (DATANODE, RESTART, 0)
    2026-02-16 20:19:49,843 ERROR [ExecutionScheduler_Worker-2] o.a.a.s.s.AbstractLinearExecutionJob:93 - Exception caught on execution of job LinearExecutionJobs.BatchRequestJob-53-1. Exiting linear chain...
    org.apache.ambari.server.AmbariException: Exception occurred while performing request
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:683)
        at org.apache.ambari.server.state.scheduler.BatchRequestJob.doWork(BatchRequestJob.java:82)
        at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:91)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
    Caused by: java.lang.ClassCastException: class org.glassfish.jersey.client.internal.HttpUrlConnector$1 cannot be cast to class java.lang.String (org.glassfish.jersey.client.internal.HttpUrlConnector$1 is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.convertToBatchRequestResponse(ExecutionScheduleManager.java:740)
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.performApiRequest(ExecutionScheduleManager.java:942)
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:671)
        ... 4 common frames omitted
    2026-02-16 20:19:49,855 INFO [ExecutionScheduler_Worker-2] org.quartz.core.JobRunShell:207 - Job LinearExecutionJobs.BatchRequestJob-53-1 threw a JobExecutionException:
    org.quartz.JobExecutionException: org.apache.ambari.server.AmbariException: Exception occurred while performing request
        at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:97)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
    Caused by: org.apache.ambari.server.AmbariException: Exception occurred while performing request
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:683)
        at org.apache.ambari.server.state.scheduler.BatchRequestJob.doWork(BatchRequestJob.java:82)
        at org.apache.ambari.server.scheduler.AbstractLinearExecutionJob.execute(AbstractLinearExecutionJob.java:91)
        ... 2 common frames omitted
    Caused by: java.lang.ClassCastException: class org.glassfish.jersey.client.internal.HttpUrlConnector$1 cannot be cast to class java.lang.String (org.glassfish.jersey.client.internal.HttpUrlConnector$1 is in unnamed module of loader 'app'; java.lang.String is in module java.base of loader 'bootstrap')
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.convertToBatchRequestResponse(ExecutionScheduleManager.java:740)
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.performApiRequest(ExecutionScheduleManager.java:942)
        at org.apache.ambari.server.scheduler.ExecutionScheduleManager.executeBatchRequest(ExecutionScheduleManager.java:671)
        ... 4 common frames omitted
    2026-02-16 20:19:49,892 INFO [agent-command-publisher-0] o.a.a.s.e.p.AgentCommandsPublisher:173 - AgentCommandsPublisher.sendCommands: sending ExecutionCommand for host dev1.test.com, role DATANODE, roleCommand CUSTOM_COMMAND, and command ID 142-0, task ID 1103
    2026-02-16 20:19:50,092 INFO [agent-message-monitor-0] o.a.a.server.events.MessageEmitter:218 - Schedule execution command emitting, retry: 0, messageId: 1
    2026-02-16 20:19:50,094 WARN [agent-message-retry-0] o.a.a.server.events.MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 1
    2026-02-16 20:19:58,414 WARN [ambari-client-thread-45] o.glassfish.jersey.internal.Errors:168 - The following warnings have been detected: WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.ambari.server.api.services.TaskService.getTask(java.lang.String,javax.ws.rs.core.HttpHeaders,javax.ws.rs.core.UriInfo,java.lang.String), should not consume any entity.
    WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.ambari.server.api.services.TaskService.getComponents(java.lang.String,javax.ws.rs.core.HttpHeaders,javax.ws.rs.core.UriInfo), should not consume any entity.
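Conclusion first
The real cause of "rolling restart only runs on one host": the linear scheduling chain (LinearExecutionJob) throws an exception while executing BatchRequestJob → it exits the linear chain immediately → the remaining rolling steps never run.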
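# 1. Attributing the exception
| Dimension | Detail |
|---|---|
| Triggering thread | ExecutionScheduler_Worker-* |
| Job type | LinearExecutionJobs.BatchRequestJob-<scheduleId>-<seq> |
| Entry method | ExecutionScheduleManager.executeBatchRequest |
| Root-cause method | ExecutionScheduleManager.convertToBatchRequestResponse |
| Exception type | ClassCastException (HttpUrlConnector$1 -> String) |

Why a ClassCastException?
Under Jersey 2.x, `Response.getEntity()` is not guaranteed to return a String. On a client response whose body has not been consumed yet, it can hand back the underlying entity stream; here that is the connector-internal type `HttpUrlConnector$1`.
If the code casts that entity straight to String, the cast fails with the ClassCastException shown in the log.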
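The cast problem can be reproduced with the JDK alone. The sketch below is an analogy rather than Jersey code: `EntityCastSketch` and its methods are hypothetical, the JSON body is made up, and a `ByteArrayInputStream` stands in for the connector's internal stream type.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** JDK-only analogy for "entity is a stream, not a String"; not Jersey code. */
final class EntityCastSketch {

    /** Broken pattern: assume the entity object is already a String. */
    static String castEntity(Object entity) {
        return (String) entity; // ClassCastException when the entity is a stream
    }

    /** Safe pattern: consume the stream and decode it explicitly. */
    static String readEntity(InputStream entity) {
        try {
            return new String(entity.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up response body, for illustration only.
        byte[] body = "{\"Requests\":{\"id\":142}}".getBytes(StandardCharsets.UTF_8);
        try {
            castEntity(new ByteArrayInputStream(body));
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }
        System.out.println("read ok: " + readEntity(new ByteArrayInputStream(body)));
    }
}
```

In the real code the equivalent of `readEntity` is `Response.readEntity(String.class)`, which asks the JAX-RS runtime to consume and decode the body instead of assuming its runtime type.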
# III. The fix: replace ExecutionScheduleManager.convertToBatchRequestResponse
File modified in this change:
ambari-server/src/main/java/org/apache/ambari/server/scheduler/ExecutionScheduleManager.java
Screenshot of the location:

# 1. Replace it with an implementation that reads the Response body safely
Replace the original method with the following implementation (it keeps the original semantics while fixing how the entity is read under Jersey 2.x):

    private BatchRequestResponse convertToBatchRequestResponse(Response clientResponse) {
        BatchRequestResponse batchRequestResponse = new BatchRequestResponse();

        // 0) Null guard: avoid an NPE when performApiRequest / performApiGetRequest
        //    fails and the response is null
        if (clientResponse == null) {
            LOG.error("Ambari API response is NULL (clientResponse == null)");
            batchRequestResponse.setReturnCode(500);
            batchRequestResponse.setStatus(HostRoleStatus.FAILED.toString());
            batchRequestResponse.setReturnMessage("Null response from Ambari API call");
            return batchRequestResponse;
        }

        // 1) HTTP status
        final int retCode = clientResponse.getStatus();
        batchRequestResponse.setReturnCode(retCode);

        // 2) Read response body safely (Jersey 2.x correct way)
        String responseString = "";
        try {
            if (clientResponse.hasEntity()) {
                responseString = clientResponse.readEntity(String.class);
                if (responseString == null) {
                    responseString = "";
                }
            }
        } catch (Exception ex) {
            // Do not let the exception propagate, otherwise it breaks the scheduler chain
            LOG.warn("Failed to read Ambari API response entity as String. status={}", retCode, ex);
            responseString = "";
        }

        // 3) Debug print raw response (safe truncate)
        if (LOG.isDebugEnabled()) {
            final int MAX_LEN = 8192; // about 8K characters: enough to see the structure without flooding the log
            String bodyPreview = responseString;
            if (bodyPreview.length() > MAX_LEN) {
                bodyPreview = bodyPreview.substring(0, MAX_LEN) + "...[truncated, total=" + responseString.length() + "]";
            }
            String mediaType = "null";
            try {
                if (clientResponse.getMediaType() != null) {
                    mediaType = clientResponse.getMediaType().toString();
                }
            } catch (Exception ignore) {
                // The media type is only used for the debug line; keep "null" on failure
            }
            LOG.debug("Ambari API raw response: httpStatus={}, contentType={}, length={}\n{}", retCode, mediaType, responseString.length(), bodyPreview);
        }

        // 4) Parse JSON (keep original semantics)
        Map<String, Object> httpResponseMap;
        try {
            httpResponseMap = gson.<Map<String, Object>>fromJson(responseString, Map.class);
            LOG.debug("Processing response as JSON");
        } catch (JsonSyntaxException e) {
            LOG.debug("Response is not valid JSON object. Recording as is");
            httpResponseMap = new HashMap<>();
            httpResponseMap.put("message", responseString);
        } catch (Exception e) {
            LOG.warn("Unexpected exception while parsing response JSON. status={}", retCode, e);
            httpResponseMap = new HashMap<>();
            httpResponseMap.put("message", responseString);
        }

        // 5) Convert to BatchRequestResponse (same logic as the original)
        if (retCode < 300) {
            if (httpResponseMap == null || httpResponseMap.isEmpty()) {
                // Empty response on successful scenario
                batchRequestResponse.setStatus(HostRoleStatus.COMPLETED.toString());
                return batchRequestResponse;
            }

            Map requestMap = null;
            Object requestMapObject = httpResponseMap.get("Requests");
            if (requestMapObject instanceof Map) {
                requestMap = (Map) requestMapObject;
            }

            if (requestMap != null) {
                // requestId
                Object idObj = requestMap.get(REQUESTS_ID_KEY);
                if (idObj instanceof Double) {
                    batchRequestResponse.setRequestId(((Double) idObj).longValue());
                } else if (idObj instanceof Number) {
                    batchRequestResponse.setRequestId(((Number) idObj).longValue());
                } else if (idObj != null) {
                    try {
                        batchRequestResponse.setRequestId(Long.parseLong(idObj.toString()));
                    } catch (Exception ignore) {
                        // Leave the request id unset when it is not numeric
                    }
                }

                // status: request_status or status
                String status = null;
                Object statusObj = requestMap.get(REQUESTS_STATUS_KEY);
                if (statusObj != null) {
                    status = statusObj.toString();
                }
                Object status2Obj = requestMap.get("status");
                if (status2Obj != null) {
                    status = status2Obj.toString();
                }

                // aborted/failed/timedout/total task counts
                Object abortedObj = requestMap.get(REQUESTS_ABORTED_TASKS_KEY);
                if (abortedObj instanceof Double) {
                    batchRequestResponse.setAbortedTaskCount(((Double) abortedObj).intValue());
                } else if (abortedObj instanceof Number) {
                    batchRequestResponse.setAbortedTaskCount(((Number) abortedObj).intValue());
                }

                Object failedObj = requestMap.get(REQUESTS_FAILED_TASKS_KEY);
                if (failedObj instanceof Double) {
                    batchRequestResponse.setFailedTaskCount(((Double) failedObj).intValue());
                } else if (failedObj instanceof Number) {
                    batchRequestResponse.setFailedTaskCount(((Number) failedObj).intValue());
                }

                Object timedoutObj = requestMap.get(REQUESTS_TIMEDOUT_TASKS_KEY);
                if (timedoutObj instanceof Double) {
                    batchRequestResponse.setTimedOutTaskCount(((Double) timedoutObj).intValue());
                } else if (timedoutObj instanceof Number) {
                    batchRequestResponse.setTimedOutTaskCount(((Number) timedoutObj).intValue());
                }

                Object totalObj = requestMap.get(REQUESTS_TOTAL_TASKS_KEY);
                if (totalObj instanceof Double) {
                    batchRequestResponse.setTotalTaskCount(((Double) totalObj).intValue());
                } else if (totalObj instanceof Number) {
                    batchRequestResponse.setTotalTaskCount(((Number) totalObj).intValue());
                }

                batchRequestResponse.setStatus(status);
            } else {
                // Success but no "Requests" block
                batchRequestResponse.setStatus(HostRoleStatus.COMPLETED.toString());
            }
        } else {
            // Unsuccessful response
            Object msg = (httpResponseMap != null) ? httpResponseMap.get("message") : null;
            String msgStr = (msg != null) ? msg.toString() : responseString;
            batchRequestResponse.setReturnMessage(msgStr);
            batchRequestResponse.setStatus(HostRoleStatus.FAILED.toString());
        }
        return batchRequestResponse;
    }
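Key points of this fix
- `Response.readEntity(String.class)` is the correct way to read the body under Jersey 2.x
- A `null` guard plus `try/catch` keeps a failed body read from breaking the whole scheduler chain
- When JSON parsing fails, the raw text is recorded as a fallback, which avoids a second exception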
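One detail of the replacement method worth spelling out is the numeric handling: when Gson deserializes JSON into a `Map<String, Object>`, numbers arrive as `Double` by default, which is why the request id and task counts are converted through `instanceof` checks rather than cast directly. The repeated branches could be folded into one helper. The sketch below is one possible shape, assuming a hypothetical `NumberCoercionSketch.toLong` that is not part of Ambari.

```java
import java.util.OptionalLong;

/** Hypothetical helper; shows one way to coerce loosely typed JSON values to long. */
final class NumberCoercionSketch {

    /**
     * Accepts any Number (including the Double values a generic JSON map holds)
     * or a numeric string. Returns empty for null or non-numeric input instead
     * of throwing, so a malformed field cannot abort the caller.
     */
    static OptionalLong toLong(Object value) {
        if (value instanceof Number) {
            return OptionalLong.of(((Number) value).longValue());
        }
        if (value != null) {
            try {
                return OptionalLong.of(Long.parseLong(value.toString().trim()));
            } catch (NumberFormatException ignored) {
                // Not a number: fall through to empty.
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        System.out.println(toLong(Double.valueOf(142.0)).isPresent());
        System.out.println(toLong("not-a-number").isPresent());
    }
}
```

Returning an empty `OptionalLong` matches the defensive intent of the fix: a missing or odd-looking field leaves the corresponding value unset rather than raising an exception inside the scheduler job.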
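# IV. Build and replace: the shortest path to production
# 1. Build command (skip RAT / Checkstyle)

    mvn -DskipTests -Drat.skip=true -Dcheckstyle.skip=true package

Build note
If the local environment enforces strict checks such as checkstyle / rat, skipping them saves a lot of time; it is more efficient to come back and fix style issues once the fix has been verified.
# 2. Replace the ambari-server artifacts under /usr/lib/ambari-server
After the build completes, replace the generated ambari-server jars on the target machine: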

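Replacement tips (production habits)
- Back up first: `cp -a xxx.jar xxx.jar.bak.$(date +%F_%T)`
- After the replacement, restart `ambari-server` and then verify
- Clean up the backups only after verification passes, so that rolling back stays cheap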
# V. Verification: after a retry, rolling execution is back to normal
After the replacement, retry; the rolling update now succeeds:

The log also shows that the behavior is back to normal:

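Verification checklist (tick the items off one by one)
- 1) After triggering a rolling restart, does it keep advancing to the next host?
- 2) Does `ExecutionScheduler_Worker-*` still log a `ClassCastException`?
- 3) Does `LinearExecutionJobs.BatchRequestJob-*` still log "Exiting linear chain..."?
- 4) If debug logging is enabled, confirm that `Ambari API raw response` is printed (useful for future troubleshooting)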
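Items 2 and 3 of the checklist can be checked mechanically. The following is a minimal sketch, assuming the server log is a plain text file whose path is passed as the first argument; `SchedulerLogCheck` and `suspiciousLines` are hypothetical names, and the two markers are the strings that appear in the error excerpt earlier in this article.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical log check; flags lines that indicate the scheduler chain broke again. */
final class SchedulerLogCheck {

    // Markers taken from the error excerpt: the exception type and the chain-exit message.
    private static final List<String> MARKERS =
        List.of("ClassCastException", "Exiting linear chain");

    /** Returns every line that contains at least one marker, in original order. */
    static List<String> suspiciousLines(List<String> logLines) {
        return logLines.stream()
            .filter(line -> MARKERS.stream().anyMatch(line::contains))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java SchedulerLogCheck.java <path-to-ambari-server-log>");
            return;
        }
        List<String> hits = suspiciousLines(Files.readAllLines(Path.of(args[0])));
        hits.forEach(System.out::println);
        System.out.println(hits.isEmpty() ? "no scheduler errors found" : hits.size() + " suspicious line(s)");
    }
}
```

Run it against the log written after the retry; an empty result covers items 2 and 3, while items 1 and 4 still need to be checked in the UI and in the debug output.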