TT Bigdata TT Bigdata
首页
  • 部署专题

    • 常规安装
    • 一键部署
  • 组件安装

    • 常规&高可用
  • 版本专题

    • 更新说明
  • Ambari-Env

    • 环境准备
    • 开始使用
  • 组件编译

    • 专区—Ambari
    • 专区—Bigtop-官方组件
    • 专区—Bigtop-扩展组件
  • 报错解决

    • 专区—Ambari
    • 专区—Bigtop
  • 其他技巧

    • Maven镜像加速
    • Gradle镜像加速
    • Bower镜像加速
    • 虚拟环境思路
    • R环境安装+一键安装脚本
    • Ivy配置私有镜像仓库
    • Node.js 多版本共存方案
    • Ambari Web本地启动
    • Npm镜像加速
    • PostgreSQL快速安装
    • Temurin JDK 23快速安装
  • 成神之路

    • 专区—Ambari
    • 专区—Bigtop
  • 集成案例

    • Redis集成教学
    • Dolphin集成教学
    • Doris集成教学
    • 持续整理...
  • 核心代码

    • 各组件代码
    • 通用代码模板
  • 国产化&其他系统

    • Rocky系列
    • Ubuntu系列
  • 生产调优

    • 组件调优指南
    • 1v1指导调优
  • 支持&共建

    • 蓝图愿景
    • 技术支持
    • 合作共建
登陆
GitHub (opens new window)

JaneTTR

数据酿造智慧,每一滴都是沉淀!
首页
  • 部署专题

    • 常规安装
    • 一键部署
  • 组件安装

    • 常规&高可用
  • 版本专题

    • 更新说明
  • Ambari-Env

    • 环境准备
    • 开始使用
  • 组件编译

    • 专区—Ambari
    • 专区—Bigtop-官方组件
    • 专区—Bigtop-扩展组件
  • 报错解决

    • 专区—Ambari
    • 专区—Bigtop
  • 其他技巧

    • Maven镜像加速
    • Gradle镜像加速
    • Bower镜像加速
    • 虚拟环境思路
    • R环境安装+一键安装脚本
    • Ivy配置私有镜像仓库
    • Node.js 多版本共存方案
    • Ambari Web本地启动
    • Npm镜像加速
    • PostgreSQL快速安装
    • Temurin JDK 23快速安装
  • 成神之路

    • 专区—Ambari
    • 专区—Bigtop
  • 集成案例

    • Redis集成教学
    • Dolphin集成教学
    • Doris集成教学
    • 持续整理...
  • 核心代码

    • 各组件代码
    • 通用代码模板
  • 国产化&其他系统

    • Rocky系列
    • Ubuntu系列
  • 生产调优

    • 组件调优指南
    • 1v1指导调优
  • 支持&共建

    • 蓝图愿景
    • 技术支持
    • 合作共建
登陆
GitHub (opens new window)
  • 案例-Ambari重启失败

    • 解读-Ambari3.0.0重启失败问题
    • 解决-追加日志锚点
      • 1. 引入常用依赖
      • 2. 定位关键卡死节点
      • 3. 日志锚点与性能统计插桩
        • 主体 patch diff 说明
      • 4. DAO 层同步插桩
      • 5. 编译&部署,查看实测效果
      • 6. 关键输出解读与典型误区
    • 解决-进一步分析与改造[一]
    • 解决-进一步分析与改造[二]
  • 案例-数据库连接问题

  • 案例-日志级别控制问题

  • 案例-找不到snappy-devel

  • SYS-Rocky
  • 案例-Ambari重启失败
JaneTTR
2025-06-06
目录

解决-追加日志锚点

在大体量组件复杂环境下遇到“启动卡死”类疑难问题时,传统日志往往无法反映每一步的真实耗时和卡点细节。本篇以 Ambari 3.0.0 为例,系统讲解如何快速基于源码追加日志锚点(含性能计时),让排查路径更短、更高效。

# 1. 引入常用依赖

提示

由于本人习惯使用 Hutool 和 Lombok ,后续日志与耗时统计全基于此栈,推荐提前集成,方便开发与团队协作。

在 ambari-server/pom.xml 文件末尾追加如下依赖:


<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.8.35</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.36</version>
</dependency>
1
2
3
4
5
6
7
8
9
10
11

pom依赖追加

# 2. 定位关键卡死节点

通过排查日志,发现每次卡死都停在

"Loading ldap-configuration configuration data"
1

进一步定位至:

ambari-server/src/main/java/org/apache/ambari/server/configuration/AmbariServerConfigurationProvider.java

源码日志锚点插桩

# 3. 日志锚点与性能统计插桩

# 主体 patch diff 说明

  • 增加 @Slf4j,替换原有 Logger,后续日志输出更便捷。
  • 所有关键点(JPA 启动、DAO 查询、属性转化、解密环节)都追加了毫秒级耗时日志。
  • 强化每一步执行后实时输出性能数据,为后续定位提供基础数据。
+import cn.hutool.core.date.DateUtil;
+import cn.hutool.core.date.TimeInterval;
+import cn.hutool.json.JSONUtil;
+import lombok.extern.slf4j.Slf4j;
...
+@Slf4j
 public abstract class AmbariServerConfigurationProvider<T extends AmbariServerConfiguration> implements Provider<T> {
...
+      final TimeInterval timer = DateUtil.timer();
+      var getV = ambariConfigurationDAOProvider.get();
+      log.info("=================Current get time  is [{}] ms",timer.intervalRestart());
+      T instance = loadInstance(getV.findByCategory(configurationCategory.getCategoryName()));
+      log.info("=================Current Execute loading time is [{}] ms",timer.intervalRestart());
       encryptor.decryptSensitiveData(instance);
+      log.info("=================Current Execute decryptSensitiveData time is [{}] ms",timer.intervalRestart());
...
+    TimeInterval timeInterval = DateUtil.timer();
...
+    log.info("========  toPropertis time is [{}] ,map is [{}]",
+            timeInterval.interval(),
+            JSONUtil.toJsonPrettyStr(map));
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

笔记

锚点建议按一次只加一层、逐步外扩的原则分批加,便于缩小故障范围。

# 4. DAO 层同步插桩

除了主流程,建议同时对 DAO 关键查询节点做性能锚点,锁定“数据库操作是否拖慢了流程”:

ambari-server/src/main/java/org/apache/ambari/server/orm/dao/AmbariConfigurationDAO.java

DAO锚点插桩

核心改造:

+    final TimeInterval timeInterval = DateUtil.timer();
...
+    log.info("========  [{}] build query time is [{}] ms",categoryName,timeInterval.intervalRestart());
+    var res = daoUtils.selectList(query);
+    log.info("========   select time is [{}] ms",timeInterval.intervalRestart());
1
2
3
4
5

# 5. 编译&部署,查看实测效果

完成所有锚点后,重新编译并部署,重启服务后实际日志输出示意:

image-20250615004427160

可以清楚看到每一阶段的毫秒级性能数据,实际瓶颈一目了然。

# 6. 关键输出解读与典型误区

23:13:10.870 [main] INFO org.apache.ambari.server.orm.dao.HostRoleCommandDAO -- Host role command status summary cache enabled !
23:13:10.874 [main] INFO org.apache.ambari.annotations.TransactionalLock$LockArea -- LockArea HRC_STATUS_CACHE is enabled
23:13:10.945 [main] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- Registered org.apache.ambari.server.ldap.service.AmbariLdapConfigurationProvider in event publisher
log4j:WARN No such property [fileNamePattern] in org.apache.log4j.RollingFileAppender.
23:13:11.165 [main] INFO org.apache.ambari.server.logging.LockFactory -- Lock profiling is disabled
23:13:11.170 [main] INFO org.apache.ambari.server.controller.AmbariServer -- Getting the controller
23:13:11.170 [main] DEBUG org.apache.ambari.server.controller.AmbariServer -- Proxy authentication not specified
23:13:11.665 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:13:11.66--ServerSession(1136705850)-- You have specified multiple ids for the entity class [org.apache.ambari.server.orm.entities.KerberosKeytabServiceMappingEntity] without specifying an @IdClass. By doing this you may lose the ability to find by identity, distributed cache support etc. Note: You may however use EntityManager find operations by passing a list of primary key fields. Else, you will have to use JPQL queries to read your entities. For other id options see @PrimaryKey.
23:13:11.703 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:13:11.703--ServerSession(1136705850)-- The reference column name [resource_type_id] mapped on the element [field permissions] does not correspond to a valid id or basic field/column on the mapping reference. Will use referenced column name as provided.
23:13:11.938 [main] INFO org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper -- Checking DB store version
23:13:11.938 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- JPA initialized event received: JpaInitializedEvent{eventType=JPA_INITIALIZED}
23:13:11.938 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- Loading ldap-configuration configuration data
23:13:11.946 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current get time  is [0] ms
23:16:34.339 [main] INFO eclipselink -- [EL Info]: 2025-06-14 23:16:34.338--ServerSession(1136705850)-- EclipseLink, version: Eclipse Persistence Services - 2.7.14.v20231208-d05cebc9b0
23:16:34.363 [main] INFO com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource -- Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 5, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, contextClassLoaderSource -> caller, dataSourceName -> 2rwhi8bbusrj4l1eqmvvd|5d9b7a8a, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, extensions -> {}, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, forceSynchronousCheckins -> false, forceUseNamedDriverClass -> false, identityToken -> 2rwhi8bbusrj4l1eqmvvd|5d9b7a8a, idleConnectionTestPeriod -> 7200, initialPoolSize -> 5, jdbcUrl -> jdbc:postgresql://192.168.3.1:5432/ambari, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 14400, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 32, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 5, numHelperThreads -> 3, preferredTestQuery -> SELECT 1, privilegeSpawnedThreads -> false, properties -> {password=******, user=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/mchange-commons.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/mchange-log.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/c3p0.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier 'hocon:/reference,/application,/c3p0,/' could not be found. Skipping.
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- com.mchange.v2.resourcepool.BasicResourcePool@3f1d6a13 config: [start -> 5; min -> 5; max -> 32; inc -> 5; num_acq_attempts -> 30; acq_attempt_delay -> 1000; check_idle_resources_delay -> 7200000; max_resource_age -> 0; max_idle_time -> 14400000; excess_max_idle_time -> 0; destroy_unreturned_resc_time -> 0; expiration_enforcement_delay -> 900000; break_on_acquisition_failure -> false; debug_store_checkout_exceptions -> false; force_synchronous_checkins -> false]
23:16:34.375 [main] DEBUG com.mchange.v2.c3p0.impl.C3P0PooledConnectionPoolManager -- Created new pool for auth, username (masked): 'am******'.
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- acquire test -- pool size: 0; target_pool_size: 5; desired target? 1
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- awaitAvailable(): [unknown]
23:16:34.652 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.652--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [viewInstanceEntity] for the entity class [class org.apache.ambari.server.orm.entities.ViewURLEntity] since weaving was not enabled or did not occur.
23:16:34.653 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.653--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [user] for the entity class [class org.apache.ambari.server.orm.entities.UserAuthenticationEntity] since weaving was not enabled or did not occur.
23:16:34.656 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.656--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [requestEntity] for the entity class [class org.apache.ambari.server.orm.entities.UpgradeEntity] since weaving was not enabled or did not occur.
23:16:34.658 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.658--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [executionCommand] for the entity class [class org.apache.ambari.server.orm.entities.HostRoleCommandEntity] since weaving was not enabled or did not occur.
23:16:34.882 [pool-2-thread-1] INFO org.apache.ambari.server.orm.dao.AmbariConfigurationDAO -- ========  [ldap-configuration] build query time is [5] ms
23:16:34.904 [pool-2-thread-1] INFO org.apache.ambari.server.orm.dao.AmbariConfigurationDAO -- ========   select time is [22] ms
23:16:34.916 [main] INFO org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper -- DB store version is compatible
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- ========  toPropertis time is [0] ,map is [{
}]
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current Execute loading time is [202974] ms
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current Execute decryptSensitiveData time is [0] ms
23:16:35.056 [main] INFO org.apache.ambari.server.stack.StackManager -- Initializing the stack manager...
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service package folder for service AMBARI_INFRA_SOLR-3.0.0 in common-services has been resolved to /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/package
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/upgradesdoes not exist.
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/checksdoes not exist.
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/server_actionsdoes not exist.
23:16:35.298 [main] INFO org.apache.ambari.server.stack.ServiceDirectory -- Role command order info was loaded from file: /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/role_command_order.json
23:16:35.298 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Role Command Order for /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/role_command_order.json
               
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

从日志可见:

  • DAO 层大部分耗时很短
  • 但 loadInstance 流程某一步实际耗时近 200 秒!
  • decryptSensitiveData 几乎瞬间完成,非阻塞点
  • 日志锚点输出为后续进一步缩小排查范围提供了科学依据

注意

常见误区: 很多同学习惯只在“业务异常”点加日志,忽略流程耗时统计,导致排查效率极低。

#Ambari#Rocky Linux#组件兼容性#运维排查#日志分析
解读-Ambari3.0.0重启失败问题
解决-进一步分析与改造[一]

← 解读-Ambari3.0.0重启失败问题 解决-进一步分析与改造[一]→

最近更新
01
bigtop-select 打包缺 compat 报错修复 deb
07-16
02
bigtop-select 打包缺 control 文件报错修复 deb
07-16
03
首次编译-环境初始化 必装
07-16
更多文章>
Theme by Vdoing | Copyright © 2017-2025 JaneTTR | MIT License
  • 跟随系统
  • 浅色模式
  • 深色模式
  • 阅读模式