解决-追加日志锚点
在大体量组件复杂环境下遇到“启动卡死”类疑难问题时,传统日志往往无法反映每一步的真实耗时和卡点细节。本篇以 Ambari 3.0.0 为例,系统讲解如何快速基于源码追加日志锚点(含性能计时),让排查路径更短、更高效。
# 1. 引入常用依赖
提示
由于本人习惯使用 Hutool 和 Lombok ,后续日志与耗时统计全基于此栈,推荐提前集成,方便开发与团队协作。
在 ambari-server/pom.xml
文件末尾追加如下依赖:
<dependency>
<groupId>cn.hutool</groupId>
<artifactId>hutool-all</artifactId>
<version>5.8.35</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.36</version>
</dependency>
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
# 2. 定位关键卡死节点
通过排查日志,发现每次卡死都停在
"Loading ldap-configuration configuration data"
1
进一步定位至:
ambari-server/src/main/java/org/apache/ambari/server/configuration/AmbariServerConfigurationProvider.java
# 3. 日志锚点与性能统计插桩
# 主体 patch diff 说明
- 增加 @Slf4j,替换原有 Logger,后续日志输出更便捷。
- 所有关键点(JPA 启动、DAO 查询、属性转化、解密环节)都追加了毫秒级耗时日志。
- 强化每一步执行后实时输出性能数据,为后续定位提供基础数据。
+import cn.hutool.core.date.DateUtil;
+import cn.hutool.core.date.TimeInterval;
+import cn.hutool.json.JSONUtil;
+import lombok.extern.slf4j.Slf4j;
...
+@Slf4j
public abstract class AmbariServerConfigurationProvider<T extends AmbariServerConfiguration> implements Provider<T> {
...
+ final TimeInterval timer = DateUtil.timer();
+ var getV = ambariConfigurationDAOProvider.get();
+ log.info("=================Current get time is [{}] ms",timer.intervalRestart());
+ T instance = loadInstance(getV.findByCategory(configurationCategory.getCategoryName()));
+ log.info("=================Current Execute loading time is [{}] ms",timer.intervalRestart());
encryptor.decryptSensitiveData(instance);
+ log.info("=================Current Execute decryptSensitiveData time is [{}] ms",timer.intervalRestart());
...
+ TimeInterval timeInterval = DateUtil.timer();
...
+ log.info("======== toPropertis time is [{}] ,map is [{}]",
+ timeInterval.interval(),
+ JSONUtil.toJsonPrettyStr(map));
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
笔记
锚点建议按一次只加一层、逐步外扩的原则分批加,便于缩小故障范围。
# 4. DAO 层同步插桩
除了主流程,建议同时对 DAO 关键查询节点做性能锚点,锁定“数据库操作是否拖慢了流程”:
ambari-server/src/main/java/org/apache/ambari/server/orm/dao/AmbariConfigurationDAO.java
核心改造:
+ final TimeInterval timeInterval = DateUtil.timer();
...
+ log.info("======== [{}] build query time is [{}] ms",categoryName,timeInterval.intervalRestart());
+ var res = daoUtils.selectList(query);
+ log.info("======== select time is [{}] ms",timeInterval.intervalRestart());
1
2
3
4
5
2
3
4
5
# 5. 编译&部署,查看实测效果
完成所有锚点后,重新编译并部署,重启服务后实际日志输出示意:
可以清楚看到每一阶段的毫秒级性能数据,实际瓶颈一目了然。
# 6. 关键输出解读与典型误区
23:13:10.870 [main] INFO org.apache.ambari.server.orm.dao.HostRoleCommandDAO -- Host role command status summary cache enabled !
23:13:10.874 [main] INFO org.apache.ambari.annotations.TransactionalLock$LockArea -- LockArea HRC_STATUS_CACHE is enabled
23:13:10.945 [main] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- Registered org.apache.ambari.server.ldap.service.AmbariLdapConfigurationProvider in event publisher
log4j:WARN No such property [fileNamePattern] in org.apache.log4j.RollingFileAppender.
23:13:11.165 [main] INFO org.apache.ambari.server.logging.LockFactory -- Lock profiling is disabled
23:13:11.170 [main] INFO org.apache.ambari.server.controller.AmbariServer -- Getting the controller
23:13:11.170 [main] DEBUG org.apache.ambari.server.controller.AmbariServer -- Proxy authentication not specified
23:13:11.665 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:13:11.66--ServerSession(1136705850)-- You have specified multiple ids for the entity class [org.apache.ambari.server.orm.entities.KerberosKeytabServiceMappingEntity] without specifying an @IdClass. By doing this you may lose the ability to find by identity, distributed cache support etc. Note: You may however use EntityManager find operations by passing a list of primary key fields. Else, you will have to use JPQL queries to read your entities. For other id options see @PrimaryKey.
23:13:11.703 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:13:11.703--ServerSession(1136705850)-- The reference column name [resource_type_id] mapped on the element [field permissions] does not correspond to a valid id or basic field/column on the mapping reference. Will use referenced column name as provided.
23:13:11.938 [main] INFO org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper -- Checking DB store version
23:13:11.938 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- JPA initialized event received: JpaInitializedEvent{eventType=JPA_INITIALIZED}
23:13:11.938 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- Loading ldap-configuration configuration data
23:13:11.946 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current get time is [0] ms
23:16:34.339 [main] INFO eclipselink -- [EL Info]: 2025-06-14 23:16:34.338--ServerSession(1136705850)-- EclipseLink, version: Eclipse Persistence Services - 2.7.14.v20231208-d05cebc9b0
23:16:34.363 [main] INFO com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource -- Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 5, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, contextClassLoaderSource -> caller, dataSourceName -> 2rwhi8bbusrj4l1eqmvvd|5d9b7a8a, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, extensions -> {}, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, forceSynchronousCheckins -> false, forceUseNamedDriverClass -> false, identityToken -> 2rwhi8bbusrj4l1eqmvvd|5d9b7a8a, idleConnectionTestPeriod -> 7200, initialPoolSize -> 5, jdbcUrl -> jdbc:postgresql://192.168.3.1:5432/ambari, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 14400, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 32, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 5, numHelperThreads -> 3, preferredTestQuery -> SELECT 1, privilegeSpawnedThreads -> false, properties -> {password=******, user=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/mchange-commons.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/mchange-log.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/c3p0.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier 'hocon:/reference,/application,/c3p0,/' could not be found. Skipping.
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- com.mchange.v2.resourcepool.BasicResourcePool@3f1d6a13 config: [start -> 5; min -> 5; max -> 32; inc -> 5; num_acq_attempts -> 30; acq_attempt_delay -> 1000; check_idle_resources_delay -> 7200000; max_resource_age -> 0; max_idle_time -> 14400000; excess_max_idle_time -> 0; destroy_unreturned_resc_time -> 0; expiration_enforcement_delay -> 900000; break_on_acquisition_failure -> false; debug_store_checkout_exceptions -> false; force_synchronous_checkins -> false]
23:16:34.375 [main] DEBUG com.mchange.v2.c3p0.impl.C3P0PooledConnectionPoolManager -- Created new pool for auth, username (masked): 'am******'.
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- acquire test -- pool size: 0; target_pool_size: 5; desired target? 1
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- awaitAvailable(): [unknown]
23:16:34.652 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.652--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [viewInstanceEntity] for the entity class [class org.apache.ambari.server.orm.entities.ViewURLEntity] since weaving was not enabled or did not occur.
23:16:34.653 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.653--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [user] for the entity class [class org.apache.ambari.server.orm.entities.UserAuthenticationEntity] since weaving was not enabled or did not occur.
23:16:34.656 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.656--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [requestEntity] for the entity class [class org.apache.ambari.server.orm.entities.UpgradeEntity] since weaving was not enabled or did not occur.
23:16:34.658 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.658--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [executionCommand] for the entity class [class org.apache.ambari.server.orm.entities.HostRoleCommandEntity] since weaving was not enabled or did not occur.
23:16:34.882 [pool-2-thread-1] INFO org.apache.ambari.server.orm.dao.AmbariConfigurationDAO -- ======== [ldap-configuration] build query time is [5] ms
23:16:34.904 [pool-2-thread-1] INFO org.apache.ambari.server.orm.dao.AmbariConfigurationDAO -- ======== select time is [22] ms
23:16:34.916 [main] INFO org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper -- DB store version is compatible
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- ======== toPropertis time is [0] ,map is [{
}]
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current Execute loading time is [202974] ms
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current Execute decryptSensitiveData time is [0] ms
23:16:35.056 [main] INFO org.apache.ambari.server.stack.StackManager -- Initializing the stack manager...
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service package folder for service AMBARI_INFRA_SOLR-3.0.0 in common-services has been resolved to /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/package
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/upgradesdoes not exist.
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/checksdoes not exist.
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/server_actionsdoes not exist.
23:16:35.298 [main] INFO org.apache.ambari.server.stack.ServiceDirectory -- Role command order info was loaded from file: /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/role_command_order.json
23:16:35.298 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Role Command Order for /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/role_command_order.json
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
从日志可见:
- DAO 层大部分耗时很短
- 但 loadInstance 流程某一步实际耗时近 200 秒!
decryptSensitiveData
几乎瞬间完成,非阻塞点- 日志锚点输出为后续进一步缩小排查范围提供了科学依据
注意
常见误区: 很多同学习惯只在“业务异常”点加日志,忽略流程耗时统计,导致排查效率极低。
- 01
- bigtop-select 打包缺 compat 报错修复 deb07-16
- 02
- bigtop-select 打包缺 control 文件报错修复 deb07-16
- 03
- 首次编译-环境初始化 必装07-16