TT Bigdata TT Bigdata
首页
  • 部署专题

    • 常规安装
    • 一键部署
  • 组件安装

    • 常规&高可用
  • 版本专题

    • 更新说明
  • Ambari-Env

    • 环境准备
    • 开始使用
  • 组件编译

    • 专区—Ambari
    • 专区—Bigtop-官方组件
    • 专区—Bigtop-扩展组件
  • 报错解决

    • 专区—Ambari
    • 专区—Bigtop
  • 其他技巧

    • Maven镜像加速
    • Gradle镜像加速
    • Bower镜像加速
    • 虚拟环境思路
    • R环境安装+一键安装脚本
    • Ivy配置私有镜像仓库
    • Node.js 多版本共存方案
    • Ambari Web本地启动
    • Npm镜像加速
    • PostgreSQL快速安装
    • Temurin JDK 23快速安装
  • 成神之路

    • 专区—Ambari
    • 专区—Bigtop
  • 集成案例

    • Redis集成教学
    • Dolphin集成教学
    • Doris集成教学
    • 持续整理...
  • 核心代码

    • 各组件代码
    • 通用代码模板
  • 国产化&其他系统

    • Rocky系列
    • Ubuntu系列
  • 生产调优

    • 组件调优指南
    • 1v1指导调优
  • 支持&共建

    • 蓝图愿景
    • 技术支持
    • 合作共建
登陆
GitHub (opens new window)

JaneTTR

数据酿造智慧,每一滴都是沉淀!
首页
  • 部署专题

    • 常规安装
    • 一键部署
  • 组件安装

    • 常规&高可用
  • 版本专题

    • 更新说明
  • Ambari-Env

    • 环境准备
    • 开始使用
  • 组件编译

    • 专区—Ambari
    • 专区—Bigtop-官方组件
    • 专区—Bigtop-扩展组件
  • 报错解决

    • 专区—Ambari
    • 专区—Bigtop
  • 其他技巧

    • Maven镜像加速
    • Gradle镜像加速
    • Bower镜像加速
    • 虚拟环境思路
    • R环境安装+一键安装脚本
    • Ivy配置私有镜像仓库
    • Node.js 多版本共存方案
    • Ambari Web本地启动
    • Npm镜像加速
    • PostgreSQL快速安装
    • Temurin JDK 23快速安装
  • 成神之路

    • 专区—Ambari
    • 专区—Bigtop
  • 集成案例

    • Redis集成教学
    • Dolphin集成教学
    • Doris集成教学
    • 持续整理...
  • 核心代码

    • 各组件代码
    • 通用代码模板
  • 国产化&其他系统

    • Rocky系列
    • Ubuntu系列
  • 生产调优

    • 组件调优指南
    • 1v1指导调优
  • 支持&共建

    • 蓝图愿景
    • 技术支持
    • 合作共建
登陆
GitHub (opens new window)
  • 案例-Ambari重启失败

    • 解读-Ambari3.0.0重启失败问题
    • 解决-追加日志锚点
    • 解决-进一步分析与改造[一]
      • 背景与现象还原
      • 线程分工与堆栈特征分析
      • jstack 实战定位
    • 解决-进一步分析与改造[二]
  • 案例-数据库连接问题

  • 案例-日志级别控制问题

  • 案例-找不到snappy-devel

  • SYS-Rocky
  • 案例-Ambari重启失败
JaneTTR
2025-06-06
目录

解决-进一步分析与改造[一]

# 背景与现象还原

在 Rocky Linux 8.10 环境下,Ambari 3.0.0 升级到 JDK 17 后,服务启动常见卡死在数据源初始化阶段,表现为长时间无响应。通过截取的典型日志片段如下:

23:13:10.870 [main] INFO org.apache.ambari.server.orm.dao.HostRoleCommandDAO -- Host role command status summary cache enabled !
23:13:10.874 [main] INFO org.apache.ambari.annotations.TransactionalLock$LockArea -- LockArea HRC_STATUS_CACHE is enabled
23:13:10.945 [main] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- Registered org.apache.ambari.server.ldap.service.AmbariLdapConfigurationProvider in event publisher
log4j:WARN No such property [fileNamePattern] in org.apache.log4j.RollingFileAppender.
23:13:11.165 [main] INFO org.apache.ambari.server.logging.LockFactory -- Lock profiling is disabled
23:13:11.170 [main] INFO org.apache.ambari.server.controller.AmbariServer -- Getting the controller
23:13:11.170 [main] DEBUG org.apache.ambari.server.controller.AmbariServer -- Proxy authentication not specified
23:13:11.665 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:13:11.66--ServerSession(1136705850)-- You have specified multiple ids for the entity class [org.apache.ambari.server.orm.entities.KerberosKeytabServiceMappingEntity] without specifying an @IdClass. By doing this you may lose the ability to find by identity, distributed cache support etc. Note: You may however use EntityManager find operations by passing a list of primary key fields. Else, you will have to use JPQL queries to read your entities. For other id options see @PrimaryKey.
23:13:11.703 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:13:11.703--ServerSession(1136705850)-- The reference column name [resource_type_id] mapped on the element [field permissions] does not correspond to a valid id or basic field/column on the mapping reference. Will use referenced column name as provided.
23:13:11.938 [main] INFO org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper -- Checking DB store version
23:13:11.938 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- JPA initialized event received: JpaInitializedEvent{eventType=JPA_INITIALIZED}
23:13:11.938 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- Loading ldap-configuration configuration data
23:13:11.946 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current get time  is [0] ms
23:16:34.339 [main] INFO eclipselink -- [EL Info]: 2025-06-14 23:16:34.338--ServerSession(1136705850)-- EclipseLink, version: Eclipse Persistence Services - 2.7.14.v20231208-d05cebc9b0
23:16:34.363 [main] INFO com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource -- Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 5, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, contextClassLoaderSource -> caller, dataSourceName -> 2rwhi8bbusrj4l1eqmvvd|5d9b7a8a, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, extensions -> {}, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, forceSynchronousCheckins -> false, forceUseNamedDriverClass -> false, identityToken -> 2rwhi8bbusrj4l1eqmvvd|5d9b7a8a, idleConnectionTestPeriod -> 7200, initialPoolSize -> 5, jdbcUrl -> jdbc:postgresql://192.168.3.1:5432/ambari, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 14400, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 32, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 5, numHelperThreads -> 3, preferredTestQuery -> SELECT 1, privilegeSpawnedThreads -> false, properties -> {password=******, user=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/mchange-commons.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/mchange-log.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier '/c3p0.properties' could not be found. Skipping.
23:16:34.374 [main] DEBUG com.mchange.v2.cfg.MConfig -- The configuration file for resource identifier 'hocon:/reference,/application,/c3p0,/' could not be found. Skipping.
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- com.mchange.v2.resourcepool.BasicResourcePool@3f1d6a13 config: [start -> 5; min -> 5; max -> 32; inc -> 5; num_acq_attempts -> 30; acq_attempt_delay -> 1000; check_idle_resources_delay -> 7200000; max_resource_age -> 0; max_idle_time -> 14400000; excess_max_idle_time -> 0; destroy_unreturned_resc_time -> 0; expiration_enforcement_delay -> 900000; break_on_acquisition_failure -> false; debug_store_checkout_exceptions -> false; force_synchronous_checkins -> false]
23:16:34.375 [main] DEBUG com.mchange.v2.c3p0.impl.C3P0PooledConnectionPoolManager -- Created new pool for auth, username (masked): 'am******'.
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- acquire test -- pool size: 0; target_pool_size: 5; desired target? 1
23:16:34.375 [main] DEBUG com.mchange.v2.resourcepool.BasicResourcePool -- awaitAvailable(): [unknown]
23:16:34.652 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.652--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [viewInstanceEntity] for the entity class [class org.apache.ambari.server.orm.entities.ViewURLEntity] since weaving was not enabled or did not occur.
23:16:34.653 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.653--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [user] for the entity class [class org.apache.ambari.server.orm.entities.UserAuthenticationEntity] since weaving was not enabled or did not occur.
23:16:34.656 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.656--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [requestEntity] for the entity class [class org.apache.ambari.server.orm.entities.UpgradeEntity] since weaving was not enabled or did not occur.
23:16:34.658 [main] WARN eclipselink -- [EL Warning]: 2025-06-14 23:16:34.658--ServerSession(1136705850)-- Reverting the lazy setting on the OneToOne or ManyToOne attribute [executionCommand] for the entity class [class org.apache.ambari.server.orm.entities.HostRoleCommandEntity] since weaving was not enabled or did not occur.
23:16:34.882 [pool-2-thread-1] INFO org.apache.ambari.server.orm.dao.AmbariConfigurationDAO -- ========  [ldap-configuration] build query time is [5] ms
23:16:34.904 [pool-2-thread-1] INFO org.apache.ambari.server.orm.dao.AmbariConfigurationDAO -- ========   select time is [22] ms
23:16:34.916 [main] INFO org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper -- DB store version is compatible
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- ========  toPropertis time is [0] ,map is [{
}]
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current Execute loading time is [202974] ms
23:16:34.919 [pool-2-thread-1] INFO org.apache.ambari.server.configuration.AmbariServerConfigurationProvider -- =================Current Execute decryptSensitiveData time is [0] ms
23:16:35.056 [main] INFO org.apache.ambari.server.stack.StackManager -- Initializing the stack manager...
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service package folder for service AMBARI_INFRA_SOLR-3.0.0 in common-services has been resolved to /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/package
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/upgradesdoes not exist.
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/checksdoes not exist.
23:16:35.240 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Service folder /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/server_actionsdoes not exist.
23:16:35.298 [main] INFO org.apache.ambari.server.stack.ServiceDirectory -- Role command order info was loaded from file: /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/role_command_order.json
23:16:35.298 [main] DEBUG org.apache.ambari.server.stack.ServiceDirectory -- Role Command Order for /var/lib/ambari-server/resources/common-services/AMBARI_INFRA_SOLR/3.0.0/role_command_order.json
               
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

提示

从日志可见,Provider 获取与 JPA 查询本身耗时极短,并未发生数据库瓶颈,反而在 ORM 数据源初始化(EclipseLink + c3p0)时产生了异常耗时,基本排除 DB 端的性能瓶颈假设。

# 线程分工与堆栈特征分析

进一步追踪日志结构可以看到,主线程([main])负责 ORM 组件与数据源(EclipseLink/c3p0)的初始化,只有初始化完成后,后台线程如 [pool-2-thread-1] 才能继续执行数据查询、配置加载等操作。此时如果 JVM 主线程迟迟无法返回,就会造成服务整体假死。

注意

Ambari 的 ORM 初始化迟滞,往往意味着底层 JVM 某个 API 卡住,需进一步借助线程分析工具定位根因。

# jstack 实战定位

通过如下命令锁定实际 JVM 进程:

ps -ef | grep ambari-server
1

找到实际 PID 后,执行

jstack <pid>
1

即可获取完整线程堆栈:

jstack实际截图

节选主线程堆栈关键段:

"main" #1 prio=5 os_prio=0 cpu=4460.52ms elapsed=97.10s tid=0x00007ff6f002ce80 nid=0x232b59 runnable  [0x00007ff6f9cbc000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(java.base@17.0.15/Native Method)
        at sun.security.provider.NativePRNG$RandomIO.readFully(java.base@17.0.15/NativePRNG.java:431)
        ...
        at org.eclipse.persistence.internal.security.JCEEncryptor.encryptPassword(JCEEncryptor.java:102)
        ...
        at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:1096)
1
2
3
4
5
6
7
8

主线程卡在 sun.security.provider.NativePRNG,即 SecureRandom 初始化环节。重点

线程名 状态 典型调用栈 说明
main RUNNABLE SecureRandom → NativePRNG → /dev/random 随机数获取阻塞
pool-2-thread-1 BLOCKED 等待 EntityManagerFactory 初始化 等待主线程锁释放
其他 GC/监控线程 RUNNABLE 系统健康无异常

笔记

通过 jstack 可见,绝大多数业务线程都在等待主线程初始化数据源,这也间接说明了性能瓶颈高度集中。

#Ambari#Rocky Linux#组件兼容性#运维排查#日志分析
解决-追加日志锚点
解决-进一步分析与改造[二]

← 解决-追加日志锚点 解决-进一步分析与改造[二]→

最近更新
01
bigtop-select 打包缺 compat 报错修复 deb
07-16
02
bigtop-select 打包缺 control 文件报错修复 deb
07-16
03
首次编译-环境初始化 必装
07-16
更多文章>
Theme by Vdoing | Copyright © 2017-2025 JaneTTR | MIT License
  • 跟随系统
  • 浅色模式
  • 深色模式
  • 阅读模式