记一次日志引起的故障:服务器 mongod 日志出现暴增,一天增加了 7+G 之多… Why?
问题现象
内网测试服务器出现了日志暴增的现象,日志描述如下:
1
2
|
- I WRITE [conn1117] update db.robots query: { date: "2018-05-09", serial:"93542d35-5030-41b9-xxxx-0e36c546c336" } planSummary: COLLSCAN update: { $set: {
- I COMMAND [conn1117] command db.$cmd command: update { update: "robots", updates: [ { q:{ date: "2018-05-09", serial: "93542d35-5030-41b9-xxxx-0e36c546c336" }, u: { $set: {}}
|
可以从日志描述中了解到:COLLSCAN
,即 全表扫描
导致了 慢查询
问题,进而导致日志所有操作日志都会被记录在日志文件中去,也就产生了日志暴增问题。
那么,问题是:什么原因造成了 COOLSCAN
? 慢操作
是如何定义的? 或超过多少时间的操作被认为是 慢操作
?
- 日志描述的主要操作是:客户端的一个定时任务,即:每台设备每 6s 中都要向服务器上报一次设备数据包
问题原因
先来看看文档中的一些介绍:
官网 Doc-Log Messages 里面提到了:
Client operations appear in the log if their duration exceeds the slow operation threshold(慢操作阀值)
or when the log verbosity level(日志详细程度)
is 1 or higher. These log entries include the full command object associated with the operation.
Log Verbosity Level(日志详细程度)
default: 0
slow operation threshold 慢操作阈值
针对日志记录做测试
查询优化主要在代码中实现,这里就不做赘述了。 来看看使用 logrotate 回滚拆分日志文件吧
问题解决:使用 logrotate 回滚拆分日志文件
选型
使用Logrotate管理MongoDB日志-选型
使用Logrotate管理MongoDB日志
logrotate使用指南
找到日志文件及PID记录文件
查看 mongod.conf 就可知晓:
1
2
3
4
|
systemLog:
path: /data/log/mongod.log
processManagement:
pidFilePath: /var/run/mongodb/mongod.pid
|
编写配置文件
通过 man logrotate 查看详细参数,结合业务需求,编写的配置文件如下:文件存放在 /etc/logrotate.d/ 目录
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
$ vim /etc/logrotate.d/mongod
# 添加下面内容
/data/log/*.log {
daily
rotate 3
size 512M
dateext
compress
missingok
notifempty
copytruncate
create 644 root root
sharedscripts
postrotate
/bin/kill -SIGUSR1 'cat /var/run/mongodb/mongod.pid 2> /dev/null' 2>/dev/null || true
endscript
|
验证
执行命令 logrotate -f -v /etc/logrotate.d/mongod
来验证日志是否被轮换了,示例执行结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
$ logrotate -f -v /etc/logrotate.d/mongod
reading config file /etc/logrotate.d/mongod
Allocating hash table for state file, size 15360 B
Handling 1 logs
rotating pattern: /data/log/*.log forced from command line (3 rotations)
empty log files are not rotated, old logs are removed
considering log /data/log/mongod.log
log needs rotating
rotating log /data/log/mongod.log, log->rotateCount is 3
dateext suffix '-20180511'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
copying /data/log/mongod.log to /data/log/mongod.log-20180511
truncating /data/log/mongod.log
running postrotate script
compressing log with: /bin/gzip
$ ls /data/log/
mongod.log mongod.log-20180511.gz
|
Note: 当日志文件的实际大小小于配置文件(i.e:/etc/logrotate.d/mongod) 中的 size 阈值时,是不会进行转储操作的 (即:cron.daily 任务也不会进行转储)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
|
$ du -sh /data/log/mongod.log
248K mongod.log
# debug 测试 /etc/logrotate.d/mongod 配置, 此时 size=100M
$ logrotate -dv /etc/logrotate.d/mongod
reading config file /etc/logrotate.d/mongod
Allocating hash table for state file, size 15360 B
Handling 1 logs
rotating pattern: /data/log/mongod.log 104857600 bytes (3 rotations)
empty log files are not rotated, old logs are removed
considering log /data/log/mongod.log
log does not need rotating (log size is below the 'size' threshold)
not running postrotate script, since no logs were rotated
# debug 测试 /etc/logrotate.d/mongod 配置, 此时 size=100K
$ logrotate -dv /etc/logrotate.d/mongod
reading config file /etc/logrotate.d/mongod
Allocating hash table for state file, size 15360 B
Handling 1 logs
rotating pattern: /data/log/mongod.log 102400 bytes (3 rotations)
empty log files are not rotated, old logs are removed
considering log /data/log/mongod.log
log needs rotating
rotating log /data/log/mongod.log, log->rotateCount is 3
dateext suffix '-20180511'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
glob finding old rotated logs failed
copying /data/log/mongod.log to /data/log/mongod.log-20180511
truncating /data/log/mongod.log
running postrotate script
running script with arg /data/log/mongod.log : "
/bin/kill -SIGUSR1 'cat /var/run/mongodb/mongod.pid 2> /dev/null' 2> /dev/null || true
"
compressing log with: /bin/gzip
|
确定执行 cron.daily 情况
1
2
3
4
5
6
7
8
9
10
11
|
$ tail --lines 1000 /var/log/cron | grep cron.daily
May 11 03:01:01 localhost anacron[15856]: Will run job `cron.daily' in 25 min.
May 11 03:26:01 localhost anacron[15856]: Job `cron.daily' started
May 11 03:26:01 localhost run-parts(/etc/cron.daily)[15929]: starting logrotate
May 11 03:26:01 localhost run-parts(/etc/cron.daily)[15937]: finished logrotate
May 11 03:26:01 localhost run-parts(/etc/cron.daily)[15929]: starting man-db.cron
May 11 03:26:05 localhost run-parts(/etc/cron.daily)[16285]: finished man-db.cron
May 11 03:26:05 localhost run-parts(/etc/cron.daily)[15929]: starting mlocate
May 11 03:26:12 localhost run-parts(/etc/cron.daily)[16296]: finished mlocate
May 11 03:26:12 localhost anacron[15856]: Job `cron.daily' terminated
[root@VM_110_229_centos ~]$ |
See Also
Thanks to the authors 🙂