ClickHouse Columnar Analytics Engine Deployment Tutorial
ClickHouse is a columnar OLAP database open-sourced by Yandex, known for extremely fast analytical queries. It can answer queries over hundreds of millions of rows in seconds, making it well suited to log analysis, business reporting, real-time monitoring, and similar analytical workloads. This article deploys ClickHouse on a BandwagonHost VPS and demonstrates its core features.
1. Installing ClickHouse
1.1 APT installation
apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | tee /etc/apt/sources.list.d/clickhouse.list
apt-get update
apt-get install -y clickhouse-server clickhouse-client
systemctl start clickhouse-server
systemctl enable clickhouse-server
1.2 Docker installation
docker run -d --name clickhouse -p 8123:8123 -p 9000:9000 -v ch_data:/var/lib/clickhouse -v ch_logs:/var/log/clickhouse-server --restart unless-stopped clickhouse/clickhouse-server:latest
2. Basic configuration
Edit /etc/clickhouse-server/config.xml. Key server-level settings:
<listen_host>0.0.0.0</listen_host>
<max_server_memory_usage_to_ram_ratio>0.6</max_server_memory_usage_to_ram_ratio>
Note that max_memory_usage is a per-query limit, so it belongs in a profile in /etc/clickhouse-server/users.xml rather than in config.xml:
<profiles>
<default>
<max_memory_usage>1000000000</max_memory_usage>
</default>
</profiles>
To set a password, edit /etc/clickhouse-server/users.xml:
<users>
<default>
<password_sha256_hex>your-SHA256-password-hash</password_sha256_hex>
</default>
</users>
# Generate the password hash
echo -n 'YourPassword123!' | sha256sum
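The same hash can be generated with Python's standard library, which avoids shell quoting pitfalls with special characters in the password. A minimal sketch (the helper name is our own, not a ClickHouse API):

```python
import hashlib

def password_sha256_hex(password: str) -> str:
    """Return the SHA-256 hex digest that ClickHouse expects in <password_sha256_hex>."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Paste the printed 64-character hex string into users.xml
print(password_sha256_hex("YourPassword123!"))
```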
3. Creating a database and table
clickhouse-client --password YourPassword123!
CREATE DATABASE analytics;
USE analytics;
-- Create a log table with the MergeTree engine
CREATE TABLE access_logs (
timestamp DateTime,
ip String,
method LowCardinality(String),
path String,
status_code UInt16,
response_time Float32,
user_agent String,
country LowCardinality(String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, ip)
TTL timestamp + INTERVAL 90 DAY;
Choosing a table engine: MergeTree is the most commonly used engine; ReplacingMergeTree deduplicates rows; AggregatingMergeTree supports pre-aggregation; SummingMergeTree automatically sums numeric columns.
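To illustrate the engine choice, here is a hypothetical deduplicated table (table and column names are ours, for illustration only) using ReplacingMergeTree:

```sql
-- After background merges, keeps only the row with the largest `version`
-- for each distinct ORDER BY key (user_id)
CREATE TABLE analytics.user_profiles (
    user_id UInt64,
    name String,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- Merges are asynchronous, so duplicates may linger;
-- FINAL forces deduplicated reads at query time (at a performance cost)
SELECT * FROM analytics.user_profiles FINAL;
```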
4. Importing data
# Import from CSV
clickhouse-client --password YourPassword123! --query="INSERT INTO analytics.access_logs FORMAT CSV" < access_logs.csv
# Import from JSON
cat logs.json | clickhouse-client --password YourPassword123! --query="INSERT INTO analytics.access_logs FORMAT JSONEachRow"
# Example of a batched INSERT
INSERT INTO analytics.access_logs VALUES
('2026-03-28 10:00:00', '1.2.3.4', 'GET', '/api/products', 200, 0.035, 'Mozilla/5.0', 'CN'),
('2026-03-28 10:00:01', '5.6.7.8', 'POST', '/api/orders', 201, 0.120, 'Python/3.11', 'US');
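Data can also be pushed programmatically over ClickHouse's HTTP interface on port 8123 (already exposed in the Docker command above). A minimal stdlib-only Python sketch, assuming the table and password from the earlier examples; the X-ClickHouse-User/X-ClickHouse-Key headers are ClickHouse's HTTP authentication headers:

```python
import json
import urllib.parse
import urllib.request

def to_json_each_row(rows):
    """Serialize a list of dicts into JSONEachRow format: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n"

def insert_rows(rows, host="localhost", port=8123, password="YourPassword123!"):
    # Assumed deployment details: adjust host, credentials, and table to yours
    query = "INSERT INTO analytics.access_logs FORMAT JSONEachRow"
    url = f"http://{host}:{port}/?query={urllib.parse.quote(query)}"
    req = urllib.request.Request(url, data=to_json_each_row(rows).encode("utf-8"))
    req.add_header("X-ClickHouse-User", "default")
    req.add_header("X-ClickHouse-Key", password)
    urllib.request.urlopen(req)
```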
5. Analytical queries
-- Requests per hour
SELECT toStartOfHour(timestamp) AS hour, count() AS requests
FROM access_logs
WHERE timestamp >= today()
GROUP BY hour ORDER BY hour;
-- Top 10 request paths
SELECT path, count() AS hits, avg(response_time) AS avg_rt
FROM access_logs
WHERE timestamp >= today() - INTERVAL 7 DAY
GROUP BY path ORDER BY hits DESC LIMIT 10;
-- Status code distribution
SELECT status_code, count() AS cnt, round(cnt * 100.0 / sum(cnt) OVER (), 2) AS pct
FROM access_logs
WHERE timestamp >= today()
GROUP BY status_code ORDER BY cnt DESC;
-- Breakdown by country
SELECT country, count() AS requests, avg(response_time) AS avg_rt
FROM access_logs GROUP BY country ORDER BY requests DESC LIMIT 20;
6. Materialized views (pre-aggregation)
-- Create a materialized view aggregated by hour
CREATE MATERIALIZED VIEW analytics.hourly_stats
ENGINE = SummingMergeTree()
ORDER BY (hour, path)
AS SELECT
toStartOfHour(timestamp) AS hour,
path,
count() AS requests,
sum(response_time) AS total_rt,
countIf(status_code >= 500) AS errors
FROM analytics.access_logs
GROUP BY hour, path;
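Because SummingMergeTree collapses rows only during background merges, parts written at different times may not be combined yet. When querying the view, re-aggregate to get correct totals, for example:

```sql
SELECT
    hour,
    path,
    sum(requests) AS requests,
    sum(total_rt) / sum(requests) AS avg_rt,
    sum(errors) AS errors
FROM analytics.hourly_stats
GROUP BY hour, path
ORDER BY hour DESC
LIMIT 20;
```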
7. Performance tuning
- ORDER BY choice: put the columns your queries filter on most often first in the ORDER BY key.
- Partitioning: monthly partitions (toYYYYMM) are the most common choice.
- LowCardinality: use the LowCardinality type for enum-like columns to greatly reduce memory usage.
- TTL: set an expiry so old data is cleaned up automatically.
- Materialized views: precompute high-frequency aggregations with materialized views.
-- Inspect the query execution plan
EXPLAIN SELECT count() FROM access_logs WHERE timestamp >= today();
-- Check table disk usage
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts WHERE database = 'analytics'
GROUP BY table;
8. Common issues
- Out of memory: lower max_memory_usage to cap per-query memory; on a 1 GB BandwagonHost plan, around 500 MB is a reasonable limit.
- Slow writes: use batched inserts (at least 1,000 rows per batch) and avoid row-by-row INSERTs.
- Query timeouts: check that the ORDER BY key matches your query predicates to avoid full table scans.
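The batching advice above can be implemented client-side by chunking rows before sending each INSERT. A minimal sketch (the helper name is ours):

```python
def batches(rows, batch_size=1000):
    """Yield successive fixed-size batches so each INSERT carries many rows,
    instead of issuing one INSERT per row."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# Each yielded batch would then be sent as a single INSERT statement
for batch in batches(list(range(2500))):
    pass  # e.g. send `batch` via clickhouse-client or the HTTP interface
```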
Summary
ClickHouse is an excellent choice for building an analytics platform on a BandwagonHost VPS; second-level queries over hundreds of millions of rows are genuinely impressive. It fits OLAP scenarios such as log analysis and business reporting. If you need distributed SQL with transaction support, consider TiDB or CockroachDB instead. To purchase a BandwagonHost VPS, see the full plan list, use coupon code NODESEEK2026 for a 6.77% discount, and access the official site via bwh81.net.