ClickHouse 列式分析引擎部署教程

ClickHouse 是由 Yandex 开源的列式 OLAP 数据库,以极快的分析查询速度著称。它能在亿级数据上实现秒级查询,非常适合日志分析、业务报表、实时监控等分析场景。本文将在搬瓦工 VPS 上部署 ClickHouse 并演示核心功能。

一、安装 ClickHouse

1.1 APT 安装

apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | tee /etc/apt/sources.list.d/clickhouse.list
apt-get update
apt-get install -y clickhouse-server clickhouse-client
systemctl start clickhouse-server
systemctl enable clickhouse-server

1.2 Docker 安装

docker run -d --name clickhouse   -p 8123:8123 -p 9000:9000   -v ch_data:/var/lib/clickhouse   -v ch_logs:/var/log/clickhouse-server   --restart unless-stopped   clickhouse/clickhouse-server:latest

二、基础配置

编辑 /etc/clickhouse-server/config.xml,关键配置:

<listen_host>0.0.0.0</listen_host>
<max_memory_usage>1000000000</max_memory_usage>
<max_server_memory_usage_to_ram_ratio>0.6</max_server_memory_usage_to_ram_ratio>

设置密码,编辑 /etc/clickhouse-server/users.xml

<users>
  <default>
    <password_sha256_hex>你的SHA256密码哈希</password_sha256_hex>
  </default>
</users>
# 生成密码哈希
echo -n 'YourPassword123!' | sha256sum

三、创建数据库和表

clickhouse-client --password YourPassword123!

CREATE DATABASE analytics;
USE analytics;

-- 使用 MergeTree 引擎创建日志表
CREATE TABLE access_logs (
    timestamp DateTime,
    ip String,
    method LowCardinality(String),
    path String,
    status_code UInt16,
    response_time Float32,
    user_agent String,
    country LowCardinality(String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, ip)
TTL timestamp + INTERVAL 90 DAY;

表引擎选择:MergeTree 是最常用的引擎;ReplacingMergeTree 可去重;AggregatingMergeTree 支持预聚合;SummingMergeTree 自动求和。

四、数据导入

# 从 CSV 导入
clickhouse-client --password YourPassword123! --query="INSERT INTO analytics.access_logs FORMAT CSV" < access_logs.csv

# 从 JSON 导入
cat logs.json | clickhouse-client --password YourPassword123! --query="INSERT INTO analytics.access_logs FORMAT JSONEachRow"

# 批量插入示例
INSERT INTO analytics.access_logs VALUES
('2026-03-28 10:00:00', '1.2.3.4', 'GET', '/api/products', 200, 0.035, 'Mozilla/5.0', 'CN'),
('2026-03-28 10:00:01', '5.6.7.8', 'POST', '/api/orders', 201, 0.120, 'Python/3.11', 'US');

五、分析查询

-- 统计每小时请求量
SELECT toStartOfHour(timestamp) AS hour, count() AS requests
FROM access_logs
WHERE timestamp >= today()
GROUP BY hour ORDER BY hour;

-- Top 10 访问路径
SELECT path, count() AS hits, avg(response_time) AS avg_rt
FROM access_logs
WHERE timestamp >= today() - INTERVAL 7 DAY
GROUP BY path ORDER BY hits DESC LIMIT 10;

-- 状态码分布
SELECT status_code, count() AS cnt, round(cnt * 100.0 / sum(cnt) OVER (), 2) AS pct
FROM access_logs
WHERE timestamp >= today()
GROUP BY status_code ORDER BY cnt DESC;

-- 国家维度分析
SELECT country, count() AS requests, avg(response_time) AS avg_rt
FROM access_logs GROUP BY country ORDER BY requests DESC LIMIT 20;

六、物化视图(预聚合)

-- 创建每小时聚合的物化视图
CREATE MATERIALIZED VIEW analytics.hourly_stats
ENGINE = SummingMergeTree()
ORDER BY (hour, path)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    path,
    count() AS requests,
    sum(response_time) AS total_rt,
    countIf(status_code >= 500) AS errors
FROM analytics.access_logs
GROUP BY hour, path;

七、性能优化

  • ORDER BY 选择:将查询中最常用的过滤列放在 ORDER BY 前面。
  • 分区策略:按月分区(toYYYYMM)是最常见的选择。
  • LowCardinality:对枚举值字段使用 LowCardinality 类型大幅减少内存。
  • TTL:设置数据过期时间自动清理。
  • 物化视图:对高频聚合查询创建物化视图预计算。
# 查看查询执行计划
EXPLAIN SELECT count() FROM access_logs WHERE timestamp >= today();

# 查看表占用空间
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts WHERE database = 'analytics'
GROUP BY table;

八、常见问题

  • 内存不足:调整 max_memory_usage 限制单查询内存,搬瓦工 1GB 方案建议设为 500MB。
  • 写入太慢:使用批量插入(每批至少 1000 行),避免逐行 INSERT。
  • 查询超时:检查 ORDER BY 是否匹配查询条件,避免全表扫描。

总结

ClickHouse 是在搬瓦工 VPS 上构建分析平台的优秀选择,亿级数据秒级查询的能力令人印象深刻。适合日志分析、业务报表等 OLAP 场景。如果你需要分布式 SQL 事务能力,可以参考 TiDBCockroachDB。选购搬瓦工 VPS 请查看 全部方案,使用优惠码 NODESEEK2026 享受 6.77% 折扣,通过 bwh81.net 进入官网。

关于本站

搬瓦工VPS中文网(bwgvps.com)是非官方中文信息站,整理搬瓦工的方案、优惠和教程。我们不销售主机,不提供技术服务。

新手必读
搬瓦工优惠码

NODESEEK2026(优惠 6.77%)

购买时填入即可抵扣。