Playwright Browser Automation Tutorial

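Playwright is a modern browser automation framework developed by Microsoft. It supports the three major browser engines (Chromium, Firefox, and WebKit) and can run in headless mode, which makes it well suited to automated testing, web page screenshots, data scraping, and similar tasks on a BandwagonHost VPS. Compared with traditional tools, Playwright offers advanced features such as auto-waiting, network interception, and multi-tab management.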

1. System Requirements

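  • Operating system: Ubuntu 20.04 or later (Ubuntu 22.04 recommended).
  • Memory: at least 1GB, 2GB or more recommended (browser engines use a lot of memory).
  • Runtime: Python 3.8 or later, or Node.js 16 or later (newer Playwright releases may require higher versions; check the official system requirements).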
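Before installing, you can run a quick preflight check against the requirements above. The script below is a minimal sketch, assuming a Linux system (it reads `/proc/meminfo`). The function names and the exact MB thresholds are illustrative assumptions, set slightly below 1GB/2GB because the kernel reports a little less than the plan size.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 8)
# A "1GB" plan usually reports slightly less than 1024 MB in MemTotal,
# so these thresholds (in MB) are deliberately a little below 1GB / 2GB
MIN_MEM_MB = 900
RECOMMENDED_MEM_MB = 1800


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def check_requirements(meminfo_text, version=sys.version_info):
    """Return a list of warnings; an empty list means the checks passed."""
    warnings = []
    if tuple(version[:2]) < MIN_PYTHON:
        warnings.append(f"Python {version[0]}.{version[1]} is older than 3.8")
    info = parse_meminfo(meminfo_text)
    mem_mb = info.get("MemTotal", 0) // 1024
    swap_mb = info.get("SwapTotal", 0) // 1024
    if mem_mb < MIN_MEM_MB:
        warnings.append(f"only {mem_mb} MB of RAM; at least 1GB is required")
    elif mem_mb < RECOMMENDED_MEM_MB and swap_mb == 0:
        warnings.append(f"{mem_mb} MB of RAM and no swap; consider adding swap")
    return warnings


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        problems = check_requirements(meminfo.read_text())
        for message in problems:
            print("WARNING:", message)
        if not problems:
            print("Basic requirements look OK")
    else:
        print("This check only works on Linux (/proc/meminfo not found)")
```

Run it with `python3 check.py` (any file name works). A 1GB plan without swap will get a warning, which matches the swap advice in the FAQ section.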

2. Installing Playwright (Python)

2.1 Install System Dependencies

apt update && apt upgrade -y
apt install python3 python3-pip python3-venv -y

2.2 Create the Project Environment

mkdir -p /opt/playwright-project && cd /opt/playwright-project
python3 -m venv venv
source venv/bin/activate

2.3 Install Playwright

pip install playwright
playwright install --with-deps chromium

The --with-deps option also installs the system libraries the browser needs to run. To install all browsers:

playwright install --with-deps

3. Installing Playwright (Node.js)

3.1 Install Node.js

curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt install nodejs -y
node --version && npm --version

3.2 Initialize the Project

mkdir -p /opt/playwright-node && cd /opt/playwright-node
npm init -y
npm install playwright

3.3 Install the Browser

npx playwright install --with-deps chromium

4. Basic Examples (Python)

4.1 Taking a Screenshot

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch Chromium without a visible window
    browser = p.chromium.launch(headless=True)
    # 1920x1080 viewport; full_page=True captures the whole scrollable page
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})
    page.goto('https://example.com')
    page.screenshot(path='screenshot.png', full_page=True)
    browser.close()
    print('Screenshot saved')
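When you screenshot many URLs (for example, for screenshot monitoring), each page needs a stable, filesystem-safe file name. Here is a minimal sketch using only the standard library. `screenshot_path` is a hypothetical helper, and naming files after the host and path plus a short hash is an assumption, not a Playwright convention.

```python
import hashlib
import re
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path(url, out_dir="screenshots"):
    """Build a stable, filesystem-safe PNG path for a URL."""
    parsed = urlparse(url)
    # Readable part: host + path, with unsafe characters replaced by "_"
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", parsed.netloc + parsed.path).strip("_")
    # Short digest of the full URL keeps names unique (query strings, long paths)
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return Path(out_dir) / f"{slug[:60]}_{digest}.png"
```

Usage: `path = screenshot_path(url)`, then `path.parent.mkdir(parents=True, exist_ok=True)` and `page.screenshot(path=str(path), full_page=True)`. The same URL always maps to the same file, so the newest screenshot overwrites the previous one.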

4.2 Filling In and Submitting a Form

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://example.com/login')

    # Fill in the form fields
    page.fill('input[name="username"]', 'myuser')
    page.fill('input[name="password"]', 'mypassword')

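    # Click the login button
    page.click('button[type="submit"]')

    # Wait until the browser has left the login page (assumes a successful
    # login redirects to a URL without "/login"); wait_for_load_state() only
    # covers a navigation that is already committed, so it may return early
    page.wait_for_url(lambda url: '/login' not in url)

    print(f'Current page: {page.url}')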
    browser.close()
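Hard-coding the password in the script, as above, is risky on a long-running VPS. A common alternative is to read credentials from environment variables, which also works with the `Environment=` line of the systemd unit in section 9. Here is a minimal sketch; the variable names `APP_USERNAME` and `APP_PASSWORD` are hypothetical.

```python
import os


def get_credentials(env=None):
    """Read login credentials from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    username = env.get("APP_USERNAME")
    password = env.get("APP_PASSWORD")
    if not username or not password:
        # Fail early instead of submitting an empty login form
        raise RuntimeError("Set APP_USERNAME and APP_PASSWORD before running the script")
    return username, password
```

Usage: `username, password = get_credentials()`, then `page.fill('input[name="username"]', username)` and `page.fill('input[name="password"]', password)`.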

4.3 Extracting Page Content

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://example.com')

    # Get the page title
    title = page.title()
    print(f'Page title: {title}')

    # Extract all links: visible text plus absolute URL (e.href) of each <a href>
    links = page.eval_on_selector_all('a[href]',
        'elements => elements.map(e => ({text: e.textContent.trim(), href: e.href}))')

    for link in links:
        print(f'{link["text"]} -> {link["href"]}')

    browser.close()
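`eval_on_selector_all` returns the links as a list of Python dicts with `text` and `href` keys. Real pages often contain duplicates, in-page anchors and non-HTTP links, so a cleanup step usually follows. Here is a minimal sketch using only the standard library. `clean_links` is a hypothetical helper, and keeping only same-host links is an assumption about what you want to collect.

```python
from urllib.parse import urldefrag, urlparse


def clean_links(links, base_url, same_site=True):
    """Drop fragments, non-HTTP schemes and duplicates; optionally keep same-host links only."""
    base_host = urlparse(base_url).netloc
    seen = set()
    result = []
    for link in links:
        href, _fragment = urldefrag(link["href"])  # strip "#section" parts
        parsed = urlparse(href)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, ...
        if same_site and parsed.netloc != base_host:
            continue
        if href in seen:
            continue
        seen.add(href)
        result.append({"text": link["text"], "href": href})
    return result
```

Usage: call `clean_links(links, page.url)` before the `for` loop that prints the links.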

5. Asynchronous Operations

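Playwright also provides an async API, which suits high-concurrency workloads. The example below opens one page per URL and processes them concurrently with asyncio.gather: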

import asyncio
from playwright.async_api import async_playwright

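async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        # Process several pages concurrently
        urls = [
            'https://example.com/page1',
            'https://example.com/page2',
            'https://example.com/page3',
        ]

        # One coroutine per URL; the index gives each screenshot a stable name
        tasks = [process_page(browser, i, url) for i, url in enumerate(urls)]
        await asyncio.gather(*tasks)
        await browser.close()

async def process_page(browser, index, url):
    page = await browser.new_page()
    try:
        await page.goto(url)
        title = await page.title()
        # hash(url) changes between runs (string hashing is randomized), so use the index
        await page.screenshot(path=f'screenshot_{index}.png')
        print(f'{url} -> {title}')
    finally:
        # Release the page's memory even if navigation fails
        await page.close()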

asyncio.run(main())
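The example above starts every URL at once. With a long URL list this can exhaust memory on a small VPS, because each open page consumes browser resources. A common way to cap concurrency is `asyncio.Semaphore`. The sketch below is generic: `worker` stands for any one-argument coroutine function, and the default limit of 3 is an assumption to tune for your plan's memory.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_limited(items: Iterable[T],
                      worker: Callable[[T], Awaitable[R]],
                      limit: int = 3) -> List[R]:
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:  # waits here while `limit` workers are busy
            return await worker(item)

    # gather returns results in the same order as `items`
    return await asyncio.gather(*(guarded(item) for item in items))
```

For example, with a hypothetical `async def fetch_title(url)` that opens and closes its own page, call `await run_limited(urls, fetch_title, limit=3)` inside `main()`. Only three pages exist at any moment, regardless of how many URLs are queued.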

6. Intercepting Network Requests

Playwright can intercept and modify network requests, for example to block ads or mock API responses:

from playwright.sync_api import sync_playwright

def handle_route(route):
    # Block images, fonts and stylesheets to speed up page loading
    if route.request.resource_type in ['image', 'font', 'stylesheet']:
        route.abort()
    else:
        route.continue_()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Route every request through handle_route
    page.route('**/*', handle_route)

    page.goto('https://example.com')
    content = page.content()
    # len(content) counts characters, so encode first to report bytes
    print(f'Page size: {len(content.encode("utf-8"))} bytes')
    browser.close()
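The introduction to this section mentions ad blocking. You can handle that in the same route handler by also checking the request's host. Here is a minimal sketch that works as a drop-in replacement for `handle_route` above. The blocked domains are placeholders, not a real ad list, and `should_block` is a hypothetical helper. `route.request.url`, `route.request.resource_type`, `route.abort()` and `route.continue_()` are the Playwright APIs already used above.

```python
from urllib.parse import urlparse

BLOCKED_TYPES = {"image", "font", "stylesheet", "media"}
# Placeholder domains for illustration only; replace with your own list
BLOCKED_DOMAINS = {"ads.example.net", "tracker.example.org"}


def should_block(resource_type, url):
    """Decide whether a request should be aborted."""
    if resource_type in BLOCKED_TYPES:
        return True
    host = urlparse(url).hostname or ""
    # Match the domain itself and any subdomain (e.g. cdn.ads.example.net)
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def handle_route(route):
    """Route handler for page.route('**/*', handle_route)."""
    if should_block(route.request.resource_type, route.request.url):
        route.abort()
    else:
        route.continue_()
```

Keeping the decision in a plain function makes the blocking rules easy to test without launching a browser.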

7. Recording Scripts

Playwright ships with a codegen tool that records your actions in the browser and generates the corresponding code automatically:

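# Record actions (opens a browser window, so a graphical desktop is required; best done locally)
playwright codegen https://example.com

# Generate Python code and save it to script.py (this also needs a display;
# on a headless VPS, record locally and upload the resulting script)
playwright codegen --target python -o script.py https://example.com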

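8. Configuring the Browser Context

A browser context is an isolated browser session with its own cookies, cache and settings. The example below sets the viewport, user agent, locale, time zone and an emulated geolocation for one context: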

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
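    browser = p.chromium.launch(
        headless=True,
        # --disable-dev-shm-usage writes shared memory to /tmp instead of /dev/shm,
        # which helps when /dev/shm is small; Playwright already disables the
        # Chromium sandbox by default, so --no-sandbox is only a safeguard
        args=['--no-sandbox', '--disable-dev-shm-usage']
    )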

    context = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        locale='zh-CN',
        timezone_id='Asia/Shanghai',
        permissions=['geolocation'],
        geolocation={'latitude': 39.9042, 'longitude': 116.4074},
    )

    page = context.new_page()
    page.goto('https://example.com')
    page.screenshot(path='configured.png')
    context.close()
    browser.close()
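If several scripts need different regional settings, you can keep the `new_context()` arguments in one place. Here is a minimal sketch. The preset names and the New York values are illustrative assumptions; the "cn" preset reuses the settings from the example above, and every key is a `browser.new_context()` keyword argument used there.

```python
import copy

# Hypothetical presets; keys are keyword arguments of browser.new_context()
CONTEXT_PRESETS = {
    "cn": {
        "locale": "zh-CN",
        "timezone_id": "Asia/Shanghai",
        "geolocation": {"latitude": 39.9042, "longitude": 116.4074},
    },
    "us": {
        "locale": "en-US",
        "timezone_id": "America/New_York",
        "geolocation": {"latitude": 40.7128, "longitude": -74.0060},
    },
}


def context_options(preset, width=1920, height=1080):
    """Return new_context() keyword arguments for a named preset."""
    if preset not in CONTEXT_PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    options = {
        "viewport": {"width": width, "height": height},
        "permissions": ["geolocation"],  # lets pages read the emulated location
    }
    # deepcopy so callers can modify the result without changing the presets
    options.update(copy.deepcopy(CONTEXT_PRESETS[preset]))
    return options
```

Usage: `context = browser.new_context(**context_options("cn"))`.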

9. Managing the Script with systemd

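To run a Playwright script as a long-running service, create a systemd unit. PLAYWRIGHT_BROWSERS_PATH tells Playwright where to look for browsers. If you set it as below, first install the browsers into that directory from inside the virtual environment (PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-project/browsers playwright install chromium). Otherwise, remove the Environment= line so the browsers installed in section 2.3 are found in their default location.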

cat > /etc/systemd/system/playwright-task.service <<EOF
[Unit]
Description=Playwright Automation Task
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/playwright-project
ExecStart=/opt/playwright-project/venv/bin/python script.py
Restart=on-failure
RestartSec=30
Environment=PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-project/browsers

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable playwright-task
systemctl start playwright-task

10. Troubleshooting

Browser fails to launch

This is usually caused by missing system dependencies. Install them with:

playwright install-deps

Crashes caused by insufficient memory

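On a low-memory VPS, configure swap space and install only Chromium. The following commands create and enable a 2GB swap file and make it persistent across reboots: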

fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab

Summary

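Playwright is a full-featured browser automation framework. On a BandwagonHost VPS it can handle web testing, data scraping, screenshot monitoring and other automation tasks. A plan with 2GB of memory or more is recommended for a smoother experience. To choose a BandwagonHost VPS, see the full plan list; the coupon code NODESEEK2026 gives a 6.77% discount at checkout. For other automation tools, see the Puppeteer tutorial and the Selenium tutorial.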

About This Site

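BandwagonHost VPS Chinese Site (bwgvps.com) is an unofficial Chinese-language information site that collects BandwagonHost plans, deals and tutorials. We do not sell hosting and do not provide technical support.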
