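subprocess: Command Lines and External Program Calls

Python Office Automation Series · The go-to approach for calling system commands and external programs from Python

Series: A Systematic Course in Python Office Automation

Keywords: Python, office automation, subprocess, command line, external programs, Popen, process management, shell, Python system administration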

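1. Overview of subprocess

subprocess is one of the core system-level modules in the Python standard library. It was designed to replace the older, scattered process-creation functions such as os.system(), os.popen(), and os.spawn*() with a single, safer, and more capable interface. Since the module was introduced in Python 2.4, the official documentation has recommended it for subprocess management, and later releases have kept extending it, making it the preferred bridge between Python and the operating system. For engineers working on system administration, operations automation, or CI/CD pipelines, a working knowledge of subprocess is a basic skill.

The module's main value is that it brings three tasks (starting a process, managing its life cycle, and communicating with it) under one consistent API. os.system() runs a command but cannot capture its output, and os.popen() offers only a one-way pipe. subprocess, by contrast, controls standard input, standard output, and standard error together, and gives fine-grained control over creating a process, waiting for it, terminating it, and checking its return code. A single subprocess call can therefore do what used to require several functions combined, which reduces both code complexity and the room for mistakes.

The API offers two paths: the high-level run() function and the lower-level Popen class. run() wraps the most common "start, wait, collect the result" pattern and suits most batch jobs and one-off commands. Popen exposes lower-level process control and supports asynchronous execution, streaming communication, and managing several processes at once. Which to use depends on the task: prefer run() for simple command execution, and reach for Popen when you need complex interaction or fine-grained control.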

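import subprocess

# Basic usage: run a command and capture its output
result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
print(result.returncode)   # 0 means success
print(result.stdout)       # captured standard output

# For comparison, the old os.system style (not recommended)
# os.system("ls -l")       # returns the exit status but cannot capture output

# The main run() parameters at a glance
result = subprocess.run(
    ["ping", "-c", "4", "8.8.8.8"],   # -c is the count flag on Linux/macOS
    capture_output=True,    # capture stdout and stderr
    text=True,              # return str instead of bytes
    timeout=30,             # seconds; raises TimeoutExpired when exceeded
    check=True,             # raise CalledProcessError on a non-zero exit code
)
# The input= parameter (for example input="y\n") feeds data to the child's stdin.
# ping reads nothing from stdin, so it is left out here.
print(f"Return code: {result.returncode}")
print(f"Output lines: {len(result.stdout.splitlines())}")

# The lower-level Popen interface (run() is built on top of it)
proc = subprocess.Popen(
    ["ffmpeg", "-version"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)
stdout, stderr = proc.communicate(timeout=10)   # also waits for the process to exit
print(f"FFmpeg version info:\n{stdout[:200]}...")
print(f"Return code: {proc.returncode}")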

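2. The run() function

subprocess.run() is the high-level API introduced in Python 3.5, and it is the function you will use most often. Internally it creates a Popen object, calls communicate() to wait for the process to finish, and returns a CompletedProcess instance that carries the return code, standard output, and standard error. With run() you only need to decide which command to execute and how to collect the result; file-descriptor management, polling, and resource cleanup are handled for you. This makes the calling code easier to read and maintain.

Among run()'s parameters, capture_output=True (added in Python 3.7) is one of the most used. It is equivalent to setting both stdout and stderr to subprocess.PIPE, so Python captures everything the child writes. The companion text=True parameter decides the type of the captured data: with text=True (Python 3.7+) or its older alias universal_newlines=True, output is returned as str; by default it is returned as bytes. Text mode saves a manual decode() in text-processing code. Note that capturing very large output (for example, reading gigabytes of logs) holds all of it in memory and can exhaust it; in that case use Popen and read the stream incrementally.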
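The difference between the two output modes can be seen in a minimal sketch. It uses sys.executable as the child command purely so that it runs on any platform; that choice is an assumption of this example, not something the tools above require.

```python
import subprocess
import sys

# Child command: the current interpreter prints one line (portable across platforms).
cmd = [sys.executable, "-c", "print('hello')"]

# Default mode: stdout is captured as bytes.
raw = subprocess.run(cmd, capture_output=True, check=True)

# Text mode: stdout is decoded to str and line endings are normalized to "\n".
decoded = subprocess.run(cmd, capture_output=True, text=True, check=True)

print(type(raw.stdout).__name__, raw.stdout.strip())
print(type(decoded.stdout).__name__, decoded.stdout.strip())
```

In bytes mode the caller decides how to decode (or whether to decode at all, for binary output); in text mode the locale's default encoding is used unless the encoding parameter is given.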

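check=True is another important safety parameter. When it is set, run() raises subprocess.CalledProcessError if the command exits with a non-zero code. This matters in automation scripts: if an upstream command fails (a compile error, a missing file), the script should not carry on blindly. Combined with try/except, it lets you handle command failures cleanly. The timeout parameter sets an upper bound on execution time: when it expires, run() kills the child process, waits for it, and raises TimeoutExpired, which keeps a hung process from occupying system resources indefinitely.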

import subprocess
import sys

# Basic run: execute a command without looking at its output
subprocess.run(["mkdir", "-p", "/tmp/test_dir"], check=True)

# Capture output and check the return code
result = subprocess.run(
    ["df", "-hT"],          # GNU df; -T adds the filesystem type column
    capture_output=True,
    text=True,
    check=True
)
print("Disk usage:\n" + result.stdout[:500])

# Pass input to the child's stdin
result = subprocess.run(
    ["grep", "ext4"],
    capture_output=True,
    text=True,
    input=result.stdout  # feed the df output above to grep
)
print(f"ext4 partitions:\n{result.stdout}")

# Timeout control and exception handling
try:
    result = subprocess.run(
        ["sleep", "10"],
        timeout=3,          # give up after 3 seconds
        capture_output=True,
        text=True,
        check=True          # required for CalledProcessError to be raised
    )
except subprocess.TimeoutExpired:
    print("Command timed out and was killed", file=sys.stderr)
except subprocess.CalledProcessError as e:
    print(f"Command returned a non-zero exit code: {e.returncode}", file=sys.stderr)
except FileNotFoundError as e:
    print(f"Command not found: {e.filename}", file=sys.stderr)
else:
    print(f"Command succeeded, return code: {result.returncode}")

# Probe for a program: nothing is printed by the child, the return code decides
result = subprocess.run(
    ["which", "python3"],
    capture_output=True,
    text=True
)
if result.returncode == 0:
    print(f"python3 path: {result.stdout.strip()}")
else:
    print("python3 is not installed")

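3. The Popen object

subprocess.Popen is the foundation of the module; run() is essentially a wrapper around it. You need Popen directly whenever "run and wait" is not enough: starting a long-running background process, managing several child processes at once, reading an output stream line by line as it is produced, or writing data to a child continuously. The Popen constructor accepts almost the same parameters as run() (timeout, check, input, and capture_output belong to run() only), but it behaves very differently: it returns as soon as the process has been created, without waiting for it to finish, and leaves life-cycle management entirely to you.

The core methods of a Popen object are poll(), wait(), and communicate(). poll() is non-blocking: it checks whether the process has finished and returns None (still running) or the exit code (finished). wait() blocks until the process exits and returns the exit code; it accepts a timeout. communicate() is the most important method for interaction: it writes input to stdin (if given), reads stdout and stderr at the same time, waits for the process to exit, and returns a (stdout, stderr) tuple. Because communicate() handles the pipe-buffer problem internally, it is the recommended way to read and write. Keep in mind that an OS pipe has limited capacity (typically 64 KB on Linux): if the child writes a lot of output and nobody reads it, the child blocks, and if the parent is meanwhile blocked in wait(), the two deadlock. This is the most common Popen pitfall.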
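The pipe-buffer pitfall is easy to reproduce. The following sketch assumes a child that writes roughly 1 MB to stdout (the child script and the size are made up for illustration) and shows communicate() draining the pipe so the child can finish.

```python
import subprocess
import sys

# Child writes about 1 MB to stdout, far more than a typical pipe buffer holds.
child_code = "import sys; sys.stdout.write('x' * 1_000_000)"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

# Calling proc.wait() here could block forever: the child stalls once the
# pipe is full, and nobody is reading. communicate() drains both pipes
# while it waits, so the child can run to completion.
stdout, stderr = proc.communicate(timeout=30)

print(len(stdout), proc.returncode)
```

If the output is too large to hold in memory, the alternative is to iterate over proc.stdout and process it incrementally instead of collecting it.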

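Popen is especially useful for concurrency. You can start several Popen objects at once and then collect their results by polling or waiting, for example to issue several network requests at the same time, compress files in parallel, or transcode media files in bulk. Combined with concurrent.futures or a hand-written event loop, this is enough to build an efficient parallel task runner. Every Popen object must be cleaned up: call communicate() or wait() so the process is reaped. Otherwise zombie processes accumulate, and in a long-running service unclosed pipes leak file descriptors.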

import subprocess
import time

# Start a background process (do not wait)
proc = subprocess.Popen(
    ["ping", "-c", "10", "8.8.8.8"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)

# Non-blocking status check. Polling without reading is only safe here because
# ping prints far less than a pipe buffer holds; for large output, read as you go.
while proc.poll() is None:
    print(f"Process still running... PID: {proc.pid}")
    time.sleep(1)

print(f"Process finished, return code: {proc.returncode}")

# Collect the output that is waiting in the pipes
stdout, stderr = proc.communicate()
print(f"Last output:\n{stdout[-200:]}")

# Start several processes concurrently
commands = [
    ["sleep", "3"],
    ["sleep", "2"],
    ["sleep", "1"],
]

# Launch all processes at once
processes = []
for cmd in commands:
    proc = subprocess.Popen(cmd)
    processes.append((cmd, proc))
    print(f"Started: {' '.join(cmd)}, PID: {proc.pid}")

# Wait for all of them. The waits run in list order, so each printed value is
# the time since start at which that wait returned, not the command's own runtime.
start = time.time()
for cmd, proc in processes:
    proc.wait()
    elapsed = time.time() - start
    print(f"{' '.join(cmd)} done, {elapsed:.1f}s since start")

print(f"All done, total time: {time.time() - start:.1f}s")  # about 3s, not 6s

# Safe reading and writing with communicate()
proc = subprocess.Popen(
    ["cat"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)

# Send input and read the output
stdout, stderr = proc.communicate(input="Hello\nWorld\n", timeout=5)
print(f"cat returned:\n{stdout}")
print(f"Return code: {proc.returncode}")

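4. Standard input and output

Fine-grained control over the standard streams (stdin, stdout, stderr) is the biggest improvement subprocess offers over os.system(). Each of the stdin, stdout, and stderr parameters can be set to subprocess.PIPE (a pipe connected to Python), subprocess.DEVNULL (discard), an existing file descriptor or file object, or left at the default, which inherits the parent's stream; stderr can additionally be set to subprocess.STDOUT to merge it into stdout. This flexibility covers most inter-process communication patterns: pipelines (like the shell's "|" operator), output capture, logging to files, and scripted input.

Pipelines are the classic use of stream control. In a shell, a chain such as cmd1 | cmd2 | cmd3 connects each command's standard output to the next command's standard input, forming a data-processing pipeline. In Python you get the same effect by wiring the stdout of one Popen object to the stdin of the next. The advantage is that Python code can step in at any stage, for example to filter, transform, or monitor the data as it flows through, which is awkward to do in a pure shell pipeline. When wiring it up, the earlier process's stdout must be subprocess.PIPE and the later process's stdin must be set to the earlier process's proc.stdout. After starting the later process, close the parent's copy of proc.stdout so that the earlier process receives SIGPIPE if the later one exits first.

Reading output in real time is another typical Popen scenario. When a child produces continuous output over a long time (build logs, download progress, service logs), communicate() blocks until the process ends, so intermediate results are not visible. Instead, iterate over proc.stdout line by line and handle each line as it arrives. This suits applications that display progress live, such as a GUI installer or a web backend that pushes log streams. Whether lines actually arrive promptly depends on the child's own buffering policy: many programs switch from line buffering to block buffering when stdout is a pipe. For a Python child, setting the environment variable PYTHONUNBUFFERED=1 (or passing -u) forces unbuffered output; other programs need their own options.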
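A small sketch of line-by-line streaming follows. The child script is a made-up stand-in for a long-running tool; it is started with -u, the command-line equivalent of PYTHONUNBUFFERED=1, so each line reaches the parent as soon as it is printed.

```python
import subprocess
import sys

# Hypothetical child: prints three progress lines with a short pause between them.
child_code = (
    "import time\n"
    "for i in range(3):\n"
    "    print(f'step {i}')\n"
    "    time.sleep(0.1)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],  # -u: unbuffered child output
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,                  # merge stderr into stdout
    text=True,
    bufsize=1,                                 # line-buffered on the parent side
)

seen = []
with proc.stdout:
    for line in proc.stdout:                   # yields each line as it arrives
        seen.append(line.rstrip("\n"))
        print(f"[child] {seen[-1]}")

returncode = proc.wait()                       # reap the process
print(f"exit code: {returncode}")
```

Without -u (or an explicit flush in the child), the three lines would likely arrive together when the child exits, because a Python child block-buffers stdout when it is a pipe.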

import subprocess

# Pipeline: the equivalent of the shell's cmd1 | cmd2 | cmd3
# ps aux | grep python | head -5
proc1 = subprocess.Popen(
    ["ps", "aux"],
    stdout=subprocess.PIPE,
    text=True
)
proc2 = subprocess.Popen(
    ["grep", "python"],
    stdin=proc1.stdout,     # proc1's output becomes proc2's input
    stdout=subprocess.PIPE,
    text=True
)
proc3 = subprocess.Popen(
    ["head", "-5"],
    stdin=proc2.stdout,
    stdout=subprocess.PIPE,
    text=True
)

# Close the parent's copies of the pipe read ends. This lets proc1 and proc2
# receive SIGPIPE and stop if the process downstream of them exits early.
proc1.stdout.close()
proc2.stdout.close()

output, _ = proc3.communicate()
proc1.wait()   # reap the upstream processes as well
proc2.wait()
print(f"Running Python processes (first 5):\n{output}")

# Read output line by line in real time
proc = subprocess.Popen(
    ["ping", "-c", "4", "8.8.8.8"],  # -c 4 so the loop below ends (Linux/macOS)
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr into stdout
    text=True,
    bufsize=1                  # line buffering
)

# Handle each line as it arrives
for line in proc.stdout:
    line = line.strip()
    if not line:
        continue
    print(f"[PING] {line}")
    # Look for specific information
    if "time=" in line:
        time_ms = line.split("time=")[1].split(" ")[0]
        print(f"  -> latency: {time_ms}")

# Wait for the process to exit
proc.wait()
print(f"Ping finished, return code: {proc.returncode}")

# Capture stderr separately from stdout
result = subprocess.run(
    ["ls", "/nonexistent"],
    capture_output=True,
    text=True
)
print(f"stdout: [{result.stdout}]")
print(f"stderr: [{result.stderr.strip()}]")
print(f"Return code: {result.returncode}")

# Discard output (when the result does not matter)
subprocess.run(
    ["rm", "-f", "/tmp/temp_file"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)

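5. Shell commands

The shell=True parameter is a double-edged sword. When set, the command string is interpreted by the system shell (/bin/sh on POSIX, cmd.exe on Windows), so you can use shell features as you would in a terminal: environment variable expansion ($HOME), wildcards (*.txt), pipes (|), output redirection (>), and multiple commands (; and &&). This is convenient for quickly running a complex shell command, but the official documentation carries an explicit security warning about it, so avoid it where you can.

shell=True carries two main risks. The first is command injection: if the command string contains user-supplied content (a file name, a search keyword), a malicious user can craft input that runs arbitrary commands. For example, if a script builds "echo " plus the user's input and the user enters "; rm -rf /", the command that runs is "echo ; rm -rf /", with disastrous results. The second is portability: shells differ widely between operating systems. Windows cmd.exe and Unix /bin/sh have different syntax, escaping rules, and available commands, so code that relies on shell=True is hard to port. If you do use shell=True, never concatenate untrusted input into the command string; where a value must be interpolated, escape it with shlex.quote(), which targets POSIX shells only and does not protect cmd.exe command lines.

The safe alternative is to pass the command as a list (for example ["ls", "-l", filename]) and skip the shell entirely. Each element is handed to the operating system as a separate argument, so special characters in it are never treated as shell syntax. If you need wildcard expansion, expand with glob.glob() first and pass the results; if you need environment variables, read them from os.environ in Python or set them for the child with the env parameter; if you need a simple pipeline, chain Popen objects. The list form is slightly more verbose, but the gain in safety and portability is worth it. Treat shell=True as an exception for when you know exactly what you are doing, not as the everyday default.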
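To make the "separate argument" point concrete, here is a minimal sketch. The hostile string and the echo-style child are illustrative assumptions; the child is the current interpreter so the example runs anywhere.

```python
import shlex
import subprocess
import sys

hostile = "; echo INJECTED"   # illustrative untrusted input

# Child that prints its first command-line argument unchanged.
echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# List form: the string arrives as a single argv entry; no shell ever parses it.
result = subprocess.run(
    echo_arg + [hostile], capture_output=True, text=True, check=True
)
print(repr(result.stdout.strip()))

# shlex.quote() wraps the value in single quotes for use in a POSIX shell string.
quoted = shlex.quote(hostile)
print(quoted)
```

The child receives the semicolon as ordinary text, so nothing after it is executed. The quoted form is only needed when a shell command string cannot be avoided.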

import subprocess
import shlex
import glob

# Dangerous: shell=True with user input concatenated into the command
# user_input = "; rm -rf /"  # malicious input
# subprocess.run(f"echo {user_input}", shell=True)  # dangerous!

# Safe: list form, no shell involved
user_filename = "document.txt"
subprocess.run(["ls", "-l", user_filename], check=True)

# If shell features are really needed, escape values with shlex.quote() (POSIX shells)
user_input = "safe_file.txt"
safe_cmd = f"grep 'error' {shlex.quote(user_input)} | head -10"
result = subprocess.run(safe_cmd, shell=True, capture_output=True, text=True)
print(result.stdout)

# Legitimate uses of shell=True (only with command strings you fully control)

# Case 1: wildcards and pipes
result = subprocess.run(
    "du -sh /var/log/*.log | sort -rh | head -5",
    shell=True,
    capture_output=True,
    text=True
)
print("Five largest log files:")
print(result.stdout)

# Case 2: environment variable references
result = subprocess.run(
    "echo 'Current user:' $USER ' Home directory:' $HOME",
    shell=True,
    capture_output=True,
    text=True
)
print(result.stdout.strip())

# Safe alternative: list form plus glob expansion

# Replace the wildcard
log_files = glob.glob("/var/log/*.log")
if log_files:
    result = subprocess.run(
        ["du", "-sh"] + log_files,
        capture_output=True,
        text=True
    )
    print("Log file sizes:")
    print(result.stdout)

# Replace the pipe: filter in Python
df_result = subprocess.run(
    ["df", "-hT"],          # GNU df; -T adds the filesystem type column
    capture_output=True,
    text=True
)
# Python code takes the place of grep
for line in df_result.stdout.splitlines():
    if "ext4" in line:
        print(f"ext4 partition: {line}")

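6. Errors and timeouts

In an automation script, the outcome of a child process decides whether the whole task succeeds. subprocess provides error handling for three typical situations: the command does not exist, the command fails, and the command hangs until a timeout. Good error handling produces clear diagnostics when something goes wrong and also prevents resource leaks: a timed-out child that is never terminated keeps running and holds its PID and file descriptors, and a terminated child that is never waited for lingers as a zombie. Over time either can degrade the system.

CalledProcessError is the exception raised by run(check=True). It carries returncode, cmd, and the captured output as stdout (alias output) and stderr. By catching it you can choose a recovery strategy per exit code: wait and retry after a failed network request, send a notification after a compile error, or try an alternative path when a file is missing. TimeoutExpired corresponds to the timeout case; the exception object includes whatever output had been captured so far (if any), which helps when debugging long-running commands. run() kills the child itself before raising TimeoutExpired. Popen.wait() and Popen.communicate() do not: after catching TimeoutExpired from them, call terminate() or kill() and then wait for the process, and only then decide whether to retry or skip.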
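The attributes of CalledProcessError can be inspected with a short sketch. The failing child below is invented for the demonstration: it prints to both streams and exits with code 3.

```python
import subprocess
import sys

# Hypothetical failing command: writes to stdout and stderr, then exits with 3.
failing = [
    sys.executable, "-c",
    "import sys; print('partial'); sys.stderr.write('boom'); sys.exit(3)",
]

err = None
try:
    subprocess.run(failing, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as exc:
    err = exc   # keep a reference; 'exc' is unbound after the except block
    print(f"cmd        : {exc.cmd[0]} ...")
    print(f"returncode : {exc.returncode}")
    print(f"stdout     : {exc.stdout.strip()}")
    print(f"stderr     : {exc.stderr}")
```

stdout and stderr are only populated because capture_output=True was passed; without capturing, both attributes are None.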

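The termination strategy depends on the situation. terminate() sends SIGTERM (on Windows it calls TerminateProcess), asking the process to exit and giving it a chance to clean up. kill() sends SIGKILL (on Windows it is an alias for terminate()), which ends the process immediately; the process cannot catch this signal. The usual approach is to call terminate() first, wait a short while (3 seconds, say), and call kill() only if the process is still alive. For long-running commands, always set timeout, or use poll() with a deadline check in a Popen loop, so that you stay in control of the process.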
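The terminate-then-kill escalation fits in a small helper. This is a sketch under stated assumptions: stop_process and its grace parameter are names invented here, and the sleeping child stands in for a real hung command.

```python
import subprocess
import sys

def stop_process(proc, grace=3.0):
    """Ask the process to exit, then force it if it ignores the request."""
    if proc.poll() is not None:        # already finished
        return proc.returncode
    proc.terminate()                   # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()                    # SIGKILL on POSIX
        return proc.wait()             # reap it so no zombie is left behind

# A child that would sleep for a minute if left alone.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    sleeper.wait(timeout=0.5)
except subprocess.TimeoutExpired:
    code = stop_process(sleeper)
    print(f"stopped with return code {code}")
```

On POSIX a process ended by a signal reports a negative return code (the negated signal number), which is a convenient way to tell a forced stop from a normal exit.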

import subprocess
import time

def run_with_retry(cmd, max_retries=3, timeout=30):
    """Run a command with retries and a per-attempt timeout"""
    for attempt in range(max_retries):
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,
                check=True
            )
            return result
        except subprocess.TimeoutExpired:
            print(f"Attempt {attempt+1}/{max_retries}: timed out ({timeout}s)")
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff
        except subprocess.CalledProcessError as e:
            print(f"Attempt {attempt+1}/{max_retries}: failed (return code={e.returncode})")
            print(f"Error output: {e.stderr[:200]}")
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    return None  # only reached when max_retries <= 0

# Usage example
try:
    result = run_with_retry(
        # -f makes curl exit non-zero on HTTP error responses
        ["curl", "-sf", "https://api.example.com/health"],
        max_retries=3,
        timeout=10
    )
    print("API health check passed")
except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as e:
    print(f"Health check failed for good: {e}")

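# Manual termination pattern
import subprocess

proc = subprocess.Popen(
    ["ping", "8.8.8.8"],   # on Linux/macOS this runs until it is stopped
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)

try:
    # Wait with a timeout
    stdout, stderr = proc.communicate(timeout=5)
except subprocess.TimeoutExpired:
    print(f"Process {proc.pid} timed out, terminating...")

    # Step 1: ask it to exit, and keep draining the pipes while waiting
    proc.terminate()
    try:
        stdout, stderr = proc.communicate(timeout=3)
        print("Process terminated gracefully")
    except subprocess.TimeoutExpired:
        # Step 2: force kill
        print("Killing process...")
        proc.kill()
        stdout, stderr = proc.communicate()
        print("Process killed")

    # Retrying communicate() loses nothing: it also returns what was read
    # before the timeout. Reading proc.stdout directly would miss that data.
    print(f"Captured output:\n{stdout[:200]}")

print(f"Final return code: {proc.returncode}")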

# Catch-all exception handling pattern
import subprocess

def safe_exec(cmd, timeout=30):
    """Run a command and turn the common failure modes into a result dict"""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False  # no automatic exception; the return code is checked below
        )

        if result.returncode != 0:
            return {
                "success": False,
                "returncode": result.returncode,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "error": f"non-zero exit code: {result.returncode}"
            }

        return {"success": True, "data": result.stdout}

    except FileNotFoundError:
        return {"success": False, "error": f"command not found: {cmd[0]}"}
    except subprocess.TimeoutExpired:
        return {"success": False, "error": f"command timed out ({timeout}s)"}
    except PermissionError:
        # Raised when the executable itself may not be run
        return {"success": False, "error": f"permission denied: {cmd[0]}"}

# Usage. If /root is unreadable, ls itself runs fine and exits non-zero,
# so this takes the "non-zero exit code" branch, not PermissionError.
result = safe_exec(["ls", "/root"])
if not result["success"]:
    print(f"Execution failed: {result['error']}")
else:
    print(result["data"])

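7. Interactive programs

Controlling interactive programs is one of the most challenging uses of subprocess, and one of the most valuable. Many system tools and network clients are designed for interactive use: telnet, ssh, ftp, passwd, fdisk, and various CLI configuration tools. They do not run once and exit; they start and then wait for commands. Driving them from subprocess means managing the timing of input and output precisely: send a command, wait for the response, parse it to decide the next step, and send the next command. This is the same model as the Unix expect tool, with Python's flexibility on top.

The central difficulty is synchronization: the moments at which Python sends input and reads output must line up with the program's state. Sending every input line right after startup often fails, because an interactive program prints its prompt and waits for the next input only after finishing the previous command. The reliable approach is turn-by-turn interaction: send one line, keep reading until the expected prompt appears (such as "$ ", "password:", or a specific keyword), then send the next. This usually means opening the pipes in text mode (text=True or its older alias universal_newlines=True) with bufsize=1 for line buffering, then iterating over proc.stdout or using the select module for timeouts. Prompts often do not end with a newline, so readline() alone may never return them; matching a prompt generally requires reading raw chunks rather than whole lines.

For production use, mature third-party libraries hide this complexity: pexpect drives programs through a pseudo-terminal, and paramiko implements SSH directly. Understanding the underlying subprocess mechanics is still useful. If you implement interaction yourself, keep several points in mind. First, a child may use line buffering or full buffering, and many programs switch to full buffering when stdout is a pipe; PYTHONUNBUFFERED=1 disables buffering for Python children only, and other programs may need a pseudo-terminal (the pty module, or pexpect). Second, programs such as ssh and passwd read passwords from the controlling terminal, not from stdin, so plain pipes cannot feed them. Third, timeouts are essential so that a program in an unexpected state cannot hang the script forever. Fourth, add recovery logic so that unexpected output leads to a clean failure rather than an endless loop. With these techniques you can automate many CLI workflows in pure Python.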

import subprocess
import time

def interactive_telnet(host, port, commands, timeout=10):
    """A basic expect-style exchange with a line-based text service"""
    proc = subprocess.Popen(
        ["telnet", host, str(port)],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1
    )

    output = []
    start = time.time()

    for cmd in commands:
        # Wait for a suitable moment to send the command.
        # Simple strategy: pause briefly before each one.
        time.sleep(1)

        if time.time() - start > timeout:
            print("Global timeout")
            break

        # Send the command
        print(f"Sending: {cmd}")
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

        # Read the response. readline() blocks until a full line arrives,
        # so this only suits services that answer every command with a line.
        line_count = 0
        while line_count < 5:  # read at most 5 lines
            if proc.poll() is not None:
                break
            try:
                line = proc.stdout.readline()
                if line:
                    output.append(line.strip())
                    print(f"Received: {line.strip()}")
                    line_count += 1
            except (ValueError, OSError):
                break

    # Close stdin and wait for the process
    try:
        proc.stdin.close()
    except OSError:
        pass
    try:
        proc.wait(timeout=5)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()

    return output

# Example: a plain-text, line-based service (host, port, and commands are
# placeholders). SSH cannot be driven this way: it is a binary protocol.
# commands = ["HELO localhost", "QUIT"]
# output = interactive_telnet("localhost", 25, commands)
# print("Transcript:", "\n".join(output))

# Practical example: scripting fdisk to create a partition
# (demonstration only, do not run in production)
import subprocess

def automate_fdisk(device, commands, timeout=30):
    """Feed a scripted sequence of answers to fdisk through stdin"""
    # fdisk prints prompts such as "Command (m for help): " without a trailing
    # newline, so a readline() loop that waits for the prompt can block forever.
    # fdisk reads its answers from stdin in order, so the whole script can be
    # sent in one go and the transcript inspected afterwards.
    script = "\n".join(commands) + "\n"
    result = subprocess.run(
        ["fdisk", device],
        input=script,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout
    )

    for line in result.stdout.splitlines():
        print(f"<- {line.strip()}")

    print(f"fdisk finished, return code: {result.returncode}")
    return result.returncode

# Note: example only, do not run as is. The final "w" writes the partition table.
# automate_fdisk("/dev/sdb", ["n", "p", "", "", "", "t", "83", "p", "w"])

# A pexpect-style helper built on plain subprocess
import subprocess
import threading
import queue
import time

class InteractiveProcess:
    """A basic controller for an interactive process"""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1
        )
        self.output_queue = queue.Queue()
        self._reader = threading.Thread(target=self._read_output, daemon=True)
        self._reader.start()

    def _read_output(self):
        """Background thread that keeps reading output lines"""
        try:
            for line in self.proc.stdout:
                self.output_queue.put(line)
        except (ValueError, OSError):
            pass

    def send(self, cmd):
        """Send one command line"""
        self.proc.stdin.write(cmd + "\n")
        self.proc.stdin.flush()

    def expect(self, pattern, timeout=5):
        """Wait until an output line contains the pattern (substring match)"""
        start = time.time()
        accumulated = ""
        while time.time() - start < timeout:
            try:
                line = self.output_queue.get(timeout=0.5)
                accumulated += line
                if pattern in line:
                    return accumulated
            except queue.Empty:
                continue
        raise TimeoutError(f"Pattern not matched: {pattern!r}")

    def close(self):
        self.proc.terminate()
        self.proc.wait()

# Usage example
# proc = InteractiveProcess(["bash"])
# proc.send("echo 'Hello World'")
# output = proc.expect("World")
# print(output)
# proc.close()

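8. Environment and working directory

The environment a child process runs in has a decisive effect on its behavior. The env and cwd parameters control the child's environment variables and working directory, and they are the main tools for building predictable, reproducible automation. By default the child inherits a full copy of the parent's environment, so PATH, HOME, LANG, and the rest have the same values in the child as in the Python script. Most of the time this is convenient, but sometimes it is a liability, for example when a production job needs precise control over the program search path, or when a child should run in a clean, isolated environment.

The env parameter lets you define the child's environment completely. When env is passed, the child sees only the variables in that mapping and inherits nothing from the parent. If you leave out PATH, programs that are looked up by bare name may no longer be found (and on Windows a child usually needs SystemRoot to start at all). A common pattern is to copy the current environment with os.environ.copy() and then change individual variables, such as adding a PATH entry or setting a configuration variable. This inherit-and-override strategy keeps the base environment while allowing precise customization. For isolated testing you can also build a minimal environment from scratch that contains only the few variables the child needs.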
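The two strategies, inherit-and-override versus full replacement, can be compared side by side. In this sketch PARENT_ONLY and APP_ENV are made-up variable names, and the child is the current interpreter so the example does not depend on PATH.

```python
import os
import subprocess
import sys

os.environ["PARENT_ONLY"] = "1"   # hypothetical marker set in the parent

# Child prints the two variables it can (or cannot) see.
show = [
    sys.executable, "-c",
    "import os; print(os.environ.get('PARENT_ONLY'), os.environ.get('APP_ENV'))",
]

# Inherit-and-override: copy the parent environment, then add one variable.
child_env = os.environ.copy()
child_env["APP_ENV"] = "staging"
inherited = subprocess.run(show, env=child_env, capture_output=True, text=True, check=True)

# Replacement: start from an empty mapping, so PARENT_ONLY is gone.
minimal_env = {"APP_ENV": "staging"}
if os.name == "nt":
    # Assumption: Windows children generally need SystemRoot to start.
    minimal_env["SYSTEMROOT"] = os.environ.get("SYSTEMROOT", "")
isolated = subprocess.run(show, env=minimal_env, capture_output=True, text=True, check=True)

print("inherited:", inherited.stdout.strip())
print("isolated :", isolated.stdout.strip())
```

The first child sees both variables; the second sees only APP_ENV, which confirms that env replaces the environment instead of merging into it.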

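The cwd parameter sets the child's working directory, so the command runs in the directory you name. This is useful when working with relative paths or project files: a build script, for example, can pass cwd=project_dir instead of running cd before the build command. cwd affects only the child; it does not change the Python process's own working directory, so several children can run in different directories at the same time. If the child depends on files or configuration in its current directory, cwd must be set correctly, otherwise you get confusing "file not found" errors. A cwd that does not exist makes the Popen call itself fail with FileNotFoundError.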
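That cwd changes only the child's directory can be checked directly. The sketch below uses a temporary directory and the current interpreter as the child; both are choices made for the example.

```python
import os
import subprocess
import sys
import tempfile

parent_before = os.getcwd()

with tempfile.TemporaryDirectory() as workdir:
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=workdir,                      # only the child runs here
        capture_output=True,
        text=True,
        check=True,
    )
    # realpath() resolves symlinks so the two paths compare reliably.
    child_cwd = os.path.realpath(result.stdout.strip())
    expected = os.path.realpath(workdir)

parent_after = os.getcwd()

print("child ran in workdir:", child_cwd == expected)
print("parent cwd unchanged:", parent_before == parent_after)
```

Because the parent never calls os.chdir(), code running in other threads of the same script is unaffected, which is the main reason to prefer cwd over changing directory globally.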

import subprocess
import os

# Inherit the environment and modify it
my_env = os.environ.copy()
my_env["PATH"] = "/usr/local/bin:" + my_env.get("PATH", "")
my_env["APP_ENV"] = "production"
my_env["PYTHONUNBUFFERED"] = "1"  # disable output buffering in Python children

result = subprocess.run(
    ["python3", "-c", "import os; print(os.environ.get('APP_ENV'))"],
    env=my_env,
    capture_output=True,
    text=True,
    check=True
)
print(f"Custom environment variable: {result.stdout.strip()}")  # prints production

# Build a minimal environment from scratch (only the variables that are needed)
minimal_env = {
    "PATH": "/usr/bin:/bin",
    "HOME": "/tmp/sandbox",
    "USER": "nobody",
    "LANG": "C.UTF-8"
}

result = subprocess.run(
    ["env"],
    env=minimal_env,
    capture_output=True,
    text=True
)
print("Minimal environment:")
for line in result.stdout.strip().splitlines():
    print(f"  {line}")

# Restricted environment: limit the executable search path.
# This narrows what the child finds by name; it is not a security boundary.
sandbox_dir = "/tmp/sandbox_tmp"
os.makedirs(sandbox_dir, exist_ok=True)  # cwd must exist or Popen raises FileNotFoundError
sandbox_env = {
    "PATH": "/usr/bin:/bin",
    "TMPDIR": sandbox_dir
}
subprocess.run(
    ["ls", "-la", sandbox_dir],
    env=sandbox_env,
    cwd=sandbox_dir,  # run the child inside the sandbox directory as well
    capture_output=True,
    text=True
)

# Working directory control in practice
import subprocess
import tempfile

# Run commands in a chosen working directory
with tempfile.TemporaryDirectory() as tmpdir:
    # Create a few files in the temporary directory
    subprocess.run(
        ["touch", "file1.txt", "file2.txt", "file3.txt"],
        cwd=tmpdir,
        check=True
    )

    # Run ls in that directory
    result = subprocess.run(
        ["ls", "-la"],
        cwd=tmpdir,
        capture_output=True,
        text=True,
        check=True
    )
    print(f"Files in temporary directory {tmpdir}:")
    print(result.stdout)

    # Relative paths are resolved against the working directory
    result = subprocess.run(
        ["cat", "file1.txt"],
        cwd=tmpdir,
        capture_output=True,
        text=True
    )
    print(f"cat file1.txt return code: {result.returncode}")

# For comparison: without cwd the child inherits the parent's directory
result = subprocess.run(
    ["pwd"],
    capture_output=True,
    text=True
)
print(f"Working directory of the Python process: {result.stdout.strip()}")

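9. Practical examples

subprocess is widely used in office automation and system administration. This section walks through three typical examples that combine the material from the previous eight sections. They cover batch file processing, a build script, and wrapping system commands, and each includes error handling and result reporting, so they can serve as starting points for real projects.

Batch file conversion is a frequent office-automation task. Whether you transcode video with ffmpeg, process images with ImageMagick, or convert documents with pandoc, subprocess lets you integrate these external tools into a Python workflow. The core design pattern is "collect paths, run concurrently, report progress, aggregate errors": use Python's pathlib to collect the files to process, call the external tool with Popen or run(), parallelize with concurrent.futures, and finally summarize the successes and failures. The pattern extends to any number of files and format types.

Wrapping system commands and writing automated build scripts are everyday work for system administrators. Wrapping a sequence of shell commands in Python functions gives you stronger error handling than a pure shell script, better cross-platform behavior, and clearer log output. For example, a deploy() function can pull the code, install dependencies, build static files, and restart the service in turn, with timeout control, retry logic, and diagnostic output available at each step. Together with Python's logging module this yields a complete audit trail, which is essential for change management in production.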

import subprocess
import concurrent.futures
import time
from pathlib import Path

def convert_video(input_path, output_dir, crf=23):
    """使用 ffmpeg 转换单个视频文件"""
    output_path = output_dir / f"{input_path.stem}_compressed{input_path.suffix}"
    cmd = [
        "ffmpeg", "-i", str(input_path),
        "-c:v", "libx264",
        "-preset", "medium",
        "-crf", str(crf),
        "-c:a", "aac",
        "-b:a", "128k",
        "-y",  # 覆盖已存在文件
        str(output_path)
    ]

    start = time.time()
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=300  # 5分钟超时
        )
        if result.returncode == 0:
            duration = time.time() - start
            return {
                "file": input_path.name,
                "status": "成功",
                "duration": f"{duration:.1f}s",
                "output": output_path.name
            }
        else:
            return {
                "file": input_path.name,
                "status": "失败",
                "error": result.stderr[-200:]  # 截取最后200字符错误
            }
    except subprocess.TimeoutExpired:
        return {"file": input_path.name, "status": "超时", "error": "超过300秒"}
    except FileNotFoundError:
        return {"file": input_path.name, "status": "失败", "error": "ffmpeg 未安装"}

def batch_convert(input_dir, output_dir, crf=23, max_workers=4):
    """批量并行转换视频文件"""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    video_files = list(input_dir.glob("*.mp4")) + list(input_dir.glob("*.avi"))
    print(f"找到 {len(video_files)} 个视频文件，开始转换 (CRF={crf})...")

    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(convert_video, vf, output_dir, crf): vf
            for vf in video_files
        }
        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            results.append(result)
            print(f"  [{result['status']}] {result['file']}")

    # 汇总统计
    success = sum(1 for r in results if r["status"] == "成功")
    failed = sum(1 for r in results if r["status"] != "成功")
    print(f"\n转换完成: {success} 成功, {failed} 失败")

    return results

# 使用示例
# batch_convert("raw_videos", "compressed_videos", crf=28, max_workers=4)

import subprocess
import logging
from pathlib import Path

# 配置日志
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
logger = logging.getLogger("build")

class BuildAutomation:
    """项目构建与部署自动化"""

    def __init__(self, project_dir):
        self.project_dir = Path(project_dir).resolve()

    def run_cmd(self, cmd, desc, timeout=120):
        """执行命令并记录日志"""
        logger.info(f"开始: {desc}")
        logger.debug(f"命令: {' '.join(cmd)}")

        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,
                check=True,
                cwd=self.project_dir
            )
            logger.info(f"完成: {desc}")
            return result
        except subprocess.CalledProcessError as e:
            logger.error(f"失败: {desc} (返回码={e.returncode})")
            logger.error(f"stdout: {e.stdout[-300:]}")
            logger.error(f"stderr: {e.stderr[-300:]}")
            raise
        except subprocess.TimeoutExpired:
            logger.error(f"超时: {desc} ({timeout}s)")
            raise

    def git_pull(self):
        """拉取最新代码"""
        return self.run_cmd(
            ["git", "pull", "--ff-only"],
            "拉取 Git 代码",
            timeout=60
        )

    def install_deps(self):
        """安装依赖"""
        return self.run_cmd(
            ["pip", "install", "-r", "requirements.txt"],
            "安装 Python 依赖",
            timeout=300
        )

    def collect_static(self):
        """收集静态文件"""
        return self.run_cmd(
            ["python", "manage.py", "collectstatic", "--noinput"],
            "收集静态文件",
            timeout=120
        )

    def run_migrations(self):
        """运行数据库迁移"""
        return self.run_cmd(
            ["python", "manage.py", "migrate", "--noinput"],
            "执行数据库迁移",
            timeout=120
        )

    def restart_service(self, service_name):
        """重启系统服务"""
        return self.run_cmd(
            ["systemctl", "restart", service_name],
            f"重启服务 {service_name}",
            timeout=30
        )

    def build_all(self, service_name="myapp"):
        """执行完整构建流程"""
        steps = [
            ("git_pull", self.git_pull),
            ("install_deps", self.install_deps),
            ("collect_static", self.collect_static),
            ("run_migrations", self.run_migrations),
            ("restart_service", lambda: self.restart_service(service_name)),
        ]

        results = {}
        for name, step_fn in steps:
            try:
                results[name] = step_fn()
            except Exception as e:
                logger.critical(f"构建在步骤 [{name}] 失败: {e}")
                return False, results

        logger.info("构建全部完成")
        return True, results

# 使用示例
# builder = BuildAutomation("/var/www/myapp")
# success, results = builder.build_all("myapp")
# if success:
#     print("部署成功")
# else:
#     print("部署失败，请检查日志")

import subprocess
import json
import platform

class SystemInfoCollector:
    """System information collector that wraps system commands (Linux, partly Windows)"""

    def _run(self, cmd):
        """Run a command; return its stdout, or None if it is missing or fails"""
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
        except (FileNotFoundError, subprocess.TimeoutExpired):
            return None
        return result.stdout if result.returncode == 0 else None

    def get_cpu_info(self):
        """Get CPU information"""
        system = platform.system()
        if system == "Linux":
            info = {}
            for line in (self._run(["lscpu"]) or "").splitlines():
                if ":" in line:
                    key, val = line.split(":", 1)
                    info[key.strip()] = val.strip()
            return info
        if system == "Windows":
            # wmic may be absent on recent Windows versions; _run then returns None
            out = self._run(
                ["wmic", "cpu", "get", "Name,NumberOfCores,MaxClockSpeed", "/format:csv"]
            )
            return {"raw": (out or "").strip()}
        return {"error": f"unsupported platform: {system}"}

    def get_memory_info(self):
        """Get memory information"""
        system = platform.system()
        if system == "Linux":
            return {"raw": (self._run(["free", "-h"]) or "").strip()}
        if system == "Windows":
            out = self._run(
                ["wmic", "memorychip", "get", "Capacity,Speed", "/format:csv"]
            )
            return {"raw": (out or "").strip()}
        return {"error": f"unsupported platform: {system}"}

    def get_disk_info(self):
        """Get disk information (GNU df options, Linux only)"""
        out = self._run(["df", "-hP", "--type=ext4", "--type=xfs"])
        if not out:
            return []
        # Fixed column names: the header's "Mounted on" splits into two words,
        # so deriving keys from the header would never match the data rows.
        keys = ["Filesystem", "Size", "Used", "Avail", "Use%", "Mounted on"]
        disks = []
        for line in out.strip().splitlines()[1:]:   # skip the header row
            parts = line.split(None, 5)             # mount point may contain spaces
            if len(parts) == len(keys):
                disks.append(dict(zip(keys, parts)))
        return disks

    def get_network_info(self):
        """Get network interface information (iproute2, Linux only)"""
        out = self._run(["ip", "-j", "addr"])  # JSON output
        if out:
            return json.loads(out)
        return {"error": "could not get network information"}

    def collect_all(self):
        """Collect all system information"""
        return {
            "hostname": (self._run(["hostname"]) or "").strip(),
            "cpu": self.get_cpu_info(),
            "memory": self.get_memory_info(),
            "disk": self.get_disk_info(),
            "network": self.get_network_info(),
            "os": platform.platform()
        }

# Usage example
collector = SystemInfoCollector()
info = collector.collect_all()
print(json.dumps(info, indent=2, ensure_ascii=False))