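cProfile Performance Analysis: Function-Level Profiling

Python Testing and Debugging Series · Pinpointing Performance Bottlenecks in Code

Series: A Systematic Guide to Python Testing and Debugging

Keywords: Python, testing, debugging, cProfile, performance analysis, pstats, snakeviz, Gprof2Dot, function profiling, Python performance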

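1. Profiling Overview

Profiling is an essential part of software optimization. Python profilers fall into two broad categories: deterministic profiling and statistical profiling. A deterministic profiler installs hooks that record every function call, so call counts are exact, at the price of noticeable runtime overhead. A statistical profiler samples the program's state at intervals, which costs far less but yields approximate results. The Python 3 standard library has long shipped two deterministic profilers: profile (pure Python, high overhead) and cProfile (a C extension with much lower overhead, the one recommended for most users). The old hotshot module existed only in Python 2 and was removed in Python 3. Profiling is also distinct from benchmarking: profiling asks "where does the time go?", while benchmarking asks "how long does it take?". Both produce numbers, but a profile is a breakdown used for diagnosis, whereas a benchmark is a measurement used for comparison. In practice you profile first to locate hot spots, then benchmark to confirm that an optimization actually helped.

# Basic profiling example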
import cProfile

def slow_function():
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total

cProfile.run('slow_function()', sort='cumtime')

# Deterministic profiling: every call is recorded
import cProfile

def analyze_data(data):
    result = []
    for item in data:
        result.append(process_item(item))
    return result

def process_item(item):
    # Simulate an expensive operation
    return [x * x for x in range(100) if x % 2 == 0]

# cProfile is a deterministic profiler: it records every function call
cProfile.run('analyze_data(range(100))', sort='time')
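
The hook mechanism behind deterministic profiling can be sketched with sys.setprofile, the standard-library function that registers a callback for call and return events. This is only an illustration of the idea, not how cProfile is implemented internally; the names leaf, branch, and count_calls are made up for the example.

```python
import sys
from collections import Counter

def leaf(x):
    return x * x

def branch(n):
    total = 0
    for i in range(n):
        total += leaf(i)
    return total

def count_calls(func, *args):
    """Run func(*args) and count every Python-level call made while it runs."""
    counts = Counter()

    def hook(frame, event, arg):
        # 'call' fires each time a Python function frame is entered
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return counts

counts = count_calls(branch, 3)
print(dict(counts))  # every call is counted exactly; nothing is sampled
```

A statistical profiler would instead look at the running stack at intervals, so a short call such as leaf could be missed entirely.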

# Profiling vs benchmarking example
import cProfile
import time

# Benchmarking: measure "how long does it take"
start = time.perf_counter()
result = sum(i ** 2 for i in range(100000))
end = time.perf_counter()
print(f"Benchmark: {end - start:.4f}s")

# Profiling: analyze "where does the time go"
cProfile.run('sum(i ** 2 for i in range(100000))', sort='cumtime')

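2. Using cProfile from the Command Line

The most common way to use cProfile is from the command line. The basic form is python -m cProfile script.py, which prints a statistics report once the script finishes. The -s option selects the sort key; common choices are cumtime (cumulative time), tottime (internal time, also accepted under its older alias time), ncalls (call count), and name (function name). Pass -s explicitly rather than relying on the default order. The -o option writes the results to a binary file for later analysis with pstats; when -o is given, no report is printed and -s has no effect. Two columns matter most when reading a report. cumtime is the total time spent in a function including everything it calls, which makes it the right column for finding the slow branch of a call chain. tottime is the time spent in the function's own code, excluding calls to other functions, which makes it the right column for finding the function that actually burns the time. Understanding the difference between the two is the foundation for using cProfile correctly.

# Command-line usage examples
"""
# Basic profiling; the report is printed when the script finishes
python -m cProfile example.py

# Sort by cumulative time (highest first, the most common choice)
python -m cProfile -s cumtime example.py

# Sort by internal time (excludes time spent in callees)
python -m cProfile -s time example.py

# Write a binary file for later analysis with pstats
python -m cProfile -o output.prof example.py
"""

# Combining cProfile and pstats inside a script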
import cProfile
import pstats

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

def calculate():
    for i in range(30):
        fibonacci(i)

# Run and print the statistics directly
cProfile.run('calculate()', sort='cumtime')

# Save to a binary file
cProfile.run('calculate()', 'fibonacci.prof')

# Load from the file and analyze in detail
p = pstats.Stats('fibonacci.prof')
p.sort_stats('cumtime').print_stats(20)

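# Reading cumtime and tottime
import cProfile
import time

def helper():
    time.sleep(0.1)  # time spent inside a child function
    return sum(range(1000))

def worker():
    helper()          # child call: worker's cumtime includes helper's time
    time.sleep(0.05)  # also a call: time.sleep gets its own row in the report
    return sum(range(500))

def main():
    worker()

# What to look for: worker's cumtime is far larger than its tottime.
# cumtime covers worker plus everything it calls (helper, time.sleep, sum).
# tottime covers only worker's own bytecode, so even the 0.05 s sleep is
# charged to the built-in time.sleep row rather than to worker's tottime.
cProfile.run('main()', sort='cumtime')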
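
The two columns can also be read programmatically, which makes the relationship easy to check. The sketch below assumes Python 3.9 or later, where pstats.Stats.get_stats_profile() is available; the functions child and parent are hypothetical. Note that get_stats_profile() keys rows by bare function name, so same-named functions in different modules would collide.

```python
import cProfile
import pstats

def child():
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def parent():
    results = []
    for _ in range(3):
        results.append(child())
    return results

profiler = cProfile.Profile()
profiler.runcall(parent)

# get_stats_profile() exposes the rows of the report as objects (Python 3.9+)
rows = pstats.Stats(profiler).get_stats_profile().func_profiles
for name in ("parent", "child"):
    row = rows[name]
    print(f"{name:<7} ncalls={row.ncalls:<3} "
          f"tottime={row.tottime:.3f} cumtime={row.cumtime:.3f}")
```

parent should show a cumtime close to child's cumtime but a tottime near zero, because almost all of its time is spent in the callee.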

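3. Programming with the Profile API

Beyond the command line, cProfile offers a programmatic interface that lets you embed profiling more flexibly. cProfile.Profile is the core class. Its enable() and disable() methods delimit exactly which code is profiled, so you can measure one critical section rather than the whole program, which matters a great deal in large applications. create_stats() stops collecting data and records the results on the profiler object; run() takes a string of Python code and profiles its execution; runcall() takes a callable plus its arguments and profiles that call. This interface suits selective profiling of a specific API endpoint or feature, and it can be wired into a test suite to flag performance regressions early. Because deterministic profiling slows down the code it measures, keep the profiled window small when doing this in production. Wrapping the logic in a context manager or decorator keeps it out of the business code.

# Selective profiling with the Profile API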
import cProfile

def data_processing():
    total = 0
    for i in range(500000):
        total += i * i
    return total

def io_operation():
    # Simulated I/O operation
    data = [i for i in range(10000)]
    return sum(data)

# Control the profiled range precisely
profiler = cProfile.Profile()
profiler.enable()  # start profiling

# Only the calls between enable() and disable() are profiled
result1 = data_processing()
result2 = io_operation()

profiler.disable()  # stop profiling
profiler.print_stats(sort='cumtime')

# Using the run() and runcall() methods
import cProfile

def complex_calculation(n, multiplier):
    result = []
    for i in range(n):
        result.append(i * multiplier)
    return sum(result)

profiler = cProfile.Profile()

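# Execute a string of code (similar to cProfile.run)
profiler.run('complex_calculation(10000, 3)')

# Call the function directly and pass its arguments
profiler.runcall(complex_calculation, 5000, 2)

# Build the statistics explicitly, then print them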
profiler.create_stats()
profiler.print_stats(sort='cumtime')

# Wrapping the profiler in a context manager
import cProfile
import io
import pstats
from contextlib import contextmanager

@contextmanager
def profile_context(sort='cumtime', lines=30):
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield
    finally:
        profiler.disable()
        s = io.StringIO()
        ps = pstats.Stats(profiler, stream=s).sort_stats(sort)
        ps.print_stats(lines)
        print(s.getvalue())

# Profile a block of code with the context manager
with profile_context(sort='time', lines=15):
    result = [i ** 3 for i in range(100000)]
    print(f"Result length: {len(result)}")
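
On Python 3.8 and later, a Profile object is itself a context manager, so a hand-written wrapper is only needed when you want custom reporting. A minimal sketch, assuming Python 3.8+; build_table is a hypothetical workload.

```python
import cProfile
import io
import pstats

def build_table(n):
    return {i: str(i) for i in range(n)}

# Entering the with block calls enable(); leaving it calls disable()
with cProfile.Profile() as profiler:
    table = build_table(50_000)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report also contains a row or two for the profiler's own exit call, which can be ignored.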

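4. Analyzing Reports with pstats

The pstats module turns the data collected by cProfile into reports. The Stats class loads profile data from a binary file written by cProfile, or directly from a Profile object. sort_stats() accepts several keys, including cumtime (cumulative time), tottime (internal time, alias time), ncalls (call count), pcalls (primitive call count, which excludes recursive calls), and name (function name); pass more than one key to get a multi-level sort. print_stats() prints the per-function table and accepts restrictions: an integer limits the number of rows, a float between 0 and 1 keeps that fraction of the rows, and a string is treated as a regular expression matched against each row's file:line(function) label. print_callees() lists the functions each function calls, and print_callers() lists the functions that call it; together they reconstruct the call graph. strip_dirs() removes leading directory names from file paths to shorten the output. Most of these methods return the Stats object, so they can be chained.

# Basic Stats usage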
import cProfile
import pstats

def outer(n):
    total = 0
    for i in range(n):
        total += inner(i)
    return total

def inner(x):
    return sum(range(x))

cProfile.run('outer(1000)', 'analysis.prof')

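# Load the profiling results
stats = pstats.Stats('analysis.prof')

# Strip directory information (more compact output)
stats.strip_dirs()

# Sort by cumulative time and show only the first 10 rows
stats.sort_stats('cumtime')
stats.print_stats(10)

# Inspect the call relationships between functions
print("Caller -> callee relationships:")
stats.print_callees()

print("\nCalled by:")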
stats.print_callers()

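# Multi-level sorting and regex filtering
import pstats

stats = pstats.Stats('analysis.prof')
stats.strip_dirs()

# Multi-level sort: cumtime descending first, then ncalls descending
stats.sort_stats('cumtime', 'ncalls')

# Filter with a regular expression: show only functions matching the pattern
print("=== Functions matching 'inner' ===")
stats.print_stats('inner')

# Restrictions are given as str, int, or float, so pass the regular
# expression as a string rather than as a compiled pattern
print("\n=== Functions matching 'inner' or 'outer' ===")
stats.print_stats(r'(inner|outer)')

# Print the functions that inner calls
print("\n=== Callees of inner ===")
stats.sort_stats('cumtime')
stats.print_callees('inner')

# Using the SortKey enum (Python 3.7+)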
import pstats
from pstats import SortKey

stats = pstats.Stats('analysis.prof')

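# Use SortKey enum members instead of string arguments
# (the call-count member is SortKey.CALLS; there is no SortKey.NCALLS)
stats.sort_stats(SortKey.CUMULATIVE, SortKey.CALLS)

# Print the full statistics report
print("=" * 60)
print("Full performance statistics report")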
print("=" * 60)
stats.print_stats()

# Callers: which functions called each function
print("\n" + "=" * 60)
print("Callers")
print("=" * 60)
stats.print_callers()

# Callees: which functions each function called
print("\n" + "=" * 60)
print("Callees")
print("=" * 60)
stats.print_callees()
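
Stats objects can also accumulate several profile files, which is useful when the same code is profiled across multiple runs. The sketch below writes two runs with Profile.dump_stats() and merges them with Stats.add(); the parse function and file names are hypothetical, and the final lookup assumes Python 3.9+ for get_stats_profile().

```python
import cProfile
import os
import pstats
import tempfile

def parse(n):
    return [str(i) for i in range(n)]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for run, size in enumerate((10_000, 20_000)):
        path = os.path.join(tmp, f"run{run}.prof")
        profiler = cProfile.Profile()
        profiler.runcall(parse, size)
        profiler.dump_stats(path)  # same binary format that -o writes
        paths.append(path)

    merged = pstats.Stats(paths[0])
    merged.add(paths[1])  # accumulate the second run into the first
    merged.strip_dirs().sort_stats("cumulative").print_stats("parse")

# Rows for the same function are summed across runs
parse_calls = merged.get_stats_profile().func_profiles["parse"].ncalls
print(f"parse was called {parse_calls} times across both runs")
```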

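5. Visual Analysis

Text reports are complete but not very intuitive for complex call relationships; once a program has hundreds or thousands of functions, reading the table line by line is slow. Gprof2Dot converts cProfile output into Graphviz DOT format, which Graphviz then renders as a call graph whose node colors show each function's share of the total time. SnakeViz is a browser-based interactive viewer that reads cProfile's binary output directly and offers two views, Icicle (a top-down hierarchy) and Sunburst (a radial hierarchy), with hover details and click-to-zoom on a function. A flame graph draws the call stack as stacked rectangles: the width of each rectangle is proportional to the time spent in that function and its callees, and the Y axis is stack depth. The X axis is not a timeline; position along it carries no meaning, only width does. Flame graphs are a staple of system-level performance analysis. In a call graph, red nodes usually mark the most expensive functions and are the first candidates for optimization.

# Generating a call graph with Gprof2Dot
"""
# Install the dependency
pip install gprof2dot
# The dot command comes from Graphviz, which must be installed separately
# on every platform; on Windows, also add it to the system PATH

# 1. First generate the cProfile data
python -m cProfile -o output.prof your_script.py

# 2. Convert to a PNG image
gprof2dot -f pstats output.prof | dot -Tpng -o callgraph.png

# 3. Output SVG (suitable for embedding in web pages)
gprof2dot -f pstats output.prof | dot -Tsvg -o callgraph.svg

# 4. Hide nodes below 0.5% and edges below 0.1% of total time
#    (-n and -e take percentages) and write the DOT file for later rendering
gprof2dot -f pstats -n 0.5 -e 0.1 output.prof -o callgraph.dot
"""

# Interactive visualization with SnakeViz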
import cProfile
import tempfile
import os

def performance_test():
    result = []
    for i in range(10000):
        result.append(sum(range(i)))
    return sum(result)

# Generate the profiling data
profiler = cProfile.Profile()
profiler.enable()
result = performance_test()
profiler.disable()

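# Reserve a temporary file name for SnakeViz to load
with tempfile.NamedTemporaryFile(
    suffix='.prof', prefix='profile_', delete=False
) as f:
    profile_path = f.name

# Write after the handle is closed so this also works on Windows
profiler.dump_stats(profile_path)
print(f"Profile file path: {profile_path}")
print("Run this command to view the visualization:")
print(f"  snakeviz {profile_path}")

# SnakeViz starts a local web server (port 8080 by default) and opens the browser
# It offers two views: Icicle and Sunburst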

# Producing a text report for a small recursive workload
import cProfile
import io
import pstats

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

def compute_series():
    results = []
    for i in range(15):
        results.append(factorial(i))
    return results

# Run the profiler
profiler = cProfile.Profile()
profiler.enable()
compute_series()
profiler.disable()

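# Print the statistics as text. Flame graph tools expect stack data rather than this table, so a converter is needed before they can use cProfile output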
s = io.StringIO()
ps = pstats.Stats(profiler, stream=s)
ps.sort_stats('cumtime')
ps.print_stats()
print(s.getvalue())

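# Key principles for reading the visualizations:
# 1. Red/dark nodes in a call graph = expensive functions; optimize them first
# 2. Wide frames in a flame or icicle view = high cumulative time; a high tottime shows up as width not covered by child frames
# 3. Nodes with many incoming edges = called from many places; check ncalls and consider caching results
# 4. Deep call stacks = possibly excessive layering; consider flattening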
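
To see what a call-graph tool works from, the caller/callee edges can be pulled out of a Stats object and printed as Graphviz DOT text. This sketch is an assumption-heavy illustration: it reads the Stats.stats dictionary, whose layout, (file, line, name) -> (primitive calls, total calls, tottime, cumtime, callers), is an implementation detail rather than a documented API, and it is not how Gprof2Dot itself is written. The functions inner and outer are hypothetical.

```python
import cProfile
import pstats

def inner(x):
    return x * 2

def outer(n):
    total = 0
    for i in range(n):
        total += inner(i)
    return total

profiler = cProfile.Profile()
profiler.runcall(outer, 100)
stats = pstats.Stats(profiler)

# ASSUMPTION: with cProfile data, each callers entry maps a caller key to a
# tuple whose first item is the number of calls made along that edge
edges = {}
for (_, _, callee), (_, _, _, _, callers) in stats.stats.items():
    for (_, _, caller), (ncalls, *_unused) in callers.items():
        edges[(caller, callee)] = ncalls

lines = ["digraph profile {"]
for (caller, callee), ncalls in sorted(edges.items()):
    lines.append(f'  "{caller}" -> "{callee}" [label="{ncalls}"];')
lines.append("}")
dot_text = "\n".join(lines)
print(dot_text)
```

Piping text like this into dot would draw an unstyled graph; the dedicated tools add the coloring and pruning described above.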

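6. Identifying Hot Spots

Identifying hot spots is the most important step in performance work: optimizing the right place is what makes the effort pay off. cumtime (cumulative time) is the total time of a function and all of its callees, which suits a top-down search for the branch of the call tree that contains the bottleneck. tottime (internal time) is the time spent in the function's own code, excluding callees, which suits pinpointing the function that actually consumes the time. When a function's cumtime is much larger than its tottime, most of its time is spent in the functions it calls, so the analysis should continue in those callees. ncalls (call count) matters too: a function called millions of times can become a bottleneck even if each call is cheap. For recursive functions, the ncalls column shows two numbers such as 5/1: the number before the slash is the total number of calls, including recursive ones, and the number after the slash is the number of primitive calls, meaning calls that were not made recursively. Reducing unnecessary calls, caching results to avoid recomputation, and optimizing the most frequently executed paths are the usual directions for improvement.

# Reading cumtime and tottime together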
import time

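def fast_function():
    """Pure in-function computation: a plain loop, so no child calls are recorded"""
    total = 0
    for i in range(100000):
        total += i * i
    return total

def function_delegating_work():
    """Delegates the work to callees: small tottime, large cumtime"""
    time.sleep(0.02)  # charged to the built-in time.sleep row
    return fast_function()  # time spent in the child function

def top_level():
    """Top-level function: cumtime includes every child call"""
    for _ in range(5):
        function_delegating_work()

import cProfile
cProfile.run('top_level()', sort='cumtime')
# What to look for:
# - top_level: largest cumtime, tiny tottime -> the bottleneck is further down the call chain
# - function_delegating_work: large cumtime, small tottime -> the bottleneck moves down again
# - fast_function: cumtime ≈ tottime -> this is where the time is actually spent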

# ncalls analysis: call counts of a recursive function
import cProfile

def recursive_fib(n):
    """Plain recursive Fibonacci: a huge number of repeated calls"""
    if n <= 1:
        return n
    return recursive_fib(n-1) + recursive_fib(n-2)

def memoized_fib(n, memo=None):
    """Memoized Fibonacci: avoids recomputation, so very few calls"""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = memoized_fib(n-1, memo) + memoized_fib(n-2, memo)
    return memo[n]

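print("=== Recursive version (ncalls grows exponentially) ===")
cProfile.run('recursive_fib(30)', sort='cumtime')
# The ncalls column shows something like "2692537/1": about 2.69 million calls in total, 1 primitive call

print("\n=== Memoized version (far fewer calls) ===")
cProfile.run('memoized_fib(30)', sort='cumtime')
# The ncalls column shows a very small number

# Reading primitive calls versus total calls
import cProfile

def factorial(n):
    """Recursive factorial: watch the slash notation in ncalls"""
    if n <= 1:
        return 1
    return n * factorial(n-1)

# The ncalls column has the form "total/primitive"
# For example "5/1" means 5 calls in total, 1 of them primitive
cProfile.run('factorial(5)', sort='cumtime')

# For a non-recursive function the two numbers are equal and only one is shown
# For a recursive function, total calls = primitive calls + recursive calls
# A total far larger than the primitive count means heavy recursion (deep, or repeated as in naive Fibonacci)
# Consider an iterative rewrite or memoization; CPython does not perform tail-call optimization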
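
The caching advice can also be followed with functools.lru_cache from the standard library instead of a hand-written memo dictionary. The sketch below profiles a cached Fibonacci and reads the call count back; it assumes Python 3.9+ for get_stats_profile(), and fib is a hypothetical example function. Cache hits are answered by the wrapper without entering the Python function body, so only the misses show up in ncalls.

```python
import cProfile
import pstats
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

profiler = cProfile.Profile()
profiler.runcall(fib, 30)

row = pstats.Stats(profiler).get_stats_profile().func_profiles["fib"]
# ncalls is a string, "total/primitive" when the function recursed
total_calls = int(row.ncalls.split("/")[0])
print(f"fib(30) executed its body {total_calls} times")  # 31: once per n in 0..30
```

Compared with the roughly 2.69 million calls of the uncached version, the profile makes the effect of the cache directly visible in one column.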

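7. Choosing a Profiler: cProfile vs profile

The Python standard library provides two profiling modules, cProfile and profile, which expose the same interface but differ greatly in implementation and overhead. cProfile is a C extension with comparatively low overhead, which makes it suitable for profiling long-running programs, and it is the module the official documentation recommends for most users. Its overhead is not a fixed percentage: it grows with the number of function calls, so call-heavy code slows down far more than a tight loop inside a single function. profile is pure Python and adds much more overhead, often slowing call-heavy code several times over, but it is easier to extend and customize, which suits cases that need custom profiling logic. Since Python 3.8, a cProfile.Profile object can be used as a context manager. Neither module provides a thread-aware view automatically: do not assume that a profiler enabled in the main thread records the work done in other threads, and profile the thread's own target function when per-thread data is needed. For child processes, run a Profile inside each process (for example in a multiprocessing worker) and write one stats file per process. A custom profiling decorator is a convenient way to add all of this to existing code and inspect the time distribution of any function.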

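# Comparing the overhead of cProfile and profile
import cProfile
import profile
import time

def square(i):
    return i * i

def compute():
    # Profiler overhead is paid per call, so the workload must make many calls
    total = 0
    for i in range(200000):
        total += square(i)
    return total

# cProfile (C extension, lower overhead)
start = time.perf_counter()
cProfile.run('compute()', sort='cumtime')
cprofile_time = time.perf_counter() - start

# profile (pure Python, higher overhead)
start = time.perf_counter()
profile.run('compute()', sort='cumtime')
profile_time = time.perf_counter() - start

print(f"\nTotal time under cProfile: {cprofile_time:.3f}s")
print(f"Total time under profile: {profile_time:.3f}s")
print(f"The profile run took {profile_time/cprofile_time:.1f}x as long as the cProfile run")
# Recommendation: use cProfile day to day; use profile only when the profiler itself must be customized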
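
One customization both modules accept without any subclassing is a custom timer, passed as the first argument to Profile. The sketch below compares a wall-clock timer with a CPU-time timer on a function that mostly waits; the function names are hypothetical, and the row lookup assumes Python 3.9+ for get_stats_profile().

```python
import cProfile
import pstats
import time

def wait_then_work():
    time.sleep(0.05)           # waiting: wall-clock time passes, almost no CPU is used
    return sum(range(10_000))

def cumtime_with(timer):
    """Profile wait_then_work with the given timer and return its cumtime."""
    profiler = cProfile.Profile(timer)
    profiler.runcall(wait_then_work)
    rows = pstats.Stats(profiler).get_stats_profile().func_profiles
    return rows["wait_then_work"].cumtime

wall = cumtime_with(time.perf_counter)   # wall-clock seconds
cpu = cumtime_with(time.process_time)    # CPU seconds used by this process
print(f"wall-clock cumtime: {wall:.3f}s, CPU cumtime: {cpu:.3f}s")
```

With the CPU timer the sleep nearly disappears from the report, which helps separate I/O waits from computation when deciding what to optimize.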

# Profiling in a multithreaded program
import cProfile
import threading

def worker(thread_id):
    """Worker thread function"""
    total = 0
    for i in range(200000):
        total += i * thread_id
    return total

def threaded_work():
    threads = []
    for i in range(4):
        t = threading.Thread(target=worker, args=(i,))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

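# The profiler is enabled in the main thread only
profiler = cProfile.Profile()
profiler.enable()
threaded_work()
profiler.disable()
profiler.print_stats(sort='cumtime')

# Note: do not assume this report includes the work done inside worker().
# On many Python versions a Profile is tied to the thread that enabled it,
# so the table may show little more than Thread.start() and Thread.join().
# For per-thread data, profile the target function inside the thread itself
# (for example with runcall) and write one stats file per thread.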

# A custom profiling decorator
import cProfile
import pstats
import functools
import io

def profile_decorator(sort='cumtime', lines=30):
    """Configurable profiling decorator

    Usage:
        @profile_decorator(sort='time', lines=20)
        def my_function():
            ...
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                # Stop profiling and print the report even if func raises
                profiler.disable()
                s = io.StringIO()
                ps = pstats.Stats(profiler, stream=s).sort_stats(sort)
                ps.print_stats(lines)
                print(f"\n--- Profile: {func.__name__} ---")
                print(s.getvalue())
        return wrapper
    return decorator

@profile_decorator(sort='time', lines=15)
def data_pipeline():
    """Data processing pipeline"""
    result = []
    for i in range(10000):
        result.append(sum(range(i)))
    return sum(result)

data_pipeline()

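8. Integrating Benchmarks

Profiling and benchmarking complement each other in performance work. The built-in timeit module times small snippets precisely by running them many times: timeit.timeit() returns the total time for all runs, and timeit.repeat() returns one total per batch, of which the minimum is usually the most representative. timeit gives no information about the call hierarchy. pytest-benchmark is a benchmarking plugin for pytest that integrates performance tests into the test suite, runs each benchmark many times and reports statistics, compares against saved results, and detects regressions. In a continuous integration (CI) pipeline you can set a regression threshold so that the build fails when performance drops by more than a given amount. Continuous profiling is an emerging practice of collecting profile data continuously in production, so that regressions introduced by code changes are found and located early.

# Using timeit and cProfile side by side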
import timeit
import cProfile

def test_list_comp():
    """List comprehension version"""
    return [i ** 2 for i in range(1000)]

def test_for_loop():
    """for loop version"""
    result = []
    for i in range(1000):
        result.append(i ** 2)
    return result

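# timeit: precise timing, suited to micro-benchmarks
t1 = timeit.timeit(test_list_comp, number=10000)
t2 = timeit.timeit(test_for_loop, number=10000)
print(f"List comprehension: {t1:.4f}s")
print(f"for loop:           {t2:.4f}s")
print(f"Ratio (for loop / list comprehension): {t2/t1:.2f}x")

# cProfile: shows the call structure, but for functions this small the timings are dominated by profiler overhead
print("\n=== cProfile overview ===")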
cProfile.run('test_list_comp()', sort='cumtime')
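
A single timeit.timeit() total is sensitive to whatever else the machine is doing. A common refinement, sketched below, is to take several batches with timeit.repeat() and keep the fastest one; the helper best_time and the two workload functions are hypothetical names.

```python
import timeit

def squares_comprehension():
    return [i ** 2 for i in range(1000)]

def squares_loop():
    result = []
    for i in range(1000):
        result.append(i ** 2)
    return result

def best_time(func, repeat=5, number=2000):
    """Return the fastest per-call time in seconds over `repeat` batches."""
    batches = timeit.repeat(func, repeat=repeat, number=number)
    return min(batches) / number

for func in (squares_comprehension, squares_loop):
    print(f"{func.__name__:<24} {best_time(func) * 1e6:8.2f} microseconds per call")
```

The minimum is used because slower batches mostly reflect interference from other processes rather than the code under test.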

# pytest-benchmark integration example
"""
# Install
pip install pytest-benchmark

# Contents of benchmark_test.py:

import pytest

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

def fibonacci_memo(n, memo=None):
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fibonacci_memo(n-1, memo) + fibonacci_memo(n-2, memo)
    return memo[n]

@pytest.mark.benchmark
def test_fibonacci_recursive(benchmark):
    result = benchmark(fibonacci, 30)
    assert result == 832040

@pytest.mark.benchmark
def test_fibonacci_memoized(benchmark):
    result = benchmark(fibonacci_memo, 30)
    assert result == 832040

# Commands:
#   pytest benchmark_test.py --benchmark-only
#   pytest benchmark_test.py --benchmark-save=baseline
#   pytest benchmark_test.py --benchmark-compare=baseline
#   pytest benchmark_test.py --benchmark-compare --benchmark-compare-fail=min:5%
"""

# A helper for quick performance regression checks
import cProfile
import pstats
import io
import time

def performance_check(func, *args, threshold=1.0, **kwargs):
    """Run one benchmark and one profile of the target function

    Args:
        func: the function to analyze
        threshold: time limit in seconds; a warning is printed when exceeded
    Returns:
        elapsed: the measured execution time
    """
    # Benchmarking (timed without the profiler, so the number is not inflated)
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start

    print(f"Execution time: {elapsed:.4f}s", end='')
    if elapsed > threshold:
        print(" ⚠ threshold exceeded!")
    else:
        print(" ✓")

    # Profiling
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args, **kwargs)
    profiler.disable()

    s = io.StringIO()
    ps = pstats.Stats(profiler, stream=s).sort_stats('cumtime')
    ps.print_stats(15)
    print(s.getvalue())

    return elapsed

# Usage example
def process_data(size=5000):
    return [sum(range(i)) for i in range(size)]

print("=" * 50)
print("Performance regression check")
print("=" * 50)
performance_check(process_data, 5000, threshold=0.5)
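
A fixed threshold in seconds is fragile across machines. A relative gate against a stored baseline is one way to express the CI regression check described earlier; the sketch below is a minimal illustration in which the baseline value, the 20% tolerance, and the helper names measure and is_regression are all assumptions.

```python
import time

def measure(func, *args, repeat=5):
    """Return the fastest of `repeat` wall-clock timings of func(*args)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

def is_regression(baseline, current, tolerance=0.20):
    """True when `current` is more than `tolerance` (a fraction) slower than `baseline`."""
    return current > baseline * (1 + tolerance)

def process_data(size):
    return [sum(range(i)) for i in range(size)]

baseline = 0.050  # hypothetical stored baseline, in seconds
current = measure(process_data, 500)
status = "REGRESSION" if is_regression(baseline, current) else "ok"
print(f"baseline={baseline:.4f}s current={current:.4f}s -> {status}")
```

In a real pipeline the baseline would come from a previous run on the same hardware, and a failing gate would be followed by a profile to find out where the extra time went.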

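9. Case Studies

This section walks through three cases that show cProfile at work in realistic settings. The web request case profiles a simulated request handler to see how response time is distributed across endpoints and which handler functions, in particular serialization and aggregation, dominate. The data pipeline case uses cProfile to find bottlenecks in a data-science or ETL style pipeline and to compare alternative implementations (list comprehension vs for loop, built-in functions vs hand-written loops). The database case uses a simulated data layer to show how a profile exposes the N+1 query problem and unnecessary repeated queries. Each case notes what to look for in the report and where to optimize.

# Case 1: profiling web requests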
import cProfile
import pstats
import io

class RequestHandler:
    """Simulated web request handler"""

    def handle_request(self, endpoint, params):
        if endpoint == '/api/users':
            return self.get_users()
        elif endpoint == '/api/orders':
            return self.get_orders()
        elif endpoint == '/api/dashboard':
            return self.get_dashboard()

    def get_users(self):
        # Simulate a database query plus serialization
        users = [{'id': i, 'name': f'user_{i}'} for i in range(500)]
        return self.serialize_users(users)

    def serialize_users(self, users):
        # Serialize each user, upper-casing the name
        return [{'id': u['id'], 'name': u['name'].upper()} for u in users]

    def get_orders(self):
        orders = [{'id': i, 'amount': i * 10.5} for i in range(300)]
        return self.calculate_totals(orders)

    def calculate_totals(self, orders):
        for order in orders:
            order['tax'] = order['amount'] * 0.13
            order['total'] = order['amount'] + order['tax']
        return orders

    def get_dashboard(self):
        # Fetch both users and orders, then aggregate
        data = {
            'users': len(self.get_users()),
            'orders': len(self.get_orders())
        }
        return self.aggregate_stats(data)

    def aggregate_stats(self, data):
        return {k: v for k, v in data.items()}

# Simulate 100 rounds of requests (issued sequentially, not concurrently)
handler = RequestHandler()
profiler = cProfile.Profile()
profiler.enable()

for _ in range(100):
    handler.handle_request('/api/users', {})
    handler.handle_request('/api/orders', {})

profiler.disable()

s = io.StringIO()
ps = pstats.Stats(profiler, stream=s).sort_stats('cumtime')
ps.print_stats(15)
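print("=== Web request profile ===")
print(s.getvalue())
# What to look for: how much time do get_users and get_orders each take?
# Which serialization or calculation method is the hot spot?

# Case 2: optimizing a data processing pipeline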
import cProfile
import pstats
import io

class DataPipeline:
    """Data processing pipeline: compare two implementations"""

    def __init__(self, data_size=10000):
        self.data = list(range(data_size))

    def pipeline_v1(self):
        """Baseline version: mostly explicit loops"""
        result = self.filter_data(self.data)
        result = self.transform_data(result)
        result = self.aggregate_data(result)
        return result

    def pipeline_v2(self):
        """Alternative version: built-ins and comprehensions; let the measurements decide whether it is faster"""
        result = self.filter_data_fast(self.data)
        result = self.transform_data_fast(result)
        result = self.aggregate_data_fast(result)
        return result

    def filter_data(self, data):
        return [x for x in data if x % 2 == 0]

    def filter_data_fast(self, data):
        return list(filter(lambda x: x % 2 == 0, data))

    def transform_data(self, data):
        result = []
        for x in data:
            result.append(x * 2 + 1)
        return result

    def transform_data_fast(self, data):
        return [x * 2 + 1 for x in data]

    def aggregate_data(self, data):
        total = 0
        for x in data:
            total += x
        return total / len(data) if data else 0

    def aggregate_data_fast(self, data):
        return sum(data) / len(data) if data else 0

pipeline = DataPipeline(20000)

# Profile the baseline version
profiler = cProfile.Profile()
profiler.enable()
pipeline.pipeline_v1()
profiler.disable()

s = io.StringIO()
ps = pstats.Stats(profiler, stream=s).sort_stats('cumtime')
ps.print_stats(15)
print("=== Baseline version ===")
print(s.getvalue())

# Profile the alternative version
profiler2 = cProfile.Profile()
profiler2.enable()
pipeline.pipeline_v2()
profiler2.disable()

s2 = io.StringIO()
ps2 = pstats.Stats(profiler2, stream=s2).sort_stats('cumtime')
ps2.print_stats(15)
print("\n=== Alternative version ===")
print(s2.getvalue())
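
When comparing the two versions, keep in mind that a swap to a built-in is not automatically a win: filter() with a lambda still makes one Python-level call per element, and the profiler records each of them. The sketch below counts those calls for a comprehension and for filter(); the function names are hypothetical and the row lookup assumes Python 3.9+ for get_stats_profile().

```python
import cProfile
import pstats

data = list(range(20_000))

def evens_comprehension(values):
    return [x for x in values if x % 2 == 0]

def evens_filter(values):
    return list(filter(lambda x: x % 2 == 0, values))

def lambda_calls(func):
    """Number of lambda calls cProfile records while running func(data)."""
    profiler = cProfile.Profile()
    profiler.runcall(func, data)
    rows = pstats.Stats(profiler).get_stats_profile().func_profiles
    row = rows.get("<lambda>")
    return int(row.ncalls) if row else 0

for func in (evens_comprehension, evens_filter):
    print(f"{func.__name__:<20} lambda calls recorded: {lambda_calls(func)}")
```

Because the profiler adds a cost to every recorded call, the filter() version looks worse under cProfile than it is in normal execution, so the final comparison between implementations is better made with a benchmark.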

# Case 3: analyzing the database N+1 query problem (simulated)
import cProfile
import pstats
import io

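class DatabaseSimulator:
    """Simulated database operations"""

    def __init__(self):
        self.cache = {}

    def slow_query(self, table, record_id):
        """Simulate an unindexed full-table scan for one record"""
        result = [i for i in range(10000) if i == record_id]
        return result

    def slow_query_many(self, table, record_ids):
        """Simulate one IN-style query: a single scan that serves every id"""
        wanted = set(record_ids)
        return {i: [i] for i in range(10000) if i in wanted}

    def n_plus_one_query(self, count=50):
        """N+1 pattern: 1 main query plus follow-up queries for every record"""
        # 1 main query fetches all main records
        main_records = [{'id': i} for i in range(count)]

        # Follow-up queries: here 10 per record, each one a separate scan
        for record in main_records:
            record['sub_data'] = [
                self.slow_query('sub_table', record['id'])
                for _ in range(10)
            ]
        return main_records

    def batched_query(self, count=50):
        """Optimized version: one batch query instead of per-record queries"""
        main_records = [{'id': i} for i in range(count)]
        # A single batch query loads the related data for every record
        batch_data = self.slow_query_many(
            'sub_table', [record['id'] for record in main_records]
        )
        for record in main_records:
            record['sub_data'] = batch_data[record['id']]
        return main_records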

db = DatabaseSimulator()

# Profile the N+1 version
profiler = cProfile.Profile()
profiler.enable()
db.n_plus_one_query(30)
profiler.disable()

s = io.StringIO()
ps = pstats.Stats(profiler, stream=s).sort_stats('cumtime')
ps.print_stats(15)
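print("=== N+1 query problem profile ===")
print(s.getvalue())
# What to look for: how many times was slow_query called?
# Optimization idea: load all related records in one batch with an IN query,
# or use lazy loading plus a cache to cut down repeated queries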
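
The question "how many times was the query function called?" can be answered directly from the profile. The sketch below compares a per-record loader with a batched one on a stand-in data layer; FakeDatabase and the loader functions are hypothetical, and the row lookup assumes Python 3.9+ for get_stats_profile().

```python
import cProfile
import pstats

class FakeDatabase:
    """Hypothetical stand-in for a database; each method call is one 'query'."""

    def __init__(self):
        self.rows = {i: f"detail-{i}" for i in range(1000)}

    def fetch_one(self, record_id):
        return self.rows[record_id]

    def fetch_many(self, record_ids):
        return {i: self.rows[i] for i in record_ids}

def load_n_plus_one(db, ids):
    return {i: db.fetch_one(i) for i in ids}   # one query per record

def load_batched(db, ids):
    return db.fetch_many(ids)                  # one query for all records

def query_counts(loader):
    """Profile loader over 30 ids and return the ncalls of each query method."""
    profiler = cProfile.Profile()
    profiler.runcall(loader, FakeDatabase(), range(30))
    rows = pstats.Stats(profiler).get_stats_profile().func_profiles
    return {name: rows[name].ncalls
            for name in ("fetch_one", "fetch_many") if name in rows}

print(query_counts(load_n_plus_one))   # {'fetch_one': '30'}
print(query_counts(load_batched))      # {'fetch_many': '1'}
```

In a real ORM the same pattern shows up as a driver or cursor method with an ncalls value proportional to the number of parent records.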