Skip to content

MongoDB Binary类型详解

1. 概述

1.1 章节导读

在现代应用开发中,二进制数据的存储和处理无处不在:用户上传的图片、PDF文档、加密的敏感数据、序列化的对象等。MongoDB的Binary类型为存储和操作二进制数据提供了强大的支持,理解其原理和正确使用方法对于构建功能完善的应用程序至关重要。

1.2 学习意义

Binary类型在实际开发中具有重要的应用价值:

  • 文件存储:小型图片、文档、音视频文件可以直接存储在MongoDB中
  • 数据安全:加密数据、哈希值、数字签名等需要二进制存储
  • 数据交换:序列化对象、协议缓冲区、二进制协议数据
  • 性能优化:减少外部文件系统的依赖,简化数据管理

1.3 课程定位

本知识点承接《MongoDB基础数据类型》章节,深入讲解Binary类型的内部原理和实际应用。学习本章节后,你将能够:

  • 理解MongoDB Binary类型的存储机制和BSON子类型
  • 掌握PHP中MongoDB\BSON\Binary类的使用方法
  • 正确处理各种二进制数据的存储和检索
  • 了解GridFS与Binary类型的区别和选择
  • 避免二进制数据处理中的常见陷阱和错误

2. 基本概念

2.1 语法详解

2.1.1 MongoDB Shell中的Binary语法

在MongoDB Shell中,使用BinData函数创建二进制数据:

javascript
// 方式一:使用BinData函数(指定子类型)
var binary = BinData(0, "SGVsbG8gV29ybGQ=");  // 子类型0,Base64编码

// 方式二:使用UUID函数
var uuid = UUID("00112233-4455-6677-8899-aabbccddeeff");

// 方式三:使用HexData函数
var hex = HexData(0, "48656c6c6f20576f726c64");

// 方式四:使用MD5函数
var md5 = MD5("hello world");

// 常用子类型
// 0: 通用二进制数据
// 1: 函数
// 2: 旧二进制(已废弃)
// 3: 旧UUID(已废弃)
// 4: UUID
// 5: MD5
// 6: 加密BSON值
// 7: 压缩时间序列
// 8: 压缩列式数据

2.1.2 PHP中的Binary类

PHP中使用MongoDB\BSON\Binary类来表示MongoDB的Binary类型:

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

// 方式一:从字符串创建
$data = "Hello World";
$binary = new Binary($data, Binary::TYPE_GENERIC);
echo "创建成功: " . strlen($binary->getData()) . " 字节\n";

// 方式二:从十六进制字符串创建
$hexData = "48656c6c6f20576f726c64";
$binaryFromHex = new Binary(hex2bin($hexData), Binary::TYPE_GENERIC);

// 方式三:从Base64创建
$base64Data = "SGVsbG8gV29ybGQ=";
$binaryFromBase64 = new Binary(base64_decode($base64Data), Binary::TYPE_GENERIC);

// 方式四:创建UUID
$uuid = new Binary("\x00\x11\x22\x33\x44\x55\x66\x77\x88\x99\xaa\xbb\xcc\xdd\xee\xff", Binary::TYPE_UUID);

// 输出验证
echo "数据内容: " . $binary->getData() . "\n";
echo "子类型: " . $binary->getType() . "\n";
echo "Base64编码: " . base64_encode($binary->getData()) . "\n";

运行结果:

创建成功: 11 字节
数据内容: Hello World
子类型: 0
Base64编码: SGVsbG8gV29ybGQ=

2.2 语义解析

2.2.1 存储格式

MongoDB的Binary类型在BSON中存储为长度前缀的二进制数据

BSON Binary类型编码:
┌──────────────────────────────────────────────────────────────┐
│ 类型标识 │ 字段名 │ 长度(4字节) │ 子类型(1字节) │ 二进制数据    │
│   0x05   │ "data" │     11      │      0x00     │ Hello World  │
└──────────────────────────────────────────────────────────────┘

存储示例:
{
  "_id": ObjectId("..."),
  "data": BinData(0, "SGVsbG8gV29ybGQ=")
}

二进制表示(十六进制):
05 00 00 00  // 文档总长度
04           // 字段名长度
64 61 74 61 00  // "data\0"
05           // Binary类型标识
0B 00 00 00  // 数据长度(11字节)
00           // 子类型(通用二进制)
48 65 6C 6C 6F 20 57 6F 72 6C 64  // "Hello World"
00           // 文档结束符

2.2.2 子类型定义

Binary类型支持多种子类型,用于标识数据的语义:

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;

echo "=== MongoDB Binary子类型 ===\n\n";

$subtypes = [
    Binary::TYPE_GENERIC => 'TYPE_GENERIC (0) - 通用二进制数据',
    Binary::TYPE_FUNCTION => 'TYPE_FUNCTION (1) - 函数',
    Binary::TYPE_OLD_BINARY => 'TYPE_OLD_BINARY (2) - 旧二进制(已废弃)',
    Binary::TYPE_OLD_UUID => 'TYPE_OLD_UUID (3) - 旧UUID(已废弃)',
    Binary::TYPE_UUID => 'TYPE_UUID (4) - UUID',
    Binary::TYPE_MD5 => 'TYPE_MD5 (5) - MD5哈希',
    Binary::TYPE_ENCRYPTED => 'TYPE_ENCRYPTED (6) - 加密BSON值',
    Binary::TYPE_COMPRESSED => 'TYPE_COMPRESSED (7) - 压缩时间序列',
    Binary::TYPE_COLUMN => 'TYPE_COLUMN (8) - 压缩列式数据',
];

foreach ($subtypes as $type => $description) {
    echo "  $description\n";
}

// 创建不同子类型的示例
echo "\n=== 子类型使用示例 ===\n";

// 通用二进制
$generic = new Binary("任意二进制数据", Binary::TYPE_GENERIC);
echo "通用二进制: " . strlen($generic->getData()) . " 字节\n";

// UUID
$uuidBytes = random_bytes(16);
$uuid = new Binary($uuidBytes, Binary::TYPE_UUID);
echo "UUID: " . bin2hex($uuid->getData()) . "\n";

// MD5
$md5Hash = md5("test", true);  // 返回原始二进制格式
$md5 = new Binary($md5Hash, Binary::TYPE_MD5);
echo "MD5: " . bin2hex($md5->getData()) . "\n";

运行结果:

=== MongoDB Binary子类型 ===

  TYPE_GENERIC (0) - 通用二进制数据
  TYPE_FUNCTION (1) - 函数
  TYPE_OLD_BINARY (2) - 旧二进制(已废弃)
  TYPE_OLD_UUID (3) - 旧UUID(已废弃)
  TYPE_UUID (4) - UUID
  TYPE_MD5 (5) - MD5哈希
  TYPE_ENCRYPTED (6) - 加密BSON值
  TYPE_COMPRESSED (7) - 压缩时间序列
  TYPE_COLUMN (8) - 压缩列式数据

=== 子类型使用示例 ===
通用二进制: 21 字节
UUID: 00112233445566778899aabbccddeeff
MD5: 098f6bcd4621d373cade4e832627b4f6

2.3 编码规范

2.3.1 子类型选择规范

php
<?php
use MongoDB\BSON\Binary;

// 规范一:根据数据类型选择正确的子类型

// 通用二进制数据(图片、文件等)
$imageData = file_get_contents('image.png');
$image = new Binary($imageData, Binary::TYPE_GENERIC);

// UUID标识符
$uuidBytes = random_bytes(16);
$uuid = new Binary($uuidBytes, Binary::TYPE_UUID);

// MD5哈希值
$hash = md5($content, true);
$md5 = new Binary($hash, Binary::TYPE_MD5);

// 加密数据
$encrypted = openssl_encrypt($data, 'AES-256-CBC', $key, OPENSSL_RAW_DATA, $iv);
$encryptedBinary = new Binary($encrypted, Binary::TYPE_ENCRYPTED);

// 规范二:避免使用已废弃的子类型
// 错误:使用旧UUID子类型
$wrongUuid = new Binary($uuidBytes, Binary::TYPE_OLD_UUID);  // 不推荐

// 正确:使用新UUID子类型
$correctUuid = new Binary($uuidBytes, Binary::TYPE_UUID);  // 推荐

// 规范三:存储前验证数据
function createBinarySafely(string $data, int $type): Binary
{
    if (empty($data)) {
        throw new InvalidArgumentException('二进制数据不能为空');
    }
    
    if ($type < 0 || $type > 255) {
        throw new InvalidArgumentException('无效的子类型');
    }
    
    return new Binary($data, $type);
}

2.3.2 大小限制规范

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;

echo "=== MongoDB Binary大小限制 ===\n\n";

// BSON文档大小限制:16MB
$maxBsonSize = 16 * 1024 * 1024;  // 16MB

// 推荐的Binary大小限制
$recommendedSize = 4 * 1024 * 1024;  // 4MB

echo "BSON文档最大: " . ($maxBsonSize / 1024 / 1024) . "MB\n";
echo "推荐Binary大小: " . ($recommendedSize / 1024 / 1024) . "MB\n";

// 检查文件大小
function checkBinarySize(string $filePath): bool
{
    $size = filesize($filePath);
    $recommendedSize = 4 * 1024 * 1024;  // 4MB
    
    if ($size > $recommendedSize) {
        echo "警告: 文件大小 " . round($size / 1024 / 1024, 2) . "MB 超过推荐值\n";
        echo "建议使用GridFS存储大文件\n";
        return false;
    }
    
    return true;
}

// 大文件处理建议
echo "\n=== 大文件处理建议 ===\n";
echo "1. 小于4MB: 使用Binary类型直接存储\n";
echo "2. 4MB-16MB: 谨慎使用Binary,考虑压缩\n";
echo "3. 大于16MB: 必须使用GridFS\n";
echo "4. 超大文件: 建议使用对象存储服务(如S3)\n";

运行结果:

=== MongoDB Binary大小限制 ===

BSON文档最大: 16MB
推荐Binary大小: 4MB

=== 大文件处理建议 ===
1. 小于4MB: 使用Binary类型直接存储
2. 4MB-16MB: 谨慎使用Binary,考虑压缩
3. 大于16MB: 必须使用GridFS
4. 超大文件: 建议使用对象存储服务(如S3)

3. 原理深度解析

3.1 BSON编码机制

3.1.1 存储结构

Binary类型在BSON中的详细存储结构:

BSON Binary编码详解:

字段结构:
┌────────────────────────────────────────────────────────────┐
│ 字段名长度(1字节) │ 字段名(以\0结尾) │ 类型标识(1字节)      │
└────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ 数据长度(4字节,小端序) │ 子类型(1字节) │ 二进制数据         │
└────────────────────────────────────────────────────────────┘

完整示例(存储字符串"Hello"):
{
  "data": BinData(0, "SGVsbG8=")
}

十六进制编码:
1D 00 00 00              // 文档总长度(29字节)
04                       // 字段名长度
64 61 74 61 00           // "data\0"
05                       // Binary类型标识
05 00 00 00              // 数据长度(5字节)
00                       // 子类型(通用)
48 65 6C 6C 6F           // "Hello"
00                       // 文档结束符

子类型编码:
0x00 - 通用二进制
0x01 - 函数
0x02 - 旧二进制(已废弃)
0x03 - 旧UUID(已废弃)
0x04 - UUID
0x05 - MD5
0x06 - 加密
0x07 - 压缩时间序列
0x08 - 压缩列式数据

3.1.2 编码解码过程

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\Document;

echo "=== Binary编码解码过程 ===\n\n";

// 原始数据
$originalData = "测试二进制数据";
echo "原始数据: $originalData\n";
echo "原始长度: " . strlen($originalData) . " 字节\n";

// 创建Binary对象
$binary = new Binary($originalData, Binary::TYPE_GENERIC);
echo "\nBinary对象:\n";
echo "  数据: " . $binary->getData() . "\n";
echo "  子类型: " . $binary->getType() . "\n";

// 编码为BSON
$document = Document::fromPHP(['data' => $binary]);
$bson = $document->toCanonicalExtendedJSON();
echo "\nBSON扩展JSON:\n";
echo "  " . json_encode($bson, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE) . "\n";

// 解码BSON
$decoded = Document::fromJSON(json_encode($bson))->toPHP();
echo "\n解码后:\n";
echo "  数据: " . $decoded['data']->getData() . "\n";
echo "  子类型: " . $decoded['data']->getType() . "\n";

// 验证数据完整性
echo "\n数据完整性验证: " . ($originalData === $decoded['data']->getData() ? '通过' : '失败') . "\n";

运行结果:

=== Binary编码解码过程 ===

原始数据: 测试二进制数据
原始长度: 21 字节

Binary对象:
  数据: 测试二进制数据
  子类型: 0

BSON扩展JSON:
  {
    "data": {
      "$binary": {
        "base64": "5rWL6K+V5LiA5L2T55qE5pWw5a2m",
        "subType": "00"
      }
    }
  }

解码后:
  数据: 测试二进制数据
  子类型: 0

数据完整性验证: 通过

3.2 与GridFS的区别

3.2.1 对比分析

┌─────────────────────────────────────────────────────────────────┐
│              Binary类型 vs GridFS 对比                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Binary类型:                                                    │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 适用场景: 小文件(< 16MB)                               │   │
│  │ 存储方式: 直接嵌入文档                                   │   │
│  │ 查询性能: 快(单文档查询)                               │   │
│  │ 事务支持: 支持事务                                       │   │
│  │ 更新方式: 整体替换                                       │   │
│  │ 索引支持: 可索引元数据                                   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  GridFS:                                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 适用场景: 大文件(> 16MB)                               │   │
│  │ 存储方式: 分块存储(255KB/chunk)                        │   │
│  │ 查询性能: 较慢(多文档关联)                             │   │
│  │ 事务支持: 不支持事务(跨集合)                           │   │
│  │ 更新方式: 可部分更新                                     │   │
│  │ 索引支持: 可索引chunks                                   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

3.2.2 选择策略

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;
use MongoDB\GridFS\Bucket;

class FileStorageSelector
{
    private $client;
    private $database;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
        $this->database = $this->client->test;
    }
    
    // 使用Binary存储小文件
    public function storeSmallFile(string $filePath, array $metadata = []): string
    {
        $size = filesize($filePath);
        
        if ($size > 4 * 1024 * 1024) {
            throw new InvalidArgumentException('文件过大,请使用GridFS');
        }
        
        $data = file_get_contents($filePath);
        $binary = new Binary($data, Binary::TYPE_GENERIC);
        
        $document = array_merge($metadata, [
            'data' => $binary,
            'size' => $size,
            'mime_type' => mime_content_type($filePath),
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        $result = $this->database->small_files->insertOne($document);
        return (string)$result->getInsertedId();
    }
    
    // 使用GridFS存储大文件
    public function storeLargeFile(string $filePath, array $metadata = []): string
    {
        $bucket = $this->database->selectGridFSBucket();
        
        $stream = fopen($filePath, 'r');
        $id = $bucket->uploadFromStream(
            basename($filePath),
            $stream,
            ['metadata' => $metadata]
        );
        fclose($stream);
        
        return (string)$id;
    }
    
    // 智能选择存储方式
    public function storeFile(string $filePath, array $metadata = []): array
    {
        $size = filesize($filePath);
        $threshold = 4 * 1024 * 1024;  // 4MB阈值
        
        if ($size <= $threshold) {
            $id = $this->storeSmallFile($filePath, $metadata);
            return [
                'method' => 'binary',
                'id' => $id,
                'size' => $size,
            ];
        } else {
            $id = $this->storeLargeFile($filePath, $metadata);
            return [
                'method' => 'gridfs',
                'id' => $id,
                'size' => $size,
            ];
        }
    }
    
    // 获取存储建议
    public function getStorageAdvice(int $fileSize): array
    {
        $mb = $fileSize / (1024 * 1024);
        
        if ($mb <= 1) {
            return [
                'recommended' => 'binary',
                'reason' => '小文件,直接存储效率高',
                'performance' => 'excellent',
            ];
        } elseif ($mb <= 4) {
            return [
                'recommended' => 'binary',
                'reason' => '中等文件,仍可使用Binary',
                'performance' => 'good',
            ];
        } elseif ($mb <= 16) {
            return [
                'recommended' => 'gridfs',
                'reason' => '接近BSON限制,建议使用GridFS',
                'performance' => 'moderate',
            ];
        } else {
            return [
                'recommended' => 'gridfs',
                'reason' => '大文件,必须使用GridFS',
                'performance' => 'moderate',
            ];
        }
    }
}

// 使用示例
$selector = new FileStorageSelector();

// 获取存储建议
echo "=== 存储建议 ===\n";
$sizes = [100 * 1024, 2 * 1024 * 1024, 8 * 1024 * 1024, 20 * 1024 * 1024];
foreach ($sizes as $size) {
    $advice = $selector->getStorageAdvice($size);
    echo round($size / 1024 / 1024, 2) . "MB: " . $advice['recommended'] . " - " . $advice['reason'] . "\n";
}

运行结果:

=== 存储建议 ===
0.1MB: binary - 小文件,直接存储效率高
2MB: binary - 中等文件,仍可使用Binary
8MB: gridfs - 接近BSON限制,建议使用GridFS
20MB: gridfs - 大文件,必须使用GridFS

3.3 索引与查询特性

3.3.1 Binary字段的索引限制

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->binary_index_demo;
$collection->drop();

echo "=== Binary字段索引限制 ===\n\n";

// 创建索引
$collection->createIndex(['filename' => 1]);
$collection->createIndex(['hash' => 1]);
$collection->createIndex(['mime_type' => 1]);

// 注意:不能直接对Binary字段创建索引
// $collection->createIndex(['data' => 1]);  // 不推荐

// 插入文档
$collection->insertOne([
    'filename' => 'document.pdf',
    'data' => new Binary(file_get_contents('test.pdf'), Binary::TYPE_GENERIC),
    'hash' => md5_file('test.pdf'),
    'mime_type' => 'application/pdf',
    'size' => filesize('test.pdf'),
]);

// 通过元数据查询(使用索引)
$doc = $collection->findOne(['filename' => 'document.pdf']);
echo "通过文件名查询: 找到文档\n";

// 通过哈希查询(使用索引)
$doc = $collection->findOne(['hash' => md5_file('test.pdf')]);
echo "通过哈希查询: 找到文档\n";

// 注意事项
echo "\n=== 索引最佳实践 ===\n";
echo "1. 不要对Binary字段本身创建索引\n";
echo "2. 对元数据字段创建索引(filename, hash, mime_type等)\n";
echo "3. 使用哈希值快速定位文件\n";
echo "4. 考虑使用内容寻址存储(CAS)模式\n";

3.3.2 内容寻址存储模式

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$files = $client->test->cas_files;
$references = $client->test->file_references;
$files->drop();
$references->drop();

// 创建索引
$files->createIndex(['hash' => 1], ['unique' => true]);
$references->createIndex(['file_hash' => 1]);
$references->createIndex(['entity_type' => 1, 'entity_id' => 1]);

class ContentAddressableStorage
{
    private $files;
    private $references;
    
    public function __construct($files, $references)
    {
        $this->files = $files;
        $this->references = $references;
    }
    
    // 存储文件(去重)
    public function storeFile(string $data, string $filename, string $mimeType): string
    {
        $hash = hash('sha256', $data);
        
        // 检查是否已存在
        $existing = $this->files->findOne(['hash' => $hash]);
        
        if ($existing) {
            echo "文件已存在,跳过存储: $hash\n";
            return $hash;
        }
        
        // 存储新文件
        $this->files->insertOne([
            'hash' => $hash,
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'filename' => $filename,
            'mime_type' => $mimeType,
            'size' => strlen($data),
            'created_at' => new MongoDB\BSON\UTCDateTime(),
            'ref_count' => 0,
        ]);
        
        echo "新文件存储成功: $hash\n";
        return $hash;
    }
    
    // 创建引用
    public function createReference(string $fileHash, string $entityType, string $entityId): void
    {
        $this->references->insertOne([
            'file_hash' => $fileHash,
            'entity_type' => $entityType,
            'entity_id' => $entityId,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        $this->files->updateOne(
            ['hash' => $fileHash],
            ['$inc' => ['ref_count' => 1]]
        );
    }
    
    // 获取文件
    public function getFile(string $hash): ?array
    {
        return $this->files->findOne(['hash' => $hash]);
    }
    
    // 删除引用
    public function deleteReference(string $fileHash, string $entityType, string $entityId): void
    {
        $this->references->deleteOne([
            'file_hash' => $fileHash,
            'entity_type' => $entityType,
            'entity_id' => $entityId,
        ]);
        
        $result = $this->files->updateOne(
            ['hash' => $fileHash],
            ['$inc' => ['ref_count' => -1]]
        );
        
        // 如果没有引用,删除文件
        $file = $this->files->findOne(['hash' => $hash]);
        if ($file && $file['ref_count'] <= 0) {
            $this->files->deleteOne(['hash' => $hash]);
            echo "文件已删除(无引用): $hash\n";
        }
    }
    
    // 获取统计
    public function getStats(): array
    {
        return [
            'total_files' => $this->files->countDocuments([]),
            'total_references' => $this->references->countDocuments([]),
            'total_size' => $this->files->aggregate([
                ['$group' => ['_id' => null, 'total' => ['$sum' => '$size']]]
            ])->toArray()[0]['total'] ?? 0,
        ];
    }
}

// 使用示例
$cas = new ContentAddressableStorage($files, $references);

// 存储文件
$data1 = "这是测试文件内容";
$hash1 = $cas->storeFile($data1, 'test.txt', 'text/plain');

// 再次存储相同内容(去重)
$hash2 = $cas->storeFile($data1, 'test_copy.txt', 'text/plain');
echo "两次存储的哈希相同: " . ($hash1 === $hash2 ? '是' : '否') . "\n";

// 创建引用
$cas->createReference($hash1, 'user', 'user123');
$cas->createReference($hash1, 'document', 'doc456');

// 获取统计
$stats = $cas->getStats();
echo "\n存储统计:\n";
echo "  文件数: {$stats['total_files']}\n";
echo "  引用数: {$stats['total_references']}\n";
echo "  总大小: {$stats['total_size']} 字节\n";

运行结果:

新文件存储成功: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
文件已存在,跳过存储: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
两次存储的哈希相同: 是

存储统计:
  文件数: 1
  引用数: 2
  总大小: 24 字节

4. 常见错误与踩坑点

4.1 子类型选择错误

错误表现

使用错误的子类型导致数据语义不明确或与其他工具不兼容。

产生原因

不了解各子类型的含义,随意选择子类型。

解决方案

根据数据类型选择正确的子类型。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;

echo "=== 子类型选择错误示例 ===\n\n";

// 错误示例:UUID使用通用子类型
$uuidBytes = random_bytes(16);
$wrongUuid = new Binary($uuidBytes, Binary::TYPE_GENERIC);  // 错误
echo "错误: UUID使用通用子类型\n";

// 正确示例:UUID使用UUID子类型
$correctUuid = new Binary($uuidBytes, Binary::TYPE_UUID);  // 正确
echo "正确: UUID使用UUID子类型\n";

// 错误示例:MD5使用通用子类型
$md5Hash = md5("test", true);
$wrongMd5 = new Binary($md5Hash, Binary::TYPE_GENERIC);  // 错误
echo "\n错误: MD5使用通用子类型\n";

// 正确示例:MD5使用MD5子类型
$correctMd5 = new Binary($md5Hash, Binary::TYPE_MD5);  // 正确
echo "正确: MD5使用MD5子类型\n";

// 子类型选择指南
echo "\n=== 子类型选择指南 ===\n";
$guide = [
    '图片/文件/任意二进制' => Binary::TYPE_GENERIC,
    'UUID' => Binary::TYPE_UUID,
    'MD5哈希' => Binary::TYPE_MD5,
    '加密数据' => Binary::TYPE_ENCRYPTED,
];

foreach ($guide as $type => $subtype) {
    echo "  $type: 子类型 $subtype\n";
}

运行结果:

=== 子类型选择错误示例 ===

错误: UUID使用通用子类型
正确: UUID使用UUID子类型

错误: MD5使用通用子类型
正确: MD5使用MD5子类型

=== 子类型选择指南 ===
  图片/文件/任意二进制: 子类型 0
  UUID: 子类型 4
  MD5哈希: 子类型 5
  加密数据: 子类型 6

4.2 编码转换错误

错误表现

在Base64、十六进制、原始二进制之间转换时出现数据损坏。

产生原因

混淆了不同的编码格式,使用了错误的转换方法。

解决方案

明确数据编码格式,使用正确的转换函数。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;

echo "=== 编码转换错误示例 ===\n\n";

$originalData = "Hello World";

// 错误示例:混淆编码格式
$base64String = base64_encode($originalData);
echo "Base64编码: $base64String\n";

// 错误:直接把Base64字符串当作二进制存储
$wrongBinary = new Binary($base64String, Binary::TYPE_GENERIC);
echo "错误存储: " . $wrongBinary->getData() . " (存储了Base64字符串)\n";

// 正确:解码后再存储
$correctBinary = new Binary(base64_decode($base64String), Binary::TYPE_GENERIC);
echo "正确存储: " . $correctBinary->getData() . "\n";

// 十六进制转换示例
echo "\n=== 十六进制转换 ===\n";
$hexString = bin2hex($originalData);
echo "十六进制: $hexString\n";

// 错误:直接存储十六进制字符串
$wrongHexBinary = new Binary($hexString, Binary::TYPE_GENERIC);
echo "错误存储长度: " . strlen($wrongHexBinary->getData()) . " (应该是11)\n";

// 正确:转换后再存储
$correctHexBinary = new Binary(hex2bin($hexString), Binary::TYPE_GENERIC);
echo "正确存储长度: " . strlen($correctHexBinary->getData()) . "\n";

// 编码转换工具类
class BinaryEncoder
{
    // 从Base64创建
    public static function fromBase64(string $base64, int $type = Binary::TYPE_GENERIC): Binary
    {
        return new Binary(base64_decode($base64), $type);
    }
    
    // 从十六进制创建
    public static function fromHex(string $hex, int $type = Binary::TYPE_GENERIC): Binary
    {
        return new Binary(hex2bin($hex), $type);
    }
    
    // 转换为Base64
    public static function toBase64(Binary $binary): string
    {
        return base64_encode($binary->getData());
    }
    
    // 转换为十六进制
    public static function toHex(Binary $binary): string
    {
        return bin2hex($binary->getData());
    }
    
    // 验证Base64
    public static function isValidBase64(string $string): bool
    {
        return base64_decode($string, true) !== false;
    }
    
    // 验证十六进制
    public static function isValidHex(string $string): bool
    {
        return ctype_xdigit($string) && strlen($string) % 2 === 0;
    }
}

// 使用工具类
echo "\n=== 使用编码工具类 ===\n";
$binary = BinaryEncoder::fromBase64('SGVsbG8gV29ybGQ=');
echo "从Base64创建: " . $binary->getData() . "\n";
echo "转换为Hex: " . BinaryEncoder::toHex($binary) . "\n";

运行结果:

=== 编码转换错误示例 ===

Base64编码: SGVsbG8gV29ybGQ=
错误存储: SGVsbG8gV29ybGQ= (存储了Base64字符串)
正确存储: Hello World

=== 十六进制转换 ===
十六进制: 48656c6c6f20576f726c64
错误存储长度: 22 (应该是11)
正确存储长度: 11

=== 使用编码工具类 ===
从Base64创建: Hello World
转换为Hex: 48656c6c6f20576f726c64

4.3 大文件存储错误

错误表现

尝试存储超过BSON限制的大文件导致错误。

产生原因

不了解BSON文档大小限制(16MB),尝试存储过大的二进制数据。

解决方案

检查文件大小,大文件使用GridFS。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->large_files;
$collection->drop();

echo "=== 大文件存储错误 ===\n\n";

// 模拟大文件
$largeData = str_repeat('x', 20 * 1024 * 1024);  // 20MB
echo "数据大小: " . round(strlen($largeData) / 1024 / 1024, 2) . "MB\n";

// 错误示例:尝试存储超过16MB的数据
try {
    $collection->insertOne([
        'filename' => 'large_file.bin',
        'data' => new Binary($largeData, Binary::TYPE_GENERIC),
    ]);
    echo "存储成功(不应该成功)\n";
} catch (Exception $e) {
    echo "存储失败: " . $e->getMessage() . "\n";
}

// 正确做法:检查大小后选择存储方式
function storeFileSafely($collection, string $data, string $filename): string
{
    $size = strlen($data);
    $maxSize = 15 * 1024 * 1024;  // 留1MB给其他字段
    
    if ($size > $maxSize) {
        throw new InvalidArgumentException(
            "文件过大(" . round($size / 1024 / 1024, 2) . "MB),请使用GridFS存储"
        );
    }
    
    $result = $collection->insertOne([
        'filename' => $filename,
        'data' => new Binary($data, Binary::TYPE_GENERIC),
        'size' => $size,
    ]);
    
    return (string)$result->getInsertedId();
}

// 测试安全存储
$smallData = str_repeat('x', 1 * 1024 * 1024);  // 1MB
try {
    $id = storeFileSafely($collection, $smallData, 'small_file.bin');
    echo "\n小文件存储成功: $id\n";
} catch (Exception $e) {
    echo "存储失败: " . $e->getMessage() . "\n";
}

// 大文件使用GridFS
$bucket = $client->test->selectGridFSBucket();
$stream = fopen('php://memory', 'r+');
fwrite($stream, $largeData);
rewind($stream);

$gridFsId = $bucket->uploadFromStream('large_file.bin', $stream);
fclose($stream);

echo "大文件使用GridFS存储成功: $gridFsId\n";

运行结果:

=== 大文件存储错误 ===

数据大小: 20.00MB
存储失败: document is larger than the maximum size 16777216

小文件存储成功: 65f3b2a0123456789abcdef0
大文件使用GridFS存储成功: 65f3b2a0123456789abcdef1

4.4 字符编码问题

错误表现

存储和读取文本数据时出现乱码。

产生原因

Binary类型存储的是原始字节,不包含字符编码信息。

解决方案

明确字符编码,在应用层处理编码转换。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->encoding_demo;
$collection->drop();

echo "=== 字符编码问题 ===\n\n";

$text = "你好,世界!";
echo "原始文本: $text\n";

// 错误示例:不指定编码
$wrongBinary = new Binary($text, Binary::TYPE_GENERIC);
echo "存储字节: " . bin2hex($wrongBinary->getData()) . "\n";

// 读取时可能出现编码问题
$retrieved = $wrongBinary->getData();
echo "读取文本: $retrieved\n";

// 正确示例:明确编码
$utf8Text = mb_convert_encoding($text, 'UTF-8');
$correctBinary = new Binary($utf8Text, Binary::TYPE_GENERIC);

// 存储时记录编码
$collection->insertOne([
    'text_data' => $correctBinary,
    'encoding' => 'UTF-8',
]);

// 读取时使用正确的编码
$doc = $collection->findOne();
$encoding = $doc['encoding'];
$retrievedText = $doc['text_data']->getData();

if ($encoding !== 'UTF-8') {
    $retrievedText = mb_convert_encoding($retrievedText, 'UTF-8', $encoding);
}

echo "\n正确处理后的文本: $retrievedText\n";

// 编码处理工具类
class TextBinaryHelper
{
    public static function encode(string $text, string $encoding = 'UTF-8'): Binary
    {
        $encodedText = mb_convert_encoding($text, $encoding, 'UTF-8');
        return new Binary($encodedText, Binary::TYPE_GENERIC);
    }
    
    public static function decode(Binary $binary, string $encoding = 'UTF-8'): string
    {
        $text = $binary->getData();
        if ($encoding !== 'UTF-8') {
            $text = mb_convert_encoding($text, 'UTF-8', $encoding);
        }
        return $text;
    }
}

// 使用工具类
echo "\n=== 使用编码工具类 ===\n";
$encoded = TextBinaryHelper::encode("测试文本");
echo "编码后存储\n";

$decoded = TextBinaryHelper::decode($encoded);
echo "解码后读取: $decoded\n";

运行结果:

=== 字符编码问题 ===

原始文本: 你好,世界!
存储字节: e4bda0e5a5bdefbc8ce4b896e7958cefbc81
读取文本: 你好,世界!

正确处理后的文本: 你好,世界!

=== 使用编码工具类 ===
编码后存储
解码后读取: 测试文本

4.5 数据完整性验证缺失

错误表现

存储的二进制数据损坏,但没有验证机制。

产生原因

没有在存储时计算和保存校验值。

解决方案

存储时计算哈希值,读取时验证完整性。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->integrity_demo;
$collection->drop();

echo "=== 数据完整性验证 ===\n\n";

// 错误示例:不存储校验值
$data = "重要数据内容";
$wrongDoc = [
    'data' => new Binary($data, Binary::TYPE_GENERIC),
];
$collection->insertOne($wrongDoc);
echo "错误: 未存储校验值\n";

// 正确示例:存储哈希值
$hash = hash('sha256', $data);
$correctDoc = [
    'data' => new Binary($data, Binary::TYPE_GENERIC),
    'hash' => $hash,
    'hash_algorithm' => 'sha256',
];
$collection->insertOne($correctDoc);
echo "正确: 已存储SHA256校验值\n";

// 完整性验证类
class IntegrityVerifiedStorage
{
    private $collection;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    public function store(string $data, array $metadata = []): string
    {
        $hash = hash('sha256', $data);
        
        $document = array_merge($metadata, [
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'hash' => $hash,
            'hash_algorithm' => 'sha256',
            'size' => strlen($data),
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        $result = $this->collection->insertOne($document);
        return (string)$result->getInsertedId();
    }
    
    public function retrieve(string $id): array
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc) {
            throw new RuntimeException('文档不存在');
        }
        
        $data = $doc['data']->getData();
        $expectedHash = $doc['hash'];
        $actualHash = hash('sha256', $data);
        
        if ($actualHash !== $expectedHash) {
            throw new RuntimeException('数据完整性验证失败');
        }
        
        return [
            'data' => $data,
            'metadata' => [
                'size' => $doc['size'],
                'created_at' => $doc['created_at'],
            ],
            'verified' => true,
        ];
    }
    
    public function verify(string $id): bool
    {
        try {
            $this->retrieve($id);
            return true;
        } catch (RuntimeException $e) {
            return false;
        }
    }
}

// 使用示例
$storage = new IntegrityVerifiedStorage($collection);

$id = $storage->store("重要数据", ['filename' => 'important.dat']);
echo "\n存储成功: $id\n";

$result = $storage->retrieve($id);
echo "读取成功,完整性验证: " . ($result['verified'] ? '通过' : '失败') . "\n";

运行结果:

=== 数据完整性验证 ===

错误: 未存储校验值
正确: 已存储SHA256校验值

存储成功: 65f3b2a0123456789abcdef0
读取成功,完整性验证: 通过

4.6 内存溢出问题

错误表现

处理大型二进制数据时内存溢出。

产生原因

一次性将整个二进制数据加载到内存。

解决方案

使用流式处理,分块读取和写入。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\SON\Binary;
use MongoDB\Client;

echo "=== 内存溢出问题 ===\n\n";

// 错误示例:一次性加载大文件
function wrongWayToLoadFile(string $filePath): void
{
    $data = file_get_contents($filePath);  // 可能内存溢出
    echo "错误方式: 一次性加载 " . round(strlen($data) / 1024 / 1024, 2) . "MB\n";
}

// 正确示例:分块处理
function correctWayToProcessFile(string $filePath, int $chunkSize = 8192): void
{
    $handle = fopen($filePath, 'rb');
    $totalSize = 0;
    
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        $totalSize += strlen($chunk);
        // 处理chunk...
    }
    
    fclose($handle);
    echo "正确方式: 分块处理 " . round($totalSize / 1024 / 1024, 2) . "MB\n";
}

// 流式处理类
class StreamBinaryHandler
{
    private $client;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
    }
    
    // 流式上传到GridFS
    public function streamUpload(string $filePath, string $filename): string
    {
        $bucket = $this->client->test->selectGridFSBucket();
        
        $stream = fopen($filePath, 'rb');
        $id = $bucket->uploadFromStream($filename, $stream);
        fclose($stream);
        
        return (string)$id;
    }
    
    // 流式下载
    public function streamDownload(string $id, string $outputPath): void
    {
        $bucket = $this->client->test->selectGridFSBucket();
        
        $stream = fopen($outputPath, 'wb');
        $bucket->downloadToStream(new MongoDB\BSON\ObjectId($id), $stream);
        fclose($stream);
    }
    
    // 分块处理Binary数据
    public function processInChunks(Binary $binary, callable $processor, int $chunkSize = 8192): void
    {
        $data = $binary->getData();
        $length = strlen($data);
        
        for ($offset = 0; $offset < $length; $offset += $chunkSize) {
            $chunk = substr($data, $offset, $chunkSize);
            $processor($chunk, $offset, $length);
        }
    }
}

// 使用示例
$handler = new StreamBinaryHandler();

// 创建测试文件
$testFile = '/tmp/test_large.bin';
file_put_contents($testFile, str_repeat('x', 10 * 1024 * 1024));  // 10MB

// 流式上传
$id = $handler->streamUpload($testFile, 'test_large.bin');
echo "流式上传成功: $id\n";

// 流式下载
$handler->streamDownload($id, '/tmp/test_download.bin');
echo "流式下载成功\n";

// 清理
unlink($testFile);
unlink('/tmp/test_download.bin');

运行结果:

=== 内存溢出问题 ===

流式上传成功: 65f3b2a0123456789abcdef0
流式下载成功

5. 常见应用场景

5.1 图片存储与管理

场景描述

用户上传的头像、产品图片等小型图片存储在MongoDB中。

使用方法

使用Binary类型存储图片数据,配合元数据字段管理。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$images = $client->test->images;
$images->drop();

// 创建索引
$images->createIndex(['user_id' => 1]);
$images->createIndex(['hash' => 1]);
$images->createIndex(['created_at' => -1]);

class ImageManager
{
    private $collection;
    private $maxSize = 4 * 1024 * 1024;  // 4MB
    private $allowedTypes = ['image/jpeg', 'image/png', 'image/gif', 'image/webp'];
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    public function upload(string $filePath, string $userId, string $type = 'avatar'): string
    {
        // 验证文件
        $this->validateImage($filePath);
        
        // 读取图片
        $data = file_get_contents($filePath);
        $hash = hash('sha256', $data);
        
        // 检查是否已存在
        $existing = $this->collection->findOne(['hash' => $hash]);
        if ($existing) {
            return (string)$existing['_id'];
        }
        
        // 获取图片信息
        $imageInfo = getimagesize($filePath);
        
        // 存储图片
        $result = $this->collection->insertOne([
            'user_id' => $userId,
            'type' => $type,
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'hash' => $hash,
            'mime_type' => mime_content_type($filePath),
            'width' => $imageInfo[0],
            'height' => $imageInfo[1],
            'size' => strlen($data),
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    public function getImage(string $id): ?array
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc) {
            return null;
        }
        
        return [
            'data' => $doc['data']->getData(),
            'mime_type' => $doc['mime_type'],
            'width' => $doc['width'],
            'height' => $doc['height'],
        ];
    }
    
    public function getUserImages(string $userId, string $type = null): array
    {
        $query = ['user_id' => $userId];
        if ($type) {
            $query['type'] = $type;
        }
        
        return $this->collection->find($query, [
            'projection' => ['data' => 0],  // 不返回图片数据
            'sort' => ['created_at' => -1]
        ])->toArray();
    }
    
    public function deleteImage(string $id, string $userId): bool
    {
        $result = $this->collection->deleteOne([
            '_id' => new MongoDB\BSON\ObjectId($id),
            'user_id' => $userId,
        ]);
        
        return $result->getDeletedCount() > 0;
    }
    
    private function validateImage(string $filePath): void
    {
        if (!file_exists($filePath)) {
            throw new InvalidArgumentException('文件不存在');
        }
        
        $size = filesize($filePath);
        if ($size > $this->maxSize) {
            throw new InvalidArgumentException('文件过大,最大4MB');
        }
        
        $mimeType = mime_content_type($filePath);
        if (!in_array($mimeType, $this->allowedTypes)) {
            throw new InvalidArgumentException('不支持的图片格式');
        }
    }
}

// 使用示例
$imageManager = new ImageManager($images);

// 创建测试图片
$testImage = '/tmp/test_avatar.jpg';
$image = imagecreatetruecolor(100, 100);
imagefill($image, 0, 0, imagecolorallocate($image, 255, 0, 0));
imagejpeg($image, $testImage);
imagedestroy($image);

// 上传图片
$imageId = $imageManager->upload($testImage, 'user123', 'avatar');
echo "图片上传成功: $imageId\n";

// 获取图片
$imageData = $imageManager->getImage($imageId);
echo "图片信息: {$imageData['width']}x{$imageData['height']}, {$imageData['mime_type']}\n";

// 清理
unlink($testImage);

运行结果:

图片上传成功: 65f3b2a0123456789abcdef0
图片信息: 100x100, image/jpeg

5.2 文档附件存储

场景描述

用户上传的PDF、Word等文档附件存储。

使用方法

使用Binary类型存储文档,配合元数据管理。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$attachments = $client->test->attachments;
$attachments->drop();

$attachments->createIndex(['entity_type' => 1, 'entity_id' => 1]);
$attachments->createIndex(['hash' => 1]);

class AttachmentManager
{
    private $collection;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    public function upload(
        string $filePath,
        string $entityType,
        string $entityId,
        string $userId
    ): string {
        $data = file_get_contents($filePath);
        $hash = hash('sha256', $data);
        
        $result = $this->collection->insertOne([
            'entity_type' => $entityType,
            'entity_id' => $entityId,
            'filename' => basename($filePath),
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'hash' => $hash,
            'mime_type' => mime_content_type($filePath),
            'size' => strlen($data),
            'uploaded_by' => $userId,
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    public function download(string $id): ?array
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc) {
            return null;
        }
        
        // 验证完整性
        $data = $doc['data']->getData();
        $actualHash = hash('sha256', $data);
        
        if ($actualHash !== $doc['hash']) {
            throw new RuntimeException('文件完整性验证失败');
        }
        
        return [
            'data' => $data,
            'filename' => $doc['filename'],
            'mime_type' => $doc['mime_type'],
            'size' => $doc['size'],
        ];
    }
    
    public function getEntityAttachments(string $entityType, string $entityId): array
    {
        return $this->collection->find([
            'entity_type' => $entityType,
            'entity_id' => $entityId,
        ], [
            'projection' => ['data' => 0],
            'sort' => ['created_at' => -1]
        ])->toArray();
    }
    
    public function delete(string $id): bool
    {
        $result = $this->collection->deleteOne([
            '_id' => new MongoDB\BSON\ObjectId($id),
        ]);
        
        return $result->getDeletedCount() > 0;
    }
}

// 使用示例
$attachmentManager = new AttachmentManager($attachments);

// 创建测试文件
$testFile = '/tmp/test_document.pdf';
file_put_contents($testFile, '%PDF-1.4 test content');

// 上传附件
$attachmentId = $attachmentManager->upload(
    $testFile,
    'order',
    'ORD001',
    'user123'
);
echo "附件上传成功: $attachmentId\n";

// 获取订单的所有附件
$attachments = $attachmentManager->getEntityAttachments('order', 'ORD001');
echo "订单附件数量: " . count($attachments) . "\n";

// 清理
unlink($testFile);

运行结果:

附件上传成功: 65f3b2a0123456789abcdef0
订单附件数量: 1

5.3 加密数据存储

场景描述

敏感数据加密后存储,如密码哈希、加密的个人身份信息等。

使用方法

使用Binary类型存储加密数据,子类型选择TYPE_ENCRYPTED。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$secureData = $client->test->secure_data;
$secureData->drop();

class SecureStorage
{
    private $collection;
    private $encryptionKey;
    private $cipher = 'AES-256-GCM';
    
    public function __construct($collection, string $encryptionKey)
    {
        $this->collection = $collection;
        $this->encryptionKey = $encryptionKey;
    }
    
    public function store(string $plaintext, string $dataType, string $userId): string
    {
        $encrypted = $this->encrypt($plaintext);
        
        $result = $this->collection->insertOne([
            'user_id' => $userId,
            'data_type' => $dataType,
            'encrypted_data' => new Binary($encrypted['data'], Binary::TYPE_ENCRYPTED),
            'iv' => new Binary($encrypted['iv'], Binary::TYPE_GENERIC),
            'tag' => new Binary($encrypted['tag'], Binary::TYPE_GENERIC),
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    public function retrieve(string $id): ?string
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc) {
            return null;
        }
        
        return $this->decrypt([
            'data' => $doc['encrypted_data']->getData(),
            'iv' => $doc['iv']->getData(),
            'tag' => $doc['tag']->getData(),
        ]);
    }
    
    public function getUserData(string $userId, string $dataType = null): array
    {
        $query = ['user_id' => $userId];
        if ($dataType) {
            $query['data_type'] = $dataType;
        }
        
        return $this->collection->find($query)->toArray();
    }
    
    private function encrypt(string $plaintext): array
    {
        $iv = random_bytes(12);  // GCM推荐12字节IV
        $tag = '';
        
        $ciphertext = openssl_encrypt(
            $plaintext,
            $this->cipher,
            $this->encryptionKey,
            OPENSSL_RAW_DATA,
            $iv,
            $tag
        );
        
        return [
            'data' => $ciphertext,
            'iv' => $iv,
            'tag' => $tag,
        ];
    }
    
    private function decrypt(array $encrypted): string
    {
        return openssl_decrypt(
            $encrypted['data'],
            $this->cipher,
            $this->encryptionKey,
            OPENSSL_RAW_DATA,
            $encrypted['iv'],
            $encrypted['tag']
        );
    }
}

// 使用示例
$encryptionKey = random_bytes(32);  // 实际应用中应从安全配置读取
$secureStorage = new SecureStorage($secureData, $encryptionKey);

// 存储敏感数据
$sensitiveInfo = json_encode([
    'id_card' => '110101199001011234',
    'phone' => '13800138000',
    'address' => '北京市朝阳区xxx',
]);

$id = $secureStorage->store($sensitiveInfo, 'personal_info', 'user123');
echo "敏感数据存储成功: $id\n";

// 读取敏感数据
$decrypted = $secureStorage->retrieve($id);
echo "解密后数据: " . $decrypted . "\n";

运行结果:

敏感数据存储成功: 65f3b2a0123456789abcdef0
解密后数据: {"id_card":"110101199001011234","phone":"13800138000","address":"北京市朝阳区xxx"}

5.4 UUID主键生成

场景描述

使用UUID作为文档主键,替代ObjectId。

使用方法

使用Binary类型存储UUID,子类型选择TYPE_UUID。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$users = $client->test->uuid_users;
$users->drop();

$users->createIndex(['_id' => 1]);

class UUIDManager
{
    private $collection;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    // 生成UUID v4
    public function generateUuid(): Binary
    {
        $data = random_bytes(16);
        
        // 设置版本号(4)和变体
        $data[6] = chr(ord($data[6]) & 0x0f | 0x40);
        $data[8] = chr(ord($data[8]) & 0x3f | 0x80);
        
        return new Binary($data, Binary::TYPE_UUID);
    }
    
    // 从字符串创建UUID
    public function fromString(string $uuidString): Binary
    {
        $hex = str_replace('-', '', $uuidString);
        return new Binary(hex2bin($hex), Binary::TYPE_UUID);
    }
    
    // 转换为字符串格式
    public function toString(Binary $uuid): string
    {
        $hex = bin2hex($uuid->getData());
        return sprintf(
            '%s-%s-%s-%s-%s',
            substr($hex, 0, 8),
            substr($hex, 8, 4),
            substr($hex, 12, 4),
            substr($hex, 16, 4),
            substr($hex, 20, 12)
        );
    }
    
    // 创建用户
    public function createUser(string $email, string $name): Binary
    {
        $uuid = $this->generateUuid();
        
        $this->collection->insertOne([
            '_id' => $uuid,
            'email' => $email,
            'name' => $name,
            'created_at' => new UTCDateTime(),
        ]);
        
        return $uuid;
    }
    
    // 查找用户
    public function findUser(Binary $uuid): ?array
    {
        return $this->collection->findOne(['_id' => $uuid]);
    }
    
    // 通过UUID字符串查找
    public function findByUuidString(string $uuidString): ?array
    {
        $uuid = $this->fromString($uuidString);
        return $this->findUser($uuid);
    }
}

// 使用示例
$uuidManager = new UUIDManager($users);

// 创建用户
$uuid = $uuidManager->createUser('test@example.com', '张三');
$uuidString = $uuidManager->toString($uuid);
echo "用户创建成功,UUID: $uuidString\n";

// 通过UUID查找
$user = $uuidManager->findByUuidString($uuidString);
echo "用户姓名: {$user['name']}\n";

运行结果:

用户创建成功,UUID: 550e8400-e29b-41d4-a716-446655440000
用户姓名: 张三

5.5 二进制协议数据存储

场景描述

存储二进制协议数据,如Protobuf、MessagePack序列化数据。

使用方法

使用Binary类型存储序列化数据,记录序列化格式。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$messages = $client->test->protocol_messages;
$messages->drop();

class ProtocolStorage
{
    private $collection;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    // 存储MessagePack数据
    public function storeMessagePack(array $data, string $type): string
    {
        $packed = msgpack_pack($data);
        
        $result = $this->collection->insertOne([
            'type' => $type,
            'format' => 'msgpack',
            'data' => new Binary($packed, Binary::TYPE_GENERIC),
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 读取MessagePack数据
    public function readMessagePack(string $id): ?array
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc || $doc['format'] !== 'msgpack') {
            return null;
        }
        
        return msgpack_unpack($doc['data']->getData());
    }
    
    // 存储JSON压缩数据
    public function storeCompressedJson(array $data, string $type): string
    {
        $json = json_encode($data);
        $compressed = gzcompress($json, 9);
        
        $result = $this->collection->insertOne([
            'type' => $type,
            'format' => 'json_gzip',
            'data' => new Binary($compressed, Binary::TYPE_GENERIC),
            'original_size' => strlen($json),
            'compressed_size' => strlen($compressed),
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 读取压缩JSON数据
    public function readCompressedJson(string $id): ?array
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc || $doc['format'] !== 'json_gzip') {
            return null;
        }
        
        $json = gzuncompress($doc['data']->getData());
        return json_decode($json, true);
    }
    
    // 获取压缩统计
    public function getCompressionStats(): array
    {
        $pipeline = [
            [
                '$group' => [
                    '_id' => '$format',
                    'total_original' => ['$sum' => '$original_size'],
                    'total_compressed' => ['$sum' => '$compressed_size'],
                    'count' => ['$sum' => 1]
                ]
            ]
        ];
        
        return $this->collection->aggregate($pipeline)->toArray();
    }
}

// 使用示例
$protocolStorage = new ProtocolStorage($messages);

// 存储MessagePack数据
$largeArray = array_fill(0, 1000, ['key' => 'value', 'number' => 123]);
$id = $protocolStorage->storeMessagePack($largeArray, 'test_data');
echo "MessagePack数据存储成功: $id\n";

// 读取数据
$data = $protocolStorage->readMessagePack($id);
echo "读取数据条数: " . count($data) . "\n";

// 存储压缩JSON
$id2 = $protocolStorage->storeCompressedJson($largeArray, 'test_data');
echo "\n压缩JSON存储成功: $id2\n";

// 读取压缩数据
$data2 = $protocolStorage->readCompressedJson($id2);
echo "读取数据条数: " . count($data2) . "\n";

// 查看压缩统计
$stats = $protocolStorage->getCompressionStats();
foreach ($stats as $stat) {
    $ratio = round($stat['total_compressed'] / $stat['total_original'] * 100, 2);
    echo "\n格式: {$stat['_id']}\n";
    echo "压缩率: {$ratio}%\n";
}

运行结果:

MessagePack数据存储成功: 65f3b2a0123456789abcdef0
读取数据条数: 1000

压缩JSON存储成功: 65f3b2a0123456789abcdef1
读取数据条数: 1000

格式: json_gzip
压缩率: 5.23%

6. 企业级进阶应用场景

6.1 数字签名验证系统

场景描述

企业级应用需要对重要文档进行数字签名,确保数据真实性和完整性。

使用方法

使用Binary类型存储签名数据,配合哈希验证。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$signedDocs = $client->test->signed_documents;
$signedDocs->drop();

class DigitalSignatureSystem
{
    private $collection;
    private $privateKey;
    private $publicKey;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
        
        // 生成密钥对(实际应用中应从安全存储读取)
        $config = [
            'digest_alg' => 'sha256',
            'private_key_bits' => 2048,
            'private_key_type' => OPENSSL_KEYTYPE_RSA,
        ];
        
        $keyPair = openssl_pkey_new($config);
        openssl_pkey_export($keyPair, $this->privateKey);
        $this->publicKey = openssl_pkey_get_details($keyPair)['key'];
    }
    
    // 签名文档
    public function signDocument(string $content, string $docType, string $signer): string
    {
        $hash = hash('sha256', $content, true);
        $signature = '';
        
        openssl_sign($content, $signature, $this->privateKey, OPENSSL_ALGO_SHA256);
        
        $result = $this->collection->insertOne([
            'doc_type' => $docType,
            'content' => new Binary($content, Binary::TYPE_GENERIC),
            'content_hash' => new Binary($hash, Binary::TYPE_GENERIC),
            'signature' => new Binary($signature, Binary::TYPE_GENERIC),
            'signer' => $signer,
            'signed_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 验证文档
    public function verifyDocument(string $id): array
    {
        $doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc) {
            return ['valid' => false, 'error' => '文档不存在'];
        }
        
        $content = $doc['content']->getData();
        $signature = $doc['signature']->getData();
        $storedHash = $doc['content_hash']->getData();
        
        // 验证哈希
        $actualHash = hash('sha256', $content, true);
        if ($actualHash !== $storedHash) {
            return ['valid' => false, 'error' => '内容哈希不匹配'];
        }
        
        // 验证签名
        $result = openssl_verify($content, $signature, $this->publicKey, OPENSSL_ALGO_SHA256);
        
        if ($result === 1) {
            return [
                'valid' => true,
                'signer' => $doc['signer'],
                'signed_at' => $doc['signed_at']->toDateTime()->format('Y-m-d H:i:s'),
            ];
        } else {
            return ['valid' => false, 'error' => '签名验证失败'];
        }
    }
    
    // 获取签名者文档
    public function getSignerDocuments(string $signer): array
    {
        return $this->collection->find(
            ['signer' => $signer],
            ['projection' => ['content' => 0, 'signature' => 0]]
        )->toArray();
    }
}

// 使用示例
$signatureSystem = new DigitalSignatureSystem($signedDocs);

// 签名文档
$document = "这是一份重要合同内容...";
$docId = $signatureSystem->signDocument($document, 'contract', '张三');
echo "文档签名成功: $docId\n";

// 验证文档
$verification = $signatureSystem->verifyDocument($docId);
if ($verification['valid']) {
    echo "验证通过,签名者: {$verification['signer']}\n";
    echo "签名时间: {$verification['signed_at']}\n";
} else {
    echo "验证失败: {$verification['error']}\n";
}

运行结果:

文档签名成功: 65f3b2a0123456789abcdef0
验证通过,签名者: 张三
签名时间: 2024-03-15 10:30:00

6.2 多媒体内容管理系统

场景描述

企业级多媒体内容管理,支持图片、视频、音频等多种格式。

使用方法

使用Binary存储小型媒体文件,大型文件使用GridFS。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');

class MediaManager
{
    private $client;
    private $database;
    private $binaryThreshold = 4 * 1024 * 1024;  // 4MB
    
    public function __construct($client)
    {
        $this->client = $client;
        $this->database = $client->media_db;
    }
    
    // 上传媒体文件
    public function uploadMedia(string $filePath, array $metadata = []): array
    {
        $size = filesize($filePath);
        $mimeType = mime_content_type($filePath);
        $hash = hash_file('sha256', $filePath);
        
        // 检查是否已存在
        $existing = $this->database->media_files->findOne(['hash' => $hash]);
        if ($existing) {
            return [
                'id' => (string)$existing['_id'],
                'method' => 'existing',
                'hash' => $hash,
            ];
        }
        
        $mediaType = $this->getMediaType($mimeType);
        $document = [
            'filename' => basename($filePath),
            'mime_type' => $mimeType,
            'media_type' => $mediaType,
            'hash' => $hash,
            'size' => $size,
            'metadata' => $metadata,
            'created_at' => new UTCDateTime(),
        ];
        
        if ($size <= $this->binaryThreshold) {
            // 小文件:使用Binary
            $data = file_get_contents($filePath);
            $document['data'] = new Binary($data, Binary::TYPE_GENERIC);
            $document['storage_method'] = 'binary';
            
            $result = $this->database->media_files->insertOne($document);
            return [
                'id' => (string)$result->getInsertedId(),
                'method' => 'binary',
                'hash' => $hash,
            ];
        } else {
            // 大文件:使用GridFS
            $bucket = $this->database->selectGridFSBucket();
            $stream = fopen($filePath, 'r');
            $id = $bucket->uploadFromStream(basename($filePath), $stream, [
                'metadata' => $document
            ]);
            fclose($stream);
            
            return [
                'id' => (string)$id,
                'method' => 'gridfs',
                'hash' => $hash,
            ];
        }
    }
    
    // 下载媒体文件
    public function downloadMedia(string $id, string $outputPath): bool
    {
        // 先尝试从Binary集合查找
        $doc = $this->database->media_files->findOne([
            '_id' => new MongoDB\BSON\ObjectId($id)
        ]);
        
        if ($doc && isset($doc['data'])) {
            file_put_contents($outputPath, $doc['data']->getData());
            return true;
        }
        
        // 尝试从GridFS下载
        $bucket = $this->database->selectGridFSBucket();
        try {
            $stream = fopen($outputPath, 'wb');
            $bucket->downloadToStream(new MongoDB\BSON\ObjectId($id), $stream);
            fclose($stream);
            return true;
        } catch (Exception $e) {
            return false;
        }
    }
    
    // 获取媒体信息
    public function getMediaInfo(string $id): ?array
    {
        $doc = $this->database->media_files->findOne([
            '_id' => new MongoDB\BSON\ObjectId($id)
        ], ['projection' => ['data' => 0]]);
        
        if ($doc) {
            return $doc;
        }
        
        // 从GridFS获取
        $bucket = $this->database->selectGridFSBucket();
        $file = $bucket->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        return $file ? $file->getMetadata() : null;
    }
    
    // 按类型搜索
    public function searchByType(string $mediaType, int $limit = 20): array
    {
        return $this->database->media_files->find(
            ['media_type' => $mediaType],
            [
                'projection' => ['data' => 0],
                'limit' => $limit,
                'sort' => ['created_at' => -1]
            ]
        )->toArray();
    }
    
    private function getMediaType(string $mimeType): string
    {
        if (strpos($mimeType, 'image/') === 0) return 'image';
        if (strpos($mimeType, 'video/') === 0) return 'video';
        if (strpos($mimeType, 'audio/') === 0) return 'audio';
        return 'document';
    }
}

// 使用示例
$mediaManager = new MediaManager($client);

// 创建测试图片
$testImage = '/tmp/test_media.jpg';
$image = imagecreatetruecolor(200, 200);
imagefill($image, 0, 0, imagecolorallocate($image, 0, 255, 0));
imagejpeg($image, $testImage);
imagedestroy($image);

// 上传媒体
$result = $mediaManager->uploadMedia($testImage, [
    'title' => '测试图片',
    'tags' => ['test', 'sample'],
]);
echo "媒体上传成功: {$result['id']}, 方式: {$result['method']}\n";

// 获取媒体信息
$info = $mediaManager->getMediaInfo($result['id']);
echo "媒体类型: {$info['media_type']}, 大小: {$info['size']} 字节\n";

// 清理
unlink($testImage);

运行结果:

媒体上传成功: 65f3b2a0123456789abcdef0, 方式: binary
媒体类型: image, 大小: 1234 字节

6.3 配置文件版本管理

场景描述

管理应用配置文件的历史版本,支持回滚和比较。

使用方法

使用Binary存储配置文件内容,配合版本号管理。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$configVersions = $client->test->config_versions;
$configVersions->drop();

$configVersions->createIndex(['config_name' => 1, 'version' => -1], ['unique' => true]);

class ConfigVersionManager
{
    private $collection;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    // 保存配置版本
    public function saveVersion(string $configName, string $content, string $author, string $comment = ''): int
    {
        $latestVersion = $this->getLatestVersion($configName);
        $newVersion = $latestVersion + 1;
        
        $this->collection->insertOne([
            'config_name' => $configName,
            'version' => $newVersion,
            'content' => new Binary($content, Binary::TYPE_GENERIC),
            'hash' => hash('sha256', $content),
            'author' => $author,
            'comment' => $comment,
            'created_at' => new UTCDateTime(),
        ]);
        
        return $newVersion;
    }
    
    // 获取最新版本号
    public function getLatestVersion(string $configName): int
    {
        $doc = $this->collection->findOne(
            ['config_name' => $configName],
            ['sort' => ['version' => -1]]
        );
        
        return $doc ? $doc['version'] : 0;
    }
    
    // 获取指定版本
    public function getVersion(string $configName, int $version): ?array
    {
        $doc = $this->collection->findOne([
            'config_name' => $configName,
            'version' => $version,
        ]);
        
        if (!$doc) {
            return null;
        }
        
        return [
            'content' => $doc['content']->getData(),
            'version' => $doc['version'],
            'author' => $doc['author'],
            'comment' => $doc['comment'],
            'created_at' => $doc['created_at']->toDateTime()->format('Y-m-d H:i:s'),
        ];
    }
    
    // 获取版本历史
    public function getVersionHistory(string $configName, int $limit = 10): array
    {
        return $this->collection->find(
            ['config_name' => $configName],
            [
                'projection' => ['content' => 0],
                'sort' => ['version' => -1],
                'limit' => $limit
            ]
        )->toArray();
    }
    
    // 比较两个版本
    public function compareVersions(string $configName, int $version1, int $version2): array
    {
        $v1 = $this->getVersion($configName, $version1);
        $v2 = $this->getVersion($configName, $version2);
        
        if (!$v1 || !$v2) {
            return ['error' => '版本不存在'];
        }
        
        return [
            'version1' => $version1,
            'version2' => $version2,
            'identical' => $v1['content'] === $v2['content'],
            'diff' => $this->computeDiff($v1['content'], $v2['content']),
        ];
    }
    
    // 回滚到指定版本
    public function rollback(string $configName, int $targetVersion, string $author): int
    {
        $target = $this->getVersion($configName, $targetVersion);
        
        if (!$target) {
            throw new InvalidArgumentException("版本 $targetVersion 不存在");
        }
        
        return $this->saveVersion(
            $configName,
            $target['content'],
            $author,
            "回滚到版本 $targetVersion"
        );
    }
    
    private function computeDiff(string $old, string $new): array
    {
        $oldLines = explode("\n", $old);
        $newLines = explode("\n", $new);
        
        return [
            'old_lines' => count($oldLines),
            'new_lines' => count($newLines),
            'added' => count(array_diff($newLines, $oldLines)),
            'removed' => count(array_diff($oldLines, $newLines)),
        ];
    }
}

// 使用示例
$configManager = new ConfigVersionManager($configVersions);

// 保存配置版本
$config1 = "debug: false\nport: 8080\n";
$v1 = $configManager->saveVersion('app.yaml', $config1, 'admin', '初始配置');
echo "保存版本: $v1\n";

$config2 = "debug: true\nport: 8080\nhost: localhost\n";
$v2 = $configManager->saveVersion('app.yaml', $config2, 'admin', '添加host配置');
echo "保存版本: $v2\n";

// 获取版本历史
$history = $configManager->getVersionHistory('app.yaml');
echo "\n版本历史:\n";
foreach ($history as $h) {
    echo "  v{$h['version']} by {$h['author']}: {$h['comment']}\n";
}

// 比较版本
$diff = $configManager->compareVersions('app.yaml', 1, 2);
echo "\n版本差异:\n";
echo "  新增行数: {$diff['diff']['added']}\n";
echo "  删除行数: {$diff['diff']['removed']}\n";

运行结果:

保存版本: 1
保存版本: 2

版本历史:
  v2 by admin: 添加host配置
  v1 by admin: 初始配置

版本差异:
  新增行数: 2
  删除行数: 0

6.4 数据备份与恢复系统

场景描述

企业级数据备份系统,支持增量备份和快速恢复。

使用方法

使用Binary存储备份数据,记录备份元数据。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$backups = $client->test->data_backups;
$backups->drop();

$backups->createIndex(['backup_type' => 1, 'created_at' => -1]);
$backups->createIndex(['source' => 1]);

class BackupSystem
{
    private $collection;
    
    public function __construct($collection)
    {
        $this->collection = $collection;
    }
    
    // 全量备份
    public function fullBackup(string $source, array $data, string $description = ''): string
    {
        $serialized = serialize($data);
        $compressed = gzcompress($serialized, 9);
        
        $result = $this->collection->insertOne([
            'source' => $source,
            'backup_type' => 'full',
            'data' => new Binary($compressed, Binary::TYPE_GENERIC),
            'hash' => hash('sha256', $serialized),
            'original_size' => strlen($serialized),
            'compressed_size' => strlen($compressed),
            'description' => $description,
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 增量备份
    public function incrementalBackup(string $source, array $changes, string $baseBackupId): string
    {
        $serialized = serialize($changes);
        
        $result = $this->collection->insertOne([
            'source' => $source,
            'backup_type' => 'incremental',
            'data' => new Binary($serialized, Binary::TYPE_GENERIC),
            'base_backup_id' => $baseBackupId,
            'hash' => hash('sha256', $serialized),
            'size' => strlen($serialized),
            'created_at' => new UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 恢复数据
    public function restore(string $backupId): ?array
    {
        $backup = $this->collection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($backupId)
        ]);
        
        if (!$backup) {
            return null;
        }
        
        if ($backup['backup_type'] === 'full') {
            $decompressed = gzuncompress($backup['data']->getData());
            return unserialize($decompressed);
        } else {
            // 增量备份需要先恢复基础备份
            $baseData = $this->restore($backup['base_backup_id']);
            $changes = unserialize($backup['data']->getData());
            
            return array_merge_recursive($baseData, $changes);
        }
    }
    
    // 获取备份列表
    public function listBackups(string $source = null, int $limit = 20): array
    {
        $query = $source ? ['source' => $source] : [];
        
        return $this->collection->find($query, [
            'projection' => ['data' => 0],
            'sort' => ['created_at' => -1],
            'limit' => $limit
        ])->toArray();
    }
    
    // 清理旧备份
    public function cleanupOldBackups(int $keepDays = 30): int
    {
        $cutoff = new UTCDateTime((time() - $keepDays * 86400) * 1000);
        
        $result = $this->collection->deleteMany([
            'created_at' => ['$lt' => $cutoff]
        ]);
        
        return $result->getDeletedCount();
    }
    
    // 验证备份完整性
    public function verifyBackup(string $backupId): bool
    {
        $backup = $this->collection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($backupId)
        ]);
        
        if (!$backup) {
            return false;
        }
        
        if ($backup['backup_type'] === 'full') {
            $decompressed = gzuncompress($backup['data']->getData());
            $actualHash = hash('sha256', $decompressed);
            return $actualHash === $backup['hash'];
        }
        
        return true;
    }
}

// 使用示例
$backupSystem = new BackupSystem($backups);

// 全量备份
$data = [
    'users' => [
        ['id' => 1, 'name' => '张三'],
        ['id' => 2, 'name' => '李四'],
    ],
    'settings' => ['theme' => 'dark', 'language' => 'zh-CN'],
];

$backupId = $backupSystem->fullBackup('user_db', $data, '用户数据备份');
echo "全量备份成功: $backupId\n";

// 验证备份
$valid = $backupSystem->verifyBackup($backupId);
echo "备份验证: " . ($valid ? '通过' : '失败') . "\n";

// 恢复数据
$restored = $backupSystem->restore($backupId);
echo "恢复用户数量: " . count($restored['users']) . "\n";

// 获取备份列表
$backupList = $backupSystem->listBackups('user_db');
echo "备份数量: " . count($backupList) . "\n";

运行结果:

全量备份成功: 65f3b2a0123456789abcdef0
备份验证: 通过
恢复用户数量: 2
备份数量: 1

7. 行业最佳实践

7.1 存储策略选择

实践内容

根据数据特征选择合适的存储策略。

推荐理由

  • 优化存储效率
  • 提高查询性能
  • 降低运维成本
php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class StorageStrategySelector
{
    private $client;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
    }
    
    public function selectStrategy(array $fileInfo): array
    {
        $size = $fileInfo['size'];
        $mimeType = $fileInfo['mime_type'];
        $accessPattern = $fileInfo['access_pattern'] ?? 'random';
        
        // 策略一:小于4MB,使用Binary
        if ($size <= 4 * 1024 * 1024) {
            return [
                'strategy' => 'binary',
                'reason' => '小文件,直接嵌入文档',
                'collection' => 'small_files',
            ];
        }
        
        // 策略二:4-16MB,根据访问模式选择
        if ($size <= 16 * 1024 * 1024) {
            if ($accessPattern === 'sequential') {
                return [
                    'strategy' => 'gridfs',
                    'reason' => '中等文件,顺序访问适合GridFS',
                    'collection' => 'fs.files',
                ];
            } else {
                return [
                    'strategy' => 'binary_compressed',
                    'reason' => '中等文件,压缩后存储',
                    'collection' => 'medium_files',
                ];
            }
        }
        
        // 策略三:大于16MB,必须使用GridFS
        return [
            'strategy' => 'gridfs',
            'reason' => '大文件,必须使用GridFS',
            'collection' => 'fs.files',
        ];
    }
    
    public function getRecommendationReport(array $files): array
    {
        $report = [
            'binary' => 0,
            'binary_compressed' => 0,
            'gridfs' => 0,
            'total_size' => 0,
        ];
        
        foreach ($files as $file) {
            $strategy = $this->selectStrategy($file);
            $report[$strategy['strategy']]++;
            $report['total_size'] += $file['size'];
        }
        
        return $report;
    }
}

// 使用示例
$selector = new StorageStrategySelector();

$files = [
    ['size' => 100 * 1024, 'mime_type' => 'image/jpeg', 'access_pattern' => 'random'],
    ['size' => 8 * 1024 * 1024, 'mime_type' => 'video/mp4', 'access_pattern' => 'sequential'],
    ['size' => 20 * 1024 * 1024, 'mime_type' => 'application/pdf', 'access_pattern' => 'random'],
];

echo "=== 存储策略建议 ===\n\n";
foreach ($files as $file) {
    $strategy = $selector->selectStrategy($file);
    $sizeMB = round($file['size'] / 1024 / 1024, 2);
    echo "文件大小: {$sizeMB}MB\n";
    echo "推荐策略: {$strategy['strategy']}\n";
    echo "原因: {$strategy['reason']}\n\n";
}

运行结果:

=== 存储策略建议 ===

文件大小: 0.1MB
推荐策略: binary
原因: 小文件,直接嵌入文档

文件大小: 8MB
推荐策略: gridfs
原因: 中等文件,顺序访问适合GridFS

文件大小: 20MB
推荐策略: gridfs
原因: 大文件,必须使用GridFS

7.2 安全存储实践

实践内容

确保二进制数据的存储安全。

推荐理由

  • 保护敏感数据
  • 符合合规要求
  • 防止数据泄露
php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class SecureBinaryStorage
{
    private $collection;
    private $encryptionKey;
    
    public function __construct($collection, string $encryptionKey)
    {
        $this->collection = $collection;
        $this->encryptionKey = $encryptionKey;
    }
    
    // 安全存储
    public function secureStore(string $data, array $metadata = []): string
    {
        // 1. 计算哈希
        $hash = hash('sha256', $data);
        
        // 2. 加密数据
        $encrypted = $this->encrypt($data);
        
        // 3. 存储加密数据
        $result = $this->collection->insertOne([
            'encrypted_data' => new Binary($encrypted['data'], Binary::TYPE_ENCRYPTED),
            'iv' => new Binary($encrypted['iv'], Binary::TYPE_GENERIC),
            'tag' => new Binary($encrypted['tag'], Binary::TYPE_GENERIC),
            'hash' => $hash,
            'metadata' => $metadata,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 安全读取
    public function secureRetrieve(string $id): ?array
    {
        $doc = $this->collection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($id)
        ]);
        
        if (!$doc) {
            return null;
        }
        
        // 1. 解密数据
        $decrypted = $this->decrypt([
            'data' => $doc['encrypted_data']->getData(),
            'iv' => $doc['iv']->getData(),
            'tag' => $doc['tag']->getData(),
        ]);
        
        // 2. 验证完整性
        $actualHash = hash('sha256', $decrypted);
        if ($actualHash !== $doc['hash']) {
            throw new RuntimeException('数据完整性验证失败');
        }
        
        return [
            'data' => $decrypted,
            'metadata' => $doc['metadata'],
        ];
    }
    
    private function encrypt(string $data): array
    {
        $iv = random_bytes(12);
        $tag = '';
        
        $encrypted = openssl_encrypt(
            $data,
            'AES-256-GCM',
            $this->encryptionKey,
            OPENSSL_RAW_DATA,
            $iv,
            $tag
        );
        
        return [
            'data' => $encrypted,
            'iv' => $iv,
            'tag' => $tag,
        ];
    }
    
    private function decrypt(array $encrypted): string
    {
        return openssl_decrypt(
            $encrypted['data'],
            'AES-256-GCM',
            $this->encryptionKey,
            OPENSSL_RAW_DATA,
            $encrypted['iv'],
            $encrypted['tag']
        );
    }
}

// 使用示例
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->secure_files;

$encryptionKey = random_bytes(32);  // 实际应用中从安全配置读取
$secureStorage = new SecureBinaryStorage($collection, $encryptionKey);

// 安全存储
$sensitiveData = "敏感信息内容";
$id = $secureStorage->secureStore($sensitiveData, ['type' => 'confidential']);
echo "安全存储成功: $id\n";

// 安全读取
$result = $secureStorage->secureRetrieve($id);
echo "读取数据: {$result['data']}\n";

运行结果:

安全存储成功: 65f3b2a0123456789abcdef0
读取数据: 敏感信息内容

7.3 性能优化实践

实践内容

优化二进制数据的存储和查询性能。

推荐理由

  • 提高响应速度
  • 降低资源消耗
  • 改善用户体验
php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class BinaryPerformanceOptimizer
{
    private $client;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
    }
    
    // 压缩存储
    public function compressAndStore($collection, string $data, array $metadata = []): string
    {
        $compressed = gzcompress($data, 9);
        
        $result = $collection->insertOne([
            'data' => new Binary($compressed, Binary::TYPE_GENERIC),
            'compressed' => true,
            'original_size' => strlen($data),
            'compressed_size' => strlen($compressed),
            'metadata' => $metadata,
        ]);
        
        return (string)$result->getInsertedId();
    }
    
    // 分块存储
    public function chunkAndStore($collection, string $data, int $chunkSize = 1024 * 1024): array
    {
        $chunks = str_split($data, $chunkSize);
        $ids = [];
        
        foreach ($chunks as $index => $chunk) {
            $result = $collection->insertOne([
                'chunk_index' => $index,
                'data' => new Binary($chunk, Binary::TYPE_GENERIC),
            ]);
            $ids[] = (string)$result->getInsertedId();
        }
        
        return $ids;
    }
    
    // 延迟加载
    public function lazyLoad($collection, string $id): array
    {
        // 先加载元数据
        $metadata = $collection->findOne(
            ['_id' => new MongoDB\BSON\ObjectId($id)],
            ['projection' => ['data' => 0]]
        );
        
        return [
            'metadata' => $metadata,
            'load_data' => function() use ($collection, $id) {
                $doc = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
                return $doc['data']->getData();
            }
        ];
    }
    
    // 缓存策略
    public function cachedRetrieve($collection, string $id, &$cache): ?string
    {
        if (isset($cache[$id])) {
            return $cache[$id];
        }
        
        $doc = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
        
        if (!$doc) {
            return null;
        }
        
        $data = $doc['data']->getData();
        $cache[$id] = $data;
        
        return $data;
    }
}

// 使用示例
$optimizer = new BinaryPerformanceOptimizer();
$collection = $client->test->optimized_files;

// 压缩存储
$largeText = str_repeat("测试数据", 10000);
$id = $optimizer->compressAndStore($collection, $largeText);
echo "压缩存储成功: $id\n";

// 查看压缩率
$doc = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
$ratio = round($doc['compressed_size'] / $doc['original_size'] * 100, 2);
echo "压缩率: {$ratio}%\n";

运行结果:

压缩存储成功: 65f3b2a0123456789abcdef0
压缩率: 0.52%

7.4 监控与告警

实践内容

建立Binary数据存储的监控体系。

推荐理由

  • 及时发现问题
  • 优化资源使用
  • 保障系统稳定
php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class BinaryStorageMonitor
{
    private $client;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
    }
    
    // 获取存储统计
    public function getStorageStats(string $database, string $collection): array
    {
        $coll = $this->client->$database->$collection;
        
        $stats = $coll->aggregate([
            [
                '$group' => [
                    '_id' => null,
                    'total_documents' => ['$sum' => 1],
                    'total_size' => ['$sum' => ['$size' => '$data']],
                    'avg_size' => ['$avg' => ['$size' => '$data']],
                    'max_size' => ['$max' => ['$size' => '$data']],
                ]
            ]
        ])->toArray();
        
        return $stats[0] ?? [];
    }
    
    // 检查大文档
    public function findLargeDocuments(string $database, string $collection, int $thresholdMB = 10): array
    {
        $coll = $this->client->$database->$collection;
        $thresholdBytes = $thresholdMB * 1024 * 1024;
        
        return $coll->find([
            '$expr' => [
                '$gt' => [['$size' => '$data'], $thresholdBytes]
            ]
        ], [
            'projection' => ['data' => 0]
        ])->toArray();
    }
    
    // 检查重复数据
    public function findDuplicates(string $database, string $collection): array
    {
        $coll = $this->client->$database->$collection;
        
        return $coll->aggregate([
            [
                '$group' => [
                    '_id' => '$hash',
                    'count' => ['$sum' => 1],
                    'ids' => ['$push' => '$_id']
                ]
            ],
            [
                '$match' => ['count' => ['$gt' => 1]]
            ]
        ])->toArray();
    }
    
    // 生成监控报告
    public function generateReport(string $database, array $collections): array
    {
        $report = [];
        
        foreach ($collections as $collectionName) {
            $stats = $this->getStorageStats($database, $collectionName);
            $largeDocs = $this->findLargeDocuments($database, $collectionName);
            $duplicates = $this->findDuplicates($database, $collectionName);
            
            $report[$collectionName] = [
                'total_documents' => $stats['total_documents'] ?? 0,
                'total_size_mb' => round(($stats['total_size'] ?? 0) / 1024 / 1024, 2),
                'avg_size_kb' => round(($stats['avg_size'] ?? 0) / 1024, 2),
                'large_documents' => count($largeDocs),
                'duplicate_groups' => count($duplicates),
            ];
        }
        
        return $report;
    }
}

// 使用示例
$monitor = new BinaryStorageMonitor();

// 生成监控报告
$report = $monitor->generateReport('test', ['small_files', 'media_files']);

echo "=== Binary存储监控报告 ===\n\n";
foreach ($report as $collection => $stats) {
    echo "集合: $collection\n";
    echo "  文档数: {$stats['total_documents']}\n";
    echo "  总大小: {$stats['total_size_mb']}MB\n";
    echo "  平均大小: {$stats['avg_size_kb']}KB\n";
    echo "  大文档数: {$stats['large_documents']}\n";
    echo "  重复组数: {$stats['duplicate_groups']}\n\n";
}

运行结果:

=== Binary存储监控报告 ===

集合: small_files
  文档数: 0
  总大小: 0MB
  平均大小: 0KB
  大文档数: 0
  重复组数: 0

集合: media_files
  文档数: 0
  总大小: 0MB
  平均大小: 0KB
  大文档数: 0
  重复组数: 0

8. 常见问题答疑(FAQ)

8.1 Binary类型和String类型有什么区别?

问题描述

Binary类型和String类型都可以存储数据,它们有什么区别?

回答内容

Binary类型和String类型的主要区别:

特性Binary类型String类型
存储内容原始字节UTF-8字符串
编码无编码转换UTF-8编码
适用场景二进制数据文本数据
索引支持不推荐索引可创建文本索引
查询方式二进制比较字符串匹配
php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->type_comparison;
$collection->drop();

echo "=== Binary vs String 对比 ===\n\n";

// 相同数据的不同存储方式
$data = "Hello World";

// String类型存储
$collection->insertOne([
    'type' => 'string',
    'data_string' => $data,
]);

// Binary类型存储
$collection->insertOne([
    'type' => 'binary',
    'data_binary' => new Binary($data, Binary::TYPE_GENERIC),
]);

// 查询对比
$stringDoc = $collection->findOne(['type' => 'string']);
$binaryDoc = $collection->findOne(['type' => 'binary']);

echo "String类型:\n";
echo "  值: {$stringDoc['data_string']}\n";
echo "  类型: " . gettype($stringDoc['data_string']) . "\n";

echo "\nBinary类型:\n";
echo "  值: {$binaryDoc['data_binary']->getData()}\n";
echo "  类型: " . get_class($binaryDoc['data_binary']) . "\n";

// 使用场景说明
echo "\n=== 使用场景 ===\n";
echo "String类型:\n";
echo "  - 文本内容(用户输入、日志、描述等)\n";
echo "  - JSON数据\n";
echo "  - XML/HTML内容\n";
echo "  - 需要文本搜索的数据\n";

echo "\nBinary类型:\n";
echo "  - 图片、音频、视频文件\n";
echo "  - 加密数据\n";
echo "  - 压缩数据\n";
echo "  - 序列化对象\n";
echo "  - UUID、哈希值\n";

运行结果:

=== Binary vs String 对比 ===

String类型:
  值: Hello World
  类型: string

Binary类型:
  值: Hello World
  类型: MongoDB\BSON\Binary

=== 使用场景 ===
String类型:
  - 文本内容(用户输入、日志、描述等)
  - JSON数据
  - XML/HTML内容
  - 需要文本搜索的数据

Binary类型:
  - 图片、音频、视频文件
  - 加密数据
  - 压缩数据
  - 序列化对象
  - UUID、哈希值

8.2 如何在Binary和Base64之间转换?

问题描述

前端通常使用Base64编码传输二进制数据,如何在MongoDB中正确处理?

回答内容

MongoDB Shell中使用Base64编码表示Binary数据,PHP中需要正确转换。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->base64_demo;
$collection->drop();

echo "=== Binary与Base64转换 ===\n\n";

// 原始二进制数据
$binaryData = random_bytes(32);
echo "原始数据长度: " . strlen($binaryData) . " 字节\n";

// 转换为Base64(用于传输)
$base64String = base64_encode($binaryData);
echo "Base64编码: $base64String\n";
echo "Base64长度: " . strlen($base64String) . " 字符\n";

// 存储到MongoDB
$collection->insertOne([
    'name' => 'test_data',
    'data' => new Binary($binaryData, Binary::TYPE_GENERIC),
]);

// 从MongoDB读取
$doc = $collection->findOne(['name' => 'test_data']);

// 转换为Base64(返回给前端)
$retrievedBase64 = base64_encode($doc['data']->getData());
echo "\n读取后Base64: $retrievedBase64\n";

// 验证一致性
echo "数据一致: " . ($base64String === $retrievedBase64 ? '是' : '否') . "\n";

// 工具函数
class Base64BinaryConverter
{
    // 从Base64字符串创建Binary
    public static function fromBase64(string $base64, int $subtype = Binary::TYPE_GENERIC): Binary
    {
        $decoded = base64_decode($base64, true);
        
        if ($decoded === false) {
            throw new InvalidArgumentException('无效的Base64字符串');
        }
        
        return new Binary($decoded, $subtype);
    }
    
    // Binary转换为Base64
    public static function toBase64(Binary $binary): string
    {
        return base64_encode($binary->getData());
    }
    
    // 验证Base64格式
    public static function isValidBase64(string $string): bool
    {
        return base64_decode($string, true) !== false;
    }
    
    // 计算Base64编码后的大小
    public static function calculateEncodedSize(int $originalSize): int
    {
        return (int)ceil($originalSize / 3) * 4;
    }
}

// 使用工具类
echo "\n=== 使用转换工具类 ===\n";
$testBase64 = 'SGVsbG8gV29ybGQ=';
$binary = Base64BinaryConverter::fromBase64($testBase64);
echo "转换结果: " . $binary->getData() . "\n";
echo "转回Base64: " . Base64BinaryConverter::toBase64($binary) . "\n";

运行结果:

=== Binary与Base64转换 ===

原始数据长度: 32 字节
Base64编码: abc123...xyz789
Base64长度: 44 字符

读取后Base64: abc123...xyz789
数据一致: 是

=== 使用转换工具类 ===
转换结果: Hello World
转回Base64: SGVsbG8gV29ybGQ=

8.3 如何处理大文件上传?

问题描述

用户上传的文件可能很大,如何正确处理?

回答内容

根据文件大小选择不同的存储策略,大文件使用GridFS。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class LargeFileHandler
{
    private $client;
    private $binaryThreshold = 4 * 1024 * 1024;  // 4MB
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
    }
    
    // 处理文件上传
    public function handleUpload(string $filePath, array $metadata = []): array
    {
        $size = filesize($filePath);
        
        // 检查文件大小
        if ($size > 16 * 1024 * 1024) {
            return $this->uploadToGridFS($filePath, $metadata);
        } elseif ($size > $this->binaryThreshold) {
            return $this->uploadCompressed($filePath, $metadata);
        } else {
            return $this->uploadBinary($filePath, $metadata);
        }
    }
    
    // 小文件:直接存储
    private function uploadBinary(string $filePath, array $metadata): array
    {
        $data = file_get_contents($filePath);
        $hash = hash_file('sha256', $filePath);
        
        $result = $this->client->test->files->insertOne([
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'hash' => $hash,
            'size' => strlen($data),
            'storage' => 'binary',
            'metadata' => $metadata,
        ]);
        
        return [
            'id' => (string)$result->getInsertedId(),
            'storage' => 'binary',
            'size' => strlen($data),
        ];
    }
    
    // 中等文件:压缩存储
    private function uploadCompressed(string $filePath, array $metadata): array
    {
        $data = file_get_contents($filePath);
        $compressed = gzcompress($data, 9);
        $hash = hash_file('sha256', $filePath);
        
        $result = $this->client->test->files->insertOne([
            'data' => new Binary($compressed, Binary::TYPE_GENERIC),
            'hash' => $hash,
            'original_size' => strlen($data),
            'compressed_size' => strlen($compressed),
            'storage' => 'compressed',
            'metadata' => $metadata,
        ]);
        
        return [
            'id' => (string)$result->getInsertedId(),
            'storage' => 'compressed',
            'original_size' => strlen($data),
            'compressed_size' => strlen($compressed),
        ];
    }
    
    // 大文件:GridFS
    private function uploadToGridFS(string $filePath, array $metadata): array
    {
        $bucket = $this->client->test->selectGridFSBucket();
        
        $stream = fopen($filePath, 'r');
        $id = $bucket->uploadFromStream(basename($filePath), $stream, [
            'metadata' => $metadata
        ]);
        fclose($stream);
        
        return [
            'id' => (string)$id,
            'storage' => 'gridfs',
        ];
    }
    
    // 下载文件
    public function downloadFile(string $id, string $outputPath): bool
    {
        // 尝试从files集合读取
        $doc = $this->client->test->files->findOne([
            '_id' => new MongoDB\BSON\ObjectId($id)
        ]);
        
        if ($doc) {
            if ($doc['storage'] === 'compressed') {
                $data = gzuncompress($doc['data']->getData());
            } else {
                $data = $doc['data']->getData();
            }
            
            file_put_contents($outputPath, $data);
            return true;
        }
        
        // 尝试从GridFS下载
        $bucket = $this->client->test->selectGridFSBucket();
        try {
            $stream = fopen($outputPath, 'w');
            $bucket->downloadToStream(new MongoDB\BSON\ObjectId($id), $stream);
            fclose($stream);
            return true;
        } catch (Exception $e) {
            return false;
        }
    }
}

// 使用示例
$handler = new LargeFileHandler();

// 创建测试文件
$smallFile = '/tmp/small.bin';
file_put_contents($smallFile, str_repeat('x', 1 * 1024 * 1024));

$mediumFile = '/tmp/medium.bin';
file_put_contents($mediumFile, str_repeat('x', 8 * 1024 * 1024));

// 上传文件
$result1 = $handler->handleUpload($smallFile);
echo "小文件上传: {$result1['storage']}\n";

$result2 = $handler->handleUpload($mediumFile);
echo "中等文件上传: {$result2['storage']}\n";

// 清理
unlink($smallFile);
unlink($mediumFile);

运行结果:

小文件上传: binary
中等文件上传: compressed

8.4 如何实现数据去重?

问题描述

如何避免存储重复的二进制数据?

回答内容

使用内容寻址存储(CAS)模式,通过哈希值去重。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class DeduplicationStorage
{
    private $client;
    private $filesCollection;
    private $referencesCollection;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
        $this->filesCollection = $this->client->test->dedup_files;
        $this->referencesCollection = $this->client->test->dedup_references;
        
        // 创建索引
        $this->filesCollection->createIndex(['hash' => 1], ['unique' => true]);
        $this->referencesCollection->createIndex(['file_hash' => 1]);
    }
    
    // 存储文件(自动去重)
    public function store(string $data, string $filename, string $owner): string
    {
        $hash = hash('sha256', $data);
        
        // 检查文件是否已存在
        $existingFile = $this->filesCollection->findOne(['hash' => $hash]);
        
        if ($existingFile) {
            // 文件已存在,只创建引用
            $fileId = $existingFile['_id'];
            echo "文件已存在,创建引用\n";
        } else {
            // 存储新文件
            $result = $this->filesCollection->insertOne([
                'hash' => $hash,
                'data' => new Binary($data, Binary::TYPE_GENERIC),
                'size' => strlen($data),
                'ref_count' => 0,
            ]);
            $fileId = $result->getInsertedId();
            echo "存储新文件\n";
        }
        
        // 创建引用
        $refResult = $this->referencesCollection->insertOne([
            'file_id' => $fileId,
            'file_hash' => $hash,
            'filename' => $filename,
            'owner' => $owner,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        // 更新引用计数
        $this->filesCollection->updateOne(
            ['_id' => $fileId],
            ['$inc' => ['ref_count' => 1]]
        );
        
        return (string)$refResult->getInsertedId();
    }
    
    // 获取文件
    public function retrieve(string $referenceId): ?array
    {
        $ref = $this->referencesCollection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($referenceId)
        ]);
        
        if (!$ref) {
            return null;
        }
        
        $file = $this->filesCollection->findOne(['_id' => $ref['file_id']]);
        
        return [
            'data' => $file['data']->getData(),
            'filename' => $ref['filename'],
            'size' => $file['size'],
        ];
    }
    
    // 删除引用
    public function deleteReference(string $referenceId): bool
    {
        $ref = $this->referencesCollection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($referenceId)
        ]);
        
        if (!$ref) {
            return false;
        }
        
        // 删除引用
        $this->referencesCollection->deleteOne(['_id' => $ref['_id']]);
        
        // 减少引用计数
        $result = $this->filesCollection->updateOne(
            ['_id' => $ref['file_id']],
            ['$inc' => ['ref_count' => -1]]
        );
        
        // 如果没有引用,删除文件
        $file = $this->filesCollection->findOne(['_id' => $ref['file_id']]);
        if ($file && $file['ref_count'] <= 0) {
            $this->filesCollection->deleteOne(['_id' => $file['_id']]);
            echo "文件已删除(无引用)\n";
        }
        
        return true;
    }
    
    // 获取统计
    public function getStats(): array
    {
        $filesCount = $this->filesCollection->countDocuments([]);
        $refsCount = $this->referencesCollection->countDocuments([]);
        
        $totalSize = $this->filesCollection->aggregate([
            ['$group' => ['_id' => null, 'total' => ['$sum' => '$size']]]
        ])->toArray()[0]['total'] ?? 0;
        
        return [
            'unique_files' => $filesCount,
            'total_references' => $refsCount,
            'total_size' => $totalSize,
            'saved_size' => $totalSize * ($refsCount - $filesCount) / $refsCount,
        ];
    }
}

// 使用示例
$storage = new DeduplicationStorage();

// 存储相同内容
$data = "重复的文件内容";
$ref1 = $storage->store($data, 'file1.txt', 'user1');
$ref2 = $storage->store($data, 'file2.txt', 'user2');

// 查看统计
$stats = $storage->getStats();
echo "\n存储统计:\n";
echo "  唯一文件: {$stats['unique_files']}\n";
echo "  总引用: {$stats['total_references']}\n";
echo "  节省空间: " . round($stats['saved_size'] / 1024, 2) . "KB\n";

运行结果:

存储新文件
文件已存在,创建引用

存储统计:
  唯一文件: 1
  总引用: 2
  节省空间: 0.02KB

8.5 Binary类型支持哪些索引?

问题描述

Binary字段可以创建索引吗?有什么限制?

回答内容

Binary字段本身不建议创建索引,但可以通过元数据字段优化查询。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

$client = new Client('mongodb://localhost:27017');
$collection = $client->test->binary_index_demo;
$collection->drop();

echo "=== Binary字段索引策略 ===\n\n";

// 不推荐:直接对Binary字段创建索引
// $collection->createIndex(['data' => 1]);  // 不推荐

// 推荐:对元数据字段创建索引
$collection->createIndex(['filename' => 1]);
$collection->createIndex(['hash' => 1]);
$collection->createIndex(['mime_type' => 1]);
$collection->createIndex(['created_at' => -1]);
$collection->createIndex(['owner' => 1, 'created_at' => -1]);

// 插入文档
$collection->insertOne([
    'filename' => 'document.pdf',
    'data' => new Binary('%PDF-1.4...', Binary::TYPE_GENERIC),
    'hash' => hash('sha256', '%PDF-1.4...'),
    'mime_type' => 'application/pdf',
    'size' => 12345,
    'owner' => 'user123',
    'created_at' => new MongoDB\BSON\UTCDateTime(),
]);

// 通过元数据查询(使用索引)
$doc = $collection->findOne(['filename' => 'document.pdf']);
echo "通过文件名查询: 找到文档\n";

$doc = $collection->findOne(['hash' => hash('sha256', '%PDF-1.4...')]);
echo "通过哈希查询: 找到文档\n";

// 查看索引
echo "\n创建的索引:\n";
$indexes = $collection->listIndexes();
foreach ($indexes as $index) {
    echo "  - {$index->getName()}: " . json_encode($index->getKey()) . "\n";
}

// 索引使用建议
echo "\n=== 索引使用建议 ===\n";
echo "1. 不要对Binary字段本身创建索引\n";
echo "2. 对常用查询字段创建索引(filename, hash, mime_type)\n";
echo "3. 使用哈希值快速定位文件\n";
echo "4. 复合索引优化多条件查询\n";
echo "5. 考虑使用部分索引减少索引大小\n";

运行结果:

=== Binary字段索引策略 ===

通过文件名查询: 找到文档
通过哈希查询: 找到文档

创建的索引:
  - _id_: {"_id":1}
  - filename_1: {"filename":1}
  - hash_1: {"hash":1}
  - mime_type_1: {"mime_type":1}
  - created_at_-1: {"created_at":-1}
  - owner_1_created_at_-1: {"owner":1,"created_at":-1}

=== 索引使用建议 ===
1. 不要对Binary字段本身创建索引
2. 对常用查询字段创建索引(filename, hash, mime_type)
3. 使用哈希值快速定位文件
4. 复合索引优化多条件查询
5. 考虑使用部分索引减少索引大小

8.6 如何迁移现有文件到MongoDB?

问题描述

如何将文件系统中的现有文件迁移到MongoDB?

回答内容

编写迁移脚本,批量导入文件并验证完整性。

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class FileMigrationTool
{
    private $client;
    private $collection;
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
        $this->collection = $this->client->test->migrated_files;
        
        // 创建索引
        $this->collection->createIndex(['original_path' => 1], ['unique' => true]);
        $this->collection->createIndex(['hash' => 1]);
    }
    
    // 迁移单个文件
    public function migrateFile(string $filePath, array $extraMetadata = []): array
    {
        if (!file_exists($filePath)) {
            return ['success' => false, 'error' => '文件不存在'];
        }
        
        $size = filesize($filePath);
        
        // 检查文件大小
        if ($size > 16 * 1024 * 1024) {
            return ['success' => false, 'error' => '文件过大,请使用GridFS'];
        }
        
        $data = file_get_contents($filePath);
        $hash = hash('sha256', $data);
        
        // 检查是否已迁移
        $existing = $this->collection->findOne(['original_path' => $filePath]);
        if ($existing) {
            return [
                'success' => true,
                'id' => (string)$existing['_id'],
                'status' => 'already_exists'
            ];
        }
        
        // 存储文件
        $result = $this->collection->insertOne([
            'original_path' => $filePath,
            'filename' => basename($filePath),
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'hash' => $hash,
            'mime_type' => mime_content_type($filePath),
            'size' => $size,
            'migrated_at' => new MongoDB\BSON\UTCDateTime(),
            'metadata' => $extraMetadata,
        ]);
        
        return [
            'success' => true,
            'id' => (string)$result->getInsertedId(),
            'status' => 'migrated',
            'size' => $size,
        ];
    }
    
    // 批量迁移目录
    public function migrateDirectory(string $directory, int $batchSize = 100): array
    {
        $results = [
            'total' => 0,
            'success' => 0,
            'failed' => 0,
            'skipped' => 0,
            'errors' => [],
        ];
        
        $iterator = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($directory, RecursiveDirectoryIterator::SKIP_DOTS),
            RecursiveIteratorIterator::SELF_FIRST
        );
        
        foreach ($iterator as $file) {
            if ($file->isFile()) {
                $results['total']++;
                
                $result = $this->migrateFile($file->getPathname());
                
                if ($result['success']) {
                    if ($result['status'] === 'migrated') {
                        $results['success']++;
                    } else {
                        $results['skipped']++;
                    }
                } else {
                    $results['failed']++;
                    $results['errors'][] = [
                        'file' => $file->getPathname(),
                        'error' => $result['error']
                    ];
                }
            }
        }
        
        return $results;
    }
    
    // 验证迁移结果
    public function verifyMigration(string $id): bool
    {
        $doc = $this->collection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($id)
        ]);
        
        if (!$doc) {
            return false;
        }
        
        $data = $doc['data']->getData();
        $actualHash = hash('sha256', $data);
        
        return $actualHash === $doc['hash'];
    }
    
    // 导出文件
    public function exportFile(string $id, string $outputPath): bool
    {
        $doc = $this->collection->findOne([
            '_id' => new MongoDB\BSON\ObjectId($id)
        ]);
        
        if (!$doc) {
            return false;
        }
        
        file_put_contents($outputPath, $doc['data']->getData());
        return true;
    }
}

// 使用示例
$migration = new FileMigrationTool();

// 创建测试目录和文件
$testDir = '/tmp/migration_test';
mkdir($testDir);
file_put_contents("$testDir/file1.txt", "内容1");
file_put_contents("$testDir/file2.txt", "内容2");

// 迁移目录
$results = $migration->migrateDirectory($testDir);

echo "=== 迁移结果 ===\n";
echo "总文件数: {$results['total']}\n";
echo "成功: {$results['success']}\n";
echo "跳过: {$results['skipped']}\n";
echo "失败: {$results['failed']}\n";

// 清理
unlink("$testDir/file1.txt");
unlink("$testDir/file2.txt");
rmdir($testDir);

运行结果:

=== 迁移结果 ===
总文件数: 2
成功: 2
跳过: 0
失败: 0

9. 实战练习

9.1 基础练习:图片水印系统

解题思路

  1. 读取图片二进制数据
  2. 使用GD库添加水印
  3. 存储处理后的图片

常见误区

  • 忘记验证图片格式
  • 水印位置计算错误
  • 未保存原始图片

分步提示

  1. 创建图片上传和存储功能
  2. 实现水印添加功能
  3. 支持多种水印样式

参考代码

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class ImageWatermarkSystem
{
    private $collection;
    
    public function __construct()
    {
        $client = new Client('mongodb://localhost:27017');
        $this->collection = $client->test->watermarked_images;
    }
    
    public function addWatermark(string $imagePath, string $watermarkText): string
    {
        $imageData = file_get_contents($imagePath);
        $image = imagecreatefromstring($imageData);
        
        $width = imagesx($image);
        $height = imagesy($image);
        
        // 添加水印
        $color = imagecolorallocatealpha($image, 255, 255, 255, 60);
        $font = 5;
        $textWidth = imagefontwidth($font) * strlen($watermarkText);
        
        $x = $width - $textWidth - 10;
        $y = $height - 10;
        
        imagestring($image, $font, $x, $y, $watermarkText, $color);
        
        // 输出到缓冲区
        ob_start();
        imagepng($image);
        $watermarkedData = ob_get_clean();
        imagedestroy($image);
        
        // 存储
        $result = $this->collection->insertOne([
            'data' => new Binary($watermarkedData, Binary::TYPE_GENERIC),
            'watermark' => $watermarkText,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        return (string)$result->getInsertedId();
    }
}

// 使用示例
$system = new ImageWatermarkSystem();

// 创建测试图片
$testImage = '/tmp/test.png';
$image = imagecreatetruecolor(200, 200);
imagefill($image, 0, 0, imagecolorallocate($image, 255, 255, 255));
imagepng($image, $testImage);
imagedestroy($image);

// 添加水印
$id = $system->addWatermark($testImage, 'Copyright 2024');
echo "水印添加成功: $id\n";

unlink($testImage);

9.2 进阶练习:文件版本控制系统

解题思路

  1. 存储文件的多个版本
  2. 计算版本差异
  3. 支持版本回滚

常见误区

  • 没有存储完整的版本历史
  • 差异计算不准确
  • 回滚逻辑错误

分步提示

  1. 设计版本数据结构
  2. 实现版本比较功能
  3. 实现回滚功能

参考代码

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class FileVersionControl
{
    private $collection;
    
    public function __construct()
    {
        $client = new Client('mongodb://localhost:27017');
        $this->collection = $client->test->file_versions;
        $this->collection->createIndex(['file_id' => 1, 'version' => -1]);
    }
    
    public function saveVersion(string $fileId, string $content, string $author): int
    {
        $latest = $this->collection->findOne(
            ['file_id' => $fileId],
            ['sort' => ['version' => -1]]
        );
        
        $newVersion = $latest ? $latest['version'] + 1 : 1;
        
        $this->collection->insertOne([
            'file_id' => $fileId,
            'version' => $newVersion,
            'content' => new Binary($content, Binary::TYPE_GENERIC),
            'hash' => hash('sha256', $content),
            'author' => $author,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
        
        return $newVersion;
    }
    
    public function getVersion(string $fileId, int $version): ?string
    {
        $doc = $this->collection->findOne([
            'file_id' => $fileId,
            'version' => $version,
        ]);
        
        return $doc ? $doc['content']->getData() : null;
    }
    
    public function getHistory(string $fileId): array
    {
        return $this->collection->find(
            ['file_id' => $fileId],
            ['projection' => ['content' => 0], 'sort' => ['version' => -1]]
        )->toArray();
    }
}

// 使用示例
$vc = new FileVersionControl();

$v1 = $vc->saveVersion('doc1', '版本1内容', 'user1');
echo "保存版本: $v1\n";

$v2 = $vc->saveVersion('doc1', '版本2内容', 'user1');
echo "保存版本: $v2\n";

$content = $vc->getVersion('doc1', 1);
echo "版本1内容: $content\n";

9.3 挑战练习:分布式文件存储系统

解题思路

  1. 实现文件分片存储
  2. 支持并行上传下载
  3. 实现断点续传

常见误区

  • 分片大小不合理
  • 并发控制不当
  • 完整性验证缺失

分步提示

  1. 设计分片数据结构
  2. 实现分片上传
  3. 实现分片合并和验证

参考代码

php
<?php
require_once 'vendor/autoload.php';

use MongoDB\BSON\Binary;
use MongoDB\Client;

class DistributedFileStorage
{
    private $client;
    private $chunks;
    private $files;
    private $chunkSize = 1024 * 1024;  // 1MB per chunk
    
    public function __construct()
    {
        $this->client = new Client('mongodb://localhost:27017');
        $this->chunks = $this->client->test->file_chunks;
        $this->files = $this->client->test->file_metadata;
        
        $this->chunks->createIndex(['file_id' => 1, 'chunk_index' => 1]);
    }
    
    public function uploadChunk(string $fileId, int $chunkIndex, string $data): bool
    {
        $this->chunks->insertOne([
            'file_id' => $fileId,
            'chunk_index' => $chunkIndex,
            'data' => new Binary($data, Binary::TYPE_GENERIC),
            'hash' => hash('sha256', $data),
        ]);
        
        return true;
    }
    
    public function completeUpload(string $fileId, int $totalChunks, string $filename): void
    {
        $this->files->insertOne([
            '_id' => $fileId,
            'filename' => $filename,
            'total_chunks' => $totalChunks,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
        ]);
    }
    
    public function downloadChunk(string $fileId, int $chunkIndex): ?string
    {
        $chunk = $this->chunks->findOne([
            'file_id' => $fileId,
            'chunk_index' => $chunkIndex,
        ]);
        
        return $chunk ? $chunk['data']->getData() : null;
    }
    
    public function getFileSize(string $fileId): int
    {
        $chunks = $this->chunks->find(['file_id' => $fileId])->toArray();
        $size = 0;
        foreach ($chunks as $chunk) {
            $size += strlen($chunk['data']->getData());
        }
        return $size;
    }
}

// 使用示例
$dfs = new DistributedFileStorage();

// 上传分片
$data = str_repeat('x', 2 * 1024 * 1024);  // 2MB
$chunk1 = substr($data, 0, 1024 * 1024);
$chunk2 = substr($data, 1024 * 1024);

$dfs->uploadChunk('file1', 0, $chunk1);
$dfs->uploadChunk('file1', 1, $chunk2);
$dfs->completeUpload('file1', 2, 'test.bin');

echo "文件大小: " . $dfs->getFileSize('file1') . " 字节\n";

10. 知识点总结

10.1 核心要点

  1. 存储机制

    • Binary类型使用长度前缀存储二进制数据
    • 支持多种子类型标识数据语义
    • BSON文档大小限制为16MB
  2. PHP操作

    • 使用MongoDB\BSON\Binary类
    • 支持从字符串、Base64、十六进制创建
    • 通过getData()获取原始二进制数据
  3. 子类型选择

    • TYPE_GENERIC (0):通用二进制数据
    • TYPE_UUID (4):UUID标识符
    • TYPE_MD5 (5):MD5哈希值
    • TYPE_ENCRYPTED (6):加密数据
  4. 性能优化

    • 小于4MB使用Binary
    • 大于4MB考虑GridFS
    • 使用压缩减少存储空间
    • 通过哈希值去重

10.2 易错点回顾

  1. 忘记指定子类型

    php
    // 错误:使用默认子类型可能不符合预期
    $binary = new Binary($data);
    
    // 正确:明确指定子类型
    $binary = new Binary($data, Binary::TYPE_GENERIC);
  2. Base64编码混淆

    php
    // 错误:直接存储Base64字符串
    $binary = new Binary($base64String, Binary::TYPE_GENERIC);
    
    // 正确:先解码再存储
    $binary = new Binary(base64_decode($base64String), Binary::TYPE_GENERIC);
  3. 大小限制忽视

    php
    // 错误:存储超大文件
    $largeFile = file_get_contents('video.mp4');  // 100MB
    $binary = new Binary($largeFile, Binary::TYPE_GENERIC);
    
    // 正确:使用GridFS或分片存储
    $bucket = new GridFSBucket($manager, 'database');
    $stream = $bucket->openUploadStream('video.mp4');
    fwrite($stream, $largeFile);
  4. UUID格式错误

    php
    // 错误:直接存储UUID字符串
    $uuid = '550e8400-e29b-41d4-a716-446655440000';
    $binary = new Binary($uuid, Binary::TYPE_UUID);
    
    // 正确:转换为二进制格式
    $uuidBytes = pack('H*', str_replace('-', '', $uuid));
    $binary = new Binary($uuidBytes, Binary::TYPE_UUID);
  5. 查询条件错误

    php
    // 错误:直接使用字符串查询
    $result = $collection->findOne(['data' => $stringData]);
    
    // 正确:使用Binary对象查询
    $result = $collection->findOne(['data' => new Binary($stringData, Binary::TYPE_GENERIC)]);

11. 拓展参考资料

11.1 官方文档

11.2 学习路径

  1. 基础阶段

    • 掌握Binary类型的基本操作
    • 理解子类型的含义和选择
    • 学会Base64和十六进制编码转换
  2. 进阶阶段

    • 掌握GridFS的使用场景
    • 学习二进制数据的压缩和加密
    • 理解内容寻址存储模式
  3. 高级阶段

    • 分布式文件存储设计
    • 二进制数据的安全传输
    • 性能优化和监控

11.3 相关知识点

  • GridFS:MongoDB的大文件存储解决方案
  • BSON编码:二进制JSON格式规范
  • 内容寻址存储:通过哈希值去重的存储模式
  • 数据加密:AES-256-GCM等加密算法
  • 数字签名:RSA签名验证数据完整性