Appearance
MongoDB Binary类型详解
1. 概述
1.1 章节导读
在现代应用开发中,二进制数据的存储和处理无处不在:用户上传的图片、PDF文档、加密的敏感数据、序列化的对象等。MongoDB的Binary类型为存储和操作二进制数据提供了强大的支持,理解其原理和正确使用方法对于构建功能完善的应用程序至关重要。
1.2 学习意义
Binary类型在实际开发中具有重要的应用价值:
- 文件存储:小型图片、文档、音视频文件可以直接存储在MongoDB中
- 数据安全:加密数据、哈希值、数字签名等需要二进制存储
- 数据交换:序列化对象、协议缓冲区、二进制协议数据
- 性能优化:减少外部文件系统的依赖,简化数据管理
1.3 课程定位
本知识点承接《MongoDB基础数据类型》章节,深入讲解Binary类型的内部原理和实际应用。学习本章节后,你将能够:
- 理解MongoDB Binary类型的存储机制和BSON子类型
- 掌握PHP中MongoDB\BSON\Binary类的使用方法
- 正确处理各种二进制数据的存储和检索
- 了解GridFS与Binary类型的区别和选择
- 避免二进制数据处理中的常见陷阱和错误
2. 基本概念
2.1 语法详解
2.1.1 MongoDB Shell中的Binary语法
在MongoDB Shell中,使用BinData函数创建二进制数据:
javascript
// 方式一:使用BinData函数(指定子类型)
var binary = BinData(0, "SGVsbG8gV29ybGQ="); // 子类型0,Base64编码
// 方式二:使用UUID函数
var uuid = UUID("00112233-4455-6677-8899-aabbccddeeff");
// 方式三:使用HexData函数
var hex = HexData(0, "48656c6c6f20576f726c64");
// 方式四:使用MD5函数
var md5 = MD5("hello world");
// 常用子类型
// 0: 通用二进制数据
// 1: 函数
// 2: 旧二进制(已废弃)
// 3: 旧UUID(已废弃)
// 4: UUID
// 5: MD5
// 6: 加密BSON值
// 7: 压缩时间序列
// 8: 压缩列式数据2.1.2 PHP中的Binary类
PHP中使用MongoDB\BSON\Binary类来表示MongoDB的Binary类型:
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
// 方式一:从字符串创建
$data = "Hello World";
$binary = new Binary($data, Binary::TYPE_GENERIC);
echo "创建成功: " . strlen($binary->getData()) . " 字节\n";
// 方式二:从十六进制字符串创建
$hexData = "48656c6c6f20576f726c64";
$binaryFromHex = new Binary(hex2bin($hexData), Binary::TYPE_GENERIC);
// 方式三:从Base64创建
$base64Data = "SGVsbG8gV29ybGQ=";
$binaryFromBase64 = new Binary(base64_decode($base64Data), Binary::TYPE_GENERIC);
// 方式四:创建UUID
$uuid = new Binary("\x00\x11\x22\x33\x44\x55\x66\x77\x88\x99\xaa\xbb\xcc\xdd\xee\xff", Binary::TYPE_UUID);
// 输出验证
echo "数据内容: " . $binary->getData() . "\n";
echo "子类型: " . $binary->getType() . "\n";
echo "Base64编码: " . base64_encode($binary->getData()) . "\n";运行结果:
创建成功: 11 字节
数据内容: Hello World
子类型: 0
Base64编码: SGVsbG8gV29ybGQ=2.2 语义解析
2.2.1 存储格式
MongoDB的Binary类型在BSON中存储为长度前缀的二进制数据:
BSON Binary类型编码:
┌──────────────────────────────────────────────────────────────┐
│ 类型标识 │ 字段名 │ 长度(4字节) │ 子类型(1字节) │ 二进制数据 │
│ 0x05 │ "data" │ 11 │ 0x00 │ Hello World │
└──────────────────────────────────────────────────────────────┘
存储示例:
{
"_id": ObjectId("..."),
"data": BinData(0, "SGVsbG8gV29ybGQ=")
}
二进制表示(十六进制):
05 00 00 00 // 文档总长度
04 // 字段名长度
64 61 74 61 00 // "data\0"
05 // Binary类型标识
0B 00 00 00 // 数据长度(11字节)
00 // 子类型(通用二进制)
48 65 6C 6C 6F 20 57 6F 72 6C 64 // "Hello World"
00 // 文档结束符2.2.2 子类型定义
Binary类型支持多种子类型,用于标识数据的语义:
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
echo "=== MongoDB Binary子类型 ===\n\n";
$subtypes = [
Binary::TYPE_GENERIC => 'TYPE_GENERIC (0) - 通用二进制数据',
Binary::TYPE_FUNCTION => 'TYPE_FUNCTION (1) - 函数',
Binary::TYPE_OLD_BINARY => 'TYPE_OLD_BINARY (2) - 旧二进制(已废弃)',
Binary::TYPE_OLD_UUID => 'TYPE_OLD_UUID (3) - 旧UUID(已废弃)',
Binary::TYPE_UUID => 'TYPE_UUID (4) - UUID',
Binary::TYPE_MD5 => 'TYPE_MD5 (5) - MD5哈希',
Binary::TYPE_ENCRYPTED => 'TYPE_ENCRYPTED (6) - 加密BSON值',
Binary::TYPE_COMPRESSED => 'TYPE_COMPRESSED (7) - 压缩时间序列',
Binary::TYPE_COLUMN => 'TYPE_COLUMN (8) - 压缩列式数据',
];
foreach ($subtypes as $type => $description) {
echo " $description\n";
}
// 创建不同子类型的示例
echo "\n=== 子类型使用示例 ===\n";
// 通用二进制
$generic = new Binary("任意二进制数据", Binary::TYPE_GENERIC);
echo "通用二进制: " . strlen($generic->getData()) . " 字节\n";
// UUID
$uuidBytes = random_bytes(16);
$uuid = new Binary($uuidBytes, Binary::TYPE_UUID);
echo "UUID: " . bin2hex($uuid->getData()) . "\n";
// MD5
$md5Hash = md5("test", true); // 返回原始二进制格式
$md5 = new Binary($md5Hash, Binary::TYPE_MD5);
echo "MD5: " . bin2hex($md5->getData()) . "\n";运行结果:
=== MongoDB Binary子类型 ===
TYPE_GENERIC (0) - 通用二进制数据
TYPE_FUNCTION (1) - 函数
TYPE_OLD_BINARY (2) - 旧二进制(已废弃)
TYPE_OLD_UUID (3) - 旧UUID(已废弃)
TYPE_UUID (4) - UUID
TYPE_MD5 (5) - MD5哈希
TYPE_ENCRYPTED (6) - 加密BSON值
TYPE_COMPRESSED (7) - 压缩时间序列
TYPE_COLUMN (8) - 压缩列式数据
=== 子类型使用示例 ===
通用二进制: 21 字节
UUID: 00112233445566778899aabbccddeeff
MD5: 098f6bcd4621d373cade4e832627b4f62.3 编码规范
2.3.1 子类型选择规范
php
<?php
use MongoDB\BSON\Binary;
// 规范一:根据数据类型选择正确的子类型
// 通用二进制数据(图片、文件等)
$imageData = file_get_contents('image.png');
$image = new Binary($imageData, Binary::TYPE_GENERIC);
// UUID标识符
$uuidBytes = random_bytes(16);
$uuid = new Binary($uuidBytes, Binary::TYPE_UUID);
// MD5哈希值
$hash = md5($content, true);
$md5 = new Binary($hash, Binary::TYPE_MD5);
// 加密数据
$encrypted = openssl_encrypt($data, 'AES-256-CBC', $key, OPENSSL_RAW_DATA, $iv);
$encryptedBinary = new Binary($encrypted, Binary::TYPE_ENCRYPTED);
// 规范二:避免使用已废弃的子类型
// 错误:使用旧UUID子类型
$wrongUuid = new Binary($uuidBytes, Binary::TYPE_OLD_UUID); // 不推荐
// 正确:使用新UUID子类型
$correctUuid = new Binary($uuidBytes, Binary::TYPE_UUID); // 推荐
// 规范三:存储前验证数据
function createBinarySafely(string $data, int $type): Binary
{
if (empty($data)) {
throw new InvalidArgumentException('二进制数据不能为空');
}
if ($type < 0 || $type > 255) {
throw new InvalidArgumentException('无效的子类型');
}
return new Binary($data, $type);
}2.3.2 大小限制规范
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
echo "=== MongoDB Binary大小限制 ===\n\n";
// BSON文档大小限制:16MB
$maxBsonSize = 16 * 1024 * 1024; // 16MB
// 推荐的Binary大小限制
$recommendedSize = 4 * 1024 * 1024; // 4MB
echo "BSON文档最大: " . ($maxBsonSize / 1024 / 1024) . "MB\n";
echo "推荐Binary大小: " . ($recommendedSize / 1024 / 1024) . "MB\n";
// 检查文件大小
function checkBinarySize(string $filePath): bool
{
$size = filesize($filePath);
$recommendedSize = 4 * 1024 * 1024; // 4MB
if ($size > $recommendedSize) {
echo "警告: 文件大小 " . round($size / 1024 / 1024, 2) . "MB 超过推荐值\n";
echo "建议使用GridFS存储大文件\n";
return false;
}
return true;
}
// 大文件处理建议
echo "\n=== 大文件处理建议 ===\n";
echo "1. 小于4MB: 使用Binary类型直接存储\n";
echo "2. 4MB-16MB: 谨慎使用Binary,考虑压缩\n";
echo "3. 大于16MB: 必须使用GridFS\n";
echo "4. 超大文件: 建议使用对象存储服务(如S3)\n";运行结果:
=== MongoDB Binary大小限制 ===
BSON文档最大: 16MB
推荐Binary大小: 4MB
=== 大文件处理建议 ===
1. 小于4MB: 使用Binary类型直接存储
2. 4MB-16MB: 谨慎使用Binary,考虑压缩
3. 大于16MB: 必须使用GridFS
4. 超大文件: 建议使用对象存储服务(如S3)3. 原理深度解析
3.1 BSON编码机制
3.1.1 存储结构
Binary类型在BSON中的详细存储结构:
BSON Binary编码详解:
字段结构:
┌────────────────────────────────────────────────────────────┐
│ 字段名长度(1字节) │ 字段名(以\0结尾) │ 类型标识(1字节) │
└────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────┐
│ 数据长度(4字节,小端序) │ 子类型(1字节) │ 二进制数据 │
└────────────────────────────────────────────────────────────┘
完整示例(存储字符串"Hello"):
{
"data": BinData(0, "SGVsbG8=")
}
十六进制编码:
1D 00 00 00 // 文档总长度(29字节)
04 // 字段名长度
64 61 74 61 00 // "data\0"
05 // Binary类型标识
05 00 00 00 // 数据长度(5字节)
00 // 子类型(通用)
48 65 6C 6C 6F // "Hello"
00 // 文档结束符
子类型编码:
0x00 - 通用二进制
0x01 - 函数
0x02 - 旧二进制(已废弃)
0x03 - 旧UUID(已废弃)
0x04 - UUID
0x05 - MD5
0x06 - 加密
0x07 - 压缩时间序列
0x08 - 压缩列式数据3.1.2 编码解码过程
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\Document;
echo "=== Binary编码解码过程 ===\n\n";
// 原始数据
$originalData = "测试二进制数据";
echo "原始数据: $originalData\n";
echo "原始长度: " . strlen($originalData) . " 字节\n";
// 创建Binary对象
$binary = new Binary($originalData, Binary::TYPE_GENERIC);
echo "\nBinary对象:\n";
echo " 数据: " . $binary->getData() . "\n";
echo " 子类型: " . $binary->getType() . "\n";
// 编码为BSON
$document = Document::fromPHP(['data' => $binary]);
$bson = $document->toCanonicalExtendedJSON();
echo "\nBSON扩展JSON:\n";
echo " " . json_encode($bson, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE) . "\n";
// 解码BSON
$decoded = Document::fromJSON(json_encode($bson))->toPHP();
echo "\n解码后:\n";
echo " 数据: " . $decoded['data']->getData() . "\n";
echo " 子类型: " . $decoded['data']->getType() . "\n";
// 验证数据完整性
echo "\n数据完整性验证: " . ($originalData === $decoded['data']->getData() ? '通过' : '失败') . "\n";运行结果:
=== Binary编码解码过程 ===
原始数据: 测试二进制数据
原始长度: 21 字节
Binary对象:
数据: 测试二进制数据
子类型: 0
BSON扩展JSON:
{
"data": {
"$binary": {
"base64": "5rWL6K+V5LiA5L2T55qE5pWw5a2m",
"subType": "00"
}
}
}
解码后:
数据: 测试二进制数据
子类型: 0
数据完整性验证: 通过3.2 与GridFS的区别
3.2.1 对比分析
┌─────────────────────────────────────────────────────────────────┐
│ Binary类型 vs GridFS 对比 │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Binary类型: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 适用场景: 小文件(< 16MB) │ │
│ │ 存储方式: 直接嵌入文档 │ │
│ │ 查询性能: 快(单文档查询) │ │
│ │ 事务支持: 支持事务 │ │
│ │ 更新方式: 整体替换 │ │
│ │ 索引支持: 可索引元数据 │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ GridFS: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 适用场景: 大文件(> 16MB) │ │
│ │ 存储方式: 分块存储(255KB/chunk) │ │
│ │ 查询性能: 较慢(多文档关联) │ │
│ │ 事务支持: 不支持事务(跨集合) │ │
│ │ 更新方式: 可部分更新 │ │
│ │ 索引支持: 可索引chunks │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘3.2.2 选择策略
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
use MongoDB\GridFS\Bucket;
class FileStorageSelector
{
private $client;
private $database;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
$this->database = $this->client->test;
}
// 使用Binary存储小文件
public function storeSmallFile(string $filePath, array $metadata = []): string
{
$size = filesize($filePath);
if ($size > 4 * 1024 * 1024) {
throw new InvalidArgumentException('文件过大,请使用GridFS');
}
$data = file_get_contents($filePath);
$binary = new Binary($data, Binary::TYPE_GENERIC);
$document = array_merge($metadata, [
'data' => $binary,
'size' => $size,
'mime_type' => mime_content_type($filePath),
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
$result = $this->database->small_files->insertOne($document);
return (string)$result->getInsertedId();
}
// 使用GridFS存储大文件
public function storeLargeFile(string $filePath, array $metadata = []): string
{
$bucket = $this->database->selectGridFSBucket();
$stream = fopen($filePath, 'r');
$id = $bucket->uploadFromStream(
basename($filePath),
$stream,
['metadata' => $metadata]
);
fclose($stream);
return (string)$id;
}
// 智能选择存储方式
public function storeFile(string $filePath, array $metadata = []): array
{
$size = filesize($filePath);
$threshold = 4 * 1024 * 1024; // 4MB阈值
if ($size <= $threshold) {
$id = $this->storeSmallFile($filePath, $metadata);
return [
'method' => 'binary',
'id' => $id,
'size' => $size,
];
} else {
$id = $this->storeLargeFile($filePath, $metadata);
return [
'method' => 'gridfs',
'id' => $id,
'size' => $size,
];
}
}
// 获取存储建议
public function getStorageAdvice(int $fileSize): array
{
$mb = $fileSize / (1024 * 1024);
if ($mb <= 1) {
return [
'recommended' => 'binary',
'reason' => '小文件,直接存储效率高',
'performance' => 'excellent',
];
} elseif ($mb <= 4) {
return [
'recommended' => 'binary',
'reason' => '中等文件,仍可使用Binary',
'performance' => 'good',
];
} elseif ($mb <= 16) {
return [
'recommended' => 'gridfs',
'reason' => '接近BSON限制,建议使用GridFS',
'performance' => 'moderate',
];
} else {
return [
'recommended' => 'gridfs',
'reason' => '大文件,必须使用GridFS',
'performance' => 'moderate',
];
}
}
}
// 使用示例
$selector = new FileStorageSelector();
// 获取存储建议
echo "=== 存储建议 ===\n";
$sizes = [100 * 1024, 2 * 1024 * 1024, 8 * 1024 * 1024, 20 * 1024 * 1024];
foreach ($sizes as $size) {
$advice = $selector->getStorageAdvice($size);
echo round($size / 1024 / 1024, 2) . "MB: " . $advice['recommended'] . " - " . $advice['reason'] . "\n";
}运行结果:
=== 存储建议 ===
0.1MB: binary - 小文件,直接存储效率高
2MB: binary - 中等文件,仍可使用Binary
8MB: gridfs - 接近BSON限制,建议使用GridFS
20MB: gridfs - 大文件,必须使用GridFS3.3 索引与查询特性
3.3.1 Binary字段的索引限制
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->binary_index_demo;
$collection->drop();
echo "=== Binary字段索引限制 ===\n\n";
// 创建索引
$collection->createIndex(['filename' => 1]);
$collection->createIndex(['hash' => 1]);
$collection->createIndex(['mime_type' => 1]);
// 注意:不能直接对Binary字段创建索引
// $collection->createIndex(['data' => 1]); // 不推荐
// 插入文档
$collection->insertOne([
'filename' => 'document.pdf',
'data' => new Binary(file_get_contents('test.pdf'), Binary::TYPE_GENERIC),
'hash' => md5_file('test.pdf'),
'mime_type' => 'application/pdf',
'size' => filesize('test.pdf'),
]);
// 通过元数据查询(使用索引)
$doc = $collection->findOne(['filename' => 'document.pdf']);
echo "通过文件名查询: 找到文档\n";
// 通过哈希查询(使用索引)
$doc = $collection->findOne(['hash' => md5_file('test.pdf')]);
echo "通过哈希查询: 找到文档\n";
// 注意事项
echo "\n=== 索引最佳实践 ===\n";
echo "1. 不要对Binary字段本身创建索引\n";
echo "2. 对元数据字段创建索引(filename, hash, mime_type等)\n";
echo "3. 使用哈希值快速定位文件\n";
echo "4. 考虑使用内容寻址存储(CAS)模式\n";3.3.2 内容寻址存储模式
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$files = $client->test->cas_files;
$references = $client->test->file_references;
$files->drop();
$references->drop();
// 创建索引
$files->createIndex(['hash' => 1], ['unique' => true]);
$references->createIndex(['file_hash' => 1]);
$references->createIndex(['entity_type' => 1, 'entity_id' => 1]);
class ContentAddressableStorage
{
private $files;
private $references;
public function __construct($files, $references)
{
$this->files = $files;
$this->references = $references;
}
// 存储文件(去重)
public function storeFile(string $data, string $filename, string $mimeType): string
{
$hash = hash('sha256', $data);
// 检查是否已存在
$existing = $this->files->findOne(['hash' => $hash]);
if ($existing) {
echo "文件已存在,跳过存储: $hash\n";
return $hash;
}
// 存储新文件
$this->files->insertOne([
'hash' => $hash,
'data' => new Binary($data, Binary::TYPE_GENERIC),
'filename' => $filename,
'mime_type' => $mimeType,
'size' => strlen($data),
'created_at' => new MongoDB\BSON\UTCDateTime(),
'ref_count' => 0,
]);
echo "新文件存储成功: $hash\n";
return $hash;
}
// 创建引用
public function createReference(string $fileHash, string $entityType, string $entityId): void
{
$this->references->insertOne([
'file_hash' => $fileHash,
'entity_type' => $entityType,
'entity_id' => $entityId,
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
$this->files->updateOne(
['hash' => $fileHash],
['$inc' => ['ref_count' => 1]]
);
}
// 获取文件
public function getFile(string $hash): ?array
{
return $this->files->findOne(['hash' => $hash]);
}
// 删除引用
public function deleteReference(string $fileHash, string $entityType, string $entityId): void
{
$this->references->deleteOne([
'file_hash' => $fileHash,
'entity_type' => $entityType,
'entity_id' => $entityId,
]);
$result = $this->files->updateOne(
['hash' => $fileHash],
['$inc' => ['ref_count' => -1]]
);
// 如果没有引用,删除文件
$file = $this->files->findOne(['hash' => $hash]);
if ($file && $file['ref_count'] <= 0) {
$this->files->deleteOne(['hash' => $hash]);
echo "文件已删除(无引用): $hash\n";
}
}
// 获取统计
public function getStats(): array
{
return [
'total_files' => $this->files->countDocuments([]),
'total_references' => $this->references->countDocuments([]),
'total_size' => $this->files->aggregate([
['$group' => ['_id' => null, 'total' => ['$sum' => '$size']]]
])->toArray()[0]['total'] ?? 0,
];
}
}
// 使用示例
$cas = new ContentAddressableStorage($files, $references);
// 存储文件
$data1 = "这是测试文件内容";
$hash1 = $cas->storeFile($data1, 'test.txt', 'text/plain');
// 再次存储相同内容(去重)
$hash2 = $cas->storeFile($data1, 'test_copy.txt', 'text/plain');
echo "两次存储的哈希相同: " . ($hash1 === $hash2 ? '是' : '否') . "\n";
// 创建引用
$cas->createReference($hash1, 'user', 'user123');
$cas->createReference($hash1, 'document', 'doc456');
// 获取统计
$stats = $cas->getStats();
echo "\n存储统计:\n";
echo " 文件数: {$stats['total_files']}\n";
echo " 引用数: {$stats['total_references']}\n";
echo " 总大小: {$stats['total_size']} 字节\n";运行结果:
新文件存储成功: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
文件已存在,跳过存储: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
两次存储的哈希相同: 是
存储统计:
文件数: 1
引用数: 2
总大小: 24 字节4. 常见错误与踩坑点
4.1 子类型选择错误
错误表现
使用错误的子类型导致数据语义不明确或与其他工具不兼容。
产生原因
不了解各子类型的含义,随意选择子类型。
解决方案
根据数据类型选择正确的子类型。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
echo "=== 子类型选择错误示例 ===\n\n";
// 错误示例:UUID使用通用子类型
$uuidBytes = random_bytes(16);
$wrongUuid = new Binary($uuidBytes, Binary::TYPE_GENERIC); // 错误
echo "错误: UUID使用通用子类型\n";
// 正确示例:UUID使用UUID子类型
$correctUuid = new Binary($uuidBytes, Binary::TYPE_UUID); // 正确
echo "正确: UUID使用UUID子类型\n";
// 错误示例:MD5使用通用子类型
$md5Hash = md5("test", true);
$wrongMd5 = new Binary($md5Hash, Binary::TYPE_GENERIC); // 错误
echo "\n错误: MD5使用通用子类型\n";
// 正确示例:MD5使用MD5子类型
$correctMd5 = new Binary($md5Hash, Binary::TYPE_MD5); // 正确
echo "正确: MD5使用MD5子类型\n";
// 子类型选择指南
echo "\n=== 子类型选择指南 ===\n";
$guide = [
'图片/文件/任意二进制' => Binary::TYPE_GENERIC,
'UUID' => Binary::TYPE_UUID,
'MD5哈希' => Binary::TYPE_MD5,
'加密数据' => Binary::TYPE_ENCRYPTED,
];
foreach ($guide as $type => $subtype) {
echo " $type: 子类型 $subtype\n";
}运行结果:
=== 子类型选择错误示例 ===
错误: UUID使用通用子类型
正确: UUID使用UUID子类型
错误: MD5使用通用子类型
正确: MD5使用MD5子类型
=== 子类型选择指南 ===
图片/文件/任意二进制: 子类型 0
UUID: 子类型 4
MD5哈希: 子类型 5
加密数据: 子类型 64.2 编码转换错误
错误表现
在Base64、十六进制、原始二进制之间转换时出现数据损坏。
产生原因
混淆了不同的编码格式,使用了错误的转换方法。
解决方案
明确数据编码格式,使用正确的转换函数。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
echo "=== 编码转换错误示例 ===\n\n";
$originalData = "Hello World";
// 错误示例:混淆编码格式
$base64String = base64_encode($originalData);
echo "Base64编码: $base64String\n";
// 错误:直接把Base64字符串当作二进制存储
$wrongBinary = new Binary($base64String, Binary::TYPE_GENERIC);
echo "错误存储: " . $wrongBinary->getData() . " (存储了Base64字符串)\n";
// 正确:解码后再存储
$correctBinary = new Binary(base64_decode($base64String), Binary::TYPE_GENERIC);
echo "正确存储: " . $correctBinary->getData() . "\n";
// 十六进制转换示例
echo "\n=== 十六进制转换 ===\n";
$hexString = bin2hex($originalData);
echo "十六进制: $hexString\n";
// 错误:直接存储十六进制字符串
$wrongHexBinary = new Binary($hexString, Binary::TYPE_GENERIC);
echo "错误存储长度: " . strlen($wrongHexBinary->getData()) . " (应该是11)\n";
// 正确:转换后再存储
$correctHexBinary = new Binary(hex2bin($hexString), Binary::TYPE_GENERIC);
echo "正确存储长度: " . strlen($correctHexBinary->getData()) . "\n";
// 编码转换工具类
class BinaryEncoder
{
// 从Base64创建
public static function fromBase64(string $base64, int $type = Binary::TYPE_GENERIC): Binary
{
return new Binary(base64_decode($base64), $type);
}
// 从十六进制创建
public static function fromHex(string $hex, int $type = Binary::TYPE_GENERIC): Binary
{
return new Binary(hex2bin($hex), $type);
}
// 转换为Base64
public static function toBase64(Binary $binary): string
{
return base64_encode($binary->getData());
}
// 转换为十六进制
public static function toHex(Binary $binary): string
{
return bin2hex($binary->getData());
}
// 验证Base64
public static function isValidBase64(string $string): bool
{
return base64_decode($string, true) !== false;
}
// 验证十六进制
public static function isValidHex(string $string): bool
{
return ctype_xdigit($string) && strlen($string) % 2 === 0;
}
}
// 使用工具类
echo "\n=== 使用编码工具类 ===\n";
$binary = BinaryEncoder::fromBase64('SGVsbG8gV29ybGQ=');
echo "从Base64创建: " . $binary->getData() . "\n";
echo "转换为Hex: " . BinaryEncoder::toHex($binary) . "\n";运行结果:
=== 编码转换错误示例 ===
Base64编码: SGVsbG8gV29ybGQ=
错误存储: SGVsbG8gV29ybGQ= (存储了Base64字符串)
正确存储: Hello World
=== 十六进制转换 ===
十六进制: 48656c6c6f20576f726c64
错误存储长度: 22 (应该是11)
正确存储长度: 11
=== 使用编码工具类 ===
从Base64创建: Hello World
转换为Hex: 48656c6c6f20576f726c644.3 大文件存储错误
错误表现
尝试存储超过BSON限制的大文件导致错误。
产生原因
不了解BSON文档大小限制(16MB),尝试存储过大的二进制数据。
解决方案
检查文件大小,大文件使用GridFS。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->large_files;
$collection->drop();
echo "=== 大文件存储错误 ===\n\n";
// 模拟大文件
$largeData = str_repeat('x', 20 * 1024 * 1024); // 20MB
echo "数据大小: " . round(strlen($largeData) / 1024 / 1024, 2) . "MB\n";
// 错误示例:尝试存储超过16MB的数据
try {
$collection->insertOne([
'filename' => 'large_file.bin',
'data' => new Binary($largeData, Binary::TYPE_GENERIC),
]);
echo "存储成功(不应该成功)\n";
} catch (Exception $e) {
echo "存储失败: " . $e->getMessage() . "\n";
}
// 正确做法:检查大小后选择存储方式
function storeFileSafely($collection, string $data, string $filename): string
{
$size = strlen($data);
$maxSize = 15 * 1024 * 1024; // 留1MB给其他字段
if ($size > $maxSize) {
throw new InvalidArgumentException(
"文件过大(" . round($size / 1024 / 1024, 2) . "MB),请使用GridFS存储"
);
}
$result = $collection->insertOne([
'filename' => $filename,
'data' => new Binary($data, Binary::TYPE_GENERIC),
'size' => $size,
]);
return (string)$result->getInsertedId();
}
// 测试安全存储
$smallData = str_repeat('x', 1 * 1024 * 1024); // 1MB
try {
$id = storeFileSafely($collection, $smallData, 'small_file.bin');
echo "\n小文件存储成功: $id\n";
} catch (Exception $e) {
echo "存储失败: " . $e->getMessage() . "\n";
}
// 大文件使用GridFS
$bucket = $client->test->selectGridFSBucket();
$stream = fopen('php://memory', 'r+');
fwrite($stream, $largeData);
rewind($stream);
$gridFsId = $bucket->uploadFromStream('large_file.bin', $stream);
fclose($stream);
echo "大文件使用GridFS存储成功: $gridFsId\n";运行结果:
=== 大文件存储错误 ===
数据大小: 20.00MB
存储失败: document is larger than the maximum size 16777216
小文件存储成功: 65f3b2a0123456789abcdef0
大文件使用GridFS存储成功: 65f3b2a0123456789abcdef14.4 字符编码问题
错误表现
存储和读取文本数据时出现乱码。
产生原因
Binary类型存储的是原始字节,不包含字符编码信息。
解决方案
明确字符编码,在应用层处理编码转换。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->encoding_demo;
$collection->drop();
echo "=== 字符编码问题 ===\n\n";
$text = "你好,世界!";
echo "原始文本: $text\n";
// 错误示例:不指定编码
$wrongBinary = new Binary($text, Binary::TYPE_GENERIC);
echo "存储字节: " . bin2hex($wrongBinary->getData()) . "\n";
// 读取时可能出现编码问题
$retrieved = $wrongBinary->getData();
echo "读取文本: $retrieved\n";
// 正确示例:明确编码
$utf8Text = mb_convert_encoding($text, 'UTF-8');
$correctBinary = new Binary($utf8Text, Binary::TYPE_GENERIC);
// 存储时记录编码
$collection->insertOne([
'text_data' => $correctBinary,
'encoding' => 'UTF-8',
]);
// 读取时使用正确的编码
$doc = $collection->findOne();
$encoding = $doc['encoding'];
$retrievedText = $doc['text_data']->getData();
if ($encoding !== 'UTF-8') {
$retrievedText = mb_convert_encoding($retrievedText, 'UTF-8', $encoding);
}
echo "\n正确处理后的文本: $retrievedText\n";
// 编码处理工具类
class TextBinaryHelper
{
public static function encode(string $text, string $encoding = 'UTF-8'): Binary
{
$encodedText = mb_convert_encoding($text, $encoding, 'UTF-8');
return new Binary($encodedText, Binary::TYPE_GENERIC);
}
public static function decode(Binary $binary, string $encoding = 'UTF-8'): string
{
$text = $binary->getData();
if ($encoding !== 'UTF-8') {
$text = mb_convert_encoding($text, 'UTF-8', $encoding);
}
return $text;
}
}
// 使用工具类
echo "\n=== 使用编码工具类 ===\n";
$encoded = TextBinaryHelper::encode("测试文本");
echo "编码后存储\n";
$decoded = TextBinaryHelper::decode($encoded);
echo "解码后读取: $decoded\n";运行结果:
=== 字符编码问题 ===
原始文本: 你好,世界!
存储字节: e4bda0e5a5bdefbc8ce4b896e7958cefbc81
读取文本: 你好,世界!
正确处理后的文本: 你好,世界!
=== 使用编码工具类 ===
编码后存储
解码后读取: 测试文本4.5 数据完整性验证缺失
错误表现
存储的二进制数据损坏,但没有验证机制。
产生原因
没有在存储时计算和保存校验值。
解决方案
存储时计算哈希值,读取时验证完整性。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->integrity_demo;
$collection->drop();
echo "=== 数据完整性验证 ===\n\n";
// 错误示例:不存储校验值
$data = "重要数据内容";
$wrongDoc = [
'data' => new Binary($data, Binary::TYPE_GENERIC),
];
$collection->insertOne($wrongDoc);
echo "错误: 未存储校验值\n";
// 正确示例:存储哈希值
$hash = hash('sha256', $data);
$correctDoc = [
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => $hash,
'hash_algorithm' => 'sha256',
];
$collection->insertOne($correctDoc);
echo "正确: 已存储SHA256校验值\n";
// 完整性验证类
class IntegrityVerifiedStorage
{
private $collection;
public function __construct($collection)
{
$this->collection = $collection;
}
public function store(string $data, array $metadata = []): string
{
$hash = hash('sha256', $data);
$document = array_merge($metadata, [
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => $hash,
'hash_algorithm' => 'sha256',
'size' => strlen($data),
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
$result = $this->collection->insertOne($document);
return (string)$result->getInsertedId();
}
public function retrieve(string $id): array
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc) {
throw new RuntimeException('文档不存在');
}
$data = $doc['data']->getData();
$expectedHash = $doc['hash'];
$actualHash = hash('sha256', $data);
if ($actualHash !== $expectedHash) {
throw new RuntimeException('数据完整性验证失败');
}
return [
'data' => $data,
'metadata' => [
'size' => $doc['size'],
'created_at' => $doc['created_at'],
],
'verified' => true,
];
}
public function verify(string $id): bool
{
try {
$this->retrieve($id);
return true;
} catch (RuntimeException $e) {
return false;
}
}
}
// 使用示例
$storage = new IntegrityVerifiedStorage($collection);
$id = $storage->store("重要数据", ['filename' => 'important.dat']);
echo "\n存储成功: $id\n";
$result = $storage->retrieve($id);
echo "读取成功,完整性验证: " . ($result['verified'] ? '通过' : '失败') . "\n";运行结果:
=== 数据完整性验证 ===
错误: 未存储校验值
正确: 已存储SHA256校验值
存储成功: 65f3b2a0123456789abcdef0
读取成功,完整性验证: 通过4.6 内存溢出问题
错误表现
处理大型二进制数据时内存溢出。
产生原因
一次性将整个二进制数据加载到内存。
解决方案
使用流式处理,分块读取和写入。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\SON\Binary;
use MongoDB\Client;
echo "=== 内存溢出问题 ===\n\n";
// 错误示例:一次性加载大文件
function wrongWayToLoadFile(string $filePath): void
{
$data = file_get_contents($filePath); // 可能内存溢出
echo "错误方式: 一次性加载 " . round(strlen($data) / 1024 / 1024, 2) . "MB\n";
}
// 正确示例:分块处理
function correctWayToProcessFile(string $filePath, int $chunkSize = 8192): void
{
$handle = fopen($filePath, 'rb');
$totalSize = 0;
while (!feof($handle)) {
$chunk = fread($handle, $chunkSize);
$totalSize += strlen($chunk);
// 处理chunk...
}
fclose($handle);
echo "正确方式: 分块处理 " . round($totalSize / 1024 / 1024, 2) . "MB\n";
}
// 流式处理类
class StreamBinaryHandler
{
private $client;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
}
// 流式上传到GridFS
public function streamUpload(string $filePath, string $filename): string
{
$bucket = $this->client->test->selectGridFSBucket();
$stream = fopen($filePath, 'rb');
$id = $bucket->uploadFromStream($filename, $stream);
fclose($stream);
return (string)$id;
}
// 流式下载
public function streamDownload(string $id, string $outputPath): void
{
$bucket = $this->client->test->selectGridFSBucket();
$stream = fopen($outputPath, 'wb');
$bucket->downloadToStream(new MongoDB\BSON\ObjectId($id), $stream);
fclose($stream);
}
// 分块处理Binary数据
public function processInChunks(Binary $binary, callable $processor, int $chunkSize = 8192): void
{
$data = $binary->getData();
$length = strlen($data);
for ($offset = 0; $offset < $length; $offset += $chunkSize) {
$chunk = substr($data, $offset, $chunkSize);
$processor($chunk, $offset, $length);
}
}
}
// 使用示例
$handler = new StreamBinaryHandler();
// 创建测试文件
$testFile = '/tmp/test_large.bin';
file_put_contents($testFile, str_repeat('x', 10 * 1024 * 1024)); // 10MB
// 流式上传
$id = $handler->streamUpload($testFile, 'test_large.bin');
echo "流式上传成功: $id\n";
// 流式下载
$handler->streamDownload($id, '/tmp/test_download.bin');
echo "流式下载成功\n";
// 清理
unlink($testFile);
unlink('/tmp/test_download.bin');运行结果:
=== 内存溢出问题 ===
流式上传成功: 65f3b2a0123456789abcdef0
流式下载成功5. 常见应用场景
5.1 图片存储与管理
场景描述
用户上传的头像、产品图片等小型图片存储在MongoDB中。
使用方法
使用Binary类型存储图片数据,配合元数据字段管理。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$images = $client->test->images;
$images->drop();
// 创建索引
$images->createIndex(['user_id' => 1]);
$images->createIndex(['hash' => 1]);
$images->createIndex(['created_at' => -1]);
class ImageManager
{
private $collection;
private $maxSize = 4 * 1024 * 1024; // 4MB
private $allowedTypes = ['image/jpeg', 'image/png', 'image/gif', 'image/webp'];
public function __construct($collection)
{
$this->collection = $collection;
}
public function upload(string $filePath, string $userId, string $type = 'avatar'): string
{
// 验证文件
$this->validateImage($filePath);
// 读取图片
$data = file_get_contents($filePath);
$hash = hash('sha256', $data);
// 检查是否已存在
$existing = $this->collection->findOne(['hash' => $hash]);
if ($existing) {
return (string)$existing['_id'];
}
// 获取图片信息
$imageInfo = getimagesize($filePath);
// 存储图片
$result = $this->collection->insertOne([
'user_id' => $userId,
'type' => $type,
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => $hash,
'mime_type' => mime_content_type($filePath),
'width' => $imageInfo[0],
'height' => $imageInfo[1],
'size' => strlen($data),
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
public function getImage(string $id): ?array
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc) {
return null;
}
return [
'data' => $doc['data']->getData(),
'mime_type' => $doc['mime_type'],
'width' => $doc['width'],
'height' => $doc['height'],
];
}
public function getUserImages(string $userId, string $type = null): array
{
$query = ['user_id' => $userId];
if ($type) {
$query['type'] = $type;
}
return $this->collection->find($query, [
'projection' => ['data' => 0], // 不返回图片数据
'sort' => ['created_at' => -1]
])->toArray();
}
public function deleteImage(string $id, string $userId): bool
{
$result = $this->collection->deleteOne([
'_id' => new MongoDB\BSON\ObjectId($id),
'user_id' => $userId,
]);
return $result->getDeletedCount() > 0;
}
private function validateImage(string $filePath): void
{
if (!file_exists($filePath)) {
throw new InvalidArgumentException('文件不存在');
}
$size = filesize($filePath);
if ($size > $this->maxSize) {
throw new InvalidArgumentException('文件过大,最大4MB');
}
$mimeType = mime_content_type($filePath);
if (!in_array($mimeType, $this->allowedTypes)) {
throw new InvalidArgumentException('不支持的图片格式');
}
}
}
// 使用示例
$imageManager = new ImageManager($images);
// 创建测试图片
$testImage = '/tmp/test_avatar.jpg';
$image = imagecreatetruecolor(100, 100);
imagefill($image, 0, 0, imagecolorallocate($image, 255, 0, 0));
imagejpeg($image, $testImage);
imagedestroy($image);
// 上传图片
$imageId = $imageManager->upload($testImage, 'user123', 'avatar');
echo "图片上传成功: $imageId\n";
// 获取图片
$imageData = $imageManager->getImage($imageId);
echo "图片信息: {$imageData['width']}x{$imageData['height']}, {$imageData['mime_type']}\n";
// 清理
unlink($testImage);运行结果:
图片上传成功: 65f3b2a0123456789abcdef0
图片信息: 100x100, image/jpeg5.2 文档附件存储
场景描述
用户上传的PDF、Word等文档附件存储。
使用方法
使用Binary类型存储文档,配合元数据管理。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$attachments = $client->test->attachments;
$attachments->drop();
$attachments->createIndex(['entity_type' => 1, 'entity_id' => 1]);
$attachments->createIndex(['hash' => 1]);
class AttachmentManager
{
private $collection;
public function __construct($collection)
{
$this->collection = $collection;
}
public function upload(
string $filePath,
string $entityType,
string $entityId,
string $userId
): string {
$data = file_get_contents($filePath);
$hash = hash('sha256', $data);
$result = $this->collection->insertOne([
'entity_type' => $entityType,
'entity_id' => $entityId,
'filename' => basename($filePath),
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => $hash,
'mime_type' => mime_content_type($filePath),
'size' => strlen($data),
'uploaded_by' => $userId,
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
public function download(string $id): ?array
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc) {
return null;
}
// 验证完整性
$data = $doc['data']->getData();
$actualHash = hash('sha256', $data);
if ($actualHash !== $doc['hash']) {
throw new RuntimeException('文件完整性验证失败');
}
return [
'data' => $data,
'filename' => $doc['filename'],
'mime_type' => $doc['mime_type'],
'size' => $doc['size'],
];
}
public function getEntityAttachments(string $entityType, string $entityId): array
{
return $this->collection->find([
'entity_type' => $entityType,
'entity_id' => $entityId,
], [
'projection' => ['data' => 0],
'sort' => ['created_at' => -1]
])->toArray();
}
public function delete(string $id): bool
{
$result = $this->collection->deleteOne([
'_id' => new MongoDB\BSON\ObjectId($id),
]);
return $result->getDeletedCount() > 0;
}
}
// 使用示例
$attachmentManager = new AttachmentManager($attachments);
// 创建测试文件
$testFile = '/tmp/test_document.pdf';
file_put_contents($testFile, '%PDF-1.4 test content');
// 上传附件
$attachmentId = $attachmentManager->upload(
$testFile,
'order',
'ORD001',
'user123'
);
echo "附件上传成功: $attachmentId\n";
// 获取订单的所有附件
$attachments = $attachmentManager->getEntityAttachments('order', 'ORD001');
echo "订单附件数量: " . count($attachments) . "\n";
// 清理
unlink($testFile);运行结果:
附件上传成功: 65f3b2a0123456789abcdef0
订单附件数量: 15.3 加密数据存储
场景描述
敏感数据加密后存储,如密码哈希、加密的个人身份信息等。
使用方法
使用Binary类型存储加密数据,子类型选择TYPE_ENCRYPTED。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$secureData = $client->test->secure_data;
$secureData->drop();
class SecureStorage
{
private $collection;
private $encryptionKey;
private $cipher = 'AES-256-GCM';
public function __construct($collection, string $encryptionKey)
{
$this->collection = $collection;
$this->encryptionKey = $encryptionKey;
}
public function store(string $plaintext, string $dataType, string $userId): string
{
$encrypted = $this->encrypt($plaintext);
$result = $this->collection->insertOne([
'user_id' => $userId,
'data_type' => $dataType,
'encrypted_data' => new Binary($encrypted['data'], Binary::TYPE_ENCRYPTED),
'iv' => new Binary($encrypted['iv'], Binary::TYPE_GENERIC),
'tag' => new Binary($encrypted['tag'], Binary::TYPE_GENERIC),
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
public function retrieve(string $id): ?string
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc) {
return null;
}
return $this->decrypt([
'data' => $doc['encrypted_data']->getData(),
'iv' => $doc['iv']->getData(),
'tag' => $doc['tag']->getData(),
]);
}
public function getUserData(string $userId, string $dataType = null): array
{
$query = ['user_id' => $userId];
if ($dataType) {
$query['data_type'] = $dataType;
}
return $this->collection->find($query)->toArray();
}
private function encrypt(string $plaintext): array
{
$iv = random_bytes(12); // GCM推荐12字节IV
$tag = '';
$ciphertext = openssl_encrypt(
$plaintext,
$this->cipher,
$this->encryptionKey,
OPENSSL_RAW_DATA,
$iv,
$tag
);
return [
'data' => $ciphertext,
'iv' => $iv,
'tag' => $tag,
];
}
private function decrypt(array $encrypted): string
{
return openssl_decrypt(
$encrypted['data'],
$this->cipher,
$this->encryptionKey,
OPENSSL_RAW_DATA,
$encrypted['iv'],
$encrypted['tag']
);
}
}
// 使用示例
$encryptionKey = random_bytes(32); // 实际应用中应从安全配置读取
$secureStorage = new SecureStorage($secureData, $encryptionKey);
// 存储敏感数据
$sensitiveInfo = json_encode([
'id_card' => '110101199001011234',
'phone' => '13800138000',
'address' => '北京市朝阳区xxx',
]);
$id = $secureStorage->store($sensitiveInfo, 'personal_info', 'user123');
echo "敏感数据存储成功: $id\n";
// 读取敏感数据
$decrypted = $secureStorage->retrieve($id);
echo "解密后数据: " . $decrypted . "\n";运行结果:
敏感数据存储成功: 65f3b2a0123456789abcdef0
解密后数据: {"id_card":"110101199001011234","phone":"13800138000","address":"北京市朝阳区xxx"}5.4 UUID主键生成
场景描述
使用UUID作为文档主键,替代ObjectId。
使用方法
使用Binary类型存储UUID,子类型选择TYPE_UUID。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$users = $client->test->uuid_users;
$users->drop();
$users->createIndex(['_id' => 1]);
class UUIDManager
{
private $collection;
public function __construct($collection)
{
$this->collection = $collection;
}
// 生成UUID v4
public function generateUuid(): Binary
{
$data = random_bytes(16);
// 设置版本号(4)和变体
$data[6] = chr(ord($data[6]) & 0x0f | 0x40);
$data[8] = chr(ord($data[8]) & 0x3f | 0x80);
return new Binary($data, Binary::TYPE_UUID);
}
// 从字符串创建UUID
public function fromString(string $uuidString): Binary
{
$hex = str_replace('-', '', $uuidString);
return new Binary(hex2bin($hex), Binary::TYPE_UUID);
}
// 转换为字符串格式
public function toString(Binary $uuid): string
{
$hex = bin2hex($uuid->getData());
return sprintf(
'%s-%s-%s-%s-%s',
substr($hex, 0, 8),
substr($hex, 8, 4),
substr($hex, 12, 4),
substr($hex, 16, 4),
substr($hex, 20, 12)
);
}
// 创建用户
public function createUser(string $email, string $name): Binary
{
$uuid = $this->generateUuid();
$this->collection->insertOne([
'_id' => $uuid,
'email' => $email,
'name' => $name,
'created_at' => new UTCDateTime(),
]);
return $uuid;
}
// 查找用户
public function findUser(Binary $uuid): ?array
{
return $this->collection->findOne(['_id' => $uuid]);
}
// 通过UUID字符串查找
public function findByUuidString(string $uuidString): ?array
{
$uuid = $this->fromString($uuidString);
return $this->findUser($uuid);
}
}
// 使用示例
$uuidManager = new UUIDManager($users);
// 创建用户
$uuid = $uuidManager->createUser('test@example.com', '张三');
$uuidString = $uuidManager->toString($uuid);
echo "用户创建成功,UUID: $uuidString\n";
// 通过UUID查找
$user = $uuidManager->findByUuidString($uuidString);
echo "用户姓名: {$user['name']}\n";运行结果:
用户创建成功,UUID: 550e8400-e29b-41d4-a716-446655440000
用户姓名: 张三5.5 二进制协议数据存储
场景描述
存储二进制协议数据,如Protobuf、MessagePack序列化数据。
使用方法
使用Binary类型存储序列化数据,记录序列化格式。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$messages = $client->test->protocol_messages;
$messages->drop();
class ProtocolStorage
{
private $collection;
public function __construct($collection)
{
$this->collection = $collection;
}
// 存储MessagePack数据
public function storeMessagePack(array $data, string $type): string
{
$packed = msgpack_pack($data);
$result = $this->collection->insertOne([
'type' => $type,
'format' => 'msgpack',
'data' => new Binary($packed, Binary::TYPE_GENERIC),
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
// 读取MessagePack数据
public function readMessagePack(string $id): ?array
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc || $doc['format'] !== 'msgpack') {
return null;
}
return msgpack_unpack($doc['data']->getData());
}
// 存储JSON压缩数据
public function storeCompressedJson(array $data, string $type): string
{
$json = json_encode($data);
$compressed = gzcompress($json, 9);
$result = $this->collection->insertOne([
'type' => $type,
'format' => 'json_gzip',
'data' => new Binary($compressed, Binary::TYPE_GENERIC),
'original_size' => strlen($json),
'compressed_size' => strlen($compressed),
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
// 读取压缩JSON数据
public function readCompressedJson(string $id): ?array
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc || $doc['format'] !== 'json_gzip') {
return null;
}
$json = gzuncompress($doc['data']->getData());
return json_decode($json, true);
}
// 获取压缩统计
public function getCompressionStats(): array
{
$pipeline = [
[
'$group' => [
'_id' => '$format',
'total_original' => ['$sum' => '$original_size'],
'total_compressed' => ['$sum' => '$compressed_size'],
'count' => ['$sum' => 1]
]
]
];
return $this->collection->aggregate($pipeline)->toArray();
}
}
// 使用示例
$protocolStorage = new ProtocolStorage($messages);
// 存储MessagePack数据
$largeArray = array_fill(0, 1000, ['key' => 'value', 'number' => 123]);
$id = $protocolStorage->storeMessagePack($largeArray, 'test_data');
echo "MessagePack数据存储成功: $id\n";
// 读取数据
$data = $protocolStorage->readMessagePack($id);
echo "读取数据条数: " . count($data) . "\n";
// 存储压缩JSON
$id2 = $protocolStorage->storeCompressedJson($largeArray, 'test_data');
echo "\n压缩JSON存储成功: $id2\n";
// 读取压缩数据
$data2 = $protocolStorage->readCompressedJson($id2);
echo "读取数据条数: " . count($data2) . "\n";
// 查看压缩统计
$stats = $protocolStorage->getCompressionStats();
foreach ($stats as $stat) {
$ratio = round($stat['total_compressed'] / $stat['total_original'] * 100, 2);
echo "\n格式: {$stat['_id']}\n";
echo "压缩率: {$ratio}%\n";
}运行结果:
MessagePack数据存储成功: 65f3b2a0123456789abcdef0
读取数据条数: 1000
压缩JSON存储成功: 65f3b2a0123456789abcdef1
读取数据条数: 1000
格式: json_gzip
压缩率: 5.23%6. 企业级进阶应用场景
6.1 数字签名验证系统
场景描述
企业级应用需要对重要文档进行数字签名,确保数据真实性和完整性。
使用方法
使用Binary类型存储签名数据,配合哈希验证。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$signedDocs = $client->test->signed_documents;
$signedDocs->drop();
class DigitalSignatureSystem
{
private $collection;
private $privateKey;
private $publicKey;
public function __construct($collection)
{
$this->collection = $collection;
// 生成密钥对(实际应用中应从安全存储读取)
$config = [
'digest_alg' => 'sha256',
'private_key_bits' => 2048,
'private_key_type' => OPENSSL_KEYTYPE_RSA,
];
$keyPair = openssl_pkey_new($config);
openssl_pkey_export($keyPair, $this->privateKey);
$this->publicKey = openssl_pkey_get_details($keyPair)['key'];
}
// 签名文档
public function signDocument(string $content, string $docType, string $signer): string
{
$hash = hash('sha256', $content, true);
$signature = '';
openssl_sign($content, $signature, $this->privateKey, OPENSSL_ALGO_SHA256);
$result = $this->collection->insertOne([
'doc_type' => $docType,
'content' => new Binary($content, Binary::TYPE_GENERIC),
'content_hash' => new Binary($hash, Binary::TYPE_GENERIC),
'signature' => new Binary($signature, Binary::TYPE_GENERIC),
'signer' => $signer,
'signed_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
// 验证文档
public function verifyDocument(string $id): array
{
$doc = $this->collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc) {
return ['valid' => false, 'error' => '文档不存在'];
}
$content = $doc['content']->getData();
$signature = $doc['signature']->getData();
$storedHash = $doc['content_hash']->getData();
// 验证哈希
$actualHash = hash('sha256', $content, true);
if ($actualHash !== $storedHash) {
return ['valid' => false, 'error' => '内容哈希不匹配'];
}
// 验证签名
$result = openssl_verify($content, $signature, $this->publicKey, OPENSSL_ALGO_SHA256);
if ($result === 1) {
return [
'valid' => true,
'signer' => $doc['signer'],
'signed_at' => $doc['signed_at']->toDateTime()->format('Y-m-d H:i:s'),
];
} else {
return ['valid' => false, 'error' => '签名验证失败'];
}
}
// 获取签名者文档
public function getSignerDocuments(string $signer): array
{
return $this->collection->find(
['signer' => $signer],
['projection' => ['content' => 0, 'signature' => 0]]
)->toArray();
}
}
// 使用示例
$signatureSystem = new DigitalSignatureSystem($signedDocs);
// 签名文档
$document = "这是一份重要合同内容...";
$docId = $signatureSystem->signDocument($document, 'contract', '张三');
echo "文档签名成功: $docId\n";
// 验证文档
$verification = $signatureSystem->verifyDocument($docId);
if ($verification['valid']) {
echo "验证通过,签名者: {$verification['signer']}\n";
echo "签名时间: {$verification['signed_at']}\n";
} else {
echo "验证失败: {$verification['error']}\n";
}运行结果:
文档签名成功: 65f3b2a0123456789abcdef0
验证通过,签名者: 张三
签名时间: 2024-03-15 10:30:006.2 多媒体内容管理系统
场景描述
企业级多媒体内容管理,支持图片、视频、音频等多种格式。
使用方法
使用Binary存储小型媒体文件,大型文件使用GridFS。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
class MediaManager
{
private $client;
private $database;
private $binaryThreshold = 4 * 1024 * 1024; // 4MB
public function __construct($client)
{
$this->client = $client;
$this->database = $client->media_db;
}
// 上传媒体文件
public function uploadMedia(string $filePath, array $metadata = []): array
{
$size = filesize($filePath);
$mimeType = mime_content_type($filePath);
$hash = hash_file('sha256', $filePath);
// 检查是否已存在
$existing = $this->database->media_files->findOne(['hash' => $hash]);
if ($existing) {
return [
'id' => (string)$existing['_id'],
'method' => 'existing',
'hash' => $hash,
];
}
$mediaType = $this->getMediaType($mimeType);
$document = [
'filename' => basename($filePath),
'mime_type' => $mimeType,
'media_type' => $mediaType,
'hash' => $hash,
'size' => $size,
'metadata' => $metadata,
'created_at' => new UTCDateTime(),
];
if ($size <= $this->binaryThreshold) {
// 小文件:使用Binary
$data = file_get_contents($filePath);
$document['data'] = new Binary($data, Binary::TYPE_GENERIC);
$document['storage_method'] = 'binary';
$result = $this->database->media_files->insertOne($document);
return [
'id' => (string)$result->getInsertedId(),
'method' => 'binary',
'hash' => $hash,
];
} else {
// 大文件:使用GridFS
$bucket = $this->database->selectGridFSBucket();
$stream = fopen($filePath, 'r');
$id = $bucket->uploadFromStream(basename($filePath), $stream, [
'metadata' => $document
]);
fclose($stream);
return [
'id' => (string)$id,
'method' => 'gridfs',
'hash' => $hash,
];
}
}
// 下载媒体文件
public function downloadMedia(string $id, string $outputPath): bool
{
// 先尝试从Binary集合查找
$doc = $this->database->media_files->findOne([
'_id' => new MongoDB\BSON\ObjectId($id)
]);
if ($doc && isset($doc['data'])) {
file_put_contents($outputPath, $doc['data']->getData());
return true;
}
// 尝试从GridFS下载
$bucket = $this->database->selectGridFSBucket();
try {
$stream = fopen($outputPath, 'wb');
$bucket->downloadToStream(new MongoDB\BSON\ObjectId($id), $stream);
fclose($stream);
return true;
} catch (Exception $e) {
return false;
}
}
// 获取媒体信息
public function getMediaInfo(string $id): ?array
{
$doc = $this->database->media_files->findOne([
'_id' => new MongoDB\BSON\ObjectId($id)
], ['projection' => ['data' => 0]]);
if ($doc) {
return $doc;
}
// 从GridFS获取
$bucket = $this->database->selectGridFSBucket();
$file = $bucket->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
return $file ? $file->getMetadata() : null;
}
// 按类型搜索
public function searchByType(string $mediaType, int $limit = 20): array
{
return $this->database->media_files->find(
['media_type' => $mediaType],
[
'projection' => ['data' => 0],
'limit' => $limit,
'sort' => ['created_at' => -1]
]
)->toArray();
}
private function getMediaType(string $mimeType): string
{
if (strpos($mimeType, 'image/') === 0) return 'image';
if (strpos($mimeType, 'video/') === 0) return 'video';
if (strpos($mimeType, 'audio/') === 0) return 'audio';
return 'document';
}
}
// 使用示例
$mediaManager = new MediaManager($client);
// 创建测试图片
$testImage = '/tmp/test_media.jpg';
$image = imagecreatetruecolor(200, 200);
imagefill($image, 0, 0, imagecolorallocate($image, 0, 255, 0));
imagejpeg($image, $testImage);
imagedestroy($image);
// 上传媒体
$result = $mediaManager->uploadMedia($testImage, [
'title' => '测试图片',
'tags' => ['test', 'sample'],
]);
echo "媒体上传成功: {$result['id']}, 方式: {$result['method']}\n";
// 获取媒体信息
$info = $mediaManager->getMediaInfo($result['id']);
echo "媒体类型: {$info['media_type']}, 大小: {$info['size']} 字节\n";
// 清理
unlink($testImage);运行结果:
媒体上传成功: 65f3b2a0123456789abcdef0, 方式: binary
媒体类型: image, 大小: 1234 字节6.3 配置文件版本管理
场景描述
管理应用配置文件的历史版本,支持回滚和比较。
使用方法
使用Binary存储配置文件内容,配合版本号管理。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$configVersions = $client->test->config_versions;
$configVersions->drop();
$configVersions->createIndex(['config_name' => 1, 'version' => -1], ['unique' => true]);
class ConfigVersionManager
{
private $collection;
public function __construct($collection)
{
$this->collection = $collection;
}
// 保存配置版本
public function saveVersion(string $configName, string $content, string $author, string $comment = ''): int
{
$latestVersion = $this->getLatestVersion($configName);
$newVersion = $latestVersion + 1;
$this->collection->insertOne([
'config_name' => $configName,
'version' => $newVersion,
'content' => new Binary($content, Binary::TYPE_GENERIC),
'hash' => hash('sha256', $content),
'author' => $author,
'comment' => $comment,
'created_at' => new UTCDateTime(),
]);
return $newVersion;
}
// 获取最新版本号
public function getLatestVersion(string $configName): int
{
$doc = $this->collection->findOne(
['config_name' => $configName],
['sort' => ['version' => -1]]
);
return $doc ? $doc['version'] : 0;
}
// 获取指定版本
public function getVersion(string $configName, int $version): ?array
{
$doc = $this->collection->findOne([
'config_name' => $configName,
'version' => $version,
]);
if (!$doc) {
return null;
}
return [
'content' => $doc['content']->getData(),
'version' => $doc['version'],
'author' => $doc['author'],
'comment' => $doc['comment'],
'created_at' => $doc['created_at']->toDateTime()->format('Y-m-d H:i:s'),
];
}
// 获取版本历史
public function getVersionHistory(string $configName, int $limit = 10): array
{
return $this->collection->find(
['config_name' => $configName],
[
'projection' => ['content' => 0],
'sort' => ['version' => -1],
'limit' => $limit
]
)->toArray();
}
// 比较两个版本
public function compareVersions(string $configName, int $version1, int $version2): array
{
$v1 = $this->getVersion($configName, $version1);
$v2 = $this->getVersion($configName, $version2);
if (!$v1 || !$v2) {
return ['error' => '版本不存在'];
}
return [
'version1' => $version1,
'version2' => $version2,
'identical' => $v1['content'] === $v2['content'],
'diff' => $this->computeDiff($v1['content'], $v2['content']),
];
}
// 回滚到指定版本
public function rollback(string $configName, int $targetVersion, string $author): int
{
$target = $this->getVersion($configName, $targetVersion);
if (!$target) {
throw new InvalidArgumentException("版本 $targetVersion 不存在");
}
return $this->saveVersion(
$configName,
$target['content'],
$author,
"回滚到版本 $targetVersion"
);
}
private function computeDiff(string $old, string $new): array
{
$oldLines = explode("\n", $old);
$newLines = explode("\n", $new);
return [
'old_lines' => count($oldLines),
'new_lines' => count($newLines),
'added' => count(array_diff($newLines, $oldLines)),
'removed' => count(array_diff($oldLines, $newLines)),
];
}
}
// 使用示例
$configManager = new ConfigVersionManager($configVersions);
// 保存配置版本
$config1 = "debug: false\nport: 8080\n";
$v1 = $configManager->saveVersion('app.yaml', $config1, 'admin', '初始配置');
echo "保存版本: $v1\n";
$config2 = "debug: true\nport: 8080\nhost: localhost\n";
$v2 = $configManager->saveVersion('app.yaml', $config2, 'admin', '添加host配置');
echo "保存版本: $v2\n";
// 获取版本历史
$history = $configManager->getVersionHistory('app.yaml');
echo "\n版本历史:\n";
foreach ($history as $h) {
echo " v{$h['version']} by {$h['author']}: {$h['comment']}\n";
}
// 比较版本
$diff = $configManager->compareVersions('app.yaml', 1, 2);
echo "\n版本差异:\n";
echo " 新增行数: {$diff['diff']['added']}\n";
echo " 删除行数: {$diff['diff']['removed']}\n";运行结果:
保存版本: 1
保存版本: 2
版本历史:
v2 by admin: 添加host配置
v1 by admin: 初始配置
版本差异:
新增行数: 2
删除行数: 06.4 数据备份与恢复系统
场景描述
企业级数据备份系统,支持增量备份和快速恢复。
使用方法
使用Binary存储备份数据,记录备份元数据。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$backups = $client->test->data_backups;
$backups->drop();
$backups->createIndex(['backup_type' => 1, 'created_at' => -1]);
$backups->createIndex(['source' => 1]);
class BackupSystem
{
private $collection;
public function __construct($collection)
{
$this->collection = $collection;
}
// 全量备份
public function fullBackup(string $source, array $data, string $description = ''): string
{
$serialized = serialize($data);
$compressed = gzcompress($serialized, 9);
$result = $this->collection->insertOne([
'source' => $source,
'backup_type' => 'full',
'data' => new Binary($compressed, Binary::TYPE_GENERIC),
'hash' => hash('sha256', $serialized),
'original_size' => strlen($serialized),
'compressed_size' => strlen($compressed),
'description' => $description,
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
// 增量备份
public function incrementalBackup(string $source, array $changes, string $baseBackupId): string
{
$serialized = serialize($changes);
$result = $this->collection->insertOne([
'source' => $source,
'backup_type' => 'incremental',
'data' => new Binary($serialized, Binary::TYPE_GENERIC),
'base_backup_id' => $baseBackupId,
'hash' => hash('sha256', $serialized),
'size' => strlen($serialized),
'created_at' => new UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
// 恢复数据
public function restore(string $backupId): ?array
{
$backup = $this->collection->findOne([
'_id' => new MongoDB\BSON\ObjectId($backupId)
]);
if (!$backup) {
return null;
}
if ($backup['backup_type'] === 'full') {
$decompressed = gzuncompress($backup['data']->getData());
return unserialize($decompressed);
} else {
// 增量备份需要先恢复基础备份
$baseData = $this->restore($backup['base_backup_id']);
$changes = unserialize($backup['data']->getData());
return array_merge_recursive($baseData, $changes);
}
}
// 获取备份列表
public function listBackups(string $source = null, int $limit = 20): array
{
$query = $source ? ['source' => $source] : [];
return $this->collection->find($query, [
'projection' => ['data' => 0],
'sort' => ['created_at' => -1],
'limit' => $limit
])->toArray();
}
// 清理旧备份
public function cleanupOldBackups(int $keepDays = 30): int
{
$cutoff = new UTCDateTime((time() - $keepDays * 86400) * 1000);
$result = $this->collection->deleteMany([
'created_at' => ['$lt' => $cutoff]
]);
return $result->getDeletedCount();
}
// 验证备份完整性
public function verifyBackup(string $backupId): bool
{
$backup = $this->collection->findOne([
'_id' => new MongoDB\BSON\ObjectId($backupId)
]);
if (!$backup) {
return false;
}
if ($backup['backup_type'] === 'full') {
$decompressed = gzuncompress($backup['data']->getData());
$actualHash = hash('sha256', $decompressed);
return $actualHash === $backup['hash'];
}
return true;
}
}
// 使用示例
$backupSystem = new BackupSystem($backups);
// 全量备份
$data = [
'users' => [
['id' => 1, 'name' => '张三'],
['id' => 2, 'name' => '李四'],
],
'settings' => ['theme' => 'dark', 'language' => 'zh-CN'],
];
$backupId = $backupSystem->fullBackup('user_db', $data, '用户数据备份');
echo "全量备份成功: $backupId\n";
// 验证备份
$valid = $backupSystem->verifyBackup($backupId);
echo "备份验证: " . ($valid ? '通过' : '失败') . "\n";
// 恢复数据
$restored = $backupSystem->restore($backupId);
echo "恢复用户数量: " . count($restored['users']) . "\n";
// 获取备份列表
$backupList = $backupSystem->listBackups('user_db');
echo "备份数量: " . count($backupList) . "\n";运行结果:
全量备份成功: 65f3b2a0123456789abcdef0
备份验证: 通过
恢复用户数量: 2
备份数量: 17. 行业最佳实践
7.1 存储策略选择
实践内容
根据数据特征选择合适的存储策略。
推荐理由
- 优化存储效率
- 提高查询性能
- 降低运维成本
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class StorageStrategySelector
{
private $client;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
}
public function selectStrategy(array $fileInfo): array
{
$size = $fileInfo['size'];
$mimeType = $fileInfo['mime_type'];
$accessPattern = $fileInfo['access_pattern'] ?? 'random';
// 策略一:小于4MB,使用Binary
if ($size <= 4 * 1024 * 1024) {
return [
'strategy' => 'binary',
'reason' => '小文件,直接嵌入文档',
'collection' => 'small_files',
];
}
// 策略二:4-16MB,根据访问模式选择
if ($size <= 16 * 1024 * 1024) {
if ($accessPattern === 'sequential') {
return [
'strategy' => 'gridfs',
'reason' => '中等文件,顺序访问适合GridFS',
'collection' => 'fs.files',
];
} else {
return [
'strategy' => 'binary_compressed',
'reason' => '中等文件,压缩后存储',
'collection' => 'medium_files',
];
}
}
// 策略三:大于16MB,必须使用GridFS
return [
'strategy' => 'gridfs',
'reason' => '大文件,必须使用GridFS',
'collection' => 'fs.files',
];
}
public function getRecommendationReport(array $files): array
{
$report = [
'binary' => 0,
'binary_compressed' => 0,
'gridfs' => 0,
'total_size' => 0,
];
foreach ($files as $file) {
$strategy = $this->selectStrategy($file);
$report[$strategy['strategy']]++;
$report['total_size'] += $file['size'];
}
return $report;
}
}
// 使用示例
$selector = new StorageStrategySelector();
$files = [
['size' => 100 * 1024, 'mime_type' => 'image/jpeg', 'access_pattern' => 'random'],
['size' => 8 * 1024 * 1024, 'mime_type' => 'video/mp4', 'access_pattern' => 'sequential'],
['size' => 20 * 1024 * 1024, 'mime_type' => 'application/pdf', 'access_pattern' => 'random'],
];
echo "=== 存储策略建议 ===\n\n";
foreach ($files as $file) {
$strategy = $selector->selectStrategy($file);
$sizeMB = round($file['size'] / 1024 / 1024, 2);
echo "文件大小: {$sizeMB}MB\n";
echo "推荐策略: {$strategy['strategy']}\n";
echo "原因: {$strategy['reason']}\n\n";
}运行结果:
=== 存储策略建议 ===
文件大小: 0.1MB
推荐策略: binary
原因: 小文件,直接嵌入文档
文件大小: 8MB
推荐策略: gridfs
原因: 中等文件,顺序访问适合GridFS
文件大小: 20MB
推荐策略: gridfs
原因: 大文件,必须使用GridFS7.2 安全存储实践
实践内容
确保二进制数据的存储安全。
推荐理由
- 保护敏感数据
- 符合合规要求
- 防止数据泄露
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class SecureBinaryStorage
{
private $collection;
private $encryptionKey;
public function __construct($collection, string $encryptionKey)
{
$this->collection = $collection;
$this->encryptionKey = $encryptionKey;
}
// 安全存储
public function secureStore(string $data, array $metadata = []): string
{
// 1. 计算哈希
$hash = hash('sha256', $data);
// 2. 加密数据
$encrypted = $this->encrypt($data);
// 3. 存储加密数据
$result = $this->collection->insertOne([
'encrypted_data' => new Binary($encrypted['data'], Binary::TYPE_ENCRYPTED),
'iv' => new Binary($encrypted['iv'], Binary::TYPE_GENERIC),
'tag' => new Binary($encrypted['tag'], Binary::TYPE_GENERIC),
'hash' => $hash,
'metadata' => $metadata,
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
// 安全读取
public function secureRetrieve(string $id): ?array
{
$doc = $this->collection->findOne([
'_id' => new MongoDB\BSON\ObjectId($id)
]);
if (!$doc) {
return null;
}
// 1. 解密数据
$decrypted = $this->decrypt([
'data' => $doc['encrypted_data']->getData(),
'iv' => $doc['iv']->getData(),
'tag' => $doc['tag']->getData(),
]);
// 2. 验证完整性
$actualHash = hash('sha256', $decrypted);
if ($actualHash !== $doc['hash']) {
throw new RuntimeException('数据完整性验证失败');
}
return [
'data' => $decrypted,
'metadata' => $doc['metadata'],
];
}
private function encrypt(string $data): array
{
$iv = random_bytes(12);
$tag = '';
$encrypted = openssl_encrypt(
$data,
'AES-256-GCM',
$this->encryptionKey,
OPENSSL_RAW_DATA,
$iv,
$tag
);
return [
'data' => $encrypted,
'iv' => $iv,
'tag' => $tag,
];
}
private function decrypt(array $encrypted): string
{
return openssl_decrypt(
$encrypted['data'],
'AES-256-GCM',
$this->encryptionKey,
OPENSSL_RAW_DATA,
$encrypted['iv'],
$encrypted['tag']
);
}
}
// 使用示例
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->secure_files;
$encryptionKey = random_bytes(32); // 实际应用中从安全配置读取
$secureStorage = new SecureBinaryStorage($collection, $encryptionKey);
// 安全存储
$sensitiveData = "敏感信息内容";
$id = $secureStorage->secureStore($sensitiveData, ['type' => 'confidential']);
echo "安全存储成功: $id\n";
// 安全读取
$result = $secureStorage->secureRetrieve($id);
echo "读取数据: {$result['data']}\n";运行结果:
安全存储成功: 65f3b2a0123456789abcdef0
读取数据: 敏感信息内容7.3 性能优化实践
实践内容
优化二进制数据的存储和查询性能。
推荐理由
- 提高响应速度
- 降低资源消耗
- 改善用户体验
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class BinaryPerformanceOptimizer
{
private $client;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
}
// 压缩存储
public function compressAndStore($collection, string $data, array $metadata = []): string
{
$compressed = gzcompress($data, 9);
$result = $collection->insertOne([
'data' => new Binary($compressed, Binary::TYPE_GENERIC),
'compressed' => true,
'original_size' => strlen($data),
'compressed_size' => strlen($compressed),
'metadata' => $metadata,
]);
return (string)$result->getInsertedId();
}
// 分块存储
public function chunkAndStore($collection, string $data, int $chunkSize = 1024 * 1024): array
{
$chunks = str_split($data, $chunkSize);
$ids = [];
foreach ($chunks as $index => $chunk) {
$result = $collection->insertOne([
'chunk_index' => $index,
'data' => new Binary($chunk, Binary::TYPE_GENERIC),
]);
$ids[] = (string)$result->getInsertedId();
}
return $ids;
}
// 延迟加载
public function lazyLoad($collection, string $id): array
{
// 先加载元数据
$metadata = $collection->findOne(
['_id' => new MongoDB\BSON\ObjectId($id)],
['projection' => ['data' => 0]]
);
return [
'metadata' => $metadata,
'load_data' => function() use ($collection, $id) {
$doc = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
return $doc['data']->getData();
}
];
}
// 缓存策略
public function cachedRetrieve($collection, string $id, &$cache): ?string
{
if (isset($cache[$id])) {
return $cache[$id];
}
$doc = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
if (!$doc) {
return null;
}
$data = $doc['data']->getData();
$cache[$id] = $data;
return $data;
}
}
// 使用示例
$optimizer = new BinaryPerformanceOptimizer();
$collection = $client->test->optimized_files;
// 压缩存储
$largeText = str_repeat("测试数据", 10000);
$id = $optimizer->compressAndStore($collection, $largeText);
echo "压缩存储成功: $id\n";
// 查看压缩率
$doc = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($id)]);
$ratio = round($doc['compressed_size'] / $doc['original_size'] * 100, 2);
echo "压缩率: {$ratio}%\n";运行结果:
压缩存储成功: 65f3b2a0123456789abcdef0
压缩率: 0.52%7.4 监控与告警
实践内容
建立Binary数据存储的监控体系。
推荐理由
- 及时发现问题
- 优化资源使用
- 保障系统稳定
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class BinaryStorageMonitor
{
private $client;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
}
// 获取存储统计
public function getStorageStats(string $database, string $collection): array
{
$coll = $this->client->$database->$collection;
$stats = $coll->aggregate([
[
'$group' => [
'_id' => null,
'total_documents' => ['$sum' => 1],
'total_size' => ['$sum' => ['$size' => '$data']],
'avg_size' => ['$avg' => ['$size' => '$data']],
'max_size' => ['$max' => ['$size' => '$data']],
]
]
])->toArray();
return $stats[0] ?? [];
}
// 检查大文档
public function findLargeDocuments(string $database, string $collection, int $thresholdMB = 10): array
{
$coll = $this->client->$database->$collection;
$thresholdBytes = $thresholdMB * 1024 * 1024;
return $coll->find([
'$expr' => [
'$gt' => [['$size' => '$data'], $thresholdBytes]
]
], [
'projection' => ['data' => 0]
])->toArray();
}
// 检查重复数据
public function findDuplicates(string $database, string $collection): array
{
$coll = $this->client->$database->$collection;
return $coll->aggregate([
[
'$group' => [
'_id' => '$hash',
'count' => ['$sum' => 1],
'ids' => ['$push' => '$_id']
]
],
[
'$match' => ['count' => ['$gt' => 1]]
]
])->toArray();
}
// 生成监控报告
public function generateReport(string $database, array $collections): array
{
$report = [];
foreach ($collections as $collectionName) {
$stats = $this->getStorageStats($database, $collectionName);
$largeDocs = $this->findLargeDocuments($database, $collectionName);
$duplicates = $this->findDuplicates($database, $collectionName);
$report[$collectionName] = [
'total_documents' => $stats['total_documents'] ?? 0,
'total_size_mb' => round(($stats['total_size'] ?? 0) / 1024 / 1024, 2),
'avg_size_kb' => round(($stats['avg_size'] ?? 0) / 1024, 2),
'large_documents' => count($largeDocs),
'duplicate_groups' => count($duplicates),
];
}
return $report;
}
}
// 使用示例
$monitor = new BinaryStorageMonitor();
// 生成监控报告
$report = $monitor->generateReport('test', ['small_files', 'media_files']);
echo "=== Binary存储监控报告 ===\n\n";
foreach ($report as $collection => $stats) {
echo "集合: $collection\n";
echo " 文档数: {$stats['total_documents']}\n";
echo " 总大小: {$stats['total_size_mb']}MB\n";
echo " 平均大小: {$stats['avg_size_kb']}KB\n";
echo " 大文档数: {$stats['large_documents']}\n";
echo " 重复组数: {$stats['duplicate_groups']}\n\n";
}运行结果:
=== Binary存储监控报告 ===
集合: small_files
文档数: 0
总大小: 0MB
平均大小: 0KB
大文档数: 0
重复组数: 0
集合: media_files
文档数: 0
总大小: 0MB
平均大小: 0KB
大文档数: 0
重复组数: 08. 常见问题答疑(FAQ)
8.1 Binary类型和String类型有什么区别?
问题描述
Binary类型和String类型都可以存储数据,它们有什么区别?
回答内容
Binary类型和String类型的主要区别:
| 特性 | Binary类型 | String类型 |
|---|---|---|
| 存储内容 | 原始字节 | UTF-8字符串 |
| 编码 | 无编码转换 | UTF-8编码 |
| 适用场景 | 二进制数据 | 文本数据 |
| 索引支持 | 不推荐索引 | 可创建文本索引 |
| 查询方式 | 二进制比较 | 字符串匹配 |
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->type_comparison;
$collection->drop();
echo "=== Binary vs String 对比 ===\n\n";
// 相同数据的不同存储方式
$data = "Hello World";
// String类型存储
$collection->insertOne([
'type' => 'string',
'data_string' => $data,
]);
// Binary类型存储
$collection->insertOne([
'type' => 'binary',
'data_binary' => new Binary($data, Binary::TYPE_GENERIC),
]);
// 查询对比
$stringDoc = $collection->findOne(['type' => 'string']);
$binaryDoc = $collection->findOne(['type' => 'binary']);
echo "String类型:\n";
echo " 值: {$stringDoc['data_string']}\n";
echo " 类型: " . gettype($stringDoc['data_string']) . "\n";
echo "\nBinary类型:\n";
echo " 值: {$binaryDoc['data_binary']->getData()}\n";
echo " 类型: " . get_class($binaryDoc['data_binary']) . "\n";
// 使用场景说明
echo "\n=== 使用场景 ===\n";
echo "String类型:\n";
echo " - 文本内容(用户输入、日志、描述等)\n";
echo " - JSON数据\n";
echo " - XML/HTML内容\n";
echo " - 需要文本搜索的数据\n";
echo "\nBinary类型:\n";
echo " - 图片、音频、视频文件\n";
echo " - 加密数据\n";
echo " - 压缩数据\n";
echo " - 序列化对象\n";
echo " - UUID、哈希值\n";运行结果:
=== Binary vs String 对比 ===
String类型:
值: Hello World
类型: string
Binary类型:
值: Hello World
类型: MongoDB\BSON\Binary
=== 使用场景 ===
String类型:
- 文本内容(用户输入、日志、描述等)
- JSON数据
- XML/HTML内容
- 需要文本搜索的数据
Binary类型:
- 图片、音频、视频文件
- 加密数据
- 压缩数据
- 序列化对象
- UUID、哈希值8.2 如何在Binary和Base64之间转换?
问题描述
前端通常使用Base64编码传输二进制数据,如何在MongoDB中正确处理?
回答内容
MongoDB Shell中使用Base64编码表示Binary数据,PHP中需要正确转换。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->base64_demo;
$collection->drop();
echo "=== Binary与Base64转换 ===\n\n";
// 原始二进制数据
$binaryData = random_bytes(32);
echo "原始数据长度: " . strlen($binaryData) . " 字节\n";
// 转换为Base64(用于传输)
$base64String = base64_encode($binaryData);
echo "Base64编码: $base64String\n";
echo "Base64长度: " . strlen($base64String) . " 字符\n";
// 存储到MongoDB
$collection->insertOne([
'name' => 'test_data',
'data' => new Binary($binaryData, Binary::TYPE_GENERIC),
]);
// 从MongoDB读取
$doc = $collection->findOne(['name' => 'test_data']);
// 转换为Base64(返回给前端)
$retrievedBase64 = base64_encode($doc['data']->getData());
echo "\n读取后Base64: $retrievedBase64\n";
// 验证一致性
echo "数据一致: " . ($base64String === $retrievedBase64 ? '是' : '否') . "\n";
// 工具函数
class Base64BinaryConverter
{
// 从Base64字符串创建Binary
public static function fromBase64(string $base64, int $subtype = Binary::TYPE_GENERIC): Binary
{
$decoded = base64_decode($base64, true);
if ($decoded === false) {
throw new InvalidArgumentException('无效的Base64字符串');
}
return new Binary($decoded, $subtype);
}
// Binary转换为Base64
public static function toBase64(Binary $binary): string
{
return base64_encode($binary->getData());
}
// 验证Base64格式
public static function isValidBase64(string $string): bool
{
return base64_decode($string, true) !== false;
}
// 计算Base64编码后的大小
public static function calculateEncodedSize(int $originalSize): int
{
return (int)ceil($originalSize / 3) * 4;
}
}
// 使用工具类
echo "\n=== 使用转换工具类 ===\n";
$testBase64 = 'SGVsbG8gV29ybGQ=';
$binary = Base64BinaryConverter::fromBase64($testBase64);
echo "转换结果: " . $binary->getData() . "\n";
echo "转回Base64: " . Base64BinaryConverter::toBase64($binary) . "\n";运行结果:
=== Binary与Base64转换 ===
原始数据长度: 32 字节
Base64编码: abc123...xyz789
Base64长度: 44 字符
读取后Base64: abc123...xyz789
数据一致: 是
=== 使用转换工具类 ===
转换结果: Hello World
转回Base64: SGVsbG8gV29ybGQ=8.3 如何处理大文件上传?
问题描述
用户上传的文件可能很大,如何正确处理?
回答内容
根据文件大小选择不同的存储策略,大文件使用GridFS。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class LargeFileHandler
{
private $client;
private $binaryThreshold = 4 * 1024 * 1024; // 4MB
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
}
// 处理文件上传
public function handleUpload(string $filePath, array $metadata = []): array
{
$size = filesize($filePath);
// 检查文件大小
if ($size > 16 * 1024 * 1024) {
return $this->uploadToGridFS($filePath, $metadata);
} elseif ($size > $this->binaryThreshold) {
return $this->uploadCompressed($filePath, $metadata);
} else {
return $this->uploadBinary($filePath, $metadata);
}
}
// 小文件:直接存储
private function uploadBinary(string $filePath, array $metadata): array
{
$data = file_get_contents($filePath);
$hash = hash_file('sha256', $filePath);
$result = $this->client->test->files->insertOne([
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => $hash,
'size' => strlen($data),
'storage' => 'binary',
'metadata' => $metadata,
]);
return [
'id' => (string)$result->getInsertedId(),
'storage' => 'binary',
'size' => strlen($data),
];
}
// 中等文件:压缩存储
private function uploadCompressed(string $filePath, array $metadata): array
{
$data = file_get_contents($filePath);
$compressed = gzcompress($data, 9);
$hash = hash_file('sha256', $filePath);
$result = $this->client->test->files->insertOne([
'data' => new Binary($compressed, Binary::TYPE_GENERIC),
'hash' => $hash,
'original_size' => strlen($data),
'compressed_size' => strlen($compressed),
'storage' => 'compressed',
'metadata' => $metadata,
]);
return [
'id' => (string)$result->getInsertedId(),
'storage' => 'compressed',
'original_size' => strlen($data),
'compressed_size' => strlen($compressed),
];
}
// 大文件:GridFS
private function uploadToGridFS(string $filePath, array $metadata): array
{
$bucket = $this->client->test->selectGridFSBucket();
$stream = fopen($filePath, 'r');
$id = $bucket->uploadFromStream(basename($filePath), $stream, [
'metadata' => $metadata
]);
fclose($stream);
return [
'id' => (string)$id,
'storage' => 'gridfs',
];
}
// 下载文件
public function downloadFile(string $id, string $outputPath): bool
{
// 尝试从files集合读取
$doc = $this->client->test->files->findOne([
'_id' => new MongoDB\BSON\ObjectId($id)
]);
if ($doc) {
if ($doc['storage'] === 'compressed') {
$data = gzuncompress($doc['data']->getData());
} else {
$data = $doc['data']->getData();
}
file_put_contents($outputPath, $data);
return true;
}
// 尝试从GridFS下载
$bucket = $this->client->test->selectGridFSBucket();
try {
$stream = fopen($outputPath, 'w');
$bucket->downloadToStream(new MongoDB\BSON\ObjectId($id), $stream);
fclose($stream);
return true;
} catch (Exception $e) {
return false;
}
}
}
// 使用示例
$handler = new LargeFileHandler();
// 创建测试文件
$smallFile = '/tmp/small.bin';
file_put_contents($smallFile, str_repeat('x', 1 * 1024 * 1024));
$mediumFile = '/tmp/medium.bin';
file_put_contents($mediumFile, str_repeat('x', 8 * 1024 * 1024));
// 上传文件
$result1 = $handler->handleUpload($smallFile);
echo "小文件上传: {$result1['storage']}\n";
$result2 = $handler->handleUpload($mediumFile);
echo "中等文件上传: {$result2['storage']}\n";
// 清理
unlink($smallFile);
unlink($mediumFile);运行结果:
小文件上传: binary
中等文件上传: compressed8.4 如何实现数据去重?
问题描述
如何避免存储重复的二进制数据?
回答内容
使用内容寻址存储(CAS)模式,通过哈希值去重。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class DeduplicationStorage
{
private $client;
private $filesCollection;
private $referencesCollection;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
$this->filesCollection = $this->client->test->dedup_files;
$this->referencesCollection = $this->client->test->dedup_references;
// 创建索引
$this->filesCollection->createIndex(['hash' => 1], ['unique' => true]);
$this->referencesCollection->createIndex(['file_hash' => 1]);
}
// 存储文件(自动去重)
public function store(string $data, string $filename, string $owner): string
{
$hash = hash('sha256', $data);
// 检查文件是否已存在
$existingFile = $this->filesCollection->findOne(['hash' => $hash]);
if ($existingFile) {
// 文件已存在,只创建引用
$fileId = $existingFile['_id'];
echo "文件已存在,创建引用\n";
} else {
// 存储新文件
$result = $this->filesCollection->insertOne([
'hash' => $hash,
'data' => new Binary($data, Binary::TYPE_GENERIC),
'size' => strlen($data),
'ref_count' => 0,
]);
$fileId = $result->getInsertedId();
echo "存储新文件\n";
}
// 创建引用
$refResult = $this->referencesCollection->insertOne([
'file_id' => $fileId,
'file_hash' => $hash,
'filename' => $filename,
'owner' => $owner,
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
// 更新引用计数
$this->filesCollection->updateOne(
['_id' => $fileId],
['$inc' => ['ref_count' => 1]]
);
return (string)$refResult->getInsertedId();
}
// 获取文件
public function retrieve(string $referenceId): ?array
{
$ref = $this->referencesCollection->findOne([
'_id' => new MongoDB\BSON\ObjectId($referenceId)
]);
if (!$ref) {
return null;
}
$file = $this->filesCollection->findOne(['_id' => $ref['file_id']]);
return [
'data' => $file['data']->getData(),
'filename' => $ref['filename'],
'size' => $file['size'],
];
}
// 删除引用
public function deleteReference(string $referenceId): bool
{
$ref = $this->referencesCollection->findOne([
'_id' => new MongoDB\BSON\ObjectId($referenceId)
]);
if (!$ref) {
return false;
}
// 删除引用
$this->referencesCollection->deleteOne(['_id' => $ref['_id']]);
// 减少引用计数
$result = $this->filesCollection->updateOne(
['_id' => $ref['file_id']],
['$inc' => ['ref_count' => -1]]
);
// 如果没有引用,删除文件
$file = $this->filesCollection->findOne(['_id' => $ref['file_id']]);
if ($file && $file['ref_count'] <= 0) {
$this->filesCollection->deleteOne(['_id' => $file['_id']]);
echo "文件已删除(无引用)\n";
}
return true;
}
// 获取统计
public function getStats(): array
{
$filesCount = $this->filesCollection->countDocuments([]);
$refsCount = $this->referencesCollection->countDocuments([]);
$totalSize = $this->filesCollection->aggregate([
['$group' => ['_id' => null, 'total' => ['$sum' => '$size']]]
])->toArray()[0]['total'] ?? 0;
return [
'unique_files' => $filesCount,
'total_references' => $refsCount,
'total_size' => $totalSize,
'saved_size' => $totalSize * ($refsCount - $filesCount) / $refsCount,
];
}
}
// 使用示例
$storage = new DeduplicationStorage();
// 存储相同内容
$data = "重复的文件内容";
$ref1 = $storage->store($data, 'file1.txt', 'user1');
$ref2 = $storage->store($data, 'file2.txt', 'user2');
// 查看统计
$stats = $storage->getStats();
echo "\n存储统计:\n";
echo " 唯一文件: {$stats['unique_files']}\n";
echo " 总引用: {$stats['total_references']}\n";
echo " 节省空间: " . round($stats['saved_size'] / 1024, 2) . "KB\n";运行结果:
存储新文件
文件已存在,创建引用
存储统计:
唯一文件: 1
总引用: 2
节省空间: 0.02KB8.5 Binary类型支持哪些索引?
问题描述
Binary字段可以创建索引吗?有什么限制?
回答内容
Binary字段本身不建议创建索引,但可以通过元数据字段优化查询。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
$client = new Client('mongodb://localhost:27017');
$collection = $client->test->binary_index_demo;
$collection->drop();
echo "=== Binary字段索引策略 ===\n\n";
// 不推荐:直接对Binary字段创建索引
// $collection->createIndex(['data' => 1]); // 不推荐
// 推荐:对元数据字段创建索引
$collection->createIndex(['filename' => 1]);
$collection->createIndex(['hash' => 1]);
$collection->createIndex(['mime_type' => 1]);
$collection->createIndex(['created_at' => -1]);
$collection->createIndex(['owner' => 1, 'created_at' => -1]);
// 插入文档
$collection->insertOne([
'filename' => 'document.pdf',
'data' => new Binary('%PDF-1.4...', Binary::TYPE_GENERIC),
'hash' => hash('sha256', '%PDF-1.4...'),
'mime_type' => 'application/pdf',
'size' => 12345,
'owner' => 'user123',
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
// 通过元数据查询(使用索引)
$doc = $collection->findOne(['filename' => 'document.pdf']);
echo "通过文件名查询: 找到文档\n";
$doc = $collection->findOne(['hash' => hash('sha256', '%PDF-1.4...')]);
echo "通过哈希查询: 找到文档\n";
// 查看索引
echo "\n创建的索引:\n";
$indexes = $collection->listIndexes();
foreach ($indexes as $index) {
echo " - {$index->getName()}: " . json_encode($index->getKey()) . "\n";
}
// 索引使用建议
echo "\n=== 索引使用建议 ===\n";
echo "1. 不要对Binary字段本身创建索引\n";
echo "2. 对常用查询字段创建索引(filename, hash, mime_type)\n";
echo "3. 使用哈希值快速定位文件\n";
echo "4. 复合索引优化多条件查询\n";
echo "5. 考虑使用部分索引减少索引大小\n";运行结果:
=== Binary字段索引策略 ===
通过文件名查询: 找到文档
通过哈希查询: 找到文档
创建的索引:
- _id_: {"_id":1}
- filename_1: {"filename":1}
- hash_1: {"hash":1}
- mime_type_1: {"mime_type":1}
- created_at_-1: {"created_at":-1}
- owner_1_created_at_-1: {"owner":1,"created_at":-1}
=== 索引使用建议 ===
1. 不要对Binary字段本身创建索引
2. 对常用查询字段创建索引(filename, hash, mime_type)
3. 使用哈希值快速定位文件
4. 复合索引优化多条件查询
5. 考虑使用部分索引减少索引大小8.6 如何迁移现有文件到MongoDB?
问题描述
如何将文件系统中的现有文件迁移到MongoDB?
回答内容
编写迁移脚本,批量导入文件并验证完整性。
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class FileMigrationTool
{
private $client;
private $collection;
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
$this->collection = $this->client->test->migrated_files;
// 创建索引
$this->collection->createIndex(['original_path' => 1], ['unique' => true]);
$this->collection->createIndex(['hash' => 1]);
}
// 迁移单个文件
public function migrateFile(string $filePath, array $extraMetadata = []): array
{
if (!file_exists($filePath)) {
return ['success' => false, 'error' => '文件不存在'];
}
$size = filesize($filePath);
// 检查文件大小
if ($size > 16 * 1024 * 1024) {
return ['success' => false, 'error' => '文件过大,请使用GridFS'];
}
$data = file_get_contents($filePath);
$hash = hash('sha256', $data);
// 检查是否已迁移
$existing = $this->collection->findOne(['original_path' => $filePath]);
if ($existing) {
return [
'success' => true,
'id' => (string)$existing['_id'],
'status' => 'already_exists'
];
}
// 存储文件
$result = $this->collection->insertOne([
'original_path' => $filePath,
'filename' => basename($filePath),
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => $hash,
'mime_type' => mime_content_type($filePath),
'size' => $size,
'migrated_at' => new MongoDB\BSON\UTCDateTime(),
'metadata' => $extraMetadata,
]);
return [
'success' => true,
'id' => (string)$result->getInsertedId(),
'status' => 'migrated',
'size' => $size,
];
}
// 批量迁移目录
public function migrateDirectory(string $directory, int $batchSize = 100): array
{
$results = [
'total' => 0,
'success' => 0,
'failed' => 0,
'skipped' => 0,
'errors' => [],
];
$iterator = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($directory, RecursiveDirectoryIterator::SKIP_DOTS),
RecursiveIteratorIterator::SELF_FIRST
);
foreach ($iterator as $file) {
if ($file->isFile()) {
$results['total']++;
$result = $this->migrateFile($file->getPathname());
if ($result['success']) {
if ($result['status'] === 'migrated') {
$results['success']++;
} else {
$results['skipped']++;
}
} else {
$results['failed']++;
$results['errors'][] = [
'file' => $file->getPathname(),
'error' => $result['error']
];
}
}
}
return $results;
}
// 验证迁移结果
public function verifyMigration(string $id): bool
{
$doc = $this->collection->findOne([
'_id' => new MongoDB\BSON\ObjectId($id)
]);
if (!$doc) {
return false;
}
$data = $doc['data']->getData();
$actualHash = hash('sha256', $data);
return $actualHash === $doc['hash'];
}
// 导出文件
public function exportFile(string $id, string $outputPath): bool
{
$doc = $this->collection->findOne([
'_id' => new MongoDB\BSON\ObjectId($id)
]);
if (!$doc) {
return false;
}
file_put_contents($outputPath, $doc['data']->getData());
return true;
}
}
// 使用示例
$migration = new FileMigrationTool();
// 创建测试目录和文件
$testDir = '/tmp/migration_test';
mkdir($testDir);
file_put_contents("$testDir/file1.txt", "内容1");
file_put_contents("$testDir/file2.txt", "内容2");
// 迁移目录
$results = $migration->migrateDirectory($testDir);
echo "=== 迁移结果 ===\n";
echo "总文件数: {$results['total']}\n";
echo "成功: {$results['success']}\n";
echo "跳过: {$results['skipped']}\n";
echo "失败: {$results['failed']}\n";
// 清理
unlink("$testDir/file1.txt");
unlink("$testDir/file2.txt");
rmdir($testDir);运行结果:
=== 迁移结果 ===
总文件数: 2
成功: 2
跳过: 0
失败: 09. 实战练习
9.1 基础练习:图片水印系统
解题思路
- 读取图片二进制数据
- 使用GD库添加水印
- 存储处理后的图片
常见误区
- 忘记验证图片格式
- 水印位置计算错误
- 未保存原始图片
分步提示
- 创建图片上传和存储功能
- 实现水印添加功能
- 支持多种水印样式
参考代码
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class ImageWatermarkSystem
{
private $collection;
public function __construct()
{
$client = new Client('mongodb://localhost:27017');
$this->collection = $client->test->watermarked_images;
}
public function addWatermark(string $imagePath, string $watermarkText): string
{
$imageData = file_get_contents($imagePath);
$image = imagecreatefromstring($imageData);
$width = imagesx($image);
$height = imagesy($image);
// 添加水印
$color = imagecolorallocatealpha($image, 255, 255, 255, 60);
$font = 5;
$textWidth = imagefontwidth($font) * strlen($watermarkText);
$x = $width - $textWidth - 10;
$y = $height - 10;
imagestring($image, $font, $x, $y, $watermarkText, $color);
// 输出到缓冲区
ob_start();
imagepng($image);
$watermarkedData = ob_get_clean();
imagedestroy($image);
// 存储
$result = $this->collection->insertOne([
'data' => new Binary($watermarkedData, Binary::TYPE_GENERIC),
'watermark' => $watermarkText,
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
return (string)$result->getInsertedId();
}
}
// 使用示例
$system = new ImageWatermarkSystem();
// 创建测试图片
$testImage = '/tmp/test.png';
$image = imagecreatetruecolor(200, 200);
imagefill($image, 0, 0, imagecolorallocate($image, 255, 255, 255));
imagepng($image, $testImage);
imagedestroy($image);
// 添加水印
$id = $system->addWatermark($testImage, 'Copyright 2024');
echo "水印添加成功: $id\n";
unlink($testImage);9.2 进阶练习:文件版本控制系统
解题思路
- 存储文件的多个版本
- 计算版本差异
- 支持版本回滚
常见误区
- 没有存储完整的版本历史
- 差异计算不准确
- 回滚逻辑错误
分步提示
- 设计版本数据结构
- 实现版本比较功能
- 实现回滚功能
参考代码
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class FileVersionControl
{
private $collection;
public function __construct()
{
$client = new Client('mongodb://localhost:27017');
$this->collection = $client->test->file_versions;
$this->collection->createIndex(['file_id' => 1, 'version' => -1]);
}
public function saveVersion(string $fileId, string $content, string $author): int
{
$latest = $this->collection->findOne(
['file_id' => $fileId],
['sort' => ['version' => -1]]
);
$newVersion = $latest ? $latest['version'] + 1 : 1;
$this->collection->insertOne([
'file_id' => $fileId,
'version' => $newVersion,
'content' => new Binary($content, Binary::TYPE_GENERIC),
'hash' => hash('sha256', $content),
'author' => $author,
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
return $newVersion;
}
public function getVersion(string $fileId, int $version): ?string
{
$doc = $this->collection->findOne([
'file_id' => $fileId,
'version' => $version,
]);
return $doc ? $doc['content']->getData() : null;
}
public function getHistory(string $fileId): array
{
return $this->collection->find(
['file_id' => $fileId],
['projection' => ['content' => 0], 'sort' => ['version' => -1]]
)->toArray();
}
}
// 使用示例
$vc = new FileVersionControl();
$v1 = $vc->saveVersion('doc1', '版本1内容', 'user1');
echo "保存版本: $v1\n";
$v2 = $vc->saveVersion('doc1', '版本2内容', 'user1');
echo "保存版本: $v2\n";
$content = $vc->getVersion('doc1', 1);
echo "版本1内容: $content\n";9.3 挑战练习:分布式文件存储系统
解题思路
- 实现文件分片存储
- 支持并行上传下载
- 实现断点续传
常见误区
- 分片大小不合理
- 并发控制不当
- 完整性验证缺失
分步提示
- 设计分片数据结构
- 实现分片上传
- 实现分片合并和验证
参考代码
php
<?php
require_once 'vendor/autoload.php';
use MongoDB\BSON\Binary;
use MongoDB\Client;
class DistributedFileStorage
{
private $client;
private $chunks;
private $files;
private $chunkSize = 1024 * 1024; // 1MB per chunk
public function __construct()
{
$this->client = new Client('mongodb://localhost:27017');
$this->chunks = $this->client->test->file_chunks;
$this->files = $this->client->test->file_metadata;
$this->chunks->createIndex(['file_id' => 1, 'chunk_index' => 1]);
}
public function uploadChunk(string $fileId, int $chunkIndex, string $data): bool
{
$this->chunks->insertOne([
'file_id' => $fileId,
'chunk_index' => $chunkIndex,
'data' => new Binary($data, Binary::TYPE_GENERIC),
'hash' => hash('sha256', $data),
]);
return true;
}
public function completeUpload(string $fileId, int $totalChunks, string $filename): void
{
$this->files->insertOne([
'_id' => $fileId,
'filename' => $filename,
'total_chunks' => $totalChunks,
'created_at' => new MongoDB\BSON\UTCDateTime(),
]);
}
public function downloadChunk(string $fileId, int $chunkIndex): ?string
{
$chunk = $this->chunks->findOne([
'file_id' => $fileId,
'chunk_index' => $chunkIndex,
]);
return $chunk ? $chunk['data']->getData() : null;
}
public function getFileSize(string $fileId): int
{
$chunks = $this->chunks->find(['file_id' => $fileId])->toArray();
$size = 0;
foreach ($chunks as $chunk) {
$size += strlen($chunk['data']->getData());
}
return $size;
}
}
// 使用示例
$dfs = new DistributedFileStorage();
// 上传分片
$data = str_repeat('x', 2 * 1024 * 1024); // 2MB
$chunk1 = substr($data, 0, 1024 * 1024);
$chunk2 = substr($data, 1024 * 1024);
$dfs->uploadChunk('file1', 0, $chunk1);
$dfs->uploadChunk('file1', 1, $chunk2);
$dfs->completeUpload('file1', 2, 'test.bin');
echo "文件大小: " . $dfs->getFileSize('file1') . " 字节\n";10. 知识点总结
10.1 核心要点
存储机制
- Binary类型使用长度前缀存储二进制数据
- 支持多种子类型标识数据语义
- BSON文档大小限制为16MB
PHP操作
- 使用MongoDB\BSON\Binary类
- 支持从字符串、Base64、十六进制创建
- 通过getData()获取原始二进制数据
子类型选择
- TYPE_GENERIC (0):通用二进制数据
- TYPE_UUID (4):UUID标识符
- TYPE_MD5 (5):MD5哈希值
- TYPE_ENCRYPTED (6):加密数据
性能优化
- 小于4MB使用Binary
- 大于4MB考虑GridFS
- 使用压缩减少存储空间
- 通过哈希值去重
10.2 易错点回顾
忘记指定子类型
php// 错误:使用默认子类型可能不符合预期 $binary = new Binary($data); // 正确:明确指定子类型 $binary = new Binary($data, Binary::TYPE_GENERIC);Base64编码混淆
php// 错误:直接存储Base64字符串 $binary = new Binary($base64String, Binary::TYPE_GENERIC); // 正确:先解码再存储 $binary = new Binary(base64_decode($base64String), Binary::TYPE_GENERIC);大小限制忽视
php// 错误:存储超大文件 $largeFile = file_get_contents('video.mp4'); // 100MB $binary = new Binary($largeFile, Binary::TYPE_GENERIC); // 正确:使用GridFS或分片存储 $bucket = new GridFSBucket($manager, 'database'); $stream = $bucket->openUploadStream('video.mp4'); fwrite($stream, $largeFile);UUID格式错误
php// 错误:直接存储UUID字符串 $uuid = '550e8400-e29b-41d4-a716-446655440000'; $binary = new Binary($uuid, Binary::TYPE_UUID); // 正确:转换为二进制格式 $uuidBytes = pack('H*', str_replace('-', '', $uuid)); $binary = new Binary($uuidBytes, Binary::TYPE_UUID);查询条件错误
php// 错误:直接使用字符串查询 $result = $collection->findOne(['data' => $stringData]); // 正确:使用Binary对象查询 $result = $collection->findOne(['data' => new Binary($stringData, Binary::TYPE_GENERIC)]);
11. 拓展参考资料
11.1 官方文档
11.2 学习路径
基础阶段
- 掌握Binary类型的基本操作
- 理解子类型的含义和选择
- 学会Base64和十六进制编码转换
进阶阶段
- 掌握GridFS的使用场景
- 学习二进制数据的压缩和加密
- 理解内容寻址存储模式
高级阶段
- 分布式文件存储设计
- 二进制数据的安全传输
- 性能优化和监控
11.3 相关知识点
- GridFS:MongoDB的大文件存储解决方案
- BSON编码:二进制JSON格式规范
- 内容寻址存储:通过哈希值去重的存储模式
- 数据加密:AES-256-GCM等加密算法
- 数字签名:RSA签名验证数据完整性
