Skip to content

String类型

概述

String(字符串)是MongoDB中最基础且最常用的数据类型,用于存储文本数据。在MongoDB的BSON格式中,字符串使用UTF-8编码,支持多种语言字符集,包括中文、英文、特殊字符等。字符串类型在用户信息存储、内容管理、配置参数等场景中广泛应用,是构建文本内容的基础数据类型。

理解字符串类型的特性、存储机制和最佳实践,对于设计高效的文档结构、优化查询性能、确保数据完整性至关重要。本章节将全面介绍MongoDB中字符串类型的使用方法和注意事项。

基本概念

字符串类型特性

MongoDB字符串类型具有以下核心特性:

1. UTF-8编码支持

  • 支持多语言字符集(中文、英文、日文、韩文等)
  • 自动处理Unicode字符
  • 支持Emoji表情符号

2. 动态长度

  • 字符串长度可变
  • 最大受文档大小限制(16MB)
  • 存储空间:4字节(长度)+ 字符串字节数 + 1字节(终止符)

3. 索引支持

  • 常规B-tree索引
  • 文本索引(Text Index)
  • 通配符索引

4. 查询能力

  • 精确匹配查询
  • 正则表达式模式匹配
  • 全文搜索
  • 字典序范围查询

字符串类型语法

php
<?php
// 场景说明:演示MongoDB字符串类型的基本语法和使用方式

// 1. 连接MongoDB数据库
$client = new MongoDB\Client("mongodb://localhost:27017");
$database = $client->selectDatabase("testdb");
$collection = $database->selectCollection("string_examples");

// 2. 插入不同类型的字符串数据
$document = [
    'basic_string' => 'Hello World',                    // 基本字符串
    'chinese_string' => '你好,世界',                    // 中文字符串
    'unicode_string' => 'こんにちは世界',                // Unicode字符串
    'emoji_string' => 'Hello 🌍🎉',                      // 包含Emoji的字符串
    'empty_string' => '',                                // 空字符串
    'long_string' => str_repeat('A', 10000),            // 长字符串
    'special_chars' => 'Special: @#$%^&*()',            // 特殊字符
    'multiline' => "Line 1\nLine 2\nLine 3"             // 多行字符串
];

// 关键行注释:插入包含各种字符串类型的文档
$result = $collection->insertOne($document);
echo "插入成功,文档ID: " . $result->getInsertedId() . "\n";

// 3. 查询字符串数据
// 精确匹配查询
$exactMatch = $collection->findOne(['basic_string' => 'Hello World']);
echo "精确匹配结果: " . json_encode($exactMatch, JSON_UNESCAPED_UNICODE) . "\n";

// 正则表达式查询
$regex = new MongoDB\BSON\Regex('^Hello', 'i');  // 不区分大小写的前缀匹配
$regexResults = $collection->find(['basic_string' => $regex])->toArray();
echo "正则匹配结果数量: " . count($regexResults) . "\n";

// 4. 字符串长度查询
$longStringQuery = $collection->find([
    'long_string' => ['$exists' => true]
])->toArray();
echo "长字符串文档数量: " . count($longStringQuery) . "\n";

// 运行结果展示:
// 插入成功,文档ID: 507f1f77bcf86cd799439011
// 精确匹配结果: {"_id":"507f1f77bcf86cd799439011","basic_string":"Hello World",...}
// 正则匹配结果数量: 1
// 长字符串文档数量: 1
?>

常见改法对比

php
<?php
// 错误示例:使用错误的编码
$wrongEncoding = mb_convert_encoding('你好', 'ISO-8859-1', 'UTF-8');  // 错误:编码不匹配
$collection->insertOne(['text' => $wrongEncoding]);

// 正确示例:确保UTF-8编码
$correctEncoding = '你好';
if (!mb_check_encoding($correctEncoding, 'UTF-8')) {
    $correctEncoding = mb_convert_encoding($correctEncoding, 'UTF-8');
}
$collection->insertOne(['text' => $correctEncoding]);

// 错误示例:字符串过长导致文档超出限制
$tooLongString = str_repeat('A', 20 * 1024 * 1024);  // 错误:超过16MB限制
$collection->insertOne(['content' => $tooLongString]);

// 正确示例:使用GridFS存储大文本
$bucket = new MongoDB\GridFS\Bucket($database);
$stream = $bucket->openUploadStream('large_file.txt');
fwrite($stream, $tooLongString);
fclose($stream);
?>

字符串类型规范

1. 命名规范

  • 字段名使用小写字母和下划线
  • 避免使用MongoDB保留字
  • 保持命名一致性

2. 存储规范

  • 统一使用UTF-8编码
  • 对长文本考虑压缩存储
  • 超大文本使用GridFS

3. 查询规范

  • 为常用查询字段创建索引
  • 避免使用未锚定的正则表达式
  • 使用文本索引进行全文搜索

原理深度解析

字符串存储机制

MongoDB使用BSON格式存储字符串,存储结构包含以下部分:

存储结构

[长度(4字节)] + [UTF-8编码的字节序列] + [终止符(1字节)]

编码处理流程

  1. 字符串转换为UTF-8字节序列
  2. 计算字节长度并存储
  3. 添加终止符
php
<?php
// 场景说明:深入分析字符串的存储机制和编码处理

class StringStorageAnalyzer {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 分析不同字符串的存储特性
    public function analyzeStringStorage() {
        $collection = $this->database->selectCollection('string_storage_analysis');
        
        $testStrings = [
            'ascii' => 'Hello',
            'chinese' => '你好世界',
            'emoji' => '😀🎉',
            'mixed' => 'Hello 世界 🌍'
        ];
        
        $analysis = [];
        foreach ($testStrings as $type => $string) {
            $analysis[$type] = [
                'value' => $string,
                'char_length' => mb_strlen($string, 'UTF-8'),      // 字符数
                'byte_length' => strlen($string),                   // 字节数
                'bytes_per_char' => strlen($string) / max(mb_strlen($string, 'UTF-8'), 1),
                'is_valid_utf8' => mb_check_encoding($string, 'UTF-8'),
                'bson_overhead' => 5  // 4字节长度 + 1字节终止符
            ];
        }
        
        // 关键行注释:插入分析结果到数据库
        $collection->insertOne([
            'analysis' => $analysis,
            'created_at' => new MongoDB\BSON\UTCDateTime()
        ]);
        
        return $analysis;
    }
    
    // 演示字符串索引的工作原理
    public function demonstrateStringIndexing() {
        $collection = $this->database->selectCollection('string_indexing_demo');
        
        // 创建不同类型的索引
        $collection->createIndex(['username' => 1], ['name' => 'username_regular_idx']);
        $collection->createIndex(['content' => 'text'], ['name' => 'content_text_idx']);
        
        // 插入测试数据
        for ($i = 0; $i < 1000; $i++) {
            $collection->insertOne([
                'username' => 'user_' . str_pad($i, 4, '0', STR_PAD_LEFT),
                'content' => "This is content for user $i with some keywords"
            ]);
        }
        
        // 测试不同查询的性能
        $startTime = microtime(true);
        $exactResult = $collection->findOne(['username' => 'user_0500']);
        $exactTime = microtime(true) - $startTime;
        
        $startTime = microtime(true);
        $textResult = $collection->find(['$text' => ['$search' => 'keywords']])->toArray();
        $textTime = microtime(true) - $startTime;
        
        return [
            'exact_query_time' => $exactTime,
            'text_search_time' => $textTime,
            'exact_results' => count([$exactResult]),
            'text_results' => count($textResult)
        ];
    }
}

// 使用示例
$analyzer = new StringStorageAnalyzer('testdb');
$storageAnalysis = $analyzer->analyzeStringStorage();
print_r($storageAnalysis);

$indexingDemo = $analyzer->demonstrateStringIndexing();
print_r($indexingDemo);

// 运行结果展示:
// Array (
//     [ascii] => Array (
//         [value] => Hello
//         [char_length] => 5
//         [byte_length] => 5
//         [bytes_per_char] => 1
//         [is_valid_utf8] => 1
//         [bson_overhead] => 5
//     )
//     [chinese] => Array (
//         [value] => 你好世界
//         [char_length] => 4
//         [byte_length] => 12
//         [bytes_per_char] => 3
//         [is_valid_utf8] => 1
//         [bson_overhead] => 5
//     )
//     ...
// )
?>

字符串查询优化原理

索引选择策略

  1. 精确匹配查询:使用常规B-tree索引
  2. 前缀匹配:使用锚定正则表达式,可利用索引
  3. 包含匹配:使用文本索引或全文搜索
  4. 范围查询:使用字典序索引

查询优化技术

php
<?php
// 场景说明:演示字符串查询的各种优化技术

class StringQueryOptimizer {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 优化1:使用索引的精确匹配
    public function optimizedExactMatch() {
        $collection = $this->database->selectCollection('optimized_queries');
        $collection->createIndex(['email' => 1], ['unique' => true]);
        
        // 关键行注释:精确匹配使用索引,性能最优
        $result = $collection->findOne(['email' => 'user@example.com']);
        return $result;
    }
    
    // 优化2:使用锚定正则表达式
    public function optimizedRegexQuery() {
        $collection = $this->database->selectCollection('optimized_queries');
        $collection->createIndex(['username' => 1]);
        
        // 锚定正则表达式(^前缀),可以使用索引
        $anchoredRegex = new MongoDB\BSON\Regex('^admin', 'i');
        $results = $collection->find(['username' => $anchoredRegex])->toArray();
        
        // 未锚定正则表达式,无法使用索引,性能较差
        $unanchoredRegex = new MongoDB\BSON\Regex('admin', 'i');
        $slowResults = $collection->find(['username' => $unanchoredRegex])->toArray();
        
        return [
            'anchored_count' => count($results),
            'unanchored_count' => count($slowResults),
            'performance_note' => '锚定正则表达式性能更优'
        ];
    }
    
    // 优化3:使用文本索引进行全文搜索
    public function optimizedTextSearch() {
        $collection = $this->database->selectCollection('text_search_demo');
        
        // 创建文本索引
        $collection->createIndex([
            'title' => 'text',
            'content' => 'text'
        ], [
            'weights' => [
                'title' => 10,    // 标题权重更高
                'content' => 5
            ],
            'name' => 'content_text_idx'
        ]);
        
        // 插入测试数据
        $collection->insertMany([
            ['title' => 'MongoDB Tutorial', 'content' => 'Learn MongoDB basics'],
            ['title' => 'Advanced MongoDB', 'content' => 'Deep dive into MongoDB'],
            ['title' => 'PHP Guide', 'content' => 'PHP programming with MongoDB']
        ]);
        
        // 关键行注释:使用文本索引进行高效全文搜索
        $searchResults = $collection->find([
            '$text' => ['$search' => 'MongoDB tutorial']
        ], [
            'projection' => [
                'title' => 1,
                'content' => 1,
                'score' => ['$meta' => 'textScore']
            ],
            'sort' => ['score' => ['$meta' => 'textScore']]
        ])->toArray();
        
        return $searchResults;
    }
}

$optimizer = new StringQueryOptimizer('testdb');
print_r($optimizer->optimizedExactMatch());
print_r($optimizer->optimizedRegexQuery());
print_r($optimizer->optimizedTextSearch());
?>

常见错误与踩坑点

错误1:字符串编码不一致导致查询失败

错误表现

  • 插入的中文字符串查询不到
  • 字符串显示乱码
  • 正则表达式匹配失败

产生原因

  • 字符串编码不是UTF-8
  • 不同编码混用
  • 未正确处理编码转换

解决方案

php
<?php
// 场景说明:演示字符串编码问题的正确处理方式

class EncodingHandler {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 错误示例:未检查编码直接存储
    public function wrongEncodingExample() {
        $collection = $this->database->selectCollection('encoding_errors');
        
        // 错误:假设输入可能是任意编码
        $inputString = mb_convert_encoding('你好', 'ISO-8859-1', 'UTF-8');
        
        try {
            $collection->insertOne(['text' => $inputString]);
            echo "错误:插入了错误编码的字符串\n";
        } catch (Exception $e) {
            echo "捕获异常: " . $e->getMessage() . "\n";
        }
    }
    
    // 正确示例:确保UTF-8编码
    public function correctEncodingExample() {
        $collection = $this->database->selectCollection('encoding_correct');
        
        $inputString = '你好世界';
        
        // 关键行注释:检查并确保UTF-8编码
        if (!mb_check_encoding($inputString, 'UTF-8')) {
            $inputString = mb_convert_encoding($inputString, 'UTF-8', 'auto');
        }
        
        $collection->insertOne([
            'text' => $inputString,
            'encoding' => 'UTF-8',
            'char_count' => mb_strlen($inputString, 'UTF-8'),
            'byte_count' => strlen($inputString)
        ]);
        
        echo "正确:插入了UTF-8编码的字符串\n";
        
        // 验证查询
        $result = $collection->findOne(['text' => '你好世界']);
        echo "查询成功: " . $result['text'] . "\n";
    }
    
    // 编码验证工具函数
    public function validateAndFixEncoding($string) {
        // 检测编码
        $encoding = mb_detect_encoding($string, ['UTF-8', 'GBK', 'BIG5', 'ISO-8859-1']);
        
        // 如果不是UTF-8,转换为UTF-8
        if ($encoding !== 'UTF-8') {
            $string = mb_convert_encoding($string, 'UTF-8', $encoding);
        }
        
        // 验证转换结果
        if (!mb_check_encoding($string, 'UTF-8')) {
            throw new Exception("无法转换为有效的UTF-8编码");
        }
        
        return $string;
    }
}

$handler = new EncodingHandler('testdb');
$handler->wrongEncodingExample();
$handler->correctEncodingExample();
?>

错误2:字符串长度超出限制

错误表现

  • 插入文档时报错"document is too large"
  • 单个字符串字段导致整个文档超过16MB限制

产生原因

  • 未考虑文档大小限制
  • 直接存储超大文本
  • 未使用GridFS处理大文件

解决方案

php
<?php
// 场景说明:演示如何正确处理超长字符串

class LongStringHandler {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 错误示例:直接存储超大字符串
    public function wrongLongStringExample() {
        $collection = $this->database->selectCollection('long_string_errors');
        
        // 错误:创建超过16MB的字符串
        $hugeString = str_repeat('A', 20 * 1024 * 1024);  // 20MB
        
        try {
            $collection->insertOne(['content' => $hugeString]);
            echo "错误:不应该成功插入\n";
        } catch (MongoDB\Driver\Exception\RuntimeException $e) {
            echo "捕获异常: " . $e->getMessage() . "\n";
        }
    }
    
    // 正确示例:使用GridFS存储大文本
    public function correctLongStringExample() {
        $bucket = new MongoDB\GridFS\Bucket($this->database);
        
        $largeContent = str_repeat('Large content. ', 1000000);  // 约15MB
        
        // 关键行注释:使用GridFS存储大文件
        $stream = $bucket->openUploadStream('large_text_file.txt', [
            'metadata' => [
                'type' => 'text',
                'created_at' => new MongoDB\BSON\UTCDateTime(),
                'size' => strlen($largeContent)
            ]
        ]);
        
        fwrite($stream, $largeContent);
        fclose($stream);
        
        echo "正确:使用GridFS存储了大文本\n";
        
        // 读取大文本
        $downloadStream = $bucket->openDownloadStreamByName('large_text_file.txt');
        $content = stream_get_contents($downloadStream);
        fclose($downloadStream);
        
        echo "读取成功,内容长度: " . strlen($content) . " 字节\n";
    }
    
    // 智能选择存储方式
    public function smartStringStorage($string, $metadata = []) {
        $maxDocumentSize = 15 * 1024 * 1024;  // 15MB(留出空间给其他字段)
        
        if (strlen($string) > $maxDocumentSize) {
            // 使用GridFS
            $bucket = new MongoDB\GridFS\Bucket($this->database);
            $filename = 'string_' . uniqid() . '.txt';
            
            $stream = $bucket->openUploadStream($filename, [
                'metadata' => array_merge($metadata, [
                    'storage_type' => 'gridfs',
                    'size' => strlen($string)
                ])
            ]);
            
            fwrite($stream, $string);
            fclose($stream);
            
            return [
                'storage_type' => 'gridfs',
                'filename' => $filename,
                'size' => strlen($string)
            ];
        } else {
            // 直接存储
            $collection = $this->database->selectCollection('direct_strings');
            $result = $collection->insertOne([
                'content' => $string,
                'metadata' => $metadata,
                'storage_type' => 'direct',
                'size' => strlen($string)
            ]);
            
            return [
                'storage_type' => 'direct',
                'document_id' => (string)$result->getInsertedId(),
                'size' => strlen($string)
            ];
        }
    }
}

$handler = new LongStringHandler('testdb');
$handler->wrongLongStringExample();
$handler->correctLongStringExample();
?>

错误3:正则表达式查询性能低下

错误表现

  • 正则表达式查询非常慢
  • 全表扫描导致CPU占用高
  • 查询超时

产生原因

  • 使用未锚定的正则表达式
  • 正则表达式过于复杂
  • 未创建合适的索引

解决方案

php
<?php
// 场景说明:演示正则表达式查询的性能优化

class RegexQueryOptimizer {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 性能对比测试
    public function performanceComparison() {
        $collection = $this->database->selectCollection('regex_performance');
        $collection->createIndex(['username' => 1]);
        
        // 插入测试数据
        for ($i = 0; $i < 10000; $i++) {
            $collection->insertOne([
                'username' => 'user_' . str_pad($i, 5, '0', STR_PAD_LEFT),
                'email' => "user{$i}@example.com"
            ]);
        }
        
        // 测试1:未锚定正则表达式(慢)
        $startTime = microtime(true);
        $unanchoredRegex = new MongoDB\BSON\Regex('12345');
        $slowResults = $collection->find(['username' => $unanchoredRegex])->toArray();
        $slowTime = microtime(true) - $startTime;
        
        // 测试2:锚定正则表达式(快)
        $startTime = microtime(true);
        $anchoredRegex = new MongoDB\BSON\Regex('^user_0012345');
        $fastResults = $collection->find(['username' => $anchoredRegex])->toArray();
        $fastTime = microtime(true) - $startTime;
        
        // 测试3:精确匹配(最快)
        $startTime = microtime(true);
        $exactResults = $collection->find(['username' => 'user_01234'])->toArray();
        $exactTime = microtime(true) - $startTime;
        
        return [
            'unanchored_regex' => [
                'time' => $slowTime,
                'count' => count($slowResults),
                'performance' => '慢(全表扫描)'
            ],
            'anchored_regex' => [
                'time' => $fastTime,
                'count' => count($fastResults),
                'performance' => '较快(使用索引)'
            ],
            'exact_match' => [
                'time' => $exactTime,
                'count' => count($exactResults),
                'performance' => '最快(索引查找)'
            ]
        ];
    }
    
    // 正则表达式优化建议
    public function regexOptimizationTips() {
        return [
            'tip1' => '使用锚定符(^和$)限制匹配范围',
            'tip2' => '避免使用.*开头的模式',
            'tip3' => '使用区分大小写的正则表达式(去掉i标志)',
            'tip4' => '考虑使用文本索引替代复杂正则',
            'tip5' => '对常查询字段创建索引'
        ];
    }
}

$optimizer = new RegexQueryOptimizer('testdb');
$performance = $optimizer->performanceComparison();
print_r($performance);
print_r($optimizer->regexOptimizationTips());
?>

常见应用场景

场景1:用户信息存储

场景描述:存储用户的基本信息,包括用户名、邮箱、个人简介等字符串数据。

使用方法

  • 使用唯一索引确保用户名和邮箱唯一
  • 对个人简介创建文本索引支持搜索
  • 使用正则表达式验证邮箱格式

示例代码

php
<?php
// 场景说明:用户信息存储系统的完整实现

class UserInformationSystem {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
        $this->setupIndexes();
    }
    
    // 创建索引
    private function setupIndexes() {
        $collection = $this->database->selectCollection('users');
        
        // 关键行注释:创建唯一索引确保用户名和邮箱唯一
        $collection->createIndex(['username' => 1], ['unique' => true]);
        $collection->createIndex(['email' => 1], ['unique' => true]);
        $collection->createIndex(['bio' => 'text']);  // 文本索引支持搜索
    }
    
    // 创建用户
    public function createUser($userData) {
        $collection = $this->database->selectCollection('users');
        
        // 数据验证
        if (!$this->validateEmail($userData['email'])) {
            throw new Exception("邮箱格式不正确");
        }
        
        if (!$this->validateUsername($userData['username'])) {
            throw new Exception("用户名格式不正确");
        }
        
        $userDocument = [
            'username' => strtolower(trim($userData['username'])),
            'email' => strtolower(trim($userData['email'])),
            'display_name' => trim($userData['display_name']),
            'bio' => trim($userData['bio'] ?? ''),
            'location' => trim($userData['location'] ?? ''),
            'website' => trim($userData['website'] ?? ''),
            'created_at' => new MongoDB\BSON\UTCDateTime(),
            'updated_at' => new MongoDB\BSON\UTCDateTime()
        ];
        
        try {
            $result = $collection->insertOne($userDocument);
            return [
                'success' => true,
                'user_id' => (string)$result->getInsertedId()
            ];
        } catch (MongoDB\Driver\Exception\BulkWriteException $e) {
            return [
                'success' => false,
                'error' => '用户名或邮箱已存在'
            ];
        }
    }
    
    // 搜索用户
    public function searchUsers($query) {
        $collection = $this->database->selectCollection('users');
        
        // 关键行注释:使用文本索引进行全文搜索
        $results = $collection->find([
            '$text' => ['$search' => $query]
        ], [
            'projection' => [
                'username' => 1,
                'display_name' => 1,
                'bio' => 1,
                'score' => ['$meta' => 'textScore']
            ],
            'sort' => ['score' => ['$meta' => 'textScore']],
            'limit' => 20
        ])->toArray();
        
        return $results;
    }
    
    // 验证邮箱格式
    private function validateEmail($email) {
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    }
    
    // 验证用户名格式
    private function validateUsername($username) {
        // 只允许字母、数字、下划线,长度3-20
        return preg_match('/^[a-zA-Z0-9_]{3,20}$/', $username);
    }
}

// 使用示例
$userSystem = new UserInformationSystem('userdb');

// 创建用户
$createResult = $userSystem->createUser([
    'username' => 'john_doe',
    'email' => 'john@example.com',
    'display_name' => 'John Doe',
    'bio' => 'Software developer interested in MongoDB and PHP',
    'location' => 'New York, USA'
]);

print_r($createResult);

// 搜索用户
$searchResults = $userSystem->searchUsers('MongoDB developer');
print_r($searchResults);
?>

场景2:内容管理系统

场景描述:存储文章、博客等文本内容,支持全文搜索和标签管理。

使用方法

  • 创建文本索引支持内容搜索
  • 使用数组存储标签
  • 生成URL友好的slug

示例代码

php
<?php
// 场景说明:内容管理系统的文章存储和搜索功能

class ContentManagementSystem {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
        $this->setupIndexes();
    }
    
    private function setupIndexes() {
        $collection = $this->database->selectCollection('articles');
        
        // 关键行注释:创建文本索引支持全文搜索
        $collection->createIndex(
            ['title' => 'text', 'content' => 'text', 'tags' => 'text'],
            [
                'weights' => [
                    'title' => 10,    // 标题权重最高
                    'tags' => 5,      // 标签权重中等
                    'content' => 1    // 内容权重最低
                ],
                'name' => 'article_text_idx'
            ]
        );
        
        // 创建唯一slug索引
        $collection->createIndex(['slug' => 1], ['unique' => true]);
    }
    
    // 创建文章
    public function createArticle($articleData) {
        $collection = $this->database->selectCollection('articles');
        
        $slug = $this->generateSlug($articleData['title']);
        
        $articleDocument = [
            'title' => trim($articleData['title']),
            'slug' => $slug,
            'content' => $articleData['content'],
            'excerpt' => mb_substr(strip_tags($articleData['content']), 0, 200, 'UTF-8'),
            'author_id' => $articleData['author_id'],
            'tags' => $articleData['tags'] ?? [],
            'status' => $articleData['status'] ?? 'draft',
            'view_count' => 0,
            'created_at' => new MongoDB\BSON\UTCDateTime(),
            'updated_at' => new MongoDB\BSON\UTCDateTime(),
            'published_at' => $articleData['status'] === 'published' ? 
                new MongoDB\BSON\UTCDateTime() : null
        ];
        
        $result = $collection->insertOne($articleDocument);
        
        return [
            'article_id' => (string)$result->getInsertedId(),
            'slug' => $slug
        ];
    }
    
    // 搜索文章
    public function searchArticles($query, $options = []) {
        $collection = $this->database->selectCollection('articles');
        
        $searchConditions = [
            '$text' => ['$search' => $query],
            'status' => 'published'
        ];
        
        $findOptions = [
            'projection' => [
                'title' => 1,
                'excerpt' => 1,
                'slug' => 1,
                'author_id' => 1,
                'tags' => 1,
                'published_at' => 1,
                'score' => ['$meta' => 'textScore']
            ],
            'sort' => ['score' => ['$meta' => 'textScore']],
            'limit' => $options['limit'] ?? 10
        ];
        
        return $collection->find($searchConditions, $findOptions)->toArray();
    }
    
    // 生成URL友好的slug
    private function generateSlug($title) {
        $slug = strtolower(trim($title));
        $slug = preg_replace('/[^a-z0-9\s-]/', '', $slug);
        $slug = preg_replace('/[\s-]+/', '-', $slug);
        $slug = trim($slug, '-');
        
        // 确保唯一性
        $collection = $this->database->selectCollection('articles');
        $originalSlug = $slug;
        $counter = 1;
        
        while ($collection->findOne(['slug' => $slug])) {
            $slug = $originalSlug . '-' . $counter;
            $counter++;
        }
        
        return $slug;
    }
}

// 使用示例
$cms = new ContentManagementSystem('cms_db');

// 创建文章
$articleResult = $cms->createArticle([
    'title' => 'MongoDB String Type Tutorial',
    'content' => '<p>This is a comprehensive tutorial about MongoDB string types. Strings are the most commonly used data type in MongoDB...</p>',
    'author_id' => 'author_001',
    'tags' => ['mongodb', 'string', 'tutorial', 'database'],
    'status' => 'published'
]);

print_r($articleResult);

// 搜索文章
$searchResults = $cms->searchArticles('mongodb string tutorial', ['limit' => 5]);
print_r($searchResults);
?>

场景3:配置参数存储

场景描述:存储应用程序的配置参数,支持动态更新和类型验证。

示例代码

php
<?php
// 场景说明:配置参数管理系统

class ConfigManager {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 设置配置项
    public function setConfig($key, $value, $type = 'string') {
        $collection = $this->database->selectCollection('configs');
        
        $configDocument = [
            'key' => $key,
            'value' => $value,
            'type' => $type,
            'updated_at' => new MongoDB\BSON\UTCDateTime()
        ];
        
        // 关键行注释:使用upsert更新或插入配置
        $collection->replaceOne(
            ['key' => $key],
            $configDocument,
            ['upsert' => true]
        );
        
        return true;
    }
    
    // 获取配置项
    public function getConfig($key, $default = null) {
        $collection = $this->database->selectCollection('configs');
        
        $config = $collection->findOne(['key' => $key]);
        
        if (!$config) {
            return $default;
        }
        
        // 根据类型转换值
        switch ($config['type']) {
            case 'integer':
                return (int)$config['value'];
            case 'boolean':
                return filter_var($config['value'], FILTER_VALIDATE_BOOLEAN);
            case 'array':
                return json_decode($config['value'], true);
            default:
                return $config['value'];
        }
    }
}

// 使用示例
$config = new ConfigManager('app_config');

$config->setConfig('site_name', 'My Application');
$config->setConfig('max_upload_size', '10485760', 'integer');
$config->setConfig('debug_mode', 'true', 'boolean');

echo "站点名称: " . $config->getConfig('site_name') . "\n";
echo "最大上传大小: " . $config->getConfig('max_upload_size') . "\n";
echo "调试模式: " . ($config->getConfig('debug_mode') ? '开启' : '关闭') . "\n";
?>

场景4:日志记录系统

场景描述:记录应用程序日志,支持按级别和时间查询。

示例代码

php
<?php
// 场景说明:日志记录系统

class LogSystem {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 记录日志
    public function log($level, $message, $context = []) {
        $collection = $this->database->selectCollection('logs');
        
        $logDocument = [
            'level' => $level,
            'message' => $message,
            'context' => $context,
            'created_at' => new MongoDB\BSON\UTCDateTime()
        ];
        
        $collection->insertOne($logDocument);
    }
    
    // 查询日志
    public function getLogs($filters = [], $limit = 100) {
        $collection = $this->database->selectCollection('logs');
        
        $query = [];
        if (isset($filters['level'])) {
            $query['level'] = $filters['level'];
        }
        if (isset($filters['start_date']) || isset($filters['end_date'])) {
            $query['created_at'] = [];
            if (isset($filters['start_date'])) {
                $query['created_at']['$gte'] = $filters['start_date'];
            }
            if (isset($filters['end_date'])) {
                $query['created_at']['$lte'] = $filters['end_date'];
            }
        }
        
        return $collection->find($query, [
            'sort' => ['created_at' => -1],
            'limit' => $limit
        ])->toArray();
    }
}

// 使用示例
$logger = new LogSystem('app_logs');

$logger->log('INFO', 'User logged in', ['user_id' => 'user_001']);
$logger->log('ERROR', 'Database connection failed', ['error' => 'Connection timeout']);

$logs = $logger->getLogs(['level' => 'ERROR'], 10);
print_r($logs);
?>

场景5:多语言内容管理

场景描述:存储多语言版本的内容,支持国际化。

示例代码

php
<?php
// 场景说明:多语言内容管理系统

class I18nContentManager {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 存储多语言内容
    public function setI18nContent($key, $translations) {
        $collection = $this->database->selectCollection('i18n_contents');
        
        $document = [
            'key' => $key,
            'translations' => $translations,
            'updated_at' => new MongoDB\BSON\UTCDateTime()
        ];
        
        $collection->replaceOne(
            ['key' => $key],
            $document,
            ['upsert' => true]
        );
    }
    
    // 获取指定语言的内容
    public function getI18nContent($key, $language = 'en') {
        $collection = $this->database->selectCollection('i18n_contents');
        
        $document = $collection->findOne(['key' => $key]);
        
        if (!$document || !isset($document['translations'][$language])) {
            return null;
        }
        
        return $document['translations'][$language];
    }
}

// 使用示例
$i18n = new I18nContentManager('i18n_db');

$i18n->setI18nContent('welcome_message', [
    'en' => 'Welcome to our application',
    'zh' => '欢迎使用我们的应用',
    'ja' => '私たちのアプリケーションへようこそ',
    'ko' => '우리 애플리케이션에 오신 것을 환영합니다'
]);

echo $i18n->getI18nContent('welcome_message', 'zh');
?>

企业级进阶应用场景

场景1:大规模文本搜索引擎

场景描述:构建支持百万级文档的高性能文本搜索系统。

使用方法

  • 使用复合文本索引
  • 实现分词和权重优化
  • 支持模糊搜索和同义词

示例代码

php
<?php
// 场景说明:企业级文本搜索引擎实现

class EnterpriseTextSearchEngine {
    private $database;
    private $collection;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
        $this->collection = $this->database->selectCollection('search_index');
        $this->setupIndexes();
    }
    
    private function setupIndexes() {
        // 关键行注释:创建优化的文本索引
        $this->collection->createIndex(
            [
                'title' => 'text',
                'content' => 'text',
                'keywords' => 'text',
                'tags' => 'text'
            ],
            [
                'weights' => [
                    'title' => 15,
                    'keywords' => 10,
                    'tags' => 8,
                    'content' => 1
                ],
                'name' => 'enterprise_text_idx',
                'default_language' => 'none'  // 支持多语言
            ]
        );
        
        // 创建其他辅助索引
        $this->collection->createIndex(['category' => 1, 'created_at' => -1]);
        $this->collection->createIndex(['author_id' => 1]);
    }
    
    // 高级搜索功能
    public function advancedSearch($query, $filters = [], $options = []) {
        $pipeline = [];
        
        // 文本搜索阶段
        if (!empty($query)) {
            $pipeline[] = [
                '$match' => [
                    '$text' => [
                        '$search' => $query,
                        '$caseSensitive' => $options['case_sensitive'] ?? false
                    ]
                ]
            ];
        }
        
        // 过滤条件
        if (!empty($filters)) {
            $pipeline[] = ['$match' => $filters];
        }
        
        // 添加相关性评分
        $pipeline[] = [
            '$addFields' => [
                'score' => ['$meta' => 'textScore']
            ]
        ];
        
        // 排序
        $sortField = $options['sort_by'] ?? 'score';
        $sortOrder = $options['sort_order'] ?? -1;
        $pipeline[] = ['$sort' => [$sortField => $sortOrder]];
        
        // 分页
        if (isset($options['skip'])) {
            $pipeline[] = ['$skip' => $options['skip']];
        }
        if (isset($options['limit'])) {
            $pipeline[] = ['$limit' => $options['limit']];
        }
        
        // 投影
        $pipeline[] = [
            '$project' => [
                'title' => 1,
                'excerpt' => ['$substr' => ['$content', 0, 200]],
                'score' => 1,
                'category' => 1,
                'created_at' => 1,
                'author_id' => 1
            ]
        ];
        
        return $this->collection->aggregate($pipeline)->toArray();
    }
    
    // 批量索引文档
    public function batchIndex($documents) {
        $batchSize = 1000;
        $batches = array_chunk($documents, $batchSize);
        
        foreach ($batches as $batch) {
            $this->collection->insertMany($batch);
        }
        
        return [
            'total_indexed' => count($documents),
            'batches' => count($batches)
        ];
    }
    
    // 搜索建议(自动补全)
    public function getSuggestions($prefix, $limit = 10) {
        $regex = new MongoDB\BSON\Regex('^' . preg_quote($prefix, '/'), 'i');
        
        return $this->collection->find(
            ['title' => $regex],
            [
                'projection' => ['title' => 1],
                'limit' => $limit,
                'sort' => ['_id' => -1]
            ]
        )->toArray();
    }
}

// 使用示例
$searchEngine = new EnterpriseTextSearchEngine('enterprise_search');

// 批量索引文档
$documents = [];
for ($i = 0; $i < 10000; $i++) {
    $documents[] = [
        'title' => "Document $i: MongoDB Tutorial",
        'content' => "This is the content of document $i about MongoDB features...",
        'keywords' => ['mongodb', 'database', 'nosql'],
        'tags' => ['tutorial', 'beginner'],
        'category' => 'Technical',
        'author_id' => 'author_' . ($i % 100),
        'created_at' => new MongoDB\BSON\UTCDateTime()
    ];
}

$indexResult = $searchEngine->batchIndex($documents);
print_r($indexResult);

// 执行搜索
$searchResults = $searchEngine->advancedSearch(
    'MongoDB tutorial',
    ['category' => 'Technical'],
    ['limit' => 20, 'sort_by' => 'score']
);
print_r($searchResults);
?>

场景2:实时数据分析与监控

场景描述:实时分析字符串数据的模式和趋势。

示例代码

php
<?php
// 场景说明:实时字符串数据分析系统

class StringAnalyticsSystem {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 分析字符串长度分布
    public function analyzeStringLengthDistribution($collectionName, $fieldName) {
        $collection = $this->database->selectCollection($collectionName);
        
        $pipeline = [
            [
                '$project' => [
                    'length' => ['$strLenBytes' => '$' . $fieldName]
                ]
            ],
            [
                '$bucket' => [
                    'groupBy' => '$length',
                    'boundaries' => [0, 50, 100, 500, 1000, 5000, 10000],
                    'default' => 'very_long',
                    'output' => [
                        'count' => ['$sum' => 1],
                        'avg_length' => ['$avg' => '$length']
                    ]
                ]
            ]
        ];
        
        return $collection->aggregate($pipeline)->toArray();
    }
    
    // 字符频率分析
    public function analyzeCharacterFrequency($text) {
        $frequency = [];
        $length = mb_strlen($text, 'UTF-8');
        
        for ($i = 0; $i < $length; $i++) {
            $char = mb_substr($text, $i, 1, 'UTF-8');
            if (!isset($frequency[$char])) {
                $frequency[$char] = 0;
            }
            $frequency[$char]++;
        }
        
        arsort($frequency);
        
        return [
            'total_chars' => $length,
            'unique_chars' => count($frequency),
            'top_chars' => array_slice($frequency, 0, 10, true)
        ];
    }
}

// 使用示例
$analytics = new StringAnalyticsSystem('analytics_db');

// 分析数据
$distribution = $analytics->analyzeStringLengthDistribution('articles', 'content');
print_r($distribution);

$charFreq = $analytics->analyzeCharacterFrequency('你好世界,Hello World!');
print_r($charFreq);
?>

行业最佳实践

实践1:字符串字段命名规范

实践内容

  • 使用小写字母和下划线
  • 使用描述性名称
  • 保持命名一致性

推荐理由

  • 提高代码可读性
  • 避免命名冲突
  • 便于维护和查询
php
<?php
// 好的命名示例
$goodDocument = [
    'user_name' => 'John Doe',        // 清晰描述
    'email_address' => 'john@example.com',
    'phone_number' => '+1234567890',
    'created_at' => new MongoDB\BSON\UTCDateTime()
];

// 不好的命名示例
$badDocument = [
    'un' => 'John Doe',               // 不清晰
    'emailAddress' => 'john@example.com',  // 混合命名风格
    'phone' => '+1234567890',
    'created' => new MongoDB\BSON\UTCDateTime()
];
?>

实践2:字符串验证和清洗

实践内容

  • 存储前验证字符串格式
  • 清理多余空格和特殊字符
  • 统一编码格式

推荐理由

  • 保证数据质量
  • 避免查询问题
  • 提高系统稳定性
php
<?php
class StringValidator {
    // 验证并清洗字符串
    public static function sanitize($string, $options = []) {
        // 去除首尾空格
        $string = trim($string);
        
        // 确保UTF-8编码
        if (!mb_check_encoding($string, 'UTF-8')) {
            $string = mb_convert_encoding($string, 'UTF-8', 'auto');
        }
        
        // 去除HTML标签(可选)
        if ($options['strip_tags'] ?? false) {
            $string = strip_tags($string);
        }
        
        // 转义特殊字符(可选)
        if ($options['escape_html'] ?? false) {
            $string = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
        }
        
        // 长度限制
        if (isset($options['max_length'])) {
            $string = mb_substr($string, 0, $options['max_length'], 'UTF-8');
        }
        
        return $string;
    }
    
    // 验证邮箱格式
    public static function validateEmail($email) {
        $email = self::sanitize($email);
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    }
    
    // 验证URL格式
    public static function validateUrl($url) {
        $url = self::sanitize($url);
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    }
}

// 使用示例
$email = StringValidator::sanitize('  JOHN@EXAMPLE.COM  ');
$isValid = StringValidator::validateEmail($email);
echo "邮箱验证结果: " . ($isValid ? '有效' : '无效') . "\n";
?>

实践3:索引优化策略

实践内容

  • 为常用查询字段创建索引
  • 使用复合索引优化多字段查询
  • 定期监控索引使用情况

推荐理由

  • 提高查询性能
  • 减少资源消耗
  • 优化用户体验
php
<?php
class IndexOptimizer {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 分析索引使用情况
    public function analyzeIndexUsage($collectionName) {
        $collection = $this->database->selectCollection($collectionName);
        
        // 获取索引统计
        $stats = $collection->aggregate([
            ['$indexStats' => new stdClass()]
        ])->toArray();
        
        return $stats;
    }
    
    // 推荐索引
    public function recommendIndexes($collectionName, $queryPatterns) {
        $recommendations = [];
        
        foreach ($queryPatterns as $pattern) {
            $fields = array_keys($pattern);
            $recommendations[] = [
                'fields' => $fields,
                'index_spec' => array_fill_keys($fields, 1),
                'reason' => '优化常见查询模式'
            ];
        }
        
        return $recommendations;
    }
}

// 使用示例
$optimizer = new IndexOptimizer('mydb');
$usage = $optimizer->analyzeIndexUsage('users');
print_r($usage);
?>

常见问题答疑(FAQ)

问题1:如何处理字符串的大小写敏感问题?

问题描述:MongoDB默认区分大小写,如何实现不区分大小写的查询?

回答内容

MongoDB默认区分大小写,可以通过以下几种方式处理:

方法1:使用正则表达式

php
<?php
// 场景说明:演示大小写不敏感的查询方法

$client = new MongoDB\Client("mongodb://localhost:27017");
$collection = $client->mydb->users;

// 方法1:使用正则表达式的i标志
$caseInsensitiveRegex = new MongoDB\BSON\Regex('^john doe$', 'i');
$results = $collection->find(['username' => $caseInsensitiveRegex])->toArray();

echo "使用正则表达式查询结果数量: " . count($results) . "\n";
?>

方法2:使用小写字段

php
<?php
// 方法2:存储小写版本用于查询
$collection->insertOne([
    'username' => 'John Doe',
    'username_lower' => 'john doe'  // 小写版本
]);

// 查询时使用小写字段
$results = $collection->find([
    'username_lower' => strtolower('John Doe')
])->toArray();

echo "使用小写字段查询结果数量: " . count($results) . "\n";
?>

方法3:使用Collation

php
<?php
// 方法3:使用Collation实现不区分大小写
$results = $collection->find(
    ['username' => 'john doe'],
    ['collation' => ['locale' => 'en', 'strength' => 1]]
)->toArray();

echo "使用Collation查询结果数量: " . count($results) . "\n";
?>

问题2:如何优化长字符串的存储?

问题描述:存储大量长字符串时如何优化性能和存储空间?

回答内容

对于长字符串,可以采用以下优化策略:

策略1:使用压缩

php
<?php
// 场景说明:演示长字符串的压缩存储

class LongStringStorage {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 压缩存储
    public function storeCompressed($key, $string) {
        $collection = $this->database->selectCollection('compressed_strings');
        
        $compressed = gzcompress($string, 9);
        
        $document = [
            'key' => $key,
            'data' => new MongoDB\BSON\Binary($compressed, MongoDB\BSON\Binary::TYPE_GENERIC),
            'compressed' => true,
            'original_size' => strlen($string),
            'compressed_size' => strlen($compressed),
            'compression_ratio' => round(strlen($compressed) / strlen($string) * 100, 2)
        ];
        
        $collection->insertOne($document);
        
        return [
            'original_size' => strlen($string),
            'compressed_size' => strlen($compressed),
            'saved_space' => strlen($string) - strlen($compressed)
        ];
    }
    
    // 解压读取
    public function retrieveCompressed($key) {
        $collection = $this->database->selectCollection('compressed_strings');
        
        $document = $collection->findOne(['key' => $key]);
        
        if ($document && $document['compressed']) {
            return gzuncompress($document['data']->getData());
        }
        
        return null;
    }
}

// 使用示例
$storage = new LongStringStorage('mydb');
$longText = str_repeat('This is a long text. ', 10000);

$stats = $storage->storeCompressed('example_text', $longText);
print_r($stats);

$retrieved = $storage->retrieveCompressed('example_text');
echo "解压后长度: " . strlen($retrieved) . "\n";
?>

策略2:使用GridFS

php
<?php
// 使用GridFS存储超大文本
$client = new MongoDB\Client("mongodb://localhost:27017");
$database = $client->selectDatabase('mydb');
$bucket = new MongoDB\GridFS\Bucket($database);

$largeContent = str_repeat('Large content. ', 1000000);

// 存储到GridFS
$stream = $bucket->openUploadStream('large_text.txt', [
    'metadata' => [
        'type' => 'text',
        'created_at' => new MongoDB\BSON\UTCDateTime()
    ]
]);

fwrite($stream, $largeContent);
fclose($stream);

echo "已使用GridFS存储大文本\n";
?>

问题3:如何实现高效的字符串搜索?

问题描述:如何在大数据集中快速搜索字符串?

回答内容

实现高效字符串搜索的关键是正确使用索引:

php
<?php
// 场景说明:演示高效字符串搜索的实现

class EfficientStringSearch {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 设置搜索索引
    public function setupSearchIndexes($collectionName) {
        $collection = $this->database->selectCollection($collectionName);
        
        // 文本索引
        $collection->createIndex(
            ['title' => 'text', 'content' => 'text'],
            ['weights' => ['title' => 10, 'content' => 1]]
        );
        
        // 常规索引
        $collection->createIndex(['title' => 1]);
        $collection->createIndex(['category' => 1, 'created_at' => -1]);
    }
    
    // 高效搜索方法
    public function search($collectionName, $query, $type = 'text') {
        $collection = $this->database->selectCollection($collectionName);
        
        switch ($type) {
            case 'text':
                // 全文搜索
                return $collection->find([
                    '$text' => ['$search' => $query]
                ], [
                    'projection' => ['score' => ['$meta' => 'textScore']],
                    'sort' => ['score' => ['$meta' => 'textScore']]
                ])->toArray();
                
            case 'prefix':
                // 前缀搜索(使用索引)
                return $collection->find([
                    'title' => new MongoDB\BSON\Regex('^' . preg_quote($query, '/'), 'i')
                ])->toArray();
                
            case 'exact':
                // 精确匹配(使用索引)
                return $collection->find([
                    'title' => $query
                ])->toArray();
                
            default:
                return [];
        }
    }
}

// 使用示例
$search = new EfficientStringSearch('mydb');
$search->setupSearchIndexes('articles');

// 执行不同类型的搜索
$textResults = $search->search('articles', 'mongodb tutorial', 'text');
$prefixResults = $search->search('articles', 'Mongo', 'prefix');
$exactResults = $search->search('articles', 'MongoDB Tutorial', 'exact');

echo "全文搜索结果: " . count($textResults) . "\n";
echo "前缀搜索结果: " . count($prefixResults) . "\n";
echo "精确搜索结果: " . count($exactResults) . "\n";
?>

问题4:如何处理多语言字符串?

问题描述:如何正确存储和查询多语言字符串?

回答内容

MongoDB原生支持UTF-8编码的多语言字符串:

php
<?php
// 场景说明:演示多语言字符串的处理

class MultilingualStringHandler {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 存储多语言内容
    public function storeMultilingualContent($key, $translations) {
        $collection = $this->database->selectCollection('multilingual_content');
        
        $document = [
            'key' => $key,
            'translations' => $translations,
            'languages' => array_keys($translations),
            'created_at' => new MongoDB\BSON\UTCDateTime()
        ];
        
        $collection->insertOne($document);
    }
    
    // 查询多语言内容
    public function getMultilingualContent($key, $language = 'en') {
        $collection = $this->database->selectCollection('multilingual_content');
        
        $document = $collection->findOne(['key' => $key]);
        
        if ($document && isset($document['translations'][$language])) {
            return $document['translations'][$language];
        }
        
        // 回退到默认语言
        if ($document && isset($document['translations']['en'])) {
            return $document['translations']['en'];
        }
        
        return null;
    }
    
    // 搜索多语言内容
    public function searchMultilingualContent($query, $language = null) {
        $collection = $this->database->selectCollection('multilingual_content');
        
        $searchConditions = [];
        
        if ($language) {
            // 在指定语言中搜索
            $searchConditions['translations.' . $language] = new MongoDB\BSON\Regex($query, 'i');
        } else {
            // 在所有语言中搜索
            foreach (['en', 'zh', 'ja', 'ko'] as $lang) {
                $searchConditions['$or'][] = ['translations.' . $lang => new MongoDB\BSON\Regex($query, 'i')];
            }
        }
        
        return $collection->find($searchConditions)->toArray();
    }
}

// 使用示例
$handler = new MultilingualStringHandler('mydb');

// 存储多语言内容
$handler->storeMultilingualContent('welcome', [
    'en' => 'Welcome',
    'zh' => '欢迎',
    'ja' => 'ようこそ',
    'ko' => '환영합니다',
    'ar' => 'مرحبا',
    'ru' => 'Добро пожаловать'
]);

// 获取指定语言的内容
echo $handler->getMultilingualContent('welcome', 'zh') . "\n";
echo $handler->getMultilingualContent('welcome', 'ja') . "\n";
?>

问题5:如何防止字符串注入攻击?

问题描述:如何防止通过字符串字段进行的注入攻击?

回答内容

防止字符串注入攻击的关键措施:

php
<?php
// 场景说明:演示字符串安全处理

class StringSecurityHandler {
    // 防止XSS攻击
    public static function preventXSS($string) {
        return htmlspecialchars($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    
    // 防止SQL注入(虽然MongoDB不是SQL,但也要注意)
    public static function sanitizeForQuery($string) {
        // MongoDB驱动会自动处理转义,但最好还是验证输入
        return trim($string);
    }
    
    // 验证字符串格式
    public static function validateStringFormat($string, $pattern) {
        return preg_match($pattern, $string) === 1;
    }
    
    // 安全的字符串存储
    public static function safeStore($collection, $data) {
        $sanitizedData = [];
        
        foreach ($data as $key => $value) {
            if (is_string($value)) {
                // 去除危险字符
                $sanitizedData[$key] = self::preventXSS(trim($value));
            } else {
                $sanitizedData[$key] = $value;
            }
        }
        
        return $collection->insertOne($sanitizedData);
    }
}

// 使用示例
$client = new MongoDB\Client("mongodb://localhost:27017");
$collection = $client->mydb->users;

// 不安全的输入
$unsafeInput = '<script>alert("XSS")</script>';

// 安全处理
$safeInput = StringSecurityHandler::preventXSS($unsafeInput);
echo "安全处理后的字符串: " . $safeInput . "\n";

// 安全存储
StringSecurityHandler::safeStore($collection, [
    'username' => 'john_doe',
    'bio' => $unsafeInput
]);
?>

问题6:如何处理字符串的版本控制和历史记录?

问题描述:如何追踪字符串字段的修改历史?

回答内容

实现字符串版本控制的方法:

php
<?php
// 场景说明:演示字符串版本控制

class StringVersionControl {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 更新字符串并记录历史
    public function updateWithHistory($collectionName, $documentId, $field, $newValue) {
        $collection = $this->database->selectCollection($collectionName);
        $historyCollection = $this->database->selectCollection($collectionName . '_history');
        
        // 获取当前值
        $document = $collection->findOne(['_id' => new MongoDB\BSON\ObjectId($documentId)]);
        
        if ($document && isset($document[$field])) {
            // 记录历史
            $historyCollection->insertOne([
                'document_id' => $documentId,
                'field' => $field,
                'old_value' => $document[$field],
                'new_value' => $newValue,
                'updated_at' => new MongoDB\BSON\UTCDateTime(),
                'version' => $document['version'] ?? 1
            ]);
        }
        
        // 更新文档
        $collection->updateOne(
            ['_id' => new MongoDB\BSON\ObjectId($documentId)],
            [
                '$set' => [
                    $field => $newValue,
                    'updated_at' => new MongoDB\BSON\UTCDateTime()
                ],
                '$inc' => ['version' => 1]
            ]
        );
    }
    
    // 获取字段的历史记录
    public function getFieldHistory($collectionName, $documentId, $field) {
        $historyCollection = $this->database->selectCollection($collectionName . '_history');
        
        return $historyCollection->find([
            'document_id' => $documentId,
            'field' => $field
        ], [
            'sort' => ['updated_at' => -1]
        ])->toArray();
    }
    
    // 回滚到指定版本
    public function rollbackToVersion($collectionName, $documentId, $field, $version) {
        $historyCollection = $this->database->selectCollection($collectionName . '_history');
        
        $history = $historyCollection->findOne([
            'document_id' => $documentId,
            'field' => $field,
            'version' => $version
        ]);
        
        if ($history) {
            $this->updateWithHistory($collectionName, $documentId, $field, $history['old_value']);
            return true;
        }
        
        return false;
    }
}

// 使用示例
$versionControl = new StringVersionControl('mydb');

// 创建文档
$collection = $versionControl->database->selectCollection('articles');
$result = $collection->insertOne([
    'title' => 'Original Title',
    'version' => 1
]);

$documentId = (string)$result->getInsertedId();

// 更新标题并记录历史
$versionControl->updateWithHistory('articles', $documentId, 'title', 'Updated Title');
$versionControl->updateWithHistory('articles', $documentId, 'title', 'Final Title');

// 查看历史
$history = $versionControl->getFieldHistory('articles', $documentId, 'title');
print_r($history);
?>

实战练习

练习1:基础练习 - 字符串存储和查询

解题思路

  1. 创建集合并插入不同类型的字符串
  2. 实现精确匹配查询
  3. 实现正则表达式查询
  4. 统计字符串长度分布

常见误区

  • 忘记验证字符串编码
  • 未创建索引导致查询慢
  • 正则表达式未锚定

分步提示

  1. 连接MongoDB数据库
  2. 创建测试集合并插入数据
  3. 创建必要的索引
  4. 执行各种查询并分析结果

参考代码

php
<?php
// 练习1:字符串基础操作

class StringBasicsExercise {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    public function run() {
        $collection = $this->database->selectCollection('string_exercise');
        $collection->drop();  // 清空集合
        
        // 步骤1:插入不同类型的字符串
        $documents = [
            ['text' => 'Hello World', 'type' => 'ascii'],
            ['text' => '你好世界', 'type' => 'chinese'],
            ['text' => 'こんにちは', 'type' => 'japanese'],
            ['text' => '안녕하세요', 'type' => 'korean'],
            ['text' => 'Hello 世界 🌍', 'type' => 'mixed']
        ];
        
        $collection->insertMany($documents);
        
        // 步骤2:创建索引
        $collection->createIndex(['text' => 1]);
        
        // 步骤3:精确匹配查询
        $exactMatch = $collection->findOne(['text' => 'Hello World']);
        echo "精确匹配结果: " . $exactMatch['text'] . "\n";
        
        // 步骤4:正则表达式查询
        $regex = new MongoDB\BSON\Regex('^Hello', 'i');
        $regexResults = $collection->find(['text' => $regex])->toArray();
        echo "正则匹配结果数量: " . count($regexResults) . "\n";
        
        // 步骤5:统计字符串长度
        $pipeline = [
            [
                '$project' => [
                    'text' => 1,
                    'type' => 1,
                    'char_length' => ['$strLenCP' => '$text'],
                    'byte_length' => ['$strLenBytes' => '$text']
                ]
            ]
        ];
        
        $stats = $collection->aggregate($pipeline)->toArray();
        print_r($stats);
    }
}

// 运行练习
$exercise = new StringBasicsExercise('exercise_db');
$exercise->run();
?>

练习2:进阶练习 - 构建全文搜索系统

解题思路

  1. 创建文本索引
  2. 实现多字段搜索
  3. 添加相关性评分
  4. 实现搜索结果高亮

常见误区

  • 文本索引权重设置不当
  • 未处理特殊字符
  • 搜索结果未排序

分步提示

  1. 设计文档结构
  2. 创建优化的文本索引
  3. 实现搜索功能
  4. 优化搜索性能

参考代码

php
<?php
// 练习2:全文搜索系统

class FullTextSearchExercise {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    public function run() {
        $collection = $this->database->selectCollection('search_exercise');
        $collection->drop();
        
        // 步骤1:插入测试数据
        $articles = [
            [
                'title' => 'MongoDB Tutorial for Beginners',
                'content' => 'Learn MongoDB basics and fundamentals in this comprehensive tutorial',
                'tags' => ['mongodb', 'tutorial', 'beginner']
            ],
            [
                'title' => 'Advanced MongoDB Techniques',
                'content' => 'Master advanced MongoDB features and optimization strategies',
                'tags' => ['mongodb', 'advanced', 'optimization']
            ],
            [
                'title' => 'PHP and MongoDB Integration',
                'content' => 'How to use MongoDB with PHP for web development',
                'tags' => ['php', 'mongodb', 'web']
            ]
        ];
        
        $collection->insertMany($articles);
        
        // 步骤2:创建文本索引
        $collection->createIndex(
            ['title' => 'text', 'content' => 'text', 'tags' => 'text'],
            ['weights' => ['title' => 10, 'content' => 5, 'tags' => 3]]
        );
        
        // 步骤3:执行搜索
        $searchQuery = 'mongodb tutorial';
        $results = $collection->find([
            '$text' => ['$search' => $searchQuery]
        ], [
            'projection' => [
                'title' => 1,
                'content' => 1,
                'score' => ['$meta' => 'textScore']
            ],
            'sort' => ['score' => ['$meta' => 'textScore']]
        ])->toArray();
        
        // 步骤4:显示结果
        foreach ($results as $result) {
            echo "标题: " . $result['title'] . "\n";
            echo "相关性评分: " . $result['score'] . "\n";
            echo "---\n";
        }
    }
}

// 运行练习
$exercise = new FullTextSearchExercise('exercise_db');
$exercise->run();
?>

练习3:挑战练习 - 实现字符串相似度匹配

解题思路

  1. 实现字符串相似度算法
  2. 使用聚合管道进行匹配
  3. 优化性能

常见误区

  • 算法选择不当
  • 性能优化不足
  • 未考虑编码问题

分步提示

  1. 实现Levenshtein距离算法
  2. 创建自定义聚合函数
  3. 批量处理字符串
  4. 优化查询性能

参考代码

php
<?php
// 练习3:字符串相似度匹配

class StringSimilarityExercise {
    private $database;
    
    public function __construct($databaseName) {
        $client = new MongoDB\Client("mongodb://localhost:27017");
        $this->database = $client->selectDatabase($databaseName);
    }
    
    // 计算Levenshtein距离
    private function levenshteinDistance($str1, $str2) {
        $len1 = mb_strlen($str1, 'UTF-8');
        $len2 = mb_strlen($str2, 'UTF-8');
        
        $matrix = [];
        
        for ($i = 0; $i <= $len1; $i++) {
            $matrix[$i][0] = $i;
        }
        
        for ($j = 0; $j <= $len2; $j++) {
            $matrix[0][$j] = $j;
        }
        
        for ($i = 1; $i <= $len1; $i++) {
            for ($j = 1; $j <= $len2; $j++) {
                $cost = (mb_substr($str1, $i - 1, 1, 'UTF-8') === mb_substr($str2, $j - 1, 1, 'UTF-8')) ? 0 : 1;
                $matrix[$i][$j] = min(
                    $matrix[$i - 1][$j] + 1,
                    $matrix[$i][$j - 1] + 1,
                    $matrix[$i - 1][$j - 1] + $cost
                );
            }
        }
        
        return $matrix[$len1][$len2];
    }
    
    // 查找相似字符串
    public function findSimilarStrings($targetString, $threshold = 3) {
        $collection = $this->database->selectCollection('similarity_exercise');
        
        // 插入测试数据
        $collection->insertMany([
            ['text' => 'hello world'],
            ['text' => 'hello word'],
            ['text' => 'helo world'],
            ['text' => 'hello worlds'],
            ['text' => 'goodbye world']
        ]);
        
        $allStrings = $collection->find()->toArray();
        $similar = [];
        
        foreach ($allStrings as $doc) {
            $distance = $this->levenshteinDistance($targetString, $doc['text']);
            
            if ($distance <= $threshold) {
                $similar[] = [
                    'text' => $doc['text'],
                    'distance' => $distance,
                    'similarity' => 1 - ($distance / max(strlen($targetString), strlen($doc['text'])))
                ];
            }
        }
        
        // 按相似度排序
        usort($similar, function($a, $b) {
            return $a['distance'] <=> $b['distance'];
        });
        
        return $similar;
    }
}

// 运行练习
$exercise = new StringSimilarityExercise('exercise_db');
$results = $exercise->findSimilarStrings('hello world', 2);
print_r($results);
?>

知识点总结

核心要点

  1. 字符串类型特性

    • UTF-8编码支持多语言
    • 动态长度,最大16MB
    • 支持多种索引类型
    • 丰富的查询能力
  2. 存储机制

    • BSON格式存储
    • 4字节长度前缀 + UTF-8字节 + 1字节终止符
    • 自动编码转换
  3. 索引优化

    • 常规索引用于精确匹配
    • 文本索引用于全文搜索
    • 锚定正则表达式可使用索引
  4. 查询优化

    • 使用索引提高性能
    • 避免未锚定正则表达式
    • 合理使用文本搜索

易错点回顾

  1. 编码问题

    • 忘记验证UTF-8编码
    • 混用不同编码
    • 未处理特殊字符
  2. 性能问题

    • 未创建必要索引
    • 使用低效的正则表达式
    • 存储过大的字符串
  3. 安全问题

    • 未防止XSS攻击
    • 未验证输入格式
    • 未转义特殊字符

拓展参考资料

官方文档链接

进阶学习路径建议

本知识点承接:《MongoDB基础概念》→《数据类型概述》

后续延伸至:《文本索引详解》→《全文搜索优化》→《多语言内容管理》

建议学习顺序

  1. MongoDB基础概念
  2. 数据类型概述
  3. String类型(本章节)
  4. 文本索引详解
  5. 全文搜索优化
  6. 多语言内容管理