Collation
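New in version 3.4.
Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
You can specify collation for a collection or a view, an index, or specific operations that support collation.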
Collation Document
A collation document has the following fields:
{
locale: <string>,
caseLevel: <boolean>,
caseFirst: <string>,
strength: <int>,
numericOrdering: <boolean>,
alternate: <string>,
maxVariable: <string>,
backwards: <boolean>
}
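When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.
Default collation parameter values vary depending on which locale you specify. For a complete list of default collation parameters and the locales they are associated with, see Collation Default Parameters.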
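To make the mandatory/optional distinction concrete, here is a minimal client-side sketch. The helper validateCollation and the constant COLLATION_FIELD_TYPES are hypothetical (not part of the mongo shell or any driver); the expected types follow the field table on this page, and the 1 to 5 range for strength assumes the ICU comparison levels. The server performs its own validation, so treat this only as an illustration.

```javascript
// Hypothetical helper, not a MongoDB API: checks the shape of a collation
// document against the field types listed in the collation field table.
const COLLATION_FIELD_TYPES = {
  locale: "string",
  caseLevel: "boolean",
  caseFirst: "string",
  strength: "number",
  numericOrdering: "boolean",
  alternate: "string",
  maxVariable: "string",
  backwards: "boolean",
  normalization: "boolean",
};

function validateCollation(doc) {
  const errors = [];
  // locale is the only mandatory field.
  if (typeof doc.locale !== "string") {
    errors.push("locale is required and must be a string");
  }
  for (const [field, value] of Object.entries(doc)) {
    if (field === "locale") continue; // already checked above
    const expected = COLLATION_FIELD_TYPES[field];
    if (expected === undefined) {
      errors.push(`unknown field: ${field}`);
    } else if (typeof value !== expected) {
      errors.push(`${field} must be a ${expected}`);
    }
  }
  // Assumption: strength is an integer ICU comparison level from 1 to 5.
  if (typeof doc.strength === "number" &&
      !(Number.isInteger(doc.strength) && doc.strength >= 1 && doc.strength <= 5)) {
    errors.push("strength must be an integer from 1 to 5");
  }
  return errors;
}
```

For example, validateCollation({ locale: "fr", strength: 1 }) returns an empty array, while omitting locale yields one error.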
Field | Type | Description
---|---|---
locale | string | The ICU locale. See Supported Languages and Locales for a list of supported locales. To specify simple binary comparison, specify a locale value of "simple".
strength | integer | Optional. The level of comparison to perform. Corresponds to ICU Comparison Levels. Possible values are: 1 (primary level: compares base characters only, ignoring other differences such as diacritics and case); 2 (secondary level: compares base characters and diacritics; primary differences take precedence over secondary differences); 3 (tertiary level: compares base characters, diacritics, and case and variants; this is the default level); 4 (quaternary level: limited to the specific use case of considering punctuation when levels 1 to 3 ignore punctuation, or of processing Japanese text); 5 (identical level: limited to the specific use case of tie breaking).
caseLevel | boolean | Optional. Flag that determines whether to include case comparison at strength level 1 or 2. If true, include case comparison; i.e. when used with strength: 1, collation compares base characters and case; when used with strength: 2, collation compares base characters, diacritics (and possible other secondary differences) and case. If false, do not include case comparison at level 1 or 2. The default is false. For more information, see ICU Collation: Case Level.
caseFirst | string | Optional. A field that determines sort order of case differences during tertiary level comparisons. Possible values are: "upper" (uppercase sorts before lowercase); "lower" (lowercase sorts before uppercase); "off" (default value; similar to "lower" with slight differences).
numericOrdering | boolean | Optional. Flag that determines whether to compare numeric strings as numbers or as strings. If true, compare as numbers; i.e. "10" is greater than "2". If false, compare as strings; i.e. "10" is less than "2". Default is false.
alternate | string | Optional. Field that determines whether collation should consider whitespace and punctuation as base characters for purposes of comparison. Possible values are: "non-ignorable" (whitespace and punctuation are considered base characters); "shifted" (whitespace and punctuation are not considered base characters and are only distinguished at strength levels greater than 3). See ICU Collation: Comparison Levels for more information. Default is "non-ignorable".
maxVariable | string | Optional. Field that determines up to which characters are considered ignorable when alternate: "shifted". Has no effect if alternate: "non-ignorable". Possible values are: "punct" (both whitespace and punctuation are "ignorable", i.e. not considered base characters); "space" (whitespace is "ignorable", i.e. not considered base characters).
backwards | boolean | Optional. Flag that determines whether strings with diacritics sort from the back of the string, such as with some French dictionary ordering. If true, compare from back to front. If false, compare from front to back. The default value is false.
normalization | boolean | Optional. Flag that determines whether to check if text requires normalization and to perform normalization. Generally, the majority of text does not require this normalization processing. If true, check if fully normalized and perform normalization to compare text. If false, does not check. The default value is false. See http://userguide.icu-project.org/collation/concepts#TOC-Normalization for details.
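Because these fields are ICU collation attributes, one way to experiment with them outside the server is Node's ICU-backed Intl.Collator. The sketch below is an analogy, not how MongoDB evaluates collation: toIntlCollator and sensitivityFor are hypothetical helpers, the strength-to-sensitivity mapping is approximate, and only plain locale codes such as "en" or "fr" are handled (not "simple" or @collation variants).

```javascript
// Approximate mapping from strength (+ caseLevel) to Intl.Collator sensitivity.
function sensitivityFor(strength = 3, caseLevel = false) {
  if (strength === 1) return caseLevel ? "case" : "base"; // base chars (+ case)
  if (strength === 2) return caseLevel ? "variant" : "accent"; // + diacritics
  return "variant"; // tertiary and above: base chars, diacritics, case
}

// Hypothetical helper: builds an Intl.Collator from a MongoDB-style
// collation document for local experimentation only.
function toIntlCollator(collation) {
  const options = {
    sensitivity: sensitivityFor(collation.strength, collation.caseLevel),
    numeric: collation.numericOrdering === true, // numericOrdering analogue
    ignorePunctuation: collation.alternate === "shifted", // rough alternate analogue
  };
  if (collation.caseFirst !== undefined) {
    // Intl spells MongoDB's "off" as "false".
    options.caseFirst = collation.caseFirst === "off" ? "false" : collation.caseFirst;
  }
  return new Intl.Collator(collation.locale, options);
}
```

With numericOrdering: true, ["10", "2", "1"].sort(toIntlCollator({ locale: "en", numericOrdering: true }).compare) orders the strings as "1", "2", "10"; without it, "10" sorts before "2".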
Operations that Support Collation
You can specify collation for the following operations:
NOTE
You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or if performing a find with a sort, you cannot use one collation for the find and another for the sort.
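As an in-memory analogy of the single-collation rule, the sketch below builds one ICU comparator and uses it for both the match and the sort, much as one collation applies to both the filter and the sort of a find. The helper findAndSort and the sample documents are hypothetical, and Intl.Collator with sensitivity "base" only roughly corresponds to { locale: "en", strength: 1 }.

```javascript
// Hypothetical helper: one collator decides both which documents match
// and how the matches are ordered.
function findAndSort(docs, filterField, value, sortField, collator) {
  return docs
    .filter((doc) => collator.compare(doc[filterField], value) === 0)
    .sort((a, b) => collator.compare(a[sortField], b[sortField]));
}

// Roughly analogous to { locale: "en", strength: 1 }: ignore case and accents.
const enPrimary = new Intl.Collator("en", { sensitivity: "base" });

const docs = [
  { category: "café", name: "b" },
  { category: "tea", name: "c" },
  { category: "Cafe", name: "A" },
];

// Matches "café" and "Cafe", then sorts them by name with the same collator.
const result = findAndSort(docs, "category", "cafe", "name", enPrimary);
```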
Behavior
Locale Variants
Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:
{ "locale" : "<locale code>@collation=<variant>" }
For example, to use the pinyin variant of the Chinese collation:
{ "locale" : "zh@collation=pinyin" }
For a complete list of all collation locales and their variants, see Collation Locales.
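The variant syntax is plain string composition, so a pair of small helpers is enough to build or inspect a locale value. Both localeWithVariant and parseLocale are hypothetical names introduced for illustration; they only handle the "<locale code>@collation=<variant>" form shown above.

```javascript
// Hypothetical helper: builds the "<locale code>@collation=<variant>" string.
function localeWithVariant(code, variant) {
  return variant ? `${code}@collation=${variant}` : code;
}

// Hypothetical helper: splits a locale value back into code and variant.
function parseLocale(locale) {
  const [code, rest] = locale.split("@");
  const match = /^collation=(.+)$/.exec(rest ?? "");
  return { code, variant: match ? match[1] : null };
}

// A collation document that selects the pinyin variant of Chinese.
const pinyinCollation = { locale: localeWithVariant("zh", "pinyin") };
```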
Collation and Views
- You can specify a default collation for a view at creation time. If no collation is specified, the view’s default collation is the “simple” binary comparison collator. That is, the view does not inherit the collection’s default collation.
- String comparisons on the view use the view’s default collation. An operation that attempts to change or override a view’s default collation will fail with an error.
- If creating a view from another view, you cannot specify a collation that differs from the source view’s collation.
- If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.
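The view rules above can be modeled in a few lines. This is a hypothetical, simplified model (viewDefaultCollation, sameCollation, and viewsShareCollation are not MongoDB APIs): it compares collation documents field by field and does not fill in locale-specific defaults the way the server does.

```javascript
const SIMPLE = { locale: "simple" };

// A view's default collation is the one given at creation time, else
// "simple"; it is never inherited from the source collection.
function viewDefaultCollation(viewOptions) {
  return viewOptions.collation ?? SIMPLE;
}

// Field-by-field equality of two collation documents (simplified).
function sameCollation(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((k) => a[k] === b[k]);
}

// An aggregation touching several views ($lookup, $graphLookup) needs
// every view to share one collation.
function viewsShareCollation(views) {
  const collations = views.map(viewDefaultCollation);
  return collations.every((c) => sameCollation(c, collations[0]));
}
```

A view created without a collation therefore cannot be joined with a view created with { locale: "fr" }, even if the underlying collection uses "fr".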
Collation and Index Use
To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.
For example, the collection myColl has an index on a string field category with the collation locale "fr".
db.myColl.createIndex( { category: 1 }, { collation: { locale: "fr" } } )
The following query operation, which specifies the same collation as the index, can use the index:
db.myColl.find( { category: "cafe" } ).collation( { locale: "fr" } )
However, the following query operation, which by default uses the “simple” binary collator, cannot use the index:
db.myColl.find( { category: "cafe" } )
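The index-selection rule for string comparisons reduces to a collation equality check. The sketch below is a hypothetical, simplified model of that rule (indexSupportsStringComparison is not the real query planner), treating a missing collation as the "simple" binary collator.

```javascript
// A missing collation means the "simple" binary collator.
function effective(collation) {
  return collation ?? { locale: "simple" };
}

// Field-by-field equality of two collation documents (simplified).
function sameCollation(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((k) => a[k] === b[k]);
}

// Hypothetical check: string comparisons on an indexed field can use the
// index only when the operation's collation matches the index collation.
function indexSupportsStringComparison(indexCollation, queryCollation) {
  return sameCollation(effective(indexCollation), effective(queryCollation));
}
```

Applied to the examples above, the query with .collation( { locale: "fr" } ) matches the "fr" index, while the query with no collation does not.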
For a compound index where the index prefix keys are not strings, arrays, and embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.
For example, the collection myColl has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:
db.myColl.createIndex(
{ score: 1, price: 1, category: 1 },
{ collation: { locale: "fr" } } )
The following operations, which use "simple" binary collation for string comparisons, can use the index:
db.myColl.find( { score: 5 } ).sort( { price: 1 } )
db.myColl.find( { score: 5, price: { $gt: NumberDecimal( "10" ) } } ).sort( { price: 1 } )
The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:
db.myColl.find( { score: 5, category: "cafe" } )
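The prefix rule can be sketched as a walk over the index keys that stops at the first collation-sensitive key. The helpers usableIndexPrefix and servedPredicates are hypothetical, and the model is deliberately simplified: it assumes each indexed field holds a single BSON type and ignores multikey and range details the real planner considers.

```javascript
// Key types whose comparisons depend on collation.
const COLLATION_SENSITIVE = new Set(["string", "array", "object"]);

// Hypothetical helper: which leading index keys remain usable.
// indexKeys: ordered key names; fieldTypes: type held by each key;
// collationsMatch: whether the operation's collation equals the index's.
function usableIndexPrefix(indexKeys, fieldTypes, collationsMatch) {
  if (collationsMatch) return [...indexKeys];
  const prefix = [];
  for (const key of indexKeys) {
    // Stop at the first key whose comparisons depend on collation.
    if (COLLATION_SENSITIVE.has(fieldTypes[key])) break;
    prefix.push(key);
  }
  return prefix;
}

// Hypothetical helper: which query predicates the usable prefix can serve.
function servedPredicates(query, prefix) {
  return Object.keys(query).filter((field) => prefix.includes(field));
}

const indexKeys = ["score", "price", "category"];
const fieldTypes = { score: "number", price: "number", category: "string" };
```

With a "simple" query collation against the "fr" index, usableIndexPrefix(indexKeys, fieldTypes, false) yields score and price, and servedPredicates({ score: 5, category: "cafe" }, ...) keeps only score, mirroring the last example.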