Collation
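New in version 3.4.
Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
You can specify collation for a collection, a view, or an index, or for specific operations that support collation.
Collation Document
A collation document has the following fields: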
{
   locale: <string>,
   caseLevel: <boolean>,
   caseFirst: <string>,
   strength: <int>,
   numericOrdering: <boolean>,
   alternate: <string>,
   maxVariable: <string>,
   backwards: <boolean>,
   normalization: <boolean>
}
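When specifying a collation, the locale field is mandatory; all other fields are optional. For descriptions of the fields, see Collation Document.
Default collation parameters vary depending on the locale you specify. For a complete list of the default collation parameters and the locales they are associated with, see Collation Default Parameters.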
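As a minimal sketch of the rule that locale is mandatory and every other field is optional, the hypothetical helper below builds a collation document and rejects one that has no locale. makeCollation and OPTIONAL_FIELDS are illustrative names, not part of MongoDB; the commented lines at the end show where such a document would typically be passed in the mongo shell, assuming a collection named myColl.

```javascript
// Hypothetical helper for illustration; it is not a MongoDB API.
// Field names are taken from the field list on this page.
const OPTIONAL_FIELDS = [
  "caseLevel", "caseFirst", "strength", "numericOrdering",
  "alternate", "maxVariable", "backwards", "normalization",
];

function makeCollation(locale, options = {}) {
  // locale is the only mandatory field.
  if (typeof locale !== "string" || locale.length === 0) {
    throw new Error("collation requires a locale");
  }
  const collation = { locale };
  for (const [key, value] of Object.entries(options)) {
    if (!OPTIONAL_FIELDS.includes(key)) {
      throw new Error("unknown collation field: " + key);
    }
    collation[key] = value;
  }
  return collation;
}

const frenchSecondary = makeCollation("fr", { strength: 2 });
console.log(JSON.stringify(frenchSecondary));

// In the mongo shell, the resulting document could then be used as, for example:
//   db.createCollection("myColl", { collation: frenchSecondary })
//   db.myColl.createIndex({ category: 1 }, { collation: frenchSecondary })
```

Fields that are left out are not set by this helper; the server applies the defaults for the chosen locale.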
| Field | Type | Description |
|---|---|---|
| locale | string | The ICU locale. See Supported Languages and Locales for a list of supported locales. To specify simple binary comparison, specify a locale value of "simple". |
| strength | integer | Optional. The level of comparison to perform. Corresponds to ICU Comparison Levels. Possible values: 1 (primary: compares base characters only, ignoring diacritics and case); 2 (secondary: compares base characters and diacritics); 3 (tertiary: compares base characters, diacritics, and case and letter variants; this is the default); 4 (quaternary: for specific use cases, such as considering punctuation when levels 1-3 ignore it, or processing Japanese text); 5 (identical: used as a tie breaker). |
| caseLevel | boolean | Optional. Flag that determines whether to include case comparison at strength level 1 or 2. If true, include case comparison: when used with strength: 1, collation compares base characters and case; when used with strength: 2, collation compares base characters, diacritics (and possibly other secondary differences), and case. If false, do not include case comparison at level 1 or 2. The default is false. For more information, see ICU Collation: Case Level. |
| caseFirst | string | Optional. A flag that determines the sort order of case differences during tertiary level comparisons. Possible values: "upper" (uppercase sorts before lowercase); "lower" (lowercase sorts before uppercase); "off" (the default; similar to "lower" with slight differences). |
| numericOrdering | boolean | Optional. Flag that determines whether to compare numeric strings as numbers or as strings. If true, compare as numbers; i.e. "10" is greater than "2". If false, compare as strings; i.e. "10" is less than "2". The default is false. |
| alternate | string | Optional. Field that determines whether collation should consider whitespace and punctuation as base characters for purposes of comparison. Possible values: "non-ignorable" (whitespace and punctuation are considered base characters); "shifted" (whitespace and punctuation are not considered base characters and are only distinguished at strength levels greater than 3). See ICU Collation: Comparison Levels for more information. The default is "non-ignorable". |
| maxVariable | string | Optional. Field that determines which characters are considered ignorable when alternate: "shifted". Has no effect if alternate: "non-ignorable". Possible values: "punct" (both whitespace and punctuation are ignorable, i.e. not considered base characters); "space" (whitespace is ignorable, i.e. not considered base characters). |
| backwards | boolean | Optional. Flag that determines whether strings with diacritics sort from the back of the string, as in some French dictionary orderings. If true, compare from back to front. If false, compare from front to back. The default is false. |
| normalization | boolean | Optional. Flag that determines whether to check if text requires normalization and to perform normalization. Generally, the majority of text does not require this normalization processing. If true, check whether the text is fully normalized and perform normalization before comparing. If false, do not check. The default is false. See http://userguide.icu-project.org/collation/concepts#TOC-Normalization for details. |
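To get a feel for what strength and numericOrdering do, the sketch below uses JavaScript's built-in Intl.Collator rather than MongoDB itself. This is an analogy, not the server's implementation: it assumes a runtime such as Node.js with the "en" locale available, and the sensitivity and numeric options are only approximate counterparts of strength and numericOrdering.

```javascript
// Analogy only: Intl.Collator is the JavaScript runtime's collator, not MongoDB's.
// Assumed rough correspondence:
//   sensitivity: "base"   ~ strength: 1 (base characters only)
//   sensitivity: "accent" ~ strength: 2 (base characters and diacritics)
//   numeric: true         ~ numericOrdering: true

const primary = new Intl.Collator("en", { sensitivity: "base" });
const secondary = new Intl.Collator("en", { sensitivity: "accent" });
const asStrings = new Intl.Collator("en", { numeric: false });
const asNumbers = new Intl.Collator("en", { numeric: true });

// compare() returns 0 for "equal", a negative number for "sorts before",
// and a positive number for "sorts after".
const results = {
  primaryIgnoresAccentAndCase: primary.compare("cafe", "Café") === 0,
  secondaryIgnoresCase: secondary.compare("cafe", "Cafe") === 0,
  secondarySeesAccent: secondary.compare("cafe", "café") !== 0,
  stringOrderPuts10Before2: asStrings.compare("10", "2") < 0,
  numericOrderPuts10After2: asNumbers.compare("10", "2") > 0,
};

console.log(results);
```

Each entry in results is a boolean, so the object reads as a checklist of the behaviors described in the table.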
Operations that Support Collation
You can specify collation for the following operations:
NOTE
You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or if performing a find with a sort, you cannot use one collation for the find and another for the sort.
Behavior
Locale Variants
Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:
{ "locale" : "<locale code>@collation=<variant>" }
For example, to use the pinyin variant of the Chinese collation:
{ "locale" : "zh@collation=pinyin" }
For a complete list of all collation locales and their variants, see Collation Locales.
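The variant syntax is plain string concatenation, which the following sketch makes explicit. localeWithVariant is a hypothetical helper name used only for illustration; it is not a MongoDB function, and it does not check that the locale or variant is actually supported.

```javascript
// Hypothetical helper for illustration; it is not a MongoDB API.
function localeWithVariant(localeCode, variant) {
  // With no variant, the plain locale code is used as-is.
  const locale = variant ? `${localeCode}@collation=${variant}` : localeCode;
  return { locale };
}

console.log(JSON.stringify(localeWithVariant("zh", "pinyin")));
// prints {"locale":"zh@collation=pinyin"}
```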
Collation and Views
- You can specify a default collation for a view at creation time. If no collation is specified, the view’s default collation is the “simple” binary comparison collator. That is, the view does not inherit the collection’s default collation.
- String comparisons on the view use the view’s default collation. An operation that attempts to change or override a view’s default collation will fail with an error.
- If creating a view from another view, you cannot specify a collation that differs from the source view’s collation.
- If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.
Collation and Index Use
To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.
For example, the collection myColl has an index on a string field category with the collation locale "fr".
db.myColl.createIndex( { category: 1 }, { collation: { locale: "fr" } } )
The following query operation, which specifies the same collation as the index, can use the index:
db.myColl.find( { category: "cafe" } ).collation( { locale: "fr" } )
However, the following query operation, which by default uses the “simple” binary collator, cannot use the index:
db.myColl.find( { category: "cafe" } )
For a compound index where the index prefix keys are not strings, arrays, and embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.
For example, the collection myColl has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:
db.myColl.createIndex(
   { score: 1, price: 1, category: 1 },
   { collation: { locale: "fr" } } )
The following operations, which use "simple" binary collation for string comparisons, can use the index:
db.myColl.find( { score: 5 } ).sort( { price: 1 } )
db.myColl.find( { score: 5, price: { $gt: NumberDecimal( "10" ) } } ).sort( { price: 1 } )
The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:
db.myColl.find( { score: 5, category: "cafe" } )
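The index rule in this section can be summarized as a small predicate. The sketch below is a simplified model for illustration only, not MongoDB's query planner: the function names are hypothetical, and it treats two collations as the same only when every field written in either document matches, whereas the server also fills in locale-dependent defaults.

```javascript
// Simplified model for illustration; this is not MongoDB's query planner.

function effectiveCollation(spec) {
  // As in the examples above, an operation with no collation is treated as
  // using "simple" binary comparison (this assumes the collection has no
  // default collation).
  return spec === undefined ? { locale: "simple" } : spec;
}

function sameCollation(a, b) {
  const x = effectiveCollation(a);
  const y = effectiveCollation(b);
  const keys = new Set([...Object.keys(x), ...Object.keys(y)]);
  for (const key of keys) {
    if (x[key] !== y[key]) return false;
  }
  return true;
}

// comparesStrings: whether the operation performs string comparisons
// on the indexed field in question.
function indexCanSupport(indexCollation, opCollation, comparesStrings) {
  return !comparesStrings || sameCollation(indexCollation, opCollation);
}

const indexCollation = { locale: "fr" };

// find( { category: "cafe" } ).collation( { locale: "fr" } )
console.log(indexCanSupport(indexCollation, { locale: "fr" }, true));  // true
// find( { category: "cafe" } ) with no collation
console.log(indexCanSupport(indexCollation, undefined, true));         // false
// find( { score: 5 } ) on the numeric prefix of the compound index
console.log(indexCanSupport(indexCollation, undefined, false));        // true
```

To see what the server actually chooses for a given query, inspect the query plan rather than relying on a model like this one.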