| rfc9841v1.txt | rfc9841.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force (IETF) J. Alakuijala | Internet Engineering Task Force (IETF) J. Alakuijala | |||
| Request for Comments: 9841 T. Duong | Request for Comments: 9841 T. Duong | |||
| Updates: 7932 E. Kliuchnikov | Updates: 7932 E. Kliuchnikov | |||
| Category: Informational Z. Szabadka | Category: Informational Z. Szabadka | |||
| ISSN: 2070-1721 L. Vandevenne, Ed. | ISSN: 2070-1721 L. Vandevenne, Ed. | |||
| Google, Inc | Google, Inc. | |||
| August 2025 | September 2025 | |||
| Shared Brotli Compressed Data Format | Shared Brotli Compressed Data Format | |||
| Abstract | Abstract | |||
| This specification defines a data format for shared brotli | This specification defines a data format for shared brotli | |||
| compression, which adds support for shared dictionaries, large | compression, which adds support for shared dictionaries, large | |||
| window, and a container format to brotli (RFC 7932). Shared | window, and a container format to brotli (RFC 7932). Shared | |||
| dictionaries and large window support allow significant compression | dictionaries and large window support allow significant compression | |||
| gains compared to regular brotli. This document updates RFC 7932. | gains compared to regular brotli. This document specifies an | |||
| extension to the method defined in RFC 7932. | ||||
| Status of This Memo | Status of This Memo | |||
| This document is not an Internet Standards Track specification; it is | This document is not an Internet Standards Track specification; it is | |||
| published for informational purposes. | published for informational purposes. | |||
| This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
| (IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
| received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
| Internet Engineering Steering Group (IESG). Not all documents | Internet Engineering Steering Group (IESG). Not all documents | |||
| skipping to change at line 163 | skipping to change at line 164 | |||
| For this specification, a byte is exactly 8 bits, even on machines | For this specification, a byte is exactly 8 bits, even on machines | |||
| that store a character on a number of bits different from eight. | that store a character on a number of bits different from eight. | |||
| See below for the numbering of bits within a byte. | See below for the numbering of bits within a byte. | |||
| String: A sequence of arbitrary bytes. | String: A sequence of arbitrary bytes. | |||
| Bytes stored within a computer do not have a "bit order" since they | Bytes stored within a computer do not have a "bit order" since they | |||
| are always treated as a unit. However, a byte considered as an | are always treated as a unit. However, a byte considered as an | |||
| integer between 0 and 255 does have a most significant bit (MSB) and | integer between 0 and 255 does have a most significant bit (MSB) and | |||
| least significant bit (LSB), and since we write numbers with the most | least significant bit (LSB), and since we write numbers with the most | |||
| significant digit on the left, bytes with the MSB are also written on | significant digit on the left, we also write bytes with the MSB on | |||
| the left. In the diagrams below, the bits of a byte are written so | the left. In the diagrams below, the bits of a byte are written so | |||
| that bit 0 is the LSB, i.e., the bits are numbered as follows: | that bit 0 is the LSB, i.e., the bits are numbered as follows: | |||
| +--------+ | +--------+ | |||
| |76543210| | |76543210| | |||
| +--------+ | +--------+ | |||
| Within a computer, a number may occupy multiple bytes. All multi- | Within a computer, a number may occupy multiple bytes. All multi- | |||
| byte numbers in the format described here are unsigned and stored | byte numbers in the format described here are unsigned and stored | |||
| with the least significant byte first (at the lower memory address). | with the least significant byte first (at the lower memory address). | |||
| skipping to change at line 264 | skipping to change at line 265 | |||
| original dictionary in the custom dictionary. | original dictionary in the custom dictionary. | |||
| If no shared dictionary is set, the decoder behaves the same as in | If no shared dictionary is set, the decoder behaves the same as in | |||
| [RFC7932] on a brotli stream. | [RFC7932] on a brotli stream. | |||
| If a shared dictionary is set, then it can set LZ77 dictionaries, | If a shared dictionary is set, then it can set LZ77 dictionaries, | |||
| override static dictionary words, and/or override transforms. | override static dictionary words, and/or override transforms. | |||
| 3.1. Custom Static Dictionaries | 3.1. Custom Static Dictionaries | |||
| If a custom word list is set, then the following behavior of the RFC | If a custom word list is set, then the following behaviors of the | |||
| 7932 decoder [RFC7932] is overridden: | decoder defined in [RFC7932] are overridden: | |||
| Instead of the Static Dictionary Data from Appendix A of | Instead of the Static Dictionary Data from Appendix A of | |||
| [RFC7932], one or more word lists from the custom static | [RFC7932], one or more word lists from the custom static | |||
| dictionary data are used. | dictionary data are used. | |||
| Instead of NDBITS at the end of Appendix A of [RFC7932], a custom | Instead of NDBITS at the end of Appendix A of [RFC7932], a custom | |||
| SIZE_BITS_BY_LENGTH per custom word list is used. | SIZE_BITS_BY_LENGTH per custom word list is used. | |||
| The copy length for a static dictionary reference must be between | The copy length for a static dictionary reference must be between | |||
| 4 and 31 and may not be a value for which SIZE_BITS_BY_LENGTH of | 4 and 31 and may not be a value for which SIZE_BITS_BY_LENGTH of | |||
| this dictionary is 0. | this dictionary is 0. | |||
| If a custom transforms list is set without context dependency, then | If a custom transforms list is set without context dependency, then | |||
| the following behavior of the RFC 7932 decoder [RFC7932] is | the following behaviors of the decoder defined in [RFC7932] are | |||
| overridden: | overridden: | |||
| The "List of Word Transformations" from Appendix B of [RFC7932] is | The "List of Word Transformations" from Appendix B of [RFC7932] is | |||
| overridden by one or more lists of custom prefixes, suffixes, and | overridden by one or more lists of custom prefixes, suffixes, and | |||
| transform operations. | transform operations. | |||
| The transform_id must be smaller than the number of transforms | The transform_id must be smaller than the number of transforms | |||
| given in the custom transforms list. | given in the custom transforms list. | |||
| If the dictionary is context dependent, it includes a lookup table of | If the dictionary is context dependent, it includes a lookup table of | |||
| a 64-word list and transform list combinations. When resolving a | 64 word list and transform list combinations. When resolving a | |||
| static dictionary word, the decoder computes the literal Context ID | static dictionary word, the decoder computes the literal Context ID | |||
| as described in Section 7.1 of [RFC7932]. The literal Context ID is | as described in Section 7.1 of [RFC7932]. The literal Context ID is | |||
| used as the index in the lookup tables to select the word list and | used as the index in the lookup tables to select the word list and | |||
| transforms to use. If the dictionary is not context dependent, this | transforms to use. If the dictionary is not context dependent, this | |||
| ID is implicitly 0 instead. | ID is implicitly 0 instead. | |||
| If a distance goes beyond the dictionary for the current ID and | If a distance goes beyond the dictionary for the current ID and | |||
| multiple word/transform list combinations are defined, then a next | multiple word/transform list combinations are defined, then the next | |||
| dictionary is used in the following order: if not context dependent, | dictionary is used in the following order: | |||
| the same order as defined in the shared dictionary. If context | ||||
| dependent, the index matching the current context is used first, the | * If context dependent: | |||
| same order as defined in the shared dictionary excluding the current | ||||
| context are used next. | - use the index matching the current context first, and then | |||
| - use the same order as defined in the shared dictionary | ||||
| (excluding the current context) next. | ||||
| * If not context dependent: | ||||
| - use the same order as defined in the shared dictionary. | ||||
| 3.1.1. Transform Operations | 3.1.1. Transform Operations | |||
| A shared dictionary may include custom word transformations to | A shared dictionary may include custom word transformations to | |||
| replace those specified in Section 8 and Appendix B of [RFC7932]. A | replace those specified in Section 8 and Appendix B of [RFC7932]. A | |||
| transform consists of a possible prefix, a transform operation, for | transform consists of a possible prefix, a transform operation, a | |||
| some operations a parameter, and a possible suffix. In the shared | parameter (for some operations), and a possible suffix. In the | |||
| dictionary format, the transform operation is represented by a | shared dictionary format, the transform operation is represented by a | |||
| numerical ID, which is listed in the table below. | numerical ID, which is listed in the table below. | |||
| +====+===========================+ | +====+===========================+ | |||
| | ID | Operation | | | ID | Operation | | |||
| +====+===========================+ | +====+===========================+ | |||
| | 0 | Identity | | | 0 | Identity | | |||
| +----+---------------------------+ | +----+---------------------------+ | |||
| | 1 | OmitLast1 | | | 1 | OmitLast1 | | |||
| +----+---------------------------+ | +----+---------------------------+ | |||
| | 2 | OmitLast2 | | | 2 | OmitLast2 | | |||
| skipping to change at line 464 | skipping to change at line 472 | |||
| 4. Varint Encoding | 4. Varint Encoding | |||
| A varint is encoded in base 128 in one or more bytes as follows: | A varint is encoded in base 128 in one or more bytes as follows: | |||
| +--------+--------+ +--------+ | +--------+--------+ +--------+ | |||
| |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| | |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| | |||
| +--------+--------+ +--------+ | +--------+--------+ +--------+ | |||
| where the "x" bits of the first byte are the LSBs of the value and | where the "x" bits of the first byte are the LSBs of the value and | |||
| the "x" bits of the last byte are the MSBs of the value. The last | the "x" bits of the last byte are the MSBs of the value. The last | |||
| byte must have its MSB set to 0, all other bytes to 1 to indicate | byte must have its MSB set to 0 and all other bytes must have their | |||
| there is a next byte. | MSBs set to 1 to indicate there is a next byte. | |||
| The maximum allowed amount of bits to read is 63 bits; if the 9th | The maximum allowed amount of bits to read is 63 bits; if the 9th | |||
| byte is present and has its MSB set, then the stream must be | byte is present and has its MSB set, then the stream must be | |||
| considered as invalid. | considered as invalid. | |||
| 5. Shared Dictionary Stream | 5. Shared Dictionary Stream | |||
| The shared dictionary stream encodes a custom dictionary for brotli, | The shared dictionary stream encodes a custom dictionary for brotli, | |||
| including custom words and/or custom transformations. A shared | including custom words and/or custom transformations. A shared | |||
| dictionary may appear as a standalone or as contents of a resource in | dictionary may appear as a standalone or as contents of a resource in | |||
| a framing format container. | a framing format container. | |||
| A compliant shared brotli dictionary stream must have the following | A compliant shared brotli dictionary stream must have the following | |||
| format: | format: | |||
| 2 bytes: File signature, in hexadecimal the bytes 91, 0. | 2 bytes: File signature in hexadecimal format (bytes 91 and 0). | |||
| varint: LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77 | varint: LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77 | |||
| dictionary or 0 if there is none. The maximum allowed value is | dictionary, or 0 if there is none. The maximum allowed value is | |||
| the maximum possible sliding window size of brotli or large window | the maximum possible sliding window size of brotli or large window | |||
| brotli. | brotli. | |||
| LZ77_DICTIONARY_LENGTH bytes: Contents of the LZ77 dictionary. | LZ77_DICTIONARY_LENGTH bytes: Contents of the LZ77 dictionary. | |||
| 1 byte: NUM_CUSTOM_WORD_LISTS. May have a value of 0 to 64. | 1 byte: NUM_CUSTOM_WORD_LISTS. May have a value in range 0 to 64. | |||
| NUM_CUSTOM_WORD_LISTS times a word list with the following format | NUM_CUSTOM_WORD_LISTS times a word list with the following format | |||
| for each word list: | for each word list: | |||
| 28 bytes: SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit | 28 bytes: SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit | |||
| integers, indexed by word lengths 4 to 31. The value | integers, indexed by word lengths 4 to 31. The value | |||
| represents log2(number of words of this length), with the | represents log2(number of words of this length), with the | |||
| exception of 0 meaning 0 words of this length. The max allowed | exception of 0 meaning 0 words of this length. The max allowed | |||
| length value is 15 bits. OFFSETS_BY_LENGTH is computed from | length value is 15 bits. OFFSETS_BY_LENGTH is computed from | |||
| this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + | this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + | |||
| (SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) : 0). | (SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) : 0). | |||
| N bytes: Words dictionary data, where N is OFFSETS_BY_LENGTH[31] | N bytes: Words dictionary data, where N is OFFSETS_BY_LENGTH[31] | |||
| + (SIZE_BITS_BY_LENGTH[31] ? (31 << SIZE_BITS_BY_LENGTH[31]) : | + (SIZE_BITS_BY_LENGTH[31] ? (31 << SIZE_BITS_BY_LENGTH[31]) : | |||
| 0), with all the words of shortest length first, then all words | 0), with all the words of shortest length first, then all words | |||
| of the next length, and so on, where there are either 0 or a | of the next length, and so on, where there are either 0 or a | |||
| positive power of two number of words for each length. | positive power of two number of words for each length. | |||
| 1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value of 0 to 64. | 1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to | |||
| 64. | ||||
| NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the | NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the | |||
| following format for each transform list: | following format for each transform list: | |||
| 2 bytes: PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data. | 2 bytes: PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data. | |||
| Must be at least 1 because the list must always end with a | Must be at least 1 because the list must always end with a | |||
| zero-length stringlet even if it is empty. | zero-length stringlet even if it is empty. | |||
| NUM_PREFIX_SUFFIX times: Prefix/suffix stringlet. | NUM_PREFIX_SUFFIX times: Prefix/suffix stringlet. | |||
| NUM_PREFIX_SUFFIX is the number of stringlets parsed and must | NUM_PREFIX_SUFFIX is the number of stringlets parsed and must | |||
| skipping to change at line 533 | skipping to change at line 542 | |||
| for the last (terminating) entry of the transform list. For | for the last (terminating) entry of the transform list. For | |||
| other entries, STRING_LENGTH must be in range 1..255. The 0 | other entries, STRING_LENGTH must be in range 1..255. The 0 | |||
| entry must be present and must be the last byte of the | entry must be present and must be the last byte of the | |||
| PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data, else the | PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data, else the | |||
| stream must be rejected as invalid. | stream must be rejected as invalid. | |||
| STRING_LENGTH bytes: Contents of the prefix/suffix. | STRING_LENGTH bytes: Contents of the prefix/suffix. | |||
| 1 byte: NTRANSFORMS. Number of transformation triplets. | 1 byte: NTRANSFORMS. Number of transformation triplets. | |||
| NTRANSFORMS times: Data for each transform: | NTRANSFORMS times the data for each transform listed below: | |||
| 1 byte: Index of prefix in prefix/suffix data; must be less | 1 byte: Index of prefix in prefix/suffix data; must be less | |||
| than NUM_PREFIX_SUFFIX. | than NUM_PREFIX_SUFFIX. | |||
| 1 byte: Index of suffix in prefix/suffix data; must be less | 1 byte: Index of suffix in prefix/suffix data; must be less | |||
| than NUM_PREFIX_SUFFIX. | than NUM_PREFIX_SUFFIX. | |||
| 1 byte: Operation index; must be an index in the table of | 1 byte: Operation index; must be an index in the table of | |||
| operations listed in Section 3.1.1. | operations listed in Section 3.1.1. | |||
| If and only if at least one transform has operation index | If and only if at least one transform has operation index | |||
| ShiftFirst or ShiftAll: | ShiftFirst or ShiftAll, then NTRANSFORMS times the following: | |||
| NTRANSFORMS times: | ||||
| 2 bytes: Parameters for the transform. If the transform | 2 bytes: Parameters for the transform. If the transform does | |||
| does not have type ShiftFirst or ShiftAll, the value must | not have type ShiftFirst or ShiftAll, the value must be 0. | |||
| be 0. ShiftFirst and ShiftAll interpret these bytes as | ShiftFirst and ShiftAll interpret these bytes as an unsigned | |||
| an unsigned 16-bit integer. | 16-bit integer. | |||
| If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 | If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 | |||
| (else implicitly NUM_DICTIONARIES is 1 and points to the brotli | (else implicitly NUM_DICTIONARIES is 1 and points to the brotli | |||
| built-in and there is no context map): | built-in and there is no context map): | |||
| 1 byte: NUM_DICTIONARIES. May have value 1 to 64. Each | 1 byte: NUM_DICTIONARIES. May have a value in range 1 to 64. | |||
| dictionary is a combination of a word list and a transform | Each dictionary is a combination of a word list and a transform | |||
| list. Each next dictionary is used when the distance goes | list. Each next dictionary is used when the distance goes | |||
| beyond the previous. If a CONTEXT_MAP is enabled, then the | beyond the previous. If a CONTEXT_MAP is enabled, then the | |||
| dictionary matching the context is moved to the front in the | dictionary matching the context is moved to the front in the | |||
| order for this context. | order for this context. | |||
| NUM_DICTIONARIES times: The DICTIONARY_MAP: | NUM_DICTIONARIES times the DICTIONARY_MAP, which contains: | |||
| 1 byte: Index into a custom word list or value | 1 byte: Index into a custom word list or value | |||
| NUM_CUSTOM_WORD_LISTS to indicate using the brotli [RFC7932] | NUM_CUSTOM_WORD_LISTS to indicate using the brotli [RFC7932] | |||
| built-in default word list. | built-in default word list. | |||
| 1 byte: Index into a custom transform list or value | 1 byte: Index into a custom transform list or value | |||
| NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli | NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli | |||
| [RFC7932] built-in default transform list. | [RFC7932] built-in default transform list. | |||
| 1 byte: CONTEXT_ENABLED. If 0, there is no context map. If 1, a | 1 byte: CONTEXT_ENABLED. If 0, there is no context map. If 1, a | |||
| skipping to change at line 592 | skipping to change at line 599 | |||
| first dictionary to use for this context. | first dictionary to use for this context. | |||
| 6. Large Window Brotli Compressed Data Stream | 6. Large Window Brotli Compressed Data Stream | |||
| Large window brotli allows a sliding window beyond the 24-bit maximum | Large window brotli allows a sliding window beyond the 24-bit maximum | |||
| of regular brotli [RFC7932]. | of regular brotli [RFC7932]. | |||
| The compressed data stream is backwards compatible to brotli | The compressed data stream is backwards compatible to brotli | |||
| [RFC7932] and may optionally have the following differences: | [RFC7932] and may optionally have the following differences: | |||
| Encoding of WBITS in the stream header: The following new pattern of | In the encoding of WBITS in the stream header, the following new | |||
| 14 bits is supported: | pattern of 14 bits is supported: | |||
| 8 bits: Value 00010001 to indicate a large window brotli stream. | 8 bits: Value 00010001 to indicate a large window brotli stream. | |||
| 6 bits: WBITS. Must have value in range 10 to 62. | 6 bits: WBITS. Must have value in range 10 to 62. | |||
| Distance alphabet: If the stream is a large window brotli stream, | Distance alphabet: If the stream is a large window brotli stream, | |||
| the maximum number of extra bits is 62 and the theoretical maximum | the maximum number of extra bits is 62 and the theoretical maximum | |||
| size of the distance alphabet is (16 + NDIRECT + (124 << | size of the distance alphabet is (16 + NDIRECT + (124 << | |||
| NPOSTFIX)). This overrides the value for the distance alphabet | NPOSTFIX)). This overrides the value for the distance alphabet | |||
| size given in Section 3.3 of [RFC7932] and affects the number of | size given in Section 3.3 of [RFC7932] and affects the number of | |||
| skipping to change at line 638 | skipping to change at line 645 | |||
| * The stream may have the format of regular brotli [RFC7932] or the | * The stream may have the format of regular brotli [RFC7932] or the | |||
| format of large window brotli as described in Section 6. | format of large window brotli as described in Section 6. | |||
| 8. Shared Brotli Framing Format Stream | 8. Shared Brotli Framing Format Stream | |||
| A compliant shared brotli framing format stream has the format | A compliant shared brotli framing format stream has the format | |||
| described below. | described below. | |||
| 8.1. Main Format | 8.1. Main Format | |||
| 4 bytes: File signature, in hexadecimal the bytes 0x91, 0x0a, 0x42, | 4 bytes: File signature in hexadecimal format (bytes 0x91, 0x0a, | |||
| 0x52. The first byte contains the invalid WBITS combination for | 0x42, and 0x52). The first byte contains the invalid WBITS | |||
| brotli [RFC7932] and large window brotli. | combination for brotli [RFC7932] and large window brotli. | |||
| 1 byte: Container flags that are 8 bits and have the following | 1 byte: Container flags that are 8 bits and have the following | |||
| meanings: | meanings: | |||
| bit 0 and 1: Version indicator that must be b'00. Otherwise, the | bits 0 and 1: Version indicator that must be b'00. Otherwise, | |||
| decoder must reject the data stream as invalid. | the decoder must reject the data stream as invalid. | |||
| bit 2: If 0, the file contains no final footer, may not contain | bit 2: If 0, the file contains no final footer, may not contain | |||
| any metadata chunks, may not contain a central directory, and | any metadata chunks, may not contain a central directory, and | |||
| may encode only a single resource (using one or more data | may encode only a single resource (using one or more data | |||
| chunks). If 1, the file may contain one or more resources, | chunks). If 1, the file may contain one or more resources, | |||
| metadata, and a central directory, and it must contain a final | metadata, and a central directory, and it must contain a final | |||
| footer. | footer. | |||
| multiple times: A chunk, each with the format specified in | multiple times: A chunk, each with the format specified in | |||
| Section 8.2. | Section 8.2. | |||
| skipping to change at line 707 | skipping to change at line 714 | |||
| can be 1 serialized dictionary and 15 prefix dictionaries | can be 1 serialized dictionary and 15 prefix dictionaries | |||
| maximum (a serialized dictionary may already contain one of | maximum (a serialized dictionary may already contain one of | |||
| those). Circular references are not allowed (any dictionary | those). Circular references are not allowed (any dictionary | |||
| reference that directly or indirectly uses this chunk itself as | reference that directly or indirectly uses this chunk itself as | |||
| dictionary). | dictionary). | |||
| Per dictionary reference: | Per dictionary reference: | |||
| 1 byte: Flags: | 1 byte: Flags: | |||
| bit 0 and 1: Dictionary source: | bits 0 and 1: Dictionary source: | |||
| 00: Internal dictionary reference to a full resource by | 00: Internal dictionary reference to a full resource by | |||
| pointer, which can span one or more chunks. Must | pointer, which can span one or more chunks. Must | |||
| point to a full data chunk or a first partial data | point to a full data chunk or a first partial data | |||
| chunk. | chunk. | |||
| 01: Internal dictionary reference to single chunk | 01: Internal dictionary reference to single chunk | |||
| contents by pointer. May point to any chunk with | contents by pointer. May point to any chunk with | |||
| content (data or metadata). If a partial data | content (data or metadata). If a partial data | |||
| chunk, only this part is the dictionary. In this | chunk, only this part is the dictionary. In this | |||
| skipping to change at line 731 | skipping to change at line 738 | |||
| 10: Reference to a dictionary by hash code of a | 10: Reference to a dictionary by hash code of a | |||
| resource. The dictionary can come from an external | resource. The dictionary can come from an external | |||
| source, such as a different container. The user of | source, such as a different container. The user of | |||
| the decoder must be able to provide the dictionary | the decoder must be able to provide the dictionary | |||
| contents given its hash code (even if it comes from | contents given its hash code (even if it comes from | |||
| this container itself) or treat it as an error when | this container itself) or treat it as an error when | |||
| the user does not have it available. | the user does not have it available. | |||
| 11: Invalid bit combination | 11: Invalid bit combination | |||
| bit 2 and 3: Dictionary type: | bits 2 and 3: Dictionary type: | |||
| 00: Prefix dictionary, set in front of the sliding | 00: Prefix dictionary, set in front of the sliding | |||
| window | window | |||
| 01: Serialized dictionary in the shared brotli format as | 01: Serialized dictionary in the shared brotli format as | |||
| specified in Section 5. | specified in Section 5. | |||
| 10: Invalid bit combination | 10: Invalid bit combination | |||
| 11: Invalid bit combination | 11: Invalid bit combination | |||
| bit 4-7: Must be 0 | bits 4-7: Must be 0 | |||
| If hash-based: | If hash-based: | |||
| 1 byte: Type of hash used. Only supported value: 3, | 1 byte: Type of hash used. Only supported value: 3, | |||
| indicating 256-bit HighwayHash [HWYHASH]. | indicating 256-bit HighwayHash [HWYHASH]. | |||
| 32 bytes: 256-bit HighwayHash checksum to refer to | 32 bytes: 256-bit HighwayHash checksum to refer to | |||
| dictionary. | dictionary. | |||
| If pointer based: Varint-encoded pointer to its chunk in this | If pointer based: Varint-encoded pointer to its chunk in this | |||
| container. The chunk must come in the container earlier | container. The chunk must come in the container earlier | |||
| than the current chunk. | than the current chunk. | |||
| X bytes: Extra header bytes, depending on CHUNK_TYPE. If present, | X bytes: Extra header bytes, depending on CHUNK_TYPE. If present, | |||
| they are specified in the subsequent sections. | they are specified in the subsequent sections. | |||
| remaining bytes: The chunk contents. The uncompressed data in | remaining bytes: The chunk contents. The uncompressed data in the | |||
| the chunk content depends on CHUNK_TYPE and is specified in the | chunk content depends on CHUNK_TYPE and is specified in the | |||
| subsequent sections. The compressed data has following format | subsequent sections. The compressed data has following format | |||
| depending on CODEC: | depending on CODEC: | |||
| * uncompressed: The raw bytes. | * uncompressed: The raw bytes. | |||
| * If "keep decoder", the continuation of the compressed stream | * If "keep decoder", the continuation of the compressed stream | |||
| that was interrupted at the end of the previous chunk. The | that was interrupted at the end of the previous chunk. The | |||
| decoder from the previous chunk must be used and its state | decoder from the previous chunk must be used and its state it | |||
| it had at the end of the previous chunk must be kept at the | had at the end of the previous chunk must be kept at the start | |||
| start of the decoding of this chunk. | of the decoding of this chunk. | |||
| * brotli: The bytes are in brotli format [RFC7932]. | * brotli: The bytes are in brotli format [RFC7932]. | |||
| * shared brotli: The bytes are in the shared brotli format | * shared brotli: The bytes are in the shared brotli format | |||
| specified in Section 7. | specified in Section 7. | |||
| 8.3. Metadata Format | 8.3. Metadata Format | |||
| All the metadata chunk types use the following format for the | All the metadata chunk types use the following format for the | |||
| uncompressed content: | uncompressed content: | |||
| Per field: | Per field: | |||
| 2 bytes: Code to identify this metadata field. This must be two | 2 bytes: Code to identify this metadata field. This must be two | |||
| lowercase or two uppercase alpha ASCII characters. If the | lowercase or two uppercase alpha ASCII characters. If the | |||
| decoder encounters a lowercase field that it does not recognize | decoder encounters a lowercase field that it does not recognize | |||
| skipping to change at line 828 | skipping to change at line 835 | |||
| This chunk contains metadata that applies to the resource whose | This chunk contains metadata that applies to the resource whose | |||
| beginning is encoded in the subsequent data chunk or first partial | beginning is encoded in the subsequent data chunk or first partial | |||
| data chunk. | data chunk. | |||
| The contents of this chunk follows the format described in | The contents of this chunk follows the format described in | |||
| Section 8.3. | Section 8.3. | |||
| The following field types are recognized: | The following field types are recognized: | |||
| id: Name field. May appear 0 or 1 times. Has the following format: | id (N bytes): Name field. May appear 0 or 1 times. Has the | |||
| following format: name in UTF-8 encoding, length determined by the | ||||
| N bytes: Name in UTF-8 encoding, length determined by the field | field length. Treated generically but may be used as a filename. | |||
| length. Treated generically but may be used as a filename. If | If used as a filename, forward slashes '/' should be used as | |||
| used as a filename, forward slashes '/' should be used as | directory separators, relative paths should be used, and filenames | |||
| directory separators, relative paths should be used, and | ending in a slash with 0-length content in the matching data chunk | |||
| filenames ending in a slash with 0-length content in the | should be treated as an empty directory. | |||
| matching data chunk should be treated as an empty directory. | ||||
| mt: Modification type. May appear 0 or 1 times. Has the following | ||||
| format: | ||||
| 8 bytes: Microseconds since epoch, as a little-endian, signed | mt (8 bytes): Modification type. May appear 0 or 1 times. Has the | |||
| two's complement 64-bit integer. | following format: contains microseconds since epoch, as a little- | |||
| endian, signed two's complement 64-bit integer. | ||||
| custom user field: Any two uppercase ASCII characters. | custom user field: Any two uppercase ASCII characters. | |||
| 8.4.3. Data Chunk (Type 2) | 8.4.3. Data Chunk (Type 2) | |||
| A data chunk contains the actual data of a resource. | A data chunk contains the actual data of a resource. | |||
| This chunk has the following extra header bytes: | This chunk has the following extra header bytes: | |||
| 1 byte: Flags: | 1 byte: Flags: | |||
| bit 0: If true, indicates this is not a resource that should be | bit 0: If true, indicates this is not a resource that should be | |||
| output implicitly as part of extracting resources from this | output implicitly as part of extracting resources from this | |||
| container. Instead, it may be referred to only explicitly, | container. Instead, it may be referred to only explicitly, | |||
| e.g., as a dictionary reference by hash code or offset. This | e.g., as a dictionary reference by hash code or offset. This | |||
| flag should be set for data used as dictionary to improve | flag should be set for data used as dictionary to improve | |||
| compression of actual resources. | compression of actual resources. | |||
| bit 1: If true, hash code is given | bit 1: If true, hash code is given. | |||
| bits 2-7: Must be zero. | bits 2-7: Must be zero. | |||
| If hash code is given: | If hash code is given: | |||
| 1 byte: Type of hash used. Only supported value: 3, indicating | 1 byte: Type of hash used. Only supported value: 3, indicating | |||
| 256-bit HighwayHash [HWYHASH]. | 256-bit HighwayHash [HWYHASH]. | |||
| 32 bytes: 256-bit HighwayHash checksum of the uncompressed data. | 32 bytes: 256-bit HighwayHash checksum of the uncompressed data. | |||
| skipping to change at line 1003 ¶ | skipping to change at line 1007 ¶ | |||
| 8.4.10. Central Directory Chunk (Type 9) | 8.4.10. Central Directory Chunk (Type 9) | |||
| The central directory chunk along with the repeat metadata chunks | The central directory chunk along with the repeat metadata chunks | |||
| allow quickly finding and listing compressed resources in the | allow quickly finding and listing compressed resources in the | |||
| container file. | container file. | |||
| The central directory chunk is always uncompressed and does not have | The central directory chunk is always uncompressed and does not have | |||
| the codec byte. It instead has the following format: | the codec byte. It instead has the following format: | |||
| varint: Pointer into the file where the repeat metadata chunks are | varint: Pointer into the file where the repeat metadata chunks are | |||
| located or 0 if they are not present per chunk listed: | located or 0 if they are not present. | |||
| per chunk listed: | ||||
| varint: Pointer into the file where this chunk begins. | varint: Pointer into the file where this chunk begins. | |||
| varint: Number of header bytes N used below. | varint: Number of header bytes N used below. | |||
| N bytes: Copy of all the header bytes of the pointed at chunk, | N bytes: Copy of all the header bytes of the pointed at chunk, | |||
| including total size, chunk type byte, codec, uncompressed | including total size, chunk type byte, codec, uncompressed | |||
| size, dictionary references, and X extra header bytes. The | size, dictionary references, and X extra header bytes. The | |||
| content is not repeated here. | content is not repeated here. | |||
| The last listed chunk is reached when the end of the contents of the | The last listed chunk is reached when the end of the contents of the | |||
| central directory are reached. If the end does not match the last | central directory are reached. If the end does not match the last | |||
| byte of the central directory, the decoder must reject the data | byte of the central directory, the decoder must reject the data | |||
| stream as invalid. | stream as invalid. | |||
| If present, the central directory must list all data and metadata | If present, the central directory must list all data and metadata | |||
| chunks of all types. | chunks of all types. | |||
| 8.4.11. Final Footer Chunk (Type 10) | 8.4.11. Final Footer Chunk (Type 10) | |||
| The final footer chunk closes the file and is only present if in the | The final footer chunk closes the file and is only present if bit 2 | |||
| initial container header flags bit 2 was set. | of the initial container flags was set. | |||
| This chunk has the following content, which is always uncompressed: | This chunk has the following content, which is always uncompressed: | |||
| reversed varint: Size of this entire framing format file, including | reversed varint: Size of this entire framing format file, including | |||
| these bytes themselves, or 0 if this size is not given. | these bytes themselves, or 0 if this size is not given. | |||
| reversed varint: Pointer to the start of the central directory, or 0 | reversed varint: Pointer to the start of the central directory, or 0 | |||
| if there is none. | if there is none. | |||
| A reversed varint has the same format as a varint but its bytes are | A reversed varint has the same format as a varint but its bytes are | |||
| skipping to change at line 1092 ¶ | skipping to change at line 1098 ¶ | |||
| The dictionary must be treated with the same security precautions as | The dictionary must be treated with the same security precautions as | |||
| the content because a change to the dictionary can result in a change | the content because a change to the dictionary can result in a change | |||
| to the decompressed content. | to the decompressed content. | |||
| The CRIME attack [CRIME] shows that it's a bad idea to compress data | The CRIME attack [CRIME] shows that it's a bad idea to compress data | |||
| from mixed (e.g., public and private) sources -- the data sources | from mixed (e.g., public and private) sources -- the data sources | |||
| include not only the compressed data but also the dictionaries. For | include not only the compressed data but also the dictionaries. For | |||
| example, if you compress secret cookies using a public-data-only | example, if you compress secret cookies using a public-data-only | |||
| dictionary, you still leak information about the cookies. | dictionary, you still leak information about the cookies. | |||
| Not only can the dictionary reveal information about the compressed | The dictionary can reveal information about the compressed data and | |||
| data, but vice versa; data compressed with the dictionary can reveal | vice versa. That is, data compressed with the dictionary can reveal | |||
| the contents of the dictionary when an adversary can control parts of | contents of the dictionary when an adversary can control parts of the | |||
| data to compress and see the compressed size. On the other hand, if | data to compress and see the compressed size. On the other hand, if | |||
| the adversary can control the dictionary, the adversary can learn | the adversary can control the dictionary, the adversary can learn | |||
| information about the compressed data. | information about the compressed data. | |||
| The most robust defense against CRIME is not to compress private | The most robust defense against CRIME is not to compress private | |||
| data, e.g., sensitive headers like cookies or any content with | data, e.g., sensitive headers like cookies or any content with | |||
| personally identifiable information (PII). The challenge has been to | personally identifiable information (PII). The challenge has been to | |||
| identify secrets within a vast amount of data to be compressed. | identify secrets within a vast amount of data to be compressed. | |||
| Cloudflare uses a regular expression [CLOUDFLARE]. Another idea is | Cloudflare uses a regular expression [CLOUDFLARE]. Another idea is | |||
| to extend existing web template systems (e.g., Soy [SOY]) to allow | to extend existing web template systems (e.g., Soy [SOY]) to allow | |||
| skipping to change at line 1173 ¶ | skipping to change at line 1179 ¶ | |||
| [CRIME] CVE Program, "CVE-2012-4929", | [CRIME] CVE Program, "CVE-2012-4929", | |||
| <https://www.cve.org/CVERecord?id=CVE-2012-4929>. | <https://www.cve.org/CVERecord?id=CVE-2012-4929>. | |||
| [LZ77] Ziv, J. and A. Lempel, "A Universal Algorithm for | [LZ77] Ziv, J. and A. Lempel, "A Universal Algorithm for | |||
| Sequential Data Compression", IEEE Transactions on | Sequential Data Compression", IEEE Transactions on | |||
| Information Theory, vol. 23, no. 3, pp. 337-343, | Information Theory, vol. 23, no. 3, pp. 337-343, | |||
| DOI 10.1109/TIT.1977.1055714, May 1977, | DOI 10.1109/TIT.1977.1055714, May 1977, | |||
| <https://doi.org/10.1109/TIT.1977.1055714>. | <https://doi.org/10.1109/TIT.1977.1055714>. | |||
| [SOY] Google Developers, "Closure Tools", | [SOY] Google Developers, "Closure Tools", | |||
| <https://developers.google.com/closure/templates/>. | <https://developers.google.com/closure>. | |||
| Acknowledgments | Acknowledgments | |||
| The authors would like to thank Robert Obryk for suggesting | The authors would like to thank Robert Obryk for suggesting | |||
| improvements to the format and the text of the specification. | improvements to the format and the text of the specification. | |||
| Authors' Addresses | Authors' Addresses | |||
| Jyrki Alakuijala | Jyrki Alakuijala | |||
| Google, Inc. | Google, Inc. | |||
| End of changes. 36 change blocks. | ||||
| 79 lines changed or deleted | 85 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||
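
The byte layouts quoted in the diff above for the data chunk's extra header (Section 8.4.3) and for the `mt` metadata field can be read as in the following minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the names `DataChunkExtraHeader`, `parse_data_chunk_extra_header`, and `decode_mt` are hypothetical, "bit 0" is assumed to mean the least significant bit of the flags byte, and "epoch" is assumed to mean the Unix epoch (1970-01-01T00:00:00Z). The sketch only parses the hash fields; it does not compute or verify the HighwayHash checksum.

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

HASH_TYPE_HIGHWAYHASH_256 = 3  # the only hash type value the text lists
HASH_LENGTH = 32               # 256 bits


@dataclass
class DataChunkExtraHeader:
    """Hypothetical container for the extra header bytes of a data chunk."""
    implicit_output: bool        # False when bit 0 of the flags byte is set
    hash_type: Optional[int]     # None when no hash code is given
    hash_value: Optional[bytes]  # 32-byte checksum, or None
    size: int                    # number of extra header bytes consumed


def parse_data_chunk_extra_header(buf: bytes) -> DataChunkExtraHeader:
    """Parse the flags byte and the optional hash fields from buf."""
    if len(buf) < 1:
        raise ValueError("missing flags byte")
    flags = buf[0]
    # Assumption: bit 0 is the least significant bit.
    if flags & 0b11111100:
        raise ValueError("bits 2-7 of the flags byte must be zero")
    implicit_output = not (flags & 0b01)
    has_hash = bool(flags & 0b10)
    if not has_hash:
        return DataChunkExtraHeader(implicit_output, None, None, 1)
    if len(buf) < 2 + HASH_LENGTH:
        raise ValueError("truncated hash fields")
    hash_type = buf[1]
    if hash_type != HASH_TYPE_HIGHWAYHASH_256:
        raise ValueError(f"unsupported hash type {hash_type}")
    hash_value = bytes(buf[2:2 + HASH_LENGTH])
    return DataChunkExtraHeader(
        implicit_output, hash_type, hash_value, 2 + HASH_LENGTH
    )


def decode_mt(value: bytes) -> datetime:
    """Decode an 'mt' field value: little-endian signed 64-bit microseconds.

    Assumption: the epoch is the Unix epoch. Values outside the range that
    Python's datetime can represent raise OverflowError.
    """
    if len(value) != 8:
        raise ValueError("mt field value must be exactly 8 bytes")
    (micros,) = struct.unpack("<q", value)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=micros)
```

In this sketch, a flags byte with any of bits 2-7 set or a hash type other than 3 raises an error. The quoted text requires the bits to be zero and lists 3 as the only supported value, but it does not say how a decoder should react, so treating both cases as fatal is a design choice of the sketch.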