| rfc9841.original.xml | rfc9841.xml | |||
|---|---|---|---|---|
| <?xml version='1.0' encoding='utf-8'?> | <?xml version='1.0' encoding='utf-8'?> | |||
| <!DOCTYPE rfc [ | <!DOCTYPE rfc [ | |||
| <!ENTITY nbsp " "> | <!ENTITY nbsp " "> | |||
| <!ENTITY zwsp "​"> | <!ENTITY zwsp "​"> | |||
| <!ENTITY nbhy "‑"> | <!ENTITY nbhy "‑"> | |||
| <!ENTITY wj "⁠"> | <!ENTITY wj "⁠"> | |||
| ]> | ]> | |||
| <rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" docName="draft-vandevenne-shared-brotli-format-15" category="info" updates="7932" ipr="trust200902" obsoletes="" xml:lang="en" symRefs="true" sortRefs="true" tocInclude="true" version="3"> | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" docName="draft-vandevenne-shared-brotli-format-15" number="9841" consensus="true" category="info" updates="7932" ipr="trust200902" obsoletes="" xml:lang="en" symRefs="true" sortRefs="true" tocInclude="true" version="3"> | |||
| <!-- xml2rfc v2v3 conversion 3.27.0 --> | ||||
| <!-- Generated by id2xml 1.5.2 on 2025-02-12T17:39:08Z --> | ||||
| <front> | <front> | |||
| <title abbrev="Shared Brotli Data Format">Shared Brotli Compressed Data Format</title> | <title abbrev="Shared Brotli Data Format">Shared Brotli Compressed Data Format</title> | |||
| <seriesInfo name="Internet-Draft" value="draft-vandevenne-shared-brotli-format-15"/> | <seriesInfo name="RFC" value="9841"/> | |||
| <author initials="J." surname="Alakuijala" fullname="Jyrki Alakuijala"> | <author initials="J." surname="Alakuijala" fullname="Jyrki Alakuijala"> | |||
| <organization abbrev="Google, Inc">Google, Inc.</organization> | <organization abbrev="Google, Inc.">Google, Inc.</organization> | |||
| <address> | <address> | |||
| <email>jyrki@google.com</email> | <email>jyrki@google.com</email> | |||
| </address> | </address> | |||
| </author> | </author> | |||
| <author initials="T." surname="Duong" fullname="Thai Duong"> | <author initials="T." surname="Duong" fullname="Thai Duong"> | |||
| <organization abbrev="Google, Inc">Google, Inc.</organization> | <organization abbrev="Google, Inc.">Google, Inc.</organization> | |||
| <address> | <address> | |||
| <email>thaidn@google.com</email> | <email>thaidn@google.com</email> | |||
| </address> | </address> | |||
| </author> | </author> | |||
| <author initials="E." surname="Kliuchnikov" fullname="Evgenii Kliuchnikov"> | <author initials="E." surname="Kliuchnikov" fullname="Evgenii Kliuchnikov"> | |||
| <organization abbrev="Google, Inc">Google, Inc.</organization> | <organization abbrev="Google, Inc.">Google, Inc.</organization> | |||
| <address> | <address> | |||
| <email>eustas@google.com</email> | <email>eustas@google.com</email> | |||
| </address> | </address> | |||
| </author> | </author> | |||
| <author initials="Z." surname="Szabadka" fullname="Zoltan Szabadka"> | <author initials="Z." surname="Szabadka" fullname="Zoltan Szabadka"> | |||
| <organization abbrev="Google, Inc">Google, Inc.</organization> | <organization abbrev="Google, Inc.">Google, Inc.</organization> | |||
| <address> | <address> | |||
| <email>szabadka@google.com</email> | <email>szabadka@google.com</email> | |||
| </address> | </address> | |||
| </author> | </author> | |||
| <author initials="L." surname="Vandevenne" fullname="Lode Vandevenne"> | <author initials="L." surname="Vandevenne" fullname="Lode Vandevenne" role="editor"> | |||
| <organization abbrev="Google, Inc">Google, Inc.</organization> | <organization abbrev="Google, Inc.">Google, Inc.</organization> | |||
| <address> | <address> | |||
| <email>lode@google.com</email> | <email>lode@google.com</email> | |||
| </address> | </address> | |||
| </author> | </author> | |||
| <date year="2025" month="June"/> | <date year="2025" month="September"/> | |||
| <area>WIT</area> | ||||
| <keyword>dictionary compression lz77</keyword> | ||||
| <abstract> | <abstract> | |||
| <t> | <t> | |||
| This specification defines a data format for shared brotli | This specification defines a data format for shared brotli | |||
| compression, which adds support for shared dictionaries, large window | compression, which adds support for shared dictionaries, large window, | |||
| and a container format to brotli (RFC 7932). Shared dictionaries and | and a container format to brotli (RFC 7932). Shared dictionaries and | |||
| large window support allow significant compression gains compared to | large window support allow significant compression gains compared to | |||
| regular brotli. This document updates RFC 7932.</t> | regular brotli. This document specifies an extension to the method defined in RFC 7932.</t> | |||
| </abstract> | </abstract> | |||
| </front> | </front> | |||
| <middle> | <middle> | |||
| <section anchor="sect-1" numbered="true" toc="default"> | <section anchor="sect-1" numbered="true" toc="default"> | |||
| <name>Introduction</name> | <name>Introduction</name> | |||
| <section anchor="sect-1.1" numbered="true" toc="default"> | <section anchor="sect-1.1" numbered="true" toc="default"> | |||
| <name>Purpose</name> | <name>Purpose</name> | |||
| <t> | <t> | |||
| The purpose of this specification is to extend the brotli compressed | The purpose of this specification is to extend the brotli compressed | |||
| data format (<xref target="RFC7932" format="default"/>) with new abilities that | data format <xref target="RFC7932" format="default"/> with new abilities that | |||
| allow further | allow further | |||
| compression gains:</t> | compression gains.</t> | |||
| <ul spacing="normal"> | <ul spacing="normal"> | |||
| <li> | <li> | |||
| <t>Shared dictionaries allow a static shared context between | <t>Shared dictionaries allow a static shared context between | |||
| encoder and decoder for significant compression gains.</t> | encoder and decoder for significant compression gains.</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>Large window brotli allows much larger back reference distances | <t>Large window brotli allows much larger back reference distances | |||
| to give compression gains for files over 16MiB.</t> | to give compression gains for files over 16 MiB.</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>The framing format is a container format that allows storage of | <t>The framing format is a container format that allows storage of | |||
| multiple resources and that reference dictionaries.</t> | multiple resources and references dictionaries.</t> | |||
| </li> | </li> | |||
| </ul> | </ul> | |||
| <t> | <t> | |||
| This document is the authoritative specification of shared brotli | This document is the authoritative specification of shared brotli | |||
| data formats and the backwards compatible changes to brotli, and | data formats and the backwards compatible changes to brotli. This document also | |||
| defines:</t> | defines the following:</t> | |||
| <ul> | <ul> | |||
| <li> | <li> | |||
| <t>The data format of serialized shared dictionaries</t> | <t>The data format of serialized shared dictionaries</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>The data format of the framing format</t> | <t>The data format of the framing format</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>The encoding of window bits and distances for large window | <t>The encoding of window bits and distances for large window | |||
| brotli in the brotli data format</t> | brotli in the brotli data format</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>The encoding of shared dictionary references in the brotli data | <t>The encoding of shared dictionary references in the brotli data | |||
| format</t> | format</t> | |||
| </li> | </li> | |||
| </ul> | </ul> | |||
| </section> | </section> | |||
| <section anchor="sect-1.2" numbered="true" toc="default"> | <section anchor="sect-1.2" numbered="true" toc="default"> | |||
| <name>Intended audience</name> | <name>Intended Audience</name> | |||
| <t> | <t> | |||
| This specification is intended for use by software implementers to | This specification is intended for use by software implementers to | |||
| compress data into and/or decompress data from the shared brotli | compress data into and/or decompress data from the shared brotli | |||
| dictionary format.</t> | dictionary format.</t> | |||
| <t> | <t> | |||
| The text of the specification assumes a basic background in | The text of the specification assumes a basic background in | |||
| programming at the level of bits and other primitive data | programming at the level of bits and other primitive data | |||
| representations. Familiarity with the technique of LZ77 coding <xref target="LZ77"/> | representations. Familiarity with the technique of LZ77 coding <xref target="LZ77"/> | |||
| is helpful but not required.</t> | is helpful, but not required.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-1.3" numbered="true" toc="default"> | <section anchor="sect-1.3" numbered="true" toc="default"> | |||
| <name>Scope</name> | <name>Scope</name> | |||
| <t> | <t> | |||
| This specification defines a data format for shared brotli | This specification defines a data format for shared brotli | |||
| compression, which adds support for dictionaries and extended | compression, which adds support for dictionaries and extended | |||
| features to brotli <xref target="RFC7932" format="default"/>.</t> | features to brotli <xref target="RFC7932" format="default"/>.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-1.4" numbered="true" toc="default"> | <section anchor="sect-1.4" numbered="true" toc="default"> | |||
| <name>Compliance</name> | <name>Compliance</name> | |||
| <t> | <t> | |||
| Unless otherwise indicated below, a compliant decompressor must be | Unless otherwise indicated below, a compliant decompressor must be | |||
| able to accept and decompress any data set that conforms to all the | able to accept and decompress any data set that conforms to all the | |||
| specifications presented here. A compliant compressor must produce | specifications presented here. Additionally, a compliant compressor must produce | |||
| data sets that conform to all the specifications presented here.</t> | data sets that conform to all the specifications presented here.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-1.5" numbered="true" toc="default"> | <section anchor="sect-1.5" numbered="true" toc="default"> | |||
| <name>Definitions of terms and conventions used</name> | <name>Definitions of Terms and Conventions Used</name> | |||
| <dl> | <dl> | |||
| <dt>Byte:</dt><dd> 8 bits stored or transmitted as a unit (same as an octet). For | <dt>Byte:</dt><dd>8 bits stored or transmitted as a unit (same as an octet). For | |||
| this specification, a byte is exactly 8 bits, even on machines that | this specification, a byte is exactly 8 bits, even on machines that | |||
| store a character on a number of bits different from eight. See | store a character on a number of bits different from eight. See | |||
| below for the numbering of bits within a byte.</dd> | below for the numbering of bits within a byte.</dd> | |||
| <dt>String:</dt><dd>a sequence of arbitrary bytes.</dd> | <dt>String:</dt><dd>A sequence of arbitrary bytes.</dd> | |||
| </dl> | </dl> | |||
| <t> | <t> | |||
| Bytes stored within a computer do not have a "bit order", since they | Bytes stored within a computer do not have a "bit order" since they are | |||
| are always treated as a unit. However, a byte considered as an | always treated as a unit. However, a byte considered as an integer between | |||
| integer between 0 and 255 does have a most- and least-significant | 0 and 255 does have a most significant bit (MSB) and least significant bit | |||
| bit, and since we write numbers with the most-significant digit on | (LSB), and since we write numbers with the most significant digit on the left, | |||
| the left, we also write bytes with the most-significant bit on the | ||||
| left. In the diagrams below, we number the bits of a byte so that bit | we also write bytes with the MSB on the left. In the diagrams below, the | |||
| 0 is the least-significant bit, i.e., the bits are numbered:</t> | bits of a byte are written so that bit 0 is the LSB, i.e., the bits are | |||
| numbered as follows:</t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| +--------+ | +--------+ | |||
| |76543210| | |76543210| | |||
| +--------+ | +--------+ | |||
| ]]></artwork> | ]]></artwork> | |||
| <t> | <t> | |||
| Within a computer, a number may occupy multiple bytes. All multi-byte | Within a computer, a number may occupy multiple bytes. All multi-byte | |||
| numbers in the format described here are unsigned and stored with the | numbers in the format described here are unsigned and stored with the | |||
| least-significant byte first (at the lower memory address). For | least significant byte first (at the lower memory address). For | |||
| example, the decimal 16-bit number 520 is stored as:</t> | example, the decimal 16-bit number 520 is stored as:</t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| 0 1 | 0 1 | |||
| +--------+--------+ | +--------+--------+ | |||
| |00001000|00000010| | |00001000|00000010| | |||
| +--------+--------+ | +--------+--------+ | |||
| ^ ^ | ^ ^ | |||
| | | | | | | |||
| | + more significant byte = 2 x 256 | | + more significant byte = 2 x 256 | |||
| + less significant byte = 8 | + less significant byte = 8 | |||
| ]]></artwork> | ]]></artwork> | |||
| <section anchor="sect-1.5.1" numbered="true" toc="default"> | <section anchor="sect-1.5.1" numbered="true" toc="default"> | |||
| <name>Packing into bytes</name> | <name>Packing into Bytes</name> | |||
| <t> | <t> | |||
| This document does not address the issue of the order in which bits | This document does not address the issue of the order in which bits | |||
| of a byte are transmitted on a bit-sequential medium, since the final | of a byte are transmitted on a bit-sequential medium, since the final | |||
| data format described here is byte- rather than bit-oriented. | data format described here is byte- rather than bit-oriented. | |||
| However, we describe the compressed block format below as a sequence | However, the compressed block format is described below as a sequence | |||
| of data elements of various bit lengths, not a sequence of bytes. We | of data elements of various bit lengths, not a sequence of bytes. Therefore, | |||
| must therefore specify how to pack these data elements into bytes to | we must specify how to pack these data elements into bytes to | |||
| form the final compressed byte sequence:</t> | form the final compressed byte sequence:</t> | |||
| <ul spacing="normal"> | <ul spacing="normal"> | |||
| <li> | <li> | |||
| <t>Data elements are packed into bytes in order of | <t>Data elements are packed into bytes in order of | |||
| increasing bit number within the byte, i.e., starting | increasing bit number within the byte, i.e., starting | |||
| with the least-significant bit of the byte.</t> | with the LSB of the byte.</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>Data elements other than prefix codes are packed | <t>Data elements other than prefix codes are packed | |||
| starting with the least-significant bit of the data | starting with the LSB of the data | |||
| element. These are referred to here as integer values | element. These are referred to here as integer values | |||
| and are considered unsigned.</t> | and are considered unsigned.</t> | |||
| </li> | </li> | |||
| <li> | <li> | |||
| <t>Prefix codes are packed starting with the most-significant | <t>Prefix codes are packed starting with the MSB of the code.</t> | |||
| bit of the code.</t> | ||||
| </li> | </li> | |||
| </ul> | </ul> | |||
| <t> | <t> | |||
| In other words, if one were to print out the compressed data as a | In other words, if one were to print out the compressed data as a | |||
| sequence of bytes, starting with the first byte at the *right* margin | sequence of bytes starting with the first byte at the <strong>right</strong> margin | |||
| and proceeding to the *left*, with the most-significant bit of each | and proceeding to the <strong>left</strong>, with the MSB of each | |||
| byte on the left as usual, one would be able to parse the result from | byte on the left as usual, one would be able to parse the result from | |||
| right to left, with fixed-width elements in the correct MSB-to-LSB | right to left with fixed-width elements in the correct MSB-to-LSB | |||
| order and prefix codes in bit-reversed order (i.e., with the first | order and prefix codes in bit-reversed order (i.e., with the first | |||
| bit of the code in the relative LSB position).</t> | bit of the code in the relative LSB position).</t> | |||
| <t> | <t> | |||
| As an example, consider packing the following data elements into a | As an example, consider packing the following data elements into a | |||
| sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2, | sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2, | |||
| 3-bit prefix code b'110, 2-bit prefix code b'10, 12-bit integer value | 3-bit prefix code b'110, 2-bit prefix code b'10, and 12-bit integer value | |||
| 3628.</t> | 3628.</t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| byte 2 byte 1 byte 0 | byte 2 byte 1 byte 0 | |||
| +--------+--------+--------+ | +--------+--------+--------+ | |||
| |11100010|11000101|10010110| | |11100010|11000101|10010110| | |||
| +--------+--------+--------+ | +--------+--------+--------+ | |||
| ^ ^ ^ ^ ^ | ^ ^ ^ ^ ^ | |||
| | | | | | | | | | | | | |||
| | | | | +------ integer value 6 | | | | | +------ integer value 6 | |||
| | | | +---------- integer value 2 | | | | +---------- integer value 2 | |||
| skipping to change at line 223 ¶ | skipping to change at line 228 ¶ | |||
| | +---------------- prefix code 10 | | +---------------- prefix code 10 | |||
| +----------------------------- integer value 3628 | +----------------------------- integer value 3628 | |||
| ]]></artwork> | ]]></artwork> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="sect-2" numbered="true" toc="default"> | <section anchor="sect-2" numbered="true" toc="default"> | |||
| <name>Shared Brotli Overview</name> | <name>Shared Brotli Overview</name> | |||
| <t> | <t> | |||
| Shared brotli extends brotli <xref target="RFC7932" format="default"/> with support for shared | Shared brotli extends brotli <xref target="RFC7932" format="default"/> with support for shared | |||
| dictionaries, larger LZ77 window and a framing format.</t> | dictionaries, a larger LZ77 window, and a framing format.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-3" numbered="true" toc="default"> | <section anchor="sect-3" numbered="true" toc="default"> | |||
| <name>Shared Dictionaries</name> | <name>Shared Dictionaries</name> | |||
| <t> | <t> | |||
| A shared dictionary is a piece of data shared by a compressor and | A shared dictionary is a piece of data shared by a compressor and | |||
| decompressor. The compressor can take advantage of the dictionary | decompressor. The compressor can take advantage of the dictionary | |||
| context to encode the input in a more compact manner. The compressor | context to encode the input in a more compact manner. The compressor | |||
| and the decompressor must use exactly the same dictionary. A shared | and the decompressor must use exactly the same dictionary. A shared | |||
| dictionary is specially useful to compress short input sequences.</t> | dictionary is specially useful to compress short input sequences.</t> | |||
| <t>A shared brotli dictionary can use two methods of sharing context:</t> | <t>A shared brotli dictionary can use two methods of sharing context:</t> | |||
| <t>An LZ77 dictionary. The encoder and decoder could refer | <dl><dt>LZ77 dictionary:</dt><dd>The encoder and decoder could refer | |||
| to a given sequence of bytes. Multiple LZ77 dictionaries | to a given sequence of bytes. Multiple LZ77 dictionaries | |||
| can be set.</t> | can be set.</dd> | |||
| <t>A custom static dictionary: a word list with transforms. The | <dt>Custom static dictionary:</dt><dd>A word list with transforms. The | |||
| encoder and decoder will replace the static dictionary data | encoder and decoder will replace the static dictionary data | |||
| with the data in the shared dictionary. The original static | with the data in the shared dictionary. The original static | |||
| dictionary is described in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>. The original | dictionary is described in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>. The original | |||
| data from Appendix A and Appendix B of <xref target="RFC7932" format="default"/> will be | data from Appendices <xref section="A" target="RFC7932" sectionFormat="bare"/> and <xref section="B" target="RFC7932" sectionFormat="bare"/> of <xref target="RFC7932"/> will be | |||
| replaced. In addition, it is possible to dynamically switch | replaced. In addition, it is possible to dynamically switch | |||
| this dictionary based on the data compression context, and/or | this dictionary based on the data compression context and/or | |||
| to include a reference to the original dictionary in the custom | include a reference to the original dictionary in the custom | |||
| dictionary.</t> | dictionary.</dd></dl> | |||
| <t> | <t> | |||
| If no shared dictionary is set the decoder behaves the same as in | If no shared dictionary is set, the decoder behaves the same as in | |||
| <xref target="RFC7932" format="default"/> on a brotli stream.</t> | <xref target="RFC7932" format="default"/> on a brotli stream.</t> | |||
| <t> | <t> | |||
| If a shared dictionary is set, then it can set any of: LZ77 | If a shared dictionary is set, then it can set LZ77 dictionaries, override | |||
| dictionaries, overriding static dictionary words, and/or overriding | static dictionary words, and/or override transforms.</t> | |||
| transforms.</t> | ||||
| <section anchor="sect-3.1" numbered="true" toc="default"> | <section anchor="sect-3.1" numbered="true" toc="default"> | |||
| <name>Custom Static Dictionaries</name> | <name>Custom Static Dictionaries</name> | |||
| <t> | <t> | |||
| If a custom word list is set, then the following behavior of the RFC | If a custom word list is set, then the following behaviors of the decoder defined in | |||
| 7932 decoder <xref target="RFC7932" format="default"/> is overridden:</t> | <xref target="RFC7932" format="default"/> are overridden:</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Instead of the Static Dictionary Data from Appendix A | Instead of the Static Dictionary Data from <xref section="A" target="RFC7932" | |||
| of <xref target="RFC7932" format="default"/>, one or more word lists from | format="default"/>, one or more word lists from | |||
| the custom static | the custom static | |||
| dictionary data are used.</t> | dictionary data are used.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Instead of NDBITS at the end of Appendix A, a custom | Instead of NDBITS at the end of <xref section="A" target="RFC7932" format="default"/>, a custom | |||
| SIZE_BITS_BY_LENGTH per custom word list is used. | SIZE_BITS_BY_LENGTH per custom word list is used. | |||
| </t> | </t> | |||
| <t indent="3"> | <t indent="3"> | |||
| The copy length for a static dictionary reference must be | The copy length for a static dictionary reference must be | |||
| between 4 and 31 and may not be a value for which | between 4 and 31 and may not be a value for which | |||
| SIZE_BITS_BY_LENGTH of this dictionary is 0.</t> | SIZE_BITS_BY_LENGTH of this dictionary is 0.</t> | |||
| <t> | <t> | |||
| If a custom transforms list is set without context dependency, then | If a custom transforms list is set without context dependency, then | |||
| the following behavior of the RFC 7932 decoder <xref target="RFC7932" format="default"/> is | the following behaviors of the decoder defined in <xref target="RFC7932" format="default"/> are | |||
| overridden:</t> | overridden:</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| The "List of Word Transformations" from Appendix B is | The "List of Word Transformations" from <xref section="B" target="RFC7932" format="default"/> is | |||
| overridden by one or more lists of custom prefixes, suffixes and | overridden by one or more lists of custom prefixes, suffixes, and | |||
| transform operations.</t> | transform operations.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| The transform_id must be smaller than the number of transforms | The transform_id must be smaller than the number of transforms | |||
| given in the custom transforms list.</t> | given in the custom transforms list.</t> | |||
| <t> | <t> | |||
| If the dictionary is context dependent, it includes a lookup table of | If the dictionary is context dependent, it includes a lookup table of | |||
| 64 word list and transform list combinations. When resolving a static | 64 word list and transform list combinations. When resolving a static | |||
| dictionary word, the decoder computes the literal context id, as in | dictionary word, the decoder computes the literal Context ID as described in | |||
| section 7.1. of <xref target="RFC7932" format="default"/>. The literal context | <xref target="RFC7932" section="7.1"/>. The literal Context ID is used as the | |||
| id is used as index in | index in | |||
| the lookup tables to select the word list and transforms to use. If | the lookup tables to select the word list and transforms to use. If | |||
| the dictionary is not context dependent, this id is implicitely 0 | the dictionary is not context dependent, this ID is implicitly 0 | |||
| instead.</t> | instead.</t> | |||
| <t> | <t> | |||
| If a distance goes beyond the dictionary for the current id and | ||||
| multiple word list / transform list combinations are defined, then a | If a distance goes beyond the dictionary for the current ID and | |||
| next dictionary is used in the following order: if not context | multiple word/transform list combinations are defined, then the | |||
| dependent, the same order as defined in the shared dictionary. If | next dictionary is used in the following order:</t> | |||
| context dependent, the index matching the current context is used | <ul><li><t>If context dependent:</t> | |||
| first, the same order as defined in the shared dictionary excluding | <ul> | |||
| the current context are used next.</t> | <li>use the index matching the current context first, and then</li> | |||
| <li>use the same order as defined in the shared dictionary (excluding the current context) next.</li></ul></li> | ||||
| <li><t>If not context dependent:</t> | ||||
| <ul> | ||||
| <li>use the same order as defined in the shared dictionary.</li> | ||||
| </ul></li> | ||||
| </ul> | ||||
| <section anchor="sect-3.1.1" numbered="true" toc="default"> | <section anchor="sect-3.1.1" numbered="true" toc="default"> | |||
| <name>Transform Operations</name> | <name>Transform Operations</name> | |||
| <t> | <t> | |||
| A shared dictionary may include custom word transformations, to | A shared dictionary may include custom word transformations to | |||
| replace those specified in <xref target="sect-8" format="default"/> and Appendix B of <xref target="RFC7932" format="default"/>. A | replace those specified in <xref target="sect-8" format="default"/> and <xref section="B" target="RFC7932" format="default"/>. A | |||
| transform consists of a possible prefix, a transform operation, for | transform consists of a possible prefix, a transform operation, a parameter | |||
| some operations a parameter, and a possible suffix. In the shared | (for some operations), and a possible suffix. In the shared | |||
| dictionary format, the transform operation is represented by a | dictionary format, the transform operation is represented by a | |||
| numerical ID, listed in the table below.</t> | numerical ID, which is listed in the table below.</t> | |||
| <table anchor="operation-ids"> <!-- Assign an anchor --> | <table anchor="operation-ids"> | |||
| <name></name> <!-- Give the table a title --> | <name></name> | |||
| <thead> | <thead> | |||
| <tr> | <tr> | |||
| <th>ID</th> <!-- <th>: header --> | <th>ID</th> | |||
| <th>Operation</th> | <th>Operation</th> | |||
| </tr> | </tr> | |||
| </thead> | </thead> | |||
| <tbody> <!-- The rows --> | <tbody> | |||
| <tr> | <tr> | |||
| <td>0</td> <td>Identity</td> | <td>0</td> <td>Identity</td> | |||
| </tr><tr> | </tr><tr> | |||
| <td>1</td> <td>OmitLast1</td> | <td>1</td> <td>OmitLast1</td> | |||
| </tr><tr> | </tr><tr> | |||
| <td>2</td> <td>OmitLast2</td> | <td>2</td> <td>OmitLast2</td> | |||
| </tr><tr> | </tr><tr> | |||
| <td>3</td> <td>OmitLast3</td> | <td>3</td> <td>OmitLast3</td> | |||
| </tr><tr> | </tr><tr> | |||
| <td>4</td> <td>OmitLast4</td> | <td>4</td> <td>OmitLast4</td> | |||
| skipping to change at line 400 ¶ | skipping to change at line 411 ¶ | |||
| 15 OmitFirst4 | 15 OmitFirst4 | |||
| 16 OmitFirst5 | 16 OmitFirst5 | |||
| 17 OmitFirst6 | 17 OmitFirst6 | |||
| 18 OmitFirst7 | 18 OmitFirst7 | |||
| 19 OmitFirst8 | 19 OmitFirst8 | |||
| 20 OmitFirst9 | 20 OmitFirst9 | |||
| 21 ShiftFirst (by PARAMETER) | 21 ShiftFirst (by PARAMETER) | |||
| 22 ShiftAll (by PARAMETER) | 22 ShiftAll (by PARAMETER) | |||
| ]]></artwork> | ]]></artwork> | |||
| --> | --> | |||
| <t> | <t> | |||
| Operations 0 to 20 are specified in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>. | Operations 0 to 20 are specified in <xref section="8" target="RFC7932" format="default"/>. | |||
| ShiftFirst and ShiftAll transform specifically encoded SCALARs.</t> | ShiftFirst and ShiftAll transform specifically encoded SCALARs.</t> | |||
| <t> | <t> | |||
| A SCALAR is a 7-, 11-, 16- or 21-bit unsigned integer encoded with 1, | A SCALAR is a 7-, 11-, 16-, or 21-bit unsigned integer encoded with 1, | |||
| 2, 3 or 4 bytes respectively with following bit contents:</t> | 2, 3, or 4 bytes, respectively, with the following bit contents:</t> | |||
| <!-- SG: should these be individual figures? --> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| 7-bit SCALAR: | 7-bit SCALAR: | |||
| +--------+ | +--------+ | |||
| |0sssssss| | |0sssssss| | |||
| +--------+ | +--------+ | |||
| 11-bit SCALAR: | 11-bit SCALAR: | |||
| +--------+--------+ | +--------+--------+ | |||
| |110sssss|XXssssss| | |110sssss|XXssssss| | |||
| +--------+--------+ | +--------+--------+ | |||
| skipping to change at line 431 ¶ | skipping to change at line 441 ¶ | |||
| +--------+--------+--------+ | +--------+--------+--------+ | |||
| |1110ssss|XXssssss|XXssssss| | |1110ssss|XXssssss|XXssssss| | |||
| +--------+--------+--------+ | +--------+--------+--------+ | |||
| 21-bit SCALAR: | 21-bit SCALAR: | |||
| +--------+--------+--------+--------+ | +--------+--------+--------+--------+ | |||
| |11110sss|XXssssss|XXssssss|XXssssss| | |11110sss|XXssssss|XXssssss|XXssssss| | |||
| +--------+--------+--------+--------+ | +--------+--------+--------+--------+ | |||
| ]]></artwork> | ]]></artwork> | |||
| <t> | <t> | |||
| Given the input bytes matching SCALAR encoding pattern, the SCALAR | Given the input bytes matching the SCALAR encoding pattern, the SCALAR | |||
| value is obtained by concatenation of the "s" bits, with the most | value is obtained by concatenation of the "s" bits, with the MSBs coming from | |||
| significant bits coming from the earliest byte. The "X" bits could | the earliest byte. The "X" bits could | |||
| have arbitrary value.</t> | have arbitrary value.</t> | |||
| <t> | <t> | |||
| An ADDEND is defined as the result of limited sign extension of | An ADDEND is defined as the result of limited sign extension of | |||
| 16-bit unsigned PARAMETER:</t> | a 16-bit unsigned PARAMETER:</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| At first the PARAMETER is zero-extended to 32 bits. After this, | At first, the PARAMETER is zero-extended to 32 bits. After this, | |||
| if the resulting value is greater than or equal to 0x8000, | 0xFF0000 is added if the resulting value is greater than or equal to 0x8000. | |||
| then 0xFF0000 is added.</t> | </t> | |||
| <t> | <t> | |||
| ShiftAll starts at the beginning of the word and repetitively applies | ShiftAll starts at the beginning of the word and repetitively applies | |||
| the following transform until the whole word is transformed:</t> | the following transformation until the whole word is transformed:</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| If the next untransformed byte matches the first byte of the 7-, | If the next untransformed byte matches the first byte of the 7-, | |||
| 11-, 16- or 21-bit SCALAR pattern, then:</t> | 11-, 16-, or 21-bit SCALAR pattern, then:</t> | |||
| <t indent="6"> | <t indent="6"> | |||
| If the untransformed part of the word is not long enough to | If the untransformed part of the word is not long enough to | |||
| match the whole SCALAR pattern, then the whole word is | match the whole SCALAR pattern, then the whole word is | |||
| marked as transformed.</t> | marked as transformed.</t> | |||
| <t indent="6"> | <t indent="6"> | |||
| Otherwise, let SHIFTED be the sum of the ADDEND and the | Otherwise, let SHIFTED be the sum of the ADDEND and the | |||
| encoded SCALAR. The lowest bits from SHIFTED | encoded SCALAR. The lowest bits from SHIFTED | |||
| are written back into the corresponding "s" bits. The "0", | are written back into the corresponding "s" bits. The "0", | |||
| "1" and "X" bits remain unchanged. Next, 1, 2, 3 or | "1", and "X" bits remain unchanged. Next, 1, 2, 3, or | |||
| 4 not transformed bytes marked as transformed, according to | 4 untransformed bytes are marked as transformed according to | |||
| the SCALAR pattern length.</t> | the SCALAR pattern length.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Otherwise, the next untransformed byte is marked as transformed.</t> | Otherwise, the next untransformed byte is marked as transformed.</t> | |||
| <t> | <t> | |||
| ShiftFirst applies the same transform as ShiftAll, but does not | ShiftFirst applies the same transformation as ShiftAll, but does not | |||
| iterate.</t> | iterate.</t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
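The SCALAR, ADDEND, ShiftAll, and ShiftFirst rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function names (`addend`, `scalar_length`, `shift_scalar`, `shift_all`, `shift_first`) are hypothetical and do not appear in the specification.

```python
def addend(parameter: int) -> int:
    """Limited sign extension of a 16-bit unsigned PARAMETER."""
    value = parameter & 0xFFFF          # zero-extend to a wider integer
    if value >= 0x8000:
        value += 0xFF0000
    return value


def scalar_length(first_byte: int) -> int:
    """Return 1..4 if first_byte starts a SCALAR pattern, else 0."""
    if first_byte < 0x80:               # 0sssssss
        return 1
    if (first_byte & 0xE0) == 0xC0:     # 110sssss
        return 2
    if (first_byte & 0xF0) == 0xE0:     # 1110ssss
        return 3
    if (first_byte & 0xF8) == 0xF0:     # 11110sss
        return 4
    return 0


def shift_scalar(word: bytearray, pos: int, parameter: int) -> int:
    """Transform one step at pos; return how many bytes become 'transformed'."""
    n = scalar_length(word[pos])
    if n == 0:
        return 1                        # not a SCALAR start: skip one byte
    if pos + n > len(word):
        return len(word) - pos          # too short: rest of word is done
    lead_bits = (7, 5, 4, 3)[n - 1]     # number of "s" bits in the first byte
    lead_mask = (1 << lead_bits) - 1
    scalar = word[pos] & lead_mask
    for k in range(1, n):               # later bytes carry 6 "s" bits each
        scalar = (scalar << 6) | (word[pos + k] & 0x3F)
    shifted = scalar + addend(parameter)
    for k in range(n - 1, 0, -1):       # write back, lowest bits last byte
        word[pos + k] = (word[pos + k] & 0xC0) | (shifted & 0x3F)
        shifted >>= 6
    word[pos] = (word[pos] & ~lead_mask & 0xFF) | (shifted & lead_mask)
    return n


def shift_all(word: bytes, parameter: int) -> bytes:
    out = bytearray(word)
    pos = 0
    while pos < len(out):
        pos += shift_scalar(out, pos, parameter)
    return bytes(out)


def shift_first(word: bytes, parameter: int) -> bytes:
    out = bytearray(word)
    if out:
        shift_scalar(out, 0, parameter)
    return bytes(out)
```

With a PARAMETER of 1, each SCALAR is incremented by one; with a PARAMETER of 0xFFFF, the limited sign extension makes the low bits behave like a decrement by one.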
| <section anchor="sect-3.2" numbered="true" toc="default"> | <section anchor="sect-3.2" numbered="true" toc="default"> | |||
| <name>LZ77 Dictionaries</name> | <name>LZ77 Dictionaries</name> | |||
| <t> | <t> | |||
| If an LZ77 dictionary is set, then the decoder treats this as a | If an LZ77 dictionary is set, the decoder treats it as a | |||
| regular LZ77 copy, but behaves as if the bytes of this dictionary are | regular LZ77 copy but behaves as if the bytes of this dictionary are | |||
| accessible as the uncompressed bytes outside of the regular LZ77 | accessible as the uncompressed bytes outside of the regular LZ77 | |||
| window for backwards references.</t> | window for backwards references.</t> | |||
| <t> | <t> | |||
| Let LZ77_DICTIONARY_LENGTH be the length of the LZ77 dictionary. | Let LZ77_DICTIONARY_LENGTH be the length of the LZ77 dictionary. | |||
| Then word_id, described in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>, is redefined as:</t> | Then word_id, described in <xref section="8" target="RFC7932" format="default"/>, is redefined as:</t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| word_id = distance - (max allowed distance + 1 + | word_id = distance - (max allowed distance + 1 + | |||
| LZ77_DICTIONARY_LENGTH) | LZ77_DICTIONARY_LENGTH) | |||
| ]]></artwork> | ]]></artwork> | |||
| <t> | <t> | |||
| For the case when LZ77_DICTIONARY_LENGTH is 0, word_id matches the | For the case when LZ77_DICTIONARY_LENGTH is 0, word_id matches the | |||
| <xref target="RFC7932" format="default"/> definition.</t> | <xref target="RFC7932" format="default"/> definition.</t> | |||
| <t> | <t> | |||
| Let dictionary_address be</t> | Let dictionary_address be:</t> | |||
| <t> | ||||
| LZ77_DICTIONARY_LENGTH + max allowed distance - distance</t> | <t indent="3"> LZ77_DICTIONARY_LENGTH + max allowed distance - distance</t> | |||
| <t> | <t> | |||
| Then distance values of <length, distance> pairs <xref target="RFC7932" format="default"/> in range | Then distance values of <length, distance> pairs <xref target="RFC7932" format="default"/> in range | |||
| (max allowed distance + 1)..(LZ77_DICTIONARY_LENGTH + max allowed | (max allowed distance + 1)..(LZ77_DICTIONARY_LENGTH + max allowed | |||
| distance) are interpreted as references starting in the LZ77 | distance) are interpreted as references starting in the LZ77 | |||
| dictionary at the byte at dictionary_address. If length is longer | dictionary at the byte at dictionary_address. If length is longer | |||
| than (LZ77_DICTIONARY_LENGTH - dictionary_address), then the | than (LZ77_DICTIONARY_LENGTH - dictionary_address), then the | |||
| reference continues to copy (length - LZ77_DICTIONARY_LENGTH + | reference continues to copy (length - LZ77_DICTIONARY_LENGTH + | |||
| dictionary_address) bytes from the regular LZ77 window starting at | dictionary_address) bytes from the regular LZ77 window starting at | |||
| the beginning.</t> | the beginning.</t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
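The distance ranges described for LZ77 dictionaries can be sketched as below. This is an illustrative reading of the text, not normative code; `classify_distance` and `copy_from_lz77_dictionary` are hypothetical names, and the sketch assumes that `window` holds the bytes of the regular LZ77 window from its beginning and that it already contains enough bytes for the continued copy.

```python
def classify_distance(distance: int, max_allowed_distance: int,
                      lz77_dictionary_length: int):
    """Map a distance to ('window', d), ('lz77_dictionary', address),
    or ('static_word', word_id)."""
    if distance <= max_allowed_distance:
        return ("window", distance)
    if distance <= max_allowed_distance + lz77_dictionary_length:
        dictionary_address = (lz77_dictionary_length
                              + max_allowed_distance - distance)
        return ("lz77_dictionary", dictionary_address)
    word_id = distance - (max_allowed_distance + 1 + lz77_dictionary_length)
    return ("static_word", word_id)


def copy_from_lz77_dictionary(dictionary: bytes, window: bytes,
                              dictionary_address: int, length: int) -> bytes:
    """Copy `length` bytes starting in the LZ77 dictionary and, if the
    dictionary runs out, continue from the beginning of the window."""
    from_dictionary = dictionary[dictionary_address:dictionary_address + length]
    remaining = length - len(from_dictionary)
    return from_dictionary + window[:remaining]
```

When `lz77_dictionary_length` is 0, the middle range is empty and `word_id` reduces to `distance - (max_allowed_distance + 1)`.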
| <section anchor="sect-4" numbered="true" toc="default"> | <section anchor="sect-4" numbered="true" toc="default"> | |||
| <name>Varint Encoding</name> | <name>Varint Encoding</name> | |||
| <t>A varint is encoded in base 128 in one or more bytes as follows:</t> | <t>A varint is encoded in base 128 in one or more bytes as follows:</t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| +--------+--------+ +--------+ | +--------+--------+ +--------+ | |||
| |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| | |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| | |||
| +--------+--------+ +--------+ | +--------+--------+ +--------+ | |||
| ]]></artwork> | ]]></artwork> | |||
| <t> | <t> | |||
| where the "x" bits of the first byte are the least significant bits | where the "x" bits of the first byte are the LSBs | |||
| of the value and the "x" bits of the last byte are the most | of the value and the "x" bits of the last byte are the MSBs of the value. | |||
| significant bits of the value. The last byte must have its MSB set to | ||||
| 0, all other bytes to 1 to indicate there is a next byte.</t> | The last byte must have its MSB set to 0 and all other bytes must | |||
| have their MSBs set to 1 to indicate there is a next byte.</t> | ||||
| <t> | <t> | |||
| The maximum allowed amount of bits to read is 63 bits, if the 9th | The maximum allowed amount of bits to read is 63 bits; if the 9th | |||
| byte is present and has its MSB set then the stream must be | byte is present and has its MSB set, then the stream must be | |||
| considered as invalid.</t> | considered as invalid.</t> | |||
| </section> | </section> | |||
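A minimal Python sketch of the varint rules above follows. The names `decode_varint` and `encode_varint` are hypothetical, and raising `ValueError` is only one possible way of treating a stream as invalid.

```python
def decode_varint(data: bytes, pos: int = 0):
    """Decode a varint at data[pos:]; return (value, next_position)."""
    value = 0
    for i in range(9):                      # at most 9 bytes, 63 bits
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)   # first byte holds the LSBs
        if byte < 0x80:                     # MSB clear: last byte
            return value, pos + i + 1
    raise ValueError("invalid varint: 9th byte has its MSB set")


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer below 2**63 as a varint."""
    if not 0 <= value < (1 << 63):
        raise ValueError("value out of range for a 63-bit varint")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)   # more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)
```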
| <section anchor="sect-5" numbered="true" toc="default"> | <section anchor="sect-5" numbered="true" toc="default"> | |||
| <name>Shared Dictionary Stream</name> | <name>Shared Dictionary Stream</name> | |||
| <t> | <t> | |||
| The shared dictionary stream encodes a custom dictionary for brotli | The shared dictionary stream encodes a custom dictionary for brotli, | |||
| including custom words and/or custom transformations. A shared | including custom words and/or custom transformations. A shared | |||
| dictionary may appear standalone or as contents of a resource in a | dictionary may appear as a standalone or as contents of a resource in a | |||
| framing format container.</t> | framing format container.</t> | |||
| <t> | <t> | |||
| A compliant shared brotli dictionary stream must have the following | A compliant shared brotli dictionary stream must have the following | |||
| format:</t> | format:</t> | |||
| <dl newline="false" spacing="normal" indent="3"> | <dl newline="false" spacing="normal" indent="3"> | |||
| <dt>2 bytes:</dt> | <dt>2 bytes:</dt> | |||
| <dd> | <dd>File signature in hexadecimal format (bytes 91 and 0).</dd> | |||
| file signature, in hexadecimal the bytes 91, 0.</dd> | <dt>varint:</dt> <dd>LZ77_DICTIONARY_LENGTH. The number of bytes for | |||
| <dt>varint:</dt> <dd>LZ77_DICTIONARY_LENGTH, number of bytes for a LZ77 | an LZ77 dictionary, or 0 if there is none. | |||
| dictionary, or 0 if there is none. | ||||
| The maximum allowed value is the maximum possible sliding | The maximum allowed value is the maximum possible sliding | |||
| window size of brotli or of large window brotli. | window size of brotli or large window brotli. | |||
| </dd> | </dd> | |||
| <dt> | ||||
| LZ77_DICTIONARY_LENGTH bytes:</dt><dd> contents of the LZ77 dictionary.</dd> | ||||
| <dt>1 byte:</dt><dd> | ||||
| <t>NUM_CUSTOM_WORD_LISTS, may have value 0 to 64</t> | ||||
| <t> NUM_CUSTOM_WORD_LISTS times a word list, with the following | <dt> | |||
| format for each word list: | LZ77_DICTIONARY_LENGTH bytes:</dt><dd>Contents of the LZ77 dictionary.</dd> | |||
| </t> | ||||
| <dt>1 byte:</dt><dd><t>NUM_CUSTOM_WORD_LISTS. May have a value in range 0 | ||||
| to 64.</t></dd> | ||||
| <dt>NUM_CUSTOM_WORD_LISTS times a word list with the following format for each word list:</dt> | ||||
| <dd> | ||||
| <t><br/></t> | ||||
| <dl> | <dl> | |||
| <dt>28 bytes:</dt><dd>SIZE_BITS_BY_LENGTH, array of 28 unsigned 8-bit | <dt>28 bytes:</dt><dd>SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit | |||
| integers, indexed by word lengths 4 to 31. The value | integers, indexed by word lengths 4 to 31. The value | |||
| represents log2(number of words of this length), | represents log2(number of words of this length), | |||
| with the exception of 0 meaning 0 words of this | with the exception of 0 meaning 0 words of this | |||
| length. The max allowed length value is 15 bits. | length. The max allowed length value is 15 bits. | |||
| OFFSETS_BY_LENGTH is computed from this as | OFFSETS_BY_LENGTH is computed from this as | |||
| OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + | OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + | |||
| (SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) | (SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) | |||
| : 0) </dd> | : 0).</dd> | |||
| <dt>N bytes:</dt><dd> words dictionary data, where N is | ||||
| <dt>N bytes:</dt><dd>Words dictionary data, where N is | ||||
| OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ? | OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ? | |||
| (31 << SIZE_BITS_BY_LENGTH[31]) : 0), first all the words of shortest length, then all words of the next length, and so on, where for each length there are either 0 or a positive power of two amount of words. </dd> | (31 << SIZE_BITS_BY_LENGTH[31]) : 0), with all the words of shortest length first, then all words of the next length, and so on, where there are either 0 or a positive power of two number of words for each length. </dd> | |||
| </dl></dd> | </dl></dd> | |||
| <dt> | <dt> | |||
| 1 byte:</dt><dd><t>NUM_CUSTOM_TRANSFORM_LISTS, may have value 0 to 64</t> | 1 byte:</dt><dd>NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to 64.</dd> | |||
| <t> | <dt> | |||
| NUM_CUSTOM_TRANSFORM_LISTS times a transform list, with the | NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the | |||
| following format for each transform list: | following format for each transform list: | |||
| </t> | </dt> | |||
| <dd> | ||||
| <t><br/></t> | ||||
| <dl> | <dl> | |||
| <dt>2 bytes:</dt><dd> PREFIX_SUFFIX_LENGTH, the length of prefix/suffix | <dt>2 bytes:</dt><dd> PREFIX_SUFFIX_LENGTH. The length of prefix/suffix | |||
| data. Must be at least 1 because the list must | data. Must be at least 1 because the list must | |||
| always end with a zero-length stringlet even | always end with a zero-length stringlet even | |||
| if empty. | if it is empty. | |||
| </dd> | </dd> | |||
| <dt>NUM_PREFIX_SUFFIX times:</dt> <dd><t>prefix/suffix stringlet | <dt>NUM_PREFIX_SUFFIX times:</dt><dd><t>Prefix/suffix stringlet. | |||
| .</t> | NUM_PREFIX_SUFFIX is the number of stringlets parsed and | |||
| <t> | ||||
| NUM_PREFIX_SUFFIX is the amount of stringlets parsed and | ||||
| must be in range 1..256. | must be in range 1..256. | |||
| </t><dl> | </t><dl> | |||
| <dt>1 byte:</dt><dd> STRING_LENGTH, the length of the entry | <dt>1 byte:</dt><dd> STRING_LENGTH. The length of the entry contents. | |||
| 0 for the last (terminating) entry of the | 0 for the last (terminating) entry of the | |||
| transform list. For other entries STRING_LENGTH | transform list. For other entries, STRING_LENGTH | |||
| must be in range 1..255. The 0 entry must be | must be in range 1..255. The 0 entry must be | |||
| present and must be the last byte of the | present and must be the last byte of the | |||
| PREFIX_SUFFIX_LENGTH bytes of prefix/suffix | PREFIX_SUFFIX_LENGTH bytes of prefix/suffix | |||
| data, else the stream must be rejected as | data, else the stream must be rejected as | |||
| invalid.</dd> | invalid.</dd> | |||
| <dt>STRING_LENGTH bytes:</dt><dd> contents of the prefix/suffix.</dd> | <dt>STRING_LENGTH bytes:</dt><dd> Contents of the prefix/suffix.</dd> | |||
| </dl></dd> | </dl></dd> | |||
| <dt>1 byte:</dt><dd> NTRANSFORMS, amount of transformation triplets.</dd> | <dt>1 byte:</dt><dd> NTRANSFORMS. Number of transformation triplets.</dd> | |||
| <dt>NTRANSFORMS times:</dt><dd><t> data for each transform:</t> | <dt>NTRANSFORMS times the data for each transform listed below:</dt><dd> | |||
| <t><br/></t> | ||||
| <dl> | <dl> | |||
| <dt> | <dt> | |||
| 1 byte:</dt><dd> index of prefix in prefix/suffix data; | 1 byte:</dt><dd>Index of prefix in prefix/suffix data; | |||
| must be less than NUM_PREFIX_SUFFIX. | must be less than NUM_PREFIX_SUFFIX. | |||
| </dd> | </dd> | |||
| <dt>1 byte:</dt><dd> index of suffix in prefix/suffix data; | <dt>1 byte:</dt><dd>Index of suffix in prefix/suffix data; | |||
| must be less than NUM_PREFIX_SUFFIX.</dd> | must be less than NUM_PREFIX_SUFFIX.</dd> | |||
| <dt>1 byte:</dt><dd> operation index, must be an index in the table of | <dt>1 byte:</dt><dd>Operation index; must be an index in the table of | |||
| operations listed in the Section | operations listed in <xref target="sect-3.1.1"/>.</dd></dl></dd></dl> | |||
| "Transform Operations".</dd></dl> | <dl><dt> | |||
| <t> | ||||
| If and only if at least one transform has operation index | If and only if at least one transform has operation index | |||
| ShiftFirst or ShiftAll: | ShiftFirst or ShiftAll, then NTRANSFORMS times the following:</dt><dd> | |||
| </t> | <t><br/></t> | |||
| <dl> | ||||
| <t> NTRANSFORMS times:</t> | ||||
| <dl> | ||||
| <dt> | <dt> | |||
| 2 bytes:</dt><dd> parameters for the transform. If the transform | 2 bytes:</dt><dd>Parameters for the transform. If the transform | |||
| does not have type ShiftFirst or ShiftAll, the | does not have type ShiftFirst or ShiftAll, the | |||
| value must be 0. ShiftFirst and ShiftAll | value must be 0. ShiftFirst and ShiftAll | |||
| interpret these bytes as an unsigned 16-bit | interpret these bytes as an unsigned 16-bit | |||
| integer. | integer. | |||
| </dd></dl> | </dd></dl></dd></dl></dd></dl> | |||
| <t>if NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 | <dl> | |||
| <dt>If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 | ||||
| (else implicitly NUM_DICTIONARIES is 1 and points to the | (else implicitly NUM_DICTIONARIES is 1 and points to the | |||
| brotli built-in and there is no context map) | brotli built-in and there is no context map):</dt> | |||
| </t> | <dd> | |||
| <t><br/></t> | ||||
| <dl> | <dl> | |||
| <dt>1 byte:</dt><dd> NUM_DICTIONARIES, may have value 1 to 64. Each | <dt>1 byte:</dt> | |||
| <dd>NUM_DICTIONARIES. May have a value in range 1 to 64. Each | ||||
| dictionary is a combination of a word list and a | dictionary is a combination of a word list and a | |||
| transform list. Each next dictionary is used when the | transform list. Each next dictionary is used when the | |||
| distance goes beyond the previous. If a CONTEXT_MAP is | distance goes beyond the previous. If a CONTEXT_MAP is | |||
| enabled, then the dictionary matching the context is | enabled, then the dictionary matching the context is | |||
| moved to the front in the order for this context. | moved to the front in the order for this context. | |||
| </dd> | </dd> | |||
| <dt>NUM_DICTIONARIES times:</dt><dd> <t>the DICTIONARY_MAP:</t> | <dt>NUM_DICTIONARIES times the DICTIONARY_MAP, which contains:</dt><dd> | |||
| <t><br/></t> | ||||
| <dl><dt> | <dl><dt> | |||
| 1 byte:</dt><dd> index into a custom word list, or value | 1 byte:</dt><dd>Index into a custom word list or value | |||
| NUM_CUSTOM_WORD_LISTS to indicate to use the brotli | NUM_CUSTOM_WORD_LISTS to indicate using the brotli | |||
| <xref target="RFC7932" format="default"/> built-in default word list | <xref target="RFC7932" format="default"/> built-in default word list. | |||
| </dd> | </dd> | |||
| <dt>1 byte:</dt><dd>index into a custom transform list, or value | <dt>1 byte:</dt><dd>Index into a custom transform list or value | |||
| NUM_CUSTOM_TRANSFORM_LISTS to indicate to use the | NUM_CUSTOM_TRANSFORM_LISTS to indicate using the | |||
| brotli <xref target="RFC7932" format="default"/> built-in default transform list | brotli <xref target="RFC7932" format="default"/> built-in default transform list. | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| </dd> | </dd> | |||
| <dt>1 byte:</dt><dd><t> CONTEXT_ENABLED, if 0 there is no context map, if 1 a | <dt>1 byte:</dt><dd>CONTEXT_ENABLED. If 0, there is no context map. If 1, a | |||
| context map used to select the dictionary is encoded | ||||
| below</t></dd></dl> | context map used to select the dictionary is encoded as | |||
| <t>If CONTEXT_ENABLED is 1, a context map for the 64 brotli | below.</dd> | |||
| </dl> | ||||
| <dl> | ||||
| <dt>If CONTEXT_ENABLED is 1, there is a context map for the 64 brotli | ||||
| <xref target="RFC7932" format="default"/> literals contexts: | <xref target="RFC7932" format="default"/> literals contexts: | |||
| </t> | </dt> | |||
| <dd><t><br/></t> | ||||
| <dl> | <dl> | |||
| <dt>64 bytes:</dt><dd> CONTEXT_MAP, index into the DICTIONARY_MAP for | <dt>64 bytes:</dt><dd> CONTEXT_MAP. Index into the DICTIONARY_MAP for | |||
| the first dictionary to use for this context | the first dictionary to use for this context. | |||
| </dd></dl></dd> | </dd></dl></dd> | |||
| </dl> | </dl></dd></dl> | |||
| </dd> | ||||
| </dl> | ||||
| </section> | </section> | |||
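The OFFSETS_BY_LENGTH recurrence and the size N of the words dictionary data can be sketched as follows. This is a hedged illustration: the function name `word_list_layout` is hypothetical, the starting offset for word length 4 is assumed to be 0, and the 15-bit limit is read here as a cap on each SIZE_BITS_BY_LENGTH value.

```python
def word_list_layout(size_bits_by_length: bytes):
    """Return (offsets, n): offsets maps word length (4..31) to its byte
    offset in the words data, and n is the total size N in bytes."""
    if len(size_bits_by_length) != 28:
        raise ValueError("SIZE_BITS_BY_LENGTH must hold 28 entries")
    offsets = {}
    offset = 0                       # assumed start for word length 4
    for length in range(4, 32):
        offsets[length] = offset
        bits = size_bits_by_length[length - 4]
        if bits > 15:                # assumed reading of the 15-bit limit
            raise ValueError("size bits value too large")
        if bits:                     # 0 means no words of this length
            offset += length << bits # 2**bits words of `length` bytes each
    return offsets, offset
```

For example, a list with two 4-byte words and four 5-byte words occupies 8 + 20 = 28 bytes, with the 5-byte words starting at offset 8.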
| <section anchor="sect-6" numbered="true" toc="default"> | <section anchor="sect-6" numbered="true" toc="default"> | |||
| <name>Large Window Brotli Compressed Data Stream</name> | <name>Large Window Brotli Compressed Data Stream</name> | |||
| <t> | <t> | |||
| Large window brotli allows a sliding window beyond the 24-bit maximum | Large window brotli allows a sliding window beyond the 24-bit maximum | |||
| of regular brotli <xref target="RFC7932" format="default"/>.</t> | of regular brotli <xref target="RFC7932" format="default"/>.</t> | |||
| <t> | <t> | |||
| The compressed data stream is backwards compatible with brotli | The compressed data stream is backwards compatible with brotli | |||
| <xref target="RFC7932" format="default"/>, and may optionally have the following differences:</t> | <xref target="RFC7932" format="default"/> and may optionally have the following differences:</t> | |||
| <dl><dt>Encoding of WBITS in the stream header:</dt><dd><t> the following new | ||||
| pattern of 14 bits is supported:</t> | ||||
| <dl> | <dl> | |||
| <dt>8 bits:</dt><dd> value 00010001, to indicate a large window | <dt>In the encoding of WBITS in the stream header, the following new pattern of | |||
| brotli stream</dd> | 14 bits is supported:</dt><dd><t><br/></t> | |||
| <dl newline="false" spacing="normal"> | ||||
| <dt>8 bits:</dt><dd>Value 00010001 to indicate a large window | ||||
| brotli stream.</dd> | ||||
| <dt>6 bits:</dt><dd> WBITS, must have value in range 10 to 62</dd> | <dt>6 bits:</dt><dd> WBITS. Must have value in range 10 to 62.</dd> | |||
| </dl></dd> | </dl></dd> | |||
| <dt>Distance alphabet:</dt><dd>If the stream is a large window brotli | ||||
| <dt>Distance alphabet:</dt><dd> if the stream is a large window brotli | ||||
| stream, the maximum number of extra bits is 62 and the | stream, the maximum number of extra bits is 62 and the | |||
| theoretical maximum size of the distance alphabet is | theoretical maximum size of the distance alphabet is | |||
| (16 + NDIRECT + (124 << NPOSTFIX)). This overrides the value for the distance alphabet size given in <xref section="3.3" sectionFormat="of" target="RFC7932"/> and affects the amount of bits in the encoding of the Simple Prefix Code for distances as described in <xref section="3.4" sectionFormat="of" target="RFC7932"/>. An additional limitation to distances, despite the large allowed alphabet size, is that the alphabet is not allowed to contain a distance symbol able to represent a distance larger than ((1 << 63) - 4) when its extra bits have their maximum value. It depends on NPOSTFIX and NDIRECT when this can occur. </dd> | (16 + NDIRECT + (124 << NPOSTFIX)). This overrides the value for the distance alphabet size given in <xref section="3.3" sectionFormat="of" target="RFC7932"/> and affects the number of bits in the encoding of the Simple Prefix Code for distances as described in <xref section="3.4" sectionFormat="of" target="RFC7932"/>. An additional limitation to distances, despite the large allowed alphabet size, is that the alphabet is not allowed to contain a distance symbol able to represent a distance larger than ((1 << 63) - 4) when its extra bits have their maximum value. It depends on NPOSTFIX and NDIRECT when this can occur. </dd> | |||
| </dl> | </dl> | |||
| <t> | <t> | |||
| A decoder that does not support 64-bit integers may reject a stream | A decoder that does not support 64-bit integers may reject a stream | |||
| if WBITS is higher than 30 or a distance symbol from the distance | if WBITS is higher than 30 or a distance symbol from the distance | |||
| alphabet is able to encode a distance larger than 2147483644.</t> | alphabet is able to encode a distance larger than 2147483644.</t> | |||
| </section> | </section> | |||
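The large window limits above can be expressed as two small checks. This is a sketch under stated assumptions: the function names and the `has_64bit_integers` flag are hypothetical, and raising `ValueError` stands in for rejecting the stream.

```python
def large_window_distance_alphabet_size(npostfix: int, ndirect: int) -> int:
    """Theoretical maximum distance alphabet size for large window brotli."""
    return 16 + ndirect + (124 << npostfix)


def check_large_window_wbits(wbits: int, has_64bit_integers: bool = True) -> int:
    """Validate the 6-bit WBITS field of a large window brotli header."""
    if not 10 <= wbits <= 62:
        raise ValueError("WBITS must be in range 10 to 62")
    if not has_64bit_integers and wbits > 30:
        # A decoder without 64-bit integers may reject such streams.
        raise ValueError("WBITS above 30 needs 64-bit integer support")
    return wbits
```

The constant 124 corresponds to twice the maximum of 62 extra bits, so with NPOSTFIX = 0 and NDIRECT = 0 the alphabet has 140 symbols.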
| <section anchor="sect-7" numbered="true" toc="default"> | <section anchor="sect-7" numbered="true" toc="default"> | |||
| <name>Shared Brotli Compressed Data Stream</name> | <name>Shared Brotli Compressed Data Stream</name> | |||
| <t> | <t> | |||
| The format of a shared brotli compressed data stream without framing | The format of a shared brotli compressed data stream without a framing | |||
| format is backwards compatible with brotli <xref target="RFC7932" format="default"/>, with the | format is backwards compatible with brotli <xref target="RFC7932" format="default"/> with the | |||
| following optional differences:</t> | following optional differences:</t> | |||
| <ul><li>LZ77 dictionaries as described above are supported</li> | <ul><li>LZ77 dictionaries as described above are supported.</li> | |||
| <li> Custom static dictionaries replacing or extending the static | <li> Custom static dictionaries replacing or extending the static | |||
| dictionary of brotli <xref target="RFC7932" format="default"/> with different words or | dictionary of brotli <xref target="RFC7932" format="default"/> with different words or | |||
| transforms are supported</li> | transforms are supported.</li> | |||
| <li>The stream may have the format of regular brotli <xref target="RFC7932"/>, | <li>The stream may have the format of regular brotli <xref target="RFC7932"/> | |||
| or the format of large window brotli as described in section 6.</li> | or the format of large window brotli as described in <xref target="sect-6" format="default"/>.</li> | |||
| </ul> | </ul> | |||
| </section> | </section> | |||
| <section anchor="sect-8" numbered="true" toc="default"> | <section anchor="sect-8" numbered="true" toc="default"> | |||
| <name>Shared Brotli Framing Format Stream</name> | <name>Shared Brotli Framing Format Stream</name> | |||
| <t> | <t> | |||
| A compliant shared brotli framing format stream has the format | A compliant shared brotli framing format stream has the format | |||
| described below.</t> | described below.</t> | |||
| <section anchor="sect-8.1" numbered="true" toc="default"> | <section anchor="sect-8.1" numbered="true" toc="default"> | |||
| <name>Main Format</name> | <name>Main Format</name> | |||
| <dl> | <dl> | |||
| <dt>4 bytes:</dt><dd> file signature, in hexadecimal the bytes 0x91, 0x0a, | <dt>4 bytes:</dt><dd>File signature in hexadecimal format (bytes 0x91, 0x0a, 0x42, and 0x52). The first byte contains the invalid WBITS | |||
| 0x42, 0x52. The first byte contains the invalid WBITS | ||||
| combination for brotli <xref target="RFC7932" format="default"/> and large window brotli. | combination for brotli <xref target="RFC7932" format="default"/> and large window brotli. | |||
| </dd> | </dd> | |||
| <dt>1 byte:</dt><dd><t> container flags, 8 bits with meanings:</t> | <dt>1 byte:</dt><dd><t>Container flags that are 8 bits and have the following meanings:</t> | |||
| <dl><dt> bit 0 and 1:</dt><dd> version indicator, must be b'00, otherwise the | ||||
| <dl><dt> bits 0 and 1:</dt><dd>Version indicator that must be b'00. Otherwise, the | ||||
| decoder must reject the data stream as invalid. | decoder must reject the data stream as invalid. | |||
| </dd> | </dd> | |||
| <dt>bit 2:</dt> <dd>if 0, the file contains no final footer, may not contain | <dt>bit 2:</dt> <dd>If 0, the file contains no final footer, may not contain | |||
| any metadata chunks, may not contain a central directory, | any metadata chunks, may not contain a central directory, | |||
| and may encode only a single resource (using one or more | and may encode only a single resource (using one or more | |||
| data chunks). If 1, the file may contain one or more | data chunks). If 1, the file may contain one or more | |||
| resources, metadata, central directory, and must contain a | resources, metadata, and a central directory, and it must contain a | |||
| final footer. | final footer. | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| </dd> | </dd> | |||
| <dt>multiple times:</dt><dd> a chunk, each with the format specified in section 8.2</dd> | <dt>multiple times:</dt><dd>A chunk, each with the format specified in <xref target="sect-8.2"/>.</dd> | |||
| </dl> | </dl> | |||
| </section> | </section> | |||
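The start of the main format can be checked as in the sketch below. It is illustrative only: `parse_container_header` and the returned keys are hypothetical names, and the sketch assumes that "bit 0" denotes the least significant bit of the flags byte, which the text above does not state explicitly.

```python
FRAMING_SIGNATURE = bytes([0x91, 0x0A, 0x42, 0x52])


def parse_container_header(data: bytes) -> dict:
    """Check the 4-byte signature and the container flags byte."""
    if len(data) < 5:
        raise ValueError("stream too short for the framing header")
    if data[:4] != FRAMING_SIGNATURE:
        raise ValueError("bad file signature")
    flags = data[4]
    if flags & 0b11:                 # bits 0 and 1: version, must be b'00
        raise ValueError("unsupported version indicator")
    return {
        # bit 2 set: multiple resources, metadata, a central directory
        # are allowed, and a final footer is required
        "multi_resource": bool(flags & 0b100),
        "chunks_offset": 5,          # chunks follow the flags byte
    }
```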
| <section anchor="sect-8.2" numbered="true" toc="default"> | <section anchor="sect-8.2" numbered="true" toc="default"> | |||
| <name>Chunk Format</name> | <name>Chunk Format</name> | |||
| <dl> | <dl> | |||
| <dt>varint:</dt> <dd>length of this chunk excluding this varint but | <dt>varint:</dt><dd>Length of this chunk excluding this varint | |||
| but including all next header bytes and data. If the value is 0, | ||||
| including all next header bytes and data. If the value | then the chunk type byte is not present and the chunk type is | |||
| is 0, then the chunk type byte is not present and the | assumed to be 0.</dd> | |||
| chunk type is assumed to be 0.</dd> | ||||
| <dt>1 byte:</dt><dd><t>CHUNK_TYPE</t> | <dt>1 byte:</dt><dd><t>CHUNK_TYPE</t> | |||
| <dl> | <dl indent="5" spacing="compact"> | |||
| <dt> 0:</dt><dd> padding chunk</dd> | <dt> 0:</dt><dd> padding chunk</dd> | |||
| <dt> 1:</dt><dd> metadata chunk</dd> | <dt> 1:</dt><dd> metadata chunk</dd> | |||
| <dt> 2:</dt><dd> data chunk</dd> | <dt> 2:</dt><dd> data chunk</dd> | |||
| <dt> 3:</dt><dd> first partial data chunk</dd> | <dt> 3:</dt><dd> first partial data chunk</dd> | |||
| <dt> 4:</dt><dd> middle partial data chunk</dd> | <dt> 4:</dt><dd> middle partial data chunk</dd> | |||
| <dt> 5:</dt><dd> last partial data chunk</dd> | <dt> 5:</dt><dd> last partial data chunk</dd> | |||
| <dt> 6:</dt><dd> footer metadata chunk</dd> | <dt> 6:</dt><dd> footer metadata chunk</dd> | |||
| <dt> 7:</dt><dd> global metadata chunk</dd> | <dt> 7:</dt><dd> global metadata chunk</dd> | |||
| <dt> 8:</dt><dd> repeat metadata chunk</dd> | <dt> 8:</dt><dd> repeat metadata chunk</dd> | |||
| <dt> 9:</dt><dd> central directory chunk</dd> | <dt> 9:</dt><dd> central directory chunk</dd> | |||
| <dt> 10:</dt><dd> final footer</dd> | <dt> 10:</dt><dd> final footer</dd> | |||
| </dl></dd></dl> | </dl></dd> | |||
| <t> | ||||
| if CHUNK_TYPE is not padding chunk, central directory or final | ||||
| footer:</t> | ||||
| <dt>If CHUNK_TYPE is not padding chunk, central directory, or final footer:</dt> | ||||
| <dd> | ||||
| <t><br/></t> | ||||
| <dl><dt> 1 byte:</dt><dd><t> CODEC:</t> | <dl><dt> 1 byte:</dt><dd><t> CODEC:</t> | |||
| <dl> | <dl spacing="compact"> | |||
| <dt>0:</dt><dd> uncompressed</dd> | <dt>0:</dt><dd> uncompressed</dd> | |||
| <dt> 1:</dt><dd> keep decoder</dd> | <dt> 1:</dt><dd> keep decoder</dd> | |||
| <dt> 2:</dt><dd> brotli</dd> | <dt> 2:</dt><dd> brotli</dd> | |||
| <dt> 3:</dt><dd> shared brotli</dd> | <dt> 3:</dt><dd> shared brotli</dd> | |||
| </dl> | </dl> | |||
| </dd></dl> | </dd> | |||
| </dl> | ||||
| <t>if CODEC is not "uncompressed":</t> | </dd> | |||
| <dt>If CODEC is not "uncompressed":</dt> | ||||
| <dd> | ||||
| <t><br/></t> | ||||
| <dl><dt> | <dl><dt> | |||
| varint:</dt><dd> uncompressed size in bytes of the data contained | varint:</dt><dd>Uncompressed size in bytes of the data contained | |||
| within the compressed stream | within the compressed stream. | |||
| </dd></dl> | </dd></dl></dd> | |||
| <t>if CODEC is "shared brotli":</t> | <dt>If CODEC is "shared brotli":</dt> | |||
| <dd><t><br/></t> | ||||
| <dl><dt> | <dl><dt> | |||
| 1 byte:</dt><dd><t> amount of dictionary references. Multiple dictionary | 1 byte:</dt><dd><t>Number of dictionary references. Multiple dictionary | |||
| references are possible with the following | references are possible with the following | |||
| restrictions: there can be maximum 1 serialized | restrictions: there can be 1 serialized | |||
| dictionary, and maximum 15 prefix dictionaries (a | dictionary and 15 prefix dictionaries maximum (a | |||
| serialized dictionary may already contain one of | serialized dictionary may already contain one of | |||
| those). Circular references are not allowed (any | those). Circular references are not allowed (any | |||
| dictionary reference that directly or indirectly | dictionary reference that directly or indirectly | |||
| uses this chunk itself as dictionary).</t></dd> | uses this chunk itself as dictionary).</t></dd> | |||
| </dl> | ||||
| <t> per dictionary reference:</t> | ||||
| <dl><dt>1 byte:</dt><dd><t> flags:</t> | ||||
| <dl><dt>bit 0 and 1:</dt><dd><t> dictionary source:</t> | ||||
| <dl><dt>00:</dt><dd> Internal dictionary reference to a full resource | <dt>Per dictionary reference:</dt> | |||
| <dd><t><br/></t> | ||||
| <dl><dt>1 byte:</dt><dd><t> Flags:</t> | ||||
| <dl><dt>bits 0 and 1:</dt><dd><t> Dictionary source:</t> | ||||
| <dl indent="5"><dt>00:</dt><dd> Internal dictionary reference to a full resource | ||||
| by pointer, which can span one or more chunks. | by pointer, which can span one or more chunks. | |||
| Must point to a full data chunk or a first | Must point to a full data chunk or a first | |||
| partial data chunk.</dd> | partial data chunk.</dd> | |||
| <dt>01:</dt><dd> Internal dictionary reference to single chunk | <dt>01:</dt><dd> Internal dictionary reference to single chunk | |||
| contents by pointer. May point to any chunk with | contents by pointer. May point to any chunk with | |||
| content (data or metadata). If partial data | content (data or metadata). If a partial data | |||
| chunk, only this part is the dictionary. In this | chunk, only this part is the dictionary. In this | |||
| case, the dictionary type is not allowed to be a | case, the dictionary type is not allowed to be a | |||
| serialised dictionary. | serialized dictionary. | |||
| </dd> | </dd> | |||
| <dt>10:</dt><dd> Reference to a dictionary by hash code of a | <dt>10:</dt><dd> Reference to a dictionary by hash code of a | |||
| resource. The dictionary can come from an | resource. The dictionary can come from an | |||
| external source such as a different container. | external source, such as a different container. | |||
| The user of the decoder must be able to provide | The user of the decoder must be able to provide | |||
| the dictionary contents given its hash code (even | the dictionary contents given its hash code (even | |||
| if it comes from this container itself), or treat | if it comes from this container itself) or treat | |||
| it as an error when the user does not have it | it as an error when the user does not have it | |||
| available.</dd> | available.</dd> | |||
| <dt>11:</dt><dd> invalid bit combination</dd> | <dt>11:</dt><dd> Invalid bit combination</dd> | |||
| </dl> | ||||
| </dd> | ||||
| <dt> bit 2 and 3:</dt><dd> dictionary type:</dd> | ||||
| <dt>00:</dt><dd> <t>prefix dictionary, set in front of the sliding | ||||
| window</t> | ||||
| <dl> | ||||
| <dt> 01:</dt><dd> serialized dictionary in | ||||
| the shared brotli | ||||
| format as specified in section 5.</dd> | ||||
| <dt> | ||||
| 10:</dt><dd> invalid bit combination</dd> | ||||
| <dt>11:</dt><dd> invalid bit combination</dd> | ||||
| <dt>bit 4-7:</dt><dd> must be 0</dd> | ||||
| <dt>if hash-based:</dt> | ||||
| <dd> | ||||
| <dl><dt>1 byte:</dt><dd> type of hash used. Only supported value: 3, | ||||
| indicating 256-bit Highwayhash <xref target="HWYHASH" format="default"/>. | ||||
| </dd> | ||||
| </dl> | ||||
| </dd> | ||||
| </dl> | </dl> | |||
| </dd> | </dd> | |||
| <dt> 32 bytes:</dt><dd><t> 256-bit Highwayhash | <dt> bits 2 and 3:</dt><dd><t>Dictionary type: | |||
| checksum to refer to | </t> | |||
| dictionary.</t> | <dl indent="5"> | |||
| <dl> | <dt>00:</dt><dd> <t>Prefix dictionary, set in front of the sliding | |||
| <dt>if pointer based:</dt><dd> varint encoded pointer to | window</t></dd> | |||
| its | ||||
| chunk in this container. The chunk must come earlier | ||||
| in the container than the current chunk.</dd> | ||||
| <dt>X bytes:</dt><dd> extra header bytes, depending on | ||||
| CHUNK_TYPE. If present, | ||||
| they are specified in the subsequent sections. | ||||
| </dd> | ||||
| </dl> | ||||
| </dd> | ||||
| <dt>remaining bytes:</dt><dd> <t>the chunk contents. The | <dt>01:</dt><dd>Serialized dictionary in the shared brot | |||
| uncompressed data | li | |||
| format as specified in <xref target="sect-5"/>.</dd> | ||||
| <dt> | ||||
| 10:</dt><dd> Invalid bit combination</dd> | ||||
| <dt>11:</dt><dd> Invalid bit combination</dd></dl></dd> | ||||
| <dt>bits 4-7:</dt><dd> Must be 0</dd></dl></dd> | ||||
| <dt>If hash-based:</dt><dd><t><br/></t> | ||||
| <dl><dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3, | ||||
| indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>. | ||||
| </dd> | ||||
| <dt>32 bytes:</dt><dd><t> 256-bit HighwayHash checksum to refer to | ||||
| dictionary.</t></dd></dl></dd> | ||||
| <dt>If pointer based:</dt><dd>Varint-encoded pointer to | ||||
| its | ||||
| chunk in this container. The chunk must come in the container earlier | ||||
| than the current chunk.</dd></dl></dd></dl></dd> | ||||
| <dt>X bytes:</dt><dd>Extra header bytes, depending on | ||||
| CHUNK_TYPE. If present, | ||||
| they are specified in the subsequent sections.</dd> | ||||
| <dt>remaining bytes:</dt><dd> <t>The chunk contents. The | ||||
| uncompressed data | ||||
| in the chunk content depends on CHUNK_TYPE | in the chunk content depends on CHUNK_TYPE | |||
| and is specified in the subsequent sections. | and is specified in the subsequent sections. | |||
| The compressed data has following | The compressed data has following | |||
| format depending on CODEC:</t> | format depending on CODEC:</t> | |||
| <ul><li>uncompressed: the raw bytes</li> | <ul><li>uncompressed: The raw bytes.</li> | |||
| <li>if "keep decoder", the continuation of the compressed | <li>If "keep decoder", the continuation of the compressed | |||
| stream which was interrupted at the end of the previous | stream that was interrupted at the end of the previous | |||
| chunk. The decoder from the previous chunk must be used | chunk. The decoder from the previous chunk must be used | |||
| and its state it had at the end of the previous chunk | and its state it had at the end of the previous chunk | |||
| must be kept at the start of the decoding of this chunk. | must be kept at the start of the decoding of this chunk. | |||
| </li> | </li> | |||
| <li>brotli: the bytes are in brotli format | <li>brotli: The bytes are in brotli format | |||
| <xref target="RFC7932" format="default"/> | <xref target="RFC7932" format="default"/>. | |||
| </li> | </li> | |||
| <li>shared brotli: the bytes are in the | <li>shared brotli: The bytes are in the | |||
| shared brotli format specified in section | shared brotli format specified in <xref target="sect-7"/>.</li></ul></dd> | |||
| 7</li></ul></dd> | ||||
| </dl> | </dl> | |||
| </dd> | </section> | |||
| </dl> | ||||
| </section> | ||||
| <section anchor="sect-8.3" numbered="true" toc="default"> | <section anchor="sect-8.3" numbered="true" toc="default"> | |||
| <name>Metadata Format</name> | <name>Metadata Format</name> | |||
| <t>All the metadata chunk types use the following format for the | <t>All the metadata chunk types use the following format for the | |||
| uncompressed content:</t> | uncompressed content:</t> | |||
| <dl newline="true"> | <dl newline="true"> | |||
| <dt>Per field:</dt> | <dt>Per field:</dt> | |||
| <dd> | <dd> | |||
| <dl><dt>2 bytes:</dt> | <dl><dt>2 bytes:</dt> | |||
| <dd><t> code to identify this metadata field. This must be | <dd><t>Code to identify this metadata field. This must be | |||
| two lowercase or two uppercase alpha ascii | two lowercase or two uppercase alpha ASCII | |||
| characters. If the decoder encounters a lowercase | characters. If the decoder encounters a lowercase | |||
| field that it does not recognise for the current | field that it does not recognize for the current | |||
| chunk type, non-ascii characters or non-alpha | chunk type, non-ASCII characters, or non-alpha | |||
| characters, the decoder must reject the data stream | characters, the decoder must reject the data stream | |||
| as invalid. Uppercase codes may be used for custom | as invalid. Uppercase codes may be used for custom | |||
| user metadata and can be ignored by a compliant | user metadata and can be ignored by a compliant | |||
| decoder.</t></dd> | decoder.</t></dd> | |||
| <dt>varint:</dt> | <dt>varint:</dt> | |||
| <dd> <t>length of the content of this field in bytes, | <dd><t>Length of the content of this field in bytes, | |||
| excluding the code bytes and this varint</t> | excluding the code bytes and this varint.</t></dd> | |||
| <dl> | ||||
| <dt>N bytes:</dt> | <dt>N bytes:</dt> | |||
| <dd> the contents of this field</dd> | <dd>The contents of this field.</dd> | |||
| </dl> | </dl> | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| </dd> | ||||
| </dl> | ||||
| <t> | <t> | |||
| The last field is reached when the chunk content end is reached. If | The last field is reached when the chunk content end is reached. If | |||
| the length of the last field does not end at the same byte as the end | the length of the last field does not end at the same byte as the end | |||
| of the uncompressed content of the chunk, the decoder must reject the | of the uncompressed content of the chunk, the decoder must reject the | |||
| data stream as invalid.</t> | data stream as invalid.</t> | |||
| </section> | </section> | |||
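The per-field layout above (2-byte code, varint length, N content bytes) can be illustrated with a short parser. This is a minimal sketch, not the reference decoder: the names `read_varint` and `parse_metadata_fields` are hypothetical, the varint layout (7 value bits per byte, least significant group first, high bit set on every byte but the last) is an assumption because the varint definition lies outside this section, and rejecting mixed-case codes is this sketch's reading of "two lowercase or two uppercase".

```python
def read_varint(buf, pos):
    """Decode a varint at buf[pos:]; return (value, next_pos).

    ASSUMPTION: 7 value bits per byte, least significant group first,
    high bit set on every byte except the last.
    """
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return value, pos


def parse_metadata_fields(content, known_lowercase=frozenset({"id", "mt"})):
    """Split uncompressed metadata chunk content into (code, bytes) pairs.

    known_lowercase lists the lowercase codes recognized for the chunk
    type being parsed; "id" and "mt" are the ones for chunk type 1.
    """
    fields, pos = [], 0
    while pos < len(content):
        code = content[pos:pos + 2]
        if len(code) != 2:
            raise ValueError("truncated field code")
        lower = all(0x61 <= b <= 0x7A for b in code)  # 'a'..'z'
        upper = all(0x41 <= b <= 0x5A for b in code)  # 'A'..'Z'
        if not (lower or upper):
            raise ValueError("field code is not two same-case ASCII letters")
        name = code.decode("ascii")
        if lower and name not in known_lowercase:
            raise ValueError("unrecognized lowercase field: " + name)
        length, pos = read_varint(content, pos + 2)
        if length > len(content) - pos:
            # The last field must end exactly at the end of the content.
            raise ValueError("field overruns chunk content")
        fields.append((name, content[pos:pos + length]))
        pos += length
    return fields
```

Uppercase (custom) fields are returned like any other field; a caller that does not understand them can simply skip them.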
| <section anchor="sect-8.4" numbered="true" toc="default"> | <section anchor="sect-8.4" numbered="true" toc="default"> | |||
| <name>Chunk Specifications</name> | <name>Chunk Specifications</name> | |||
| <section anchor="sect-8.4.1" numbered="true" toc="default"> | <section anchor="sect-8.4.1" numbered="true" toc="default"> | |||
| <name>Padding Chunk (Type 0)</name> | <name>Padding Chunk (Type 0)</name> | |||
| <t> | <t> | |||
| All bytes in this chunk must be zero, except for the initial varint | All bytes in this chunk must be zero except for the initial varint | |||
| that specifies the remaining chunk length.</t> | that specifies the remaining chunk length.</t> | |||
| <t> | <t> | |||
| Since the varint itself takes up bytes as well, when the goal is to | Since the varint itself takes up bytes as well, when the goal is to | |||
| introduce an amount of padding bytes, the dependence of the length of | introduce a number of padding bytes, the dependence of the length of | |||
| the varint on the value it encodes must be taken into account.</t> | the varint on the value it encodes must be taken into account.</t> | |||
| <t> | <t> | |||
| A single byte varint with value 0 is a padding chunk of length 1. | A single byte varint with a value of 0 is a padding chunk of length 1. | |||
| For more padding, use higher varint values. Do not use multiple | For more padding, use higher varint values. Do not use multiple | |||
| shorter padding chunks, since this is slower to decode.</t> | shorter padding chunks since this is slower to decode.</t> | |||
| </section> | </section> | |||
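The interaction between the varint length and the padding size can be made concrete with a small sketch. It assumes the same varint layout as elsewhere in the container (7 value bits per byte, least significant group first, high bit set on all but the last byte) and shortest-form encoding; both are assumptions, and the function names are hypothetical.

```python
def encode_varint(value):
    """Encode value as a shortest-form varint (assumed layout, see above)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def padding_chunk(total):
    """Return one padding chunk of exactly `total` bytes, or None.

    The chunk is a varint giving the remaining length, followed by that
    many zero bytes. None means no shortest-form varint yields this total.
    """
    for varint_len in range(1, 10):
        remaining = total - varint_len
        if remaining >= 0 and len(encode_varint(remaining)) == varint_len:
            return encode_varint(remaining) + bytes(remaining)
    return None
```

Under these assumptions, a total of 128 bytes is a 1-byte varint plus 127 zeros, and 130 bytes is a 2-byte varint plus 128 zeros, but 129 bytes has no single-chunk solution with a shortest-form varint. This is the dependence the text warns about.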
| <section anchor="sect-8.4.2" numbered="true" toc="default"> | <section anchor="sect-8.4.2" numbered="true" toc="default"> | |||
| <name>Metadata Chunk (Type 1)</name> | <name>Metadata Chunk (Type 1)</name> | |||
| <t> | <t> | |||
| This chunk contains metadata that applies to the resource whose | This chunk contains metadata that applies to the resource whose | |||
| beginning is encoded in the subsequent data chunk or first partial | beginning is encoded in the subsequent data chunk or first partial | |||
| data chunk.</t> | data chunk.</t> | |||
| <t> | <t> | |||
| The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | |||
| <t>The following field types are recognised:</t> | <t>The following field types are recognized:</t> | |||
| <dl><dt>id:</dt><dd><t> name field. May appear 0 or 1 times. Has the following | <dl><dt>id (N bytes):</dt><dd>Name field. May appear 0 or 1 times. Has the follo | |||
| format:</t> | wing | |||
| <dl> | format: name in UTF-8 encoding, length | |||
| <dt> N bytes:</dt><dd><t> name in UTF-8 encoding, leng | determined by the field length. Treated generically but may | |||
| th determined by the | be used as a filename. If used as a filename, forward slashes | |||
| field length. Treated generically but may be used as | '/' should be used as directory separators, relative paths | |||
| filename. If used as filename, forward slashes '/' | should be used, and filenames ending in a slash with 0-length | |||
| should be used as directory separator, relative paths | content in the matching data chunk should be treated as an | |||
| should be used and filenames ending in a slash with | empty directory.</dd> | |||
| 0-length content in the matching data chunk should be | <dt>mt (8 bytes):</dt><dd>Modification type. May appear 0 or 1 | |||
| treated as an empty directory.</t> | times. Has the following format: contains microseconds since epoch, as a little | |||
| <dl> | -endian, signed two's complement 64-bit integer.</dd> | |||
| <dt>mt:</dt> <dd><t>modification type. May appear 0 or 1 tim | <dt>custom user field:</dt><dd>Any two uppercase ASCII characters.</dd> | |||
| es. Has the following format:</t> | </dl> | |||
| <dl> | ||||
| <dt>8 bytes:</dt><dd> microseconds since epoch, as a little endi | ||||
| an signed twos complement 64-bit integer</dd> | ||||
| <dt> custom user field:</dt><dd> any two uppercase ASCII characters.</dd> | ||||
| </dl> | ||||
| </dd> | ||||
| </dl> | ||||
| </dd> | ||||
| </dl> | ||||
| </dd> | ||||
| </dl> | ||||
| </section> | </section> | |||
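As an illustration of the two recognized fields, the sketch below decodes an "mt" value and applies the empty-directory rule for "id". The function names are hypothetical, and treating "epoch" as the Unix epoch (1970-01-01T00:00:00Z) is an assumption; the text only says "microseconds since epoch".

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: "epoch" means the Unix epoch.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_mt(field):
    """Decode an 8-byte little-endian signed microsecond count."""
    if len(field) != 8:
        raise ValueError("mt field must be exactly 8 bytes")
    micros = int.from_bytes(field, "little", signed=True)
    # May raise OverflowError for values outside datetime's year range.
    return UNIX_EPOCH + timedelta(microseconds=micros)


def is_empty_directory(id_field, data_length):
    """True if the name ends in '/' and the matching data chunk is empty."""
    name = id_field.decode("utf-8")
    return name.endswith("/") and data_length == 0
```

Negative values are valid because the integer is signed, so timestamps before the epoch decode without special handling.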
| <section anchor="sect-8.4.3" numbered="true" toc="default"> | <section anchor="sect-8.4.3" numbered="true" toc="default"> | |||
| <name>Data Chunk (Type 2)</name> | <name>Data Chunk (Type 2)</name> | |||
| <t> | <t> | |||
| A data chunk contains the actual data of a resource.</t> | A data chunk contains the actual data of a resource.</t> | |||
| <t>This chunk has the following extra header bytes:</t> | <t>This chunk has the following extra header bytes:</t> | |||
| <dl> | <dl> | |||
| <dt>1 byte: </dt> <dd><t>flags: | ||||
| <dt>1 byte: </dt> <dd><t>Flags: | ||||
| </t> | </t> | |||
| <dl> | <dl> | |||
| <dt> bit 0:</dt><dd> if true, indicates this is not a resource that should be | <dt> bit 0:</dt><dd> If true, indicates this is not a resource that should be | |||
| output implicitly as part of extracting resources from | output implicitly as part of extracting resources from | |||
| this container. Instead, it may be referred to only | this container. Instead, it may be referred to only | |||
| explicitly, e.g. as a dictionary reference by hash code | explicitly, e.g., as a dictionary reference by hash code | |||
| or offset. This flag should be set for data used as | or offset. This flag should be set for data used as | |||
| dictionary to improve compression of actual resources.</dd> | dictionary to improve compression of actual resources.</dd> | |||
| <dt> | <dt> | |||
| bit 1:</dt><dd> if true, hash code is given</dd> | bit 1:</dt><dd>If true, hash code is given.</dd> | |||
| <dt> | <dt> | |||
| bits 2-7:</dt><dd> must be zero</dd></dl> | bits 2-7:</dt><dd>Must be zero.</dd></dl></dd> | |||
| <t>if hash code is given:</t> | <dt>If hash code is given:</dt><dd><t><br/></t> | |||
| <dl> | <dl> | |||
| <dt>1 byte:</dt><dd> type of hash used. Only supported value: 3, | <dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3, | |||
| indicating 256-bit Highwayhash <xref target="HWYHASH" format="default"/>. | indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>. | |||
| </dd> | </dd> | |||
| <dt> 32 bytes:</dt><dd> 256-bit Highwayhash checksum of the uncompressed | <dt> 32 bytes:</dt><dd> 256-bit HighwayHash checksum of the uncompressed | |||
| data</dd> | data.</dd> | |||
| </dl> | </dl> | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| <t> | <t> | |||
| The uncompressed content bytes of this chunk are the actual data of | The uncompressed content bytes of this chunk are the actual data of | |||
| the resource.</t> | the resource.</t> | |||
| </section> | </section> | |||
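The extra header bytes of a data chunk can be read as sketched below. This is an illustration under stated assumptions rather than the reference decoder: the function name and the returned dictionary keys are hypothetical, and "bit 0" is assumed to be the least significant bit of the flags byte.

```python
def parse_data_chunk_extra_header(buf, pos=0):
    """Parse the data chunk extra header at buf[pos:].

    Returns (header, next_pos). ASSUMPTION: bit 0 is the least
    significant bit of the flags byte.
    """
    if pos >= len(buf):
        raise ValueError("missing flags byte")
    flags = buf[pos]
    pos += 1
    if flags & 0xFC:
        raise ValueError("flag bits 2-7 must be zero")
    header = {
        "explicit_only": bool(flags & 0x01),  # bit 0: not output implicitly
        "hash": None,                         # filled in when bit 1 is set
    }
    if flags & 0x02:
        if len(buf) - pos < 33:
            raise ValueError("truncated hash code")
        if buf[pos] != 3:
            raise ValueError("unsupported hash type")  # only 3 is defined
        header["hash"] = bytes(buf[pos + 1:pos + 33])  # 256-bit checksum
        pos += 33
    return header, pos
```

For first and middle partial data chunks, a caller would additionally reject a set bit 1, and for middle and last partial data chunks a set bit 0, following the per-type restrictions in the subsequent sections.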
| <section anchor="sect-8.4.4" numbered="true" toc="default"> | <section anchor="sect-8.4.4" numbered="true" toc="default"> | |||
| <name>First Partial Data Chunk (Type 3)</name> | <name>First Partial Data Chunk (Type 3)</name> | |||
| <t> | <t> | |||
| This chunk contains partial data of a resource. This is the first | This chunk contains partial data of a resource. This is the first | |||
| chunk in a series containing the entire data of the resource.</t> | chunk in a series containing the entire data of the resource.</t> | |||
| <t> | <t> | |||
| The format of this chunk is the same as the format of a Data Chunk | The format of this chunk is the same as the format of a data chunk | |||
| (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> | (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> | |||
| <t> | <t> | |||
| The second bit of flags must be set to 0 and no hash code given.</t> | The second bit of flags must be set to 0 and no hash code given.</t> | |||
| <t> | <t> | |||
| The uncompressed data size is only of this part of the resource, not | The uncompressed data size is only of this part of the resource, not | |||
| of the full resource.</t> | of the full resource.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.5" numbered="true" toc="default"> | <section anchor="sect-8.4.5" numbered="true" toc="default"> | |||
| <name>Middle Partial Data Chunk (Type 4)</name> | <name>Middle Partial Data Chunk (Type 4)</name> | |||
| <t> | <t> | |||
| This chunk contains partial data of a resource, and is neither the | This chunk contains partial data of a resource and is neither the | |||
| first nor the last part of the full resource.</t> | first nor the last part of the full resource.</t> | |||
| <t> | <t> | |||
| The format of this chunk is the same as the format of a Data Chunk | The format of this chunk is the same as the format of a data chunk | |||
| (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> | (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> | |||
| <t> | <t> | |||
| The first and second bits of flags must be set to 0.</t> | The first and second bits of flags must be set to 0.</t> | |||
| <t> | <t> | |||
| The uncompressed data size is only of this part of the resource, not | The uncompressed data size is only of this part of the resource, not | |||
| of the full resource.</t> | of the full resource.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.6" numbered="true" toc="default"> | <section anchor="sect-8.4.6" numbered="true" toc="default"> | |||
| <name>Last Partial Data Chunk (Type 5)</name> | <name>Last Partial Data Chunk (Type 5)</name> | |||
| <t> | <t> | |||
| This chunk contains the final piece of partial data of a resource.</t> | This chunk contains the final piece of partial data of a resource.</t> | |||
| <t> | <t> | |||
| The format of this chunk is the same as the format of a Data Chunk | The format of this chunk is the same as the format of a data chunk | |||
| (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> | (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> | |||
| <t> | <t> | |||
| The first bit of the flags must be set to 0.</t> | The first bit of flags must be set to 0.</t> | |||
| <t> | <t> | |||
| If a hash code is given, the hash code of the full resource | If a hash code is given, the hash code of the full resource | |||
| (concatenated from all previous chunks and this chunk) is given in | (concatenated from all previous chunks and this chunk) is given in | |||
| this chunk.</t> | this chunk.</t> | |||
| <t> | <t> | |||
| The uncompressed data size is only of this part of the resource, not | The uncompressed data size is only of this part of the resource, not | |||
| of the full resource.</t> | of the full resource.</t> | |||
| <t> | <t> | |||
| The type of this chunk indicates that there are no further chunk | The type of this chunk indicates that there are no further chunk | |||
| encoding this resource, so the full resource is now known.</t> | encoding this resource, so the full resource is now known.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.7" numbered="true" toc="default"> | <section anchor="sect-8.4.7" numbered="true" toc="default"> | |||
| <name>Footer Metadata Chunk (Type 6)</name> | <name>Footer Metadata Chunk (Type 6)</name> | |||
| <t> | <t> | |||
| This metadata applies to the resource whose encoding ended in the | This metadata applies to the resource whose encoding ended in the | |||
| preceding data chunk or last partial data chunk.</t> | preceding data chunk or last partial data chunk.</t> | |||
| <t> | <t> | |||
| The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | |||
| <t> | <t> | |||
| There are no lowercase field types defined for global metadata. | There are no lowercase field types defined for footer metadata. | |||
| Uppercase field types can be used as custom user data.</t> | Uppercase field types can be used as custom user data.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.8" numbered="true" toc="default"> | <section anchor="sect-8.4.8" numbered="true" toc="default"> | |||
| <name>Global Metadata Chunk (Type 7)</name> | <name>Global Metadata Chunk (Type 7)</name> | |||
| <t> | <t> | |||
| This metadata applies to the whole container instead of a single | This metadata applies to the whole container instead of a single | |||
| resource.</t> | resource.</t> | |||
| <t> | <t> | |||
| The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | |||
| <t> | <t> | |||
| There are no lowercase field types defined for footer metadata. | There are no lowercase field types defined for global metadata. | |||
| Uppercase field types can be used as custom user data.</t> | Uppercase field types can be used as custom user data.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.9" numbered="true" toc="default"> | <section anchor="sect-8.4.9" numbered="true" toc="default"> | |||
| <name>Repeat Metadata Chunk (Type 8)</name> | <name>Repeat Metadata Chunk (Type 8)</name> | |||
| <t> | <t> | |||
| These chunks optionally repeat metadata that is interleaved between | These chunks optionally repeat metadata that is interleaved between | |||
| data chunks. To use these chunks, it is necessary to also read | data chunks. To use these chunks, it is necessary to also read | |||
| additional information, such as pointers to the original chunks, from | additional information, such as pointers to the original chunks, from | |||
| the central directory.</t> | the central directory.</t> | |||
| <t> | <t> | |||
| The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> | |||
| <t>This chunk has an extra header byte:</t> | <t>This chunk has an extra header byte:</t> | |||
| <dl> <dt> | <dl> <dt> | |||
| 1 byte:</dt><dd> chunk type of repeated chunk (metadata chunk | 1 byte:</dt><dd>Chunk type of repeated chunk (metadata chunk | |||
| or footer metadata chunk) | or footer metadata chunk). | |||
| </dd></dl> | </dd></dl> | |||
| <t>This set of chunks must follow the following restrictions:</t> | <t>This set of chunks must follow the following restrictions:</t> | |||
| <ul><li> | <ul><li> | |||
| It is optional whether or not repeat metadata chunks are | It is optional whether or not repeat metadata chunks are | |||
| present.</li> | present.</li> | |||
| <li>If they are present, then they must be present for all | <li>If they are present, then they must be present for all | |||
| metadata chunks and footer metadata chunks. | metadata chunks and footer metadata chunks. | |||
| </li> | </li> | |||
| <li>There may be only 1 repeat metadata chunk per repeated metadata chunk.</li> | <li>There may be only 1 repeat metadata chunk per repeated metadata chunk.</li> | |||
| <li>They must appear in the same order as the chunks appear in the container, which is also the same order as listed in the | <li>They must appear in the same order as the chunks appear in the container, which is also the same order as listed in the | |||
| central directory. | central directory. | |||
| </li> | </li> | |||
| <li>Compression of these chunks is allowed, however it is not allowed | <li>Compression of these chunks is allowed; however, it is not allowed | |||
| to use any internal dictionary except an earlier repeat | to use any internal dictionary except an earlier repeat | |||
| metadata chunk of this series, and it is not allowed for a | metadata chunk of this series, and it is not allowed for a | |||
| metadata chunk to keep the decoder state if the previous chunk | metadata chunk to keep the decoder state if the previous chunk | |||
| is not a repeat metadata chunk. That is, the series of | is not a repeat metadata chunk. That is, the series of | |||
| metadata chunks must be decompressible without using other | metadata chunks must be decompressible without using other | |||
| chunks of the framing format file. | chunks of the framing format file. | |||
| </li> | </li> | |||
| </ul> | </ul> | |||
| <t> | <t> | |||
| The fields contained in this metadata chunk must follow the following | The fields contained in this metadata chunk must follow the following | |||
| restrictions:</t> | restrictions:</t> | |||
| <ul> | <ul> | |||
| <li>If a field is present, it must | <li>If a field is present, it must | |||
| exactly match the corresponding field of the copied chunk.</li> | exactly match the corresponding field of the copied chunk.</li> | |||
| <li>It is allowed to leave out a field that is present | <li>It is allowed to leave out a field that is present | |||
| in the copied chunk. | in the copied chunk. | |||
| </li> | </li> | |||
| <li>If a field is present, then it must be present in *all* other | <li>If a field is present, then it must be present in <strong>all</strong> other | |||
| repeat metadata chunks when the copied chunk contains this | repeat metadata chunks when the copied chunk contains this | |||
| field. In other words, if you know you can get the name field | field. In other words, if you know you can get the name field | |||
| from a repeat chunk, you know that you will be able to get all | from a repeat chunk, you know that you will be able to get all | |||
| names of all resources from all repeat chunks. | names of all resources from all repeat chunks. | |||
| </li> | </li> | |||
| </ul> | </ul> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.10" numbered="true" toc="default"> | <section anchor="sect-8.4.10" numbered="true" toc="default"> | |||
| <name>Central Directory Chunk (Type 9)</name> | <name>Central Directory Chunk (Type 9)</name> | |||
| <t> | <t> | |||
| The central directory chunk, along with the repeat metadata chunks, | The central directory chunk along with the repeat metadata chunks | |||
| allow to quickly find and list compressed resources in the container | allow quickly finding and listing compressed resources in the container | |||
| file.</t> | file.</t> | |||
| <t> | <t> | |||
| The central directory chunk is always uncompressed and does not have | The central directory chunk is always uncompressed and does not have | |||
| the codec byte. It instead has the following format:</t> | the codec byte. It instead has the following format:</t> | |||
| <dl> | <dl> | |||
| <dt>varint:</dt><dd> <t>pointer into the file where the repeat metadata chunks are located, or 0 if they are not present per chunk listed:</t> | <dt>varint:</dt><dd> <t>Pointer into the file where the repeat metadata chunks are located or 0 if they are not present.</t></dd> | |||
| <dt>per chunk listed:</dt><dd><t><br/></t> | ||||
| <dl> <dt> | <dl> <dt> | |||
| varint:</dt><dd> pointer into the file where this chunk begins</dd> | varint:</dt><dd>Pointer into the file where this chunk begins.</dd> | |||
| <dt>varint:</dt><dd> amount of header bytes N used below</dd> | <dt>varint:</dt><dd>Number of header bytes N used below.</dd> | |||
| <dt>N bytes:</dt><dd> copy of all the header bytes of the pointed at chunk, | <dt>N bytes:</dt><dd>Copy of all the header bytes of the pointed at chunk, | |||
| including total size, chunk type byte, codec, | including total size, chunk type byte, codec, | |||
| uncompressed size, dictionary references, X extra | uncompressed size, dictionary references, and X extra | |||
| header bytes. The content is not repeated here. | header bytes. The content is not repeated here. | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| <t> | <t> | |||
| The last listed chunk is reached when the end of the contents of the | The last listed chunk is reached when the end of the contents of the | |||
| central directory are reached. If the end does not match the last | central directory are reached. If the end does not match the last | |||
| byte of the central directory, the decoder must reject the data | byte of the central directory, the decoder must reject the data | |||
| stream as invalid.</t> | stream as invalid.</t> | |||
| <t> | <t> | |||
| If present, the central directory must list all data and metadata | If present, the central directory must list all data and metadata | |||
| chunks of all types.</t> | chunks of all types.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-8.4.11" numbered="true" toc="default"> | <section anchor="sect-8.4.11" numbered="true" toc="default"> | |||
| <name>Final Footer Chunk (Type 10)</name> | <name>Final Footer Chunk (Type 10)</name> | |||
| <t> | <t> | |||
| Chunk that closes the file, only present if in the initial container | ||||
| header flags bit 2 was set.</t> | ||||
| <t>This chunk has the following content, always uncompressed:</t> | The final footer chunk closes the file and is only present if bit 2 of the | |||
| initial container flags was set.</t> | ||||
| <t>This chunk has the following content, which is always uncompresse | ||||
| d:</t> | ||||
| <dl> | <dl> | |||
| <dt> | <dt> | |||
| reversed varint:</dt><dd><t> size of this entire framing format file, | reversed varint:</dt><dd><t>Size of this entire framing format file, | |||
| including these bytes themselves, or 0 if this | including these bytes themselves, or 0 if this | |||
| size is not given</t> | size is not given.</t></dd> | |||
| <dl> | <dt>reversed varint:</dt><dd>Pointer to the start of the central | |||
| <dt>reversed varint:</dt><dd> pointer to the start of the central directory,or 0 if there is none | directory, or 0 if there is none. | |||
| </dd> | </dd> | |||
| </dl> | </dl> | |||
| </dd> | ||||
| </dl> | ||||
| <t> | <t> | |||
| A reversed varint has the same format as a varint, but has its bytes | A reversed varint has the same format as a varint but its bytes | |||
| in reversed order and is designed to be parsed from end of file | are in reversed order, and it is designed to be parsed from the end of the file | |||
| towards the beginning.</t> | towards the beginning.</t> | |||
| </section> | </section> | |||
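A minimal Python sketch of the reversed varint described in the Final Footer Chunk section above. It assumes the forward varint layout (7 value bits per byte, least significant group first, high bit set on every byte except the last); that layout is defined elsewhere in the specification and is not restated in this section, so treat it as an assumption here. The function names are illustrative and are not part of the format.

```python
def encode_varint(value: int) -> bytes:
    """Forward varint (assumed layout): 7 bits per byte, low group first."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)  # last byte of the varint: high bit clear
            return bytes(out)


def encode_reversed_varint(value: int) -> bytes:
    """Same bytes as the forward varint, stored in reversed order."""
    return encode_varint(value)[::-1]


def read_reversed_varint(data: bytes, end: int) -> tuple[int, int]:
    """Parse backwards from data[end - 1].

    Returns (value, new_end), where new_end is the index of the first
    byte of the reversed varint, i.e. where the next backwards read
    would end.
    """
    value, shift, pos = 0, 0, end
    while True:
        if pos == 0:
            raise ValueError("ran past the start of the data")
        pos -= 1
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos
```

Because the byte that starts a forward varint ends up last in the file, a reader that only knows the file length can start at the final byte and walk towards the beginning without first locating the footer chunk header.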
| <section anchor="sect-8.4.12" numbered="true" toc="default"> | <section anchor="sect-8.4.12" numbered="true" toc="default"> | |||
| <name>Chunk ordering</name> | <name>Chunk Ordering</name> | |||
| <t> | <t> | |||
| The chunk ordering must follow the rules described below, if the | The chunk ordering must follow the rules described below. If the | |||
| decoder sees otherwise, it must reject the data stream as invalid.</t> | decoder sees otherwise, it must reject the data stream as invalid.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Padding chunks may be inserted anywhere, even between chunks for | Padding chunks may be inserted anywhere, even between chunks for | |||
| which the rules below say no other chunk types may come in | which the rules below say no other chunk types may come in | |||
| between.</t> | between.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Metadata chunks must come immediately before the Data chunks of | Metadata chunks must come immediately before the data chunks of | |||
| the resource they apply to.</t> | the resource they apply to.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Footer metadata chunks must come immediately after the Data | Footer metadata chunks must come immediately after the data | |||
| chunks of the resource they apply to.</t> | chunks of the resource they apply to.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| There may be only 0 or 1 metadata chunks per resource.</t> | There may be only 0 or 1 metadata chunks per resource.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| There may be only 0 or 1 footer metadata chunks per resource.</t> | There may be only 0 or 1 footer metadata chunks per resource.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| A resource must exist out of either 1 data chunk, or 1 first | A resource must exist out of either 1 data chunk or 1 first | |||
| partial data chunk, 0 or more middle partial data | partial data chunk, 0 or more middle partial data | |||
| chunks, and 1 last partial data chunk, in that order.</t> | chunks, and 1 last partial data chunk, in that order.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| Repeat metadata chunks must follow the rules of section 8.4.9.</t> | Repeat metadata chunks must follow the rules of <xref target="sect-8.4.9"/>.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| There may be only 0 or 1 central directory chunks.</t> | There may be only 0 or 1 central directory chunks.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| If bit 2 of the container flags is set, there may be only a | If bit 2 of the container flags is set, there may be only a | |||
| single resource, no metadata chunks of any type, no central | single resource, no metadata chunks of any type, no central | |||
| directory, and no final footer.</t> | directory, and no final footer.</t> | |||
| <t indent="3"> | <t indent="3"> | |||
| If bit 2 of the container flags is not set, there must be exactly | If bit 2 of the container flags is not set, there must be exactly | |||
| 1 final footer chunk and it must be the last chunk in the file.</t> | 1 final footer chunk, and it must be the last chunk in the file.</t> | |||
| </section> | </section> | |||
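A rough Python sketch of how a decoder might check part of the chunk ordering rules listed above. The chunk kind names and one-letter codes are hypothetical labels chosen for this example, not identifiers from the format. The sketch covers only padding, the per-resource rules, the single central directory limit, and the requirement that a final footer, if present, comes last. It deliberately leaves out repeat metadata chunks and the rules tied to bit 2 of the container flags, and it assumes the central directory may appear between resources.

```python
import re

# Hypothetical one-letter codes per chunk kind; padding maps to nothing
# because padding chunks may appear anywhere.
CODES = {
    "padding": "",
    "metadata": "m",
    "data": "d",
    "first_partial": "f",
    "middle_partial": "p",
    "last_partial": "l",
    "footer_metadata": "t",
    "central_directory": "c",
    "final_footer": "e",
}

# One resource: optional metadata, then either a single data chunk or
# first/middle*/last partial chunks, then optional footer metadata.
RESOURCE = r"m?(?:d|fp*l)t?"

# A stream: resources and central directories, then an optional final footer.
STREAM = re.compile(rf"(?:{RESOURCE}|c)*e?")


def ordering_ok(kinds: list[str]) -> bool:
    """Return True if the chunk kinds satisfy the subset of rules above."""
    encoded = "".join(CODES[kind] for kind in kinds)
    if encoded.count("c") > 1:  # only 0 or 1 central directory chunks
        return False
    return STREAM.fullmatch(encoded) is not None
```

For instance, `ordering_ok(["metadata", "data", "footer_metadata", "final_footer"])` is accepted, while a first partial data chunk followed directly by a complete data chunk is rejected because the partial sequence never reaches its last partial chunk.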
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="sect-9" numbered="true" toc="default"> | <section anchor="sect-9" numbered="true" toc="default"> | |||
| <name>Security Considerations</name> | <name>Security Considerations</name> | |||
| <t> | <t> | |||
| The security considerations for brotli <xref target="RFC7932" format="default"/> apply to shared | The security considerations for brotli <xref target="RFC7932" format="default"/> apply to shared | |||
| brotli as well.</t> | brotli as well.</t> | |||
| <t> | <t> | |||
| In addition, the same considerations apply to the decoding of new | In addition, the same considerations apply to the decoding of new | |||
| file format streams for shared brotli, including shared dictionaries, | file format streams for shared brotli, including shared dictionaries, | |||
| the framing format and the shared brotli format.</t> | the framing format, and the shared brotli format.</t> | |||
| <t> | <t> | |||
| The dictionary must be treated with the same security precautions as | The dictionary must be treated with the same security precautions as | |||
| the content, because a change to the dictionary can result in a | the content because a change to the dictionary can result in a | |||
| change to the decompressed content.</t> | change to the decompressed content.</t> | |||
| <t> | <t> | |||
| The CRIME attack <xref target="CRIME" format="default"/> shows that it's a bad idea to compress data | The CRIME attack <xref target="CRIME" format="default"/> shows that it's a bad idea to compress data | |||
| from mixed (e.g. public and private) sources -- the data sources | from mixed (e.g., public and private) sources -- the data sources | |||
| include not only the compressed data but also the dictionaries. For | include not only the compressed data but also the dictionaries. For | |||
| example, if you compress secret cookies using a public-data-only | example, if you compress secret cookies using a public-data-only | |||
| dictionary, you still leak information about the cookies.</t> | dictionary, you still leak information about the cookies.</t> | |||
| <t> | <t>The dictionary can reveal information about the compressed data and vice versa. | |||
| Not only can the dictionary reveal information about the compressed | That is, data compressed with the dictionary can reveal contents of the | |||
| data, but vice versa, data compressed with the dictionary can reveal | dictionary when an adversary can control parts of the data to compress and see | |||
| the contents of the dictionary when an adversary can control parts of | the compressed size. On the other hand, if | |||
| data to compress and see the compressed size. On the other hand, if | ||||
| the adversary can control the dictionary, the adversary can learn | the adversary can control the dictionary, the adversary can learn | |||
| information about the compressed data.</t> | information about the compressed data.</t> | |||
| <t> | <t> | |||
| The most robust defense against CRIME is not to compress private data | The most robust defense against CRIME is not to compress private data, e.g., | |||
| (e.g., sensitive headers like cookies or any content with PII). The | sensitive headers like cookies or any content with personally identifiable information (PII). The | |||
| challenge has been to identify secrets within a vast amount of to be | challenge has been to identify secrets within a vast amount of data to be compressed. | |||
| compressed data. Cloudflare uses a regular expression <xref target="CLOUDFLARE" format="default"/>. | Cloudflare uses a regular expression <xref target="CLOUDFLARE" format="default"/>. | |||
| Another idea is to extend existing web template systems (e.g., Soy | Another idea is to extend existing web template systems (e.g., Soy | |||
| <xref target="SOY" format="default"/>) to allow developers to mark secrets that must not be | <xref target="SOY" format="default"/>) to allow developers to mark secrets that must not be | |||
| compressed.</t> | compressed.</t> | |||
| <t> | <t> | |||
| A less robust idea, but easier to implement, is to randomize the | A less robust idea, but easier to implement, is to randomize the | |||
| compression algorithm, i.e., adding randomly generated padding, | compression algorithm, i.e., adding randomly generated padding, | |||
| varying the compression ratio, etc. The tricky part is to find the | varying the compression ratio, etc. The tricky part is to find the | |||
| right balance between cost and security, i.e., on one hand we don't | right balance between cost and security (i.e., on one hand, we don't | |||
| want to add too much padding because it adds a cost to data, on the | want to add too much padding because it adds a cost to data, but on the | |||
| other hand we don't want to add too little because the adversary can | other hand, we don't want to add too little because the adversary can | |||
| detect a small amount of padding with traffic analysis.</t> | detect a small amount of padding with traffic analysis).</t> | |||
| <t> | <t> | |||
| Another defense in addition is to not use dictionaries for cross- | Additionally, another defense is to not use dictionaries for cross- | |||
| domain requests, and only use shared brotli for the response when the | domain requests and to only use shared brotli for the response when the | |||
| origin is the same as where the content is hosted (using CORS). This | origin is the same as where the content is hosted (using CORS). This | |||
| prevents an adversary from using a private dictionary with user | prevents an adversary from using a private dictionary with user | |||
| secrets to compress content hosted on the adversary's origin. It | secrets to compress content hosted on the adversary's origin. It | |||
| also helps prevent CRIME attacks that try to benefit from a public | also helps prevent CRIME attacks that try to benefit from a public | |||
| dictionary by preventing data compression with dictionaries for | dictionary by preventing data compression with dictionaries for | |||
| requests that do not originate from the host itself.</t> | requests that do not originate from the host itself.</t> | |||
| <t> | <t> | |||
| The content of the dictionary itself should not be affected by | The content of the dictionary itself should not be affected by | |||
| external users, allowing adversaries to control the dictionary allows | external users; allowing adversaries to control the dictionary allows | |||
| a form of chosen plaintext attack. Instead, only base the dictionary | a form of chosen plaintext attack. Instead, only base the dictionary | |||
| on content you control or generic large scale content such as a | on content you control or generic large scale content such as a | |||
| spoken language, and update the dictionary with large time intervals | spoken language and update the dictionary with large time intervals | |||
| (days, not seconds) to prevent fast probing.</t> | (days, not seconds) to prevent fast probing.</t> | |||
| <t> | <t> | |||
| The use of Highwayhash <xref target="HWYHASH" format="default"/> for dictionary identifiers does not | The use of HighwayHash <xref target="HWYHASH" format="default"/> for dictionary identifiers does not | |||
| guarantee against collisions in an adversarial environment and is | guarantee against collisions in an adversarial environment and is | |||
| intended to be used for identifying the dictionary within a trusted, | intended to be used for identifying the dictionary within a trusted, | |||
| known set of dictionaries. In an adversarial environment, users of | known set of dictionaries. In an adversarial environment, users of | |||
| shared brotli should use another mechanism to validate a negotiated | shared brotli should use another mechanism to validate a negotiated | |||
| dictionary, such as using a cryptographically-proven secure hash.</t> | dictionary such as a cryptographically proven secure hash.</t> | |||
| </section> | </section> | |||
| <section anchor="sect-10" numbered="true" toc="default"> | <section anchor="sect-10" numbered="true" toc="default"> | |||
| <name>IANA Considerations</name> | <name>IANA Considerations</name> | |||
| <t> | <t> | |||
| This document has no IANA actions.</t> | This document has no IANA actions.</t> | |||
| </section> | </section> | |||
| </middle> | </middle> | |||
| <back> | <back> | |||
| <references> | <references> | |||
| <name>References</name> | <name>References</name> | |||
| <references> | <references> | |||
| <name>Normative References</name> | <name>Normative References</name> | |||
| <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7932.xml"/> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7932.xml"/> | |||
| <reference anchor="HWYHASH" target="https://arxiv.org/abs/1612.06257"> | <reference anchor="HWYHASH" target="https://arxiv.org/abs/1612.06257"> | |||
| <front> | <front> | |||
| <title>Fast keyed hash/pseudo-random function using SIMD multiply and permute</title> | <title>Fast keyed hash/pseudo-random function using SIMD multiply and permute</title> | |||
| <author><organization> | <author fullname="Jyrki Alakuijala"/> | |||
| Alakuijala, J., Cox, B., Wassenberg, J.</organization> | <author fullname="Bill Cox"/> | |||
| </author> | <author fullname="Jan Wassenberg"/> | |||
| <date month="February" year="2017"/> | ||||
| <date/> | ||||
| </front> | </front> | |||
| <seriesInfo name="DOI" value="10.48550/arXiv.1612.06257"/> | ||||
| </reference> | </reference> | |||
| </references> | </references> | |||
| <references> | <references> | |||
| <name>Informative References</name> | <name>Informative References</name> | |||
| <reference anchor="LZ77"> | <reference anchor="LZ77"> | |||
| <front> | <front> | |||
| <title>A Universal Algorithm for Sequential Data Compression</title> | <title>A Universal Algorithm for Sequential Data Compression</title> | |||
| <author initials="J." surname="Ziv" fullname="J. Ziv"/> | <author initials="J." surname="Ziv" fullname="J. Ziv"/> | |||
| <author initials="A." surname="Lempel" fullname="A. Lempel"/> | <author initials="A." surname="Lempel" fullname="A. Lempel"/> | |||
| <date month="May" year="1997"/> | <date month="May" year="1977"/> | |||
| </front> | </front> | |||
| <refcontent>IEEE Transactions on Information Theory. 23 (3): 337-343</refcontent> | <seriesInfo name="DOI" value="10.1109/TIT.1977.1055714"/> | |||
| <refcontent>IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343</refcontent> | |||
| </reference> | </reference> | |||
| <reference anchor="CLOUDFLARE" target="https://blog.cloudflare.com/a-solution-to-compression-oracles-on-the-web/"> | <reference anchor="CLOUDFLARE" target="https://blog.cloudflare.com/a-solution-to-compression-oracles-on-the-web/"> | |||
| <front> | <front> | |||
| <title/> | <title>A Solution to Compression Oracles on the Web</title> | |||
| <author> | <author fullname="Blake Loring"/> | |||
| </author> | <date day="27" month="March" year="2018"/> | |||
| <date/> | ||||
| </front> | </front> | |||
| <refcontent>The Cloudflare Blog</refcontent> | ||||
| </reference> | </reference> | |||
| <reference anchor="SOY" target="https://developers.google.com/closure/templates/"> | <reference anchor="SOY" target="https://developers.google.com/closure"> | |||
| <front> | <front> | |||
| <title/> | <title>Closure Tools</title> | |||
| <author> | <author> | |||
| </author> | <organization>Google Developers</organization> | |||
| </author> | ||||
| <date/> | <date/> | |||
| </front> | </front> | |||
| </reference> | </reference> | |||
| <reference anchor="CRIME" target="https://www.cve.org/CVERecord?id=CVE-2012-4929"> | <reference anchor="CRIME" target="https://www.cve.org/CVERecord?id=CVE-2012-4929"> | |||
| <front> | <front> | |||
| <title/> | <title>CVE-2012-4929</title> | |||
| <author> | <author> | |||
| <organization>CVE Program</organization> | ||||
| </author> | </author> | |||
| <date/> | <date/> | |||
| </front> | </front> | |||
| </reference> | </reference> | |||
| </references> | </references> | |||
| </references> | </references> | |||
| <section numbered="false" anchor="acknowledgments" toc="default"> | <section numbered="false" anchor="acknowledgments" toc="default"> | |||
| <name>Acknowledgments</name> | <name>Acknowledgments</name> | |||
| <t> | <t> | |||
| The authors would like to thank Robert Obryk for suggesting | The authors would like to thank <contact fullname="Robert Obryk"/> for suggesting | |||
| improvements to the format and the text of the specification.</t> | improvements to the format and the text of the specification.</t> | |||
| </section> | </section> | |||
| </back> | ||||
| </rfc> | </back> </rfc> | |||
| End of changes. 214 change blocks. | ||||
| 455 lines changed or deleted | 471 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||