Update README to describe whitespace minification; do not destroy whole whitespace in content-first elements

Wilson Lin 2019-12-28 17:15:23 +11:00
parent 5f8da411b3
commit 492eb64e4f
4 changed files with 128 additions and 32 deletions

README.md

@ -18,13 +18,24 @@ hyperbuild --src /path/to/src.html --out /path/to/output.min.html
### Whitespace
hyperbuild has advanced whitespace minification that can allow strategies such as:
hyperbuild has advanced context-aware whitespace minification that does things such as:
- Leave whitespace untouched in `pre` and `code`, which are whitespace sensitive.
- Trim and collapse whitespace in content tags, as whitespace is collapsed anyway when rendered.
- Remove whitespace in layout tags, which allows the use of inline layouts while keeping formatted code.
#### Collapsing whitespace
#### Methods
There are three whitespace minification methods. When processing text content, hyperbuild chooses which ones to use depending on the containing element.
<details>
<summary>
##### Collapse whitespace
</summary>
> **Applies to:** text in root and any element except [whitespace sensitive](./src/spec/tag/wss.rs) elements.
Reduce a sequence of whitespace characters in text nodes to a single space (U+0020).
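As a rough illustration of the rule above, the following sketch collapses whitespace in a byte slice. The function name `collapse_whitespace` is hypothetical and is not part of hyperbuild; it also assumes that "whitespace" means the ASCII whitespace set recognised by Rust's `u8::is_ascii_whitespace`, which may differ from the set hyperbuild actually uses.

```rust
/// Collapse every run of ASCII whitespace in `input` into one space (U+0020).
/// Hypothetical helper for illustration only; not hyperbuild's implementation.
fn collapse_whitespace(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    // True while the previous byte was whitespace, so a run emits only one space.
    let mut in_whitespace = false;
    for &b in input {
        if b.is_ascii_whitespace() {
            if !in_whitespace {
                out.push(b' ');
            }
            in_whitespace = true;
        } else {
            out.push(b);
            in_whitespace = false;
        }
    }
    out
}
```

Note that this sketch only collapses: leading and trailing runs are reduced to one space rather than removed, since removal is the job of trimming.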
@ -45,13 +56,19 @@ Reduce a sequence of whitespace characters in text nodes to a single space (U+00
```
</table>
</details>
#### Destroying whole whitespace
<details>
<summary>
##### Destroy whole whitespace
</summary>
> **Applies to:** text in root and any element except [whitespace sensitive](./src/spec/tag/wss.rs), [content](./src/spec/tag/content.rs), [content-first](./src/spec/tag/contentfirst.rs), and [formatting](./src/spec/tag/formatting.rs) elements.
Remove any text nodes that only consist of whitespace characters.
Especially useful when using `display: inline-block` so that whitespace between elements (e.g. indentation) does not alter layout and styling.
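The idea can be sketched over a simplified child list. The `Node` enum and `destroy_whole_whitespace` function below are hypothetical names used only for illustration; hyperbuild itself processes a byte stream rather than a tree, so this is a model of the behaviour and not its implementation.

```rust
/// Simplified model of an element's children (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Element(String),
}

/// Drop every text node that consists only of ASCII whitespace.
/// Text nodes with any other character are kept unchanged.
fn destroy_whole_whitespace(children: &mut Vec<Node>) {
    children.retain(|node| match node {
        Node::Text(text) => !text.bytes().all(|b| b.is_ascii_whitespace()),
        Node::Element(_) => true,
    });
}
```

With this model, the indentation between sibling `li` elements is a whitespace-only text node and is removed, while a text node such as `" A "` survives intact.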
<table><thead><tr><th>Before<th>After<tbody><tr><td>
```html
@ -65,20 +82,24 @@ Especially useful when using `display: inline-block` so that whitespace between
<td>
```html
<ul><li>A</li><li>B</li><li>C</li></ul>
<ul>
··<li>A</li><li>B</li><li>C</li>
</ul>
```
</table>
</details>
#### Trimming whitespace
<details>
<summary>
Remove any whitespace from the start and end of a tag, if the first and/or last node is a text node.
##### Trim whitespace
Useful when combined with whitespace collapsing.
</summary>
Other whitespace between text nodes and tags is not removed, as it is not recommended to mix non-formatting tags with raw text.
> **Applies to:** text in root and any element except [whitespace sensitive](./src/spec/tag/wss.rs) and [formatting](./src/spec/tag/formatting.rs) elements.
Basically, a tag should only either contain text and [formatting tags](#formatting-tags), or only non-formatting tags.
Remove any leading/trailing whitespace from any leading/trailing text nodes of a tag.
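A minimal sketch of this rule, again over a simplified child list: `Node` and `trim_whitespace` are hypothetical names for illustration, and whitespace is assumed to be the ASCII set checked by `char::is_ascii_whitespace`.

```rust
/// Simplified model of an element's children (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Element(String),
}

/// Trim leading whitespace from the first child and trailing whitespace from
/// the last child, but only when that child is a text node.
fn trim_whitespace(children: &mut Vec<Node>) {
    if let Some(Node::Text(text)) = children.first_mut() {
        *text = text
            .trim_start_matches(|c: char| c.is_ascii_whitespace())
            .to_string();
    }
    if let Some(Node::Text(text)) = children.last_mut() {
        *text = text
            .trim_end_matches(|c: char| c.is_ascii_whitespace())
            .to_string();
    }
    // A text node that was entirely whitespace is now empty; drop it.
    children.retain(|node| !matches!(node, Node::Text(text) if text.is_empty()));
}
```

Whitespace inside the content, such as the space before an `em` element, is left alone; only the outer edges of the element's content are affected.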
<table><thead><tr><th>Before<th>After<tbody><tr><td>
@ -86,7 +107,7 @@ Basically, a tag should only either contain text and [formatting tags](#formatti
<p>
··Hey,·I·<em>just</em>·found↵
··out·about·this·<strong>cool</strong>·website!↵
··<div></div>
··<sup>[1]</sup>
</p>
```
@ -95,10 +116,95 @@ Basically, a tag should only either contain text and [formatting tags](#formatti
```html
<p>Hey,·I·<em>just</em>·found↵
··out·about·this·<strong>cool</strong>·website!↵
··<div></div></p>
··<sup>[1]</sup></p>
```
</table>
</details>
#### Element types
hyperbuild groups elements based on how it assumes they are used. By making these assumptions, it can apply optimal whitespace minification strategies.
|Group|Elements|Expected children|
|---|---|---|
|[Formatting](#formatting-elements)|`a`, `strong`, [and others](./src/spec/tag/formatting.rs)|Formatting elements, text.|
|[Content](#content-elements)|`h1`, `p`, [and others](./src/spec/tag/content.rs)|Formatting elements, text.|
|[Layout](#layout-elements)|`div`, `ul`, [and others](./src/spec/tag/layout.rs)|Layout elements, content elements.|
|[Content-first](#content-first-elements)|`label`, `li`, [and others](./src/spec/tag/contentfirst.rs)|Like a content element, but may instead contain exactly one of a layout element's expected children.|
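
The mapping from element group to whitespace methods can be summarised in a small sketch. The `Group` enum, `WhitespaceFlags` struct, and `flags_for` function are hypothetical names introduced here for illustration; the rules mirror the "Applies to" notes for each method, with `None` standing for root content.

```rust
/// Element groups, plus whitespace sensitive elements such as `pre`.
/// Hypothetical type for illustration; hyperbuild uses per-group tag sets.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum Group {
    WhitespaceSensitive,
    Formatting,
    Content,
    ContentFirst,
    Layout,
}

/// Which of the three whitespace minification methods apply.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct WhitespaceFlags {
    collapse: bool,
    destroy_whole: bool,
    trim: bool,
}

/// Choose the methods for text directly inside an element of group `parent`.
/// `None` means root content, where all three methods apply.
fn flags_for(parent: Option<Group>) -> WhitespaceFlags {
    use Group::*;
    match parent {
        None => WhitespaceFlags { collapse: true, destroy_whole: true, trim: true },
        Some(group) => WhitespaceFlags {
            // Collapse everywhere except whitespace sensitive elements.
            collapse: group != WhitespaceSensitive,
            // Destroy whole whitespace only where text content is not expected.
            destroy_whole: !matches!(
                group,
                WhitespaceSensitive | Content | ContentFirst | Formatting
            ),
            // Trim everywhere except whitespace sensitive and formatting elements.
            trim: !matches!(group, WhitespaceSensitive | Formatting),
        },
    }
}
```

For example, `flags_for(Some(Group::ContentFirst))` enables collapsing and trimming but not whole-whitespace removal, which matches the behaviour described for content-first elements.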
##### Formatting elements
> Whitespace is collapsed.
Formatting elements are usually inline elements that wrap part of the text in a content element, so their whitespace isn't trimmed, as it's probably part of the content.
##### Content elements
> Whitespace is trimmed and collapsed.
Content elements usually represent a contiguous and complete unit of content such as a paragraph. As such, whitespace is significant, but sequences of whitespace are most likely due to source formatting.
###### Before
```html
<p>
··Hey,·I·<em>just</em>·found↵
··out·about·this·<strong>cool</strong>·website!↵
··<sup>[1]</sup>
</p>
```
###### After
```html
<p>Hey,·I·<em>just</em>·found·out·about·this·<strong>cool</strong>·website!·<sup>[1]</sup></p>
```
##### Layout elements
> Whitespace is trimmed and collapsed. [Whole whitespace](#destroy-whole-whitespace) is removed.
These elements should only contain other elements and no text. This makes it possible to [remove whole whitespace](#destroy-whole-whitespace), which is useful when using `display: inline-block` so that whitespace between elements (e.g. indentation) does not alter layout and styling.
###### Before
```html
<ul>
··<li>A</li>
··<li>B</li>
··<li>C</li>
</ul>
```
###### After
```html
<ul><li>A</li><li>B</li><li>C</li></ul>
```
##### Content-first elements
> Whitespace is trimmed and collapsed.
These elements are usually used like [content elements](#content-elements), but are occasionally used like a layout element with a single child. Whole whitespace is not removed because it might be part of the content. This still works for layout use: with only one child, trimming already removes the whitespace around it.
###### Before
```html
<li>
··<article>
····<section></section>
····<section></section>
··</article>
</li>
```
###### After
```html
<li><article><section></section><section></section></article></li>
```
### Attributes


@ -18,7 +18,7 @@ pub static CONTENT_TAGS: Set<&'static [u8]> = phf_set! {
b"object",
b"option",
b"p",
b"summary", // Can also contain a heading.
b"summary",
b"textarea",
b"video",
};


@ -1,12 +1,6 @@
pub mod content;
pub mod contentfirst;
pub mod formatting;
pub mod heading;
pub mod html;
pub mod layout;
pub mod media;
pub mod sectioning;
pub mod specific;
pub mod svg;
pub mod void;
pub mod wss;


@ -8,6 +8,7 @@ use crate::unit::bang::process_bang;
use crate::unit::comment::process_comment;
use crate::unit::entity::{EntityType, maybe_process_entity};
use crate::unit::tag::process_tag;
use crate::spec::tag::contentfirst::CONTENT_FIRST_TAGS;
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum ContentType {
@ -63,27 +64,22 @@ impl ContentType {
}
pub fn process_content(proc: &mut Processor, parent: Option<ProcessorRange>) -> ProcessingResult<()> {
let should_collapse_whitespace = match parent {
let collapse_whitespace = match parent {
Some(tag_name) => !WSS_TAGS.contains(&proc[tag_name]),
// Should collapse whitespace for root content.
None => true,
};
let should_destroy_whole_whitespace = match parent {
Some(tag_name) => !WSS_TAGS.contains(&proc[tag_name]) && !CONTENT_TAGS.contains(&proc[tag_name]) && !FORMATTING_TAGS.contains(&proc[tag_name]),
let destroy_whole_whitespace = match parent {
Some(tag_name) => !WSS_TAGS.contains(&proc[tag_name]) && !CONTENT_TAGS.contains(&proc[tag_name]) && !CONTENT_FIRST_TAGS.contains(&proc[tag_name]) && !FORMATTING_TAGS.contains(&proc[tag_name]),
// Should destroy whole whitespace for root content.
None => true,
};
let should_trim_whitespace = match parent {
let trim_whitespace = match parent {
Some(tag_name) => !WSS_TAGS.contains(&proc[tag_name]) && !FORMATTING_TAGS.contains(&proc[tag_name]),
// Should trim whitespace for root content.
None => true,
};
// Trim leading whitespace if configured to do so.
if should_trim_whitespace {
chain!(proc.match_while_pred(is_whitespace).discard());
};
let mut last_non_whitespace_content_type = ContentType::Start;
// Whether or not currently in whitespace.
let mut whitespace_checkpoint_opt: Option<Checkpoint> = None;
@ -128,13 +124,13 @@ pub fn process_content(proc: &mut Processor, parent: Option<ProcessorRange>) ->
// Next character is not whitespace, so handle any previously ignored whitespace.
if let Some(ws) = whitespace_checkpoint_opt {
if should_destroy_whole_whitespace && last_non_whitespace_content_type.is_comment_bang_opening_tag() && next_content_type.is_comment_bang_opening_tag() {
if destroy_whole_whitespace && last_non_whitespace_content_type.is_comment_bang_opening_tag() && next_content_type.is_comment_bang_opening_tag() {
// Whitespace is between two tags, comments, or bangs.
// destroy_whole_whitespace is on, so don't write it.
} else if should_trim_whitespace && (next_content_type == ContentType::End || last_non_whitespace_content_type == ContentType::Start) {
} else if trim_whitespace && (next_content_type == ContentType::End || last_non_whitespace_content_type == ContentType::Start) {
// Whitespace is leading or trailing.
// trim_whitespace is on, so don't write it.
} else if should_collapse_whitespace {
} else if collapse_whitespace {
// Current contiguous whitespace needs to be reduced to a single space character.
proc.write(b' ');
} else {