Clarify script data handling and entity minification

Wilson Lin 2020-01-14 23:59:25 +11:00
parent fd9e90983f
commit d474e4a097
2 changed files with 27 additions and 8 deletions

@ -309,10 +309,15 @@ If an attribute value is empty after any processing, it is completely removed (i
Spaces are removed between attributes if possible.
### Other
### Entities
- Comments are removed.
- Entities are decoded if valid (see relevant parsing section). If an entity is unintentionally formed after decoding, the leading ampersand is encoded, e.g. `&amp;amp;` becomes `&ampamp;`. This is done as `&amp` is equal to or shorter than all other entity versions of characters that could be encoded as part of an entity (`[&#a-zA-Z0-9;]`).
Entities are decoded if valid (see relevant parsing section). If an entity is unintentionally formed after decoding, the leading ampersand is encoded, e.g. `&amp;amp;` becomes `&ampamp;`.
This is done as `&amp` is equal to or shorter than all other entity representations of characters that could be encoded as part of an entity (`[&#a-zA-Z0-9;]`), and there is no other conflicting entity name that starts with `amp`.
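The decode-then-protect rule can be illustrated with a minimal sketch. This is not hyperbuild's implementation: the `NAMED` table is a tiny stand-in for the full entity list, the function names are hypothetical, and only semicolon-terminated named entities are handled.

```python
import re

# Assumption: a tiny stand-in for the full HTML named-entity table.
NAMED = {"amp": "&", "lt": "<", "gt": ">"}

# Matches a semicolon-terminated named entity such as `&lt;`.
ENTITY = re.compile(r"&([a-zA-Z][a-zA-Z0-9]*);")


def decode_entities(text):
    """Decode known entities in one left-to-right pass; leave unknown ones as-is."""
    return ENTITY.sub(lambda m: NAMED.get(m.group(1), m.group(0)), text)


def protect_ampersands(text):
    """Encode the leading `&` of any known entity that decoding has formed."""
    return ENTITY.sub(
        lambda m: "&amp" + m.group(0)[1:] if m.group(1) in NAMED else m.group(0),
        text,
    )


def minify_entities(text):
    """Hypothetical helper: decode first, then re-encode unintended entities."""
    return protect_ampersands(decode_entities(text))


print(minify_entities("&amp;amp;"))  # &ampamp;
```

Decoding `&amp;amp;` yields the text `&amp;`, which would itself be read as an entity, so the leading ampersand is written back as `&amp` (without a semicolon) to give `&ampamp;`.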
### Comments
Comments are removed.
### Ignored

@ -1,6 +1,24 @@
# Script data
For legacy reasons, special handling is required for content inside a script tag; see https://www.w3.org/TR/html52/syntax.html#script-data-state for more details.
## Summary
For legacy reasons, HTML comments can appear within a script tag. If such a comment contains a `<script`, the first following `</script>` within the comment does **not** close the main script tag.
hyperbuild does **not** implement this special handling, as it adds complexity and dramatically slows down processing for a legacy feature that is discouraged and almost never used.
See https://www.w3.org/TR/html52/syntax.html#script-data-state for more details.
Commit [20c59769](https://github.com/wilsonzlin/hyperbuild/commit/20c59769fea6bfb8a9d5ecea47d979dc9b1dcda5) removed support.
## States and transitions
|State|`<script`|`</script`|`<!--`|`-->`|
|---|---|---|---|---|
|Normal|-|End|Escaped|-|
|Escaped|DoubleEscaped|End|-|Normal|
|DoubleEscaped|-|Escaped|-|Normal|
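The table above can be read as a small state machine. The following is a simplified sketch of that table only, not of hyperbuild (which does not implement this handling) and not of the full tokenizer in the specification: it matches the four tokens case-insensitively and ignores the rule that `<script` must be followed by whitespace, `/` or `>`. The names `TRANSITIONS`, `TOKEN` and `find_script_end` are hypothetical.

```python
import re

# One entry per table row; tokens marked `-` in the table are omitted
# and leave the state unchanged.
TRANSITIONS = {
    "Normal": {"</script": "End", "<!--": "Escaped"},
    "Escaped": {"<script": "DoubleEscaped", "</script": "End", "-->": "Normal"},
    "DoubleEscaped": {"</script": "Escaped", "-->": "Normal"},
}

# The four tokens from the table header.
TOKEN = re.compile(r"</script|<script|<!--|-->", re.IGNORECASE)


def find_script_end(content):
    """Return the index of the `</script` that ends the script, or None."""
    state = "Normal"
    for match in TOKEN.finditer(content):
        state = TRANSITIONS[state].get(match.group(0).lower(), state)
        if state == "End":
            return match.start()
    return None


nested = "<!-- <script> </script> --> x </script>"
print(find_script_end(nested) == nested.rindex("</script"))  # True
```

In the nested case the first `</script` only moves the machine from DoubleEscaped back to Escaped, so the script ends at the last `</script`. Without a nested `<script`, a `</script` inside the comment ends the script immediately.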
## Examples
```html
<script type="text/html">
@ -17,12 +35,8 @@ These are true about the above snippet:
- `!window.exec1 && window.exec2`.
- `document.querySelector('script[type="text/html"]')` has exactly one child node and it's a text node.
## Comments
If one or more `<script>` tags appear inside an HTML comment before any `</script>`, the first `</script>` will not end the main script.
### Examples
An ending tag inside a comment works because there are no nested script tags.
```html