2018-06-30 06:37:24 -04:00
# hyperbuild
A fast HTML minifier written in C, heavily influenced by [kangax's html-minifier](https://github.com/kangax/html-minifier).
Available in different flavours:
- Standalone 64-bit Linux executable (this)
- [Node.js](https://github.com/wilsonzlin/hyperbuild-nodejs)
- [Express](https://github.com/wilsonzlin/hyperbuild-express)
- [Webpack](https://github.com/wilsonzlin/hyperbuild-webpack)
- [Apache](https://github.com/wilsonzlin/hyperbuild-apache)
- [Nginx](https://github.com/wilsonzlin/hyperbuild-nginx)
## Features
### Streaming minification
hyperbuild minifies as it parses, directly streaming processed HTML to the output without having to build a DOM/AST or iterate/traverse around in multiple passes, allowing for super-fast compilation times and near-constant memory usage.
### Super fast
hyperbuild is written in C, and uses technologies like Emscripten and Cython to preserve performance in higher-level languages.
### Smart whitespace handling
hyperbuild has advanced whitespace minification with smart defaults: it leaves whitespace untouched in `pre` and `code`, trims and collapses it in content tags, and removes it in layout tags, allowing the use of `inline-block` without ugly syntax or CSS hacks.
## Parsing
Current limitations:
- UTF-8 in, UTF-8 out, no BOM.
- Not aware of exotic Unicode whitespace characters.
- Tested and designed for Linux only.
- Follows HTML5 only.
### Errors
Errors marked with a `⌫` can be suppressed using the [`--suppress`](#--suppress) option.
Use the error name without the `HBE_PARSE_` prefix.
#### `HBE_PARSE_MALFORMED_ENTITY` ⌫
It's an error if the sequence of characters following an ampersand (`&`) does not form a well-formed entity.

Entities must be of one of the following forms:
- `&name;`, where *name* is a reference to a valid HTML entity
- `&#nnnn;`, where *nnnn* is a Unicode code point in base 10
- `&#xhhhh;`, where *hhhh* is a Unicode code point in base 16

A malformed entity is an ampersand not followed by a sequence of characters that matches one of the above forms. This includes when the semicolon is missing.

Note that this is different from `HBE_PARSE_INVALID_ENTITY`, which is when a well-formed entity references a non-existent entity name or Unicode code point.

While an ampersand by itself (i.e. followed by whitespace or as the last character) is a malformed entity, it is covered by `HBE_PARSE_BARE_AMPERSAND`.
#### `HBE_PARSE_BARE_AMPERSAND` ⌫
It's an error to have an ampersand followed by whitespace or as the last character.
This is intentionally a different error from `HBE_PARSE_MALFORMED_ENTITY` due to the ubiquity of bare ampersands.

An ampersand by itself is not *necessarily* an invalid entity. However, HTML parsers and browsers may have different interpretations of bare ampersands, so it's a good idea to always use the encoded form (`&amp;`).

When this error is suppressed, bare ampersands are output untouched.
#### `HBE_PARSE_INVALID_ENTITY` ⌫
It's an error if an invalid HTML entity is detected.
If suppressed, invalid entities are output untouched.

See [entityrefs.c](src/main/c/rule/entity/entityrefs.c) for the list of entity references considered valid by hyperbuild.

Entities that reference a Unicode code point are valid only if the code point is between 0x0 and 0x10FFFF (inclusive).
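
The distinction between the three entity errors can be sketched as follows. This is a minimal Python illustration, not hyperbuild's C implementation: `classify_ampersand` and `ENTITY_FORMS` are hypothetical names, the whitespace set is assumed to be ASCII whitespace, and Python's `html.entities.html5` table stands in for entityrefs.c, so the set of accepted names may differ.

```python
import re
from html.entities import html5  # Python's entity table; an assumed stand-in for entityrefs.c

# The three well-formed entity shapes: &#xhhhh; &#nnnn; &name;
ENTITY_FORMS = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));")

ASCII_WHITESPACE = " \t\n\r\f"  # assumption: exotic Unicode whitespace is not considered


def classify_ampersand(text, pos):
    """Classify the ampersand at text[pos] (hypothetical helper)."""
    assert text[pos] == "&"
    following = text[pos + 1 : pos + 2]
    # Ampersand followed by whitespace, or as the last character.
    if following == "" or following in ASCII_WHITESPACE:
        return "BARE_AMPERSAND"
    match = ENTITY_FORMS.match(text, pos)
    if match is None:
        # Not one of the three forms, e.g. the semicolon is missing.
        return "MALFORMED_ENTITY"
    hex_digits, dec_digits, name = match.groups()
    if name is not None:
        return "OK" if name + ";" in html5 else "INVALID_ENTITY"
    code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    return "OK" if code_point <= 0x10FFFF else "INVALID_ENTITY"
```

For example, `classify_ampersand("&amp b", 0)` reports a malformed entity because the semicolon is missing, while `classify_ampersand("&bogus;", 0)` reports an invalid one because the form is fine but the name is unknown.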
#### `HBE_PARSE_NONSTANDARD_TAG` ⌫
It's an error if an unknown (non-standard) tag is reached.
See [tags.c](src/main/c/rule/tag/tags.c) for the list of tags considered valid by hyperbuild.
#### `HBE_PARSE_UCASE_TAG` ⌫
It's an error if an opening or closing tag's name has any uppercase characters.
#### `HBE_PARSE_UCASE_ATTR` ⌫
It's an error if an attribute's name has any uppercase characters.
#### `HBE_PARSE_UNQUOTED_ATTR` ⌫
It's an error if an attribute's value is not quoted with `"` (U+0022) or `'` (U+0027).
This means that `` ` `` (backtick) is never a valid quote mark, whether or not this error is suppressed, even though Internet Explorer accepts backticks as attribute value quotes.
#### `HBE_PARSE_ILLEGAL_CHILD`
It's an error if a tag appears where it is not allowed as a child of its parent tag.

This is a very simple check, and does not cover the comprehensive HTML rules, which involve backtracking, tree traversal, and lots of conditionals.

This rule is enforced in four parts:
[whitelistparents.c](src/main/c/rule/relation/whitelistparents.c),
[blacklistparents.c](src/main/c/rule/relation/blacklistparents.c),
[whitelistchildren.c](src/main/c/rule/relation/whitelistchildren.c), and
[blacklistchildren.c](src/main/c/rule/relation/blacklistchildren.c).
#### `HBE_PARSE_UNCLOSED_TAG`
It's an error if a non-void tag is not closed.
See [voidtags.c](src/main/c/rule/tag/voidtags.c) for the list of tags considered void by hyperbuild.

This includes tags that close automatically because of siblings (e.g. `<li><li>`), as requiring explicit closing tags greatly simplifies the minifier by guaranteeing the structure of the input.
#### `HBE_PARSE_SELF_CLOSING_TAG` ⌫
It's an error if a tag is self-closed. Valid in XML, not in HTML.
#### `HBE_PARSE_NO_SPACE_BEFORE_ATTR`
It's an error if there is no whitespace before an attribute.
Most likely, the cause of this error is either invalid syntax or something like:
```html
<div class="a"name="1"></div>
```
(Note the lack of space between the end of the `class` attribute and the beginning of the `name` attribute.)
#### `HBE_PARSE_UNEXPECTED_END` and `HBE_PARSE_EXPECTED_NOT_FOUND`
General syntax errors.
#### Additional errors
There are additional implicit errors that are considered as general syntax errors due to the way the parser works:
- Closing void tags; see [voidtags.c](src/main/c/rule/tag/voidtags.c) for the list of tags considered void by hyperbuild.
- Placing whitespace between `=` and attribute names/values.
- Placing whitespace before the tag name in an opening tag.
- Placing whitespace around the tag name in a closing tag.
- Not closing a tag before the end of the file/input.
#### Notes
- Closing `</script>` tags end single-line and multi-line JavaScript comments in `script` tags.
For this to be detected by hyperbuild, the closing tag must not contain any whitespace (e.g. `</script >` is not detected).
### Options
#### `--in`
Path to a file to process. If omitted, hyperbuild will read from `stdin`.
#### `--out`
Path to a file to write to; it will be created if it doesn't exist already. If omitted, the output will be streamed to `stdout`.
#### `--keep`
Don't automatically delete the output file if an error occurred. If the output is `stdout`, or the output is a file but `--buffer` is provided, this option does nothing.
#### `--buffer`
Buffer all output until the process is complete and successful. This won't truncate or write anything to the output until the build process is done, but will use a non-constant amount of memory.
This applies even when the output is `stdout`.
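
The interaction between `--out`, `--keep`, and `--buffer` for file output can be modelled roughly as below. This is a hedged Python sketch of the behaviour described above, not hyperbuild's code: `write_output` is a hypothetical function, `chunks` stands in for the stream of minified output, and an exception raised by `chunks` stands in for a parse error.

```python
import os


def write_output(chunks, out_path, keep=False, buffer=False):
    """Model of file output (hypothetical). `chunks` may raise partway through."""
    if buffer:
        # --buffer: collect everything first, so an error leaves the
        # existing output file untouched (memory use is not constant).
        data = b"".join(chunks)
        with open(out_path, "wb") as f:
            f.write(data)
        return
    try:
        # Streaming mode: the file is truncated and written as chunks arrive.
        with open(out_path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
    except Exception:
        # Without --keep, a partially written output file is deleted.
        if not keep:
            os.remove(out_path)
        raise
```

In this model `keep` has no effect when `buffer` is set, because nothing is written before the process succeeds.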
#### `--suppress`
Suppress errors specified by this option. hyperbuild will quietly ignore the specified errors and continue processing where they would otherwise occur.

Suppressible errors are marked with a `⌫` in the [Errors](#errors) section. Omit the `HBE_PARSE_` prefix. Separate the error names with commas.
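
As an illustration of the value format, the sketch below turns a `--suppress` value into full error names. It is a minimal Python model: `parse_suppress` is a hypothetical helper, the set of suppressible names is copied from the errors marked `⌫` above, and raising on a non-suppressible name is an assumption, since how hyperbuild treats such names is not stated here.

```python
PREFIX = "HBE_PARSE_"

# Errors marked as suppressible in the Errors section.
SUPPRESSIBLE = {
    "MALFORMED_ENTITY", "BARE_AMPERSAND", "INVALID_ENTITY",
    "NONSTANDARD_TAG", "UCASE_TAG", "UCASE_ATTR",
    "UNQUOTED_ATTR", "SELF_CLOSING_TAG",
}


def parse_suppress(value):
    """Parse a comma-separated --suppress value into full error names."""
    names = [name for name in value.split(",") if name]
    unknown = [name for name in names if name not in SUPPRESSIBLE]
    if unknown:
        # Assumption: names that are not suppressible are rejected.
        raise ValueError("not suppressible: " + ", ".join(unknown))
    return {PREFIX + name for name in names}
```

So a value such as `MALFORMED_ENTITY,BARE_AMPERSAND` corresponds to the errors `HBE_PARSE_MALFORMED_ENTITY` and `HBE_PARSE_BARE_AMPERSAND`.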
## Minification
### Theory
#### Whitespace
##### Beginning and end
```html
<p>↵
··The·quick·brown·fox↵
</p>
```
##### Between text and tags
```html
<p>The·quick·brown·fox·<strong>jumps</strong>·over·the·lazy·dog.</p>
```
##### Contiguous
```html
<select>↵
··<option>Jan:·········1</option>↵
··<option>Feb:········10</option>↵
··<option>Mar:·······100</option>↵
··<option>Apr:······1000</option>↵
··<option>May:·····10000</option>↵
··<option>Jun:····100000</option>↵
</select>
```
##### Whole text
```html
<p>↵
···↵
</p>
```
#### Content
##### Specific tags
Tags not in one of the categories below are **specific tags**.
##### Formatting tags
```html
<strong> moat </strong>
```
##### Content tags
```html
<p>Some <strong>content</strong></p>
```
##### Content-first tags
```html
<li>Anthony</li>
```
```html
<li>
  <div>
  </div>
</li>
```
##### Layout tags
```html
<div>
  <div></div>
</div>
```
##### Overview
|Type|Content|
|---|---|
|Formatting tags|Text nodes|
|Content tags|Formatting tags, text nodes|
|Layout tags|Layout tags, content tags|
|Content-first tags|Content of content tags or layout tags (but not both)|
### Options
Note that only existing whitespace will be up for removal via minification. Entities that represent whitespace will not be decoded and then removed.
For options that have a list of tags as their value, the tags should be separated by a comma.

An `*` (asterisk, U+002A) can be used to represent the complete set of possible tags. Providing no value represents the empty set.
These two values effectively turn the option fully on or fully off.

For brevity, hyperbuild has built-in sets of tags that can be used in place of declaring all their members; they begin with a `$` sign:

|Name|Tags|Source|
|---|---|---|
|`$content`|`address`, `audio`, `button`, `canvas`, `caption`, `figcaption`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `legend`, `meter`, `object`, `option`, `p`, `summary`, `textarea`, `video`|[contenttags.c](src/main/c/rule/tag/contenttags.c)|
|`$contentfirst`|`dd`, `details`, `dt`, `iframe`, `label`, `li`, `noscript`, `output`, `progress`, `slot`, `td`, `template`, `th`|[contentfirsttags.c](src/main/c/rule/tag/contentfirsttags.c)|
|`$formatting`|`a`, `abbr`, `b`, `bdi`, `bdo`, `cite`, `data`, `del`, `dfn`, `em`, `i`, `ins`, `kbd`, `mark`, `q`, `rp`, `rt`, `rtc`, `ruby`, `s`, `samp`, `small`, `span`, `strong`, `sub`, `sup`, `time`, `u`, `var`, `wbr`|[formattingtags.c](src/main/c/rule/tag/formattingtags.c)|
|`$layout`|`blockquote`, `body`, `colgroup`, `datalist`, `dialog`, `div`, `dl`, `fieldset`, `figure`, `footer`, `form`, `head`, `header`, `hgroup`, `html`, `main`, `map`, `menu`, `nav`, `ol`, `optgroup`, `picture`, `section`, `select`, `table`, `tbody`, `tfoot`, `thead`, `tr`, `ul`|[layouttags.c](src/main/c/rule/tag/layouttags.c)|
|`$specific`|All [SVG tags](src/main/c/rule/tag/svgtags.c), `area`, `base`, `br`, `code`, `col`, `embed`, `hr`, `img`, `input`, `param`, `pre`, `script`, `source`, `track`|[specifictags.c](src/main/c/rule/tag/specifictags.c)|
|`$heading`|`hgroup`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`|[headingtags.c](src/main/c/rule/tag/headingtags.c)|
|`$media`|`audio`, `video`|[mediatags.c](src/main/c/rule/tag/mediatags.c)|
|`$sectioning`|`article`, `aside`, `nav`, `section`|[sectioningtags.c](src/main/c/rule/tag/sectioningtags.c)|
|`$void`|`area`, `base`, `br`, `col`, `embed`, `hr`, `img`, `input`, `keygen`, `link`, `meta`, `param`, `source`, `track`, `wbr`|[voidtags.c](src/main/c/rule/tag/voidtags.c)|
|`$wss`|`pre`, `code`|[wsstags.c](src/main/c/rule/tag/wsstags.c)|

As an example, for `--MXcollapseWhitespace`, here are some possible values:

|Arguments|Description|
|---|---|
|`--MXcollapseWhitespace $wss`|Collapse whitespace in all tags except `$wss` ones|
|`--MXcollapseWhitespace $content,$wss`|Collapse whitespace in all tags except `$content` and `$wss` ones|
|`--MXcollapseWhitespace $content,$wss,dd`|Collapse whitespace in all tags except `$content` and `$wss` ones, as well as the `dd` tag|
|`--MXcollapseWhitespace sup,dd`|Collapse whitespace in all tags except `sup` and `dd`|
|`--MXcollapseWhitespace`|Collapse whitespace in all tags|
|`--MXcollapseWhitespace *`|Don't collapse whitespace in any tag|
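
How such a value resolves to a set of excluded tags can be sketched in a few lines. This is an illustrative Python model only: `expand_tag_list` and `TAG_SETS` are hypothetical names, only a few of the built-in sets are reproduced, and the caller supplies `all_tags` as a stand-in for hyperbuild's complete tag list.

```python
# A small subset of the built-in sets from the table above.
TAG_SETS = {
    "$wss": {"pre", "code"},
    "$media": {"audio", "video"},
    "$heading": {"hgroup", "h1", "h2", "h3", "h4", "h5", "h6"},
    "$sectioning": {"article", "aside", "nav", "section"},
}


def expand_tag_list(value, all_tags):
    """Resolve an option value such as "$wss,dd" to a set of tag names."""
    if value == "*":
        # The asterisk stands for every possible tag.
        return set(all_tags)
    tags = set()
    for item in value.split(","):
        if not item:
            continue  # no value at all yields the empty set
        if item.startswith("$"):
            tags |= TAG_SETS[item]  # KeyError for an unknown set (assumption)
        else:
            tags.add(item)
    return tags
```

With this model, `$wss,dd` resolves to `pre`, `code`, and `dd`, so whitespace would be collapsed everywhere except in those three tags.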
#### `--MXcollapseWhitespace $wss`
Reduce a sequence of whitespace characters in text nodes to a single space (U+0020), unless they are a child of the tags specified by this option.
<table><thead><tr><th>Before<th>After<tbody><tr><td>

```html
<p>↵
··The·quick·brown·fox↵
··jumps·over·the·lazy↵
··dog.↵
</p>
```

<td>

```html
<p>·The·quick·brown·fox·jumps·over·the·lazy·dog.·</p>
```

</table>
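
The effect on a single text node can be reproduced with a one-line substitution. This is a minimal Python sketch rather than hyperbuild's streaming C code; `collapse_whitespace` is a hypothetical name, and treating only ASCII whitespace (space, tab, LF, CR, FF) as whitespace is an assumption based on the limitation about exotic Unicode whitespace noted under Parsing.

```python
import re

# Runs of ASCII whitespace; exotic Unicode whitespace is deliberately not matched.
WHITESPACE_RUN = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text):
    """Reduce every run of whitespace in a text node to one space (U+0020)."""
    return WHITESPACE_RUN.sub(" ", text)
```

Applied to the text inside the `<p>` of the Before example, this yields the text shown in the After example, including the single leading and trailing space.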
#### `--MXdestroyWholeWhitespace $wss,$content,$formatting`
Remove any text nodes that only consist of whitespace characters, unless they are a child of the tags specified by this option.
Especially useful when using `display: inline-block` so that whitespace between elements (e.g. indentation) does not alter layout and styling.
<table><thead><tr><th>Before<th>After<tbody><tr><td>

```html
<div>↵
··<h1></h1>↵
··<ul></ul>↵
··A·quick·<strong>brown</strong>·<em>fox</em>.↵
</div>
```

<td>

```html
<div><h1></h1><ul></ul>↵
··A·quick·<strong>brown</strong><em>fox</em>.↵
</div>
```

</table>
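
To make the rule concrete, the children of the `<div>` above can be modelled as a list of text nodes and tags. The Python sketch below is only a model of the effect: hyperbuild streams its output and builds no such list, and `Tag` and `destroy_whole_whitespace` are hypothetical names.

```python
from typing import NamedTuple

ASCII_WHITESPACE = " \t\n\r\f"  # assumption: only ASCII whitespace counts


class Tag(NamedTuple):
    """Placeholder for a child element; text nodes are plain strings."""
    name: str


def destroy_whole_whitespace(children):
    """Drop text nodes that consist only of whitespace; keep everything else."""
    return [
        child
        for child in children
        if not (isinstance(child, str) and child.strip(ASCII_WHITESPACE) == "")
    ]
```

In the example, the indentation before `<h1>` and `<ul>` and the single space between `<strong>` and `<em>` are whitespace-only text nodes and are dropped, while the text nodes containing `A quick` and `.` are kept with their whitespace intact.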
#### `--MXtrimWhitespace $wss,$formatting`
Remove any whitespace from the start and end of a tag, if the first and/or last node is a text node, unless the tag is one of the tags specified by this option.
Useful when combined with whitespace collapsing.

Other whitespace between text nodes and tags is not removed, as it is not recommended to mix non-formatting tags with raw text.
Basically, a tag should contain either only text and [formatting tags](#formatting-tags), or only non-formatting tags.

<table><thead><tr><th>Before<th>After<tbody><tr><td>

```html
<p>↵
··Hey,·I·<em>just</em>·found↵
··out·about·this·<strong>cool</strong>·website!↵
··<div></div>↵
</p>
```

<td>

```html
<p>Hey,·I·<em>just</em>·found↵
··out·about·this·<strong>cool</strong>·website!↵
··<div></div></p>
```

</table>
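
The same list-of-children model illustrates trimming: only the first and last nodes are touched, and only if they are text nodes. This is a hedged Python sketch of the effect, not of hyperbuild's streaming implementation; `Tag` and `trim_whitespace` are hypothetical names.

```python
from typing import NamedTuple

ASCII_WHITESPACE = " \t\n\r\f"  # assumption: only ASCII whitespace counts


class Tag(NamedTuple):
    """Placeholder for a child element; text nodes are plain strings."""
    name: str


def trim_whitespace(children):
    """Strip leading whitespace of a first text node and trailing whitespace of a last one."""
    result = list(children)
    if result and isinstance(result[0], str):
        result[0] = result[0].lstrip(ASCII_WHITESPACE)
    if result and isinstance(result[-1], str):
        result[-1] = result[-1].rstrip(ASCII_WHITESPACE)
    return result
```

For the `<p>` above, the first text node loses its leading line break and indentation, the final whitespace-only text node after `<div></div>` becomes empty, and the whitespace in between is left as it was.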
#### `--MXtrimClassAttribute`
Don't trim and collapse whitespace in `class` attribute values.

<table><thead><tr><th>Before<th>After<tbody><tr><td>

```html
<div class="
hi
lo
a b c
d e
f g
"></div>
```

<td>

```html
<div class="hi lo a b c d e f g"></div>
```

</table>
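
The default behaviour shown in the table (the one this option turns off) amounts to splitting the value on whitespace and rejoining it with single spaces. A minimal Python sketch, assuming ASCII whitespace only; `trim_class_attribute` is a hypothetical name and not part of hyperbuild.

```python
import re

# Runs of ASCII whitespace separate class names.
WHITESPACE_RUN = re.compile(r"[ \t\n\r\f]+")


def trim_class_attribute(value):
    """Trim a class attribute value and collapse whitespace between class names."""
    names = [name for name in WHITESPACE_RUN.split(value) if name]
    return " ".join(names)
```

Given the multi-line value from the Before example, this returns `hi lo a b c d e f g`.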
#### `--MXdecEnt`
Don't decode any valid entities into their UTF-8 values.
#### `--MXcondComments`
Don't minify the contents of conditional comments, including downlevel-revealed conditional comments.
#### `--MXattrQuotes`
Don't remove quotes around attribute values when possible.
#### `--MXcomments`
Don't remove any comments. Conditional comments are never removed regardless of this setting.
#### `--MXoptTags`
Don't remove optional starting or ending tags.
#### `--MXtagWS`
Don't remove spaces between attributes when possible.
### Non-options
#### Explicitly important
hyperbuild does not offer the following attribute and tag removals as minification strategies, because such attributes and tags should not have been declared in the first place.
If they exist, it is assumed that there is a special reason for them.

- Remove empty attributes (including ones that would be empty after minification, e.g. `class=" "`)
- Remove empty elements
- Remove redundant attributes
- Remove `type` attribute on `<script>` tags
- Remove `type` attribute on `<style>` and `<link>` tags