Progress README

Wilson Lin 2018-07-02 21:21:00 +12:00
parent 522408b923
commit d88d3ada69
1 changed file with 66 additions and 31 deletions


@@ -2,7 +2,7 @@
A fast HTML parser, preprocessor, and minifier, written in C.
Designed to be used in C projects, but also runnable on Node.js thanks to Emscripten.
Minifier heavily influenced by [kangax's html-minifier](https://github.com/kangax/html-minifier).
## Features
@@ -25,51 +25,88 @@ Current limitations:
- UTF-8 in, UTF-8 out, no BOM at any time.
- Not aware of exotic Unicode whitespace characters.
- Tested and designed for Linux only.
- Follows HTML5 only.
### Errors
Errors marked with a `⌫` can be suppressed using the [`--errorEx`](#--errorEx) option.
#### `EBADENT`
It's an error if an invalid HTML entity is detected.
If suppressed, invalid entities are simply interpreted literally.
See [entityrefs.c](src/main/c/rule/entityrefs.c) for the list of entity references considered valid by hyperbuild.
Valid entities that reference a Unicode code point must be between 0x0 and 0x10FFFF (inclusive).
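To make the range rule concrete, here is a minimal sketch of validating a numeric reference such as `&#65;` or `&#x1F600;`. It assumes the caller has already isolated the text between `&#` and `;`; the function `numeric_entity_valid` is hypothetical and is not the logic in entityrefs.c, which also has to handle named references.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of hyperbuild.
 * `body` is the text between "&#" and ";", e.g. "x1F600" or "65".
 * Returns true if it is a well-formed decimal or hex number whose value
 * lies in the inclusive range 0x0 to 0x10FFFF. */
static bool numeric_entity_valid(const char *body, size_t len) {
    uint32_t base = 10;
    size_t i = 0;
    if (len > 0 && (body[0] == 'x' || body[0] == 'X')) {
        base = 16;
        i = 1;
    }
    if (i == len) {
        return false; /* no digits at all */
    }
    uint32_t value = 0;
    for (; i < len; i++) {
        char c = body[i];
        uint32_t digit;
        if (c >= '0' && c <= '9') {
            digit = (uint32_t)(c - '0');
        } else if (base == 16 && c >= 'a' && c <= 'f') {
            digit = (uint32_t)(c - 'a' + 10);
        } else if (base == 16 && c >= 'A' && c <= 'F') {
            digit = (uint32_t)(c - 'A' + 10);
        } else {
            return false; /* not a digit in this base */
        }
        value = value * base + digit;
        if (value > 0x10FFFF) {
            return false; /* reject early, so `value` can never overflow */
        }
    }
    return true;
}
```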
#### `EBADTAG`
It's an error if an unknown (non-standard) tag is reached.
See [tags.c](src/main/c/rule/tags.c) for the list of tags considered valid by hyperbuild.
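A known-tag check is often a binary search over a sorted table of names. The sketch below uses the standard `bsearch` with a deliberately tiny table; both the table and `is_known_tag` are illustrative assumptions, not the contents of tags.c.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset only; must stay sorted for bsearch to work. */
static const char *const KNOWN_TAGS[] = {
    "a", "body", "div", "head", "html", "li", "p", "span", "ul",
};

/* bsearch passes the key first, then a pointer to an array element. */
static int compare_tag(const void *key, const void *elem) {
    const char *name = key;
    const char *const *entry = elem;
    return strcmp(name, *entry);
}

/* Hypothetical helper: true if `name` appears in KNOWN_TAGS. */
static bool is_known_tag(const char *name) {
    size_t count = sizeof KNOWN_TAGS / sizeof KNOWN_TAGS[0];
    return bsearch(name, KNOWN_TAGS, count, sizeof KNOWN_TAGS[0], compare_tag) != NULL;
}
```

Because the comparison is a plain `strcmp`, `DIV` is not found even though `div` is; uppercase names are reported separately by `EUCASETAG`.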
#### `EUCASETAG`
It's an error if an opening or closing tag's name has any uppercase characters.
#### `EUCASEATTR`
It's an error if an attribute's name has any uppercase characters.
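Both `EUCASETAG` and `EUCASEATTR` reduce to the same test. Here is a minimal sketch, assuming names are checked as raw bytes and only ASCII `A` to `Z` count as uppercase (standard HTML tag and attribute names are ASCII). `name_has_uppercase` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the tag or attribute name contains any
 * ASCII uppercase letter (A to Z). */
static bool name_has_uppercase(const char *name, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (name[i] >= 'A' && name[i] <= 'Z') {
            return true;
        }
    }
    return false;
}
```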
#### `EUQOTATTR`
It's an error if an attribute's value is not quoted with `"` (U+0022).
This means that `` ` `` and `'` are not valid quote marks.
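The rule can be expressed as a classification of the character that opens the value. This sketch assumes the value starts immediately after the `=` with no intervening whitespace. The enum and `classify_attr_quote` are hypothetical, and every result except `ATTR_QUOTE_DOUBLE` would raise `EUQOTATTR`.

```c
/* Hypothetical classification of how an attribute value is quoted. */
typedef enum {
    ATTR_QUOTE_DOUBLE,   /* starts with U+0022: accepted */
    ATTR_QUOTE_SINGLE,   /* starts with ': EUQOTATTR */
    ATTR_QUOTE_BACKTICK, /* starts with `: EUQOTATTR */
    ATTR_QUOTE_NONE,     /* unquoted value: EUQOTATTR */
} attr_quote;

/* `first` is the character immediately after the `=` that follows an
 * attribute name. */
static attr_quote classify_attr_quote(char first) {
    switch (first) {
    case '"':
        return ATTR_QUOTE_DOUBLE;
    case '\'':
        return ATTR_QUOTE_SINGLE;
    case '`':
        return ATTR_QUOTE_BACKTICK;
    default:
        return ATTR_QUOTE_NONE;
    }
}
```

Returning a distinct value per quote style, instead of a single boolean, would let an error message say which quote mark was found.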
#### `EBADCHILD`
It's an error if a tag is declared inside a parent that it can't be a child of.
This is a very simple check and does not cover the full HTML rules, which involve backtracking, tree traversal, and many conditionals.
This rule is enforced in four parts:
[whitelistparents.c](src/main/c/rule/whitelistparents.c),
[blacklistparents.c](src/main/c/rule/blacklistparents.c),
[whitelistchildren.c](src/main/c/rule/whitelistchildren.c), and
[blacklistchildren.c](src/main/c/rule/blacklistchildren.c).
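The split into four files suggests a table-driven check. The sketch below shows how two such tables (allowed parents for a child, and forbidden children for a parent) could be consulted. The table layout, the three sample rules, and `child_allowed` are assumptions for illustration. The sample rules reflect real HTML constraints, but they are not copied from the rule files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Whitelist: a child that may only appear under the listed parents. */
typedef struct {
    const char *child;
    const char *parents[4]; /* NULL-terminated list of allowed parents */
} parent_rule;

static const parent_rule WHITELIST_PARENTS[] = {
    {"li", {"ul", "ol", "menu", NULL}},
};

/* Blacklist: a parent that must never directly contain the given child. */
typedef struct {
    const char *parent;
    const char *child;
} child_ban;

static const child_ban BLACKLIST_CHILDREN[] = {
    {"a", "a"},
    {"p", "div"},
};

/* Hypothetical check: true if `child` may appear directly inside `parent`
 * according to the two sample tables above. */
static bool child_allowed(const char *parent, const char *child) {
    for (size_t i = 0; i < sizeof WHITELIST_PARENTS / sizeof WHITELIST_PARENTS[0]; i++) {
        const parent_rule *rule = &WHITELIST_PARENTS[i];
        if (strcmp(rule->child, child) != 0) {
            continue;
        }
        bool found = false;
        for (size_t j = 0; rule->parents[j] != NULL; j++) {
            if (strcmp(rule->parents[j], parent) == 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false; /* child has a whitelist and parent is not on it */
        }
    }
    for (size_t i = 0; i < sizeof BLACKLIST_CHILDREN / sizeof BLACKLIST_CHILDREN[0]; i++) {
        if (strcmp(BLACKLIST_CHILDREN[i].parent, parent) == 0 &&
            strcmp(BLACKLIST_CHILDREN[i].child, child) == 0) {
            return false; /* parent explicitly forbids this child */
        }
    }
    return true;
}
```

This only ever looks at a direct parent/child pair, which is why it cannot express the rules that need backtracking or tree traversal.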
#### `EUNCTAG`
It's an error if a non-void tag is not closed.
This includes tags that would otherwise close automatically because of a following sibling (e.g. `<li><li>`). Requiring explicit closing tags greatly simplifies the minifier, as it can rely on guarantees about the document's structure.
See [voidtags.c](src/main/c/rule/voidtags.c) for the list of tags considered void by hyperbuild.
#### `ECLOSVOID`
It's an error if a void tag is closed.
See [voidtags.c](src/main/c/rule/voidtags.c) for the list of tags considered void by hyperbuild.
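`EUNCTAG` and `ECLOSVOID` can both be detected with a stack of open tags. This is a minimal sketch over an already-tokenized stream of open and close tags. The token types, the small void-tag list, `check_tags`, and the fixed depth limit are all assumptions; this is not hyperbuild's parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_OPEN, TOK_CLOSE } tok_kind;

typedef struct {
    tok_kind kind;
    const char *name;
} tag_token;

typedef enum {
    CHECK_OK,
    CHECK_EUNCTAG,
    CHECK_ECLOSVOID,
    CHECK_OTHER, /* not a README error code: stray close tag or depth limit */
} check_result;

/* Illustrative subset only; the real list is in voidtags.c. */
static bool is_void_tag(const char *name) {
    static const char *const VOID_TAGS[] = {"br", "hr", "img", "input", "meta"};
    for (size_t i = 0; i < sizeof VOID_TAGS / sizeof VOID_TAGS[0]; i++) {
        if (strcmp(VOID_TAGS[i], name) == 0) {
            return true;
        }
    }
    return false;
}

#define MAX_DEPTH 64

/* Every non-void open tag must be closed by a matching close tag, and void
 * tags must never be closed. Implicit closing (as in `<li><li>`) is not
 * performed, so it surfaces as EUNCTAG. */
static check_result check_tags(const tag_token *toks, size_t n) {
    const char *stack[MAX_DEPTH];
    size_t depth = 0;
    for (size_t i = 0; i < n; i++) {
        const char *name = toks[i].name;
        if (toks[i].kind == TOK_OPEN) {
            if (is_void_tag(name)) {
                continue; /* void tags are never pushed */
            }
            if (depth == MAX_DEPTH) {
                return CHECK_OTHER;
            }
            stack[depth++] = name;
        } else if (is_void_tag(name)) {
            return CHECK_ECLOSVOID;
        } else if (depth == 0) {
            return CHECK_OTHER; /* close tag with nothing open */
        } else if (strcmp(stack[depth - 1], name) != 0) {
            return CHECK_EUNCTAG; /* the innermost open tag was never closed */
        } else {
            depth--;
        }
    }
    return depth == 0 ? CHECK_OK : CHECK_EUNCTAG;
}
```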
#### `ESELFCLOS`
It's an error if a tag is self-closed like in XML (e.g. `<br/>`).
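Because attribute values must be double-quoted (`EUQOTATTR`), a `/` directly before the closing `>` of a start tag can only mean XML-style self-closing. Here is a sketch of that test, assuming the caller passes the raw start-tag text from `<` to `>`. `is_self_closed` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: `tag` is the raw start tag from `<` to `>` inclusive,
 * e.g. "<br/>" or "<img src=\"a.png\" />". Returns true if a `/` comes right
 * before the final `>`. A `/` inside a double-quoted value is always followed
 * by the closing `"`, so it is not mistaken for self-closing. */
static bool is_self_closed(const char *tag, size_t len) {
    return len >= 3 && tag[len - 1] == '>' && tag[len - 2] == '/';
}
```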
### Options ### Options
#### `--in`
Path to a file to process. If omitted, hyperbuild will read from `stdin`, and imports will be relative to the working directory.
#### `--out`
Path to a file to write to; it will be created if it doesn't exist already. If omitted, the output will be streamed to `stdout`.
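Here is a minimal sketch of the `--in`/`--out` defaults, where a `NULL` path stands for an omitted option. The helper names are hypothetical. Opening with mode `wb` also truncates an existing file, which is an assumption beyond what is described above.

```c
#include <stdio.h>

/* Hypothetical helpers: NULL means the option was omitted.
 * Binary modes keep UTF-8 bytes untouched; on Linux they behave like text modes. */
static FILE *open_input(const char *path) {
    return path == NULL ? stdin : fopen(path, "rb");
}

/* "wb" creates the file if it does not exist (and truncates it if it does). */
static FILE *open_output(const char *path) {
    return path == NULL ? stdout : fopen(path, "wb");
}
```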
#### `--keep`
Don't automatically delete the output file if an error occurs. This option does nothing if the output is `stdout`, and cannot be used with `--buffer`.
#### `--buffer`
Buffer all output until the process is complete and successful. This can prevent many writes to storage (and won't cause any writes on error), but will use an amount of memory that grows with the size of the output.
This applies even when the output is `stdout`, and cannot be used with `--keep`.
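The sketch below illustrates how `--keep` and `--buffer` could interact. The two flags are rejected together, streaming mode deletes a partial file on error unless `--keep` is set, and buffered mode accumulates output in memory and writes it only after success. The `io_options` struct and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option struct; field names are illustrative. */
typedef struct {
    const char *out_path; /* NULL means the output is stdout */
    bool keep;            /* --keep */
    bool buffer;          /* --buffer */
} io_options;

/* --keep and --buffer cannot be used together. */
static bool io_options_valid(const io_options *o) {
    return !(o->keep && o->buffer);
}

/* Streaming mode: on error, delete the partially written file unless --keep.
 * Nothing is deleted when writing to stdout. */
static void cleanup_on_error(const io_options *o, FILE *out) {
    if (o->out_path == NULL) {
        return;
    }
    fclose(out);
    if (!o->keep) {
        remove(o->out_path);
    }
}

/* --buffer mode: a growable in-memory buffer. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} out_buffer;

static bool out_buffer_append(out_buffer *b, const char *s, size_t n) {
    if (n == 0) {
        return true;
    }
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n) {
            cap *= 2; /* double until the new data fits */
        }
        char *p = realloc(b->data, cap);
        if (p == NULL) {
            return false;
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    return true;
}

/* Called only after processing succeeded, so an error never causes a write. */
static bool out_buffer_flush(const out_buffer *b, FILE *out) {
    if (b->len > 0 && fwrite(b->data, 1, b->len, out) != b->len) {
        return false;
    }
    return fflush(out) == 0;
}
```

Buffering trades memory for write safety: `out_buffer` keeps the entire output in memory until the final flush, which is the memory cost described above.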
#### `--errorEx`
Suppress the errors specified by this option. hyperbuild will quietly ignore them and continue processing whenever one of the listed errors would otherwise occur.
Separate the error names with commas. Suppressible errors are marked with a `⌫` in the [Errors](#Errors) section.
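As an illustration of the comma-separated format, this sketch parses a value such as `EBADENT,EUCASETAG` into a bitmask and rejects unknown or empty names. The `⌫` markers determine which errors may actually be suppressed. The four names and their bit values here are placeholders, and `parse_error_ex` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder names and bit values; see the Errors section for which
 * errors are suppressible. */
static const struct {
    const char *name;
    unsigned flag;
} ERROR_NAMES[] = {
    {"EBADENT", 1u << 0},
    {"EBADTAG", 1u << 1},
    {"EUCASETAG", 1u << 2},
    {"EUCASEATTR", 1u << 3},
};

/* Parses a comma-separated list such as "EBADENT,EUCASETAG" into *out.
 * Returns false (leaving *out untouched) on an unknown or empty name. */
static bool parse_error_ex(const char *arg, unsigned *out) {
    unsigned mask = 0;
    const char *start = arg;
    for (;;) {
        const char *comma = strchr(start, ',');
        size_t len = comma != NULL ? (size_t)(comma - start) : strlen(start);
        bool matched = false;
        for (size_t i = 0; i < sizeof ERROR_NAMES / sizeof ERROR_NAMES[0]; i++) {
            if (strlen(ERROR_NAMES[i].name) == len &&
                strncmp(ERROR_NAMES[i].name, start, len) == 0) {
                mask |= ERROR_NAMES[i].flag;
                matched = true;
                break;
            }
        }
        if (!matched) {
            return false;
        }
        if (comma == NULL) {
            break;
        }
        start = comma + 1; /* continue after the comma */
    }
    *out = mask;
    return true;
}
```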
## Processing
@@ -230,9 +267,7 @@ Trim and collapse whitespace in `class` attribute values.
#### `--decodeEntities`
Decode any valid entities into their UTF-8 values.
Invalid entities will result in an error.
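Decoding a valid entity ends with writing its code point as UTF-8. A named reference such as `&amp;` first needs a lookup to find its code point. The sketch below covers only that final encoding step; `utf8_encode` is a hypothetical helper, not hyperbuild's function.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of `cp` into `out` (room for 4 bytes) and
 * returns the number of bytes written, or 0 if `cp` is above 0x10FFFF.
 * Surrogates (0xD800 to 0xDFFF) are encoded as-is in this sketch. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, `&#xE9;` encodes to the two bytes `0xC3 0xA9` (é), and `&#x1F600;` to four bytes.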
#### `--processConditionalComments`