Update README; cleanup comments

Wilson Lin 2019-12-27 22:14:03 +11:00
parent e15381c1cb
commit a14def709f
5 changed files with 39 additions and 64 deletions


@@ -2,7 +2,7 @@
A fast one-pass in-place HTML minifier written in Rust with advanced whitespace handling.
Currently in beta, working on documentation and tests. Issues and pull requests welcome!
Currently in beta, working on documentation and tests. Issues and pull requests welcome! Guide below is currently WIP.
## Features
@@ -18,21 +18,49 @@ hyperbuild --src /path/to/src.html --out /path/to/output.min.html
## Minification
Guide below is currently WIP.
### Whitespace
hyperbuild has advanced whitespace minification that supports strategies such as:
- leave whitespace untouched in `pre` and `code`, which are whitespace sensitive
- trim and collapse whitespace in content tags, as whitespace is collapsed anyway when rendered
- remove whitespace in layout tags, which allows the use of inline layouts while keeping formatted code
- Leave whitespace untouched in `pre` and `code`, which are whitespace sensitive.
- Trim and collapse whitespace in content tags, as whitespace is collapsed anyway when rendered.
- Remove whitespace in layout tags, which allows the use of inline layouts while keeping formatted code.
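As a rough illustration of the "trim and collapse" strategy for content tags, here is a minimal Rust sketch. `collapse_whitespace` is a hypothetical helper, not part of hyperbuild's API, and it allocates a new string for clarity, whereas hyperbuild itself works in place.

```rust
/// Hypothetical helper (not hyperbuild API): trim both ends of `input` and
/// collapse every run of ASCII whitespace into a single space.
fn collapse_whitespace(input: &str) -> String {
    input
        // Splits on runs of ASCII whitespace and skips empty pieces,
        // which also drops leading and trailing whitespace.
        .split_ascii_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}
```

Content inside `pre` and `code` would skip such a step entirely, and layout tags would drop the whitespace instead of collapsing it.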
### Attributes
Any entities in attribute values are decoded, and then the optimal representation is calculated and used:
- Double quoted, with any `"` encoded.
- Single quoted, with any `'` encoded.
- Unquoted, with a leading `"` or `'` (if present), a trailing `>` (if present), and any whitespace encoded.
Some attributes have their whitespace (after decoding) trimmed and collapsed, such as `class`.
If the attribute value is empty after any processing, it is completely removed (i.e. no `=`).
Spaces are removed between attributes if possible.
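The choice between the three attribute value representations can be sketched roughly as follows. This is only an illustration under stated assumptions: `minify_attr_value` and `unquoted` are hypothetical names, "optimal" is taken to mean "shortest", the numeric entities `&#34;`, `&#39;` and `&#62;` are an assumed encoding, and the value is assumed to be already decoded.

```rust
/// Hypothetical helper: encode a decoded value for use without quotes.
/// Only a leading quote, a trailing `>` and whitespace are encoded.
fn unquoted(value: &str) -> String {
    let last = value.chars().count().saturating_sub(1);
    let mut out = String::new();
    for (i, c) in value.chars().enumerate() {
        match c {
            '"' if i == 0 => out.push_str("&#34;"),
            '\'' if i == 0 => out.push_str("&#39;"),
            '>' if i == last => out.push_str("&#62;"),
            c if c.is_ascii_whitespace() => out.push_str(&format!("&#{};", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: return the shortest of the three representations.
/// An empty value yields an empty string, so the caller can omit the `=`.
fn minify_attr_value(value: &str) -> String {
    if value.is_empty() {
        return String::new();
    }
    // Double quoted: only `"` needs encoding.
    let double = format!("\"{}\"", value.replace('"', "&#34;"));
    // Single quoted: only `'` needs encoding.
    let single = format!("'{}'", value.replace('\'', "&#39;"));
    // On a tie, `min_by_key` keeps the first candidate in this order.
    vec![double, single, unquoted(value)]
        .into_iter()
        .min_by_key(|candidate| candidate.len())
        .expect("candidate list is never empty")
}
```

For a simple value such as `foo`, the unquoted form wins. A value containing a space is usually shorter when quoted, because each whitespace character costs five bytes once encoded.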
### Other
- Comments are removed.
- Entities are decoded if valid (see relevant parsing section).
### WIP
- Removal of [optional tags](https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission).
- Removal of boolean attribute values.
- Removal of redundant attributes (empty or default value).
- Handling of conditional or special comments.
### Explicitly important
Empty elements and bangs are not removed as it is assumed there is a special reason for their declaration.
## Parsing
hyperbuild is an HTML minifier and simply does HTML minification. In addition to keeping to one role, hyperbuild does almost no syntax checking or standards enforcement, for performance and code complexity reasons.
For example, this means that it's not an error to have self-closing tags, having multiple `<body>` elements, using incorrect attribute names and values, or using `<br>` like `<br>alert('');</br>`
For example, this means that it's not an error to have self-closing tags, declare multiple `<body>` elements, use incorrect attribute names and values, or write something like `<br>alert('');</br>`.
However, there are some syntax requirements for speed and sanity reasons.
@@ -71,10 +99,12 @@ Most likely, the cause of this error is either invalid syntax or something like:
### Script and style
`script` and `style` tags must be closed with `</script>` and `</style>` respectively (case-sensitive).
`script` and `style` tags must be closed with `</script>` and `</style>` respectively (case sensitive).
Note that the closing tag must not contain any whitespace (e.g. `</script >`).
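The closing tag rule amounts to an exact byte match. A minimal sketch of such a search is shown here; `find_script_end` is a hypothetical helper for illustration and is not taken from hyperbuild's source.

```rust
/// Hypothetical helper: find the offset of the first exact, case-sensitive
/// `</script>` in `content`. Variants such as `</SCRIPT>` or `</script >`
/// are deliberately not recognised.
fn find_script_end(content: &[u8]) -> Option<usize> {
    const CLOSE: &[u8] = b"</script>";
    content
        // Slide a window the size of the closing tag over the input.
        .windows(CLOSE.len())
        .position(|window| window == CLOSE)
}
```

With this kind of matching, a script whose closing tag is written as `</script >` would be treated as unclosed.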
[hyperbuild can handle text script content.](notes/Text script content.md)
## Development
Currently, hyperbuild has a few limitations:


@@ -213,39 +213,3 @@ Don't trim and collapse whitespace in `class` attribute values.
```
</table>
#### `--MXdecEnt`
Don't decode any valid entities into their UTF-8 values.
#### `--MXcondComments`
Don't minify the contents of conditional comments, including downlevel-revealed conditional comments.
#### `--MXattrQuotes`
Don't remove quotes around attribute values when possible.
#### `--MXcomments`
Don't remove any comments. Conditional comments are never removed regardless of this setting.
#### `--MXoptTags`
Don't remove optional starting or ending tags.
#### `--MXtagWS`
Don't remove spaces between attributes when possible.
### Non-options
#### Explicitly important
The following removal of attributes and tags as minification strategies are not available in hyperbuild, as it is assumed there is a special reason for their declaration:
- empty attributes (including ones that would be empty after minification e.g. `class=" "`)
- empty elements
- redundant attributes
- `type` attribute on `<script>` tags
- `type` attribute on `<style>` and `<link>` tags


@@ -9,18 +9,6 @@ mod proc;
mod spec;
mod unit;
/**
* Run hyperbuild on an input array and write to {@param output}. Output will be
* null terminated if no error occurs. WARNING: Input must end with '\xFF' or
* '\0', and {@param input_size} must not include the terminator. WARNING: Does
* not check if {@param output} is large enough. It should at least match the
* size of the input.
*
* @param input input array to process
* @param output output array to write to
* @param cfg configuration to use
* @return result where to write any resulting error information
*/
pub fn hyperbuild(code: &mut [u8]) -> Result<usize, (ErrorType, usize)> {
let mut proc = Processor::new(code);
match process_content(&mut proc, None) {


@@ -31,8 +31,7 @@ impl ContentType {
}
fn peek(proc: &mut Processor) -> ContentType {
// TODO Optimise to trie.
// TODO Optimise.
if proc.at_end() || chain!(proc.match_seq(b"</").matched()) {
return ContentType::End;
};


@@ -183,12 +183,6 @@ pub fn maybe_process_entity(proc: &mut Processor) -> ProcessingResult<ParsedEnti
Ok(ParsedEntity { entity, checkpoint })
}
/**
* Process an HTML entity.
*
* @return Unicode code point of the entity, or HB_UNIT_ENTITY_NONE if the
* entity is malformed or invalid
*/
pub fn process_entity(proc: &mut Processor) -> ProcessingResult<EntityType> {
let entity = maybe_process_entity(proc)?;
entity.keep(proc);