From 522408b923799a7c8daba03d2d3ea14e16ef8315 Mon Sep 17 00:00:00 2001
From: Wilson Lin
Date: Sat, 30 Jun 2018 22:37:24 +1200
Subject: [PATCH] Initial commit

---
 .gitignore              |   1 +
 LICENSE                 |  21 ++++
 README.md               | 272 ++++++++++++++++++++++++++++++++++++++++
 package.json            |  44 +++++++
 src/main/c/main.c       |   0
 src/main/c/stream/tag.c |   0
 6 files changed, 338 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 LICENSE
 create mode 100644 README.md
 create mode 100644 package.json
 create mode 100644 src/main/c/main.c
 create mode 100644 src/main/c/stream/tag.c

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..6a3417b
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/out/
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..df6f5bd
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2018 Wilson Lin
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ec2ca4b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,272 @@
+# hyperbuild
+
+A fast HTML parser, preprocessor, and minifier, written in C.
+Designed to be used in C projects, but also runnable on Node.js thanks to Emscripten.
+Heavily influenced by [kangax's html-minifier](https://github.com/kangax/html-minifier).
+
+## Features
+
+### Streaming minification
+
+hyperbuild minifies as it parses, directly streaming processed HTML to the output without having to build a DOM/AST or iterate/traverse around in multiple passes, allowing for super-fast compilation times and near-constant memory usage.
+
+### Smart parsing
+
+hyperbuild is aware of strings and comments in JS and CSS sections, and deals with them correctly.
+
+### Super low level
+
+hyperbuild is written in C, and exposed to Node.js using Emscripten.
+
+## Parsing
+
+Current limitations:
+
+- UTF-8 in, UTF-8 out, no BOM at any time.
+- Not aware of exotic Unicode whitespace characters.
+- Tested and designed for Linux only.
+
+### Options
+
+#### I/O
+
+General options for input and output.
+
+##### `--in`
+
+Path to a file to process. If omitted, hyperbuild will read from `stdin`, and imports will be relative to the working directory.
+
+##### `--out`
+
+Path to a file to write to; it will be created if it doesn't exist already. If omitted, the output will be streamed to `stdout`.
+
+##### `--keep`
+
+Don't automatically delete the output file if an error occurred. This option does nothing if the output is `stdout`, and cannot be used with `--buffer`.
+
+##### `--buffer`
+
+Buffer all output until the process is complete and successful. This can prevent many writes to storage (and won't cause any writes on error), but will use a non-constant amount of memory.
+This applies even when the output is `stdout`, and cannot be used with `--keep`.
+
+#### Error
+
+When to stop parsing with an error.
+
+##### `-Einvalid-entity`
+
+It's an error if an invalid HTML entity is detected.
+If omitted, invalid entities are simply interpreted literally.
+
+##### `-Einvalid-tag`
+
+It's an error if an unknown (non-standard) tag is reached.
+A definitive list will be published soon. In the meantime, use the [MDN article](https://developer.mozilla.org/en-US/docs/Web/HTML/Element) as a reference.
+
+##### `-Eucase-tag`
+
+It's an error if an opening or closing tag's name has any uppercase characters.
+
+##### `-Eucase-attr`
+
+It's an error if an attribute's name has any uppercase characters.
+
+## Processing
+
+hyperbuild sits somewhere between Server Side Includes and a templating library, and is designed for simple, static compilation of apps rather than dynamic generation of live content.
+
+To achieve this, hyperbuild has special **directives** that trigger specific actions while it's processing some HTML code.
+This includes importing files, getting and setting variables, and escaping text for HTML.
+
+Directives are like functions in any common language: they take some arguments, and return some value.
+In hyperbuild, all arguments are simple strings, and the return value is directly streamed while processing.
+
+### Using directives
+
+There are two methods of getting hyperbuild's attention: using a special tag, and using a special entity.
+
+#### Directive tags
+
+```html
+valarg
+```
+
+- Replace `dir` with a hyperbuild directive name
+- The value for the argument `value` is provided via the inner content of the tag
+- All other arguments are provided via attributes
+- Directive entities inside argument values, and nested directive tags, will be processed
+
+#### Directive entities
+
+```html
+&hb-dir(arg1=val1, arg2=val2);
+```
+
+- Replace `dir` with a hyperbuild directive name
+- Arguments are provided in name-value pairs between parentheses, separated by commas
+- All characters between the `=` and next `,` or `)` count as the argument's value, including whitespace characters
+- To use commas or right parentheses in argument values, use the HTML entity (`&comma;` and `&rpar;`)
+- Directive entities inside argument values will be processed
+
+### Available directives
+
+#### `import`
+
+Read, parse, process, and minify another file, and stream the result.
+
+|Argument|Format|Required|Description|
+|---|---|---|---|
+|path|Relative or absolute file system path|Y|The path to the file. If it starts with a slash, it is interpreted as an absolute path; otherwise, it's a path relative to the directory of the importer, or the working directory if the input is `stdin`.|
+
+## Minification
+
+### Options
+
+For options that have a list of tags as their values, the tags should be separated by a comma.
+For brevity, hyperbuild has built-in sets of tags that can be used in place of declaring all their members; they begin with a `$` sign:
+
+|Name|Tags|Description|
+|---|---|---|
+|`$inline`|`a`, `abbr`, `b`, `bdi`, `bdo`, `cite`, `code`, `data`, `dfn`, `em`, `i`, `kbd`, `mark`, `q`, `rt`, `rtc`, `ruby`, `s`, `samp`, `small`, `span`, `strong`, `sub`, `sup`, `time`, `u`, `var`, `wbr`|Inline text semantics (see https://developer.mozilla.org/en-US/docs/Web/HTML/Element#Inline_text_semantics).
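
The tag-list syntax described above (comma-separated names, with `$inline` standing in for a built-in set) could be resolved roughly as follows. This is only a sketch: the C sources added by this commit are empty, so `INLINE_TAGS`, `in_inline_set`, and `tag_list_contains` are hypothetical names, and the code assumes lowercase tag names with no whitespace around the commas.

```c
#include <stddef.h>
#include <string.h>

/* Members of the `$inline` set, copied from the README table. */
static const char *const INLINE_TAGS[] = {
    "a", "abbr", "b", "bdi", "bdo", "cite", "code", "data", "dfn", "em",
    "i", "kbd", "mark", "q", "rt", "rtc", "ruby", "s", "samp", "small",
    "span", "strong", "sub", "sup", "time", "u", "var", "wbr"
};

/* Return 1 if the tag name (not necessarily NUL-terminated) is in `$inline`. */
static int in_inline_set(const char *tag, size_t len) {
    size_t count = sizeof INLINE_TAGS / sizeof INLINE_TAGS[0];
    for (size_t i = 0; i < count; i++) {
        if (strlen(INLINE_TAGS[i]) == len && memcmp(INLINE_TAGS[i], tag, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Return 1 if `tag` is matched by the comma-separated option value `list`,
 * e.g. "pre,code,p,$inline". Hypothetical helper, not hyperbuild's API.
 */
int tag_list_contains(const char *list, const char *tag) {
    size_t tag_len = strlen(tag);
    const char *item = list;
    while (*item != '\0') {
        size_t item_len = strcspn(item, ","); /* length up to next comma or end */
        if (item_len == 7 && memcmp(item, "$inline", 7) == 0) {
            if (in_inline_set(tag, tag_len)) return 1;
        } else if (item_len == tag_len && memcmp(item, tag, tag_len) == 0) {
            return 1;
        }
        item += item_len;
        if (*item == ',') item++; /* step over the separator */
    }
    return 0;
}
```

With this sketch, `tag_list_contains("pre,code,p,$inline", "strong")` matches through the built-in set, while `tag_list_contains("pre,code", "strong")` does not.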
+
+#### `--collapseWhitespaceEx pre,code`
+
+Reduce a sequence of whitespace characters in text nodes to a single space (U+0020), unless they are a child of the tags specified by this option.
+
+<table><thead><tr><th>Before</th><th>After</th></tr></thead><tbody><tr><td>
+
+```html
+↵
+··The·quick·brown·fox↵
+··jumps·over·the·lazy↵
+··dog.↵
+```
+
+</td><td>
+
+```html
+·The·quick·brown·fox·jumps·over·the·lazy·dog.·
+```
+
+</td></tr></tbody></table>
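
The collapsing rule for `--collapseWhitespaceEx` can be sketched as a single pass over a text node. This is an illustration under assumptions, not hyperbuild's code: `collapse_whitespace` is a hypothetical name, it works on a NUL-terminated buffer rather than a stream, and it treats only ASCII space, tab, LF, FF, and CR as whitespace (in line with the README's note that exotic Unicode whitespace is not recognised).

```c
#include <stddef.h>

/* Whitespace as assumed by this sketch: ASCII space, tab, LF, FF, CR. */
static int is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/*
 * Copy `in` to `out`, replacing every run of whitespace with one U+0020.
 * `out` must have room for strlen(in) + 1 bytes. Returns the output length.
 */
size_t collapse_whitespace(const char *in, char *out) {
    size_t n = 0;
    int in_run = 0; /* 1 while we are inside a run of whitespace */
    for (; *in != '\0'; in++) {
        if (is_ws(*in)) {
            if (!in_run) out[n++] = ' '; /* emit one space per run */
            in_run = 1;
        } else {
            out[n++] = *in;
            in_run = 0;
        }
    }
    out[n] = '\0';
    return n;
}
```

Because only one flag of state is carried between characters, the same loop could in principle be driven from a stream, which fits the streaming design the README describes. Leading and trailing runs are kept as a single space here; removing them is the job of `--trimWhitespaceEx`.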
+
+#### `--destroyWholeWhitespaceEx pre,code,p,$inline`
+
+Remove any text nodes that only consist of whitespace characters, unless they are a child of the tags specified by this option.
+
+Especially useful when using `display: inline-block` so that whitespace between elements (e.g. indentation) does not alter layout and styling.
+
+<table><thead><tr><th>Before</th><th>After</th></tr></thead><tbody><tr><td>
+
+```html
+↵
+··↵
+··↵
+··A·quick·brown·fox.↵
+```
+
+</td><td>
+
+```html
+↵
+··A·quick·brownfox.↵
+```
+
+</td></tr></tbody></table>
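
The decision behind `--destroyWholeWhitespaceEx` comes down to whether a text node contains anything other than whitespace. A minimal sketch of that check, assuming the same ASCII-only whitespace set as elsewhere and using the hypothetical name `is_whole_whitespace`:

```c
#include <stddef.h>

/*
 * Return 1 if the `len` bytes at `text` are all ASCII whitespace
 * (space, tab, LF, FF, CR), 0 otherwise. An empty node counts as whole
 * whitespace in this sketch; that choice is an assumption.
 */
int is_whole_whitespace(const char *text, size_t len) {
    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (c != ' ' && c != '\t' && c != '\n' && c != '\f' && c != '\r') {
            return 0; /* found real content, so the node must be kept */
        }
    }
    return 1;
}
```

A caller would drop the node when this returns 1 and the parent tag is not in the exclusion list; a node that mixes text and whitespace is left untouched by this option.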
+
+#### `--trimWhitespaceEx pre,code`
+
+Remove any whitespace from the start and end of a tag, if the first and/or last node is a text node, unless the tag is one of the tags specified by this option.
+
+Useful when combined with whitespace collapsing.
+
+Other whitespace between text nodes and tags is not removed, as it is not recommended to mix non-inline tags with raw text; wrap text in an appropriate tag.
+Basically, a tag should contain either text and [inline text semantics](https://developer.mozilla.org/en-US/docs/Web/HTML/Element#Inline_text_semantics), or tags, but not both.
+
+<table><thead><tr><th>Before</th><th>After</th></tr></thead><tbody><tr><td>
+
+```html
+↵
+··Hey,·I·just·found↵
+··out·about·this·cool·website!↵
+```
+
+</td><td>
+
+```html
+Hey,·I·just·found↵
+··out·about·this·cool·website!
+```
+
+</td></tr></tbody></table>
+
+#### `--trimClassAttribute`
+
+Trim and collapse whitespace in `class` attribute values.
+
+<table><thead><tr><th>Before</th><th>After</th></tr></thead><tbody><tr><td>
+
+```html
+```
+
+</td><td>
+
+```html
+```
+
+</td></tr></tbody></table>
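
What "trim and collapse" means for a `class` value can be sketched as below. This is an assumption-laden illustration rather than hyperbuild's implementation: `trim_class_value` is a hypothetical name, the input is taken to be the already-unquoted attribute value, and only ASCII whitespace is considered.

```c
#include <stddef.h>

/*
 * Copy `in` to `out` with leading and trailing whitespace removed and each
 * inner run of whitespace reduced to one space. `out` must have room for
 * strlen(in) + 1 bytes. Returns the output length.
 */
size_t trim_class_value(const char *in, char *out) {
    size_t n = 0;
    int pending_space = 0; /* a separator is owed before the next class name */
    for (; *in != '\0'; in++) {
        char c = *in;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r') {
            /* Only remember the gap once a class name has been written. */
            if (n > 0) pending_space = 1;
        } else {
            if (pending_space) {
                out[n++] = ' ';
                pending_space = 0;
            }
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Deferring the space until the next non-whitespace character is what drops trailing whitespace without a second pass: a gap at the end of the value is remembered but never written.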
+
+#### `--decodeEntities`
+
+Decode any entities into their UTF-8 values.
+
+Invalid entities will result in an error.
+
+#### `--processConditionalComments`
+
+Process the contents of conditional comments, including downlevel-revealed conditional comments.
+
+#### `--removeAttributeQuotes`
+
+Remove quotes around attribute values when possible.
+
+#### `--removeComments`
+
+Remove any comments, except conditional comments.
+
+#### `--removeOptionalTags`
+
+Remove optional starting or ending tags.
+
+#### `--removeTagWhitespace`
+
+Remove spaces between attributes when possible.
+
+### Non-options
+
+#### Collapse boolean attributes
+
+Not provided, as they should not have been declared in the first place.
+(If they exist, it is assumed there is a special reason for being so.)
+
+#### Remove empty attributes
+
+#### Remove empty elements
+
+#### Remove redundant attributes
+
+#### Remove `type` attribute on `