Update README; clear old files; fix long lived ranges

Wilson Lin 2019-12-26 17:16:13 +11:00
parent 4ddcb36e42
commit 4ef7574487
18 changed files with 140 additions and 918 deletions

README.md

@ -12,7 +12,9 @@ Currently in beta, working on documentation and tests. Issues and pull requests
## Usage
TODO
```bash
hyperbuild --src /path/to/src.html --out /path/to/output.min.html
```
## Minification
@ -26,94 +28,58 @@ hyperbuild has advanced whitespace minification that can allow strategies such a
- trim and collapse whitespace in content tags, as whitespace is collapsed anyway when rendered
- remove whitespace in layout tags, which allows the use of inline layouts while keeping formatted code
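A minimal sketch of collapsing and trimming one text node, written in Rust like the commit's new code. `minify_text` and its `trim_start`/`trim_end` flags are hypothetical: the flags stand in for whether the text sits at the beginning or end of its content tag, because whitespace between text and an inline tag such as `<strong>` has to survive as a single space.

```rust
/// Hypothetical sketch: collapse each run of ASCII whitespace in a text node
/// to a single space, dropping it entirely at the start and/or end of the
/// enclosing content tag when requested.
fn minify_text(text: &str, trim_start: bool, trim_end: bool) -> String {
    let mut out = String::with_capacity(text.len());
    // True while inside a whitespace run that has not been written yet.
    let mut pending_space = false;
    for c in text.chars() {
        if c.is_ascii_whitespace() {
            pending_space = true;
            continue;
        }
        // Emit one space for the preceding run, unless it is leading
        // whitespace that should be trimmed.
        if pending_space && !(out.is_empty() && trim_start) {
            out.push(' ');
        }
        pending_space = false;
        out.push(c);
    }
    // Trailing run: keep one space unless trimming the end.
    if pending_space && !trim_end {
        out.push(' ');
    }
    out
}
```

For text directly before an inline tag such as `<strong>`, `trim_end` would be `false` so that one separating space is kept.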
#### Beginning and end
## Parsing
hyperbuild is an HTML minifier and does only HTML minification. Besides keeping to that one role, hyperbuild does almost no syntax checking or standards enforcement, for performance and code-complexity reasons.
For example, it's not an error to have self-closing tags, multiple `<body>` elements, incorrect attribute names and values, or `<br>` used like `<br>alert('');</br>`.
However, there are some syntax requirements for speed and sanity reasons.
### Tags
Tag names are case sensitive.
### Entities
Well-formed entities are decoded, including in attribute values.
Each decoded entity is treated as the single character it represents, so `&#9;` (a tab) counts as whitespace and can be minified.
If a named entity is not a valid reference as per the [spec](https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references), it is considered malformed and is interpreted literally.
Numeric character references to numbers below 0x00 or above 0x10FFFF are considered malformed. A numeric reference within this range is decoded even if it does not refer to a valid Unicode code point.
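To illustrate the range rule, here is a hypothetical `decode_numeric_ref` helper that takes the text between `&#` and `;`. Accepting an uppercase `X` prefix is an assumption (HTML allows it; this README only shows `&#x`). In-range values are returned even when they are not valid scalar values (e.g. surrogates), so the result is a `u32` rather than a `char`.

```rust
/// Hypothetical helper: `body` is the text between `&#` and `;`, e.g. "9" or
/// "x41". Returns None if the reference is malformed under the rules above.
fn decode_numeric_ref(body: &str) -> Option<u32> {
    // An `x`/`X` prefix selects hexadecimal; otherwise the digits are decimal.
    let (digits, radix) = match body.strip_prefix('x').or_else(|| body.strip_prefix('X')) {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    // Parsing fails on u32 overflow, which is also beyond 0x10FFFF.
    let value = u32::from_str_radix(digits, radix).ok()?;
    // In range: decoded even if not a valid scalar value (e.g. 0xD800).
    if value <= 0x10FFFF {
        Some(value)
    } else {
        None
    }
}
```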
### Attributes
Backticks (`` ` ``) are not valid quote marks and are not interpreted as such.
However, backticks are valid attribute value quotes in Internet Explorer.
It's an error to place whitespace between `=` and attribute names/values.
Special handling of some attributes requires case-sensitive names and values, for example `class` and `type="text/javascript"`.
It's an error if there is no whitespace before an attribute.
Most likely, the cause of this error is either invalid syntax or something like:
```html
<p>
··The·quick·brown·fox↵
</p>
<div class="a"name="1"></div>
```
#### Between text and tags
(Note the lack of space between the end of the `class` attribute and the beginning of the `name` attribute.)
```html
<p>The·quick·brown·fox·<strong>jumps</strong>·over·the·lazy·dog.</p>
```
### Script and style
#### Contiguous
`script` and `style` tags must be closed with `</script>` and `</style>` respectively (case-sensitive).
```html
<select>
··<option>Jan:·········1</option>
··<option>Feb:········10</option>
··<option>Mar:·······100</option>
··<option>Apr:······1000</option>
··<option>May:·····10000</option>
··<option>Jun:····100000</option>
</select>
```
Note that the closing tag must not contain any whitespace (e.g. `</script >`).
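A sketch of what this rule means for parsing, assuming raw `script`/`style` content is found by scanning for the exact closing tag. `raw_content_len` is a hypothetical helper, not hyperbuild's API. Because it matches bytes literally, `</script >` and `</SCRIPT>` are not recognised as closing tags.

```rust
/// Hypothetical sketch: length of the raw content of a `script` or `style`
/// element, i.e. the offset of the first exact closing tag, if any.
fn raw_content_len(code: &[u8], tag_name: &[u8]) -> Option<usize> {
    // Build the exact, case-sensitive closing tag with no whitespace, e.g. b"</script>".
    let mut closing = Vec::with_capacity(tag_name.len() + 3);
    closing.extend_from_slice(b"</");
    closing.extend_from_slice(tag_name);
    closing.push(b'>');
    // Byte-by-byte window comparison; no tokenising of the content itself.
    code.windows(closing.len()).position(|w| w == closing.as_slice())
}
```

Since the scan ignores the content entirely, a `</script>` inside a JavaScript comment or string also ends the element.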
#### Whole text
```html
<p>
···↵
</p>
```
### Tag classification
|Type|Content|
|---|---|
|Formatting tags|Text nodes|
|Content tags|Formatting tags, text nodes|
|Layout tags|Layout tags, content tags|
|Content-first tags|Content of content tags or layout tags (but not both)|
#### Specific tags
Tags not in one of the categories below are **specific tags**.
#### Formatting tags
```html
<strong> moat </strong>
```
#### Content tags
```html
<p>Some <strong>content</strong></p>
```
#### Content-first tags
```html
<li>Anthony</li>
```
```html
<li>
<div>
</div>
</li>
```
#### Layout tags
```html
<div>
<div></div>
</div>
```
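The classification table can be encoded as a small data structure. This is a hypothetical sketch: `TagClass`, `Child`, `may_contain` and `content_first_ok` are illustrative names, and which concrete tags belong to each category is not modelled (the examples above only show `strong`, `p`, `li` and `div`). The "but not both" rule for content-first tags depends on all children, so it is checked over the whole child list.

```rust
/// Categories from the tag classification table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TagClass {
    Formatting,
    Content,
    ContentFirst,
    Layout,
    Specific,
}

/// A direct child: a text node or a tag of some category.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Child {
    Text,
    Tag(TagClass),
}

/// Whether `parent` may directly contain `child` according to the table.
/// Returns None for specific tags, which the table does not cover.
fn may_contain(parent: TagClass, child: Child) -> Option<bool> {
    use Child::*;
    use TagClass::*;
    match parent {
        Formatting => Some(child == Text),
        Content => Some(matches!(child, Text | Tag(Formatting))),
        Layout => Some(matches!(child, Tag(Layout) | Tag(Content))),
        // Each child may be either kind; "not both" needs all children.
        ContentFirst => Some(
            may_contain(Content, child) == Some(true) || may_contain(Layout, child) == Some(true),
        ),
        Specific => None,
    }
}

/// A content-first tag's children must all be content-tag content or all
/// be layout-tag content.
fn content_first_ok(children: &[Child]) -> bool {
    let as_content = |c: &Child| may_contain(TagClass::Content, *c) == Some(true);
    let as_layout = |c: &Child| may_contain(TagClass::Layout, *c) == Some(true);
    children.iter().all(as_content) || children.iter().all(as_layout)
}
```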
## Development
Currently, hyperbuild has a few limitations:
- Only UTF-8 is supported.
- Only UTF-8/ASCII is supported.
- Not aware of exotic Unicode whitespace characters.
- Follows HTML5 only.
- Only works on Linux.
Patches to change any of these are welcome!


@ -1,151 +1,86 @@
## Parsing
Current limitations:
### Errors
Errors marked with a `⌫` can be suppressed using the [`--suppress`](#--suppress) option.
#### `MALFORMED_ENTITY`
It's an error if the sequence of characters following an ampersand (`&`) does not form a valid entity.
Entities must be of one of the following forms:
- `&name;`, where *name* is a reference to a valid HTML entity
- `&#nnnn;`, where *nnnn* is a Unicode code point in base 10
- `&#xhhhh;`, where *hhhh* is a Unicode code point in base 16
A malformed entity is an ampersand not followed by a sequence of characters that matches one of the above forms. This includes when the semicolon is missing.
Note that this is different from `INVALID_ENTITY`, which is when a well-formed entity references a non-existent entity name or Unicode code point.
While an ampersand by itself (i.e. followed by whitespace or as the last character) is a malformed entity, it is covered by `BARE_AMPERSAND`.
#### `BARE_AMPERSAND`
It's an error to have an ampersand followed by whitespace or as the last character.
This is intentionally a different error from `MALFORMED_ENTITY` due to the ubiquity of bare ampersands.
An ampersand by itself is not *necessarily* an invalid entity. However, HTML parsers and browsers may have different interpretations of bare ampersands, so it's a good idea to always use the encoded form (`&amp;`).
When this error is suppressed, bare ampersands are outputted untouched.
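A sketch of how an ampersand might be sorted into the errors described above, given the text that follows `&`. `classify_amp` and `AmpKind` are hypothetical. Whether a name or code point actually exists (`INVALID_ENTITY`) is left to a later check, and treating any non-empty run of ASCII letters and digits as a candidate name is an assumption.

```rust
/// What follows an `&`, classified per the error descriptions above.
#[derive(Debug, PartialEq, Eq)]
enum AmpKind<'a> {
    /// `&` followed by whitespace or at the end of input (BARE_AMPERSAND).
    Bare,
    /// `&name;`; whether the name exists is checked separately (INVALID_ENTITY).
    Named(&'a str),
    /// `&#nnnn;` or `&#xhhhh;`, holding the text between `&#` and `;`.
    Numeric(&'a str),
    /// Anything else, including a missing semicolon (MALFORMED_ENTITY).
    Malformed,
}

/// Hypothetical classifier; `after` is the input immediately after the `&`.
fn classify_amp(after: &str) -> AmpKind<'_> {
    match after.chars().next() {
        None => return AmpKind::Bare,
        Some(c) if c.is_ascii_whitespace() => return AmpKind::Bare,
        _ => {}
    }
    // Everything up to the first `;`; no semicolon at all is malformed.
    let body = match after.find(';') {
        Some(end) => &after[..end],
        None => return AmpKind::Malformed,
    };
    if let Some(num) = body.strip_prefix('#') {
        let (digits, radix) = match num.strip_prefix('x') {
            Some(hex) => (hex, 16),
            None => (num, 10),
        };
        return if !digits.is_empty() && digits.chars().all(|c| c.is_digit(radix)) {
            AmpKind::Numeric(num)
        } else {
            AmpKind::Malformed
        };
    }
    if !body.is_empty() && body.chars().all(|c| c.is_ascii_alphanumeric()) {
        AmpKind::Named(body)
    } else {
        AmpKind::Malformed
    }
}
```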
#### `INVALID_ENTITY`
It's an error if an invalid HTML entity is detected.
If suppressed, invalid entities are outputted untouched.
See [entityrefs.c](src/main/c/rule/entity/entityrefs.c) for the list of entity references considered valid by hyperbuild.
Valid entities that reference a Unicode code point must be between 0x0 and 0x10FFFF (inclusive).
#### `NONSTANDARD_TAG`
It's an error if an unknown (non-standard) tag is reached.
See [tags.c](src/main/c/rule/tag/tags.c) for the list of tags considered valid by hyperbuild.
#### `UCASE_TAG`
It's an error if an opening or closing tag's name has any uppercase characters.
#### `UCASE_ATTR`
It's an error if an attribute's name has any uppercase characters.
#### `UNQUOTED_ATTR`
It's an error if an attribute's value is not quoted with `"` (U+0022) or `'` (U+0027).
This means that `` ` `` is not a valid quote mark regardless of whether this error is suppressed or not. Backticks are valid attribute value quotes in Internet Explorer.
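A hypothetical sketch of reading an attribute value directly after `=`, in which only `"` and `'` open a quoted value and a backtick is an ordinary character. `read_attr_value` is illustrative, and stopping an unquoted value at whitespace or `>` is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
enum AttrValue<'a> {
    Quoted(&'a [u8]),
    Unquoted(&'a [u8]),
}

/// Hypothetical: read an attribute value starting right after `=`.
/// Returns the value and the number of bytes consumed, or None if a
/// quoted value is never closed.
fn read_attr_value(input: &[u8]) -> Option<(AttrValue<'_>, usize)> {
    match input.first() {
        // Only `"` (U+0022) and `'` (U+0027) are quote marks.
        Some(&q) if q == b'"' || q == b'\'' => {
            let len = input[1..].iter().position(|&c| c == q)?;
            Some((AttrValue::Quoted(&input[1..1 + len]), len + 2))
        }
        // Anything else, including a backtick, starts an unquoted value
        // (an UNQUOTED_ATTR error unless suppressed).
        _ => {
            let len = input
                .iter()
                .take_while(|&&c| !c.is_ascii_whitespace() && c != b'>')
                .count();
            Some((AttrValue::Unquoted(&input[..len]), len))
        }
    }
}
```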
#### `ILLEGAL_CHILD`
It's an error if a tag appears somewhere it can't be a child.
This is a very simple check, and does not cover the comprehensive HTML rules, which involve backtracking, tree traversal, and lots of conditionals.
This rule is enforced in four parts:
[whitelistparents.c](src/main/c/rule/relation/whitelistparents.c),
[blacklistparents.c](src/main/c/rule/relation/blacklistparents.c),
[whitelistchildren.c](src/main/c/rule/relation/whitelistchildren.c), and
[blacklistchildren.c](src/main/c/rule/relation/blacklistchildren.c).
#### `UNCLOSED_TAG`
It's an error if a non-void tag is not closed.
See [voidtags.c](src/main/c/rule/tag/voidtags.c) for the list of tags considered void by hyperbuild.
This includes tags that would otherwise close automatically because of siblings (e.g. `<li><li>`), as requiring explicit closing tags greatly simplifies the minifier by guaranteeing the structure of its input.
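One way such a rule could be checked is with a stack of open tags. This sketch uses a hypothetical `Tok` token type and HTML's standard void elements as a stand-in for hyperbuild's list in `voidtags.c`, which may differ. It also treats closing a void tag as an error, consistent with the additional errors listed in this document.

```rust
/// Simplified token stream; a real parser would produce these from the input.
#[derive(Debug, Clone, Copy)]
enum Tok<'a> {
    Open(&'a str),
    Close(&'a str),
}

// HTML's standard void elements, standing in for hyperbuild's voidtags.c list.
const VOID_TAGS: [&str; 14] = [
    "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "param",
    "source", "track", "wbr",
];

fn is_void(name: &str) -> bool {
    VOID_TAGS.iter().any(|&v| v == name)
}

/// Requires every non-void tag to be explicitly closed, in order.
fn check_closed(tokens: &[Tok]) -> Result<(), String> {
    let mut open: Vec<&str> = Vec::new();
    for tok in tokens {
        match *tok {
            // Void tags never need (or allow) a closing tag.
            Tok::Open(name) if is_void(name) => {}
            Tok::Open(name) => open.push(name),
            Tok::Close(name) if is_void(name) => {
                return Err(format!("closing void tag `{}`", name))
            }
            // No implicit closing: the closing tag must match the innermost open tag.
            Tok::Close(name) => match open.pop() {
                Some(top) if top == name => {}
                Some(top) => return Err(format!("`{}` closed by `</{}>`", top, name)),
                None => return Err(format!("unexpected `</{}>`", name)),
            },
        }
    }
    match open.pop() {
        Some(top) => Err(format!("unclosed `{}`", top)),
        None => Ok(()),
    }
}
```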
#### `SELF_CLOSING_TAG`
It's an error if a tag is self-closed. Valid in XML, not in HTML.
#### `NO_SPACE_BEFORE_ATTR`
It's an error if there is no whitespace before an attribute.
Most likely, the cause of this error is either invalid syntax or something like:
#### Beginning and end
```html
<div class="a"name="1"></div>
<p>
··The·quick·brown·fox↵
</p>
```
(Note the lack of space between the end of the `class` attribute and the beginning of the `name` attribute.)
#### Between text and tags
#### `UNEXPECTED_END` and `EXPECTED_NOT_FOUND`
```html
<p>The·quick·brown·fox·<strong>jumps</strong>·over·the·lazy·dog.</p>
```
General syntax errors.
#### Contiguous
#### Additional errors
```html
<select>
··<option>Jan:·········1</option>
··<option>Feb:········10</option>
··<option>Mar:·······100</option>
··<option>Apr:······1000</option>
··<option>May:·····10000</option>
··<option>Jun:····100000</option>
</select>
```
There are additional implicit errors that are considered as general syntax errors due to the way the parser works:
#### Whole text
- Closing void tags; see [voidtags.c](src/main/c/rule/tag/voidtags.c) for the list of tags considered void by hyperbuild.
- Placing whitespace between `=` and attribute names/values.
- Placing whitespace before the tag name in an opening tag.
- Placing whitespace around the tag name in a closing tag.
- Not closing a tag before the end of the file/input.
```html
<p>
···↵
</p>
```
#### Notes
### Tag classification
- Closing `</script>` tags end single-line and multi-line JavaScript comments in `script` tags.
For this to be detected by hyperbuild, the closing tag must not contain any whitespace (e.g. `</script >`).
|Type|Content|
|---|---|
|Formatting tags|Text nodes|
|Content tags|Formatting tags, text nodes|
|Layout tags|Layout tags, content tags|
|Content-first tags|Content of content tags or layout tags (but not both)|
#### Specific tags
Tags not in one of the categories below are **specific tags**.
#### Formatting tags
```html
<strong> moat </strong>
```
#### Content tags
```html
<p>Some <strong>content</strong></p>
```
#### Content-first tags
```html
<li>Anthony</li>
```
```html
<li>
<div>
</div>
</li>
```
#### Layout tags
```html
<div>
<div></div>
</div>
```
### Options
#### `--in`
Path to a file to process. If omitted, hyperbuild will read from `stdin`.
#### `--out`
Path to a file to write to; it will be created if it doesn't exist already. If omitted, the output will be streamed to `stdout`.
#### `--keep`
Don't automatically delete the output file if an error occurred. If the output is `stdout`, or the output is a file but `--buffer` is provided, this option does nothing.
#### `--buffer`
Buffer all output until the process is complete and successful. This won't truncate or write anything to the output until the build process is done, but will use a non-constant amount of memory.
This applies even when the output is `stdout`.
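A sketch of the buffering behaviour, assuming the processing step writes its result into an in-memory `Vec<u8>`. `run_buffered` is a hypothetical helper. Nothing reaches the real output unless processing succeeds, at the cost of holding the whole result in memory.

```rust
use std::io::Write;

/// Hypothetical sketch of `--buffer`: `process` writes its whole result into
/// memory, and `out` is only touched once processing has succeeded.
fn run_buffered<W: Write>(
    out: &mut W,
    process: impl FnOnce(&mut Vec<u8>) -> Result<(), String>,
) -> Result<(), String> {
    let mut buf = Vec::new();
    // On error, return early: nothing has been written to `out`.
    process(&mut buf)?;
    out.write_all(&buf).map_err(|e| e.to_string())?;
    out.flush().map_err(|e| e.to_string())
}
```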
#### `--suppress`
Suppress the errors specified by this option. Where one of these errors would otherwise occur, hyperbuild quietly ignores it and continues processing.
Suppressible errors are marked with a `⌫` in the [Errors](#errors) section. Omit the `` prefix. Separate the error names with commas.
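A hypothetical parser for this option's value. The accepted names are the ones the deleted C function `hbcli_arg_suppress_parse` recognised; `SuppressibleError` and `parse_suppress` are illustrative names, and trimming spaces around each name is an assumption.

```rust
use std::collections::HashSet;

/// Errors that the deleted C function hbcli_arg_suppress_parse accepted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SuppressibleError {
    MalformedEntity,
    InvalidEntity,
    NonstandardTag,
    UcaseAttr,
    UcaseTag,
    UnquotedAttr,
    SelfClosingTag,
}

/// Hypothetical parser for a comma-separated `--suppress` value.
fn parse_suppress(value: &str) -> Result<HashSet<SuppressibleError>, String> {
    let mut set = HashSet::new();
    for name in value.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let err = match name {
            "MALFORMED_ENTITY" => SuppressibleError::MalformedEntity,
            "INVALID_ENTITY" => SuppressibleError::InvalidEntity,
            "NONSTANDARD_TAG" => SuppressibleError::NonstandardTag,
            "UCASE_ATTR" => SuppressibleError::UcaseAttr,
            "UCASE_TAG" => SuppressibleError::UcaseTag,
            "UNQUOTED_ATTR" => SuppressibleError::UnquotedAttr,
            "SELF_CLOSING_TAG" => SuppressibleError::SelfClosingTag,
            other => return Err(format!("Unrecognised suppressible error `{}`", other)),
        };
        set.insert(err);
    }
    Ok(set)
}
```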
### Options
Note that only existing whitespace will be up for removal via minification. Entities that represent whitespace will not be decoded and then removed.
For options that have a list of tags as their value, the tags should be separated by a comma.
An `*` (asterisk, U+002A) can be used to represent the complete set of possible tags. Providing no value represents the empty set.
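A sketch of these value semantics, with hypothetical `TagList` and `parse_tag_list` names. Trimming whitespace around tag names is an assumption. The deleted C code also accepted `$`-prefixed tag set names, which this sketch leaves out.

```rust
use std::collections::BTreeSet;

/// Hypothetical value of a tag-list option.
#[derive(Debug, PartialEq, Eq)]
enum TagList {
    /// `*`: every possible tag.
    All,
    /// An explicit set of tag names; empty when no value is given.
    Only(BTreeSet<String>),
}

fn parse_tag_list(value: &str) -> TagList {
    if value.trim() == "*" {
        return TagList::All;
    }
    let tags: BTreeSet<String> = value
        .split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(String::from)
        .collect();
    TagList::Only(tags)
}
```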
@ -303,8 +238,6 @@ Don't remove optional starting or ending tags.
Don't remove spaces between attributes when possible.
### Non-options
#### Explicitly important


@ -1 +0,0 @@
#pragma once


@ -1,36 +0,0 @@
#include <stddef.h>
#include <string.h>
#include <hb/err.h>
#include <hbcli/arg.h>
#include <hbcli/err.h>
void hbcli_arg_suppress_parse(hb_err_set* suppressed_errors, char *argv) {
if (argv == NULL) {
return;
}
if (strcmp(argv, "MALFORMED_ENTITY") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_MALFORMED_ENTITY);
} else if (strcmp(argv, "INVALID_ENTITY") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_INVALID_ENTITY);
} else if (strcmp(argv, "NONSTANDARD_TAG") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_NONSTANDARD_TAG);
} else if (strcmp(argv, "UCASE_ATTR") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_UCASE_ATTR);
} else if (strcmp(argv, "UCASE_TAG") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_UCASE_TAG);
} else if (strcmp(argv, "UNQUOTED_ATTR") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_UNQUOTED_ATTR);
} else if (strcmp(argv, "SELF_CLOSING_TAG") == 0) {
hb_err_set_add(suppressed_errors, HB_ERR_PARSE_SELF_CLOSING_TAG);
} else {
hbcli_err("Unrecognised suppressible error `%s`", argv);
}
}


@ -1,5 +0,0 @@
#include <hbcli/arg.h>
#include <hb/collection.h>
void hbcli_arg_tags_parse(char* raw, hb_set_tag_names* set) {
}


@ -1,25 +0,0 @@
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void hbcli_err(char *format, ...) {
va_list args;
va_start(args, format);
bool is_tty = isatty(fileno(stdout));
if (is_tty) {
fprintf(stderr, "\x1B[31m\x1B[1m");
}
vfprintf(stderr, format, args);
if (is_tty) {
fprintf(stderr, "\x1B[0m");
}
va_end(args);
fprintf(stderr, "\n");
exit(1);
}


@ -1,3 +0,0 @@
#pragma once
void hbcli_err(char* format, ...);


@ -1,102 +0,0 @@
#include <getopt.h>
#include <hbcli/cli.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdarg.h>
#include <hb/hyperbuild.h>
#include <hb/err.h>
nh_set_str_t hbu_streamoptions_parse_list_of_tags(hbe_err_t *hbe_err, char *argv) {
nh_set_str_t set = NULL;
hb_list_charlist_t list = NULL;
if (argv != NULL && strcmp(argv, "*") == 0) {
return NULL;
}
set = nh_set_str_create();
if (argv == NULL) {
return set;
}
list = hb_list_charlist_create_from_split((hb_proc_char_t *) argv, ',');
for (size_t i = 0; i < list->length; i++) {
hb_list_char_t part = hb_list_charlist_get(list, i);
hb_proc_char_t *part_c = hb_list_char_underlying(part);
if (hb_list_char_get(part, 0) == '$') {
// Set of tags
hb_list_char_shift(part);
HBE_CATCH_F(hbu_streamoptions_parse_and_add_tag_set, (char *) part_c, set);
} else {
// Single tag
if (!hb_rule_tags_check(part_c)) {
HBE_THROW_F(HBE_CLI_INVALID_TAG, "%s is not a standard tag and was provided as part of an argument's value", part_c);
}
nh_set_str_add(set, (char *) hb_list_char_underlying_copy(part));
}
}
finally:
if (list != NULL) {
hb_list_charlist_destroy_from_split(list);
list = NULL;
}
if (*hbe_err != NULL) {
if (set != NULL) {
nh_set_str_destroy(set);
set = NULL;
}
}
return set;
}
void hbu_streamoptions_parse_and_add_tag_set(hbe_err_t *hbe_err, char *set_name, nh_set_str_t set) {
if (strcmp(set_name, "content") == 0) {
hb_rule_contenttags_add_elems(set);
} else if (strcmp(set_name, "contentfirst") == 0) {
hb_rule_contentfirsttags_add_elems(set);
} else if (strcmp(set_name, "formatting") == 0) {
hb_rule_formattingtags_add_elems(set);
} else if (strcmp(set_name, "layout") == 0) {
hb_rule_layouttags_add_elems(set);
} else if (strcmp(set_name, "specific") == 0) {
hb_rule_specifictags_add_elems(set);
} else if (strcmp(set_name, "heading") == 0) {
hb_rule_headingtags_add_elems(set);
} else if (strcmp(set_name, "media") == 0) {
hb_rule_mediatags_add_elems(set);
} else if (strcmp(set_name, "sectioning") == 0) {
hb_rule_sectioningtags_add_elems(set);
} else if (strcmp(set_name, "void") == 0) {
hb_rule_voidtags_add_elems(set);
} else if (strcmp(set_name, "wss") == 0) {
hb_rule_wsstags_add_elems(set);
} else {
HBE_THROW_V(HBE_CLI_INVALID_TAG_SET, "Unrecognised tag set `%s`", set_name);
}
}
int main(int argc, char **argv) {
hyperbuild_init();
hb_rune* output = hyperbuild_from_file(input_path, &cfg, &result);
if (result->code != HB_ERR_OK) {
// An error occurred.
_cli_error(result->msg);
}
open(output_path, )
}


@ -1,94 +0,0 @@
#pragma once
#include <stddef.h>
#include <stdbool.h>
#include "hb-rule.h"
#include "hb-config.h"
static struct hb_config_ex_s _ex_collapse_whitespace_default;
static struct hb_config_ex_s _ex_destroy_whole_whitespace_default;
static struct hb_config_ex_s _ex_trim_whitespace_default;
// WARNING: Rules must be initialised before calling this function
void hb_config_init(void)
{
nh_set_str ex_collapse_whitespace_set = nh_set_str_create();
hb_rule_wsstags_add_elems(ex_collapse_whitespace_set);
_ex_collapse_whitespace_default = {HB_CONFIG_EX_MODE_DEFAULT,
ex_collapse_whitespace_set};
nh_set_str ex_destroy_whole_whitespace_set = nh_set_str_create();
hb_rule_wsstags_add_elems(ex_destroy_whole_whitespace_set);
hb_rule_contenttags_add_elems(ex_destroy_whole_whitespace_set);
hb_rule_formattingtags_add_elems(ex_destroy_whole_whitespace_set);
_ex_destroy_whole_whitespace_default = {
HB_CONFIG_EX_MODE_DEFAULT, ex_destroy_whole_whitespace_set};
nh_set_str ex_trim_whitespace_set = nh_set_str_create();
hb_rule_wsstags_add_elems(ex_trim_whitespace_set);
hb_rule_formattingtags_add_elems(ex_trim_whitespace_set);
_ex_trim_whitespace_default = {HB_CONFIG_EX_MODE_DEFAULT,
ex_trim_whitespace_set};
}
hb_config_t* hb_config_create(void)
{
hb_config_t* config = malloc(sizeof(struct hb_config_s));
config->ex_collapse_whitespace = _ex_collapse_whitespace_default;
config->ex_destroy_whole_whitespace =
_ex_destroy_whole_whitespace_default;
config->ex_trim_whitespace = _ex_trim_whitespace_default;
config->suppressed_errors = nh_set_int32_create();
config->trim_class_attr = true;
config->decode_entities = true;
config->min_conditional_comments = true;
config->remove_attr_quotes = true;
config->remove_comments = true;
config->remove_optional_tags = true;
config->remove_tag_whitespace = true;
return config;
}
void hb_config_ex_use_none(hb_config_ex_t* config_ex)
{
*config_ex = {HB_CONFIG_EX_MODE_NONE, NULL};
}
void hb_config_ex_use_custom(hb_config_ex_t* config_ex, nh_set_str custom_set)
{
*config_ex = {HB_CONFIG_EX_MODE_CUSTOM, custom_set};
}
void hb_config_ex_use_all(hb_config_ex_t* config_ex)
{
*config_ex = {HB_CONFIG_EX_MODE_ALL};
}
void hb_config_destroy(hb_config_t* opt)
{
nh_set_int32_destroy(opt->suppressed_errors);
free(opt);
}
bool hb_config_supressed_error_check(hb_config_t opt, hb_error_t errcode)
{
return nh_set_int32_has(&opt->suppressed_errors, errcode);
}
bool hb_config_ex_check(hb_config_ex_t* config, hb_proc_char_t* query)
{
switch (config->mode) {
case HB_CONFIG_EX_MODE_ALL:
return true;
case HB_CONFIG_EX_MODE_NONE:
return false;
default:
// DEFAULT and CUSTOM modes both consult the tag set.
return nh_set_str_has(config->set, query);
}
}


@ -1,36 +0,0 @@
#pragma once
typedef enum {
HB_CONFIG_EX_MODE_NONE, // i.e. minify all without exception
HB_CONFIG_EX_MODE_DEFAULT, // entire struct will not be destroyed
HB_CONFIG_EX_MODE_CUSTOM, // set will be destroyed
HB_CONFIG_EX_MODE_ALL, // i.e. don't minify
} hb_config_ex_mode_t;
typedef struct {
hb_config_ex_mode_t mode;
nh_set_str set;
} hb_config_ex_t;
typedef struct {
hb_config_ex_t ex_collapse_whitespace;
hb_config_ex_t ex_destroy_whole_whitespace;
hb_config_ex_t ex_trim_whitespace;
nh_set_int32 suppressed_errors;
bool trim_class_attributes;
bool decode_entities;
bool remove_attr_quotes;
bool remove_comments;
bool remove_optional_tags;
bool remove_tag_whitespace;
} hb_config_t;
// WARNING: Rules must be initialised before calling this function
void hb_config_init(void);
hb_config_t* hb_config_create(void);
void hb_config_ex_use_none(hb_config_ex_t* config_ex);
void hb_config_ex_use_custom(hb_config_ex_t* config_ex, nh_set_str custom_set);
void hb_config_ex_use_all(hb_config_ex_t* config_ex);
void hb_config_destroy(hb_config_t* opt);
bool hb_config_supressed_error_check(hb_config_t opt, hb_error_t errcode);
bool hb_config_ex_check(hb_config_ex_t* config, hb_proc_char_t* query);


@ -1,162 +0,0 @@
#pragma once
#include <stdbool.h>
#include <string.h>
#include "../../rule/char/ucalpha.c"
#include "../char/char.c"
#include "../execution/error.c"
#include "../list/char.c"
#include "../fstream/fstreamin.c"
#include "../fstream/fstreamout.c"
// Use macro to prevent having to allocate (and therefore free/manage) memory
#define HB_PROC_FORMAT_WITH_POS(fn, a, format, ...) fn(a, format " at %s [line %d, column %d]", __VA_ARGS__, proc->name, proc->line, proc->column);
/**
* Creates an error using a message with the current position appended.
*
* @param proc proc
* @param errcode error code
* @param reason message
* @return error
*/
hbe_err_t hb_proc_error(hb_proc_t* proc, hb_error_t errcode, const char *reason, ...) {
va_list args;
va_start(args, reason);
char *msg = calloc(HB_PROC_MAX_ERR_MSG_LEN + 1, SIZEOF_CHAR);
vsnprintf(msg, HB_PROC_MAX_ERR_MSG_LEN, reason, args);
va_end(args);
hbe_err_t err = HB_PROC_FORMAT_WITH_POS(hbe_err_create, errcode, "%s", msg);
free(msg);
return err;
}
/**
* Writes a character to the redirect, if enabled, otherwise output, of a proc,
* unless the output is masked.
*
* @param hbe_err pointer to hbe_err_t
* @param proc proc
* @param c character to write
* @throws on write error
*/
static void _hb_proc_write_to_output(hbe_err_t *hbe_err, hb_proc_t* proc, hb_proc_char_t c) {
if (!proc->mask) {
hb_list_char_t redirect = proc->redirect;
if (redirect != NULL) {
hb_list_char_append(redirect, c);
} else {
HBE_CATCH_V((*proc->writer), proc->output, c);
}
}
}
/*
*
* INSTANCE MANAGEMENT FUNCTIONS
*
*/
/**
* Allocates memory for a proc, and creates one with provided arguments.
*
* @param input input
* @param reader reader
* @param name name
* @param output output
* @param writer writer
* @return a freshly-created proc
*/
hb_proc_t* hb_proc_create_blank(char *name) {
hb_proc_t* proc = calloc(1, sizeof(hb_proc_t));
proc->name = name;
proc->input = NULL;
proc->reader = NULL;
proc->EOI = false;
proc->line = 1;
proc->column = 0;
proc->CR = false;
proc->output = NULL;
proc->writer = NULL;
proc->buffer = nh_list_ucp_create();
proc->mask = false;
proc->redirect = NULL;
return proc;
}
/**
* Frees all memory associated with a proc.
*
* @param proc proc
*/
void hb_proc_destroy(hb_proc_t* proc) {
nh_list_ucp_destroy(proc->buffer);
free(proc);
}
/**
* Enables or disables the output mask.
* When the output mask is enabled, all writes are simply discarded and not actually written to output.
*
* @param proc proc
* @param mask 1 to enable, 0 to disable
* @return previous state
*/
int hb_proc_toggle_output_mask(hb_proc_t* proc, int mask) {
int current = proc->mask;
proc->mask = mask;
return current;
}
/**
* Enables or disables the output redirect.
* When the output redirect is enabled, all writes are written to a buffer instead of the output.
*
* @param proc proc
* @param redirect buffer to redirect writes to, or NULL to disable
*/
void hb_proc_set_redirect(hb_proc_t* proc, hb_list_char_t redirect) {
proc->redirect = redirect;
}
void hb_proc_blank_set_input_fstreamin(hb_proc_t* proc, hbu_fstreamin_t fstreamin) {
proc->input = fstreamin;
proc->reader = (hb_proc_reader_cb_t) &hbu_fstreamin_read;
}
// Wrapper function for hb_list_char_shift to make it compatible with hb_proc_reader_cb_t
static hb_eod_char_t hb_proc_read_from_list_char_input(hbe_err_t *hbe_err, hb_list_char_t input) {
(void) hbe_err;
return hb_list_char_shift(input);
}
void hb_proc_blank_set_input_buffer(hb_proc_t* proc, hb_list_char_t buf) {
proc->input = buf;
proc->reader = (hb_proc_reader_cb_t) &hb_proc_read_from_list_char_input;
}
static void hb_proc_blank_set_output_fstreamout(hb_proc_t* proc, hbu_fstreamout_t fstreamout) {
proc->output = fstreamout;
proc->writer = (hb_proc_writer_cb_t) &hbu_fstreamout_write;
}
// Wrapper function for hb_list_char_append to make it compatible with hb_proc_writer_cb_t
void hb_proc_write_to_list_char_output(hbe_err_t *hbe_err, hb_list_char_t output, hb_proc_char_t c) {
(void) hbe_err;
hb_list_char_append(output, c);
}
void hb_proc_blank_set_output_buffer(hb_proc_t* proc, hb_list_char_t buf) {
proc->output = buf;
proc->writer = (hb_proc_writer_cb_t) &hb_proc_write_to_list_char_output;
}


@ -1,73 +0,0 @@
#pragma once
#include <setjmp.h>
#include "hb-data.h"
#include "hb-config.h"
typedef int32_t hb_proc_char_t;
#define HB_PROC_CHAR_EOD -1 // End Of Data
#define HB_PROC_CHAR_SIZE sizeof(hb_proc_char_t)
typedef bool hb_proc_predicate_t(hb_proc_char_t);
// Reader and writer callbacks. The last parameter is a pointer to an error
// message. If the last parameter is not NULL, it is assumed an error occurred.
// The error message WILL BE free'd by the callee automatically, so ensure the
// message was created using malloc or strdup, and is not free'd by the function
// or anything else afterwards.
typedef hb_proc_char_t hb_proc_reader_t(void*, char**);
typedef void hb_proc_writer_t(void*, hb_proc_char_t, char**);
#define HB_PROC_MEMORY_CREATE(name) \
hb_proc_list_memory_instance_add_right_and_return( \
config->memory_instances, name##_create()); \
hb_proc_list_memory_destructor_add_right( \
config->memory_destructors, \
(hb_proc_memory_destructor_t*) &name##_destroy);
NH_LIST(hb_proc_list_memory_instance, void*, sizeof(void*), void*, NULL);
void* hb_proc_list_memory_instance_add_right_and_return(
hb_proc_list_memory_instance*, void*);
typedef void hb_proc_memory_destructor_t(void*);
NH_LIST(hb_proc_list_memory_destructor, hb_proc_memory_destructor_t*,
sizeof(hb_proc_memory_destructor_t*), hb_proc_memory_destructor_t*,
NULL);
#define HB_PROC_ERROR_MESSAGE_SIZE 1024
typedef struct {
hb_error_t code;
char* message;
} hb_proc_result_t;
typedef struct {
char* name;
jmp_buf start;
hb_proc_list_memory_instance* memory_instances;
hb_proc_list_memory_destructor* memory_destructors;
void* input;
hb_proc_reader_t* reader;
bool EOI;
int line;
int column;
bool CR;
void* output;
hb_proc_writer_t* writer;
nh_list_ucp* buffer;
bool mask;
nh_list_ucp* redirect;
hb_config_t config;
} hb_proc_t;
hb_proc_t* hb_proc_create_blank(char* name);
void hb_proc_result_destroy(hb_proc_result_t* result);
hb_proc_result_t* hb_proc_start(hb_proc_t* proc);
void _hb_proc_error(hb_proc_t* proc, hb_error_t code, char const* format, ...);


@ -1,5 +0,0 @@
#include <hb/hyperbuild.h>
int main(int argc, char** argv) {
hyperbuild_init();
}


@ -1,87 +0,0 @@
#include <hb/cfg.h>
#include <hbcli/opt.h>
#include <stddef.h>
#include <hb/proc.h>
#include <getopt.h>
#include <hbcli/err.h>
hb_cfg hbcli_options_parse(int argc, char** argv) {
// Prepare config
char *input_path = NULL;
char *output_path = NULL;
hb_cfg cfg;
hb_proc_result result;
bool nondefault_ex_collapse_whitespace = false;
bool nondefault_ex_destroy_whole_whitespace = false;
bool nondefault_ex_trim_whitespace = false;
struct option long_options[] = {
{"input", required_argument, NULL, 'i'},
{"output", required_argument, NULL, 'o'},
{"suppress", required_argument, NULL, 's'},
{"!collapse-ws", optional_argument, NULL, 40},
{"!destroy-whole-ws", optional_argument, NULL, 41},
{"!trim-ws", optional_argument, NULL, 42},
{"!trim-class-attr", no_argument, &(cfg.trim_class_attr), false},
{"!decode-ent", no_argument, &(cfg.decode_entities), false},
{"!remove-attr-quotes", no_argument, &(cfg.remove_attr_quotes), false},
{"!remove-comments", no_argument, &(cfg.remove_comments), false},
{"!remove-tag-ws", no_argument, &(cfg.remove_tag_whitespace), false},
{0, 0, 0, 0}
};
// Parse arguments
while (1) {
int option_index = 0;
int c = getopt_long(argc, argv, "+i:o:s:", long_options, &option_index);
if (c == -1) {
if (optind != argc) {
hbcli_err("Too many options");
}
break;
}
switch (c) {
case 'i':
input_path = optarg;
break;
case 'o':
output_path = optarg;
break;
case 's':
HBE_CATCH_F(hbu_streamoptions_parse_and_add_errors_to_suppress, config_stream->suppressed_errors, optarg);
break;
case 40:
nondefault_ex_collapse_whitespace = 1;
config_stream->ex_collapse_whitespace = HBE_CATCH_F(hbu_streamoptions_parse_list_of_tags, optarg);
break;
case 41:
nondefault_ex_destroy_whole_whitespace = 1;
config_stream->ex_destroy_whole_whitespace = HBE_CATCH_F(hbu_streamoptions_parse_list_of_tags, optarg);
break;
case 42:
nondefault_ex_trim_whitespace = 1;
config_stream->ex_trim_whitespace = HBE_CATCH_F(hbu_streamoptions_parse_list_of_tags, optarg);
break;
default:
hbcli_err("Internal error: unknown option %c", c);
}
}
if (!nondefault_ex_collapse_whitespace) config_stream->ex_collapse_whitespace = hbu_streamoptions_default_ex_collapse_whitespace();
if (!nondefault_ex_destroy_whole_whitespace) config_stream->ex_destroy_whole_whitespace = hbu_streamoptions_default_ex_destroy_whole_whitespace();
if (!nondefault_ex_trim_whitespace) config_stream->ex_trim_whitespace = hbu_streamoptions_default_ex_trim_whitespace();
}


@ -1,5 +0,0 @@
#pragma once
#include <hb/cfg.h>
hb_cfg hbcli_opt_parse(int argc, char** argv);


@ -56,6 +56,8 @@ pub struct Processor<'d> {
// Match.
// Need to record start as we might get slice after keeping or skipping.
match_start: usize,
// Position in output that the match has been written to. Useful for long-lived slices whose source bytes may already have been overwritten.
match_dest: usize,
// Guaranteed amount of characters that exist from `start` at time of creation of this struct.
match_len: usize,
// Character matched, if any. Only exists for single-character matches and if matched.
@ -88,7 +90,7 @@ impl<'d> Index<ProcessorRange> for Processor<'d> {
impl<'d> Processor<'d> {
// Constructor.
pub fn new(code: &mut [u8]) -> Processor {
Processor { write_next: 0, read_next: 0, code, match_start: 0, match_len: 0, match_char: None, match_reason: RequireReason::Custom }
Processor { write_next: 0, read_next: 0, code, match_start: 0, match_dest: 0, match_len: 0, match_char: None, match_reason: RequireReason::Custom }
}
// INTERNAL APIs.
@ -192,9 +194,15 @@ impl<'d> Processor<'d> {
pub fn range(&self) -> ProcessorRange {
ProcessorRange { start: self.match_start, end: self.match_start + self.match_len }
}
pub fn out_range(&self) -> ProcessorRange {
ProcessorRange { start: self.match_dest, end: self.match_dest + self.match_len }
}
pub fn slice(&self) -> &[u8] {
&self.code[self.match_start..self.match_start + self.match_len]
}
pub fn out_slice(&self) -> &[u8] {
&self.code[self.match_dest..self.match_dest + self.match_len]
}
// Assert match.
pub fn require(&self) -> ProcessingResult<()> {
@ -211,6 +219,7 @@ impl<'d> Processor<'d> {
// Take action on match.
// Note that match_len has already been verified to be valid, so don't need to bounds check again.
pub fn keep(&mut self) -> () {
self.match_dest = self.write_next;
self._shift(self.match_len);
}
pub fn discard(&mut self) -> () {

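Why recording `match_dest` helps: reading and writing happen in the same buffer, with the write position never ahead of the read position, so the bytes a match was read from can later be overwritten by subsequent output. The following is a simplified, hypothetical model (`Mini`, `Range`, `keep` and `discard` are illustrative, not hyperbuild's `Processor` API) showing that the destination range of a kept match stays valid while its source range goes stale.

```rust
/// Simplified, hypothetical model of in-place processing: `read_next` and
/// `write_next` walk the same buffer, with write_next <= read_next.
struct Mini<'d> {
    code: &'d mut [u8],
    read_next: usize,
    write_next: usize,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    start: usize,
    end: usize,
}

impl<'d> Mini<'d> {
    /// Copy `len` bytes from the read position to the write position and
    /// return the (source, destination) ranges of the kept bytes.
    fn keep(&mut self, len: usize) -> (Range, Range) {
        let src = Range { start: self.read_next, end: self.read_next + len };
        // Like `match_dest = write_next` in the commit's `keep`.
        let dest = Range { start: self.write_next, end: self.write_next + len };
        self.code.copy_within(src.start..src.end, dest.start);
        self.read_next += len;
        self.write_next += len;
        (src, dest)
    }

    /// Skip `len` input bytes without writing them.
    fn discard(&mut self, len: usize) {
        self.read_next += len;
    }
}
```

In the usage below, `ab` is kept, then `cd` is kept over the bytes `ab` was read from, so only the destination range still holds `ab`.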

@ -13,13 +13,26 @@ fn is_valid_tag_name_char(c: u8) -> bool {
is_alphanumeric(c) || c == b':' || c == b'-'
}
enum TagType {
Script,
Style,
Other,
}
pub fn process_tag(proc: &mut Processor) -> ProcessingResult<()> {
// TODO Minify opening and closing tag whitespace before name and after name/last attr.
// TODO DOC No checking if opening and closing names match.
// Expect to be currently at an opening tag.
chain!(proc.match_char(b'<').expect().keep());
// May not be valid tag name at current position, so require instead of expect.
let opening_name_range = chain!(proc.match_while_pred(is_valid_tag_name_char).require_with_reason("tag name")?.keep().range());
let opening_name_range = chain!(proc.match_while_pred(is_valid_tag_name_char).require_with_reason("tag name")?.keep().out_range());
// TODO DOC: Tags must be case sensitive.
let tag_type = match &proc[opening_name_range] {
b"script" => TagType::Script,
b"style" => TagType::Style,
_ => TagType::Other,
};
let mut last_attr_type: Option<AttrType> = None;
let mut self_closing = false;
@ -47,7 +60,7 @@ pub fn process_tag(proc: &mut Processor) -> ProcessingResult<()> {
// Write space after tag name or unquoted/valueless attribute.
match last_attr_type {
Some(AttrType::Quoted) => {},
Some(AttrType::Quoted) => {}
_ => proc.write(b' '),
};
@ -58,10 +71,9 @@ pub fn process_tag(proc: &mut Processor) -> ProcessingResult<()> {
return Ok(());
};
// TODO DOC: Tags must be case sensitive.
match &proc[opening_name_range] {
b"script" => process_script(proc)?,
b"style" => process_style(proc)?,
match tag_type {
TagType::Script => process_script(proc)?,
TagType::Style => process_style(proc)?,
_ => process_content(proc, Some(opening_name_range))?,
};

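The `tag_type` change in `process_tag` follows the same idea: the tag name is classified once, right after it is kept, rather than by re-reading its bytes after the attributes have been processed. A hypothetical standalone version follows; `read_tag_type` is illustrative, and `u8::is_ascii_alphanumeric` stands in for the crate's own `is_alphanumeric`.

```rust
#[derive(Debug, PartialEq, Eq)]
enum TagType {
    Script,
    Style,
    Other,
}

// Stand-in for the crate's is_valid_tag_name_char.
fn is_valid_tag_name_char(c: u8) -> bool {
    c.is_ascii_alphanumeric() || c == b':' || c == b'-'
}

/// Reads the name after `<` and classifies it straight away (case-sensitively).
/// Returns the name length and its type, or None if there is no name.
fn read_tag_type(code: &[u8]) -> Option<(usize, TagType)> {
    if code.first() != Some(&b'<') {
        return None;
    }
    let rest = &code[1..];
    let len = rest.iter().take_while(|&&c| is_valid_tag_name_char(c)).count();
    if len == 0 {
        return None;
    }
    let tag_type = match &rest[..len] {
        b"script" => TagType::Script,
        b"style" => TagType::Style,
        _ => TagType::Other,
    };
    Some((len, tag_type))
}
```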

@ -1,64 +0,0 @@
#include <hb/hyperbuild.h>
#include <hb/unit.h>
#include <hbtest.h>
#include <stdio.h>
#include <string.h>
// An attribute value:
// - delimited by double quotes
// - containing one single quote literal
// - containing one single quote encoded
// - containing three double quotes encoded
// - with multiple whitespace sequences of length 2 and higher, including at
// the start and end
#define INPUT "\" abc&apos;'&quot; &quot;&quot; a \" 1"
int main(void)
{
hyperbuild_init();
hb_err_set* suppressed = hb_err_set_create();
hb_rune* src = malloc(sizeof(INPUT) + 1);
memcpy(src, INPUT "\xFF", sizeof(INPUT) + 1);
hb_cfg cfg = {
.collapse_whitespace =
{
.mode = HB_CFG_TAGS_SET_MODE_ALL,
.set = NULL,
},
.destroy_whole_whitespace =
{
.mode = HB_CFG_TAGS_SET_MODE_ALL,
.set = NULL,
},
.trim_whitespace =
{
.mode = HB_CFG_TAGS_SET_MODE_ALL,
.set = NULL,
},
.suppressed_errors = *suppressed,
.trim_class_attributes = true,
.decode_entities = true,
.remove_attr_quotes = true,
.remove_comments = true,
.remove_tag_whitespace = true,
};
hb_proc proc = {
.cfg = &cfg,
.src = src,
.src_len = sizeof(INPUT) - 1,
.src_next = 0,
.out = src,
.out_next = 0,
};
hb_unit_attr_val_quoted(&proc, true);
src[proc.out_next] = 0;
printf("%s\n", src);
hb_err_set_destroy(suppressed);
}