[![Build Status](https://travis-ci.org/client9/misspell.svg?branch=master)](https://travis-ci.org/client9/misspell) [![Go Report Card](https://goreportcard.com/badge/github.com/client9/misspell)](https://goreportcard.com/report/github.com/client9/misspell) [![GoDoc](https://godoc.org/github.com/client9/misspell?status.svg)](https://godoc.org/github.com/client9/misspell) [![Coverage](http://gocover.io/_badge/github.com/client9/misspell)](http://gocover.io/github.com/client9/misspell) [![license](https://img.shields.io/badge/license-MIT-blue.svg?style=flat)](https://raw.githubusercontent.com/client9/misspell/master/LICENSE)

Correct commonly misspelled English words... quickly.

### Install

If you just want a binary and to start using `misspell`:

```
curl -L -o ./install-misspell.sh https://git.io/misspell
sh ./install-misspell.sh
```

The script installs the binary as `./bin/misspell`. You can adjust the download location using the `-b` flag. File a ticket if you want another platform supported.

If you use [Go](https://golang.org/), the best way to run `misspell` is by using [gometalinter](#gometalinter). Otherwise, install `misspell` the old-fashioned way:

```
go get -u github.com/client9/misspell/cmd/misspell
```

and `misspell` will be installed in your `GOPATH`.

If you like to live dangerously, you could also run:

```bash
curl -L https://git.io/misspell | bash
```

### Usage

```bash
$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"

# ^ file, line, column
```

```
$ misspell -help
Usage of misspell:
  -debug
        Debug matching, very slow
  -error
        Exit with 2 if misspelling found
  -f string
        'csv', 'sqlite3' or custom Golang template for output
  -i string
        ignore the following corrections, comma separated
  -j int
        Number of workers, 0 = number of CPUs
  -legal
        Show legal information and exit
  -locale string
        Correct spellings using locale preferences for US or UK. Default is to use a neutral variety of English. Setting locale to US will correct the British spelling of 'colour' to 'color'
  -o string
        output file or [stderr|stdout|] (default "stdout")
  -q    Do not emit misspelling output
  -source string
        Source mode: auto=guess, go=golang source, text=plain or markdown-like text (default "auto")
  -w    Overwrite file with corrections (default is just to display)
```

## FAQ

* [Automatic Corrections](#correct)
* [Converting UK spellings to US](#locale)
* [Using pipes and stdin](#stdin)
* [Golang special support](#golang)
* [gometalinter support](#gometalinter)
* [CSV Output](#csv)
* [Using SQLite3](#sqlite)
* [Changing output format](#output)
* [Checking a folder recursively](#recursive)
* [Performance](#performance)
* [Known Issues](#issues)
* [Debugging](#debug)
* [False Negatives and missing words](#missing)
* [Origin of Word Lists](#words)
* [Software License](#license)
* [Problem statement](#problem)
* [Other spelling correctors](#others)
* [Other ideas](#otherideas)

<a name="correct"></a>
### How can I make the corrections automatically?

Just add the `-w` flag!

```
$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"

# ^ File is rewritten only if a misspelling is found
```

<a name="locale"></a>
### How do I convert British spellings to American (or vice-versa)?

To convert British spellings to American, add the `-locale US` flag:

```bash
$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"
```

To convert American spellings to British, add the `-locale UK` flag:

```bash
$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"
```

Help is appreciated as I'm neither British nor an
expert in the English language.

<a name="recursive"></a>
### How do you check an entire folder recursively?

Just list the directories you'd like to check:

```bash
misspell .
misspell aDirectory anotherDirectory aFile
```

You can also run misspell recursively using the following shell tricks:

```bash
misspell directory/**/*
```

or

```bash
find . -type f | xargs misspell
```

You can select a type of file as well. The following example selects all `.txt` files that are *not* in the `vendor` directory:

```bash
find . -type f -name '*.txt' | grep -v vendor/ | xargs misspell -error
```

<a name="stdin"></a>
### Can I use pipes or `stdin` for input?

Yes!

Print messages to `stderr` only:

```bash
$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"
```

Print messages to `stderr`, and corrected text to `stdout`:

```bash
$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra
```

Only print the corrected text to `stdout`:

```bash
$ echo "zeebra" | misspell -w -q
zebra
```

<a name="golang"></a>
### Are there special rules for golang source files?

Yes! If the file ends in `.go`, then misspell will only check spelling in
comments.

If you want to force a file to be checked as a golang source, use `-source=go`
on the command line. Conversely, you can check a golang source as if it were
pure text by using `-source=text`. You might want to do this since many
variable names have misspellings in them!

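To make "only check spelling in comments" concrete, here is a minimal sketch that pulls the comment text out of Go source with the standard `go/scanner` package. It is an illustration under stated assumptions, not misspell's actual implementation; the `comments` helper and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// comments returns the text of every comment in src, in source order.
// Hypothetical helper for illustration; not part of misspell.
func comments(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	// ScanComments makes the scanner return comments as COMMENT tokens
	// instead of skipping them. A nil error handler ignores syntax errors.
	s.Init(file, []byte(src), nil, scanner.ScanComments)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.COMMENT {
			out = append(out, lit)
		}
	}
	return out
}

func main() {
	src := "package demo\n\n// langauge is misspelled here\nvar langauge = 1 /* and here */\n"
	for _, c := range comments(src) {
		fmt.Println(c) // only the two comments are printed, not the variable name
	}
}
```

A checker built this way would run its corrections over the returned comment text only, which is why the misspelled variable name is left alone.
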
### Can I check only comments in other programming languages?

I'm told that using `-source=go` works well for Ruby, JavaScript, Java, C and
C++.

It doesn't work well for Python and Bash.

<a name="gometalinter"></a>
### Does this work with gometalinter?

[gometalinter](https://github.com/alecthomas/gometalinter) runs
multiple golang linters. Starting on [2016-06-12](https://github.com/alecthomas/gometalinter/pull/134),
gometalinter supports `misspell` natively, but it is disabled by default.

```bash
# update your copy of gometalinter
go get -u github.com/alecthomas/gometalinter

# install updates and misspell
gometalinter --install --update
```

To use it, just enable `misspell`:

```
gometalinter --enable misspell ./...
```

Note that gometalinter only checks golang files, and uses the default options
of `misspell`.

You may wish to run `misspell` on your plaintext (.txt) and/or markdown files too.

<a name="csv"></a>
### How Can I Get CSV Output?

Using `-f csv`, the output is standard comma-separated values with headers in the first row.

```
misspell -f csv *
file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language
```

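If you want to post-process the CSV in a program, the sketch below reads output shaped like the sample above with Go's standard `encoding/csv` package and counts how often each typo occurs. The `countTypos` helper is a hypothetical name, and the column order is assumed to match the header row shown above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// countTypos reads misspell-style CSV (header row first) and counts how
// often each typo appears. Hypothetical helper for illustration.
func countTypos(data string) (map[string]int, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for i, row := range rows {
		if i == 0 || len(row) < 4 {
			continue // skip the header row and rows too short to hold a typo column
		}
		counts[row[3]]++ // assumed layout: file,line,column,typo,corrected
	}
	return counts, nil
}

func main() {
	data := `file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language
`
	counts, err := countTypos(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["langauge"]) // prints 2
}
```
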
<a name="sqlite"></a>
### How can I export to SQLite3?

Using `-f sqlite`, the output is a [sqlite3](https://www.sqlite.org/index.html) dump-file.

```bash
$ misspell -f sqlite * > /tmp/misspell.sql
$ cat /tmp/misspell.sql

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE misspell(
  "file" TEXT,
  "line" INTEGER,
  "column" INTEGER,
  "typo" TEXT,
  "corrected" TEXT
);
INSERT INTO misspell VALUES("install.txt",202,31,"immediatly","immediately");
# etc...
COMMIT;
```

```bash
$ sqlite3 -init /tmp/misspell.sql :memory: 'select count(*) from misspell'
1
```

With some tricks you can directly pipe output to sqlite3 by using `-init /dev/stdin`:

```
misspell -f sqlite * | sqlite3 -init /dev/stdin -column -cmd '.width 60 15' ':memory:' \
    'select substr(file,35),typo,count(*) as count from misspell group by file, typo order by count desc;'
```

<a name="ignore"></a>
### How can I ignore rules?

Using the `-i "comma,separated,rules"` flag you can specify corrections to ignore.

For example, if you were to run `misspell -w -error -source=text` against a document that contains the string `Guy Finkelshteyn Braswell`, misspell would change the text to `Guy Finkelstheyn Bras well`. You can then
determine the rules to ignore by reverting the change and running it again with the `-debug` flag. You can then see
that the corrections were `htey -> they` and `aswell -> as well`. To ignore these two rules, you add `-i "htey,aswell"` to
your command. With debug mode on, you can see it print the corrections, but it will no longer make them.

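The effect of the ignore list can be sketched in a few lines of Go. This is an illustration only: the two-entry `rules` table and the `newReplacer` helper are hypothetical stand-ins, and the standard `strings.Replacer` is used in place of misspell's own matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// rules is a tiny stand-in for misspell's word list (typo, correction).
var rules = [][2]string{
	{"htey", "they"},
	{"aswell", "as well"},
}

// newReplacer builds a replacer from rules, skipping any typo named in
// the comma-separated ignore list. This mirrors the idea behind -i.
func newReplacer(ignore string) *strings.Replacer {
	skip := make(map[string]bool)
	for _, w := range strings.Split(ignore, ",") {
		skip[strings.TrimSpace(w)] = true
	}
	var pairs []string
	for _, r := range rules {
		if !skip[r[0]] {
			pairs = append(pairs, r[0], r[1])
		}
	}
	return strings.NewReplacer(pairs...)
}

func main() {
	name := "Guy Finkelshteyn Braswell"
	fmt.Println(newReplacer("").Replace(name))            // Guy Finkelstheyn Bras well
	fmt.Println(newReplacer("htey,aswell").Replace(name)) // Guy Finkelshteyn Braswell
}
```

Note that the ignore list names the typo side of each rule (`htey`, `aswell`), not the word in your document that triggered it.
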
<a name="output"></a>
### How can I change the output format?

Using the `-f template` flag you can pass in a
[golang text template](https://golang.org/pkg/text/template/) to format the output.

One can use `printf "%q" VALUE` to safely quote a value.

The default template is compatible with [gometalinter](https://github.com/alecthomas/gometalinter):

```
{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to "{{ printf "%q" .Corrected }}"
```

To just print probable misspellings:

```
-f '{{ .Original }}'
```

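To try a template before passing it to `-f`, you can render it against sample data with Go's `text/template` package. The sketch below assumes only the five field names that appear in the templates above; the `diff` struct and `render` helper are hypothetical and are not misspell's real types.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// diff mirrors the field names used in the templates above. The struct
// itself is a hypothetical stand-in, not misspell's real type.
type diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// render formats one diff with the given template text.
func render(format string, d diff) (string, error) {
	t, err := template.New("out").Parse(format)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	d := diff{Filename: "your.txt", Line: 42, Column: 10, Original: "langauge", Corrected: "language"}
	format := `{{ .Filename }}:{{ .Line }}:{{ .Column }}: {{ printf "%q" .Original }} -> {{ printf "%q" .Corrected }}`
	out, err := render(format, d)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // your.txt:42:10: "langauge" -> "language"
}
```
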
<a name="problem"></a>
### What problem does this solve?

This corrects commonly misspelled English words in computer source
code, and other text-based formats (`.txt`, `.md`, etc).

It is designed to run quickly so it can be
used as a [pre-commit hook](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks)
with minimal burden on the developer.

It does not work with binary formats (e.g. Word, etc).

It is not a complete spell-checking program nor a grammar checker.

<a name="others"></a>
### What are other misspelling correctors and what's wrong with them?

Some other misspelling correctors:

* https://github.com/vlajos/misspell_fixer
* https://github.com/lyda/misspell-check
* https://github.com/lucasdemarchi/codespell

They all work but had problems that prevented me from using them at scale:

* slow: all of the above check one misspelling at a time (i.e. linear) using regexps
* not MIT/Apache2 licensed (or equivalent)
* have dependencies that don't work for me (python3, bash, linux sed, etc)
* don't understand American vs. British English and sometimes make unwelcome "corrections"

That said, they might be perfect for you and many have more features
than this project!

<a name="performance"></a>
### How fast is it?

Misspell is easily 100x to 1000x faster than other spelling correctors. You
should be able to check and correct 1000 files in under 250ms.

This uses the mighty power of golang's
[strings.Replacer](https://golang.org/pkg/strings/#Replacer) which is
an implementation or variation of the
[Aho–Corasick algorithm](https://en.wikipedia.org/wiki/Aho–Corasick_algorithm).
This makes multiple substring matches *simultaneously*.

In addition, this uses multiple CPU cores to work on multiple files.

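The two ideas above, one multi-pattern replacer plus a pool of workers, can be sketched as follows. This is a rough illustration, not misspell's code: it uses the standard `strings.Replacer` rather than the project's modified copy, a three-entry word list, and a hypothetical `correctAll` helper that works on in-memory strings instead of files. Treating a worker count of 0 as "number of CPUs" follows the `-j` flag description.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// replacer matches every typo in a single pass over the input. A
// *strings.Replacer is safe for concurrent use by multiple goroutines.
var replacer = strings.NewReplacer(
	"langauge", "language",
	"zeebra", "zebra",
	"immediatly", "immediately",
)

// correctAll fixes each input using a fixed pool of workers and returns
// the results in input order. Hypothetical helper for illustration.
func correctAll(inputs []string, workers int) []string {
	if workers <= 0 {
		workers = runtime.NumCPU() // 0 means "use every CPU"
	}
	out := make([]string, len(inputs))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = replacer.Replace(inputs[i]) // each index is written exactly once
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	docs := []string{"a zeebra", "the Go langauge", "fix it immediatly"}
	for _, d := range correctAll(docs, 2) {
		fmt.Println(d)
	}
}
```

Because the replacer is built once and shared, adding more words to the list does not add more passes over each input.
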
<a name="issues"></a>
### What problems does it have?

Unlike the other projects, this doesn't know what a "word" is. There may be
more false positives and false negatives due to this. On the other hand, it
sometimes catches things others don't.

Either way, please file bugs and we'll fix them!

Since it operates in parallel to make corrections, it can be non-obvious to
determine exactly what word was corrected.

<a name="debug"></a>
### It's making mistakes. How can I debug?

Run using `-debug` flag on the file you want. It should then print what word
it is trying to correct. Then [file a
bug](https://github.com/client9/misspell/issues) describing the problem.
Thanks!

<a name="missing"></a>
### Why is it making mistakes or missing items in golang files?

The matching function is *case-sensitive*, so variable names that are multiple
words either in all-upper or all-lower case sometimes can cause false
positives. For instance a variable named `bodyreader` could trigger a false
positive since `yrea` is in the middle that could be corrected to `year`.
Other problems happen if the variable name uses an English contraction that
should use an apostrophe. The best way of fixing this is to use the
[Effective Go naming
conventions](https://golang.org/doc/effective_go.html#mixed-caps) and use
[camelCase](https://en.wikipedia.org/wiki/CamelCase) for variable names. You
can check your code using [golint](https://github.com/golang/lint).

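The `bodyreader` case is easy to reproduce with a plain case-sensitive substring replacement. The sketch below assumes a single rule, `yrea` to `year`, taken from the example above; the `fix` helper is hypothetical and uses the standard `strings.Replacer`.

```go
package main

import (
	"fmt"
	"strings"
)

// fix applies one case-sensitive rule, "yrea" -> "year", the way a
// plain substring replacer would. Illustrative only.
func fix(s string) string {
	return strings.NewReplacer("yrea", "year").Replace(s)
}

func main() {
	fmt.Println(fix("bodyreader")) // bodyearder: false positive inside the name
	fmt.Println(fix("bodyReader")) // bodyReader: "yRea" does not match "yrea"
}
```

The camelCase spelling escapes the rule only because the capital letter breaks the lowercase match, which is why following the naming conventions avoids this class of false positive.
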
<a name="license"></a>
### What license is this?

The main code is [MIT](https://github.com/client9/misspell/blob/master/LICENSE).

Misspell also makes use of the Golang standard library and contains a modified version of Golang's [strings.Replacer](https://golang.org/pkg/strings/#Replacer),
which are covered under a [BSD License](https://github.com/golang/go/blob/master/LICENSE). Type `misspell -legal` for more details or see [legal.go](https://github.com/client9/misspell/blob/master/legal.go).

<a name="words"></a>
### Where do the word lists come from?

It started with a word list from
[Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines).
Unfortunately, this list had to be highly edited as many of the words are
obsolete or based on mistakes made on mechanical typewriters (I'm guessing).

Additional words were added based on actual mistakes seen in
the wild (meaning self-generated).

Variations of UK and US spellings are based on many sources including:

* http://www.tysto.com/uk-us-spelling-list.html (with heavy editing, many are incorrect)
* http://www.oxforddictionaries.com/us/words/american-and-british-spelling-american (excellent site but incomplete)
* Diffing US and UK [scowl dictionaries](http://wordlist.aspell.net)

American English is more accepting of spelling variations than is British
English, so "what is American or not" is subject to opinion. Corrections and help welcome.

<a name="otherideas"></a>
### What are some other enhancements that could be done?

Here are some ideas for enhancements:

*Capitalization of proper nouns* could be done (e.g. weekday and month names, country names, language names).

*Opinionated US spellings* US English has a number of words with alternate
spellings. Think [adviser vs.
advisor](http://grammarist.com/spelling/adviser-advisor/). While "advisor" is not wrong, the opinionated US
locale would correct "advisor" to "adviser".

*Versioning* Some type of versioning is needed so reporting mistakes and errors is easier.

*Feedback* Mistakes would be sent to some server for aggregation and feedback review.

*Contractions and Apostrophes* This would optionally correct "isnt" to
"isn't", etc.