Commit graph

10 commits

Author SHA1 Message Date
Gusted 5b3a82d621
[FEAT] Enable ambiguous character detection in configured contexts
- The ambiguous character detection is an important security feature to
combat against sourcebase attacks (https://trojansource.codes/).
- However there are a few problems with the feature as it stands
today (i) it's apparantly an big performance hitter, it's twice as slow
as syntax highlighting (ii) it contains false positives, because it's
reporting valid problems but not valid within the context of a
programming language (ambiguous charachters in code comments being a
prime example) that can lead to security issues (iii) charachters from
certain languages always being marked as ambiguous. It's a lot of effort
to fix the aforementioned issues.
- Therefore, make it configurable in which context the ambiguous
character detection should be run, this avoids running detection in all
contexts such as file views, but still enable it in commits and pull
requests diffs where it matters the most. Ideally this also becomes an
per-repository setting, but the code architecture doesn't allow for a
clean implementation of that.
- Adds unit test.
- Adds integration tests to ensure that the contexts and instance-wide
is respected (and that ambigious charachter detection actually work in
different places).
- Ref: https://codeberg.org/forgejo/forgejo/pulls/2395#issuecomment-1575547
- Ref: https://codeberg.org/forgejo/forgejo/issues/564
2024-02-23 13:12:17 +01:00
wxiaoguang 20929edc99
Add option to disable ambiguous unicode characters detection (#28454)
* Close #24483
* Close #28123
* Close #23682
* Close #23149

(maybe more)
2023-12-17 14:38:54 +00:00
crystal 5d9c64b3fe
Fix line spacing for plaintext previews (#22699)
Adding `<br>` between each line is not necessary since the entire file
is rendered inside a `<pre>`

fixes https://codeberg.org/Codeberg/Community/issues/915
2023-02-01 22:51:02 -06:00
zeripath 6e22605793
Ensure that plain files are rendered correctly even when containing ambiguous characters (#22017)
As recognised in #21841 the rendering of plain text files is somewhat
incorrect when there are ambiguous characters as the html code is double
escaped. In fact there are several more problems here.

We have a residual isRenderedHTML which is actually simply escaping the
file - not rendering it. This is badly named and gives the wrong
impression.

There is also unusual behaviour whether the file is called a Readme or
not and there is no way to get to the source code if the file is called
README.

In reality what should happen is different depending on whether the file
is being rendered a README at the bottom of the directory view or not.

1. If it is rendered as a README on a directory - it should simply be
escaped and rendered as `<pre>` text.
2. If it is rendered as a file then it should be rendered as source
code.

This PR therefore does:
1. Rename IsRenderedHTML to IsPlainText
2. Readme files rendered at the bottom of the directory are rendered
without line numbers
3. Otherwise plain text files are rendered as source code.

Replace #21841

Signed-off-by: Andrew Thornton <art27@cantab.net>

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2022-12-17 22:22:25 +02:00
flynnnnnnnnnn e81ccc406b
Implement FSFE REUSE for golang files (#21840)
Change all license headers to comply with REUSE specification.

Fix #16132

Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
2022-11-27 18:20:29 +00:00
zeripath 99efa02edf
Switch Unicode Escaping to a VSCode-like system (#19990)
This PR rewrites the invisible unicode detection algorithm to more
closely match that of the Monaco editor on the system. It provides a
technique for detecting ambiguous characters and relaxes the detection
of combining marks.

Control characters are in addition detected as invisible in this
implementation whereas they are not on monaco but this is related to
font issues.

Close #19913

Signed-off-by: Andrew Thornton <art27@cantab.net>
2022-08-13 19:32:34 +01:00
Wim cb50375e2b
Add more linters to improve code readability (#19989)
Add nakedret, unconvert, wastedassign, stylecheck and nolintlint linters to improve code readability

- nakedret - https://github.com/alexkohler/nakedret - nakedret is a Go static analysis tool to find naked returns in functions greater than a specified function length.
- unconvert - https://github.com/mdempsky/unconvert - Remove unnecessary type conversions
- wastedassign - https://github.com/sanposhiho/wastedassign -  wastedassign finds wasted assignment statements.
- notlintlint -  Reports ill-formed or insufficient nolint directives
- stylecheck - https://staticcheck.io/docs/checks/#ST - keep style consistent
  - excluded: [ST1003 - Poorly chosen identifier](https://staticcheck.io/docs/checks/#ST1003) and [ST1005 - Incorrectly formatted error string](https://staticcheck.io/docs/checks/#ST1005)
2022-06-20 12:02:49 +02:00
Gusted bf2867dec2
Don't treat BOM escape sequence as hidden character. (#18909)
* Don't treat BOM escape sequence as hidden character.

- BOM sequence is a common non-harmfull escape sequence, it shouldn't be
shown as hidden character.
- Follows GitHub's behavior.
- Resolves #18837

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
2022-02-26 16:48:23 +00:00
zeripath 4b3ebda0e7
Fix panic in EscapeReader (#18820)
There is a potential panic due to a mistaken resetting of the length parameter when
multibyte characters go over a read boundary.

Signed-off-by: Andrew Thornton <art27@cantab.net>
2022-02-19 15:25:31 +00:00
zeripath 21ed4fd8da
Add warning for BIDI characters in page renders and in diffs (#17562)
Fix #17514

Given the comments I've adjusted this somewhat. The numbers of characters detected are increased and include things like the use of U+300 to make à instead of à and non-breaking spaces.

There is a button which can be used to escape the content to show it.

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Gwyneth Morgan <gwymor@tilde.club>
Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
2022-01-07 02:18:52 +01:00