* [feature] Block Google Bard/AI crawlers
* [feature] Block the other OpenAI crawler
* [feature] Block Common Crawl crawler
This is used in research, but also gleefully advertises itself as the
training source used in all LLMs and GPT-3.
Fixes: #2240
* [feature] Block Omgilikebot
Used by some shady big web data engine company.
* [feature] Block Meta's language model crawler
* [feature] Block well-known.dev crawler
* use minID properly for public timeline
* return paged response properly even when 0 items
* use gtserror
* page more consistently (for now)
* test
* aaa
* update typeconverter to use state structure
* deinterface the typeutils.TypeConverter -> typeutils.Converter
* finish copying over old type converter code comments
* fix cherry-pick merge issues, fix tests pointing to old typeutils interface type still
* love like winter! wohoah, wohoah
* domain allow side effects
* tests! logging! unallow!
* document federation modes
* linty linterson
* test
* further adventures in documentation
* finish up domain block documentation (i think)
* change wording a wee little bit
* docs, example
* consolidate shared domainPermission code
* call mode once
* fetch federation mode within domain blocked func
* read domain perm import in streaming manner
* don't use pointer to slice for domain perms
* don't bother copying blocks + allows before deleting
* admonish!
* change wording just a scooch
* update docs
* [feature] Support Actor URIs for webfinger queries
It's now possible to pass an Actor URI as the resource to query for when
doing a webfinger query. The code now extracts the username and domain
from the URI. The URI needs to be fully qualified, including having a
scheme of http or https to be recognised as such.
The acct scheme is handled as we used to, including dealing with an
erroneous leading @ on the username. We retain the ability to handle
resources without a scheme by parsing them again with the acct scheme if
the original parse failed. This can happen due to parsing ambiguities
when dealing with a string like user@domain.tld:port.
* [bugfix] Remove debugging changes
* [chore] Make TestExtractNamestring table-driven
* [chore] Unnest Trim and Split for readability
* [feature] Add http trace exporter, drop Jaeger
Jaeger supports ingesting traces using the OpenTelemetry gRPC or HTTP
methods. The Jaeger project has deprecated the old jaeger transport.
* Add support for submitting traces over HTTP
* Drop support for the old Jaeger protocol
* Upgrade the trace libraries to v1.17
Fixes: #2176Fixes: #2179
c.FullPath() is the empty string if a request doesn't match any route on
our mux. In those cases, there's no value in emitting a trace. The trace
will be empty, containing no other information beyond the fact that we
didn't match a route. Since Gin breaks off the processing early we don't
need to trace this request as it won't do anything and consumes no
further resources.
The 404 will still be emitted by our logs and will be visible from a
reverse proxy too.
* move SQLite pragmas into connection string
Signed-off-by: kim <grufwub@gmail.com>
* use url.Values type for SQLite connection preferences
Signed-off-by: kim <grufwub@gmail.com>
* set SQLite URI prefs properly using _pragma query key
Signed-off-by: kim <grufwub@gmail.com>
* add notes on SQLite connection preferences
Signed-off-by: kim <grufwub@gmail.com>
* fix typo
Signed-off-by: kim <grufwub@gmail.com>
* add one extra line regarding connection pooling
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* wrap bun.Tx to add our own error processing
Signed-off-by: kim <grufwub@gmail.com>
* add compile-time check for updateRowError() compatibility with sql.Row, fix wrapTx() not being used properly
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* [feature] list commands for both attachment and emojis
* use fewer commands, provide `local-only` and `remote-only` as filters
* envparsing
---------
Co-authored-by: Romain de Laage <romain.delaage@rdelaage.ovh>
Co-authored-by: tsmethurst <tobi.smethurst@protonmail.com>
* [feature] Don't emit timestamp in log lines
When running gotosocial with a service manager like systemd, or a
container runtime, the associated log driver usually emits timestamps
itself. In those cases, having the extra timestamp from our own log
lines ends up being a bit noisy and when centrally ingesting logs is
duplicate information.
This introduces a configuration flag that allows disabling emitting the
timestamp. It's only wired up for "daemonised" processes, meaning server
and testrig.
* [chore] Add docs for log-timestamp
* [feature] Simplify timestamp handling
Co-Authored-By: kim <89579420+NyaaaWhatsUpDoc@users.noreply.github.com>
* [chore] Less escaped double-quotes
* [chore] Fix help string
---------
Co-authored-by: kim <89579420+NyaaaWhatsUpDoc@users.noreply.github.com>
* init instance rules database model, admin api
* expose instance rules in public instance api
* public /api/v1/instance/rules route
* GET ruleById
* createRule route
* createRule auth check
* updateRule
* deleteRule
* list rules on about page
* ruleGet auth
* add about page ids for anchors
* process and store adding violated rules to reports
* admin api models for instance rules
* instance rule edit frontend
* change rule inputs to textareas
* database fixes after rebase (#2124)
* remove unused imports
* fix db migration column name
* fix tests
* fix more tests
* fix postgres error with wrongly used Ident
* add some tests, fiddle with rule model a bit, fix postgres migration
* swagger docs
---------
Co-authored-by: tsmethurst <tobi.smethurst@protonmail.com>
* use calculated exampleTime instead of `time.Now()` to ensure no locale data, retweak cache ratios
* update envparsing test
* update default cache memory to 100MiB
* fix envparsing with latest cache target default
---------
Signed-off-by: kim <grufwub@gmail.com>
This adds the CSP header with a policy of only loading from the same
domain. We don't make use of external media, CSS, JS, fonts, so we don't
ever need external data loaded in our context.
When building a DEBUG build, the policy gets extended to include
localhost:*, i.e localhost on any port. This keeps the live-reloading
flow for JS development working. localhost and 127.0.0.1 are considered
to be the same so mixing and matching those doesn't result in a CSP
violation.
* Add/update some DB functions.
* move async workers into subprocessor
* rename FromFederator -> FromFediAPI
* update home timeline check to include check for current status first before moving to parent status
* change streamMap to pointer to mollify linter
* update followtoas func signature
* fix merge
* remove errant debug log
* don't use separate errs.Combine() check to wrap errs
* wrap parts of workers functionality in sub-structs
* populate report using new db funcs
* embed federator (tiny bit tidier)
* flesh out error msg, add continue(!)
* fix other error messages to be more specific
* better, nicer
* give parseURI util function a bit more util
* missing headers
* use pointers for subprocessors
* Allow full BCP 47 in language inputs
Fixes#2066
* Fuse validation and normalization for languages
* Remove outdated comment line
* Move post language canonicalization test
* [chore] Remove go-playground/validator
It turns out we're not actually using the validator code. This is a
remnant from when we intended to use it, but the presence of it and its
struct tags creates the illusion we're validating a lot of things we're
not. It resulted in some confusion when we were trying to figure out
language valdiation.
Remove all this code, so that only the validation functions from the
validate package we actually use remain. I'm not touching the struct
tags in the migrations in order to avoid things potentially thinking
migrations need to be re-run.
* [chore] Bring back a struct tag on api
The validate on internal/api is Gin doing form validation, not the
validator from go-playground/validator.
* add automatic cache max size generation based on ratios of a singular fixed memory target
Signed-off-by: kim <grufwub@gmail.com>
* remove now-unused cache max-size config variables
Signed-off-by: kim <grufwub@gmail.com>
* slight ratio tweak
Signed-off-by: kim <grufwub@gmail.com>
* remove unused visibility config var
Signed-off-by: kim <grufwub@gmail.com>
* add secret little ratio config trick
Signed-off-by: kim <grufwub@gmail.com>
* fixed a word
Signed-off-by: kim <grufwub@gmail.com>
* update cache library to remove use of TTL in result caches + slice cache
Signed-off-by: kim <grufwub@gmail.com>
* update other cache usages to use correct interface
Signed-off-by: kim <grufwub@gmail.com>
* update example config to explain the cache memory target
Signed-off-by: kim <grufwub@gmail.com>
* update env parsing test with new config values
Signed-off-by: kim <grufwub@gmail.com>
* do some ratio twiddling
Signed-off-by: kim <grufwub@gmail.com>
* add missing header
* update envparsing with latest defaults
Signed-off-by: kim <grufwub@gmail.com>
* update size calculations to take into account result cache, simple cache and extra map overheads
Signed-off-by: kim <grufwub@gmail.com>
* tweak the ratios some more
Signed-off-by: kim <grufwub@gmail.com>
* more nan rampaging
Signed-off-by: kim <grufwub@gmail.com>
* fix envparsing script
Signed-off-by: kim <grufwub@gmail.com>
* update cache library, add sweep function to keep caches trim
Signed-off-by: kim <grufwub@gmail.com>
* sweep caches once a minute
Signed-off-by: kim <grufwub@gmail.com>
* add a regular job to sweep caches and keep under 80% utilisation
Signed-off-by: kim <grufwub@gmail.com>
* remove dead code
Signed-off-by: kim <grufwub@gmail.com>
* add new size library used to libraries section of readme
Signed-off-by: kim <grufwub@gmail.com>
* add better explanations for the mem-ratio numbers
Signed-off-by: kim <grufwub@gmail.com>
* update go-cache
Signed-off-by: kim <grufwub@gmail.com>
* library version bump
Signed-off-by: kim <grufwub@gmail.com>
* update cache.result{} size model estimation
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* update DeleteEmoji to use faster relational tables for status / account finding
Signed-off-by: kim <grufwub@gmail.com>
* update Get{Accounts,Statuses}UsingEmoji() to also use relational tables
Signed-off-by: kim <grufwub@gmail.com>
* remove the now unneeded tags relation from newStatusQ()
Signed-off-by: kim <grufwub@gmail.com>
* fix table names
Signed-off-by: kim <grufwub@gmail.com>
* fix account and status selects using emojis
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* update go-fed
* do the things
* remove unused columns from tags
* update to latest lingo from main
* further tag shenanigans
* serve stub page at tag endpoint
* we did it lads
* tests, oh tests, ohhh tests, oh tests (doo doo doo doo)
* swagger docs
* document hashtag usage + federation
* instanceGet
* don't bother parsing tag href
* rename whereStartsWith -> whereStartsLike
* remove GetOrCreateTag
* dont cache status tag timelineability
* Support setting private notes on accounts
* Reformat comment whitespace
* Add missing license headers
* Use apiutil.ParseID
* Rename Note model and cache to AccountNote
* Update golden cache config in test/envparsing.sh
* Rename gtsmodel/note.go to gtsmodel/accountnote.go
* Update AccountNote uniqueness constraint name
Now has same prefix as other indexes on this table.
---------
Co-authored-by: tobi <31960611+tsmethurst@users.noreply.github.com>
* catch SQLITE_BUSY errors, wrap bun.DB to use our own busy retrier, remove unnecessary db.Error type
Signed-off-by: kim <grufwub@gmail.com>
* remove dead code
Signed-off-by: kim <grufwub@gmail.com>
* remove more dead code, add missing error arguments
Signed-off-by: kim <grufwub@gmail.com>
* update sqlite to use maxOpenConns()
Signed-off-by: kim <grufwub@gmail.com>
* add uncommitted changes
Signed-off-by: kim <grufwub@gmail.com>
* use direct calls-through for the ConnIface to make sure we don't double query hook
Signed-off-by: kim <grufwub@gmail.com>
* expose underlying bun.DB better
Signed-off-by: kim <grufwub@gmail.com>
* retry on the correct busy error
Signed-off-by: kim <grufwub@gmail.com>
* use longer possible maxRetries for db retry-backoff
Signed-off-by: kim <grufwub@gmail.com>
* remove the note regarding max-open-conns only applying to postgres
Signed-off-by: kim <grufwub@gmail.com>
* improved code commenting
Signed-off-by: kim <grufwub@gmail.com>
* remove unnecessary infof call (just use info)
Signed-off-by: kim <grufwub@gmail.com>
* rename DBConn to WrappedDB to better follow sql package name conventions
Signed-off-by: kim <grufwub@gmail.com>
* update test error string checks
Signed-off-by: kim <grufwub@gmail.com>
* shush linter
Signed-off-by: kim <grufwub@gmail.com>
* update backoff logic to be more transparent
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
The old default of 30d can lead to a lot of media getting cached and
significant disk usage, even on small or single person instances. A lot
of deployments decrease this value, to 15 or even less. This is less of
an issue when using object storage, but for local storage which is the
more popular deployment option running out of disk space is unpleasant.
With GoToSocial's aim to fit in small places, this changes the default
to a much more conservative 7 days. In all likelihood people aren't
scrolling that far back in their timeline so this change shouldn't
result in any issue. Existing deployments will only be affected by
this change if the admin hasn't already configured this value, or didn't
bootstrap from the example configuration.
* [bugfix] Set Vary header correctly on cache-control
* Prefer activitypub types on AP endpoints
* use immutable on file server, vary by range
* vary auth on Accept
* Set default value of SMTPFrom to empty string
This parameter should contain proper e-mail address (to be provided by user during configuration).
* Update default values in example/config.yaml
Default values and related comments in example/config.yaml are aligned
with values defined in internal/config/defaults.go.
Small improvements to foramting of config.yaml file.
* Add default value for AdvancedThrottlingRetryAfter to internal/config/defaults.go
AdvancedThrottlingRetryAfter was introduced in 70739d3 (superseriousbusiness/gotosocial#1466).
* Update config.yaml snippets in documentation
This makes the serveFileRange function return the entire file
if suffix-range is larger than content-length in compliance with RFC9110
Co-authored-by: mae <git@badat.dev>
* [bugfix] Tidy up rss feed serving; don't error on empty feed
* fall back to account creation time as rss feed update time
* return feed early when account has no eligible statuses
For some reason we hit the case in CI where the
TestFingerWithHostMetaCacheStrategy seems to experience some time
dilation. It's possible this is a genuine bug, but I can't for the life
of me reproduce it locally, even after having run this test thousands of
times (-count=1000 when invoking go test etc.)
This changes the test to explicitly stop the webfinger cache, set TTL
and Sweep frequency to something well beyond the lifetime of the cache
during the test and then starts the cache again. Hopefully that does it,
because the other option that remains is that for some reason
timekeeping in CI/Docker is not as precise as when running the test on a
host.
* update go-cache library
Signed-off-by: kim <grufwub@gmail.com>
* fix broken test after cache library upgrade
Signed-off-by: kim <grufwub@gmail.com>
* fix the webfinger test
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* get max emoji size from instance settings
* expose (hardcoded) max amount of profile fields in instance api
* basic profile field setting
* fix profile field hook structure for updates
* *twirls mustache* fix ze tests
---------
Co-authored-by: tsmethurst <tobi.smethurst@protonmail.com>
* [bugfix] Fix first item of thread dereferencing always being skipped
* tweak to status descendant item iteration
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
Co-authored-by: kim <grufwub@gmail.com>
* only attempt to populate account/statuses from DB if already up-to-date
Signed-off-by: kim <grufwub@gmail.com>
* add missing status is-up-to-date check :grimace: + ensure populated if so
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* media manager tidy-up: de-interface and remove unused PostDataFunc
Signed-off-by: kim <grufwub@gmail.com>
* remove last traces of media.Manager being an interface
Signed-off-by: kim <grufwub@gmail.com>
* update error to provide caller, allow tuneable via build tags
Signed-off-by: kim <grufwub@gmail.com>
* remove kim-specific build script changes
Signed-off-by: kim <grufwub@gmail.com>
* fix merge conflicts
Signed-off-by: kim <grufwub@gmail.com>
* update build-script to support externally setting build variables
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* start working on lists
* further list work
* test list db functions nicely
* more work on lists
* peepoopeepoo
* poke
* start list timeline func
* we're getting there lads
* couldn't be me working on stuff... could it?
* hook up handlers
* fiddling
* weeee
* woah
* screaming, pissing
* fix streaming being a whiny baby
* lint, small test fix, swagger
* tidying up, testing
* fucked! by the linter
* move timelines to state like a boss
* add timeline start to tests using state
* invalidate lists
* add back removed ValidateRequest() before backoff-retry loop
Signed-off-by: kim <grufwub@gmail.com>
* include response body in error response log
Signed-off-by: kim <grufwub@gmail.com>
* improved error response body draining
Signed-off-by: kim <grufwub@gmail.com>
* add more code commenting
Signed-off-by: kim <grufwub@gmail.com>
* move new error response logic to gtserror, handle instead in transport.Transport{} impl
Signed-off-by: kim <grufwub@gmail.com>
* appease ye oh mighty linter
Signed-off-by: kim <grufwub@gmail.com>
* fix mockhttpclient not setting request in http response
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
Every now and then the TestFingerWithHostMetaCacheStrategy would fail on
a time related error. I suspect suite.Equal doesn't quite work as
expected when given two time.Time's, so instead explicitly check with
the time.Equal.
* [bugfix] Fix NegotiateAccept with multi accept
There's a bug in Gin's NegotiateFormat that doesn't handle the presence
of multilpe accept headers. This lifts the code from the PR @tsmethurst
sent a year ago to Gin into our codebase to fix the issue.
* [bugfix] Concat accept header in webfinger
Some implementations bug out when there's multiple accept headers,
including Gin (see 7050112af1). But things
seem to work reliably with a single accept header with multiple parts.
Fixes: #1793
* remove info banner
* update swagger definition for AccountAction
* basic user view, suspend action
* clean up suspended user display
* basic user searching
* rename User -> Account for clarity
* refactor error boundary component to give better info
* appease the linter
* revamp http client to not limit requests, instead use sender worker
Signed-off-by: kim <grufwub@gmail.com>
* remove separate sender worker pool, spawn 2*GOMAXPROCS batch senders each time, no need for transport cache sweeping
Signed-off-by: kim <grufwub@gmail.com>
* improve batch senders to keep popping recipients until remote URL found
Signed-off-by: kim <grufwub@gmail.com>
* fix recipient looping issue
Signed-off-by: kim <grufwub@gmail.com>
* move request id ctx key to gtscontext, finish filling out more code comments, add basic support for not logging client IP
Signed-off-by: kim <grufwub@gmail.com>
* first draft of status refetching logic
Signed-off-by: kim <grufwub@gmail.com>
* fix testrig to use new federation alloc func signature
Signed-off-by: kim <grufwub@gmail.com>
* fix log format directive
Signed-off-by: kim <grufwub@gmail.com>
* add status fetched_at migration
Signed-off-by: kim <grufwub@gmail.com>
* remove unused / unchecked for error types
Signed-off-by: kim <grufwub@gmail.com>
* add back the used type...
Signed-off-by: kim <grufwub@gmail.com>
* add separate internal getStatus() function for derefThread() that doesn't recurse
Signed-off-by: kim <grufwub@gmail.com>
* improved mention and media attachment error handling
Signed-off-by: kim <grufwub@gmail.com>
* fix log and error format directives
Signed-off-by: kim <grufwub@gmail.com>
* update account deref to match status deref changes
Signed-off-by: kim <grufwub@gmail.com>
* very small code formatting change to make things clearer
Signed-off-by: kim <grufwub@gmail.com>
* add more code comments
Signed-off-by: kim <grufwub@gmail.com>
* improved code commenting
Signed-off-by: kim <grufwub@gmail.com>
* only check for required further derefs if needed
Signed-off-by: kim <grufwub@gmail.com>
* improved cache invalidation
Signed-off-by: kim <grufwub@gmail.com>
* tweak cache restarting to use a (very small) backoff
Signed-off-by: kim <grufwub@gmail.com>
* small readability changes and fixes
Signed-off-by: kim <grufwub@gmail.com>
* fix account sync issues
Signed-off-by: kim <grufwub@gmail.com>
* fix merge conflicts + update account enrichment to accept already-passed accountable
Signed-off-by: kim <grufwub@gmail.com>
* remove secondary function declaration
Signed-off-by: kim <grufwub@gmail.com>
* normalise dereferencer get status / account behaviour, fix remaining tests
Signed-off-by: kim <grufwub@gmail.com>
* fix remaining rebase conflicts, finish commenting code
Signed-off-by: kim <grufwub@gmail.com>
* appease the linter
Signed-off-by: kim <grufwub@gmail.com>
* add source file header
Signed-off-by: kim <grufwub@gmail.com>
* update to use TIMESTAMPTZ column type instead of just TIMESTAMP
Signed-off-by: kim <grufwub@gmail.com>
* don't pass in 'updated_at' to UpdateEmoji()
Signed-off-by: kim <grufwub@gmail.com>
* use new ap.Resolve{Account,Status}able() functions
Signed-off-by: kim <grufwub@gmail.com>
* remove the somewhat confusing rescoping of the same variable names
Signed-off-by: kim <grufwub@gmail.com>
* update migration file name, improved database delete error returns
Signed-off-by: kim <grufwub@gmail.com>
* formatting
Signed-off-by: kim <grufwub@gmail.com>
* improved multi-delete database functions to minimise DB calls
Signed-off-by: kim <grufwub@gmail.com>
* remove unused type
Signed-off-by: kim <grufwub@gmail.com>
* fix delete statements
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* update go-cache version to support multi-keying
Signed-off-by: kim <grufwub@gmail.com>
* improved cache invalidation
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
It turns out that in Masto v2.3.0 the languages key was added to the V1
Instance and that it's effectively mandatory. Though in GtS we don't
really have this concept yet, some apps will explode if the languages
key is missing altogether. So at least return the empty array on V1 too
in the hopes that it makes things work well enough.
For history's sake, you can see the attributes that will get serialised
in
f877aa9d70/app/serializers/rest/v1/instance_serializer.rb (L6-L9).
Because the attribute does not have a conditional defined for it,
there isn't a filter that optionally omits it, or a def languages to
modify the behaviour the attribute is effectively always included and
serialised.
Fixes: #1662
This adds the preferences endpoint to our Mastodon Client API
implementation. It's a read-only endpoint that returns a number of
user preferences. Applications can query these settings when logging in
a user (for the first time) to configure themselves.
* replace domain block cache with an in-memory radix tree
Signed-off-by: kim <grufwub@gmail.com>
* fix domain block cache init
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>
* check for tls, x509 errors using string.Contains() since crypto/tls sucks
Signed-off-by: kim <grufwub@gmail.com>
* use 2* maxprocs
Signed-off-by: kim <grufwub@gmail.com>
---------
Signed-off-by: kim <grufwub@gmail.com>