shamoon
9ed9dbb369
Fix: ghostscript rendering error doesn't trigger frontend failure message (#4092)
* Raise ParseError from gs rendering error
* catch all parser errors as generic exception
* Differentiate generic vs parse errors during consumption
2023-08-31 19:49:00 -07:00
Dennis Brakhane
a1f2ac43f3
Don't consider better OCR as failing
Tesseract 5.3.0 does a better job at OCR and correctly
reads "a webp" instead of "awebp". This is an improvement,
so we don't want the test to fail because of it.
2023-07-11 16:44:18 +02:00
Trenton H
6722b6e31c
Adds better handling for files with invalid utf8 content
2023-05-13 09:29:18 -07:00
Trenton H
aabcc9a1c4
Upgrades black to v23, upgrades ruff
2023-04-26 09:35:27 -07:00
Trenton H
30655f1b73
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Brandon Rothweiler
7d950d9e87
Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting
2023-02-23 22:42:57 -05:00
Brandon Rothweiler
d49e7d6693
Revert "Merge pull request #2732 from bdr99/skip_neverarchive"
This reverts commit 77b23d3acb573232e4e307b63a83f8ff557c0e7e, reversing
changes made to 5d8aa278315dcf92bfa1abe9e1fbd4911f8ed258.
2023-02-23 21:26:53 -05:00
Brandon Rothweiler
955546d2ef
Add a setting to disable creating an archive file
2023-02-22 15:27:17 -05:00
Trenton Holmes
acfa7d633d
Creates a mix-in for asserting file system states
2023-02-20 10:25:21 -08:00
Trenton H
09ac404148
Adding more test coverage, in particular around Tika and its parser
2023-02-05 11:01:55 -08:00
shamoon
e1d52f4884
Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content
2023-01-10 07:30:52 -08:00
Trenton H
b91217064b
Fixes some sample test files showing as modified after running tests
2023-01-05 08:39:48 -08:00
Trenton Holmes
a185f94c4b
Try a new way of extracting text from a given PDF file
2023-01-03 12:43:31 -08:00
Trenton H
fb20c92c51
Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB
2023-01-03 09:56:19 -08:00
Trenton H
79aecebbd2
In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better
2022-12-29 16:02:02 -08:00
Trenton Holmes
c83d2da67e
Fixes language code checks around two part languages
2022-12-04 12:23:12 -08:00
Trenton H
68c62f3857
Allows parsing of WebP format images
2022-11-28 09:35:54 -08:00
Trenton H
ffd9cd721d
Adds a test to cover this edge case
2022-11-22 07:22:41 -08:00
Trenton Holmes
1be8f39aa0
Reverts the change around skip_noarchive to align with how it is documented to work
2022-10-20 13:34:41 -07:00
Trenton Holmes
43d2545321
Fixes the creation of an archive file, even if noarchive was specified
2022-08-20 13:47:56 -07:00
Trenton Holmes
8660103563
Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git
2022-07-02 16:19:22 +02:00
Trenton Holmes
6635fa5f0d
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f
Format Python code with black
2022-02-27 15:26:41 +01:00
Martin Müller
a662ce03ea
Modify test for PNG image with alpha
2022-02-21 22:38:25 +01:00
jonaswinkler
c9d76322eb
also apply \0 removal to sidecar contents
2021-03-22 23:08:34 +01:00
jonaswinkler
3a67462396
fixes #631
2021-03-14 14:42:48 +01:00
jonaswinkler
81b787635e
update dependencies
2021-02-28 13:01:26 +01:00
jonaswinkler
96088716d9
tests
2021-02-22 00:17:16 +01:00
jonaswinkler
26c65b29d5
tests
2021-02-21 00:18:34 +01:00
jonaswinkler
99cb371483
add some test files
2021-02-21 00:13:08 +01:00
jonaswinkler
94cc9876d9
local import of ocrmypdf so that the webserver does not load that
2021-02-15 12:18:10 +01:00
jonaswinkler
bee7a06e41
fix bugs and test cases
2021-01-02 15:37:27 +01:00
jonaswinkler
a3334293af
more tests
2020-12-19 15:54:13 +01:00
jonaswinkler
45d31f9735
fixes bauerj/paperless_app#23 and the same issue with most other scanner apps out there.
2020-12-12 18:25:15 +01:00
jonaswinkler
1d073d2cfd
a couple fixes and more supported image files
2020-12-02 17:39:49 +01:00
jonaswinkler
0fb294d556
testing the new noarchive option.
2020-12-01 14:30:13 +01:00
jonaswinkler
20cc7e3dc0
more tests!
2020-11-29 19:58:48 +01:00
jonaswinkler
99e6906b51
test case fixes.
2020-11-27 14:06:37 +01:00
Jonas Winkler
f901def797
more tests of the new parser
2020-11-26 00:08:23 +01:00
Jonas Winkler
c00c63c639
fixed the test cases
2020-11-25 19:51:09 +01:00
Jonas Winkler
f5656222e2
removed obsolete tests.
2020-11-25 14:51:32 +01:00
Jonas Winkler
f976a0b4ba
mime type handling
2020-11-20 13:31:03 +01:00
Jonas Winkler
cbee56ae8c
testing the tesseract parser
2020-11-19 20:31:08 +01:00
Jonas Winkler
9a48d6c577
Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.
2020-11-16 23:53:12 +01:00
Jonas Winkler
eb6805e37e
code style fixes
2020-11-12 21:09:45 +01:00
Jonas Winkler
340f9f141f
fixed most of the tests
2020-11-02 19:42:23 +01:00
Jonas Winkler
a89773ad71
removed unused code, small fixes
2020-11-02 18:20:04 +01:00
Johannes Wienke
ebcfcea05b
Handle dateparser ValueErrors
When parsing dates from the document text or filenames, correctly handle
ValueErrors that indicate broken dates. Newly added tests ensure that this
handling works properly.
2020-03-08 18:44:15 +01:00
Johannes Wienke
6531a67940
Remove duplicated date parsing test
The exact same tests existed twice in the file.
2020-03-08 18:26:29 +01:00