Trenton H
|
74e845974c
|
Updates to use a single configuration object for all settings
|
2023-12-19 10:21:51 -08:00 |
|
Trenton H
|
25cc7ada6b
|
More fixes and work
|
2023-12-19 09:19:47 -08:00 |
|
Trenton H
|
d8b254e55e
|
Moving the settings to main paperless application
|
2023-12-19 09:19:13 -08:00 |
|
Trenton H
|
c7876dc0f1
|
Fixes max m_pixels
|
2023-12-19 09:19:13 -08:00 |
|
Trenton H
|
ad5c9ef208
|
Work around that error
|
2023-12-19 09:19:13 -08:00 |
|
Trenton H
|
5266bd1590
|
Problems with migration testing; need to figure out
|
2023-12-19 09:19:13 -08:00 |
|
Trenton H
|
30281bd593
|
At least partially working for the tesseract parser
|
2023-12-19 09:19:12 -08:00 |
|
Trenton Holmes
|
9867db9616
|
Saving some start on this
|
2023-12-19 09:18:41 -08:00 |
|
Trenton H
|
92a920021d
|
Apply user arguments even in the case of the safe fallback to forcing OCR (#4981)
|
2023-12-14 11:20:47 -08:00 |
|
Trenton H
|
e3f4e0b775
|
Adds new setting to control color conversions (#4709)
|
2023-11-29 12:18:44 -08:00 |
|
Trenton H
|
e1b573adeb
|
Fix: Add a warning about a low image DPI which may cause OCR to fail (#4708)
|
2023-11-29 11:28:27 -08:00 |
|
Trenton H
|
facb7226fe
|
Chore: Backend bulk updates (#4509)
|
2023-11-13 17:09:56 +00:00 |
|
Trenton H
|
8d60506884
|
Standardizes the imports across all the files and modules (#4248)
|
2023-09-23 20:17:01 -07:00 |
|
Trenton Holmes
|
650c816a7b
|
Removes support for Python 3.8 and lower from the code base
|
2023-09-10 11:42:59 -07:00 |
|
shamoon
|
e14f4c94c2
|
Fix: ghostscript rendering error doesn't trigger frontend failure message (#4092)
* Raise ParseError from gs rendering error
* catch all parser errors as generic exception
* Differentiate generic vs parse errors during consumption
|
2023-08-31 19:49:00 -07:00 |
|
Trenton H
|
7e768bfe23
|
When PDF/A rendering fails, add a warning the user may want to allow it to continue
|
2023-08-28 18:10:11 -07:00 |
|
Dennis Brakhane
|
93009c1eed
|
Don't consider better OCR as failing
Tesseract 5.3.0 does a better job at OCR and correctly
reads "a webp" instead of "awebp". This is good, so we
don't want the test to fail.
|
2023-07-11 16:44:18 +02:00 |
|
Trenton H
|
70f3f98363
|
Let ruff autofix some things from the newest version
|
2023-06-13 20:15:18 -07:00 |
|
Trenton H
|
452c79f9a1
|
Improves the logging mixin and allows it to be typed better
|
2023-05-23 17:16:39 -07:00 |
|
Trenton H
|
111960c530
|
Adds better handling for files with invalid utf8 content
|
2023-05-13 09:29:18 -07:00 |
|
Trenton H
|
6f163111ce
|
Upgrades black to v23, upgrades ruff
|
2023-04-26 09:35:27 -07:00 |
|
Trenton H
|
3bcbd05252
|
Fixes ruff not running isort against the codebase
|
2023-04-26 09:35:27 -07:00 |
|
Trenton H
|
ce41ac9158
|
Configures ruff as the one stop linter and resolves warnings it raised
|
2023-04-01 17:03:52 -07:00 |
|
Brandon Rothweiler
|
ca412e0184
|
Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting
|
2023-02-23 22:42:57 -05:00 |
|
Brandon Rothweiler
|
8a89f5ae27
|
Revert "Merge pull request #2732 from bdr99/skip_neverarchive"
This reverts commit 77b23d3acb, reversing
changes made to 5d8aa27831.
|
2023-02-23 21:26:53 -05:00 |
|
Brandon Rothweiler
|
93a6391f96
|
Add a setting to disable creating an archive file
|
2023-02-22 15:27:17 -05:00 |
|
Trenton Holmes
|
0df91c31f1
|
Creates a mix-in for asserting file system states
|
2023-02-20 10:25:21 -08:00 |
|
Trenton H
|
bdcba570cb
|
Adding more test coverage, in particular around Tika and its parser
|
2023-02-05 11:01:55 -08:00 |
|
shamoon
|
985f298c46
|
Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content
|
2023-01-10 07:30:52 -08:00 |
|
Trenton H
|
d7939ca958
|
Fixes some sample test files showing as modified after running tests
|
2023-01-05 08:39:48 -08:00 |
|
Trenton H
|
1e4923835b
|
Small tweak to use the existing tempdir instead of a new one
|
2023-01-03 13:05:44 -08:00 |
|
Trenton Holmes
|
7be9ae9c02
|
Try a new way of extracting text from a given PDF file
|
2023-01-03 12:43:31 -08:00 |
|
Trenton H
|
0fd51e35e1
|
Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB
|
2023-01-03 09:56:19 -08:00 |
|
Trenton H
|
59e0c1fe4e
|
Let convert handle the removal of the alpha channel
|
2023-01-03 09:56:19 -08:00 |
|
Trenton Holmes
|
26c7fad005
|
If extracting text from a fallback file (i.e. forced), allow the text to be used
|
2023-01-01 09:57:15 -08:00 |
|
Trenton H
|
a2b7687c3b
|
In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better
|
2022-12-29 16:02:02 -08:00 |
|
Trenton Holmes
|
55ef0d4a1b
|
Fixes language code checks around two part languages
|
2022-12-04 12:23:12 -08:00 |
|
shamoon
|
5d3a6e230d
|
Merge pull request #2057 from paperless-ngx/fix/2044-lang-code-diffs
Bugfix: Some tesseract languages aren't detected as installed.
|
2022-11-28 11:04:44 -08:00 |
|
Trenton H
|
e96d65f945
|
Allows parsing of WebP format images
|
2022-11-28 09:35:54 -08:00 |
|
Trenton Holmes
|
f0497e7744
|
Fixes how a language code like chi-sim is treated in the checks
|
2022-11-27 08:28:22 -08:00 |
|
Trenton H
|
f015556562
|
Adds a test to cover this edge case
|
2022-11-22 07:22:41 -08:00 |
|
Trenton H
|
b897d6de2e
|
Don't use the sidecar file when redoing the OCR, it only contains new text
|
2022-11-22 07:22:41 -08:00 |
|
Trenton Holmes
|
d1aa08850d
|
Reverts the change around skip_noarchive to align with how it is documented to work
|
2022-10-20 13:34:41 -07:00 |
|
Trenton Holmes
|
b3b2519bf0
|
Fixes the creation of an archive file, even if noarchive was specified
|
2022-08-20 13:47:56 -07:00 |
|
Trenton Holmes
|
b70e21a6d5
|
When raising an exception during exception handling, chain them together for slightly cleaner logs
|
2022-08-03 09:00:56 -07:00 |
|
Trenton Holmes
|
49a843dcdd
|
Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git
|
2022-07-02 16:19:22 +02:00 |
|
Trenton Holmes
|
fc26fe0ac0
|
Updates to provide the user-provided max pixel size to ocrmypdf
|
2022-05-22 16:56:08 -07:00 |
|
Trenton Holmes
|
3003bdd507
|
Runs pyupgrade to Python 3.8+ and adds a hook for it
|
2022-05-06 09:04:08 -07:00 |
|
Henning Häcker
|
3b4da70c85
|
extract OCR_MAX_IMAGE_PIXELS into settings.py
|
2022-03-30 09:23:45 +02:00 |
|
Henning Häcker
|
95199bd325
|
formatting according to black
|
2022-03-30 09:23:45 +02:00 |
|