The following Heritrix issues have a 'fix version' of 1.12.0, meaning they are fixed or expected to be fixed for Heritrix release 1.12.0. (This list is dynamically updated from the JIRA Issue Tracking project for Heritrix.)
IA Webteam JIRA
(24 issues)
| Key | Summary | Created | Updated | Assignee | Reporter | Status | Resolution |
|---|---|---|---|---|---|---|---|
| HER-659 | filehandle leak: ReplayInputStream/BufferedSeekInputStream | Feb 16, 2007 | Apr 25, 2007 | Karl Thiessen | Gordon Mohr | Closed | FIXED |
| HER-434 | "failed get of replay" in ExtractorHTML... usu: UTF-16BE | Feb 16, 2007 | Mar 22, 2007 | Karl Thiessen | Gordon Mohr | Closed | FIXED |
| HER-4 | robots.txt "crawl-delay" (and "allow") directive breaks parsing | Feb 13, 2007 | Apr 25, 2007 | Karl Thiessen | (sourceforge) | Closed | FIXED |
| HER-1080 | CrawlURI.getContentDigest for DNS URIs returns digest of zero-length-input | Feb 20, 2007 | Mar 08, 2007 | Gordon Mohr | Gordon Mohr | Closed | FIXED |
| HER-1086 | TransclusionDecideRule should offer lower cap for speculative hops | Mar 03, 2007 | Mar 08, 2007 | Gordon Mohr | Gordon Mohr | Resolved | FIXED |
| HER-1090 | Method to remove unwanted elements added by parent settings | Mar 13, 2007 | Mar 14, 2007 | Karl Thiessen | Michael Stack | Resolved | FIXED |
| HER-1091 | v10 of WARCReader is unusable (ClassCastException) | Mar 13, 2007 | Mar 14, 2007 | Karl Thiessen | Michael Stack | Resolved | FIXED |
| HER-1095 | move from Filters to DecideRules is done, but still no replacement for ContentTypeRegExpFilter exists | Mar 14, 2007 | Mar 19, 2007 | Gordon Mohr | Olaf Freyer | Closed | FIXED |
| HER-804 | avoid double-extracting identical documents | Feb 17, 2007 | Mar 19, 2007 | Karl Thiessen | Gordon Mohr | Closed | FIXED |
| HER-1079 | Carry forward prior-fetch information (content-digest, headers) useful for future recrawls | Feb 20, 2007 | Mar 19, 2007 | Karl Thiessen | Gordon Mohr | Closed | FIXED |
| HER-1081 | Optionally use conditional-GET headers (If-Modified-Since, If-None-Match) in FetchHTTP if history info available | Feb 20, 2007 | Mar 19, 2007 | Karl Thiessen | Gordon Mohr | Closed | FIXED |
| HER-1083 | Make extraction and writing dependent on duplicate analysis | Feb 20, 2007 | Mar 19, 2007 | Karl Thiessen | Gordon Mohr | Resolved | FIXED |
| HER-650 | ExtractorHTML misses inline STYLE elements with comments | Feb 16, 2007 | Mar 20, 2007 | Karl Thiessen | Vinay Goel | Resolved | FIXED |
| HER-1084 | TestCases in SelfTest can't succeed in normal unit test suite, clutter results with spurious failures | Feb 21, 2007 | Feb 28, 2007 | Gordon Mohr | Gordon Mohr | Resolved | FIXED |
| HER-1075 | crawl log digest field should include digest algorithm | Feb 17, 2007 | Mar 08, 2007 | Gordon Mohr | Michael Stack | Resolved | FIXED |
| HER-1069 | WUI: determine number of URL's matching regex in frontier | Feb 17, 2007 | Mar 09, 2007 | Gordon Mohr | (sourceforge) | Closed | FIXED |
| HER-1071 | [contrib] StripExtraSlashes canonicalization rule | Feb 17, 2007 | Mar 09, 2007 | (sourceforge) | Michael Stack | Closed | FIXED |
| HER-1072 | [contrib] Swedish libraries Kw3WriterProcessor | Feb 17, 2007 | Mar 09, 2007 | (sourceforge) | Michael Stack | Closed | FIXED |
| HER-1074 | Add digest using md5 option | Feb 17, 2007 | Mar 09, 2007 | (sourceforge) | Michael Stack | Closed | FIXED |
| HER-1077 | Replace per-Processor Filters with DecideRules | Feb 18, 2007 | Mar 09, 2007 | Gordon Mohr | Gordon Mohr | Closed | FIXED |
| HER-1087 | [arcreader] Handling of DELETED records | Mar 12, 2007 | Mar 12, 2007 | Karl Thiessen | Michael Stack | Resolved | FIXED |
| HER-1088 | [arcreader] If ZipException, not-strict, and iterating, skip to next record | Mar 12, 2007 | Mar 12, 2007 | Karl Thiessen | Michael Stack | Resolved | FIXED |
| HER-1085 | Httpclient failes to fetch URL with '\|' in the path | Feb 26, 2007 | Mar 20, 2007 | Karl Thiessen | Igor Ranitovic | Resolved | FIXED |
| HER-1078 | BDB-JE: use deferred writes more extensively, update to 3.2.13 | Feb 20, 2007 | Mar 20, 2007 | Karl Thiessen | Gordon Mohr | Resolved | FIXED |