Postprocessor API
axetract.postprocessor.axe_postprocessor.AXEPostprocessor
Bases: BasePostprocessor
Optimized PostProcessor for high-throughput batch processing.
This component handles JSON parsing, repair, and grounded XPath resolution (GXR) to map extracted values back to the original document.
Uses a parse-once indexing strategy: each document's HTML is parsed into a search index exactly once, and all extracted fields are matched against that index. This eliminates the O(fields × parse_cost) bottleneck.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Component name. |
exact_extraction |
bool
|
Whether to perform fuzzy matching to find source XPaths. |
Source code in src/axetract/postprocessor/axe_postprocessor.py
__call__(samples)
Clean, repair, and ground a batch of extraction samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
samples
|
List[AXESample]
|
Samples with raw LLM predictions. |
required |
Returns:
| Type | Description |
|---|---|
List[AXESample]
|
List[AXESample]: Samples with structured predictions and XPaths. |
Source code in src/axetract/postprocessor/axe_postprocessor.py
__init__(name='axe_postprocessor', exact_extraction=True)
Initialize the postprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Component name. |
'axe_postprocessor'
|
exact_extraction
|
bool
|
Enable grounded XPath resolution. |
True
|
Source code in src/axetract/postprocessor/axe_postprocessor.py
axetract.postprocessor.base_postprocessor.BasePostprocessor
Bases: ABC
Abstract base class for all postprocessors.
Source code in src/axetract/postprocessor/base_postprocessor.py
__call__(samples)
abstractmethod
__init__(name)
Initialize the postprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Component name. |
required |