AXEtract
High-performance, LoRA-powered web data extraction. Based on the paper "AXE: Low-Cost Cross-Domain Web Structured Information Extraction".
- Extreme Efficiency: Achieve state-of-the-art extraction accuracy with 0.6B-parameter models.
- LoRA Switching: Dynamically switch between pruning and extraction adapters in a single VRAM footprint.
- Grounded XPath (GXR): Automatically map extracted data back to the original DOM XPaths.
- vLLM Support: Built-in support for high-throughput batch processing with vLLM.
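To make the Grounded XPath idea concrete, here is a minimal, standard-library sketch of mapping extracted values back to positional XPaths. It is an illustration only and not Axetract's GXR implementation: the names `XPathIndexer` and `ground` are hypothetical, matching is a plain substring search over text nodes, and the input is assumed to be well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class XPathIndexer(HTMLParser):
    """Record the positional XPath of every non-empty text node (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []            # XPath steps of the open elements, e.g. ["html[1]", "body[1]"]
        self.child_counts = [{}]   # per open element: tag -> number of children seen so far
        self.text_nodes = []       # list of (xpath, text)

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append(f"{tag}[{counts[tag]}]")
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Assumes well-formed input: only pop when the closing tag matches the top of the stack.
        if tag not in VOID_TAGS and self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_nodes.append(("/" + "/".join(self.stack), text))


def ground(html, prediction):
    """Map each predicted field to the XPath of the first text node containing its value."""
    indexer = XPathIndexer()
    indexer.feed(html)
    indexer.close()
    grounded = {}
    for field, value in prediction.items():
        needle = str(value)
        grounded[field] = next(
            (xpath for xpath, text in indexer.text_nodes if needle in text), None
        )
    return grounded


PAGE = ("<html><body><h1>Smartphone X</h1>"
        "<div><span>USD</span><span>999.0</span></div></body></html>")
PREDICTION = {"name": "Smartphone X", "price": 999.0, "currency": "USD"}
GROUNDED = ground(PAGE, PREDICTION)
print(GROUNDED)
```

A field whose value does not appear verbatim in any text node maps to `None`, which makes ungrounded (possibly hallucinated) values easy to spot.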
Why Axetract?
Traditional web extractors force a trade-off between brittle, hand-written heuristics and the prohibitive cost of large language models. Axetract aims to combine the flexibility of an LLM with the efficiency of a local 0.6B model by pruning the DOM before extraction.
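To illustrate why pruning matters for a small model, the sketch below shrinks a page before it would reach an extractor. It is a rule-based stand-in, not Axetract's pruning step: Axetract uses a LoRA adapter for pruning, whereas this example simply drops non-content elements with the standard library. The names `TextPruner`, `prune`, and `SKIP_TAGS` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags never carry extractable content.
SKIP_TAGS = {"script", "style", "noscript", "svg", "template"}


class TextPruner(HTMLParser):
    """Keep visible text and discard markup and non-content elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.kept = []        # surviving text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.kept.append(text)


def prune(html):
    """Return the page's visible text, one fragment per line."""
    pruner = TextPruner()
    pruner.feed(html)
    pruner.close()
    return "\n".join(pruner.kept)


RAW = ("<html><head><style>body{margin:0}</style><script>track()</script></head>"
       "<body><h1>Smartphone X</h1><p>USD 999.0</p></body></html>")
PRUNED = prune(RAW)
print(PRUNED)
print(f"{len(RAW)} characters before pruning, {len(PRUNED)} after")
```

A fixed tag list like this cannot judge relevance; it only shows how much of a typical page is noise that a small context window would otherwise have to absorb.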
| Feature | Axetract (0.6B) |
|---|---|
| Accuracy (SWDE F1) | 88.1% |
| Compute Required | Low (0.6B) |
| Cost | Free (Local) |
| Privacy | 100% On-Prem |
Quick Start
from pydantic import BaseModel
from axetract import AXEPipeline


class Product(BaseModel):
    name: str
    price: float
    currency: str


pipeline = AXEPipeline.from_config()
result = pipeline.extract(
    input_data="https://example.com/product",
    schema=Product,
)
print(result.prediction)
# Output: {'name': 'Smartphone X', 'price': 999.0, 'currency': 'USD'}