AXEtract

High-performance, LoRA-powered web data extraction, based on the paper "AXE: Low-Cost Cross-Domain Web Structured Information Extraction".


  • Extreme Efficiency --- Achieve state-of-the-art extraction accuracy with 0.6B-parameter models.

  • LoRA Switching --- Dynamically switch between pruning and extraction adapters in a single VRAM footprint.

  • Grounded XPath (GXR) --- Automatically map extracted data back to the original DOM XPaths.

  • vLLM Support --- Built-in support for high-throughput batch processing with vLLM.
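
To make the Grounded XPath idea above concrete, here is a minimal sketch of mapping extracted values back to the DOM nodes that contain them. It is not AXEtract's implementation: the helper `xpaths_for_value`, the toy `PAGE`, and the exact-text matching rule are all assumptions made for illustration, and it uses only the Python standard library (which requires well-formed markup).

```python
import xml.etree.ElementTree as ET


def xpaths_for_value(root, value):
    """Return index-qualified XPaths of elements whose own text equals `value`.

    Hypothetical helper for illustration; exact string match only.
    """
    matches = []

    def walk(node, path):
        if (node.text or "").strip() == value:
            matches.append(path)
        # Count same-tag siblings so every step gets a 1-based position.
        seen = {}
        for child in node:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return matches


# Well-formed toy page (assumption); real-world HTML needs a tolerant parser.
PAGE = """<html><body>
<div><h1>Smartphone X</h1><span>999.0</span><span>USD</span></div>
</body></html>"""

root = ET.fromstring(PAGE)

# Values as strings, standing in for an extractor's output.
prediction = {"name": "Smartphone X", "price": "999.0", "currency": "USD"}

grounded = {field: xpaths_for_value(root, value) for field, value in prediction.items()}
for field, paths in grounded.items():
    print(field, paths)
```

Each field maps to a list because the same text can occur in more than one node; a real grounding step would need some rule for choosing among several candidates.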

Why Axetract?

Traditional web extraction forces a trade-off between brittle hand-written heuristics and the prohibitive cost of large language models (LLMs). Axetract aims to offer both: the flexibility of an LLM at the cost of a local 0.6B-parameter model, made possible by pruning the DOM before extraction.

Feature               Axetract (0.6B)
Accuracy (SWDE F1)    88.1%
Compute Required      Low (0.6B)
Cost                  Free (Local)
Privacy               100% On-Prem
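
The efficiency argument above rests on DOM pruning: the less markup the model has to read, the smaller the model can be. According to the feature list, AXEtract prunes with a LoRA adapter; the sketch below swaps in a hand-written, rule-based pruner purely to illustrate the effect. The `NOISE_TAGS` set, the `Pruner` class, and the `prune` function are hypothetical names, not part of the AXEtract API.

```python
from html.parser import HTMLParser

# Tags assumed to carry no extractable content (illustrative list only).
NOISE_TAGS = {"script", "style", "nav", "footer", "noscript"}


class Pruner(HTMLParser):
    """Drop noise subtrees and all attributes, keeping tags and visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a noise subtree

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(f"<{tag}>")  # attributes are dropped to save tokens

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.out.append(data.strip())


def prune(html):
    parser = Pruner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


RAW = (
    "<html><head><style>h1{color:red}</style></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<h1>Smartphone X</h1><script>track()</script>"
    "<p>Price: 999.0 USD</p><footer>Legal</footer></body></html>"
)

pruned = prune(RAW)
print(pruned)
print(f"{len(RAW)} -> {len(pruned)} characters")
```

A fixed tag list like this is exactly the kind of brittle heuristic the section describes; it is used here only because it makes the size reduction easy to see.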

Quick Start

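from pydantic import BaseModel
from axetract import AXEPipeline

# Schema describing the fields to extract from the page.
class Product(BaseModel):
    name: str
    price: float
    currency: str

# Build the pipeline; from_config() is called without arguments here.
pipeline = AXEPipeline.from_config()

# input_data is a page URL in this example.
result = pipeline.extract(
    input_data="https://example.com/product",
    schema=Product,
)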

print(result.prediction)
# Output: {'name': 'Smartphone X', 'price': 999.0, 'currency': 'USD'}

Get Involved