abdo-Mansour/axetract

AXEtract

High-performance, LoRA-powered web data extraction, based on the paper "AXE: Low-Cost Cross-Domain Web Structured Information Extraction".

Extreme Efficiency: Achieve state-of-the-art extraction accuracy with 0.6B-parameter models.
LoRA Switching: Dynamically switch between pruning and extraction adapters within a single VRAM footprint.
Grounded XPath (GXR): Automatically map extracted data back to the XPaths of the original DOM.
vLLM Support: Built-in support for high-throughput batch processing with vLLM.
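
To make the grounding idea concrete, here is a minimal sketch of mapping extracted values back to DOM XPaths. It is not Axetract's GXR implementation: the helper names `build_xpath` and `ground_value`, the exact-text matching rule, and the sample page are all assumptions made for illustration, and it uses only the Python standard library on a well-formed snippet.

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Return an absolute, index-qualified XPath for `target` inside `root`."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parents.get(node)
        if parent is None:
            steps.append(node.tag)  # the root element needs no index
        else:
            # XPath positions are 1-based among siblings with the same tag.
            same_tag = [c for c in parent if c.tag == node.tag]
            steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "/" + "/".join(reversed(steps))


def ground_value(root, value):
    """List XPaths of elements whose own text equals `value` (hypothetical rule)."""
    return [
        build_xpath(root, el)
        for el in root.iter()
        if (el.text or "").strip() == value
    ]


# Hypothetical page and prediction, used only to exercise the sketch.
PAGE = (
    "<html><body>"
    "<div><h1>Smartphone X</h1><span>999.0</span><span>USD</span></div>"
    "</body></html>"
)
prediction = {"name": "Smartphone X", "price": 999.0, "currency": "USD"}

root = ET.fromstring(PAGE)
xpaths = {field: ground_value(root, str(value)) for field, value in prediction.items()}
print(xpaths["currency"])  # ['/html/body[1]/div[1]/span[2]']
```

Real pages are rarely well-formed XML and values seldom match node text exactly, so a production grounding step would need a tolerant HTML parser and fuzzier matching than this sketch uses.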

Why Axetract?

Traditional web extraction forces a trade-off between brittle manual heuristics and the prohibitive cost of large language models. Axetract offers a middle path: the intelligence and flexibility of an LLM at the cost of a local 0.6B-parameter model, made possible by intelligent DOM pruning.
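
As a rough illustration of why pruning helps, the sketch below strips obviously irrelevant subtrees from a page before any model sees it. Axetract's pruning is done by a LoRA adapter, so this rule-based filter is only a stand-in for the general idea: the `NOISE_TAGS` set, the `TextPruner` class, and the `prune_html` helper are assumptions made for this example, not part of the library.

```python
from html.parser import HTMLParser

# Tags whose subtrees rarely carry extractable data (an assumption for this sketch).
NOISE_TAGS = {"script", "style", "nav", "footer", "noscript"}


class TextPruner(HTMLParser):
    """Collect visible text while skipping everything inside noise tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside at least one noise subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def prune_html(html):
    """Return the text chunks that survive pruning, in document order."""
    parser = TextPruner()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page used only to exercise the sketch.
RAW_HTML = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<h1>Smartphone X</h1><p>Price: 999 USD</p>"
    "<script>track()</script><footer>Terms</footer>"
    "</body></html>"
)

print(prune_html(RAW_HTML))  # ['Smartphone X', 'Price: 999 USD']
```

The point of the sketch is the size reduction: the model receives two short strings instead of the full markup, which is what makes a small local model practical.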

Feature	Axetract (0.6B)
Accuracy (SWDE F1)	88.1%
Compute Required	Low (0.6B)
Cost	Free (Local)
Privacy	100% On-Prem

Quick Start

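from pydantic import BaseModel
from axetract import AXEPipeline

# Describe the fields to extract as a Pydantic model.
class Product(BaseModel):
    name: str
    price: float
    currency: str

# Build the pipeline (no arguments are passed to from_config here).
pipeline = AXEPipeline.from_config()

# Extract data matching the schema from the given URL.
result = pipeline.extract(
    input_data="https://example.com/product",
    schema=Product,
)

print(result.prediction)
# Output: {'name': 'Smartphone X', 'price': 999.0, 'currency': 'USD'}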

Get Involved