Dev.to · 3d ago · 1 min read

How to Use rs-trafilatura with spider-rs

spider is a high-performance async web crawler written in Rust. It discovers, fetches, and queues URLs, but content extraction is left to you. rs-trafilatura slots in as the extraction layer, giving you page-type-aware content extraction with quality scoring on every crawled page.

Setup

Add both crates to your Cargo.toml:

    [dependencies]
    rs-trafilatura = { version = "0.2", features = ["spider"] }
    spider = "2"
    tokio = { version = "1", features = ["full"] }

The spider feature flag enables rs_trafi
