Scrapy is a fast, high-level web crawling and web scraping framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Getting help

Having trouble? We’d like to help!

First steps

Scrapy at a glance

  • Understand what Scrapy is and how it can help you.

Installation guide

  • Get Scrapy installed on your computer.

Scrapy Tutorial

  • Write your first Scrapy project.

Examples

  • Learn more by playing with a pre-made Scrapy project.

Basic concepts

Command line tool

  • Learn about the command-line tool used to manage your Scrapy project.
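
    For example, a few everyday commands (project and spider names are
    placeholders):

        scrapy startproject myproject         # create a new project
        cd myproject
        scrapy genspider example example.com  # generate a spider skeleton
        scrapy crawl example                  # run the spider
        scrapy list                           # list the spiders in the project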

Spiders

  • Write the rules to crawl your websites.
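
    For instance, a minimal spider sketch (quotes.toscrape.com is the demo
    site used in the official tutorial):

        import scrapy


        class QuotesSpider(scrapy.Spider):
            name = "quotes"
            start_urls = ["https://quotes.toscrape.com/"]

            def parse(self, response):
                # Each CSS query below matches the demo site's markup.
                for quote in response.css("div.quote"):
                    yield {
                        "text": quote.css("span.text::text").get(),
                        "author": quote.css("small.author::text").get(),
                    }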

Selectors

  • Extract the data from web pages using XPath or CSS expressions.
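
    A small sketch showing both query styles against an inline document:

        from scrapy.selector import Selector

        sel = Selector(text="<html><body><span class='price'>19.99</span></body></html>")
        print(sel.xpath("//span[@class='price']/text()").get())  # '19.99'
        print(sel.css("span.price::text").get())                 # same result via CSS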

Scrapy shell

  • Test your extraction code in an interactive environment.
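
    A typical session might look like this (the URL, and the title it
    returns, assume the tutorial’s demo site):

        $ scrapy shell "https://quotes.toscrape.com/"
        ...
        >>> response.status
        200
        >>> response.css("title::text").get()
        'Quotes to Scrape'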

Items

  • Define the data you want to scrape.
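
    A minimal definition, loosely following the docs’ Product example:

        import scrapy


        class Product(scrapy.Item):
            name = scrapy.Field()
            price = scrapy.Field()
            last_updated = scrapy.Field(serializer=str)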

Item Loaders

  • Populate your items with the extracted data.
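
    A sketch of loading an item inside a spider callback (the URL, the
    selectors and the Product item are assumptions):

        import scrapy
        from scrapy.loader import ItemLoader


        class Product(scrapy.Item):
            name = scrapy.Field()
            price = scrapy.Field()


        class ProductSpider(scrapy.Spider):
            name = "products"
            start_urls = ["https://example.com/product"]  # placeholder URL

            def parse(self, response):
                loader = ItemLoader(item=Product(), response=response)
                loader.add_css("name", "h1::text")
                loader.add_xpath("price", "//span[@class='price']/text()")
                yield loader.load_item()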

Item Pipeline

  • Post-process and store your scraped data.
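
    A sketch of a pipeline that drops incomplete items (the price rule is
    hypothetical); it would be enabled through the ITEM_PIPELINES setting:

        from scrapy.exceptions import DropItem


        class PricePipeline:
            def process_item(self, item, spider):
                # Discard any item scraped without a price.
                if not item.get("price"):
                    raise DropItem(f"Missing price in {item!r}")
                return item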

Feed exports

  • Output your scraped data using different formats and storages.
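
    For instance, the FEEDS setting (available since Scrapy 2.1) can write
    several outputs at once; the file names are examples:

        # settings.py
        FEEDS = {
            "items.json": {"format": "json", "encoding": "utf8"},
            "items.csv": {"format": "csv"},
        }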

Requests and Responses

  • Understand the classes used to represent HTTP requests and responses.
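
    A sketch of issuing follow-up requests handled by a dedicated callback
    (the selectors follow the tutorial’s demo site):

        import scrapy


        class AuthorSpider(scrapy.Spider):
            name = "authors"
            start_urls = ["https://quotes.toscrape.com/"]

            def parse(self, response):
                for href in response.css(".author + a::attr(href)").getall():
                    # response.follow builds a Request relative to this page.
                    yield response.follow(href, callback=self.parse_author)

            def parse_author(self, response):
                yield {"name": response.css("h3.author-title::text").get()}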

Link Extractors

  • Convenient classes to extract links to follow from pages.
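
    A sketch of a CrawlSpider rule built on a link extractor (the URL
    pattern is an example):

        from scrapy.linkextractors import LinkExtractor
        from scrapy.spiders import CrawlSpider, Rule


        class PaginationSpider(CrawlSpider):
            name = "pagination"
            start_urls = ["https://quotes.toscrape.com/"]
            rules = (
                # Follow pagination links and parse every page reached.
                Rule(LinkExtractor(allow=r"/page/"), callback="parse_page", follow=True),
            )

            def parse_page(self, response):
                yield {"url": response.url}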

Settings

  • Learn how to configure Scrapy and see all available settings.
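
    A few commonly tuned options in a project’s settings.py (the values are
    illustrative):

        BOT_NAME = "myproject"
        ROBOTSTXT_OBEY = True
        CONCURRENT_REQUESTS = 16
        DOWNLOAD_DELAY = 0.5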

Exceptions

  • See all available exceptions and their meaning.

Built-in services

Logging

  • Learn how to use Python’s built-in logging with Scrapy.
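
    Every spider exposes a pre-configured logger, for example:

        import scrapy


        class LoggingSpider(scrapy.Spider):
            name = "logging_demo"
            start_urls = ["https://quotes.toscrape.com/"]

            def parse(self, response):
                # self.logger uses Python's stdlib logging machinery.
                self.logger.info("Parsed %s", response.url)
                yield {"url": response.url}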

Stats Collection

  • Collect statistics about your scraping crawler.
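
    A sketch of bumping a custom counter from a spider (the stat key is
    made up):

        import scrapy


        class StatsSpider(scrapy.Spider):
            name = "stats_demo"
            start_urls = ["https://quotes.toscrape.com/"]

            def parse(self, response):
                # The stats collector is reached through the crawler object.
                self.crawler.stats.inc_value("custom/pages_seen")
                yield {"url": response.url}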

Sending e-mail

  • Send email notifications when certain events occur.
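
    A sketch using the MailSender helper (host and addresses are
    placeholders); it is meant to run inside a crawling process, since it
    relies on Twisted’s non-blocking IO:

        from scrapy.mail import MailSender

        mailer = MailSender(smtphost="localhost", mailfrom="scrapy@localhost")
        mailer.send(
            to=["someone@example.com"],
            subject="Crawl finished",
            body="The spider has finished running.",
        )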

Telnet Console

  • Inspect a running crawler using a built-in Python console.

Web Service

  • Monitor and control a crawler using a web service.

Solving specific problems

Frequently Asked Questions

  • Get answers to most frequently asked questions.

Debugging Spiders

  • Learn how to debug common problems in your Scrapy spiders.
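
    One technique covered there is dropping into an interactive shell
    mid-crawl when a page does not look as expected (the URL and selector
    are hypothetical):

        import scrapy
        from scrapy.shell import inspect_response


        class DebugSpider(scrapy.Spider):
            name = "debug_demo"
            start_urls = ["https://example.com/"]  # placeholder

            def parse(self, response):
                if not response.css("#expected-element"):
                    # Opens a shell with this response already loaded.
                    inspect_response(response, self)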

Spiders Contracts

  • Learn how to use contracts for testing your spiders.
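
    Contracts are docstring annotations checked by the scrapy check
    command; a sketch following the docs’ example:

        import scrapy


        class DemoSpider(scrapy.Spider):
            name = "demo"

            def parse(self, response):
                """Parse a sample page.

                @url http://www.example.com/3.html
                @returns items 1 16
                @returns requests 0 0
                @scrapes Title Author Year Price
                """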

Common Practices

  • Get familiar with some Scrapy common practices.

Broad Crawls

  • Tune Scrapy for crawling many domains in parallel.
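
    That page suggests settings along these lines (the values are starting
    points, not prescriptions):

        # settings.py
        CONCURRENT_REQUESTS = 100
        REACTOR_THREADPOOL_MAXSIZE = 20
        LOG_LEVEL = "INFO"
        COOKIES_ENABLED = False
        RETRY_ENABLED = False
        DOWNLOAD_TIMEOUT = 15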

Using your browser’s Developer Tools for scraping

  • Learn how to scrape with your browser’s developer tools.

Selecting dynamically-loaded content

  • Read webpage data that is loaded dynamically.

Debugging memory leaks

  • Learn how to find and get rid of memory leaks in your crawler.

Downloading and processing files and images

  • Download files and/or images associated with your scraped items.
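
    For example, the built-in images pipeline is enabled with two settings
    (the store path is a placeholder); items then carry image_urls and
    images fields:

        # settings.py
        ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
        IMAGES_STORE = "/path/to/image/store"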

Deploying Spiders

  • Deploy your Scrapy spiders and run them on a remote server.

AutoThrottle extension

  • Adjust crawl rate dynamically based on load.
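
    AutoThrottle is enabled and tuned entirely through settings, for
    instance:

        # settings.py: the values shown are the documented defaults
        AUTOTHROTTLE_ENABLED = True
        AUTOTHROTTLE_START_DELAY = 5.0
        AUTOTHROTTLE_MAX_DELAY = 60.0
        AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0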

Benchmarking

  • Check how Scrapy performs on your hardware.

Jobs: pausing and resuming crawls

  • Learn how to pause and resume crawls for large spiders.
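
    A crawl becomes resumable once it is given a job directory; re-running
    the same command later picks up where it stopped:

        scrapy crawl somespider -s JOBDIR=crawls/somespider-1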

Extending Scrapy

Architecture overview

  • Understand the Scrapy architecture.

Downloader Middleware

  • Customize how pages get requested and downloaded.
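
    A sketch of a middleware that stamps every outgoing request (the header
    is made up); it would be enabled via the DOWNLOADER_MIDDLEWARES setting:

        class CustomHeaderMiddleware:
            def process_request(self, request, spider):
                # Add a header unless the request already carries one.
                request.headers.setdefault("X-Example", "1")
                return None  # None means: continue normal processing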

Spider Middleware

  • Customize the input and output of your spiders.
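
    A sketch that logs everything a spider yields (enabled via the
    SPIDER_MIDDLEWARES setting):

        class YieldLoggingMiddleware:
            def process_spider_output(self, response, result, spider):
                # result holds the items and requests the callback produced.
                for element in result:
                    spider.logger.debug("Spider yielded: %r", element)
                    yield element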

Extensions

  • Extend Scrapy with your custom functionality.
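
    A sketch of an extension reacting to a signal (enabled via the
    EXTENSIONS setting), close to the docs’ own example:

        from scrapy import signals


        class SpiderOpenedExtension:
            @classmethod
            def from_crawler(cls, crawler):
                ext = cls()
                crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
                return ext

            def spider_opened(self, spider):
                spider.logger.info("Spider opened: %s", spider.name)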

Core API

  • Use it from extensions and middlewares to extend Scrapy functionality.
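
    For instance, CrawlerProcess from the core API runs a spider from a
    plain script:

        import scrapy
        from scrapy.crawler import CrawlerProcess


        class TitleSpider(scrapy.Spider):
            name = "title"
            start_urls = ["https://quotes.toscrape.com/"]

            def parse(self, response):
                yield {"title": response.css("title::text").get()}


        process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
        process.crawl(TitleSpider)
        process.start()  # blocks until the crawl finishes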

Signals

  • See all available signals and how to work with them.
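
    A sketch of connecting a handler from inside a spider, mirroring the
    documented pattern:

        import scrapy
        from scrapy import signals


        class SignalSpider(scrapy.Spider):
            name = "signal_demo"
            start_urls = ["https://quotes.toscrape.com/"]

            @classmethod
            def from_crawler(cls, crawler, *args, **kwargs):
                spider = super().from_crawler(crawler, *args, **kwargs)
                crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
                return spider

            def spider_closed(self, spider):
                spider.logger.info("Spider closed: %s", spider.name)

            def parse(self, response):
                yield {"url": response.url}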

Item Exporters

  • Quickly export your scraped items to a file (XML, CSV, etc).
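
    A sketch of driving an exporter by hand (the file name and item are
    examples):

        from scrapy.exporters import CsvItemExporter

        with open("items.csv", "wb") as f:
            exporter = CsvItemExporter(f)
            exporter.start_exporting()
            exporter.export_item({"name": "Example", "price": "19.99"})
            exporter.finish_exporting()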

All the rest

Release notes

  • See what has changed in recent Scrapy versions.

Contributing to Scrapy

  • Learn how to contribute to the Scrapy project.

Versioning and API Stability

  • Understand Scrapy versioning and API stability.