Using a Scrapy pipeline without using settings.py config

I'm avoiding using the Scrapy boilerplate generator because my code will be integrated as part of a wider project.

My current project tree is like this:

    /test
    |- items.py
    |- pipelines.py
    |- spider.py

My pipelines.py contains a pipeline that looks like this:

    import pymongo

    class MongoPipeline(object):
        collection_name = 'pages'
        [... rest of the pipeline class ...]

How can I use this class in spider.py without using a settings.py file and scrapy.conf?

I've tried importing the pipeline class and setting ITEM_PIPELINES in custom_settings but that throws ValueError: Error loading object 'MongoPipeline': not a full path:

    from pipelines import MongoPipeline

    class MySpider(CrawlSpider):
        name = 'x'
        allowed_domains = ['x']
        start_urls = ['x']

        custom_settings = {
            'ITEM_PIPELINES': {
                'MongoPipeline': 100
            }
        }

        def parse(self, response):
            [...]

1 Answer

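The key in ITEM_PIPELINES should be the full dotted import path of the pipeline class, not just the class name:

    custom_settings = {
        'ITEM_PIPELINES': {
            'YourProjectName.pipelines.MongoPipeline': 100
        }
    }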

Thanks. It turned out it was only pipelines.MongoPipeline for me (or module.PipelineClass in the generic case). This applies when you're not using a Scrapy project to run your spider.
– Juicy
Aug 12 at 9:28

@Juicy it's just a path to your pipelines.py
– gangabass
Aug 12 at 9:29
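
To illustrate why a bare class name fails while a module-qualified one works, here is a minimal sketch of how a dotted path string can be resolved to an object. It is a simplified stand-in, not Scrapy's actual loader; the function name load_by_path is hypothetical, and collections.OrderedDict is used only so the example runs without a pipelines.py on hand.

```python
from importlib import import_module


def load_by_path(path):
    """Resolve a string such as 'package.module.Name' to the object it names.

    Hypothetical helper for illustration only; it mimics the general idea
    of loading an object from a dotted path.
    """
    # Split on the LAST dot: everything before it is the module to import,
    # everything after it is the attribute to fetch from that module.
    module_path, sep, name = path.rpartition(".")
    if not sep:
        # No dot at all, so there is no module part to import.
        raise ValueError(f"Error loading object {path!r}: not a full path")
    module = import_module(module_path)
    return getattr(module, name)


# A bare class name fails, as 'MongoPipeline' did in the question.
try:
    load_by_path("OrderedDict")
except ValueError as exc:
    print(exc)

# A module-qualified name resolves to the class.
cls = load_by_path("collections.OrderedDict")
print(cls.__name__)
```

Under this reading, 'pipelines.MongoPipeline' likely works outside a Scrapy project because pipelines.py sits next to spider.py and is importable as a top-level module, whereas inside a project package the path would also need the package name in front.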
