Scrapy JSON output file adding unnecessary square bracket
Scrapy is outputting a malformed JSON file. When I try to load that file with
import json
I am confronted with this error
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 311 column 94 (char 28466)
This is caused by an unnecessary square bracket being added to the front of the JSON file.
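To see where the decoder actually gives up, a small helper can print the text around the reported offset. This is only a sketch: the function name `show_json_error`, the `radius` parameter, and the two-city sample string are illustrative and not part of the original script. It relies on the `msg`, `pos`, `lineno`, and `colno` attributes of `json.JSONDecodeError`.

```python
import json


def show_json_error(text, radius=30):
    """Return None if text parses, else a short report around the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where the decoder stopped
        start = max(exc.pos - radius, 0)
        snippet = text[start:exc.pos + radius]
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}: {snippet!r}"
    return None


# Hypothetical sample with the same shape as the broken file: one extra leading "["
sample = '[[{"city": "New York"}, {"city": "Los Angeles"}]'
report = show_json_error(sample)
print(report)
```

With an extra opening bracket, the decoder reads the whole inner list and only fails once the text ends without closing the outer one, so the reported position is near the end of the file rather than at the stray bracket itself.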
JSON file will look like this
[[{"city": "New York", "state": "New York", "rank": "1\n", "population": ["8,622,698\n"]},
{"city": "Los Angeles", "state": "California", "rank": "2\n", "population": ["3,999,759\n"]}]
I am using this command to crawl
scrapy crawl wiki -o items.json
When I manually remove the square bracket, the file loads normally. This is the other Python script:
import json
import requests

with open("items1.json", "r") as read_file:
    data = json.load(read_file)
    print(type(data))
Edit: the spider in question
# -*- coding: utf-8 -*-
import scrapy


class WikiSpider(scrapy.Spider):
    name = "wiki"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ('https://en.wikipedia.or/wiki/List_of_United_States_cities_by_population',)

    def parse(self, response):
        table = response.xpath('//table')[4]
        trs = table.xpath('.//tr')[1:]
        for tr in trs:
            rank = tr.xpath('.//td[1]/text()').extract_first()
            city = tr.xpath('.//td[2]//text()').extract_first()
            state = tr.xpath('.//td[3]//text()').extract()[1]
            population = tr.xpath('.//td[4]//text()').extract()
            yield {
                'rank': rank,
                'city': city,
                'state': state,
                'population': population
            }
1 Answer
There is certainly an unwanted [ in your JSON, but I ran your code and it worked as expected. Are you sure you aren't mixing up items1.json and items.json? Both are mentioned in your question.
Besides that, I notice the Wikipedia URL is wrong but I believe it is just a typo.
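One quick way to rule out a file mix-up is to check both files side by side. The snippet below is a minimal sketch, not something from the question: the helper name `check_json_files` is made up for illustration, and it assumes both files live in the current directory.

```python
import json
from pathlib import Path


def check_json_files(paths):
    """Map each path to 'missing', 'ok (<type>)', or the decoder's error message."""
    results = {}
    for path in paths:
        p = Path(path)
        if not p.exists():
            results[str(path)] = "missing"
            continue
        try:
            data = json.loads(p.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            results[str(path)] = f"invalid: {exc.msg} (char {exc.pos})"
        else:
            results[str(path)] = f"ok ({type(data).__name__})"
    return results


if __name__ == "__main__":
    # File names taken from the question; adjust if yours differ
    for name, status in check_json_files(["items.json", "items1.json"]).items():
        print(f"{name}: {status}")
```

If items.json parses and items1.json does not, the broken file is probably a leftover from an earlier run. Note also that `-o` appends to an existing output file instead of replacing it, so deleting the old file before crawling again may be worth trying; whether that is what happened here is not confirmed by the question.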
Can you provide all the code? Perhaps you use custom pipelines for the formation of JSON.
– starpony
14 hours ago