Scrapy JSON output file adding an unnecessary square bracket




Scrapy is producing a malformed JSON file. When I try to load that file with



import json





I get this error:



json.decoder.JSONDecodeError: Expecting ',' delimiter: line 311 column 94 (char 28466)

This is caused by an unnecessary square bracket being added to the front of the JSON file.



The JSON file looks like this:



[[{"city": "New York", "state": "New York", "rank": "1\n", "population": ["8,622,698\n"]},
{"city": "Los Angeles", "state": "California", "rank": "2\n", "population": ["3,999,759\n"]}]
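A minimal way to reproduce the error, using a shortened, made-up stand-in for items.json rather than the real file. The stray [ opens an outer array that is never closed, so Python's json module only complains once it reaches the end of the input. That is likely why the reported position (line 311, column 94) points at the end of the file rather than at the first line. The helper name first_decode_error is hypothetical.

```python
import json

# Shortened, made-up stand-in for items.json: two items in one array,
# plus the stray "[" at the front described in the question.
bad = ('[[{"city": "New York", "rank": "1"},\n'
       '{"city": "Los Angeles", "rank": "2"}]')


def first_decode_error(text):
    """Return the JSONDecodeError raised for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err
    return None


err = first_decode_error(bad)
# The outer array opened by the extra "[" is never closed, so the error is
# reported at the very end of the input, not at the stray bracket itself.
print(err.msg, err.lineno, err.colno, err.pos == len(bad))

# Dropping the first character leaves a valid list of two dicts.
print(json.loads(bad[1:]))
```

The same message with a position at the end of the file is a hint that something is unbalanced earlier on, not that line 311 itself is broken.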



I run the crawl with this command:



scrapy crawl wiki -o items.json





When I manually remove the extra square bracket, the file loads normally. This is the other Python script, which reads the file:



import json
import requests

with open("items1.json", "r") as read_file:
    data = json.load(read_file)
print(type(data))


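If the loader script just needs to cope while the cause is tracked down, a sketch that automates the manual fix from the question could look like this. It is a workaround, not a fix: it assumes the only defect is a single doubled opening bracket and re-raises any other decoding error. parse_items and load_items are hypothetical names.

```python
import json


def parse_items(text):
    """Parse exported JSON, tolerating one stray leading '['.

    Workaround only: it mirrors deleting the extra bracket by hand and
    does nothing about whatever wrote it in the first place.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        stripped = text.lstrip()
        # Retry only for the exact symptom: a doubled opening bracket.
        if not stripped.startswith("[["):
            raise
        return json.loads(stripped[1:])


def load_items(path):
    """Read a file (e.g. items1.json from the question) and parse it."""
    with open(path, "r", encoding="utf-8") as read_file:
        return parse_items(read_file.read())
```

Valid files take the first json.loads path unchanged, so a file that legitimately starts with [[ (a list of lists) is never touched.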



Edit: here is the spider in question.



# -*- coding: utf-8 -*-
import scrapy


class WikiSpider(scrapy.Spider):
    name = "wiki"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ['https://en.wikipedia.or/wiki/List_of_United_States_cities_by_population']

    def parse(self, response):
        table = response.xpath('//table')[4]
        trs = table.xpath('.//tr')[1:]
        for tr in trs:
            rank = tr.xpath('.//td[1]/text()').extract_first()
            city = tr.xpath('.//td[2]//text()').extract_first()
            state = tr.xpath('.//td[3]//text()').extract()[1]
            population = tr.xpath('.//td[4]//text()').extract()

            yield {
                'rank': rank,
                'city': city,
                'state': state,
                'population': population,
            }





Can you provide all the code? Perhaps you are using a custom pipeline to build the JSON.
– starpony




1 Answer



There is certainly an unwanted [ in your JSON, but I ran your code and it worked as expected. Are you sure you aren't mixing up items1.json and items.json? Both are mentioned in your question.
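To rule out that mix-up, a small check like this reports whether each file exists and whether it decodes. The crawl command writes items.json while the loader script reads items1.json, so checking both side by side shows whether the broken file is simply a different (possibly stale) one. check_json_files is a hypothetical helper, not part of Scrapy.

```python
import json
import os


def check_json_files(*paths):
    """Map each path to 'missing', 'valid (<type>)' or 'invalid: <reason>'."""
    report = {}
    for path in paths:
        if not os.path.isfile(path):
            report[path] = "missing"
            continue
        with open(path, "r", encoding="utf-8") as handle:
            try:
                data = json.load(handle)
            except json.JSONDecodeError as err:
                report[path] = (f"invalid: {err.msg} "
                                f"(line {err.lineno}, column {err.colno})")
            else:
                report[path] = f"valid ({type(data).__name__})"
    return report


# items.json is what "scrapy crawl wiki -o items.json" writes;
# items1.json is what the loader script in the question opens.
for name, status in check_json_files("items.json", "items1.json").items():
    print(f"{name}: {status}")
```

If items.json comes back valid and items1.json invalid, the crawl output is fine and the loader is simply reading the wrong file.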





Besides that, I noticed the Wikipedia URL is wrong (en.wikipedia.or instead of en.wikipedia.org), but I believe it is just a typo.





