Aggregating JSON elements by sub-string
I have the following structure:
[
  {
    "name": "a-v1",
    "date": "2018-05-08T08:40:35.000Z"
  },
  {
    "name": "a-v2",
    "date": "2018-05-20T08:40:35.000Z"
  },
  {
    "name": "a-v3",
    "date": "2018-05-22T08:40:35.000Z"
  },
  {
    "name": "b-v1",
    "date": "2018-02-08T08:40:35.000Z"
  },
  {
    "name": "b-v2",
    "date": "2018-05-08T08:40:35.000Z"
  },
  {
    "name": "b-v3",
    "date": "2018-05-10T08:40:35.000Z"
  },
  {
    "name": "c-v1",
    "date": "2018-10-08T08:40:35.000Z"
  },
  {
    "name": "c-v2",
    "date": "2018-11-08T08:40:35.000Z"
  },
  {
    "name": "d-v1",
    "date": "2018-08-08T08:40:35.000Z"
  }
]
Each name is composed of a type and a version (in a-v1, for example, a is the type and v1 is the version).
How can I create a list of all the names that are not among the 2 latest versions of their type?
In our case, the output would be:
a-v1
b-v1
Any idea how to do that in Python? I've been thinking about counting sub-strings. For example: use - as a delimiter and count how many times I find the left side of the string (a, b, c). Is it possible to implement such a thing in Python? Any better ideas?
Shouldn't the output also contain a-v3, d-v1, ...? Why only a-v1 and b-v1?
– newbie
12 mins ago
Or you could use something like a size-limited priority queue, though I think that may be overkill.
– apple apple
7 mins ago
@newbie I have 3 versions of a, and I want to keep only the 2 latest versions, so the output would be a-v1 (which is the oldest version). The same goes for b. As for c and d, I don't have more than 2 versions of each, so the output would be empty for them.
– Omri
5 mins ago
Do you sort by the suffix, like v1, or do you account for dates as well? Do you need to check that the order of the v-something versions matches the dates?
– EPo
4 mins ago
2 Answers
The problem would be easier with a slightly different data format.
You didn't write any code so I won't give you a complete answer:
data = [{'name': 'a-v1', 'date': '2018-05-08T08:40:35.000Z'},
        {'name': 'a-v2', 'date': '2018-05-20T08:40:35.000Z'},
        {'name': 'a-v3', 'date': '2018-05-22T08:40:35.000Z'},
        {'name': 'b-v1', 'date': '2018-02-08T08:40:35.000Z'},
        {'name': 'b-v2', 'date': '2018-05-08T08:40:35.000Z'},
        {'name': 'b-v3', 'date': '2018-05-10T08:40:35.000Z'},
        {'name': 'c-v1', 'date': '2018-10-08T08:40:35.000Z'},
        {'name': 'c-v2', 'date': '2018-11-08T08:40:35.000Z'},
        {'name': 'd-v1', 'date': '2018-08-08T08:40:35.000Z'}]
temp = [d['name'].split('-') for d in data]
# [['a', 'v1'], ['a', 'v2'], ['a', 'v3'], ['b', 'v1'], ['b', 'v2'], ['b', 'v3'], ['c', 'v1'], ['c', 'v2'], ['d', 'v1']]
versions = [(letter, int(v[1:])) for letter, v in temp]
sorted(versions)
It outputs:
[('a', 1),
('a', 2),
('a', 3),
('b', 1),
('b', 2),
('b', 3),
('c', 1),
('c', 2),
('d', 1)]
You could now try to use itertools.groupby to group the versions by letter and remove every version but the last two for each group.
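One way the remaining step might look is sketched below. It is not the answerer's code, only an illustration that assumes every name has the form <type>-v<number>; the variable old is a name made up for this example.

```python
from itertools import groupby

data = [{'name': 'a-v1'}, {'name': 'a-v2'}, {'name': 'a-v3'},
        {'name': 'b-v1'}, {'name': 'b-v2'}, {'name': 'b-v3'},
        {'name': 'c-v1'}, {'name': 'c-v2'}, {'name': 'd-v1'}]

# Turn each name into a (letter, numeric version) pair and sort the pairs,
# so that groupby sees each letter as one contiguous run.
versions = sorted(
    (letter, int(v[1:]))
    for letter, v in (d['name'].split('-') for d in data)
)

old = []  # hypothetical name: everything except the two newest per letter
for letter, group in groupby(versions, key=lambda pair: pair[0]):
    # The group is sorted by version number, so slicing off the last two
    # entries leaves only the older versions (possibly none).
    for _, number in list(group)[:-2]:
        old.append(f'{letter}-v{number}')

print(old)  # ['a-v1', 'b-v1']
```

Converting the version to an int matters once a type reaches v10, since the string 'v10' would otherwise sort before 'v2'.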
Assuming your list L is pre-sorted, you can use itertools.groupby:
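from itertools import chain, groupby
from operator import itemgetter
# Group the names by the prefix before '-', dropping the last two of each group
groups = [list(vals)[:-2] for _, vals in groupby(map(itemgetter('name'), L),
                                                 key=lambda x: x.split('-')[0])]
# Discard the empty groups and flatten the rest into a single list
res = list(chain.from_iterable(filter(None, groups)))
# ['a-v1', 'b-v1']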
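If the list is not pre-sorted, or if "latest" should be decided by the date field rather than by the version suffix (as one of the comments asks), a variation along these lines may help. This is only a sketch: the function name older_than_latest and its keep parameter are invented for this example, and it assumes every date follows the exact format shown in the question.

```python
from datetime import datetime
from itertools import groupby


def older_than_latest(records, keep=2):
    """Return the names that are not among the `keep` newest of their type."""
    def type_of(record):
        return record['name'].split('-')[0]

    def when(record):
        # Assumes timestamps such as '2018-05-08T08:40:35.000Z'
        return datetime.strptime(record['date'], '%Y-%m-%dT%H:%M:%S.%fZ')

    # Sort by type first, then by date, so groupby gets contiguous groups
    ordered = sorted(records, key=lambda r: (type_of(r), when(r)))
    result = []
    for _, group in groupby(ordered, key=type_of):
        items = list(group)
        cutoff = max(len(items) - keep, 0)  # how many old entries to report
        result.extend(r['name'] for r in items[:cutoff])
    return result


# Deliberately unsorted sample taken from the question's data
sample = [
    {'name': 'b-v3', 'date': '2018-05-10T08:40:35.000Z'},
    {'name': 'a-v2', 'date': '2018-05-20T08:40:35.000Z'},
    {'name': 'd-v1', 'date': '2018-08-08T08:40:35.000Z'},
    {'name': 'a-v1', 'date': '2018-05-08T08:40:35.000Z'},
    {'name': 'b-v1', 'date': '2018-02-08T08:40:35.000Z'},
    {'name': 'c-v2', 'date': '2018-11-08T08:40:35.000Z'},
    {'name': 'a-v3', 'date': '2018-05-22T08:40:35.000Z'},
    {'name': 'b-v2', 'date': '2018-05-08T08:40:35.000Z'},
    {'name': 'c-v1', 'date': '2018-10-08T08:40:35.000Z'},
]
print(older_than_latest(sample))  # ['a-v1', 'b-v1']
```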
I don't see any problem with the approach you proposed.
– apple apple
12 mins ago
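The approach proposed in the question (split on - and collect what shares the same left side) could be sketched roughly as follows. It assumes names of the form <type>-v<number>; by_type and old are illustrative names, not anything from the thread.

```python
from collections import defaultdict

names = ['a-v1', 'a-v2', 'a-v3', 'b-v1', 'b-v2', 'b-v3', 'c-v1', 'c-v2', 'd-v1']

# Map each left-hand side (the type) to the version numbers seen for it
by_type = defaultdict(list)
for name in names:
    prefix, version = name.split('-', 1)
    by_type[prefix].append(int(version.lstrip('v')))

old = []
for prefix, numbers in by_type.items():
    # Everything except the two highest version numbers counts as old
    for number in sorted(numbers)[:-2]:
        old.append(f'{prefix}-v{number}')

print(old)  # ['a-v1', 'b-v1']
```

Unlike the groupby versions, this does not require the input to be sorted, because the dictionary gathers each type's versions regardless of their position in the list.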