Aggregation json elements by sub-string

I have the following structure:

[
  {"name": "a-v1", "date": "2018-05-08T08:40:35.000Z"},
  {"name": "a-v2", "date": "2018-05-20T08:40:35.000Z"},
  {"name": "a-v3", "date": "2018-05-22T08:40:35.000Z"},
  {"name": "b-v1", "date": "2018-02-08T08:40:35.000Z"},
  {"name": "b-v2", "date": "2018-05-08T08:40:35.000Z"},
  {"name": "b-v3", "date": "2018-05-10T08:40:35.000Z"},
  {"name": "c-v1", "date": "2018-10-08T08:40:35.000Z"},
  {"name": "c-v2", "date": "2018-11-08T08:40:35.000Z"},
  {"name": "d-v1", "date": "2018-08-08T08:40:35.000Z"}
]

Each name is composed of a type and a version (in a-v1, for example, a is the type and v1 is the version).

How can I create a list of all the names that are not among the 2 latest versions of their type?
In our case, the output would be:

a-v1 b-v1

Any idea how to do that in Python? I've been thinking about counting sub-strings. For example: use - as a delimiter and count how many times I find the left side of the string (a, b, c). Is it possible to implement such a thing in Python? Any better ideas?

I don't see any problem with the approach you proposed.
– apple apple
12 mins ago

Shouldn't the output also contain a-v3, d-v1, ...? Why only a-v1 and b-v1?
– newbie
12 mins ago

Or you may use something like a priority queue with a limited size, though I think that may be overkill.
– apple apple
7 mins ago

@newbie I have 3 versions of a, and I want to keep only the 2 latest versions, so the output would be a-v1 (which is the oldest version). Same for b. As for c and d, I don't have more than 2 versions of each, so the output would be empty for them.
– Omri
5 mins ago

Do you sort by the postfix, like v1, or do you account for dates as well? Do you need to check that the order of the v-something versions matches the order of the dates?
– EPo
4 mins ago

2 Answers

The problem would be easier with a slightly different data format.

You didn't write any code so I won't give you a complete answer:

    data = [
        {'name': 'a-v1', 'date': '2018-05-08T08:40:35.000Z'},
        {'name': 'a-v2', 'date': '2018-05-20T08:40:35.000Z'},
        {'name': 'a-v3', 'date': '2018-05-22T08:40:35.000Z'},
        {'name': 'b-v1', 'date': '2018-02-08T08:40:35.000Z'},
        {'name': 'b-v2', 'date': '2018-05-08T08:40:35.000Z'},
        {'name': 'b-v3', 'date': '2018-05-10T08:40:35.000Z'},
        {'name': 'c-v1', 'date': '2018-10-08T08:40:35.000Z'},
        {'name': 'c-v2', 'date': '2018-11-08T08:40:35.000Z'},
        {'name': 'd-v1', 'date': '2018-08-08T08:40:35.000Z'},
    ]

    # Split each name into its type and its version string.
    temp = [d['name'].split('-') for d in data]
    # [['a', 'v1'], ['a', 'v2'], ['a', 'v3'], ['b', 'v1'], ['b', 'v2'],
    #  ['b', 'v3'], ['c', 'v1'], ['c', 'v2'], ['d', 'v1']]

    # Turn 'v1' into the integer 1 so that versions sort numerically.
    versions = [(letter, int(v[1:])) for letter, v in temp]
    sorted(versions)

It outputs:

[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1), ('c', 2), ('d', 1)]

You could now try to use itertools.groupby to group the versions by letter and, for each group, set aside every version except the last two.
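One way the remaining step could look is sketched below. It is not the answerer's code: the variable name old and the rebuilding of the name as letter-vN are assumptions made for illustration, and versions are ordered by their number only, not by date.

```python
from itertools import groupby

# Sample data from the question (dates are not used in this sketch).
data = [
    {'name': 'a-v1', 'date': '2018-05-08T08:40:35.000Z'},
    {'name': 'a-v2', 'date': '2018-05-20T08:40:35.000Z'},
    {'name': 'a-v3', 'date': '2018-05-22T08:40:35.000Z'},
    {'name': 'b-v1', 'date': '2018-02-08T08:40:35.000Z'},
    {'name': 'b-v2', 'date': '2018-05-08T08:40:35.000Z'},
    {'name': 'b-v3', 'date': '2018-05-10T08:40:35.000Z'},
    {'name': 'c-v1', 'date': '2018-10-08T08:40:35.000Z'},
    {'name': 'c-v2', 'date': '2018-11-08T08:40:35.000Z'},
    {'name': 'd-v1', 'date': '2018-08-08T08:40:35.000Z'},
]

# (letter, version number) pairs, sorted so that groupby sees each
# letter as one contiguous run with its versions in ascending order.
versions = sorted(
    (letter, int(v[1:]))
    for letter, v in (d['name'].split('-') for d in data)
)

old = []  # hypothetical name: everything except the 2 latest versions
for letter, group in groupby(versions, key=lambda pair: pair[0]):
    numbers = [number for _, number in group]
    # numbers[:-2] is empty when a letter has 2 versions or fewer.
    old.extend(f"{letter}-v{n}" for n in numbers[:-2])

print(old)  # ['a-v1', 'b-v1']
```

Sorting before grouping matters here, because groupby only merges adjacent items that share a key.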

Assuming your list L is pre-sorted, you can use itertools.groupby:

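    from itertools import chain, groupby
    from operator import itemgetter

    # Group the names by the part before '-' and drop the last two of each group.
    groups = [list(vals)[:-2]
              for _, vals in groupby(map(itemgetter('name'), L),
                                     key=lambda x: x.split('-')[0])]

    # Discard the empty groups and flatten what is left.
    res = list(chain.from_iterable(filter(None, groups)))
    # ['a-v1', 'b-v1']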
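If the list is not guaranteed to be pre-sorted, a dictionary-based variant may be safer. The sketch below is an illustration rather than part of the answer: the function name older_than_latest and its keep parameter are hypothetical, and it assumes "latest" is decided by the date field, with all dates being ISO 8601 strings in the same format and time zone so that plain string comparison orders them chronologically.

```python
from collections import defaultdict

# The question's data, deliberately shuffled.
L = [
    {'name': 'b-v3', 'date': '2018-05-10T08:40:35.000Z'},
    {'name': 'a-v2', 'date': '2018-05-20T08:40:35.000Z'},
    {'name': 'd-v1', 'date': '2018-08-08T08:40:35.000Z'},
    {'name': 'a-v1', 'date': '2018-05-08T08:40:35.000Z'},
    {'name': 'c-v2', 'date': '2018-11-08T08:40:35.000Z'},
    {'name': 'b-v1', 'date': '2018-02-08T08:40:35.000Z'},
    {'name': 'a-v3', 'date': '2018-05-22T08:40:35.000Z'},
    {'name': 'c-v1', 'date': '2018-10-08T08:40:35.000Z'},
    {'name': 'b-v2', 'date': '2018-05-08T08:40:35.000Z'},
]


def older_than_latest(items, keep=2):
    """Return the names that are not among the `keep` newest of their type."""
    by_type = defaultdict(list)
    for item in items:
        kind = item['name'].split('-')[0]  # the part before '-'
        by_type[kind].append(item)

    result = []
    for kind in sorted(by_type):
        # Oldest first; relies on the uniform ISO 8601 date strings.
        ordered = sorted(by_type[kind], key=lambda it: it['date'])
        cutoff = max(len(ordered) - keep, 0)
        result.extend(it['name'] for it in ordered[:cutoff])
    return result


print(older_than_latest(L))  # ['a-v1', 'b-v1']
```

Unlike groupby, this does not depend on the input order, at the cost of holding every group in memory at once.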
