Torchtext TabularDataset: data.Field doesn't contain actual imported data?
I learned from the Torchtext documentation that CSV files are imported through TabularDataset. I did it like this:
train = data.TabularDataset(path='./data.csv',
                            format='csv',
                            fields=[("label", data.Field(use_vocab=True, include_lengths=False)),
                                    ("statement", data.Field(use_vocab=True, include_lengths=True))],
                            skip_header=True)
"label" and "statement" are the header names of the two columns in my CSV file. I defined them as data.Field objects, but they don't seem to contain the data from the file, even though the console recognizes them as Field objects without complaint. I noticed the problem when I tried to build a vocabulary with statement.build_vocab(train, max_size=25000): len(statement.vocab) returns 2, which obviously doesn't reflect the data in the CSV file.
Did I do something wrong when importing the CSV data, or is my vocab building wrong? Is there a separate method for putting the data into the Field objects? Thanks!
1 Answer
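The fields must be defined separately and bound to variables, and those same objects must then be passed to TabularDataset. The strings "label" and "statement" are only column names; they do not give you access to the Field objects created inline. build_vocab has to be called on the exact Field object that is attached to the dataset. A vocab of length 2 most likely holds only the special tokens <unk> and <pad>, which suggests that no column was matched to the field you called it on. Define the fields like this: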
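from torchtext import data  # legacy API; later releases moved it to torchtext.legacy.data

# The tokenizer is not shown in the original answer; whitespace splitting is assumed here.
def tokenize(text):
    return text.split()

TEXT = data.Field(sequential=True, tokenize=tokenize, lower=True, include_lengths=True)
LABEL = data.Field(sequential=True, tokenize=tokenize, lower=True)

train = data.TabularDataset(path='./data.csv',
                            format='csv',
                            fields=[("label", LABEL),
                                    ("statement", TEXT)],
                            skip_header=True)
test = data.TabularDataset(path='./test.csv',
                           format='csv',
                           fields=[("label", LABEL),
                                   ("statement", TEXT)],
                           skip_header=True)

# Build the vocab on the same object that was passed to TabularDataset.
TEXT.build_vocab(train, max_size=25000)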
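To see why the attempt in the question reports a vocabulary of size 2, here is a small pure-Python sketch. MiniField and MiniDataset are hypothetical stand-ins written for this illustration, not torchtext classes. They assume that build_vocab only counts columns whose field is the very same object as the one it is called on, which is how the legacy torchtext implementation appears to behave.

```python
from collections import Counter


class MiniField:
    """Hypothetical stand-in for a field; not torchtext's real class."""

    def __init__(self):
        self.vocab = None

    def build_vocab(self, dataset):
        counter = Counter()
        # Only columns whose field object *is* this object are counted.
        for name, field in dataset.fields.items():
            if field is self:
                for tokens in dataset.columns[name]:
                    counter.update(tokens)
        # Two special tokens are always present, even if nothing matched.
        self.vocab = ["<unk>", "<pad>"] + sorted(counter)


class MiniDataset:
    """Hypothetical stand-in that stores whitespace-tokenized columns."""

    def __init__(self, rows, fields):
        self.fields = dict(fields)
        self.columns = {name: [] for name, _ in fields}
        for row in rows:
            for (name, _), cell in zip(fields, row):
                self.columns[name].append(cell.split())


rows = [("true", "the sky is blue"), ("false", "the moon is cheese")]

# Case 1: fields created inline, vocab built on an unrelated field object.
inline = MiniDataset(rows, [("label", MiniField()), ("statement", MiniField())])
detached = MiniField()
detached.build_vocab(inline)
detached_size = len(detached.vocab)  # only the two special tokens

# Case 2: fields bound to variables and shared with the dataset.
LABEL, TEXT = MiniField(), MiniField()
shared = MiniDataset(rows, [("label", LABEL), ("statement", TEXT)])
TEXT.build_vocab(shared)
shared_size = len(TEXT.vocab)  # special tokens plus the distinct words

print(detached_size, shared_size)
```

In the first case the field used for build_vocab is never attached to the dataset, so nothing is counted. In the second case the same TEXT object is used in both places, so the words from the "statement" column end up in the vocabulary.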