Torchtext TabularDataset: data.Field doesn't contain actual imported data?




I learned from the Torchtext documentation that the way to import CSV files is through TabularDataset. I did it like this:


train = data.TabularDataset(path='./data.csv',
                            format='csv',
                            fields=[("label", data.Field(use_vocab=True, include_lengths=False)),
                                    ("statement", data.Field(use_vocab=True, include_lengths=True))],
                            skip_header=True)



"label" and "statement" are the header names of the two columns in my CSV file. I defined them as data.Field, and the console recognizes them as Field objects with no problem, but they don't seem to contain the data from my CSV file. I noticed this when I tried to build a vocabulary with statement.build_vocab(train, max_size=25000): len(statement.vocab) returns 2, which obviously doesn't reflect the actual data in the file.

Did I do something wrong when importing the CSV data, or is my vocab building wrong? Is there a separate method for putting the data into the Field objects? Thanks!




1 Answer



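The fields must be defined separately and bound to names, and those same objects must be passed to TabularDataset. A Field created inline inside the fields list is not bound to any variable, so a later statement.build_vocab(train) call presumably runs on a different Field object that is attached to no column of the dataset. Its vocab then holds only the two default special tokens (<unk> and <pad>), which is why the length is 2. Define the fields like this: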


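from torchtext import data  # torchtext.legacy in torchtext 0.9 to 0.11

# `tokenize` is a tokenizer callable you define yourself, e.g. str.split
TEXT = data.Field(sequential=True, tokenize=tokenize, lower=True, include_lengths=True)
LABEL = data.Field(sequential=True, tokenize=tokenize, lower=True)

train = data.TabularDataset(path='./data.csv',
                            format='csv',
                            fields=[("label", LABEL),
                                    ("statement", TEXT)],
                            skip_header=True)
test = data.TabularDataset(path='./test.csv',
                           format='csv',
                           fields=[("label", LABEL),
                                   ("statement", TEXT)],
                           skip_header=True)

# Build each vocab on the same Field objects the datasets were created with
TEXT.build_vocab(train, max_size=25000)
LABEL.build_vocab(train)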
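
To see why object identity matters here, the following is a toy model in plain Python, not torchtext code. ToyField, ToyDataset and the sample rows are hypothetical stand-ins invented for illustration. The only behavior they imitate is that build_vocab counts tokens solely from columns whose field is the very same object, and that the vocab always starts with the two default special tokens. It also assumes that `statement` in the question was a Field created separately from the inline ones.

```python
from collections import Counter

SPECIALS = ["<unk>", "<pad>"]  # the two default special tokens


class ToyDataset:
    """Hypothetical stand-in for a tabular dataset: rows plus (name, field) pairs."""

    def __init__(self, rows, fields):
        self.rows = rows
        self.fields = fields


class ToyField:
    """Hypothetical stand-in for a field; not a torchtext class."""

    def __init__(self):
        self.vocab = None

    def build_vocab(self, dataset, max_size=None):
        counter = Counter()
        for name, field in dataset.fields:
            # Identity check: only columns bound to this exact object are counted.
            if field is self:
                for row in dataset.rows:
                    counter.update(row[name].lower().split())
        words = [word for word, _ in counter.most_common(max_size)]
        self.vocab = SPECIALS + words


rows = [
    {"label": "true", "statement": "the sky is blue"},
    {"label": "false", "statement": "the moon is cheese"},
]

# Pattern from the question: fields created inline, vocab built on another object.
inline = ToyDataset(rows, [("label", ToyField()), ("statement", ToyField())])
statement = ToyField()
statement.build_vocab(inline, max_size=25000)
unlinked_size = len(statement.vocab)  # only the special tokens

# Pattern from the answer: the same object is passed in and used for build_vocab.
LABEL, TEXT = ToyField(), ToyField()
train = ToyDataset(rows, [("label", LABEL), ("statement", TEXT)])
TEXT.build_vocab(train, max_size=25000)
linked_size = len(TEXT.vocab)  # special tokens plus the words in the column

print(unlinked_size, linked_size)
```

In this sketch the unlinked field ends up with a vocab of size 2, matching what the question reports, while the linked field picks up the words from its column.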





