HeatWave Release Notes
HeatWave AutoML supports text data types. To create a sample text data set, use the fetch_20newsgroups data set from scikit-learn. This also uses the pandas Python library.
$> from sklearn.datasets import fetch_20newsgroups
$> import pandas as pd
$> categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
$> newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'), categories=categories)
$> df = pd.DataFrame([newsgroups_train.data, newsgroups_train.target.tolist()]).T
$> df.columns = ['text', 'target']
$> targets = pd.DataFrame(newsgroups_train.target_names)
$> targets.columns = ['category']
$> out = pd.merge(df, targets, left_on='target', right_index=True).drop('target', axis=1)
$> out = out[(out.text != '')]  # remove empty strings
$> out.to_csv('20newsgroups_train.csv', index=False)
$> newsgroups_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'), categories=categories)
$> df = pd.DataFrame([newsgroups_test.data, newsgroups_test.target.tolist()]).T
$> df.columns = ['text', 'target']
$> targets = pd.DataFrame(newsgroups_test.target_names)
$> targets.columns = ['category']
$> out = pd.merge(df, targets, left_on='target', right_index=True).drop('target', axis=1)
$> out = out[(out.text != '')]  # remove empty strings
$> out.to_csv('20newsgroups_test.csv', index=False)
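Before loading the files, it can help to confirm that each CSV file has the expected header and no empty text values. The following is a minimal sketch that uses only the Python standard library. The helper name summarize_csv is hypothetical and is not part of HeatWave, scikit-learn, or pandas, and it assumes the header is exactly text,category, as written by the commands above.

```python
import csv
from collections import Counter

# Newsgroup posts can be long; raise the per-field limit above the csv default.
csv.field_size_limit(10**8)


def summarize_csv(path):
    """Return the number of rows per category in a text,category CSV file.

    Hypothetical helper for illustration only. Raises ValueError if the
    header is not text,category or if any row has an empty text value.
    """
    counts = Counter()
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(path, newline='', encoding='utf-8') as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != ['text', 'category']:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        for row in reader:
            if row['text'] == '':
                raise ValueError(f"empty text near line {reader.line_num}")
            counts[row['category']] += 1
    return counts
```

For example, summarize_csv('20newsgroups_train.csv') returns the per-category row counts, which can be compared with the table row counts after the import.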
Then create the tables and load the CSV files into MySQL:
mysql> DROP TABLE IF EXISTS `20newsgroups_train`;
mysql> DROP TABLE IF EXISTS `20newsgroups_test`;
mysql> CREATE TABLE `20newsgroups_train` (`text` LONGTEXT DEFAULT NULL, `target` VARCHAR(255) DEFAULT NULL);
mysql> CREATE TABLE `20newsgroups_test` LIKE `20newsgroups_train`;
mysql-js> util.importTable("20newsgroups_train.csv", {table: "20newsgroups_train", dialect: "csv-unix", skipRows: 1})
mysql-js> util.importTable("20newsgroups_test.csv", {table: "20newsgroups_test", dialect: "csv-unix", skipRows: 1})