Machine Learning/SqliteImport

From Noisebridge
Jump to navigation Jump to search

How to load data into SQLITE[edit]

Sqlite is the most widely deployed Database in the world. Its fast and great for prototyping.

In bash, create a new sqlite DB by invoking sqlite with the db file that you wish (file does not exist -> will create new one):

sqlite3 dbKdd.dbev

Create new tables: (these tables are examples for KDD competition data)

CREATE TABLE 'bridge' (
row INT, 
studentid VARCHAR(30), 
problemhierarchy TEXT,
problemname TEXT,
problemview INT,
stepname TEXT,
stepstarttime VARCHAR(30), 
firsttransactiontime VARCHAR(30), 
correcttransactiontime VARCHAR(30), 
stependtime VARCHAR(30), 
stepduration INT,
correctstepduration INT,
errorstepduration INT,
cfa INT,
incorrects INT,
hints INT,
corrects INT,
kcsub TEXT,
opportunitysub TEXT,
kctraced TEXT,
opportunitytraced TEXT
);

Then import the data by typing (note we're only loading test data since the training data file would be too big):

.mode tabs
.import "./bridge_to_algebra_2008_2009_test.txt" bridge

Then you can query the data with the standard SQL commands, for example:

select count(distinct studentid) from bridge;


Analogously to above, here's the table for the algebra data set:

CREATE TABLE 'algebra' (
row INT, 
studentid VARCHAR(30), 
problemhierarchy TEXT,
problemname TEXT,
problemview INT,
stepname TEXT,
stepstarttime VARCHAR(30), 
firsttransactiontime VARCHAR(30), 
correcttransactiontime VARCHAR(30), 
stependtime VARCHAR(30), 
stepduration INT,
correctstepduration INT,
errorstepduration INT,
cfa INT,
incorrects INT,
hints INT,
corrects INT,
kcsub TEXT,
opportunitysub TEXT,
kctraced TEXT,
opportunitytraced TEXT,
kcrules TEXT,
opportunityrules TEXT
);

Import it like this:

.mode tabs
.import ./algebra_2008_2009_test.txt algebra

To exit sqlite type:

.q

Ideally, we'd clean up the data/columns a bit more before importing... but this is a quick 'n dirty tutorial.


Andy