Load Data
There are three ways to load a dataset into the nnodely framework:
Using a directory: each file represents a simulation, and consecutive lines within a file are time-coherent.
Using a dictionary: each key is a network input name and its value holds the data for that input.
Using a pandas dataframe.
[ ]:
# uncomment the command below to install the nnodely package
#!pip install nnodely
from nnodely import *
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>-- nnodely_v1.5.0 --<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
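The following cell creates a simple network: a FIR filter `out` over a 0.05 s time window of `in1`, trained to match the last sample of `target`.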
[2]:
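# Define the network inputs
in1 = Input('in1')
target = Input('target')
# FIR filter over a 0.05 s time window of in1
relation = Fir(in1.tw(0.05))
output = Output('out', relation)
model = Modely(visualizer=TextVisualizer())
# Minimize the error between 'out' and the last sample of target
model.addMinimize('out', output, target.last())
# Build the model with a sample time of 0.01 s
model.neuralizeModel(0.01)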
================================ nnodely Model =================================
{'Constants': {},
'Functions': {},
'Info': {'SampleTime': 0.01,
'nnodely_version': '1.5.0',
'ns': [5, 0],
'ntot': 5,
'num_parameters': 5},
'Inputs': {'in1': {'dim': 1, 'ns': [5, 0], 'ntot': 5, 'tw': [-0.05, 0]},
'target': {'dim': 1, 'ns': [1, 0], 'ntot': 1, 'sw': [-1, 0]}},
'Minimizers': {'out': {'A': 'Fir2', 'B': 'SamplePart4', 'loss': 'mse'}},
'Outputs': {'out': 'Fir2'},
'Parameters': {'PFir3W': {'dim': 1,
'tw': 0.05,
'values': [[0.7577804327011108],
[0.1862850785255432],
[0.5226411819458008],
[0.8208074569702148],
[0.10860830545425415]]}},
'Relations': {'Fir2': ['Fir', ['TimePart1'], 'PFir3W', None, 0],
'SamplePart4': ['SamplePart', ['target'], -1, [-1, 0]],
'TimePart1': ['TimePart', ['in1'], -1, [-0.05, 0]]}}
================================================================================
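The printed model reports `'ns': [5, 0]` for `in1`, because a 0.05 s window at a 0.01 s sample time spans 5 samples. The plain-Python sketch below illustrates that count and what a FIR relation computes. It assumes the output is the dot product of the 5 weights with the 5-sample window and that the first weight multiplies the oldest sample; the weight ordering and the helper `fir` are assumptions for illustration, not nnodely's implementation. The weights are the ones printed above.

```python
sample_time = 0.01  # seconds, as passed to neuralizeModel
tw = 0.05           # time window of in1, in seconds
ns = round(tw / sample_time)  # number of samples in the window

# Weights printed in the model summary (PFir3W)
weights = [0.7577804327011108, 0.1862850785255432, 0.5226411819458008,
           0.8208074569702148, 0.10860830545425415]

def fir(window, weights):
    """Hypothetical FIR: weighted sum of the samples in the window."""
    return sum(w * x for w, x in zip(weights, window))

# Two consecutive windows of a unit ramp
out_a = fir([1, 2, 3, 4, 5], weights)
out_b = fir([2, 3, 4, 5, 6], weights)
print(ns, out_b - out_a, sum(weights))
```

On a unit ramp, consecutive outputs differ by the sum of the weights (about 2.396) whatever the weight ordering, which is consistent with the step between the outputs in the last cell of this page.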
Load a dataset using a directory
To load a dataset from a directory, specify a name for the dataset, the folder path (`source`) and the structure of the data (`format`). The `format` list tells the framework which file column feeds each network input; an empty string marks a column that is not used.
[3]:
train_folder = 'data'
data_struct = ['in1', '', 'target']
model.loadData(name='dataset', source=train_folder, format=data_struct)
============================ nnodely Model Dataset =============================
Dataset Name: dataset
Number of files: 1
Total number of samples: 28
Shape of target: (28, 1, 1)
Shape of in1: (28, 5, 1)
================================================================================
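The column mapping described by `format` can be sketched in plain Python. The helper `map_columns` below is hypothetical, and the comma-separated file content is an assumption made for the example; it only shows the idea of assigning columns to inputs by position and skipping the columns marked with an empty string.

```python
import csv
import io

def map_columns(rows, fmt):
    """Hypothetical helper: assign file columns to inputs by position."""
    data = {name: [] for name in fmt if name}  # '' entries are skipped
    for row in rows:
        for name, value in zip(fmt, row):
            if name:
                data[name].append(float(value))
    return data

# Three-column example file: in1, an unused column, target
text = "0.0,9.9,1.0\n0.1,9.9,1.1\n0.2,9.9,1.2\n"
rows = csv.reader(io.StringIO(text))
data = map_columns(rows, ['in1', '', 'target'])
print(data)  # {'in1': [0.0, 0.1, 0.2], 'target': [1.0, 1.1, 1.2]}
```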
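You can also pass optional parameters: `skiplines` (the number of lines to skip at the beginning of each file), `delimiter` (the separator between columns) and `header` (how the file header is handled). In the example below, skipping 4 lines reduces the number of samples from 28 to 24.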
[4]:
model.loadData(name='dataset_2', source=train_folder, format=data_struct, skiplines=4, delimiter='\t', header=None)
============================ nnodely Model Dataset =============================
Dataset Name: dataset_2
Number of files: 1
Total number of samples: 24
Shape of target: (24, 1, 0)
Shape of in1: (24, 5, 1)
================================================================================
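The effect of skipping lines, choosing a delimiter and dropping a header row can be illustrated with Python's `csv` module. The function `read_rows` and its `has_header` flag are hypothetical and only loosely analogous to the `skiplines`, `delimiter` and `header` arguments of `loadData`; nnodely's own parsing may differ in the details.

```python
import csv

def read_rows(text, skiplines=0, delimiter=',', has_header=False):
    """Hypothetical parser: skip lines, optionally drop a header, split columns."""
    lines = text.splitlines()[skiplines:]
    reader = csv.reader(lines, delimiter=delimiter)
    if has_header:
        next(reader)  # discard the header row
    return [[float(v) for v in row] for row in reader]

# Tab-separated file with two comment lines on top
raw = "# simulation 1\n# columns: in1, unused, target\n0.0\t9.9\t1.0\n0.1\t9.9\t1.1\n"
rows = read_rows(raw, skiplines=2, delimiter='\t')
print(rows)  # [[0.0, 9.9, 1.0], [0.1, 9.9, 1.1]]

# Same data preceded by a header row
raw_header = "in1\tunused\ttarget\n0.0\t9.9\t1.0\n"
print(read_rows(raw_header, delimiter='\t', has_header=True))  # [[0.0, 9.9, 1.0]]
```

Every skipped line is one fewer row of data, which is why the sample count drops when `skiplines` increases.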
Load a dataset from a custom dictionary
You can also build your own dataset as a dictionary that maps each network input name to an array of values, and pass it to the `source` argument. In this example `target` is a linear function of `in1`: target = 2 * in1 - 3.
[5]:
import numpy as np
data_x = np.array(range(10))
data_a = 2
data_b = -3
dataset = {'in1': data_x, 'target': (data_a*data_x) + data_b}
model.loadData(name='dataset_3', source=dataset)
============================ nnodely Model Dataset =============================
Dataset Name: dataset_3
Number of files: 1
Total number of samples: 6
Shape of target: (6, 1, 1)
Shape of in1: (6, 5, 1)
================================================================================
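The dictionary holds 10 values, yet only 6 samples are created. A plausible explanation, consistent with the printed shapes, is that each sample needs a full 5-sample window of `in1`, so 10 values yield 10 - 5 + 1 = 6 windows (and 100 - 5 + 1 = 96 for the DataFrame below). The sketch assumes the `target` sample is aligned with the last element of each `in1` window, since both windows end at 0 in the model summary; `build_windows` is a hypothetical helper.

```python
def build_windows(data, ns_in=5):
    """Hypothetical helper: slide a window of ns_in samples over in1."""
    x, y = data['in1'], data['target']
    n_samples = len(x) - ns_in + 1
    # Shape (n_samples, ns_in, 1), like 'Shape of in1: (6, 5, 1)'
    in1 = [[[v] for v in x[i:i + ns_in]] for i in range(n_samples)]
    # Shape (n_samples, 1, 1): last target value of each window
    target = [[[y[i + ns_in - 1]]] for i in range(n_samples)]
    return in1, target

data_x = list(range(10))
dataset = {'in1': data_x, 'target': [2 * v - 3 for v in data_x]}
in1, target = build_windows(dataset)
print(len(in1), len(in1[0]), len(in1[0][0]))  # 6 5 1
print(in1[0], target[0])  # [[0], [1], [2], [3], [4]] [[5]]
```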
Load a dataset from a pandas DataFrame
You can also use a pandas DataFrame as the source for loading a dataset into nnodely. Here, each column name matches a network input.
[6]:
import pandas as pd
# Create a DataFrame with one column per network input (evenly spaced values from 1 to 100)
df = pd.DataFrame({
'in1': np.linspace(1,100,100, dtype=np.float32),
'target': np.linspace(1,100,100, dtype=np.float32)})
model.loadData(name='dataset_4', source=df)
============================ nnodely Model Dataset =============================
Dataset Name: dataset_4
Number of files: 1
Total number of samples: 96
Shape of target: (96, 1, 1)
Shape of in1: (96, 5, 1)
================================================================================
Resampling a pandas DataFrame
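If the DataFrame has a column representing time (named `time` in this example), you can set `resampling=True` to resample the dataset at the sample time of the neuralized network (0.01 s here). Note that the timestamps below are irregularly spaced.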
[7]:
df = pd.DataFrame({
'time': np.array([1.0,1.5,2.0,4.0,4.5,5.0,7.0,7.5,8.0,8.5], dtype=np.float32),
'in1': np.linspace(1,10,10, dtype=np.float32),
'target': np.linspace(1,10,10, dtype=np.float32)})
model.loadData(name='dataset_resampled', source=df, resampling=True)
============================ nnodely Model Dataset =============================
Dataset Name: dataset_resampled
Number of files: 1
Total number of samples: 747
Shape of target: (747, 1, 1)
Shape of in1: (747, 5, 1)
================================================================================
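The 747 samples are consistent with a uniform grid at 0.01 s from the first timestamp (1.0 s) to the last (8.5 s): 751 grid points, minus 4 for the 5-sample window of `in1`. The sketch below builds that grid and fills it by linear interpolation. Linear interpolation is an assumption chosen for illustration; the method nnodely actually uses to fill the grid is not described here, and `interp` is a hypothetical helper.

```python
import bisect

# Irregular timestamps and in1 values from the resampling example above
times = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 7.0, 7.5, 8.0, 8.5]
in1 = [float(v) for v in range(1, 11)]
sample_time = 0.01
ns_in1 = 5  # in1 window length in samples

def interp(t, xs, ys):
    """Linearly interpolate ys(xs) at t (xs must be sorted)."""
    i = bisect.bisect_right(xs, t) - 1
    i = min(max(i, 0), len(xs) - 2)  # clamp so that xs[i + 1] exists
    x0, x1 = xs[i], xs[i + 1]
    return ys[i] + (ys[i + 1] - ys[i]) * (t - x0) / (x1 - x0)

# Uniform grid from the first to the last timestamp
n_grid = round((times[-1] - times[0]) / sample_time) + 1
grid = [times[0] + k * sample_time for k in range(n_grid)]
in1_resampled = [interp(t, times, in1) for t in grid]

n_samples = n_grid - ns_in1 + 1
print(n_grid, n_samples)  # 751 747
```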
Get Samples from the Dataset
Once a dataset is loaded, you can draw samples from it with `getSamples`. Set the `window` argument to choose how many consecutive samples to take from the chosen dataset, and `index` to select a specific time instant instead of a random one.
[8]:
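# Draw 5 consecutive samples from dataset_4, starting at a random position
sample = model.getSamples(dataset='dataset_4', window=5)
# sampled=True: the inputs are already split into samples
model(sample, sampled=True)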
[8]:
{'out': [49.65475082397461,
52.050872802734375,
54.44699478149414,
56.843116760253906,
59.23923873901367]}
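The sampling behaviour described above can be sketched with a small helper: pick a starting index (random unless `index` is given) and take `window` consecutive samples from every input. `get_samples` is hypothetical and works on plain lists; it is not nnodely's `getSamples`, which returns the data in the format the model expects.

```python
import random

def get_samples(dataset, window, index=None):
    """Hypothetical helper: return `window` consecutive samples per input.

    If `index` is None, the starting position is drawn at random.
    """
    n_total = len(next(iter(dataset.values())))
    if index is None:
        index = random.randint(0, n_total - window)  # inclusive bounds
    return {name: values[index:index + window] for name, values in dataset.items()}

# Toy dataset with 96 samples per input, like dataset_4
toy = {'in1': list(range(96)), 'target': list(range(96))}
sample = get_samples(toy, window=5)
fixed = get_samples(toy, window=5, index=10)
print(fixed['in1'])  # [10, 11, 12, 13, 14]
```

Because the window is contiguous, the five model outputs in the cell above come from consecutive time instants.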