Workshop process & notes:
Exploratory
Four themes:
- Choosing and Evaluating Software
- Accessing and Ingesting
- Scaling and Performance
- Searching and Search Enhancement
Prototypes are very illustrative
...but will most certainly not scale
The data you handle will end up defining the system
Be aware of formats and representations
Licensing schemes are fundamental - a source of trust
Licenses on software, data, metadata, results of processing within the data well, etc.
Preparing the representation format for data exposure
The general case of software scaling
...but will the internal representation format handle heterogeneous data?
...in very large amounts?
Standardizing representation formats can help, but be aware that the format must serve:
- Minimal requirements for searches or identification
- Data exposure (linked data or enhancing searches)
- Different types of processing
- Segmenting of searches
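A minimal sketch of what one common representation could look like if it has to serve all four needs above. Everything here is an assumption made for illustration: the class WellRecord, its fields, and the helper functions are hypothetical and were not presented at the workshop. The linked data view borrows the Dublin Core terms vocabulary as one possible choice.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WellRecord:
    """Hypothetical common representation for one item in a data well."""
    identifier: str   # minimal requirement: identification
    title: str        # minimal requirement: searching
    source: str       # provenance, used here to segment searches
    media_type: str   # lets each processing step select the records it can handle
    extra: dict = field(default_factory=dict)  # source-specific metadata, kept untouched


def to_linked_data(record):
    """Expose a record as a JSON-LD-style dict (Dublin Core terms, an assumed choice)."""
    return {
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        "@id": record.identifier,
        "dcterms:title": record.title,
        "dcterms:format": record.media_type,
    }


def segment_by_source(records):
    """Group records by source so a search can be limited to one segment."""
    segments = defaultdict(list)
    for record in records:
        segments[record.source].append(record)
    return dict(segments)


def search(records, term, source=None):
    """Case-insensitive title search, optionally restricted to one source."""
    term = term.lower()
    return [
        record for record in records
        if term in record.title.lower()
        and (source is None or record.source == source)
    ]


def select_for_processing(records, media_type):
    """Pick the records a given processing step is able to handle."""
    return [record for record in records if record.media_type == media_type]
```

The point of the sketch is that only a handful of fields need to be standardized; heterogeneous, source-specific metadata can ride along in `extra` without the search, exposure, or processing code having to understand it.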
A Glorified Object Repository?
Not just about searching or persistence
Very much about metadata
Business cases are good starting points for defining data wells