
Storage - OwnCloud

256 MB RAM, 140 MB HDD, PHP


The chosen frontend is OwnCloud.


As a frontend, it should be as functional as Copy or Dropbox.


The interface should cover standard user needs, including at least:


- Nice, intuitive interface (AJAX, drag & drop)

- Upload/Download files from your computer via WebUI

- Public link manager: allows creating and deleting public links to files or folders

- File changes history

- Restore deleted files

- Encrypted storage

- Share files with a link; the decryption key can optionally be included in the URL (as Mega does)

- Share folders with a link; the decryption key can optionally be included in the URL (as Mega does)

- Client-side JavaScript encryption: data is encrypted before it leaves your computer, to protect against man-in-the-middle and other attacks

- Server-side encryption: stored data is encrypted so that nobody but you can access it

- Configure upload transfer rate

- File preview in the file manager (without opening a separate page)



First Start


The first time you log in to the frontend, it should:


- Create a user & administrator account

- Create a secure password (checked by a password-strength meter)



OwnCloud tracking model


OwnCloud has two tracking mechanisms that we should block:


- It tries to connect to an external domain (probably www.owncloud.org)

- It sends code to the browser, and the browser then connects to a tracking page



Encryption proposed model – General Overview

The encryption scheme has three layers. It is well known that more layers do not by themselves add more security, but each layer protects a different step of the process.


For file transfers, the file is encrypted before it leaves the browser (on upload) and decrypted when it arrives at the browser (on download). The original filename is kept, but the content is encrypted.


For file storage, OwnCloud encrypts the data with a key based on the user's key, and it is stored on a dm-crypt volume.



Upload File model


When you upload a file to storage, the client should:


1. Remove critical file metadata before the file leaves the browser

2. Encrypt it with JavaScript before it leaves the browser, if «secure mode» is enabled on the upload form or in the user configuration



Upload/Download Client Browser JavaScript Encryption


On upload, download, or collaborative editing of a file, it asks you for the password to access your file if the file is configured to be in «secure mode». The password never leaves your browser. A sketch of this flow follows the list below.



In this way:

- OwnCloud doesn't know what's inside your files

- Each file can have a different password for decryption (useful for sharing files with other people)

- The system can also run without «secure mode», trading some security for more usability for regular users
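To make the «secure mode» guarantee concrete, here is a minimal sketch of the intended flow. The real implementation would run as JavaScript in the browser (e.g. using WebCrypto); the Python below, the library calls and the parameter choices are only illustrative assumptions.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_upload(plaintext, password):
    """Encrypt a file client-side; only ciphertext and public values leave the client."""
    salt = os.urandom(16)
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200000)
    key = kdf.derive(password.encode())   # per-file key; the password itself never leaves the client
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    # The server stores only salt, nonce and ciphertext; the original filename is kept.
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext}

Because the password is chosen per file, each file can be protected with a different password, which is what makes per-file sharing possible.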



OwnCloud Bug Fixing

Like all software, OwnCloud has errors reported on its bug tracker that need to be solved. The full list is included in the OwnCloud development Appendix 3.1.



OwnCloud Implementation Step 1

In the first implementation phase, OwnCloud loses the collaborative online document editing features for files in «secure mode» (files which were encrypted with client-side JavaScript).


OwnCloud Implementation Step 2

In the second implementation phase, OwnCloud regains collaborative online document editing for files in «secure mode» by developing a new P2P connection for the collaborative platform, described in Service 7.



Share Link proposal


When a user creates a link to share a document that is in «secure mode», it should be possible to embed the decryption password in the link so that the recipient is not asked for any password, as Mega does.


When User 2 opens the shared link, the browser goes to User 1's frontend and downloads the file.

When the download completes, the browser JavaScript asks for the decryption password if it is needed and not included in the link.

If the decryption password is included in the link, it never leaves the browser and is never logged in the HTTP server's request URLs.
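The "never logged" property can be obtained by placing the key in the URL fragment (the part after '#'), which browsers do not send to the server. The sketch below only illustrates that idea; the link format and parameter names are assumptions, and the real link handling would happen in the browser.

from urllib.parse import urlsplit

def build_share_link(frontend_url, file_id, key_b64=None):
    """Build a share link; the optional decryption key goes in the URL fragment."""
    link = "%s/s/%s" % (frontend_url, file_id)
    if key_b64:
        # Everything after '#' stays in the browser: it is not part of the HTTP
        # request, so it cannot appear in the server's access logs.
        link += "#key=" + key_b64
    return link

def extract_key(link):
    """Recover the key client-side, if the link carries one."""
    fragment = urlsplit(link).fragment
    if fragment.startswith("key="):
        return fragment[len("key="):]
    return None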



File Indexation


When a file is not in «secure mode», OwnCloud indexes both its filename (as it does now) and its content.


When a user searches for a file, they can choose to search by filename only, by content only, or by both.
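As an illustration of those search options, here is a minimal sketch assuming an in-memory index of extracted text built only for files that are not in «secure mode»; all names are hypothetical.

def search(index, query, by_name=True, by_content=True):
    """index: {path: extracted_text} for non-«secure mode» files only."""
    q = query.lower()
    hits = []
    for path, text in index.items():
        if (by_name and q in path.lower()) or (by_content and q in text.lower()):
            hits.append(path)
    return hits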


Specification

Feature (as offered by Dropbox / GDrive / Mega) | OwnCloud
Upload files from your computer via WebUI | X
Sync a folder of your computer via Agent | X
Link manager (remove download links to be private again) | X
Filesystem management from WebUI | X
Separated view of uploaded images | X
File changes history | X
RSS feed of file changes | X
Collaborative online editing | X
Restore and download deleted files | X
Encrypted storage | X
Mobile applications (download and stream your files) | X
API for 3rd-party plugins | X
WebUI in many languages | X
List of recently added/modified files | X
Firefox extension | X
Create collaborative text document | X
Share files with a link (including decryption key in URL or not) | X (without decryption key)
Share folders with a link (including decryption key in URL or not) | X (without decryption key)
Share files by mail (including decryption key in URL or not) |
Share folders by mail (including decryption key in URL or not) |
Encryption on client side and server side | Server side
Encryption key management |
2048-bit RSA (or better) | ?
RSA key based on your password + entropy | ?
Configure upload transfer rate |
Lost password (master crypto key) = data loss | X (but it has a master key)
Create collaborative presentation |
Create collaborative spreadsheet |
Create collaborative form/poll |
Create collaborative drawing |
Show file preferences (which apps can open, owners, editors, etc.) |
Stores cache in the web browser so pages load faster |
File preview in the file manager (without opening a separate page) |
Owner list |
Connection history |
Mark documents as favorites |
File manager: view files in a grid |
File manager: view files in a list |
Statistics: hard-disk space |
Statistics: used bandwidth |
File-changes history searchable by date |
Drag'n'drop files to upload | X
Option to not keep history |
Transfer a folder to any other user |
List connected devices |
Resume file upload via WebUI |
Resume file upload via Agent |
Create public upload folders |
Share folders, files and links with other users |
Create user groups |
Applications (bookmarks, IM, music, etc.) | X
Nice intuitive interface | X
Metatag system to improve search |
Resume file transfers via WebUI |

Development Plan

Total development hours: 377h. Cost: 15080€

First Month (bug month) Hours: 172,5h Cost: 6900€
* Incomplete MP4 Files Download 6,25h 250€
* Share with expiration 7,5h 300€
* Share with group shares back to original user 5h 200€
* (Un-)Share a file after another user edited this 6,25h 250€
* Rename folder while files are uploading to it fails 6,25h 250€
* Delete shared folder while uploading 5h 200€
* inconsistancy in uploading 0 byte file 5h 200€
* Server to server sharing is not working properly with single files and documents 6,25h 250€
* Moving data directory causes redownload and lost of shares 5h 200€
* gzip decompess fails on some archives 6,25h 250€
* Warning on CLI password change if encryption is enabled 5h 200€
* Can't activate apps when using SSL 7,5h 300€
* Cannot jump to the folder from search result for shared folders/files 6,25h 250€
* E-mail notification bug (returns /var/www directory inside e-mail) 5h 200€
* Shared video freezes web browser (FireFox, Chrome, etc) 3,75h 150€
* Encryption not working. 6,25h 250€
* Opening Shared MS *.doc files fail on normal users. 2,5h 100€
* Disable "Share files" breaks files app 7,5h 300€
* Sending password reset eMail doesn't work 6,25h 250€
* OC6 Web Interface Freezing 8,75h 350€
* Mounted ftps doesn't show all the files and directories 6,25h 250€
* User files are readable by administrator even when Password recovery is disabled by user 6,25h 250€
* SMB/CIFS password visible in owncloud.log 3,75h 150€
* oc7 deleting user data! 6,25h 250€
* Fix username change [WIP] 2,5h 100€
* force loading of encryption app to show correct error 6,25h 250€
* No File Size shown for files bigger than 2 GB with FTP and "External storage support" 6,25h 250€
* sharing files don't decrypt 3,75h 150€
* Deleted Files Recovery doesn't work 6,25h 250€
* Logout don't destroy $_SESSION 3,75h 150€
* Password popup 3,75h 150€

Second Month Hours: 112h Cost: 4480€
* Can't Change Full Name for Users 3,75h 150€
* OC7.0.0 Cannot upload 2 directories through Drag and Drop 6,25h 250€
* A folder shared with two groups appears twice for an user in these two groups. 7,5h 300€
* rd level files/folders looking like they do not inherit privileges/permissions 6,25h 250€
* "Pictures" view mode bug with folder ending by a space 6,25h 250€
* [7.0.1] Confusion with share to user name 6,25h 250€
* [7.0.1] Drag and drop folder gives no error 7,5h 300€
* UI improvements for external storage configuration 5h 200€
* Folder specific views 6,25h 250€
* Show that the encryption recovery key password is set (usability) 5h 200€
* restoring deleted shares 5h 200€
* Seamless integration with Libreoffice 60h 2400€

Third Month Hours: 92,5h Cost: 3700€
* Share files with a link (including decryption key on URL or not; currently the decryption key cannot be included in the URL) 6,25h 250€
* Share folders with a link (including decryption key on URL or not; currently the decryption key cannot be included in the URL) 6,25h 250€
* Share files by mail (including decryption key on URL or not) 7,5h 300€
* Share folders by mail (including decryption key on URL or not) 6,25h 250€
* Configure upload transfer rate 7,5h 300€
* Stores cache on web browser to load webpage faster 6,25h 250€
* Statistics hardisk space 3,75h 150€
* JS to encrypt on webbrowser files before upload 6,25h 250€
* List connected devices 5h 200€
* Resume file upload on WebUI 10h 400€
* Create public upload folders with choosen criteria \\(not upload more than 1gb or whatever) 7,5h 300€
* Share folders, files and links with other OwnCloud users 8,75h 350€
* Resume file downloads on WebUi 8,75h 350€
* Manage password & encryption keys 5h 200€
* Private password for restore 3,75h 150€

Decentralized Backup - I2P+Tahoe-LAFS

50 MB RAM, 12 MB HDD, Python



Introduction

This document addresses the analysis of i2p-Tahoe-LAFS version 1.10 in order to implement three new features:

  • Quota management

  • Connection to multiple Helpers

  • Automatic spreading of Introducers & Helpers furls.


We begin with a short introduction to Tahoe-LAFS, and then proceed to analyse the requirements for the 3 proposed features. The analysis includes a review of how related functionality is now implemented in Tahoe-LAFS, which files should be modified and what modifications should be included for each of those files.


Short introduction to Tahoe-LAFS architecture

As a short reminder, a Tahoe-LAFS grid is composed of several types of nodes:


  • Introducer: keeps track of the StorageServer nodes connected to the grid and publishes the list so that StorageClients know which nodes they can connect to.

  • StorageServer: these nodes form the distributed data store.

  • HelperServer: an intermediate server which can be used to minimize upload time. Due to the redundancy introduced by erasure coding, uploading a file to the grid can be an order of magnitude slower than reading from it. The HelperServer acts as a proxy which receives the encrypted data from the StorageClient (encrypted, but with no redundancy), performs the erasure coding and distributes the resulting shares to StorageServers in the grid.

  • StorageClient: once they get the list of StorageServers in the grid from one introducer, they can connect to read and write data on the grid. Read operations are performed connecting directly to StorageServer nodes. Write operations can be performed connecting directly or using a HelperServer (only for immutable files as of Tahoe-LAFS 1.10.0).


For a full introduction to Tahoe-LAFS, see the docs folder in the source tree. You can also check the tutorial published on the Nilestore project's wiki1.



Diagram showing tahoe-lafs network topology (from tahoe-lafs official documentation). (Notice that Introducers and Helpers are not shown in it)




Code structure in Tahoe-LAFS

Tahoe-LAFS is developed in Python (2.6.6 – 2.x) and has very good test coverage (around 92% for 1.10). This section gives a short description of the Tahoe-LAFS source code.

We start by looking at Tahoe-LAFS source folder structure:

allmydata

├── frontends

├── immutable

│   └── downloader

├── introducer

├── mutable

├── scripts

├── storage

├── test

├── util

├── web

│   └── static

│   ├── css

│   └── img

└── windows

As a general rule, code specific to each feature's client and server is placed under that feature's folder, as client.py and server.py. All test files are placed under the test folder.



Some files relevant to the rest of the document2:

  • allmydata/client.py: this is the main file for Tahoe-LAFS; it contains the Client class which initializes most of the services (StorageFarmBroker, StorageServer, web/FTP/SFTP frontends, Helper, ...)

  • allmydata/introducer/server.py: the server side of the Introducer.

  • allmydata/introducer/client.py: the client side of the Introducer.

  • allmydata/storage/server.py: the server side of the storage.

  • allmydata/immutable/upload.py: manages connections to the Helper from the client side.

  • allmydata/immutable/offloaded.py: the Helper, server side.

  • allmydata/storage_client.py: functions related to the storage client.


Analysis of proposed features

Feature 1: Quotas

Introduction

Support for quota management ('accounting') in Tahoe-LAFS has been an ongoing development for several years. The scheme being used is based on accounts, which could be managed by a central AuthorityServer or independently by each of the StorageServers (the latter option being suited only for smaller grids). A detailed description of the intended accounting architecture and development roadmap can be found in the project's documentation3.

The objective of quota management in CommunityCube is to ensure that a user who contributes a given amount of space to the grid can use the equivalent amount of space in it.

User accounts pose obvious privacy and anonymity risks. We have thus investigated a different approach to the problem: controlling quota management from the StorageClient itself.

This implementation comes, however, with its own set of drawbacks: it can be easily defeated by using a modified StorageClient, and it requires keeping a local record of the files stored in the grid4 (something Tahoe-LAFS does not otherwise require as long as you keep a copy of the capabilities you are interested in), which is also a significant threat from the privacy point of view. As an alternative to keeping a record of every uploaded file, users can be forced to use a single directory as the root for all the files they upload (known as a rootcap5). The content under that directory can be accounted for with a call to 'tahoe deep-check --add-lease ALIAS:', where ALIAS stands for the alias of the rootcap directory.

This approach seems to be the most compatible with CommunityCube's objectives, and its adoption relies on the belief that CommunityCube's users will play fair with the rest of the community.

The proposed system can be easily bypassed by malicious actors, but it will however ensure that the grid is not abused due to user mistakes or lack of knowledge on the grid's working principles and capacity.

Description of proposed feature

Quota management will be handled by the StorageClient, which imposes the limits on what can be uploaded to the grid. When a file is to be uploaded, the StorageClient:

      1. Checks that the storage server is running and writable.

      2. Calculates the space it is sharing in the associated storage server:

        1. Available disk space

        2. Reserved disk space (minimum free space to be reserved)

        3. Size of stored shares

      3. Calculates the size of the leases it holds on files stored in the grid (this requires a catalog of uploaded files and tracking of lease expiration/renewal).

      4. Estimates the assigned space as 'sharing space' (available + stored shares).

      5. Checks that the used space (i.e. the sum of leases) is smaller than the sharing space.

      6. Retrieves the grid's “X out of K” parameters used in erasure coding.

      7. Verifies the predicted used space and reports an error if the available quota would be exceeded (a sketch of this check follows below).
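The following is a minimal sketch of steps 3 to 7, assuming the document's “X out of K” erasure-coding convention (X shares needed out of K total, so each uploaded byte costs roughly K/X bytes of raw grid storage). All names are illustrative; the real check would live in the StorageClient.

import math

def predicted_grid_usage(file_sizes, needed_x, total_k):
    """Raw bytes the grid spends on the files we hold leases on (steps 3 and 6)."""
    expansion = float(total_k) / needed_x        # e.g. 10/3, roughly 3.3x overhead
    return sum(int(math.ceil(size * expansion)) for size in file_sizes)

def check_quota(new_file_size, leased_file_sizes, sharing_space, needed_x, total_k):
    """Refuse an upload that would exceed the space we contribute (steps 4, 5 and 7)."""
    used = predicted_grid_usage(leased_file_sizes + [new_file_size], needed_x, total_k)
    if used > sharing_space:                     # sharing space = available + stored shares
        raise RuntimeError("Upload would exceed the quota contributed to the grid")
    return sharing_space - used                  # remaining quota after the upload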

Existing code analysis

We will have a look at how the following functionality is implemented in Tahoe-LAFS:

  1. The upload of a file (to the Helper or directly to other StorageServers via the StorageFarmBroker).

  2. Check if the StorageServer is running.

  3. The statistics associated with the space used and available on the StorageServer.

  4. The moment the leases are renewed in remote StorageServers.

In the next paragraphs we show how the system works with the corresponding code.

  1. The upload of a file

The upload takes place at different classes depending on the type of data being uploaded. For immutable files, it is the Uploader service, which is defined in allmydata/immutable/upload.py. For mutable files, it is defined in allmydata/mutable/filenode.py.

These functions can be accessed from the main client, using an intermediate call to a NodeMaker instance, or directly calling the uploader:

File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


# these four methods are the primitives for creating filenodes and

# dirnodes. The first takes a URI and produces a filenode or (new-style)

# dirnode. The other three create brand-new filenodes/dirnodes.


def create_node_from_uri(self, write_uri, read_uri=None, deep_immutable=False, name="<unknown name>"):

# This returns synchronously.

# Note that it does *not* validate the write_uri and read_uri; instead we

# may get an opaque node if there were any problems.

return self.nodemaker.create_from_cap(write_uri, read_uri, deep_immutable=deep_immutable, name=name)


def create_dirnode(self, initial_children={}, version=None):

d = self.nodemaker.create_new_mutable_directory(initial_children, version=version)

return d


def create_immutable_dirnode(self, children, convergence=None):

return self.nodemaker.create_immutable_directory(children, convergence)


def create_mutable_file(self, contents=None, keysize=None, version=None):

return self.nodemaker.create_mutable_file(contents, keysize,

version=version)


def upload(self, uploadable):

uploader = self.getServiceNamed("uploader")

return uploader.upload(uploadable)




File: allmydata/nodemaker.py


class NodeMaker:

implements(INodeMaker)


(…)


def create_mutable_file(self, contents=None, keysize=None, version=None):

if version is None:

version = self.mutable_file_default

n = MutableFileNode(self.storage_broker, self.secret_holder,

self.default_encoding_parameters, self.history)

d = self.key_generator.generate(keysize)

d.addCallback(n.create_with_keys, contents, version=version)

d.addCallback(lambda res: n)

return d


def create_new_mutable_directory(self, initial_children={}, version=None):

# initial_children must have metadata (i.e. {} instead of None)

for (name, (node, metadata)) in initial_children.iteritems():

precondition(isinstance(metadata, dict),

"create_new_mutable_directory requires metadata to be a dict, not None", metadata)

node.raise_error()

d = self.create_mutable_file(lambda n:

MutableData(pack_children(initial_children,

n.get_writekey())),

version=version)

d.addCallback(self._create_dirnode)

return d


def create_immutable_directory(self, children, convergence=None):

if convergence is None:

convergence = self.secret_holder.get_convergence_secret()

packed = pack_children(children, None, deep_immutable=True)

uploadable = Data(packed, convergence)

d = self.uploader.upload(uploadable)

d.addCallback(lambda results:

self.create_from_cap(None, results.get_uri()))

d.addCallback(self._create_dirnode)

return d






File: allmydata/immutable/upload.py


class Uploader(service.MultiService, log.PrefixingLogMixin):

"""I am a service that allows file uploading. I am a service-child of the

Client.

"""

(...)


def upload(self, uploadable):

"""

Returns a Deferred that will fire with the UploadResults instance.

"""

assert self.parent

assert self.running


uploadable = IUploadable(uploadable)

d = uploadable.get_size()

def _got_size(size):

default_params = self.parent.get_encoding_parameters()

precondition(isinstance(default_params, dict), default_params)

precondition("max_segment_size" in default_params, default_params)

uploadable.set_default_encoding_parameters(default_params)


if self.stats_provider:

self.stats_provider.count('uploader.files_uploaded', 1)

self.stats_provider.count('uploader.bytes_uploaded', size)


if size <= self.URI_LIT_SIZE_THRESHOLD:

uploader = LiteralUploader()

return uploader.start(uploadable)

else:

eu = EncryptAnUploadable(uploadable, self._parentmsgid)

d2 = defer.succeed(None)

storage_broker = self.parent.get_storage_broker()

if self._helper:

uploader = AssistedUploader(self._helper, storage_broker)

d2.addCallback(lambda x: eu.get_storage_index())

d2.addCallback(lambda si: uploader.start(eu, si))

else:

storage_broker = self.parent.get_storage_broker()

secret_holder = self.parent._secret_holder

uploader = CHKUploader(storage_broker, secret_holder)

d2.addCallback(lambda x: uploader.start(eu))


self._all_uploads[uploader] = None

if self._history:

self._history.add_upload(uploader.get_upload_status())

def turn_verifycap_into_read_cap(uploadresults):

# Generate the uri from the verifycap plus the key.

d3 = uploadable.get_encryption_key()

def put_readcap_into_results(key):

v = uri.from_string(uploadresults.get_verifycapstr())

r = uri.CHKFileURI(key, v.uri_extension_hash, v.needed_shares, v.total_shares, v.size)

uploadresults.set_uri(r.to_string())

return uploadresults

d3.addCallback(put_readcap_into_results)

return d3

d2.addCallback(turn_verifycap_into_read_cap)

return d2

d.addCallback(_got_size)

def _done(res):

uploadable.close()

return res

d.addBoth(_done)

return d




We have highlighted the _got_size callback that starts the upload, and the three available ways to upload immutable content: with a LiteralUploader for small files, through a Helper, or directly through the StorageFarmBroker.

In the case of mutable files, we have to check the moment when we upload a new file and when we modify it (or fully overwrite it via MutableFileNode.overwrite or MutableFileNode.update):



File: allmydata/mutable/filenode.py


class MutableFileNode:

implements(IMutableFileNode, ICheckable)


def __init__(self, storage_broker, secret_holder,

default_encoding_parameters, history):

self._storage_broker = storage_broker

self._secret_holder = secret_holder

self._default_encoding_parameters = default_encoding_parameters

self._history = history

self._pubkey = None # filled in upon first read

self._privkey = None # filled in if we're mutable

# we keep track of the last encoding parameters that we use. These

# are updated upon retrieve, and used by publish. If we publish

# without ever reading (i.e. overwrite()), then we use these values.

self._required_shares = default_encoding_parameters["k"]

self._total_shares = default_encoding_parameters["n"]

self._sharemap = {} # known shares, shnum-to-[nodeids]

self._most_recent_size = None

(...)

def create_with_keys(self, (pubkey, privkey), contents,

version=SDMF_VERSION):

"""Call this to create a brand-new mutable file. It will create the

shares, find homes for them, and upload the initial contents (created

with the same rules as IClient.create_mutable_file() ). Returns a

Deferred that fires (with the MutableFileNode instance you should

use) when it completes.

"""

self._pubkey, self._privkey = pubkey, privkey

pubkey_s = self._pubkey.serialize()

privkey_s = self._privkey.serialize()

self._writekey = hashutil.ssk_writekey_hash(privkey_s)

self._encprivkey = self._encrypt_privkey(self._writekey, privkey_s)

self._fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s)

if version == MDMF_VERSION:

self._uri = WriteableMDMFFileURI(self._writekey, self._fingerprint)

self._protocol_version = version

elif version == SDMF_VERSION:

self._uri = WriteableSSKFileURI(self._writekey, self._fingerprint)

self._protocol_version = version

self._readkey = self._uri.readkey

self._storage_index = self._uri.storage_index

initial_contents = self._get_initial_contents(contents)

return self._upload(initial_contents, None)


(…)


def overwrite(self, new_contents):

"""

I overwrite the contents of the best recoverable version of this

mutable file with new_contents. This is equivalent to calling

overwrite on the result of get_best_mutable_version with

new_contents as an argument. I return a Deferred that eventually

fires with the results of my replacement process.

"""

# TODO: Update downloader hints.

return self._do_serialized(self._overwrite, new_contents)


(…)


def upload(self, new_contents, servermap):

"""

I overwrite the contents of the best recoverable version of this

mutable file with new_contents, using servermap instead of

creating/updating our own servermap. I return a Deferred that

fires with the results of my upload.

"""

# TODO: Update downloader hints

return self._do_serialized(self._upload, new_contents, servermap)



def modify(self, modifier, backoffer=None):

"""

I modify the contents of the best recoverable version of this

mutable file with the modifier. This is equivalent to calling

modify on the result of get_best_mutable_version. I return a

Deferred that eventually fires with an UploadResults instance

describing this process.

"""

# TODO: Update downloader hints.

return self._do_serialized(self._modify, modifier, backoffer)







In addition to the relevant functions, we have also highlighted the values of k and n, which are required to estimate how much disk space a new file will take.



  2. Check if the StorageServer is running.

The StorageServer is initialized in allmydata/client.py, in function Client.init_storage, according to configuration values.

File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


def init_storage(self):

# should we run a storage server (and publish it for others to use)?

if not self.get_config("storage", "enabled", True, boolean=True):

return

readonly = self.get_config("storage", "readonly", False, boolean=True)


storedir = os.path.join(self.basedir, self.STOREDIR)


data = self.get_config("storage", "reserved_space", None)

try:

reserved = parse_abbreviated_size(data)

except ValueError:

log.msg("[storage]reserved_space= contains unparseable value %s"

% data)

raise

if reserved is None:

reserved = 0

(…)

ss = StorageServer(storedir, self.nodeid,

reserved_space=reserved,

discard_storage=discard,

readonly_storage=readonly,

stats_provider=self.stats_provider,

expiration_enabled=expire,

expiration_mode=mode,

expiration_override_lease_duration=o_l_d,

expiration_cutoff_date=cutoff_date,

expiration_sharetypes=expiration_sharetypes)

self.add_service(ss)


d = self.when_tub_ready()

# we can't do registerReference until the Tub is ready

def _publish(res):

furl_file = os.path.join(self.basedir, "private", "storage.furl").encode(get_filesystem_encoding())

furl = self.tub.registerReference(ss, furlFile=furl_file)

ann = {"anonymous-storage-FURL": furl,

"permutation-seed-base32": self._init_permutation_seed(ss),

}


current_seqnum, current_nonce = self._sequencer()


for ic in self.introducer_clients:

ic.publish("storage", ann, current_seqnum, current_nonce, self._node_key)


d.addCallback(_publish)

d.addErrback(log.err, facility="tahoe.init",

level=log.BAD, umid="aLGBKw")




To find out whether the StorageServer is running, we have to recover the parent of the service we are in (i.e. the Uploader). We will be working with services that are 'children' of the main Client instance, and we can check whether the client is running a given service (e.g. the storage service) as is done in allmydata/web/root.py:

File: allmydata/web/root.py


class Root(rend.Page):

(...)

def __init__(self, client, clock=None, now=None):

(...)

try:

s = client.getServiceNamed("storage")

except KeyError:

s = None

(...)



  3. The statistics associated with the space used and available on the StorageServer.

From the StorageServer service we get access to the StorageServer.get_stats function:




class StorageServer(service.MultiService, Referenceable):


(…)


def get_stats(self):

# remember: RIStatsProvider requires that our return dict

# contains numeric values.

stats = { 'storage_server.allocated': self.allocated_size(), }

stats['storage_server.reserved_space'] = self.reserved_space

for category,ld in self.get_latencies().items():

for name,v in ld.items():

stats['storage_server.latencies.%s.%s' % (category, name)] = v


try:

disk = fileutil.get_disk_stats(self.sharedir, self.reserved_space)

writeable = disk['avail'] > 0


# spacetime predictors should use disk_avail / (d(disk_used)/dt)

stats['storage_server.disk_total'] = disk['total']

stats['storage_server.disk_used'] = disk['used']

stats['storage_server.disk_free_for_root'] = disk['free_for_root']

stats['storage_server.disk_free_for_nonroot'] = disk['free_for_nonroot']

stats['storage_server.disk_avail'] = disk['avail']

except AttributeError:

writeable = True

except EnvironmentError:

log.msg("OS call to get disk statistics failed", level=log.UNUSUAL)

writeable = False


if self.readonly_storage:

stats['storage_server.disk_avail'] = 0

writeable = False


stats['storage_server.accepting_immutable_shares'] = int(writeable)

s = self.bucket_counter.get_state()

bucket_count = s.get("last-complete-bucket-count")

if bucket_count:

stats['storage_server.total_bucket_count'] = bucket_count

return stats




  4. The leases held by the StorageClient, and their equivalent size on disk (i.e. the amount of storage we have spent).

Leases are created whenever we upload a new file, and they are renewed from the client at three points: in immutable/checker.py (lease renewal for immutable files), in mutable/servermap.py (called from mutable/checker.py; lease renewal for mutable files) and in scripts/tahoe_check.py (CLI interface).

File: allmydata/immutable/checker.py


class Checker(log.PrefixingLogMixin):

"""I query all servers to see if M uniquely-numbered shares are

available.


(…)


def _get_buckets(self, s, storageindex):

"""Return a deferred that eventually fires with ({sharenum: bucket},

serverid, success). In case the server is disconnected or returns a

Failure then it fires with ({}, serverid, False) (A server

disconnecting or returning a Failure when we ask it for buckets is

the same, for our purposes, as a server that says it has none, except

that we want to track and report whether or not each server

responded.)"""


rref = s.get_rref()

lease_seed = s.get_lease_seed()

if self._add_lease:

renew_secret = self._get_renewal_secret(lease_seed)

cancel_secret = self._get_cancel_secret(lease_seed)

d2 = rref.callRemote("add_lease", storageindex,

renew_secret, cancel_secret)

d2.addErrback(self._add_lease_failed, s.get_name(), storageindex)


(...)



File: allmydata/mutable/servermap.py


class ServermapUpdater:

def __init__(self, filenode, storage_broker, monitor, servermap,

mode=MODE_READ, add_lease=False, update_range=None):

"""I update a servermap, locating a sufficient number of useful

shares and remembering where they are located.


"""


(…)


def _do_read(self, server, storage_index, shnums, readv):

ss = server.get_rref()

if self._add_lease:

# send an add-lease message in parallel. The results are handled

# separately. This is sent before the slot_readv() so that we can

# be sure the add_lease is retired by the time slot_readv comes

# back (this relies upon our knowledge that the server code for

# add_lease is synchronous).

renew_secret = self._node.get_renewal_secret(server)

cancel_secret = self._node.get_cancel_secret(server)

d2 = ss.callRemote("add_lease", storage_index,

renew_secret, cancel_secret)

# we ignore success

d2.addErrback(self._add_lease_failed, server, storage_index)

d = ss.callRemote("slot_readv", storage_index, shnums, readv)

return d

(...)



File: allmydata/scripts/tahoe_check.py


def check_location(options, where):

stdout = options.stdout

stderr = options.stderr

nodeurl = options['node-url']

if not nodeurl.endswith("/"):

nodeurl += "/"

try:

rootcap, path = get_alias(options.aliases, where, DEFAULT_ALIAS)

except UnknownAliasError, e:

e.display(stderr)

return 1

if path == '/':

path = ''

url = nodeurl + "uri/%s" % urllib.quote(rootcap)

if path:

url += "/" + escape_path(path)

# todo: should it end with a slash?

url += "?t=check&output=JSON"

if options["verify"]:

url += "&verify=true"

if options["repair"]:

url += "&repair=true"

if options["add-lease"]:

url += "&add-lease=true"


resp = do_http("POST", url)

if resp.status != 200:

print >>stderr, format_http_error("ERROR", resp)

return 1

jdata = resp.read()

if options.get("raw"):

stdout.write(jdata)

stdout.write("\n")

return 0

data = simplejson.loads(jdata)


Required functionality per module

Storage Client:

File: allmydata/client.py

Introduce code in the functions used to create new nodes to keep track of files uploaded to the grid. It may be necessary to move this accounting code down to the immutable Uploader.upload function or the mutable MutableFileNode.update/overwrite functions if they are called directly from other parts of Tahoe-LAFS (not exclusively from the client). Alternatively, if we use the single-rootcap strategy, force any new file to live under the rootcap.

Create a new function in the client that recovers the StorageServer service and uses its usage statistics, the erasure-coding parameters and the statistics for uploaded files to estimate the remaining quota (sketched below).
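A possible shape for that function, reusing only the APIs already quoted in this document (getServiceNamed and StorageServer.get_stats); the stats keys chosen, the local catalog of uploaded files and the function name are assumptions:

# Sketch for allmydata/client.py; names and stats keys are assumptions.
def get_remaining_quota(self):
    try:
        ss = self.getServiceNamed("storage")      # is the StorageServer running?
    except KeyError:
        return None                               # no storage service, nothing is shared
    stats = ss.get_stats()
    # 'sharing space' = space still available + space already taken by stored shares
    sharing_space = (stats.get('storage_server.disk_avail', 0)
                     + stats.get('storage_server.allocated', 0))
    k = self.encoding_params["k"]                 # erasure coding: k shares needed out of n total
    n = self.encoding_params["n"]
    # hypothetical local catalog of the sizes of files we have uploaded / hold leases on
    used = sum(size * n // k for size in self._uploaded_file_sizes())
    return sharing_space - used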


Files: immutable/checker.py, mutable/servermap.py, scripts/tahoe_check.py

Introduce accounting of the times a lease is renewed against the database of uploaded files (only needed if we are creating a local database; it is not required if we use the single rootcap directory).


Web frontend

File: allmydata/web/root.py

Add functionality to show the updated quota data.

File: allmydata/web/welcome.xhtml

Modify the template to show shared remaining/total quota information.

Tests

File: allmydata/test/test_client.py

Add tests to verify that new uploads are properly accounted in the uploads database (or that they lie under the rootcap dir)

File: allmydata/test/test_storage.py

Add tests to verify that new uploads are properly accounted in the uploads database (or that they lie under the rootcap dir)

Documentation

File: docs/architecture.rst

Include a brief description of the quota management system implementation.

File: docs/quotas.rst

Create a new file under docs describing in detail the implemented quota system.

Feature 2: Multiple helpers

Introduction

Helpers are used in Tahoe-LAFS to cope with the overhead factor imposed by erasure coding and the asymmetric upload/download bandwidth of ADSL connections. Uploading a file requires K/X times more bandwidth than the file size (assuming an X-out-of-K storage scheme) and than the corresponding download operation from the grid. Given these asymmetric bandwidth requirements and upload/download channel capacities, the upload operation can be orders of magnitude slower than its corresponding download.

To help ease this problem, Helper nodes (assumed to have an uplink with greater capacity than the user's) receive the ciphertext directly from the StorageClient (i.e. files that have already been encrypted, but have not yet been segmented and erasure-coded), erasure-code it and distribute the resulting shares to StorageServers. This way the amount of data to be uploaded by the StorageClient is limited to the size of the file to be uploaded, with the overhead being handled by the Helper.

As of version 1.10, i2p-Tahoe-LAFS can only be configured to use a single helper server, which (if used) must be specified in tahoe.cfg. Allowing the StorageClient to choose among a list of available helpers will add flexibility to the network and allow the StorageClient to choose the least-loaded Helper at a given moment.


Description of proposed feature

Instead of the single value now stored in tahoe.cfg, we need a list of Helpers and the possibility to select one of them from that list using a particular selection algorithm.

  1. Allow for a variable number of helpers, statically listed in BASEDIR/helpers.

  2. Before sending a file to a helper:

    1. Check all the helpers to retrieve their statistics.

    2. Choose the helper with the best stats.

  3. Send the ciphertext to the chosen Helper.


Existing code analysis

When a new client is started, it recovers the helper.furl from section [client] in tahoe.cfg. Its value is then used to initialize the Uploader service, as seen below:

File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


def init_client(self):

helper_furl = self.get_config("client", "helper.furl", None)

if helper_furl in ("None", ""):

helper_furl = None


DEP = self.encoding_params

DEP["k"] = int(self.get_config("client", "shares.needed", DEP["k"]))

DEP["n"] = int(self.get_config("client", "shares.total", DEP["n"]))

DEP["happy"] = int(self.get_config("client", "shares.happy", DEP["happy"]))


self.init_client_storage_broker()

self.history = History(self.stats_provider)

self.terminator = Terminator()

self.terminator.setServiceParent(self)

self.add_service(Uploader(helper_furl, self.stats_provider,

self.history))




In the Uploader class, we find the code that initializes the helper connection, handles the helper connection being established or lost, and recovers helper information:

File: allmydata/immutable/upload.py


class Uploader(service.MultiService, log.PrefixingLogMixin):

(...)

def __init__(self, helper_furl=None, stats_provider=None, history=None):

self._helper_furl = helper_furl

self.stats_provider = stats_provider

self._history = history

self._helper = None

self._all_uploads = weakref.WeakKeyDictionary() # for debugging

log.PrefixingLogMixin.__init__(self, facility="tahoe.immutable.upload")

service.MultiService.__init__(self)


def startService(self):

service.MultiService.startService(self)

if self._helper_furl:

self.parent.tub.connectTo(self._helper_furl,

self._got_helper)


def _got_helper(self, helper):

self.log("got helper connection, getting versions")

default = { "http://allmydata.org/tahoe/protocols/helper/v1" :

{ },

"application-version": "unknown: no get_version()",

}

d = add_version_to_remote_reference(helper, default)

d.addCallback(self._got_versioned_helper)


def _got_versioned_helper(self, helper):

needed = "http://allmydata.org/tahoe/protocols/helper/v1"

if needed not in helper.version:

raise InsufficientVersionError(needed, helper.version)

self._helper = helper


def _lost_helper(self):

self._helper = None


def get_helper_info(self):

# return a tuple of (helper_furl_or_None, connected_bool)

return (self._helper_furl, bool(self._helper))



Finally, in the upload function, the Helper connection is used if it is available, and the node's storage broker otherwise:

File: allmydata/immutable/upload.py


class Uploader(service.MultiService, log.PrefixingLogMixin):


(...)


def upload(self, uploadable):

"""

Returns a Deferred that will fire with the UploadResults instance.

"""

assert self.parent

assert self.running


uploadable = IUploadable(uploadable)

d = uploadable.get_size()

def _got_size(size):

default_params = self.parent.get_encoding_parameters()

precondition(isinstance(default_params, dict), default_params)

precondition("max_segment_size" in default_params, default_params)

uploadable.set_default_encoding_parameters(default_params)


if self.stats_provider:

self.stats_provider.count('uploader.files_uploaded', 1)

self.stats_provider.count('uploader.bytes_uploaded', size)


if size <= self.URI_LIT_SIZE_THRESHOLD:

uploader = LiteralUploader()

return uploader.start(uploadable)

else:

eu = EncryptAnUploadable(uploadable, self._parentmsgid)

d2 = defer.succeed(None)

storage_broker = self.parent.get_storage_broker()

if self._helper:

uploader = AssistedUploader(self._helper, storage_broker)

d2.addCallback(lambda x: eu.get_storage_index())

d2.addCallback(lambda si: uploader.start(eu, si))

else:

storage_broker = self.parent.get_storage_broker()

secret_holder = self.parent._secret_holder

uploader = CHKUploader(storage_broker, secret_holder)

d2.addCallback(lambda x: uploader.start(eu))


self._all_uploads[uploader] = None

if self._history:

self._history.add_upload(uploader.get_upload_status())

def turn_verifycap_into_read_cap(uploadresults):

# Generate the uri from the verifycap plus the key.

d3 = uploadable.get_encryption_key()

def put_readcap_into_results(key):

v = uri.from_string(uploadresults.get_verifycapstr())

r = uri.CHKFileURI(key, v.uri_extension_hash, v.needed_shares, v.total_shares, v.size)

uploadresults.set_uri(r.to_string())

return uploadresults

d3.addCallback(put_readcap_into_results)

return d3

d2.addCallback(turn_verifycap_into_read_cap)

return d2

d.addCallback(_got_size)

def _done(res):

uploadable.close()

return res

d.addBoth(_done)

return d




Rendering related to the uploader is done in the web interface:

File: allmydata/web/root.py


class Root(rend.Page):


def data_helper_furl_prefix(self, ctx, data):

try:

uploader = self.client.getServiceNamed("uploader")

except KeyError:

return None

furl, connected = uploader.get_helper_info()

if not furl:

return None

# trim off the secret swissnum

(prefix, _, swissnum) = furl.rpartition("/")

return "%s/[censored]" % (prefix,)


def data_helper_description(self, ctx, data):

if self.data_connected_to_helper(ctx, data) == "no":

return "Helper not connected"

return "Helper"


def data_connected_to_helper(self, ctx, data):

try:

uploader = self.client.getServiceNamed("uploader")

except KeyError:

return "no" # we don't even have an Uploader

furl, connected = uploader.get_helper_info()


if furl is None:

return "not-configured"

if connected:

return "yes"

return "no"



These functions are accessed from the welcome page template, which is rendered by Nevow:

File: allmydata/web/welcome.xhtml


(…)


<div>

<h3>

<div><n:attr name="class">status-indicator connected-<n:invisible n:render="string" n:data="connected_to_helper" /></n:attr></div>

<div n:render="string" n:data="helper_description" />

</h3>

<div class="furl" n:render="string" n:data="helper_furl_prefix" />

</div>


(…)



Tests are implemented in allmydata/test/test_helper.py

File: allmydata/test/test_helper.py

class AssistedUpload(unittest.TestCase):

(...)

def setUpHelper(self, basedir, helper_class=Helper_fake_upload):

fileutil.make_dirs(basedir)

self.helper = h = helper_class(basedir,

self.s.storage_broker,

self.s.secret_holder,

None, None)

self.helper_furl = self.tub.registerReference(h)




def test_one(self):

self.basedir = "helper/AssistedUpload/test_one"

self.setUpHelper(self.basedir)

u = upload.Uploader(self.helper_furl)

u.setServiceParent(self.s)


d = wait_a_few_turns()


def _ready(res):

assert u._helper


return upload_data(u, DATA, convergence="some convergence string")

d.addCallback(_ready)

(…)


def test_previous_upload_failed(self):


(...)

f = open(encfile, "wb")

f.write(encryptor.process(DATA))

f.close()


u = upload.Uploader(self.helper_furl)

u.setServiceParent(self.s)


d = wait_a_few_turns()


def _ready(res):

assert u._helper

return upload_data(u, DATA, convergence="test convergence string")

d.addCallback(_ready)



(…)


def test_already_uploaded(self):

self.basedir = "helper/AssistedUpload/test_already_uploaded"

self.setUpHelper(self.basedir, helper_class=Helper_already_uploaded)

u = upload.Uploader(self.helper_furl)

u.setServiceParent(self.s)


d = wait_a_few_turns()





Proposed modifications:

Client

File: allmydata/client.py

Add a MULTI_HELPERS_CFG variable with the path to the helpers file.

Create an init_upload_helpers_list function to parse the file and return the list of furls; it must also take helper.furl in tahoe.cfg into account for compatibility (sketched below).

Update init_client to call init_upload_helpers_list. Refactor the code that reads and writes the multiple-introducers list into a generic 'list of furls' manager that can be shared by the multiple-introducers and multiple-helpers initialization code. This refactoring will also be useful for feature 3 (spreading servers), given that both lists will be updated with a similar mechanism.

Eventually, rename init_helper to init_helper_server.
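A possible sketch of init_upload_helpers_list, modeled on the existing init_introducer_clients code quoted in Feature 3 below; the MULTI_HELPERS_CFG value and the exact behaviour are proposals, not final design:

import os

MULTI_HELPERS_CFG = "helpers"    # proposed file name: BASEDIR/helpers

def init_upload_helpers_list(self):
    """Return the list of helper furls from BASEDIR/helpers plus tahoe.cfg."""
    helper_furls = []
    cfg = os.path.join(self.basedir, MULTI_HELPERS_CFG)
    if os.path.exists(cfg):
        with open(cfg) as f:
            for line in f:
                furl = line.strip()
                if furl and not furl.startswith('#'):
                    helper_furls.append(furl)
    # keep backwards compatibility with the single helper.furl in tahoe.cfg
    legacy = self.get_config("client", "helper.furl", None)
    if legacy not in (None, "None", "") and legacy not in helper_furls:
        helper_furls.append(legacy)
    return helper_furls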

File: allmydata/immutable/upload.py

Refactor Uploader:

Create a wrapper class to handle connections with remote helper servers, using the functions _got_helper, _got_versioned_helper, _lost_helper and get_helper_info from the Uploader class.

Create a list of available helpers from the helpers list passed during initialization.

Create a hook function to select which server to use for uploading (sketched below):

Choose the best helper server based on the availability of helper servers and their statistics.

Fall back to the standard broker if no helper is available.
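A minimal sketch of that selection hook; the wrapper interface and the statistic used for ranking are assumptions:

def choose_helper(helper_connections):
    """helper_connections: wrapper objects for the configured Helper servers."""
    candidates = [h for h in helper_connections if h.is_connected()]
    if not candidates:
        return None        # caller falls back to CHKUploader via the storage broker
    # e.g. prefer the helper reporting the fewest active uploads in its stats
    return min(candidates,
               key=lambda h: h.get_stats().get("chk_upload_helper.active_uploads", 0))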

Gui

File: allmydata/web/root.py / allmydata/web/welcome.xhtml

Modify the functions Root.data_helper_furl_prefix, Root.data_helper_description and Root.data_connected_to_helper and the Nevow template to accommodate a list of helpers instead of a single helper. (See the patch for Tahoe-LAFS issue #1010 for those two files.)



Tests

File: allmydata/test/test_helper.py

Add several fake uploaders to the file and verify that the selection works correctly according to the (fake) server statistics.

New file: allmydata/test/test_multi_helpers.py

New test file to check that the client properly parses the list of multiple helpers and that the Uploader is properly initialized (see allmydata/test/test_multi_introducers.py for reference).

Documentation

  • Describe the changes implemented in the following files:

    • docs/architecture.rst.

    • docs/configuration.rst.

    • docs/helper.rst.

Implementation notes

Patches for similar functionality have already been published into Tahoe-LAFS repository. They can be used as a guide for implementation details:

  1. Support of multiple introducers: provides a sample of how to move from a single introducer to a list of introducers6 7.

  2. Hook in server selection when choosing a remote StorageServer: sample of how we can implement a programmable hook to choose the target server in a generic way8.

Feature 3: Spreading servers (introducers, helpers)

Description

Version 1.10 of Tahoe-LAFS allows specifying a list of multiple introducers. However, this list is static, specified per installation in the BASEDIR/introducers file (thanks to the multi-introducers patch used in i2p-Tahoe-LAFS), because the introducer only publishes a list of available StorageServers, not of available Introducers. The same applies to the list of Helpers once the multi-helpers modification is implemented.


The proposed feature consists of:

a) publishing a list of known Introducers that will be used to update the StorageClient's list of introducers;

b) publishing a list of known Helpers that will be used to update the StorageClient's list of helpers.


Configuration in tahoe.cfg will be used to indicate:

  • In StorageClients:

    • Whether or not the list of Introducers should be updated automatically.

    • Whether or not the list of Helpers should be updated automatically.

  • In Helper nodes:

    • Whether the furl of the Helper node should be published via the introducer.

  • In Introducer nodes:

    • Whether the list of alternative introducers at BASEDIR/introducers should be published.


Specification

We will use existing Introducer infrastructure to publish the furls of Helpers and Introducers.

Required functionality:

  1. A StorageClient can subscribe to notifications of 'introducer' and 'helper' services, in addition to the 'storage' service to which it subscribes now.

  2. The StorageClient will update the BASEDIR/helpers or BASEDIR/introducers file according to the data received from the Introducer.

  3. A Helper can publish its furl via an Introducer, which will distribute it to connected StorageClients.

  4. An Introducer can publish a list of alternative Introducers to the StorageClients that are connected to it. The list distributed is that stored in the BASEDIR/introducers file.

Existing code analysis

We analyse functionality related to the modifications listed above:

  1. The initialization of the introducers list from the configuration file

  2. The connection of the StorageClient to the IntroducerServer (using its IntroducerClient), and how it publishes its furl and subscribes to receive the furls of other StorageServers.

  3. The initialization of a Helper server.

  4. The initialization of an Introducer server.

Below we can find the code that initializes the list of introducers in allmydata/client.py:



File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):


(…)


def __init__(self, basedir="."):

node.Node.__init__(self, basedir)

self.started_timestamp = time.time()

self.logSource="Client"

self.encoding_params = self.DEFAULT_ENCODING_PARAMETERS.copy()

self.init_introducer_clients()

self.init_stats_provider()

self.init_secrets()

self.init_node_key()

self.init_storage()



(…)


def init_introducer_clients(self):

self.introducer_furls = []

self.warn_flag = False

# Try to load ""BASEDIR/introducers" cfg file

cfg = os.path.join(self.basedir, MULTI_INTRODUCERS_CFG)

if os.path.exists(cfg):

f = open(cfg, 'r')

for introducer_furl in f.read().split('\n'):

introducers_furl = introducer_furl.strip()

if introducers_furl.startswith('#') or not introducers_furl:

continue

self.introducer_furls.append(introducer_furl)

f.close()

furl_count = len(self.introducer_furls)

#print "@icfg: furls: %d" %furl_count


# read furl from tahoe.cfg

ifurl = self.get_config("client", "introducer.furl", None)

if ifurl and ifurl not in self.introducer_furls:

self.introducer_furls.append(ifurl)

f = open(cfg, 'a')

f.write(ifurl)

f.write('\n')

f.close()

if furl_count > 1:

self.warn_flag = True

self.log("introducers config file modified.")

print "Warning! introducers config file modified."


# create a pool of introducer_clients

self.introducer_clients = []



The first block highlighted in init_introducer_clients tries to read the BASEDIR/introducers file; the second adds introducer.furl from tahoe.cfg if it was not already contained in BASEDIR/introducers.


The second piece of functionality we are interested in is using the existing introducer infrastructure to update the lists of Introducers and Helpers. Below is the relevant code used to subscribe the StorageFarmBroker (responsible for keeping in touch with the StorageServers in the grid) to the Introducer's 'storage' announcements, as an example of how we will have to publish the corresponding 'helper' and 'introducer' announcements:


File: allmydata/storage_client.py


class StorageFarmBroker:

implements(IStorageBroker)

"""I live on the client, and know about storage servers. For each server

that is participating in a grid, I either maintain a connection to it or

remember enough information to establish a connection to it on demand.

I'm also responsible for subscribing to the IntroducerClient to find out

about new servers as they are announced by the Introducer.

"""

(...)


def use_introducer(self, introducer_client):

self.introducer_client = ic = introducer_client

ic.subscribe_to("storage", self._got_announcement)


def _got_announcement(self, key_s, ann):

if key_s is not None:

precondition(isinstance(key_s, str), key_s)

precondition(key_s.startswith("v0-"), key_s)

assert ann["service-name"] == "storage"

s = NativeStorageServer(key_s, ann)

serverid = s.get_serverid()

old = self.servers.get(serverid)

if old:

if old.get_announcement() == ann:

return # duplicate

# replacement

del self.servers[serverid]

old.stop_connecting()

# now we forget about them and start using the new one

self.servers[serverid] = s

s.start_connecting(self.tub, self._trigger_connections)

# the descriptor will manage their own Reconnector, and each time we

# need servers, we'll ask them if they're connected or not.


def _trigger_connections(self):

# when one connection is established, reset the timers on all others,

# to trigger a reconnection attempt in one second. This is intended

# to accelerate server connections when we've been offline for a

# while. The goal is to avoid hanging out for a long time with

# connections to only a subset of the servers, which would increase

# the chances that we'll put shares in weird places (and not update

(...)


The function StorageFarmBroker.use_introducer subscribes to the 'storage' announcements with the callback StorageFarmBroker._got_announcement, which tries to establish a connection with the new server whenever it receives an announcement.


During the StorageServer initialization, the announcement that this server is active is published when the connection with the introducer is ready (with the call to ic.publish):


File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):

implements(IStatsProducer)


(…)



def init_storage(self):

# should we run a storage server (and publish it for others to use)?

if not self.get_config("storage", "enabled", True, boolean=True):

return

readonly = self.get_config("storage", "readonly", False, boolean=True)


storedir = os.path.join(self.basedir, self.STOREDIR)


(..)

ss = StorageServer(storedir, self.nodeid,

reserved_space=reserved,

discard_storage=discard,

readonly_storage=readonly,

stats_provider=self.stats_provider,

expiration_enabled=expire,

expiration_mode=mode,

expiration_override_lease_duration=o_l_d,

expiration_cutoff_date=cutoff_date,

expiration_sharetypes=expiration_sharetypes)

self.add_service(ss)


d = self.when_tub_ready()

# we can't do registerReference until the Tub is ready

def _publish(res):

furl_file = os.path.join(self.basedir, "private", "storage.furl").encode(get_filesystem_encoding())

furl = self.tub.registerReference(ss, furlFile=furl_file)

ann = {"anonymous-storage-FURL": furl,

"permutation-seed-base32": self._init_permutation_seed(ss),

}


current_seqnum, current_nonce = self._sequencer()


for ic in self.introducer_clients:

ic.publish("storage", ann, current_seqnum, current_nonce, self._node_key)


d.addCallback(_publish)

d.addErrback(log.err, facility="tahoe.init",

level=log.BAD, umid="aLGBKw")


To publish the address of a Helper node, we will have to do it after its creation and registration in Client.init_helper (which is the function that initializes the Helper server):


File: allmydata/client.py


class Client(node.Node, pollmixin.PollMixin):

implements(IStatsProducer)


(…)


def init_helper(self):

d = self.when_tub_ready()

def _publish(self):

self.helper = Helper(os.path.join(self.basedir, "helper"),

self.storage_broker, self._secret_holder,

self.stats_provider, self.history)

# TODO: this is confusing. BASEDIR/private/helper.furl is created

# by the helper. BASEDIR/helper.furl is consumed by the client

# who wants to use the helper. I like having the filename be the

# same, since that makes 'cp' work smoothly, but the difference

# between config inputs and generated outputs is hard to see.

helper_furlfile = os.path.join(self.basedir,

"private", "helper.furl").encode(get_filesystem_encoding())

self.tub.registerReference(self.helper, furlFile=helper_furlfile)

d.addCallback(_publish)

d.addErrback(log.err, facility="tahoe.init",

level=log.BAD, umid="K0mW5w")


A parameter in the helper server's config file will tell whether or not we should publish the helper's address via the introducer.


Regarding the publication of the updated list of Introducers: an IntroducerServer is not connected to other Introducers; however, it can publish a list of introducers that is initially preloaded in BASEDIR/introducers (the same file that would be used by a standard node). We only have to modify the initialization code of the Introducer in allmydata/introducer/server.py to parse the introducers file and publish their announcements with a call to IntroducerService.publish. (Notice that the highlighted _publish function means 'publish this furl to the corresponding tub', i.e. make this furl accessible from the outside; from there we have to issue a call to the IntroducerService to publish the corresponding information.) We may have to connect to every introducer on the list to verify they are up and to recover additional information about them.



File: allmydata/introducer/server.py


class IntroducerNode(node.Node):
    PORTNUMFILE = "introducer.port"
    NODETYPE = "introducer"
    GENERATED_FILES = ['introducer.furl']

    def __init__(self, basedir="."):
        node.Node.__init__(self, basedir)
        self.read_config()
        self.init_introducer()
        webport = self.get_config("node", "web.port", None)
        if webport:
            self.init_web(webport) # strports string

    def init_introducer(self):
        introducerservice = IntroducerService(self.basedir)
        self.add_service(introducerservice)

        old_public_fn = os.path.join(self.basedir, "introducer.furl").encode(get_filesystem_encoding())
        private_fn = os.path.join(self.basedir, "private", "introducer.furl").encode(get_filesystem_encoding())

        (…)

        d = self.when_tub_ready()
        def _publish(res):
            furl = self.tub.registerReference(introducerservice,
                                              furlFile=private_fn)
            self.log(" introducer is at %s" % furl, umid="qF2L9A")
            self.introducer_url = furl # for tests
        d.addCallback(_publish)
        d.addErrback(log.err, facility="tahoe.init",
                     level=log.BAD, umid="UaNs9A")

(…)

class IntroducerService(service.MultiService, Referenceable):
    implements(RIIntroducerPublisherAndSubscriberService_v2)

    (…)

    def publish(self, ann_t, canary, lp):
        try:
            self._publish(ann_t, canary, lp)
        except:
            log.err(format="Introducer.remote_publish failed on %(ann)s",
                    ann=ann_t,
                    level=log.UNUSUAL, parent=lp, umid="620rWA")
            raise

    (…)
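
A minimal sketch of this proposed step, assuming BASEDIR/introducers simply lists one introducer FURL per line; load_alternative_introducers and _make_signed_announcement are illustrative names for this proposal, not existing Tahoe-LAFS code:


import os

def load_alternative_introducers(basedir):
    """Read BASEDIR/introducers and return the list of alternative introducer FURLs."""
    introducers_file = os.path.join(basedir, "introducers")
    if not os.path.exists(introducers_file):
        return []
    with open(introducers_file) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("#")]

# Inside IntroducerNode.init_introducer, once the Tub is ready, something like:
#
#     for furl in load_alternative_introducers(self.basedir):
#         ann = {"introducer-FURL": furl}
#         # build and sign the announcement, then hand it to the running
#         # IntroducerService via its publish() method (signing omitted here)
#         introducerservice.publish(self._make_signed_announcement(ann), None, self.log)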





Proposed modifications:

Client

File: allmydata/client.py

  • StorageClient:

    • Subscribe to the introducer's 'helper' and 'introducer' announcements, possibly within a new Client.init_subscriptions function.

    • Create the callback function to handle each of these subscriptions and update BASEDIR/helpers and BASEDIR/introducers accordingly (a sketch of this handler follows the list below).

  • HelperServer

    • After initialization of the server in Client.init_helper, publish the corresponding furl in the introducer with a 'helper' announcement.

IntroducerServer

File: allmydata/introducer/server.py

  • IntroducerServer

    • During initialization, read the list of alternative introducers from BASEDIR/introducers.

    • Once the IntroducerService is active, publish the furl of every alternative introducer known to this Introducer instance.
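
As referenced in the StorageClient item above, a minimal sketch of the proposed subscription handling on the client side; the on-disk format of BASEDIR/helpers and BASEDIR/introducers, the announcement key names and all function names are assumptions of this proposal, not existing Tahoe-LAFS code:


import os

def write_furl_list(basedir, filename, furls):
    """Rewrite BASEDIR/<filename> ('helpers' or 'introducers') with one FURL per line."""
    path = os.path.join(basedir, filename)
    with open(path, "w") as f:
        for furl in sorted(set(furls)):
            f.write(furl + "\n")

# Proposed (not yet existing) subscription setup inside Client:
#
#     def init_subscriptions(self):
#         self._known_helpers = set()
#         self._known_introducers = set()
#         for ic in self.introducer_clients:
#             ic.subscribe_to("helper", self._got_helper_announcement)
#             ic.subscribe_to("introducer", self._got_introducer_announcement)
#
#     def _got_helper_announcement(self, key_s, ann):
#         self._known_helpers.add(ann["helper-FURL"])          # key name assumed
#         write_furl_list(self.basedir, "helpers", self._known_helpers)
#
#     def _got_introducer_announcement(self, key_s, ann):
#         self._known_introducers.add(ann["introducer-FURL"])  # key name assumed
#         write_furl_list(self.basedir, "introducers", self._known_introducers)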



GUI

No modifications are needed in the GUI.



Tests

File: allmydata/test/test_introducer.py

  • Class Client: add test cases to verify:

    • That the client properly processes the new 'helper' and 'introducer' announcements.

    • That the client properly updates BASEDIR/helpers and BASEDIR/introducers.

    • That the introducer publishes the list of alternative introducers according to the configuration in tahoe.cfg.

    • That when a client is configured as a HelperServer, it publishes its furl via the introducer according to the configuration in tahoe.cfg.



Documentation

  • Describe the changes implemented in the following files:

    • docs/architecture.rst: add a reference to the automatic update of BASEDIR/introducers and BASEDIR/helpers

    • docs/configuration.rst: describe the new options for StorageClients (auto_update_introducers, auto_update_helpers), for the HelperServer (publish_helper_furl) and for the IntroducerServer (publish_alternative_introducers)

    • docs/helper.rst: describe the new configuration options.








CrashPlan & Symform (FileSystem) I2P + Tahoe-LAFS
Distributed decentralized data X
Encrypted before transmitting X
No file size limits X
Manage password & encryption keys  
Pause backups on low battery  
Pause backups over selected network interfaces  
Pause backups over selected wi-fi networks  
Sync on an inactivity period – configurable bash scripting
Do not produce bandwidth bottlenecks  
Connection through Proxy  
Not enumerating IP X
Resilience X
Storage Balancing X
Summarized volume  
Anonymous X
Sybil Attack protection  
User Disk Quota  

Social Networks

512MB RAM 50MB HDD /PHP


The selected social network is Friendica. It is a federated service.


Its main highlight is the ability to import and export data, posts and likes from other social networks, such as Facebook, Diaspora, Twitter, StatusNet, pump.io, weblogs and RSS feeds - and even email.


It provides a single, centralized point of interaction with each of your profiles on other social networks.



Importer Content Filter


When connectors import data from another social network, it is possible to configure which data is imported and which is not.


Filtering is based on image content, post content and who is posting the information.


For example, you can configure it not to import «cat photos» or «military texts».


It is well known that content filters can give false positives and false negatives. We expect the filter to improve with each release.
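
A minimal sketch of such a rule-based import filter; Friendica connectors are written in PHP, so this Python snippet, with hypothetical field and rule names, only illustrates the idea:


BLOCKED_KEYWORDS = {"cat", "military"}            # e.g. skip «cat photos» or «military texts»
BLOCKED_AUTHORS = {"spam-account@example.com"}    # hypothetical example

def should_import(post):
    """Return True if an imported post passes the content filter."""
    author = post.get("author", "").lower()
    text = post.get("text", "").lower()
    tags = {t.lower() for t in post.get("tags", [])}

    if author in BLOCKED_AUTHORS:
        return False
    if any(keyword in text for keyword in BLOCKED_KEYWORDS):
        return False
    if tags & BLOCKED_KEYWORDS:
        return False
    return True

# {"author": "friend@diaspora.example", "text": "my cat photos", "tags": []}
# would be rejected; as noted above, keyword filters of this kind will always
# produce some false positives and false negatives.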



Content Indexer


The Friendica node indexes posts, images, tags, and users.

The main GUI offers a search for friends and a search for content. Both can be combined, but the user can choose which search to perform.


This index never leaves your computer. Different users have different content index databases.
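
A purely illustrative sketch of a per-user, local-only content index; Friendica itself is PHP/MySQL, and the database name and schema below are assumptions (it also assumes the SQLite build includes FTS5):


import sqlite3

def open_index(db_path="content-index-user1.db"):
    # one database file per user, kept on the local machine only
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS posts "
        "USING fts5(author, body, tags)"
    )
    return conn

def index_post(conn, author, body, tags):
    conn.execute("INSERT INTO posts(author, body, tags) VALUES (?, ?, ?)",
                 (author, body, " ".join(tags)))
    conn.commit()

def search(conn, query):
    # full-text search over the local index; nothing leaves the machine
    return conn.execute("SELECT author, body FROM posts WHERE posts MATCH ?",
                        (query,)).fetchall()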



GUI

The Friendica GUI is redesigned with a modern 2015 look, fully responsive and built on HTML5+CSS3.

A content search box is embedded as well.



Friendica Bug Fixing

Like all software, Friendica has errors reported on its bug tracker that need to be solved. The full list is included in the Friendica development Appendix 3.3.


TODO: tor + certificates + federation


Specification

Facebook Diaspora Friendica
OpenID Login    
 
Search for people x x
search for places    
search for things x x
update status x x
add photos x x
add video x x
add friends x x
add links x x
add advertisements    
 
send messages x x
multi conversations x  
video conversations    
mute conversation    
change a name of multi conversation    
be online/offline    
see if somebody uses facebook on phone or computer    
block chat for specific groups/people    
turn off chat for specific groups/people    
turn on chat only for specific groups/people    
use chat    
use of emoticons    
use stickers  
send links/photos/videos in conversation x x
word-searcher in full conversation    
archive conversation    
delete message/conversation    
report as spam or abuse    
mark as read/unread    
messenger shows the hour of sending the message x  
 
create pages   x
create poll x  
create ads    
like things x x
comment things x x
share things x x
pokes    
edit posts x x
edit status x x
watch activities x x
news feed x  
play games    
 
create events   x
edit profile of the event   x
option: participate/maybe/decline in events    
shows weather forecast for the day of the event    
invite your friends to event   x
remove yourself from guest list    
export event   x
 
create groups   x
manage your group    
pin posts    
private/ open/ closed group    
join groups    
leave groups    
stop notification   x
add photos    
add members    
add files    
add events    
ask questions    
change administrator    
report group    
 
follow/unfollow friends x  
follow/unfollow posts x  
 
tag people in photos   x
tag people in posts/ status/    
add description for picture   x
 
edit/add profile picture x x
add/change cover photo    
update personal information: x x
· Work and Education    
· Relationship    
· Family    
· Places Lived x  
· Basic Information x  
· Contact Information    
· Life Events    
· Interests x x
manage sections    
create albums    
 
add friends x x
unfriend x x
suggest friends to other person   x
divide friends into groups (e.g. close friends, acquaintances) x x
activity log x x
 
change general account settings x x
edit security settings   x
extra protection for people under 18    
privacy settings concerning added stuff by you x x
restrictions about who can contact you   x
restrictions about looking up    
blocking apps/ games/ advertisements/ events/ users only users only users
possibility of choosing the way of getting notifications (e-mail, messages, on facebook) E-mail only E-mail only
decide who can follow you    
payment settings    
 
application for mobile phone x x
help service x x
report problems x x
users can translate network to other languages    
translations are approved by users in a vote    
 
message sending with pressing enter    
can connect with other networks x x

Development Timeline

TIMELINE - All sorted By Priority Development Hours 161,25h Cost 6450€

First Month(all TODO) 161,25h 6450€
PHP Fatal error accessing profile pages with a lot of posts 1,25h 50€
Navigating to index page with HTTPS forced does not redirect to HTTPS. 3,75h 150€
poller.php error 1,25h 50€
Impossible to make an introduction 2,5h 100€
button breaks the theme 1,25h 50€
private message is not visible 1,25h 50€
Same id for original status and retweeted status. 2,5h 100€
Spaces are Being Removed from Photo URLs 1,25h 50€
Do prevent stream from jumping around when new posts arrive 1,25h 50€
Browser UserAgentString for WebOS missing 3,75h 150€
Infinite duplicate posts in Facebook 1,25h 50€
posts to other people's walls can't be edited 2,5h 100€
openid failure with a server that has multiple openid-s 1,25h 50€
Feature Request: A Home-Button 1,25h 50€
Feature Request: PGP Clearsigning Beautification 3,75h 150€
Scheduled Posts 5h 200€
Image upload in comments impossible 2,5h 100€
Improve emoticons 2,5h 100€
Posting a new comment shows a (1) counter at the home menu item. 2,5h 100€
EveryAuth Login Integration (www.everyauth.com) 2,5h 100€
XMPP/Jabber integration (www.conversejs.org) 1,25h 50€
- option: participate/maybe/decline in events 2,5h 100€
- remove yourself from event guest list 1,25h 50€
follow/unfollow friends 1,25h 50€
follow/unfollow posts 1,25h 50€
tag people in posts/ status/ 2,5h 100€
add/change cover photo 1,25h 50€
Add Profile Information 3,75h 150€
· Work and Education    
· Relationship    
· Family    
· Places Lived    
· Basic Information    
· Contact Information    
· Life Events    
create photo albums 2,5h 100€
extra protection for people under 18 2,5h 100€
Allow if you want to be searched 1,25h 50€
“possibility of choosing the way of getting notifications (e-mail, messages, on facebook)“Now it's email only 2,5h 100€
decide who can follow you 2,5h 100€
Friendly UI redesign:Wireframe redesign & layout front end 40h 1600€
Friendly UI redesign:Development front end CSS3 40h 1600€
Friendly UI redesign:Develop connections on front end 10h 400€

Search Engine

300MB RAM 25GB HDD /Java


The chosen search engine is YaCy. It is developed in Java, with a distributed database that is shared when users make a query.

This way, no single computer has to hold all of the crawled internet content.


Users can access the YaCy search engine from a regular URL, or integrated into OwnCloud.



Webcrawler

YaCy is modified to be able to index, at the same time: intranet, extranet, darknets (I2P/Tor) and internal plugins.


Some of the indexed content will be shared with other YaCy nodes and some will not (a sketch of this sharing policy follows the list below):

- Extranet (regular internet) results are shared with all YaCy nodes

- Intranet results are shared with other YaCy nodes on the same intranet.

- DarkNet results are shared with other DarkNet YaCy installations.

- Internal plugin results are not shared with anybody.
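
A minimal sketch of this sharing policy as a source-to-scope mapping (YaCy is written in Java; the names here are illustrative only):


SHARING_POLICY = {
    "extranet": "all-yacy-nodes",       # regular internet: shared with every YaCy peer
    "intranet": "same-intranet-nodes",  # shared only with peers on the same intranet
    "darknet": "darknet-nodes",         # shared only with other darknet (I2P/Tor) installations
    "plugin": "local-only",             # OwnCloud/Friendica/email plugin results never leave the node
}

def sharing_scope(source):
    """Map the source of an index entry to who may receive it."""
    return SHARING_POLICY.get(source, "local-only")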



Search page

When the user is on the YaCy search page, they can select which results they want to get. By default all sources are checked: internal, external, darknets and internal plugins.



WebCrawler Internal Plugin: OwnCloud

This YaCy plugin wraps the indexing that OwnCloud has already done (see the OwnCloud file indexing section).

YaCy can show OwnCloud files and their content, redirecting to the OwnCloud installation.



WebCrawler Internal Plugin: Friendica

This YaCy plugin wraps the indexing that Friendica has already done (see the Friendica content indexing section).

YaCy can show Friendica people, posts and images, redirecting to the local Friendica installation.



WebCrawler Internal Plugin: Emails

This YaCy plugin indexes the emails on the system.

YaCy indexes the mail subject, sender, recipients and attachments if they are not encrypted; otherwise it indexes whatever it can.

If an email is GPG encrypted, it will likewise index what it can.
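
To illustrate which fields such a plugin could extract, a small sketch using Python's standard email library; the real plugin would live inside YaCy (Java), and the field names are our own:


import email
from email import policy

def extract_index_fields(raw_bytes):
    """Extract indexable fields from a stored RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    fields = {
        "subject": msg.get("Subject", ""),
        "sender": msg.get("From", ""),
        "recipients": msg.get_all("To", []) + msg.get_all("Cc", []),
        "attachments": [],
        "body": "",
    }
    body_part = msg.get_body(preferencelist=("plain",))
    if body_part is not None:
        body = body_part.get_content()
        # if the body is GPG encrypted we can only index the headers
        if "BEGIN PGP MESSAGE" not in body:
            fields["body"] = body
    for part in msg.iter_attachments():
        fields["attachments"].append(part.get_filename() or "unnamed")
    return fields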



Search Improvement

YaCy's search results need to be improved in order to be fully competitive with Google.



OwnCloud Integration

On the OwnCloud main GUI there is a YaCy search box, in the same way Google search is integrated with Gmail.


It uses the JSON query URL to get YaCy results directly and show them inside OwnCloud.






The YaCy JSON API query URL is http://localhost:8090/yacysearch.json?query=microsoft


The results are then shown on the frontend.
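
A minimal sketch of such a query against the local YaCy instance; the response layout assumed here (a "channels" list containing "items" with title and link) matches YaCy's OpenSearch-style JSON, but should be verified against the deployed version:


import json
import urllib.parse
import urllib.request

def yacy_search(query, base="http://localhost:8090"):
    """Query the local YaCy JSON API and return a list of {title, link} results."""
    url = base + "/yacysearch.json?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    results = []
    for channel in data.get("channels", []):
        for item in channel.get("items", []):
            results.append({"title": item.get("title"), "link": item.get("link")})
    return results

# yacy_search("microsoft") would return the entries the OwnCloud search box renders.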




YaCy Bug Fixing

Like all software, YaCy has errors reported on its bug tracker that need to be solved. The full list is included in the YaCy development Appendix 3.4.



Google & Bing & Yahoo YaCy
Competitive search results (if you search for a word such as “kademar”, the website www.kademar.org should appear in first place) & (important links inside this website should appear first) & (improve website search result ordering by relevancy) & (improve search results for full search sentences)  
Search for:
- text X
- images X
- videos  
- shopping  
- maps  
- news  
- books  
- flights  
- apps  
- celebrity  
The ability to control keys X
Related search (in the bottom) X (keywords in sidebar, sentences in results)
Language autodetection based on browser language or address, for example when I write google.co.uk, it displays the message “This site is available in English”  
Change language by selecting your localisation in settings Deutsch, you can change this in preferences. Five languages are available
Case insensitive  
Search pages from the world or just the selected language (browser / chosen language)  
Search operators:
- Search in title (intitle:) X
- Search in url (inurl:) X
- Search info (info:)  
- Search cache (cache:)  
- Search for a number range (eg. camera $50..$100)  
- Search for either word (eg. world cup location 2014 OR 2018)  
- Fill in the blank (eg. “a * saved is a * earned”)  
- Search for pages that are similar to a URL (eg. related:time.com)  
- Search for pages that link to a URL (eg. link:google.com)  
- Search within a site or domain (eg. olympics site:nbc.com)  
- Exclude a word (eg. “jaguar speed -car” or “pandas -site:wikipedia.org”)  
- Search for an exact word or phrase (eg. “imagine all the people”)  
  - inlink:
  - author:
  - tld:
  - /ftp
  - /http
  - /date
  - /near
  - /smb
  - /file
Stop-list, that is, words that are not taken into account (a, the, on, at, in, and, of, punctuation)  
Apart from standard files html/htm, php/php3, xhtml, asp and indexes other types like: txt, ans, pdf, ps, doc, xls, ppt, wks, wps, wdb, wri, rtf, swf, wk1, wk2, wk3, wk4, wk5, wki, wks, wku, lwp, mw  
- search for filetypes (eg. “filetype:odt”)  
Spelling dictionary ?
Extras:
- calculator  
- unit converter  
- currency converter  
- definitions  
- map  
Search by voice  
Search tools for text: Search tools for text:
- by language  
- by country  
- by date  
- by search near  
Onscreen keyboard  
  - by type of file
  - by type of server
  - by url
Safe Search filter  
Dynamic search (dynamic display of results when typing) X
Localization by geolocalization (IP)  
Knowledge graph: when you type “Torun” it displays information about the city, sometimes the weather (it seems this option has been dropped).  
In search results for the most important results are displayed:
- link to page with results for pictures  
- link to page with results for videos  
- link to page with results for news  
- related (eg. people)  
- for sportsmen changing background (mundial)  
Search engine designed for mobile devices  
Notification “Looking for results in English?” (English for example)  
Enter an expression incorrectly and it will be found in the correct form (in addition it displays “Showing results for: something”, “Search instead for: something”)  
The ability to remove information from google (but very hard to do)  
Search images:
- by color  
- by size  
- by type  
- ad management system (google adsense / adwords)  
- by usage rights  
- by images  
- by person  
- by structure  
- other: Top gallery  
  - by type of file
  - by type of server
  - by url
Search history  
Autocomplete  
Personalize Search Screen  
Webcrawler for internet and external  

Development Plan

TIMELINE - All sorted By Priority Development Hours 626h Cost* 15€/h based on a 2400€ salary 23.475€

First Month(bug month) 272h. 10.200 €
* Performance Issues: http://mantis.tokeek.de/view.php?id=305 32h. 1200€
* “Too many open files” while searching&crawling: http://mantis.tokeek.de/view.php?id=406 16h. 600€
* Unable to list Process Scheduler: http://mantis.tokeek.de/view.php?id=290 8h. 300€
* Yacy does not start: http://mantis.tokeek.de/view.php?id=420 24h. 900€
* index 100% CPU: http://mantis.tokeek.de/view.php?id=81 24h. 900€
* improve YaCy Web UI: http://mantis.tokeek.de/view.php?id=151 16h. 600€
* CPU cycles: http://mantis.tokeek.de/view.php?id=418 24h. 900€
* Huge Ram Eater: http://mantis.tokeek.de/view.php?id=282 32h. 1200€
* Young mode and DHT issue: http://mantis.tokeek.de/view.php?id=150 24h. 900€
* SSL Init Fail: http://mantis.tokeek.de/view.php?id=251 16h. 600€
* Infinite crash after one “not enough free space”: http://mantis.tokeek.de/view.php?id=144 24h. 900€
* YaCy cant boot anymore after setting up SSL: http://mantis.tokeek.de/view.php?id=323 8h. 300€
* Improve search algorithm: http://mantis.tokeek.de/view.php?id=283 16h. 600€
* Search engine designed for mobile devices (responsive) 18h. 675€

Second Month 246h 9.225 €
* out of memory on big index: http://mantis.tokeek.de/view.php?id=376 32h. 1200€
* Search pages from the world or just the selected language (browser / chosen language) 8h. 300€
* Apart from standard files html/htm, php/php3, xhtml, asp and indexes other types like: txt, ans, pdf, ps, doc, xls, ppt, wks, wps, wdb, wri, rtf, swf, wk1, wk2, wk3, wk4, wk5, wki, wks, wku, lwp, mw 24h. 900€
* Increase search frequency: http://mantis.tokeek.de/view.php?id=419 16h. 600€
* Stop-list, that is, words that are not taken into account (a, the, on, at, in, and, of, punctuation) 4h. 150€
* Bandwidth limiter: http://mantis.tokeek.de/view.php?id=165 24h. 900€
* Network Autoclean old entries: http://mantis.tokeek.de/view.php?id=20 16h. 600€
* Change YaCy process priority: http://mantis.tokeek.de/view.php?id=73 16h. 600€
* Case insensitive 4h. 150€
* Search pages from the world or just the selected language (browser / chosen language) 8h. 300€
* Search for pages that link to a URL (eg. link:google.com) 8h. 300€
* Search within a site or domain (eg. olympics site:nbc.com ) 8h. 300€
* Search for an exact word or phrase (eg. “imagine all the people”) 4h. 150€
* search for filetypes (eg. “filetype:odt”) 8h. 300€
* Import Open StreetMap data in YaCy: http://mantis.tokeek.de/view.php?id=226 32h. 1200€
* Personalize SearchEngine Screen 18h. 675€
* Onscreen keyboard 16h. 600€

Third Month 100h 3750 €
* Knowledge graph: when you type “Torun” it displays information about the city, sometimes the weather. 40h. 1500€
* Competitive search results (if you search for a word such as “kademar”, the website www.kademar.org should appear in first place) & (important links inside this website should appear first) & (improve website search result ordering by relevancy) & (improve search results for full search sentences) 60h. 2250€

Collaborative Document Editing - OwnCloud


OwnCloud already has ODT (text) editing based on webodf.org technology.


When OwnCloud gains end-to-end JavaScript encryption («secure mode»), the way the collaboration suite makes its connections needs to change, to make online editing possible in an end-to-end encrypted scenario.



New Network Connection Model: 1 user


When a user enters OwnCloud to edit a document:


- The browser loads the document editor JS.

- The document file is loaded, and OwnCloud records that this user is editing the document (master connection).

- The master connection is the only one that can save the document to OwnCloud.

- If it detects that the file is encrypted, it asks for the document password to decrypt it.

- The document editing suite and the document are loaded in the user's RAM. When the document is saved, it is encrypted with the password held in memory and sent back to the server (see the sketch after this list).

- On close, OwnCloud removes the record that somebody is editing this document.
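
In the proposal this encryption step runs as JavaScript in the browser; the following Python sketch (PBKDF2 key derivation plus AES-GCM, with parameters chosen only for illustration) shows the idea of encrypting the document with the in-memory password before it is sent back to the server:


import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_key(password, salt):
    # derive a 256-bit key from the document password held in memory
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200000)
    return kdf.derive(password.encode("utf-8"))

def encrypt_document(plaintext, password):
    salt = os.urandom(16)
    nonce = os.urandom(12)
    key = derive_key(password, salt)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return salt + nonce + ciphertext          # this blob is what goes back to the server

def decrypt_document(blob, password):
    salt, nonce, ciphertext = blob[:16], blob[16:28], blob[28:]
    key = derive_key(password, salt)
    return AESGCM(key).decrypt(nonce, ciphertext, None)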



New Network Connection Model: multiple users – Realtime P2P


User 2 can start editing the same file by using a direct link or the same OwnCloud GUI (slave connection).


While the interface is loading, OwnCloud sends a notification to User 1. At that moment, User 1's OwnCloud GUI saves the document and locks the interface while the new member joins.


User 2 downloads the current document version, and their OwnCloud asks for the decryption password.


When User 2 is connected, User 1's interface is unlocked.


The master connection and the slave connections talk peer-to-peer, without intermediate nodes.


Each modification, pointer position and change is shared between users directly.


They see changes simultaneously, but only the master connection writes changes to OwnCloud.


When User 1 (master connection) disconnects, the master connection mark is handed over to User 2. From then on, User 2 writes the document.

If User 1 reconnects, or User 3 connects, the master connection remains with User 2.
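
A purely illustrative sketch of the master-connection bookkeeping described above; the real coordination would live in the OwnCloud/JavaScript layer:


class EditingSession:
    def __init__(self):
        self.editors = []          # join order; editors[0] holds the master connection

    def join(self, user):
        if user not in self.editors:
            self.editors.append(user)
        return self.master()

    def leave(self, user):
        if user in self.editors:
            self.editors.remove(user)
        # after removal, editors[0] (the longest-connected remaining user)
        # automatically holds the master connection
        return self.master()

    def master(self):
        return self.editors[0] if self.editors else None

# session = EditingSession()
# session.join("user1"); session.join("user2")     # user1 is master
# session.leave("user1")                           # user2 becomes master
# session.join("user1"); session.join("user3")     # master remains user2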



Support more files


We contribute to WebODF to create ODS (spreadsheet) and ODP (presentation) viewers, and then editors.


We also contribute to WebODF to add more features to their ODT document editor.


The OwnCloud editing suite will then be a fully working office suite.


MailPile


The conferencing solution is XMPP + OTR (encryption) + WebRTC (video).


There is an OwnCloud plugin already developed that does this: https://apps.owncloud.com/content/show.php/JavaScript+XMPP+Chat?content=162257

It uses the ejabberd XMPP server.


We will compile ejabberd with Tor support (https://spaceboyz.net/~astro/ejabberd-2.0.x+tor.patch).


We provide a set of «grey» CommunityCube XMPP servers. Those servers run Prosody with mod_onions, which allows them to connect through Tor or to regular servers, creating bridges between the two networks.


When using OwnCloud + XMPP, the user can choose to use our grey CommunityCube Jabber servers or a regular server like jabber.ccc.de.


The user could also connect with a regular Jabber client.



Encryption


Encryption of XMPP communications is handled by the OTR XMPP protocol extension.




Video conferencing will be handled by WebRTC.

WebRTC needs a STUN/TURN server to broker connections; we will create servers such as webrtc.communitycube.net.
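
As an illustration, a peer connection pointed at such a STUN server; real clients would be browsers using the JavaScript WebRTC API, aiortc is used here only to keep the example in Python, and the port is simply the standard STUN port rather than a confirmed deployment detail:


import asyncio
from aiortc import RTCPeerConnection, RTCConfiguration, RTCIceServer

async def make_peer_connection():
    config = RTCConfiguration(iceServers=[
        RTCIceServer(urls="stun:webrtc.communitycube.net:3478"),
    ])
    pc = RTCPeerConnection(configuration=config)
    pc.createDataChannel("chat")                   # opening a channel triggers ICE gathering
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    return pc                                      # the offer SDP is then exchanged via the broker

if __name__ == "__main__":
    pc = asyncio.run(make_peer_connection())
    print(pc.localDescription.type)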


Proposal tech

http://www.html5rocks.com/en/tutorials/webrtc/basics/

http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/


As the connection broker server we use PeerJS Server (a WebRTC connection broker).

PLEASE DONATE AND SHARE!

Kickstarter Campaign

Thank you for your awesome support!